Presentation ismir

Monaural score-informed source separation for classical music using convolutional neural networks
Universitat Pompeu Fabra, Barcelona, Music Technology Group
Marius Miron, Jordi Janer, Emilia Gomez

About me
Marius Miron
mariusmiron.com
github.com/nkundiushuti
PhD candidate, UPF, MTG (graduating February 2018)
signal processing, deep learning, audio source separation

Classical music source separation

Deep learning source separation
[Diagram: neural network relating tensors of shape (T,F) and (J,T,F); equation symbols not recoverable]
Soft-mask M example, element at (1,50,:): 0.5, 0.1, 0.2, 0.2
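
A minimal sketch of how a soft mask of shape (J,T,F) can be computed and applied to a mixture spectrogram of shape (T,F). The normalisation used here (each source estimate divided by the sum over sources) is a common choice and is an assumption, not something stated on the slide; the function names, array names and sizes are hypothetical.

```python
import numpy as np

def soft_masks(estimates, eps=1e-12):
    """Normalise non-negative source estimates (J, T, F) so they sum to 1 over J."""
    estimates = np.abs(estimates)
    return estimates / (estimates.sum(axis=0, keepdims=True) + eps)

def apply_masks(masks, mixture):
    """Multiply each source mask (J, T, F) with the mixture magnitude (T, F)."""
    return masks * mixture[np.newaxis, :, :]

rng = np.random.default_rng(0)
J, T, F = 4, 30, 513                      # hypothetical sizes
network_output = rng.random((J, T, F))    # stand-in for a network's raw estimates
mixture = rng.random((T, F))              # stand-in for the mixture magnitude

M = soft_masks(network_output)
sources = apply_masks(M, mixture)
```

Because the masks sum to one over the source axis, the masked spectrograms add back up to the mixture magnitude.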

Data generation method
Varied factors: tempo, dynamics, timbre, local timing
Audio: isolated instruments
Miron et al. (2017). Generating data to train neural networks for classical music source separation. SMC 2017.

Method
Training
Diagram blocks: original scores, audio synthesis, magnitude spectrum, score-based soft-masks, score-filtered spectrum, data processing, CNN training

Method
Score-based binary matrices
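
One way to build such a binary matrix can be sketched as follows. The slide does not specify the construction, so everything here is an assumption: the note format (onset, offset, MIDI pitch), the STFT parameters, the number of harmonics and the bandwidth around each harmonic are all hypothetical.

```python
import numpy as np

def binary_matrix(notes, n_frames, n_bins, sr=44100, hop=512, n_fft=4096,
                  n_harmonics=10, half_width=2):
    """Mark time-frequency bins where the score says an instrument may have energy.

    notes: list of (onset_sec, offset_sec, midi_pitch) tuples (hypothetical format).
    """
    B = np.zeros((n_frames, n_bins), dtype=np.float32)
    for onset, offset, pitch in notes:
        t0 = max(0, int(np.floor(onset * sr / hop)))
        t1 = min(n_frames, int(np.ceil(offset * sr / hop)))
        f0 = 440.0 * 2.0 ** ((pitch - 69) / 12.0)    # MIDI pitch to Hz
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 * n_fft / sr))       # nearest FFT bin of harmonic h
            if k >= n_bins:
                break
            lo, hi = max(0, k - half_width), min(n_bins, k + half_width + 1)
            B[t0:t1, lo:hi] = 1.0
    return B

notes = [(0.0, 0.5, 69), (0.5, 1.0, 72)]   # A4 then C5, made-up example
B = binary_matrix(notes, n_frames=100, n_bins=2049)
```

One such matrix per instrument, stacked along a first axis, gives a (J,T,F) array.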

Method
Score-based soft masks
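
A small sketch of turning per-source binary score matrices into soft masks. It assumes the masks are obtained by dividing each binary matrix by the number of sources active in the same bin; this normalisation and the function name are assumptions for illustration, not the slide's stated method.

```python
import numpy as np

def score_soft_masks(binary, eps=1e-12):
    """binary: (J, T, F) array of 0/1 score activations.

    Returns masks that sum to 1 over J wherever at least one source is active.
    """
    total = binary.sum(axis=0, keepdims=True)
    return binary / np.maximum(total, eps)

binary = np.zeros((2, 3, 4))
binary[0, :, 0:3] = 1.0     # source 0 allowed in bins 0..2
binary[1, :, 2:4] = 1.0     # source 1 allowed in bins 2..3
masks = score_soft_masks(binary)
```

In the shared bin 2 each source receives 0.5; bins claimed by a single source keep a value of 1.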

Method
STFT magnitude spectrogram
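
For reference, a magnitude spectrogram can be computed with a short NumPy-only STFT. The window, FFT size, hop size and sample rate below are hypothetical values chosen for the example, not the settings used in the presentation.

```python
import numpy as np

def stft_magnitude(x, n_fft=1024, hop=256):
    """Return the magnitude spectrogram of a mono signal, shape (T, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000                                   # hypothetical sample rate
t = np.arange(sr) / sr                      # one second of audio
x = np.sin(2 * np.pi * 1000.0 * t)          # 1 kHz test tone
X = stft_magnitude(x)
```

With these settings the 1 kHz tone peaks at bin 1000 * 1024 / 8000 = 128 in every frame.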

Method
Score-filtered spectrum
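
A sketch of the score-filtering step, assuming it is an element-wise product between the mixture magnitude spectrogram and each source's score-based mask. The function name and the toy values are hypothetical.

```python
import numpy as np

def score_filtered_spectrum(magnitude, masks):
    """magnitude: (T, F) mixture magnitude; masks: (J, T, F); returns (J, T, F)."""
    if masks.shape[1:] != magnitude.shape:
        raise ValueError("mask and spectrogram shapes do not match")
    return masks * magnitude

magnitude = np.array([[1.0, 2.0, 4.0]])              # T=1, F=3
masks = np.array([[[1.0, 0.5, 0.0]],                 # source 0
                  [[0.0, 0.5, 1.0]]])                # source 1
filtered = score_filtered_spectrum(magnitude, masks)
```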

Convolutional neural networks
conv1: f(1,30), s(1,4)
conv2: f(20,1), s(1,1)
dense1: 256
dense2: 30xT1xF1
inverse conv2
inverse conv1
Tensor shapes: (J,T,F), (30,T,F1), (30,T1,F1), (30,T1,F1), (30,T,F1), (J,T,F)
[Equation not recoverable; it relates tensors of shape (J,T,F) and (T,F)]
Chandna et al. (2017). Monoaural audio source separation using deep convolutional neural networks. LVA/ICA, 258-266.
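
The layer sizes can be checked with simple shape arithmetic. This sketch assumes f(a,b) is the filter size along (time, frequency), s(a,b) the stride, 30 filters per convolution, and no padding; the input size T = 30, F = 513 is a hypothetical example.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

T, F = 30, 513                 # hypothetical input size (frames, frequency bins)
F1 = conv_out(F, 30, 4)        # conv1: f(1,30), s(1,4) acts along frequency
T1 = conv_out(T, 20, 1)        # conv2: f(20,1), s(1,1) acts along time
flattened = 30 * T1 * F1       # 30 feature maps feeding the dense layers
print((30, T, F1), (30, T1, F1), flattened)
```

Under these assumptions the first convolution only shortens the frequency axis and the second only shortens the time axis.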

Dataset
Bach10 Dataset
10 Bach chorales, 20-40 seconds each
Perfectly and automatically aligned scores
Duan, Z. and Pardo, B. (2011). Bach10 dataset.

Experiments
CNN autoencoder vs NMF
CNN autoencoder: score informed; trained on renditions synthesised with RWC
NMF: multi-source filter model; score informed; trained on RWC
Miron et al. (2015). Improving score-informed source separation for classical music through note refinement. ISMIR.

Experiments
PA
Automatically aligned score
Tolerance window around
- Onsets
- Offsets
vs
Perfectly aligned score
Duan, Z. and Pardo, B. (2011). Bach10 dataset.
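
A tolerance window around onsets and offsets can be sketched as widening each note interval before the score matrices are built. The note format, the function name and the tolerance values are hypothetical; the slide does not give the window sizes.

```python
def widen_notes(notes, onset_tol, offset_tol, duration=None):
    """Extend each (onset, offset, pitch) note by a tolerance on both sides (seconds)."""
    widened = []
    for onset, offset, pitch in notes:
        start = max(0.0, onset - onset_tol)      # never start before the recording
        end = offset + offset_tol
        if duration is not None:
            end = min(duration, end)             # never end after the recording
        widened.append((start, end, pitch))
    return widened

notes = [(0.05, 0.50, 60), (0.50, 1.00, 62)]     # made-up aligned notes
wide = widen_notes(notes, onset_tol=0.1, offset_tol=0.2, duration=1.1)
```

Widened notes overlap in time, so the score only constrains where a note may sound and the separation stage has to resolve the exact boundaries.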

Results
Test dataset: Bach10
DOI: 10.5281/zenodo.1009136

Results
How much data do we need?
bootstrapping
standard

Demo
https://youtu.be/c0xJIJrp5w8

Questions?
Code: https://github.com/MTG/DeepConvSep
Data: .wav and .mat

Method
32
trained
model
audio
rendition
aligned
score
magnitude
spectrum
score-based
soft-masks
score-
filtered
spectrum
separated
sources
phase
spectrum
Separation
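
The last step of the separation stage can be sketched as follows, assuming the estimated magnitudes are combined with the phase of the mixture. The masks here are random stand-ins for what a trained model would provide, and the function name is hypothetical.

```python
import numpy as np

def separate(mixture_stft, masks):
    """mixture_stft: complex (T, F); masks: (J, T, F). Returns complex (J, T, F)."""
    magnitude = np.abs(mixture_stft)
    phase = np.angle(mixture_stft)
    source_magnitudes = masks * magnitude              # masked magnitude per source
    return source_magnitudes * np.exp(1j * phase)      # reuse the mixture phase

rng = np.random.default_rng(1)
mixture_stft = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))
raw = rng.random((3, 5, 8))
masks = raw / raw.sum(axis=0, keepdims=True)           # stand-in for estimated soft masks
sources = separate(mixture_stft, masks)
```

An inverse STFT of each complex spectrogram would then give the separated waveforms.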

Music source separation
SISEC MUS 2016
[Plot: SDR score per submitted system, target = vocals. Systems: DUR, GRA3, GRA2, GRA1, HUA, IBM, JEO1, JEO2, KON1, KAM1, KAM2, MRN, NUG1, NUG2, NUG3, NUG4, OZE, RAF1, RAF2, RAF3, STO1, STO2, UHL1, UHL2, UHL3]
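
For orientation, the basic idea behind SDR is an energy ratio in dB between a reference signal and the estimation error. The sketch below uses that plain definition as a simplifying assumption; the SDR reported in evaluation campaigns comes from a toolkit that decomposes the error further, so values will differ.

```python
import math

def sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB between two equal-length sequences."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error == 0:
        return math.inf
    return 10.0 * math.log10(signal / error)

reference = [1.0, -1.0, 1.0, -1.0]     # made-up reference source
estimate = [0.9, -0.9, 0.9, -0.9]      # made-up estimate, 10% too quiet
value = sdr_db(reference, estimate)
```

An estimate that is 10% too quiet has an error energy 100 times smaller than the signal energy, which gives about 20 dB.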

Music source separation
Deep learning methods
[Plot: SDR score per submitted system, target = vocals]

Deep learning source separation
conv1: f(1,30), s(1,4)
conv2: f(20,1), s(1,1)
dense1: 256
dense2: Jx30xT1xF1
inverse conv2
inverse conv1
Tensor shapes: (1,T,F), (30,T,F1), (30,T1,F1), Jx(30,T1,F1), Jx(30,T,F1), (J,T,F)
[Equation not recoverable; it relates tensors of shape (J,T,F) and (T,F)]
Chandna et al. (2017). Monoaural audio source separation using deep convolutional neural networks. LVA/ICA, 258-266.

Method
Miron, M. and Slizovskaia, O. (2017). Convolutional neural networks for audio processing: starting pack. PyData 2017.
https://github.com/nkundiushuti/pydata2017bcn/blob/master/DataPreprocessing_results.ipynb

Synthesis 2
Bach10 RWC
Concatenative synthesis
- Tempo: 80, 100, 120;
- Dynamics: piano, mezzo, forte;
- Timbre: 3 musicians, various styles;
- Local timing: 0, 0.1, 0.2 s.
Goto et al. (2002). RWC Music database. ISMIR.
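
The factors listed above can be enumerated as a grid of rendering conditions. This sketch assumes every combination is rendered, which the slide does not state; the musician labels and dictionary keys are hypothetical.

```python
from itertools import product

tempos = [80, 100, 120]
dynamics = ["piano", "mezzo", "forte"]
timbres = ["musician_1", "musician_2", "musician_3"]   # hypothetical labels
local_timing = [0.0, 0.1, 0.2]                         # seconds

conditions = [
    {"tempo": t, "dynamics": d, "timbre": m, "local_timing": lt}
    for t, d, m, lt in product(tempos, dynamics, timbres, local_timing)
]
print(len(conditions))
```

With three values per factor the full grid has 3 x 3 x 3 x 3 = 81 conditions per piece.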
