Monaural score-informed source separation for classical music using convolutional neural networks
Universitat Pompeu Fabra, Barcelona, Music Technology Group
Marius Miron, Jordi Janer, Emilia Gomez
About me
Marius Miron
mariusmiron.com
github.com/nkundiushuti
PhD candidate, UPF, MTG (graduating February 2018)
signal processing, deep learning, audio source separation
Classical music source separation
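Deep learning source separation
[Diagram: mixture magnitude spectrogram (T,F) → neural network → soft masks (J,T,F). Source estimates (J,T,F) = soft masks (J,T,F) multiplied with the mixture (T,F).]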
Soft-mask M example, element at (1,50,:): 0.5, 0.1, 0.2, 0.2
Context
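A minimal sketch of how soft masks of this kind can be computed and applied, assuming the network outputs non-negative magnitude estimates for J sources. The function names, the `eps` constant and the toy array sizes are illustrative assumptions and do not come from the slides. The four example values (0.5, 0.1, 0.2, 0.2) sum to 1, which is consistent with this kind of normalisation across sources.

```python
import numpy as np

def soft_masks(estimates, eps=1e-8):
    """Turn non-negative source estimates of shape (J, T, F) into soft masks.

    Each mask value is the source's share of the summed estimates, so the
    J values at every time-frequency bin add up to (almost exactly) 1.
    """
    total = estimates.sum(axis=0, keepdims=True)
    return estimates / (total + eps)

def apply_masks(masks, mixture):
    """Multiply masks (J, T, F) with the mixture magnitude (T, F)."""
    return masks * mixture[np.newaxis, :, :]

# Toy data standing in for a network output and a mixture spectrogram.
rng = np.random.default_rng(0)
J, T, F = 4, 3, 5
estimates = rng.random((J, T, F)) + 0.1
mixture = rng.random((T, F))

M = soft_masks(estimates)
sources = apply_masks(M, mixture)
```

Because the masks sum to 1 at every bin, the masked sources add back up to the mixture magnitude.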
Data generation method
- Tempo
- Dynamics
- Timbre
- Local timing
Audio: isolated instruments
Miron et al. (2017). Generating data to train neural networks for classical music source separation. SMC 2017.
Method
[Diagram: training stage. Blocks: original scores, audio synthesis, magnitude spectrum, score-based soft-masks, score-filtered spectrum, data processing, CNN training.]
Method
- Score-based binary matrices
- Score-based soft masks
- STFT magnitude spectrogram
- Score-filtered spectrum
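The four representations above can be sketched as follows. This is only an illustration under stated assumptions: the note format `(onset_s, offset_s, midi_pitch)`, the harmonic-partial bands, the normalisation across sources, and all parameter values (`sr`, `n_fft`, `hop`, `n_partials`, `width_hz`) are hypothetical and are not specified in the slides.

```python
import numpy as np

def midi_to_hz(pitch):
    """Convert a MIDI pitch number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

def binary_matrix(notes, n_frames, n_bins, sr=44100, n_fft=4096,
                  hop=512, n_partials=10, width_hz=30.0):
    """Mark the time-frequency bins where the score expects energy.

    notes: list of (onset_s, offset_s, midi_pitch) for ONE instrument.
    Bins within width_hz of a harmonic partial are set to 1 (assumption).
    """
    B = np.zeros((n_frames, n_bins))
    bin_hz = np.arange(n_bins) * sr / n_fft
    for onset, offset, pitch in notes:
        t0 = int(np.floor(onset * sr / hop))
        t1 = min(n_frames, int(np.ceil(offset * sr / hop)))
        f0 = midi_to_hz(pitch)
        for h in range(1, n_partials + 1):
            active = np.abs(bin_hz - h * f0) <= width_hz
            B[t0:t1, active] = 1.0
    return B

def score_soft_masks(binaries, eps=1e-8):
    """Normalise per-instrument binary matrices across sources: (J, T, F)."""
    stacked = np.stack(binaries)
    return stacked / (stacked.sum(axis=0, keepdims=True) + eps)

def score_filtered(masks, magnitude):
    """Apply the masks (J, T, F) to an STFT magnitude spectrogram (T, F)."""
    return masks * magnitude[np.newaxis]

# Toy example: two instruments, one note each, flat magnitude spectrogram.
n_frames, n_bins = 100, 2049
scores = [[(0.0, 0.5, 69)], [(0.25, 1.0, 60)]]
binaries = [binary_matrix(n, n_frames, n_bins) for n in scores]
masks = score_soft_masks(binaries)
filtered = score_filtered(masks, np.ones((n_frames, n_bins)))
```

Where only one instrument is expected at a bin its mask is close to 1, and where two overlap each receives about half.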
Convolutional neural networks
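[Diagram: CNN autoencoder]
- input: (J,T,F)
- conv1, filter f(1,30), stride s(1,4): (30,T,F_1)
- conv2, filter f(20,1), stride s(1,1): (30,T_1,F_1)
- dense1: 256 units
- dense2: 30×T_1×F_1 units, reshaped to (30,T_1,F_1)
- inverse conv2: (30,T,F_1)
- inverse conv1: output (J,T,F)
Source estimates (J,T,F) = soft masks (J,T,F) multiplied with the mixture (T,F).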
Chandna et al. (2017). Monoaural audio source separation using deep convolutional neural networks. LVA/ICA, 258-266.
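To make the layer sizes concrete, the sketch below computes the shapes through the autoencoder from the filter and stride sizes on the slide. It assumes unpadded ("valid") convolutions, and the example values J=4, T=30, F=513 are hypothetical choices for illustration, not figures from the slides.

```python
def conv_out(size, filt, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - filt) // stride + 1

def autoencoder_shapes(J, T, F, n_filters=30):
    """List (layer name, output shape) pairs for the autoencoder."""
    F_1 = conv_out(F, 30, 4)   # conv1: filter spans 30 frequency bins, stride 4
    T_1 = conv_out(T, 20, 1)   # conv2: filter spans 20 time frames, stride 1
    return [
        ("input", (J, T, F)),
        ("conv1", (n_filters, T, F_1)),
        ("conv2", (n_filters, T_1, F_1)),
        ("dense1", (256,)),
        ("dense2", (n_filters * T_1 * F_1,)),
        ("inverse conv2", (n_filters, T, F_1)),
        ("inverse conv1", (J, T, F)),
    ]

# Hypothetical sizes: 4 sources, 30 time frames, 513 frequency bins.
shapes = autoencoder_shapes(J=4, T=30, F=513)
for name, shape in shapes:
    print(f"{name:14s} {shape}")
```

With these example sizes, F_1 = 121 and T_1 = 11, so dense2 has 30 × 11 × 121 = 39930 units.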
Dataset
Bach10 Dataset
10 Bach chorales, 20-40 seconds each
Perfectly and automatically aligned scores
Duan, Z. and Pardo, B. (2011). Bach10 dataset.
Experiments
CNN autoencoder
- Score informed
- Trained on renditions synthesised with RWC
vs
NMF
- Multi-source filter model
- Score informed
- Trained on RWC
Miron et al. (2015). Improving score-informed source separation for classical music through note refinement. ISMIR.
Experiments
Automatically aligned score
Tolerance window around
- Onsets
- Offsets
vs
Perfectly aligned score (PA)
Duan, Z. and Pardo, B. (2011). Bach10 dataset.
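A tolerance window around onsets and offsets can be sketched as widening every note interval before the score is turned into masks. The note format `(onset_s, offset_s, pitch)`, the helper name `widen_notes` and the 0.1 s tolerance are assumptions for illustration; the slides do not state the window size.

```python
def widen_notes(notes, tolerance_s, duration_s=None):
    """Extend each (onset, offset, pitch) note by tolerance_s on both sides.

    Onsets are clipped at 0 and, if duration_s is given, offsets are
    clipped at the end of the recording.
    """
    widened = []
    for onset, offset, pitch in notes:
        start = max(0.0, onset - tolerance_s)
        end = offset + tolerance_s
        if duration_s is not None:
            end = min(duration_s, end)
        widened.append((start, end, pitch))
    return widened

# Hypothetical two-note score with a 0.1 s tolerance.
notes = [(0.05, 0.50, 69), (0.50, 1.00, 71)]
widened = widen_notes(notes, tolerance_s=0.1, duration_s=1.05)
print(widened)
```

A wider window makes the masks more forgiving of alignment errors, at the cost of letting more energy from other sources through.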
Results
Test dataset: Bach10
DOI: 10.5281/zenodo.1009136
Results
How much data do we need?
[Figure legend: bootstrapping, standard]
Demo
https://youtu.be/c0xJIJrp5w8
Questions?
Code: https://github.com/MTG/DeepConvSep
Data: .wav and .mat
DOI: 10.5281/zenodo.1009136
Method
[Diagram: separation stage. Blocks: audio rendition, aligned score, magnitude spectrum, phase spectrum, score-based soft-masks, score-filtered spectrum, trained model, separated sources.]
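The last step of the separation stage, combining estimated magnitudes with the phase spectrum, can be sketched as below. It assumes the sources reuse the phase of the mixture and that soft masks are already available; the random masks here merely stand in for the output of a trained model, and the inverse STFT that would follow is not shown.

```python
import numpy as np

def separate(mixture_stft, masks):
    """Apply soft masks (J, T, F) to a complex mixture STFT (T, F).

    The masks scale the mixture magnitude; every source reuses the
    mixture phase (assumption).
    """
    magnitude = np.abs(mixture_stft)
    phase = np.angle(mixture_stft)
    source_mags = masks * magnitude[np.newaxis]
    return source_mags * np.exp(1j * phase)[np.newaxis]

# Toy complex spectrogram and masks that sum to 1 across sources.
rng = np.random.default_rng(1)
J, T, F = 4, 6, 9
X = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
raw = rng.random((J, T, F)) + 0.1
masks = raw / raw.sum(axis=0, keepdims=True)

S = separate(X, masks)
```

Since the masks sum to 1 and the phase is shared, the separated complex spectra add up to the mixture spectrum.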
Music source separation
SISEC MUS 2016
[Figure: score per system, metric = SDR, target_name = vocals. Systems: DUR, GRA3, GRA2, GRA1, HUA, IBM, JEO1, JEO2, KON1, KAM1, KAM2, MRN, NUG1, NUG2, NUG3, NUG4, OZE, RAF1, RAF2, RAF3, STO1, STO2, UHL1, UHL2, UHL3.]
Deep learning methods
[Figure: SISEC MUS 2016, score per system, metric = SDR, target_name = vocals, same systems as in the previous figure.]
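Deep learning source separation
[Diagram: CNN autoencoder]
- input: (1,T,F)
- conv1, filter f(1,30), stride s(1,4): (30,T,F_1)
- conv2, filter f(20,1), stride s(1,1): (30,T_1,F_1)
- dense1: 256 units
- dense2: J×30×T_1×F_1 units, reshaped to J×(30,T_1,F_1)
- inverse conv2: J×(30,T,F_1)
- inverse conv1: output (J,T,F)
Source estimates (J,T,F) = soft masks (J,T,F) multiplied with the mixture (T,F).
Chandna et al. (2017). Monoaural audio source separation using deep convolutional neural networks. LVA/ICA, 258-266.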
Method
Miron, M. and Slizovskaia, O. (2017). Convolutional neural networks for audio processing: starting pack. PyData 2017.
https://github.com/nkundiushuti/pydata2017bcn/blob/master/DataPreprocessing_results.ipynb
Synthesis 2
Bach10 RWC
Concatenative synthesis
- Tempo: 80, 100, 120;
- Dynamics: piano, mezzo, forte;
- Timbre: 3 musicians, various styles;
- Local timing: 0, 0.1, 0.2 s.
Goto et al. (2002). RWC Music Database. ISMIR.
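The parameter values listed above can be enumerated as a grid of rendition settings. This is a sketch only: whether every combination was actually synthesised is not stated in the slides, and the dictionary keys and the integer stand-ins for the three musicians are hypothetical.

```python
from itertools import product

tempos = [80, 100, 120]
dynamics = ["piano", "mezzo", "forte"]
musicians = [1, 2, 3]               # stand-ins for the 3 timbres
local_timing_s = [0.0, 0.1, 0.2]

# One settings dictionary per combination of the four factors.
renditions = [
    {"tempo": t, "dynamics": d, "musician": m, "local_timing_s": lt}
    for t, d, m, lt in product(tempos, dynamics, musicians, local_timing_s)
]
print(len(renditions))
```

With three values per factor, the full grid would contain 3 × 3 × 3 × 3 = 81 settings per piece.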
