class SelfIntro自己紹介:
    def __init__(私):
        私.名 = 'Renyuan Lyu, 呂 仁園'
        私.職業 = 'University Professor, 大学の先生'
        私.研究分野 = 'Speech Recognition, 音声認識'
        私.職場 = 'Chang Gung Univ (CGU), 長庚大學'
        私.国 = 'TAIWAN, 台灣'
        私.誇り = '''
            PyCon JP speaker (2015~2017, 2019),
            カラオケさん'''
https://youtu.be/O1-9Yv9cB8Q
https://youtu.be/cUewj2kRrbk?t=2434
Lightning Talks at PyCon JP 2016, 2017
Real-time Pitch Detection and Speech Recognition in Python
via PyAudio, Pygame & VPython
Renyuan Lyu (呂仁園),
Chang Gung University (長庚大學),
TAIWAN (台灣)
@ PyCon JP 2019
The System
• Multilingual Lyric Transcription (Speech Recognition)
• Pitch Detection (Melody Recognition)
https://youtu.be/XF3oGwEsPac
The System
Singing Voice
• Multilingual Lyric Transcription (Speech Recognition) ➔ Lyrics (歌詞)
  “Twinkle Twinkle Little Star”
  “きらきらひかる”
  “一閃一閃亮晶晶”
• Pitch Detection (Melody Recognition) ➔ Pitch (musical notes, 音符)
  “C C G G A A G –”
Data (Voice) acquisition
• Audio Signal Processing
• samplingRate= 16000 samples/sec,
• bitsPerSample= 16 bits/sample = 2 bytes/sample
• channelNumber= 3 (L, R, humming)
• Frame-wise short-time processing
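The parameters above fix the raw data rate, and frame-wise processing means cutting the sample stream into short frames. A minimal sketch, assuming a frame size of 512 samples and a hop of 256 samples (both values are illustrative assumptions; the slides do not state them):

```python
SAMPLING_RATE = 16000     # samples/sec (from the slide)
BYTES_PER_SAMPLE = 2      # 16 bits/sample (from the slide)
CHANNELS = 3              # L, R, humming (from the slide)

FRAME_SIZE = 512          # samples per frame (assumed, not from the slide)
HOP_SIZE = 256            # samples between frame starts (assumed)


def bytes_per_second(rate=SAMPLING_RATE, width=BYTES_PER_SAMPLE,
                     channels=CHANNELS):
    """Raw PCM data rate in bytes per second."""
    return rate * width * channels


def split_into_frames(samples, frame_size=FRAME_SIZE, hop_size=HOP_SIZE):
    """Cut a sequence of samples into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    start = 0
    while start + frame_size <= len(samples):
        frames.append(samples[start:start + frame_size])
        start += hop_size
    return frames


one_channel = list(range(1024))          # stand-in for real samples
frames = split_into_frames(one_channel)
print(bytes_per_second(), len(frames))
```

Each later stage (energy, pitch, recognition) can then work on one frame, or a run of frames, at a time.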
[Figure: Frame01, Frame02]
Digital Signal Processing:
Spectrogram
• A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.
• It is computed using the Fast Fourier Transform (FFT).
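As a rough illustration of the idea, a spectrogram can be built by taking the FFT magnitude of each short frame. The sketch below uses only the standard library so that it runs anywhere; in practice a numerical library would be the usual choice. The frame size, hop size, and Hann window are assumptions for illustration, not values from the talk.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(samples, frame_size=256, hop_size=128):
    """Magnitude spectrum of each Hann-windowed frame.

    Rows are time frames, columns are frequency bins 0..frame_size/2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(frame)
        rows.append([abs(c) for c in spectrum[:frame_size // 2 + 1]])
    return rows


RATE = 16000
# A synthetic 1000 Hz tone stands in for recorded audio.
tone = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(2048)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * RATE / 256   # bin index -> frequency in Hz
```

Drawing each row as a column of pixels, with brightness following magnitude, gives the rolling picture seen in real-time spectrogram tools.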
https://youtu.be/bCRL5yw8fXA
A Real-time Spectrogram
http://friture.org/
https://youtu.be/1sbtXqZaGXE
• Friture is a program written in Python,
designed to analyze audio input in
real time.
• It displays audio data as a scope, a
spectrum analyzer, or a rolling
2D spectrogram.
• I found this program in 2012~2013
and was totally convinced that I could
move into the Python world to
continue my career.
Using Audacity
to get an audio signal
https://youtu.be/o9DF9SVdcVo
The first step in audio signal processing
is to record some audio yourself
and play with it.
WAVE PCM
soundfile format
(.wav)
• http://soundfile.sapp.org/doc/WaveFormat/
• Compared with text data,
audio data is much bigger,
and it is usually stored in
binary form.
• Being familiar with the data
format is crucial for processing it.
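To make the binary layout concrete, the sketch below writes a tiny WAV file into memory with the standard `wave` module and then unpacks the canonical 44-byte PCM header by hand with `struct`. The helper names `make_wav_bytes` and `parse_header` are hypothetical, and the synthetic 440 Hz tone stands in for a real recording. Real files may carry extra chunks, so a fixed 44-byte header is an assumption that holds for simple PCM files like this one.

```python
import io
import math
import struct
import wave


def make_wav_bytes(rate=16000, n_samples=1600, freq=440.0):
    """Build a small mono 16-bit PCM WAV file in memory."""
    pcm = b''.join(
        struct.pack('<h', int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_samples))
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)        # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()


def parse_header(data):
    """Unpack the RIFF chunk, the 'fmt ' subchunk, and the 'data' subchunk header."""
    (riff, chunk_size, wave_id, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = struct.unpack('<4sI4s4sIHHIIHH4sI', data[:44])
    return {
        'riff': riff, 'wave': wave_id, 'fmt': fmt_id, 'data': data_id,
        'audio_format': audio_format,      # 1 means uncompressed PCM
        'channels': channels,
        'sample_rate': sample_rate,
        'byte_rate': byte_rate,
        'block_align': block_align,
        'bits_per_sample': bits_per_sample,
        'data_size': data_size,            # bytes of sample data that follow
    }


header = parse_header(make_wav_bytes())
for key, value in header.items():
    print(key, value)
```

The samples themselves start right after these 44 bytes, as little-endian 16-bit integers.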
“See” the audio signal in the raw format
Extract audio header information
Visualize the audio signal as a waveform
• Once you can visualize the
audio signal, you can be sure
you have read it correctly,
• and then you can do further
processing with advanced signal
processing algorithms
• like Pitch Detection and Speech
Recognition.
Human-aided pitch tracking
by Humming
• Pitch detection on a real music
signal is not easy by itself.
• To simplify the task, I use
a trick.
• I hum the song and record it in
another channel while listening to
the music.
• I use this “clean” humming
voice to detect the pitch.
Multi-Threading Programming
import threading

def init(self):
    # One thread per task: recording, energy, pitch, speech recognition
    self.錄音線 = threading.Thread(target=self.錄音線程)        # recording
    self.能量線 = threading.Thread(target=self.f1_能量)          # energy
    self.基頻線 = threading.Thread(target=self.f4_基頻)          # fundamental frequency (pitch)
    self.語音辨認線 = threading.Thread(target=self.f6_語音辨認)  # speech recognition

def start(self):
    self.錄音線.start()
    self.能量線.start()
    self.基頻線.start()
    self.語音辨認線.start()
• For a real-time system,
multi-threaded
programming is crucial.
• At the very least, an independent
thread for data
acquisition is necessary.
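A minimal sketch of why the acquisition thread is kept separate: one thread only produces audio chunks, another only analyzes them, so slow analysis never blocks recording. Here a `queue.Queue` passes the chunks and fake byte strings replace microphone input. Both are assumptions for illustration, and the function names `acquire` and `analyze` are hypothetical.

```python
import queue
import threading


def acquire(out_queue, n_chunks):
    """Stand-in for the recording thread: pushes fake audio chunks."""
    for i in range(n_chunks):
        out_queue.put(bytes([i % 256]) * 4)   # 4 bytes of dummy audio
    out_queue.put(None)                        # sentinel: no more data


def analyze(in_queue, results):
    """Stand-in for an analysis thread: consumes chunks until the sentinel."""
    while True:
        chunk = in_queue.get()
        if chunk is None:
            break
        results.append(len(chunk))             # placeholder for real analysis


chunks = queue.Queue()
results = []
recorder = threading.Thread(target=acquire, args=(chunks, 5))
analyzer = threading.Thread(target=analyze, args=(chunks, results))
recorder.start()
analyzer.start()
recorder.join()
analyzer.join()
print(results)
```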
audio recording “Thread”
A circular buffer
to store the real-time
audio signal
I keep a buffer in RAM that stores 16 sec of voice.
Its size is 16 sec × 16000 samples/sec × 2 bytes/sample × 3 channels = 1,536,000 bytes.
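A circular buffer of that size could be sketched as follows. The class name `CircularBuffer` and its `write` and `latest` methods are hypothetical, written only to show the wrap-around logic; they are not the code used in the talk.

```python
BUFFER_BYTES = 16 * 16000 * 2 * 3   # sec * samples/sec * bytes/sample * channels


class CircularBuffer:
    """Fixed-size byte buffer that overwrites its oldest data when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray(capacity)
        self.write_pos = 0      # index where the next byte will be written
        self.filled = 0         # number of valid bytes currently stored

    def write(self, chunk):
        chunk = chunk[-self.capacity:]          # only the newest bytes can fit
        first = min(len(chunk), self.capacity - self.write_pos)
        self.data[self.write_pos:self.write_pos + first] = chunk[:first]
        rest = len(chunk) - first               # part that wraps to the start
        self.data[:rest] = chunk[first:]
        self.write_pos = (self.write_pos + len(chunk)) % self.capacity
        self.filled = min(self.capacity, self.filled + len(chunk))

    def latest(self, n):
        """Return the most recent n bytes in chronological order."""
        n = min(n, self.filled)
        start = (self.write_pos - n) % self.capacity
        if start + n <= self.capacity:
            return bytes(self.data[start:start + n])
        wrapped = start + n - self.capacity
        return bytes(self.data[start:]) + bytes(self.data[:wrapped])


voice = CircularBuffer(BUFFER_BYTES)    # the 16-second buffer
demo = CircularBuffer(8)                # tiny buffer to show the wrap-around
demo.write(b'abcdef')
demo.write(b'ghij')                     # wraps and overwrites 'ab'
print(demo.latest(5))
```

Reading the newest segment has to stitch together the tail and the head of the array whenever the segment crosses the wrap point, which is what `latest` does.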
Pitch Detection Algorithm
• Zoom into a speech signal at a scale of 0.01 sec, and we
can see periodic patterns.
• The duration of one periodic pattern is called
the “pitch period”.
• For the A-440 note, the pitch period =
1/440 ≈ 0.0023 sec.
• A traditionally popular pitch detection
algorithm is based on the autocorrelation
method.
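A bare-bones version of the autocorrelation idea: slide the frame against itself, find the lag where it matches best, and convert that lag (the pitch period in samples) to Hz. The search range of 80 to 1000 Hz and the 1024-sample test frame are assumptions, and a production detector would add normalization and voicing checks that are omitted here.

```python
import math


def detect_pitch(frame, rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Returns 0.0 if no positive correlation is found in the search range.
    """
    min_lag = int(rate / fmax)          # shortest pitch period considered
    max_lag = int(rate / fmin)          # longest pitch period considered
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return rate / best_lag if best_lag else 0.0


RATE = 16000
# Synthetic A-440 tone standing in for a "clean" humming voice.
frame = [math.sin(2 * math.pi * 440.0 * i / RATE) for i in range(1024)]
estimate = detect_pitch(frame, RATE)
print(round(estimate, 1))
```

Because the lag is a whole number of samples, the estimate is only as fine as the sampling grid allows; at 16 kHz neighboring lags near A-440 are about 12 Hz apart.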
Pitch Detection Thread
Pitch Sampling at slower intervals
Pitch Quantization
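One common way to quantize a detected pitch is to round it to the nearest semitone of the equal-tempered scale. The sketch below assumes A4 = 440 Hz and MIDI note numbering; the slides do not say which scheme the system uses, so treat this as one possible approach. The function name `quantize_pitch` is hypothetical.

```python
import math

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def quantize_pitch(freq_hz, a4_hz=440.0):
    """Map a frequency to the nearest equal-tempered note.

    Returns (MIDI note number, note name with octave), with A4 = MIDI 69.
    """
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return midi, name


for hz in (261.63, 392.0, 440.0, 444.4):
    print(hz, quantize_pitch(hz))
```

A slightly sharp or flat hum, such as 444.4 Hz, still lands on the same note as 440 Hz.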
Speech Recognition
• http://shorturl.at/rxLM4
Speech Recognition
needs a large-scale database
to train the system.
Nowadays, deep-learning
algorithms play the major role
and achieve the best
performance.
Speech Recognition in Python
https://pypi.org/project/SpeechRecognition/
Google has a great Speech Recognition API.
This API converts speech (from a microphone)
into written text (Python strings).
The ASR Thread
Get a segment (M frames) of speech ➔ x
Transform x into an “AudioData” object and then
send it to the Google Speech Recognition engine
to get the recognized “text”.
Getting speech data out of a circular buffer is
quite tricky to implement!
import speech_recognition as sr

def 語音辨認(私):
    辨 = sr.Recognizer()
    while 私.語音辨認中:
        #
        # Get x as "singingVoice" to be 音 (audio);
        # x (raw PCM bytes) and lang are set in code not shown on the slide
        #
        音 = sr.AudioData(x, 私.取樣率, 私.樣本寬)  # sampling rate, sample width
        #
        # Do ASR to get the recognition result as 文 (text)
        #
        try:
            if lang == 'ja':
                文 = 辨.recognize_google(音, language='ja')
            elif lang == 'en':
                文 = 辨.recognize_google(音, language='en')
            elif lang == 'zh-TW':
                文 = 辨.recognize_google(音, language='zh-TW')
            私.文 = '{} ({})'.format(文, lang)
        except Exception:
            # Recognition failed, or lang was not one of the three above
            私.文 = 'exceptionOccurs!!'
Lyric Transcription
• Melodic voice (singing) recognition
• Timed Text Generation
• Needs speech recognition and
segmentation
• Currently, this is done by a human,
not yet by machine.
Kara OK
• Pitch Tracking
• Timed Text Display
https://youtu.be/F1_Xz1c5AEE
Final Demo
https://youtu.be/0cdo6ZnBZc8
Thank you for listening.
@ PyCon JP 2019
Renyuan Lyu
From TAIWAN
