Build Your Own VR Display
Spatial Sound
Nitish Padmanaban
Stanford University
stanford.edu/class/ee267/
Overview
• What is sound?
• The human auditory system
• Stereophonic sound
• Spatial audio of point sound sources
• Recorded spatial audio
Zhong and Xie, “Head-Related Transfer Functions
and Virtual Auditory Display”
What is Sound?
• “Sound” is a pressure wave propagating in a medium
• Speed of sound is c = √(K/ρ), where c is velocity, ρ is the density of the medium, and K is its elastic bulk modulus
• In air, the speed of sound is ~340 m/s
• In water, the speed of sound is ~1,483 m/s
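The formula can be sanity-checked numerically; the bulk modulus and density values below are textbook approximations (assumed values, not from the slides):

```python
import math

def speed_of_sound(K, rho):
    """c = sqrt(K / rho): K is the elastic bulk modulus (Pa),
    rho is the density of the medium (kg/m^3)."""
    return math.sqrt(K / rho)

# Approximate material constants (assumptions):
c_air = speed_of_sound(K=1.42e5, rho=1.2)      # ~344 m/s
c_water = speed_of_sound(K=2.2e9, rho=1000.0)  # ~1483 m/s
```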
Producing Sound
• Sound is longitudinal vibration
of air particles
• Speakers create wavefronts by
physically compressing the air,
much like one could a slinky
The Human Auditory System
[Figure: anatomy of the ear, with the pinna labeled (Wikipedia)]
The Human Auditory System
[Figure: anatomy of the ear, with the pinna and cochlea labeled (Wikipedia)]
• Hair receptor cells in the cochlea pick up vibrations
The Human Auditory System
• Human hearing range:
~20–20,000 Hz
• Variation between
individuals
• Degrades with age
[Figure: hearing threshold in quiet (D. W. Robinson and R. S. Dadson, 1957)]
Stereophonic Sound
• Mainly captures differences between the ears:
• Inter-aural time difference
• Amplitude differences from path length
and scatter
[Figure: a “Hello, SIGGRAPH!” source arriving at the left ear at time t and at the right ear at t + Δt (Wikipedia)]
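A minimal sketch of the inter-aural time difference, using the Woodworth head model as an assumption (the slide only illustrates the concept; the ~8.75 cm head radius is also an assumed value):

```python
import math

def itd_seconds(azimuth, head_radius=0.0875, c=340.0):
    """Inter-aural time difference via the Woodworth model (an assumed
    model, not from the slides). azimuth is in radians, 0 = straight
    ahead; head_radius ~8.75 cm is a typical assumed value."""
    return (head_radius / c) * (azimuth + math.sin(azimuth))

dt_front = itd_seconds(0.0)         # straight ahead: both ears hear together
dt_side = itd_seconds(math.pi / 2)  # source at 90 degrees: maximum delay
```

For a source directly to one side, this gives a delay on the order of 0.6–0.7 ms, the cue the auditory system uses for lateralization.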
Stereo Panning
• Only uses the amplitude differences
• Relatively common in stereo audio tracks
• Works with any source of audio
[Figure: left/right gains between 0 and 1 for a source panned left, center, and right along the “line of sound”]
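One common way to realize amplitude-only panning is the constant-power pan law; this is a standard choice, not one the slides specify:

```python
import math

def constant_power_pan(pan):
    """Map pan in [0, 1] (0 = hard left, 1 = hard right) to left/right
    gains with the constant-power pan law: the gains trace a quarter
    circle, so gL**2 + gR**2 == 1 at every pan position."""
    angle = pan * math.pi / 2
    return math.cos(angle), math.sin(angle)

gL, gR = constant_power_pan(0.5)  # centered: equal gains of ~0.707
```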
Stereophonic Sound Recording
• Use two microphones
• The A-B technique captures
differences in time-of-arrival
• Other configurations work too,
capturing differences in amplitude
[Figure: A-B and X-Y stereo microphone configurations (Rode, Olympus, Wikipedia)]
Stereophonic Sound Synthesis
• Ideal case: scaled & shifted Dirac peaks
• Shortcoming: many positions are identical
[Figure: a mono input signal rendered as scaled, shifted impulses in the left and right channels]
Stereophonic Sound Synthesis
• In practice: the path lengths and scattering are more
complicated, including scattering off the ears, shoulders, etc.
[Figure: measured left/right impulse responses, more complex than simple shifted impulses]
Head-Related Impulse Response (HRIR)
• Captures temporal responses at all possible sound directions,
parameterized by azimuth θ and elevation φ
• Could also have a distance parameter
• Can be measured with two microphones in ears of mannequin &
speakers all around
Zhong and Xie, “Head-Related Transfer Functions and Virtual Auditory Display”
[Figure: azimuth θ and elevation φ of a source relative to the listener’s head]
Head-Related Impulse Response (HRIR)
• CIPIC HRTF database: http://interface.cipic.ucdavis.edu/sound/hrtf.html
• Elevation: –45° to 230.625°, azimuth: –80° to 80°
• Need to interpolate between discretely sampled directions
V. R. Algazi, R. O. Duda, D. M. Thompson and C. Avendano, "The CIPIC HRTF Database,” 2001
Head-Related Impulse Response (HRIR)
• Storing the HRIR
• Need one time series hrirL(t; θ, φ), hrirR(t; θ, φ) for each location
• Total of 2 × Nθ × Nφ × Nt samples, where Nθ, Nφ, Nt are the number of
samples for azimuth, elevation, and time, respectively
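The storage layout can be sketched as a 4-D array; the grid sizes below follow the CIPIC database layout (25 azimuths × 50 elevations × 200-tap responses), used here as an illustration:

```python
import numpy as np

# Grid sizes following the CIPIC database layout (25 azimuths,
# 50 elevations, 200 time samples per response).
N_az, N_el, N_t = 25, 50, 200

# One impulse response per (ear, azimuth, elevation):
hrir = np.zeros((2, N_az, N_el, N_t))
total_samples = hrir.size  # 2 * N_az * N_el * N_t
```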
Head-Related Impulse Response (HRIR)
Applying the HRIR:
• Given a mono sound source s(t) and its 3D position
Head-Related Impulse Response (HRIR)
Applying the HRIR:
• Given a mono sound source s(t) and its 3D position
1. Compute (θ, φ) relative to center of listener’s head
Head-Related Impulse Response (HRIR)
Applying the HRIR:
• Given a mono sound source s(t) and its 3D position
1. Compute (θ, φ) relative to center of listener’s head
2. Look up interpolated HRIRs hrirL(t; θ, φ) and hrirR(t; θ, φ) for the
left and right ear at these angles
Head-Related Impulse Response (HRIR)
Applying the HRIR:
• Given a mono sound source s(t) and its 3D position
1. Compute (θ, φ) relative to center of listener’s head
2. Look up interpolated HRIRs hrirL(t; θ, φ) and hrirR(t; θ, φ) for the
left and right ear at these angles
3. Convolve the signal with the HRIRs to get the sound at each ear:
  sL(t) = hrirL(t; θ, φ) * s(t)
  sR(t) = hrirR(t; θ, φ) * s(t)
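Step 3 is a plain convolution per ear; a toy sketch with made-up two-tap impulse responses:

```python
import numpy as np

def apply_hrir(s, hrir_l, hrir_r):
    """Step 3: convolve the mono source with each ear's HRIR
    to get the per-ear signals sL(t) and sR(t)."""
    return np.convolve(s, hrir_l), np.convolve(s, hrir_r)

# Toy check: a unit-impulse source just reproduces the (made-up) HRIRs.
s = np.array([1.0, 0.0, 0.0])
hL = np.array([0.5, 0.25])
hR = np.array([0.4, 0.1])
sL, sR = apply_hrir(s, hL, hR)
```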
Head-Related Transfer Function (HRTF)
• HRTF is the Fourier transform of the HRIR! (you’ll find the term HRTF
more often than HRIR)
  sL(t) = hrirL(t; θ, φ) * s(t) = F⁻¹{hrtfL(ω; θ, φ) · F{s(t)}}
  sR(t) = hrirR(t; θ, φ) * s(t) = F⁻¹{hrtfR(ω; θ, φ) · F{s(t)}}
[Figure: left/right HRIRs (amplitude vs. time) and the corresponding HRTFs (amplitude vs. frequency)]
Head-Related Transfer Function (HRTF)
• HRTF is the Fourier transform of the HRIR! (you’ll find the term HRTF
more often than HRIR)
• HRTF is complex-conjugate symmetric (since the HRIR must
be real-valued)
  sL(t) = hrirL(t; θ, φ) * s(t) = F⁻¹{hrtfL(ω; θ, φ) · F{s(t)}}
  sR(t) = hrirR(t; θ, φ) * s(t) = F⁻¹{hrtfR(ω; θ, φ) · F{s(t)}}
[Figure: left/right HRTF amplitudes vs. frequency]
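Multiplying spectra and inverse-transforming reproduces the time-domain convolution, provided both signals are zero-padded to the full convolution length; a quick sketch:

```python
import numpy as np

def apply_hrtf(s, hrir):
    """Filter s via the HRTF: multiply spectra, then inverse-transform.
    Zero-padding both transforms to the full convolution length makes
    this identical to linear (not circular) convolution."""
    n = len(s) + len(hrir) - 1
    return np.fft.irfft(np.fft.rfft(s, n) * np.fft.rfft(hrir, n), n)

# Check against direct convolution with an arbitrary 3-tap "HRIR".
rng = np.random.default_rng(0)
s = rng.standard_normal(256)
h = np.array([0.6, 0.3, 0.1])
out_freq = apply_hrtf(s, h)
out_time = np.convolve(s, h)  # should match the frequency-domain result
```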
Spatial Sound of N Point Sound Sources
• Superposition principle holds, so just sum the contributions of
each source sᵢ(t) at (θᵢ, φᵢ):
  sL(t) = Σ_{i=1}^{N} F⁻¹{hrtfL(ω; θᵢ, φᵢ) · F{sᵢ(t)}}
  sR(t) = Σ_{i=1}^{N} F⁻¹{hrtfR(ω; θᵢ, φᵢ) · F{sᵢ(t)}}
[Figure: two sources s₁(t), s₂(t) at (θ₁, φ₁) and (θ₂, φ₂) around the listener]
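The superposition sum can be sketched as a loop over per-source binaural renders (time-domain convolution is used here for simplicity instead of the FFT form):

```python
import numpy as np

def render_sources(sources, hrirs_l, hrirs_r):
    """Sum the binaural renderings of N point sources (superposition).
    sources[i] is a mono signal; hrirs_l[i] / hrirs_r[i] are the HRIRs
    looked up at that source's (theta_i, phi_i). Left/right HRIR pairs
    are assumed to have equal lengths."""
    n = max(len(s) + len(h) - 1 for s, h in zip(sources, hrirs_l))
    sL, sR = np.zeros(n), np.zeros(n)
    for s, hL, hR in zip(sources, hrirs_l, hrirs_r):
        yL, yR = np.convolve(s, hL), np.convolve(s, hR)
        sL[:len(yL)] += yL
        sR[:len(yR)] += yR
    return sL, sR

# Toy example: two sources with trivial single-tap HRIRs just sum up.
srcs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
taps = [np.array([1.0]), np.array([1.0])]
sL, sR = render_sources(srcs, taps, taps)
```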
Spatial Audio for VR
• VR/AR requires us to re-think audio, especially spatial audio!
• User’s head rotates freely → traditional surround sound
systems like 5.1 or even 9.2 surround aren’t sufficient
Spatial Audio for VR
Two primary approaches:
1. Real-time sound engine
• Render 3D sound sources via HRTF in real time, just
as discussed in the previous slides
• Used for games and synthetic virtual environments
• Many libraries are available: FMOD, OpenAL, etc.
Spatial Audio for VR
Two primary approaches:
2. Spatial sound recorded from real environments
• Most widely used format now: Ambisonics
• Simple microphones exist
• Relatively easy mathematical model
• Only need 4 channels for starters
• Used in YouTube VR and many other platforms
Ambisonics
• Idea: represent sound incident at a point (i.e. the listener)
with some directional information
• Using all angles is impractical – need too many sound
channels (one for each direction)
• Some lower-order (in direction) components may be
sufficient → a directional basis representation to the rescue!
Ambisonics – Spherical Harmonics
• Use spherical harmonics!
• Orthogonal basis functions on the surface of a sphere, i.e.
full-sphere surround sound
• Think Fourier transform equivalent on a sphere
Ambisonics – Spherical Harmonics
[Figure: spherical harmonics of 0th through 3rd order (Wikipedia)]
• Remember, these represent functions on
a sphere’s surface
Ambisonics – Spherical Harmonics
• 1st order approximation → 4 channels: W, X, Y, Z
[Figure: the W (omnidirectional) and X, Y, Z (dipole) harmonics (Wikipedia)]
Ambisonics – Recording
• Can record 4-channel Ambisonics via special microphone
• Same format supported by YouTube VR and other
platforms
http://www.oktava-shop.com/
Ambisonics – Rendered Sources
• Can easily convert a point sound source, S, to the 4-
channel Ambisonics representation
• Given azimuth θ and elevation φ, compute W, X, Y, Z as
  W = S · 1/√2 (omnidirectional component, angle-independent)
  X = S · cos θ cos φ (“stereo in x”)
  Y = S · sin θ cos φ (“stereo in y”)
  Z = S · sin φ (“stereo in z”)
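These four encoding equations translate directly to code; a minimal sketch:

```python
import math

def ambisonic_encode(S, theta, phi):
    """First-order (B-format) encode of a point source amplitude S at
    azimuth theta, elevation phi, per the slide's W/X/Y/Z equations."""
    W = S / math.sqrt(2)                     # omnidirectional component
    X = S * math.cos(theta) * math.cos(phi)  # "stereo in x"
    Y = S * math.sin(theta) * math.cos(phi)  # "stereo in y"
    Z = S * math.sin(phi)                    # "stereo in z"
    return W, X, Y, Z

W, X, Y, Z = ambisonic_encode(1.0, theta=0.0, phi=0.0)  # source dead ahead
```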
Ambisonics – Playing it Back
• Easiest way to render Ambisonics: convert the W, X, Y, Z
channels into 4 virtual speaker feeds
• For a regularly-spaced square setup (LF, LB, RF, RB), this results in
  LF = (2W + X + Y) / √8
  LB = (2W − X + Y) / √8
  RF = (2W + X − Y) / √8
  RB = (2W − X − Y) / √8
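The square decode is a few lines of code. The 1/√8 normalization is my reading of the slide’s garbled layout, so treat it as an assumption; Z is unused because the square layout is horizontal:

```python
import math

def decode_square(W, X, Y):
    """Decode first-order Ambisonics to four virtual speakers at the
    corners of a square (LF, LB, RF, RB). The 1/sqrt(8) normalization
    is assumed from the slide's layout; Z is omitted because the
    square of speakers is horizontal."""
    s8 = math.sqrt(8)
    LF = (2 * W + X + Y) / s8
    LB = (2 * W - X + Y) / s8
    RF = (2 * W + X - Y) / s8
    RB = (2 * W - X - Y) / s8
    return LF, LB, RF, RB

# A purely omnidirectional signal (X = Y = 0) feeds all speakers equally.
LF, LB, RF, RB = decode_square(W=1.0, X=0.0, Y=0.0)
```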
Ambisonics – Omnitone
• JavaScript-based first-order Ambisonic decoder
Google, https://github.com/GoogleChrome/omnitone
References and Further Reading
• Google’s take on spatial audio: https://developers.google.com/vr/concepts/spatial-audio
HRTF:
• Algazi, Duda, Thompson, Avendano, “The CIPIC HRTF Database”, Proc. 2001 IEEE Workshop on
Applications of Signal Processing to Audio and Electroacoustics
• Download the CIPIC HRTF database here: http://interface.cipic.ucdavis.edu/sound/hrtf.html
Resources by Google:
• https://github.com/GoogleChrome/omnitone
• https://developers.google.com/vr/concepts/spatial-audio
• https://opensource.googleblog.com/2016/07/omnitone-spatial-audio-on-web.html
• http://googlechrome.github.io/omnitone/#home
• https://github.com/google/spatial-media/
Demo
Build Your Own VR Display Course - SIGGRAPH 2017: Part 4