Build Your Own VR Display
Spatial Sound
Nitish Padmanaban
Stanford University
stanford.edu/class/ee267/
Overview
• What is sound?
• The human auditory system
• Stereophonic sound
• Spatial audio of point sound sources
• Recorded spatial audio
Zhong and Xie, “Head-Related Transfer Functions
and Virtual Auditory Display”
What is Sound?
• “Sound” is a pressure wave propagating in a medium
• Speed of sound is c = √(K/ρ), where c is velocity, ρ is the density of the medium, and K is its elastic bulk modulus
• In air, the speed of sound is ~340 m/s
• In water, the speed of sound is ~1,483 m/s
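The formula can be sanity-checked numerically; the bulk modulus and density values below are textbook approximations (assumed values, not from the slides):

```python
import math

def speed_of_sound(K, rho):
    """c = sqrt(K / rho): K is the elastic bulk modulus (Pa),
    rho is the density of the medium (kg/m^3)."""
    return math.sqrt(K / rho)

# Approximate material constants (assumptions):
c_air = speed_of_sound(K=1.42e5, rho=1.2)      # ~344 m/s
c_water = speed_of_sound(K=2.2e9, rho=1000.0)  # ~1483 m/s
```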
Producing Sound
• Sound is longitudinal vibration
of air particles
• Speakers create wavefronts by
physically compressing the air,
much like one could a slinky
The Human Auditory System
[Figure: anatomy of the ear, with the pinna labeled (Wikipedia)]
The Human Auditory System
[Figure: anatomy of the ear, with the pinna and cochlea labeled (Wikipedia)]
• Hair receptor cells in the cochlea pick up vibrations
The Human Auditory System
• Human hearing range:
~20–20,000 Hz
• Variation between
individuals
• Degrades with age
[Figure: hearing threshold in quiet (D. W. Robinson and R. S. Dadson, 1957)]
Stereophonic Sound
• Mainly captures differences between the ears:
• Inter-aural time difference
• Amplitude differences from path length
and scatter
[Figure: a “Hello, SIGGRAPH!” source arriving at the left ear at time t and at the right ear at t + Δt (Wikipedia)]
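A minimal sketch of the inter-aural time difference, using the Woodworth head model as an assumption (the slide only illustrates the concept; the ~8.75 cm head radius is also an assumed value):

```python
import math

def itd_seconds(azimuth, head_radius=0.0875, c=340.0):
    """Inter-aural time difference via the Woodworth model (an assumed
    model, not from the slides). azimuth is in radians, 0 = straight
    ahead; head_radius ~8.75 cm is a typical assumed value."""
    return (head_radius / c) * (azimuth + math.sin(azimuth))

dt_front = itd_seconds(0.0)         # straight ahead: both ears hear together
dt_side = itd_seconds(math.pi / 2)  # source at 90 degrees: maximum delay
```

For a source directly to one side, this gives a delay on the order of 0.6–0.7 ms, the cue the auditory system uses for lateralization.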
Stereo Panning
• Only uses the amplitude differences
• Relatively common in stereo audio tracks
• Works with any source of audio
[Figure: left/right gains between 0 and 1 for a source panned left, center, and right along the “line of sound”]
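One common way to realize amplitude-only panning is the constant-power pan law; this is a standard choice, not one the slides specify:

```python
import math

def constant_power_pan(pan):
    """Map pan in [0, 1] (0 = hard left, 1 = hard right) to left/right
    gains with the constant-power pan law: the gains trace a quarter
    circle, so gL**2 + gR**2 == 1 at every pan position."""
    angle = pan * math.pi / 2
    return math.cos(angle), math.sin(angle)

gL, gR = constant_power_pan(0.5)  # centered: equal gains of ~0.707
```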
Stereophonic Sound Recording
• Use two microphones
• The A-B technique captures
differences in time-of-arrival
• Other configurations work too,
capturing differences in amplitude
[Figure: A-B and X-Y stereo microphone configurations (Rode, Olympus, Wikipedia)]
Stereophonic Sound Synthesis
• Ideal case: scaled & shifted Dirac peaks
• Shortcoming: many positions are identical
[Figure: a mono input signal rendered as scaled, shifted impulses in the left and right channels]
Stereophonic Sound Synthesis
• In practice: the path lengths and scattering are more
complicated, including scattering off the ears, shoulders, etc.
[Figure: measured left/right impulse responses, more complex than simple shifted impulses]
Head-Related Impulse Response (HRIR)
• Captures temporal responses at all possible sound directions,
parameterized by azimuth θ and elevation φ
• Could also have a distance parameter
• Can be measured with two microphones in ears of mannequin &
speakers all around
Zhong and Xie, “Head-Related Transfer Functions and Virtual Auditory Display”
[Figure: azimuth θ and elevation φ of a source relative to the listener’s head]
Head-Related Impulse Response (HRIR)
• CIPIC HRTF database: http://interface.cipic.ucdavis.edu/sound/hrtf.html
• Elevation: –45° to 230.625°, azimuth: –80° to 80°
• Need to interpolate between discretely sampled directions
V. R. Algazi, R. O. Duda, D. M. Thompson and C. Avendano, "The CIPIC HRTF Database,” 2001
Head-Related Impulse Response (HRIR)
• Storing the HRIR
• Need one time series hrirL(t; θ, φ), hrirR(t; θ, φ) for each location
• Total of 2 × Nθ × Nφ × Nt samples, where Nθ, Nφ, Nt are the number of
samples for azimuth, elevation, and time, respectively
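The storage layout can be sketched as a 4-D array; the grid sizes below follow the CIPIC database layout (25 azimuths × 50 elevations × 200-tap responses), used here as an illustration:

```python
import numpy as np

# Grid sizes following the CIPIC database layout (25 azimuths,
# 50 elevations, 200 time samples per response).
N_az, N_el, N_t = 25, 50, 200

# One impulse response per (ear, azimuth, elevation):
hrir = np.zeros((2, N_az, N_el, N_t))
total_samples = hrir.size  # 2 * N_az * N_el * N_t
```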
Head-Related Impulse Response (HRIR)
Applying the HRIR:
• Given a mono sound source s(t) and its 3D position
Head-Related Impulse Response (HRIR)
Applying the HRIR:
• Given a mono sound source s(t) and its 3D position
1. Compute (θ, φ) relative to center of listener’s head
Head-Related Impulse Response (HRIR)
Applying the HRIR:
• Given a mono sound source s(t) and its 3D position
1. Compute (θ, φ) relative to center of listener’s head
2. Look up interpolated HRIRs hrirL(t; θ, φ) and hrirR(t; θ, φ) for the
left and right ear at these angles
Head-Related Impulse Response (HRIR)
Applying the HRIR:
• Given a mono sound source s(t) and its 3D position
1. Compute (θ, φ) relative to center of listener’s head
2. Look up interpolated HRIRs hrirL(t; θ, φ) and hrirR(t; θ, φ) for the
left and right ear at these angles
3. Convolve the signal with the HRIRs to get the sound at each ear:
  sL(t) = hrirL(t; θ, φ) * s(t)
  sR(t) = hrirR(t; θ, φ) * s(t)
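Step 3 is a plain convolution per ear; a toy sketch with made-up two-tap impulse responses:

```python
import numpy as np

def apply_hrir(s, hrir_l, hrir_r):
    """Step 3: convolve the mono source with each ear's HRIR
    to get the per-ear signals sL(t) and sR(t)."""
    return np.convolve(s, hrir_l), np.convolve(s, hrir_r)

# Toy check: a unit-impulse source just reproduces the (made-up) HRIRs.
s = np.array([1.0, 0.0, 0.0])
hL = np.array([0.5, 0.25])
hR = np.array([0.4, 0.1])
sL, sR = apply_hrir(s, hL, hR)
```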
Head-Related Transfer Function (HRTF)
• HRTF is the Fourier transform of the HRIR! (you’ll find the term HRTF
more often than HRIR)
  sL(t) = hrirL(t; θ, φ) * s(t) = F⁻¹{hrtfL(ω; θ, φ) · F{s(t)}}
  sR(t) = hrirR(t; θ, φ) * s(t) = F⁻¹{hrtfR(ω; θ, φ) · F{s(t)}}
[Figure: left/right HRIRs (amplitude vs. time) and the corresponding HRTFs (amplitude vs. frequency)]
Head-Related Transfer Function (HRTF)
• HRTF is the Fourier transform of the HRIR! (you’ll find the term HRTF
more often than HRIR)
• HRTF is complex-conjugate symmetric (since the HRIR must
be real-valued)
  sL(t) = hrirL(t; θ, φ) * s(t) = F⁻¹{hrtfL(ω; θ, φ) · F{s(t)}}
  sR(t) = hrirR(t; θ, φ) * s(t) = F⁻¹{hrtfR(ω; θ, φ) · F{s(t)}}
[Figure: left/right HRTF amplitudes vs. frequency]
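Multiplying spectra and inverse-transforming reproduces the time-domain convolution, provided both signals are zero-padded to the full convolution length; a quick sketch:

```python
import numpy as np

def apply_hrtf(s, hrir):
    """Filter s via the HRTF: multiply spectra, then inverse-transform.
    Zero-padding both transforms to the full convolution length makes
    this identical to linear (not circular) convolution."""
    n = len(s) + len(hrir) - 1
    return np.fft.irfft(np.fft.rfft(s, n) * np.fft.rfft(hrir, n), n)

# Check against direct convolution with an arbitrary 3-tap "HRIR".
rng = np.random.default_rng(0)
s = rng.standard_normal(256)
h = np.array([0.6, 0.3, 0.1])
out_freq = apply_hrtf(s, h)
out_time = np.convolve(s, h)  # should match the frequency-domain result
```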
Spatial Sound of N Point Sound Sources
• Superposition principle holds, so just sum the contributions of
each source sᵢ(t) at (θᵢ, φᵢ):
  sL(t) = Σ_{i=1}^{N} F⁻¹{hrtfL(ω; θᵢ, φᵢ) · F{sᵢ(t)}}
  sR(t) = Σ_{i=1}^{N} F⁻¹{hrtfR(ω; θᵢ, φᵢ) · F{sᵢ(t)}}
[Figure: two sources s₁(t), s₂(t) at (θ₁, φ₁) and (θ₂, φ₂) around the listener]
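The superposition sum can be sketched as a loop over per-source binaural renders (time-domain convolution is used here for simplicity instead of the FFT form):

```python
import numpy as np

def render_sources(sources, hrirs_l, hrirs_r):
    """Sum the binaural renderings of N point sources (superposition).
    sources[i] is a mono signal; hrirs_l[i] / hrirs_r[i] are the HRIRs
    looked up at that source's (theta_i, phi_i). Left/right HRIR pairs
    are assumed to have equal lengths."""
    n = max(len(s) + len(h) - 1 for s, h in zip(sources, hrirs_l))
    sL, sR = np.zeros(n), np.zeros(n)
    for s, hL, hR in zip(sources, hrirs_l, hrirs_r):
        yL, yR = np.convolve(s, hL), np.convolve(s, hR)
        sL[:len(yL)] += yL
        sR[:len(yR)] += yR
    return sL, sR

# Toy example: two sources with trivial single-tap HRIRs just sum up.
srcs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
taps = [np.array([1.0]), np.array([1.0])]
sL, sR = render_sources(srcs, taps, taps)
```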
Spatial Audio for VR
• VR/AR requires us to re-think audio, especially spatial audio!
• User’s head rotates freely → traditional surround sound
systems like 5.1 or even 9.2 surround aren’t sufficient
Spatial Audio for VR
Two primary approaches:
1. Real-time sound engine
• Render 3D sound sources via HRTF in real time, just
as discussed in the previous slides
• Used for games and synthetic virtual environments
• Many libraries are available: FMOD, OpenAL, etc.
Spatial Audio for VR
Two primary approaches:
2. Spatial sound recorded from real environments
• Most widely used format now: Ambisonics
• Simple microphones exist
• Relatively easy mathematical model
• Only need 4 channels for starters
• Used in YouTube VR and many other platforms
Ambisonics
• Idea: represent sound incident at a point (i.e. the listener)
with some directional information
• Using all angles is impractical – need too many sound
channels (one for each direction)
• Some lower-order (in direction) components may be
sufficient → a directional basis representation to the rescue!
Ambisonics – Spherical Harmonics
• Use spherical harmonics!
• Orthogonal basis functions on the surface of a sphere, i.e.
full-sphere surround sound
• Think Fourier transform equivalent on a sphere
Ambisonics – Spherical Harmonics
[Figure: spherical harmonics of 0th through 3rd order (Wikipedia)]
• Remember, these represent functions on
a sphere’s surface
Ambisonics – Spherical Harmonics
• 1st order approximation → 4 channels: W, X, Y, Z
[Figure: the W (omnidirectional) and X, Y, Z (dipole) harmonics (Wikipedia)]
Ambisonics – Recording
• Can record 4-channel Ambisonics via special microphone
• Same format supported by YouTube VR and other
platforms
http://www.oktava-shop.com/
Ambisonics – Rendered Sources
• Can easily convert a point sound source, S, to the 4-
channel Ambisonics representation
• Given azimuth θ and elevation φ, compute W, X, Y, Z as
  W = S · 1/√2 (omnidirectional component, angle-independent)
  X = S · cos θ cos φ (“stereo in x”)
  Y = S · sin θ cos φ (“stereo in y”)
  Z = S · sin φ (“stereo in z”)
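These four encoding equations translate directly to code; a minimal sketch:

```python
import math

def ambisonic_encode(S, theta, phi):
    """First-order (B-format) encode of a point source amplitude S at
    azimuth theta, elevation phi, per the slide's W/X/Y/Z equations."""
    W = S / math.sqrt(2)                     # omnidirectional component
    X = S * math.cos(theta) * math.cos(phi)  # "stereo in x"
    Y = S * math.sin(theta) * math.cos(phi)  # "stereo in y"
    Z = S * math.sin(phi)                    # "stereo in z"
    return W, X, Y, Z

W, X, Y, Z = ambisonic_encode(1.0, theta=0.0, phi=0.0)  # source dead ahead
```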
Ambisonics – Playing it Back
• Easiest way to render Ambisonics: convert the W, X, Y, Z
channels into 4 virtual speaker feeds
• For a regularly-spaced square setup (LF, LB, RF, RB), this results in
  LF = (2W + X + Y) / √8
  LB = (2W − X + Y) / √8
  RF = (2W + X − Y) / √8
  RB = (2W − X − Y) / √8
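The square decode is a few lines of code. The 1/√8 normalization is my reading of the slide’s garbled layout, so treat it as an assumption; Z is unused because the square layout is horizontal:

```python
import math

def decode_square(W, X, Y):
    """Decode first-order Ambisonics to four virtual speakers at the
    corners of a square (LF, LB, RF, RB). The 1/sqrt(8) normalization
    is assumed from the slide's layout; Z is omitted because the
    square of speakers is horizontal."""
    s8 = math.sqrt(8)
    LF = (2 * W + X + Y) / s8
    LB = (2 * W - X + Y) / s8
    RF = (2 * W + X - Y) / s8
    RB = (2 * W - X - Y) / s8
    return LF, LB, RF, RB

# A purely omnidirectional signal (X = Y = 0) feeds all speakers equally.
LF, LB, RF, RB = decode_square(W=1.0, X=0.0, Y=0.0)
```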
Ambisonics – Omnitone
• JavaScript-based first-order Ambisonic decoder
Google, https://github.com/GoogleChrome/omnitone
References and Further Reading
• Google’s take on spatial audio: https://developers.google.com/vr/concepts/spatial-audio
HRTF:
• Algazi, Duda, Thompson, Avendano, “The CIPIC HRTF Database”, Proc. 2001 IEEE Workshop on
Applications of Signal Processing to Audio and Electroacoustics
• Download the CIPIC HRTF database here: http://interface.cipic.ucdavis.edu/sound/hrtf.html
Resources by Google:
• https://github.com/GoogleChrome/omnitone
• https://developers.google.com/vr/concepts/spatial-audio
• https://opensource.googleblog.com/2016/07/omnitone-spatial-audio-on-web.html
• http://googlechrome.github.io/omnitone/#home
• https://github.com/google/spatial-media/
Demo
Build Your Own VR Display Course - SIGGRAPH 2017: Part 4