Wavelets, HRTFs, and sound localization

Paul Hubbard, Kristin L. Umland, M. Cristina Pereyra

Wavelet-domain HRTF (Head-Related Transfer Function), Non-Standard form, Daubechies-6 basis, thresholded at maximal row norm of 1.0e-7. See fig2.m for the code to generate this figure.

Synopsis

We implemented wavelet-domain convolution in the non-standard matrix form developed by Beylkin et al. The code here loads a head-related transfer function (HRTF), converts it to the non-standard form (NSF), and then convolves it with a WAV-format audio file. The convolution is done in the wavelet domain, and the result is an (approximate) localized sound clip.
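To make the idea concrete, here is a minimal Python sketch of a non-standard form applied to a convolution matrix. It is not the MATLAB code posted on this page. It swaps in the Haar basis (the results here use Daubechies bases) to stay short, builds the filter as a truncated Toeplitz matrix, and thresholds entry by entry rather than by maximal row norm. All function names and the toy filter are made up for illustration.

```python
import math

R2 = math.sqrt(2.0)

def haar_step(x):
    """One level of the orthonormal Haar transform: (averages, details)."""
    half = len(x) // 2
    s = [(x[2 * i] + x[2 * i + 1]) / R2 for i in range(half)]
    d = [(x[2 * i] - x[2 * i + 1]) / R2 for i in range(half)]
    return s, d

def haar_unstep(s, d):
    """Inverse of haar_step."""
    x = []
    for a, b in zip(s, d):
        x.extend([(a + b) / R2, (a - b) / R2])
    return x

def transpose(M):
    return [list(col) for col in zip(*M)]

def split_rows(M):
    """Haar-transform every row; return (average half, detail half)."""
    pairs = [haar_step(row) for row in M]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def split_cols(M):
    """Haar-transform every column; return (average half, detail half)."""
    s, d = split_rows(transpose(M))
    return transpose(s), transpose(d)

def threshold(M, eps):
    """Assumption: simple entrywise cutoff, not the row-norm criterion."""
    return [[v if abs(v) > eps else 0.0 for v in row] for row in M]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def nsf(T, levels, eps=0.0):
    """Split T into per-scale blocks (A, B, C) plus a coarse remainder."""
    blocks = []
    for _ in range(levels):
        T_s, T_d = split_rows(T)       # averages / details along the input side
        T_next, B = split_cols(T_s)    # average->average, average->detail
        C, A = split_cols(T_d)         # detail->average, detail->detail
        blocks.append(tuple(threshold(M, eps) for M in (A, B, C)))
        T = T_next
    return blocks, T

def nsf_apply(blocks, T_coarse, x):
    """Apply the operator scale by scale, then reconstruct the output."""
    s, d_hat, s_hat = x, [], []
    for A, B, C in blocks:
        s, d = haar_step(s)
        d_hat.append(add(matvec(A, d), matvec(B, s)))
        s_hat.append(matvec(C, d))
    u = add(s_hat[-1], matvec(T_coarse, s))
    for j in range(len(blocks) - 1, -1, -1):
        u = haar_unstep(u, d_hat[j])
        if j > 0:
            u = add(u, s_hat[j - 1])
    return u

def convolution_matrix(h, n):
    """n-by-n lower-triangular Toeplitz matrix: first n samples of conv(h, x)."""
    return [[h[i - k] if 0 <= i - k < len(h) else 0.0 for k in range(n)]
            for i in range(n)]

# Hypothetical toy filter and signal, standing in for an HRTF and an audio clip.
h = [0.5, 0.3, -0.2, 0.1]
n = 16
x = [math.sin(0.7 * i) for i in range(n)]
T = convolution_matrix(h, n)
blocks, T_coarse = nsf(T, levels=3)
y_nsf = nsf_apply(blocks, T_coarse, x)
y_ref = matvec(T, x)
max_err = max(abs(a - b) for a, b in zip(y_nsf, y_ref))
```

With `eps=0.0` nothing is discarded, so `y_nsf` should match `y_ref` up to rounding. Raising `eps` zeroes more block entries, which is where the hoped-for speedup and the approximation error both come from.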

Err...what?

In plainer English: you can process audio so that it sounds as if it's coming from a specific location in 3D space. This is really cool, especially for games and theater. This project was an effort to speed up the localization process using math tricks from people much smarter than I am. It didn't really work, but it is still an interesting use of the wavelet transform. And hey, the filtering might work better for your application.

Please read the paper for more details. Also feel free to contact me (address below) if you have questions, comments, ideas, and so forth.

Publication Info

This has been accepted into the proceedings of the Wavelets X Conference, Aug 3-8 2003 in San Diego.

A copy of my presentation slides (PDF format) is now available.

Copyright

The paper copyright has been transferred to SPIE, so I cannot send you the LaTeX source or PostScript versions. However, older versions are posted below, and the code and other supporting materials are all here.

Software Requirements to run the code

(URLs for these are found below)

MATLAB, v5 or greater
KEMAR HRTF set from MIT
WaveLab toolkit from Stanford
Source code below

In the interest of reproducible research, we are making all of the code available here. Please contact us if you use this code in your projects or research - we would very much like to hear of further work and/or applications in this area.

Matlab versions

This code was originally written for MATLAB v5.0/5.1. When I ported it to v6.5, I discovered that the 'flops' counter I used to estimate complexity was gone, replaced by the elapsed-time tic/toc pair, which was less useful. So the code as posted no longer computes flops, for which I apologize. You have to run it on v5.1 to see those.
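Without a built-in counter, one rough stand-in is to count operations by hand. The Python sketch below is an illustration only, not part of the posted code. It charges one multiply and one add for every nonzero matrix entry in a matrix-vector product, which is an assumed convention and not necessarily how MATLAB's old 'flops' counted. The matrices are toy data.

```python
def matvec_count(M, x):
    """Return (y, flops) for y = M x, skipping zero entries of M."""
    y = []
    flops = 0
    for row in M:
        acc = 0.0
        for a, b in zip(row, x):
            if a != 0.0:
                acc += a * b
                flops += 2  # assumed convention: one multiply plus one add
        y.append(acc)
    return y, flops

# Toy data: a dense matrix and a version with small entries already zeroed.
dense = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
sparse = [[1.0, 0.0, 0.0], [0.0, 5.0, 0.0], [7.0, 0.0, 9.0]]
x = [1.0, 1.0, 1.0]

y_dense, flops_dense = matvec_count(dense, x)
y_sparse, flops_sparse = matvec_count(sparse, x)
```

A count like this does not depend on the machine or its load, which is what makes it more useful than elapsed time for comparing a thresholded operator against a dense one.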

What about MATLAB clones like Octave?

I've experimented off and on with porting the code to Octave, but have run into problems with the runtime. For example, there are no routines to read and write WAV files, though I could convert my audio clips into another format. Also, I get odd results with the KEMAR HRTF files, where Octave reads too few samples. Currently on hold, though I'd gladly accept any help.

The Source code

Sound files: Listen to the Results

We found the most useful test clip to be a Beethoven piano concerto (the complete reference is in the paper).

lvb.wav, original, 504kB
Localized to zero degrees elevation, 40 degrees azimuth, using MATLAB's 'conv' function: ref_lvb.wav, 1006kB
Note the addition of artifacts and distortion - this is because the KEMAR HRTF set is, while free, in need of work.
And now, here is the same clip, localized in the wavelet domain. wd_lvb.wav, 1008kB
The parameters for this are: Daubechies D4 basis, all detail levels, epsilon (Maximal row norm error) set to 0.001
Some discernible differences from the normal convolution; a slight metallic artifact is present.
Same clip, better approximation (Epsilon reduced to 0.0001). wd_lvb_2.wav, 1008kB
Now indistinguishable from the standard convolution.
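For reference, the "standard convolution" baseline in the list above amounts to filtering the mono clip with one impulse response per ear. The Python sketch below shows that idea on toy data. The function names and the impulse responses are hypothetical, and a real run would load the KEMAR responses and the WAV samples instead.

```python
def convolve(x, h):
    """Full linear convolution; the result has len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

def localize(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right impulse-response pair.

    Returns a list of (left, right) sample pairs. The two responses are
    assumed to have the same length.
    """
    left = convolve(mono, hrir_left)
    right = convolve(mono, hrir_right)
    return list(zip(left, right))

# Hypothetical toy data, not taken from the KEMAR set.
mono = [1.0, 0.5, -0.25, 0.0]
hrir_left = [0.9, 0.3, 0.1]
hrir_right = [0.0, 0.6, 0.2]  # delayed and quieter than the left ear

stereo = localize(mono, hrir_left, hrir_right)
```

The wavelet-domain versions aim to approximate this same output, with epsilon controlling how much of the transformed filter is discarded.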

Previous versions of the paper

Paper, MS Word 97 format, as rejected by VR2000. Lacking in detail, but covers most of the essentials.

In the interests of, well, something, here are the reviews. Others talk about full disclosure, but we deliver!

Paper, VR00 version, PostScript formatted, for those of you not using MS products.

Contacting the authors

Paul Hubbard's email is [email protected]