• Virtual Acoustics: Localization Model & Numeric Simulations (LocaPhoto)

    LocaPhoto consisted of three parts: geometry acquisition, HRTF calculation, and HRTF evaluation by means of a localization model.


    Geometry acquisition

    First, we evaluated the potential of various 3-D scanners by comparing the 3-D meshes obtained for several listeners (Reichinger et al., 2013). As a common basis for comparison, we created "reference" meshes by taking silicone impressions of the listeners' ears and scanning them in a high-energy computed-tomography scanner. While the scanners were generally capable, not all of them produced meshes of the required quality, which limits their use in practical end-user situations.

    Further, we worked on a procedure to generate 3-D meshes directly from 2-D photos by means of photogrammetric reconstruction algorithms. Under selected conditions, we obtained 3-D meshes that allow the calculation of perceptually valid HRTFs (publication in preparation).

    HRTF calculation

    While working on the geometry acquisition, we developed, implemented, and evaluated a procedure to efficiently calculate HRTFs from a 3-D mesh. The software package Mesh2HRTF consists of a Blender plugin for mesh preparation, an executable application based on the boundary-element method, and a Matlab tool for HRTF post-processing (Ziegelwanger et al., 2015a). The evaluation was done by comparing HRTFs calculated for the reference meshes with acoustically measured HRTFs. Differences between the various conditions were evaluated using model predictions and sound-localization experiments. We have shown that in the proximity of the ear canal, meshes with an average edge length of 1 mm or less are required. We have also shown that using a small area as the virtual microphone in the calculations yields the best results (Ziegelwanger et al., 2015b).
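
    The 1-mm criterion can be checked on any triangle mesh before running a calculation. The following is a minimal sketch, not part of Mesh2HRTF: the function name `mean_edge_length` and the toy mesh are assumptions made for illustration, and the coordinates are assumed to be in millimetres.

```python
import numpy as np

def mean_edge_length(vertices, faces):
    """Average length of the unique edges of a triangle mesh.

    Hypothetical helper for illustration; not a Mesh2HRTF function.
    """
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    # Each triangle contributes three edges (pairs of vertex indices).
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    # Sort the indices within each edge so that shared edges are counted once.
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    lengths = np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1)
    return lengths.mean()

# Toy example: a 1 mm x 1 mm square split into two triangles (assumed units: mm).
vertices_mm = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
faces = [[0, 1, 2], [0, 2, 3]]
avg = mean_edge_length(vertices_mm, faces)
print(f"average edge length: {avg:.3f} mm, meets 1-mm criterion: {avg <= 1.0}")
```

    In practice, such a check would be restricted to the mesh elements near the ear canal, because coarser elements may be acceptable elsewhere.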

    To further improve the calculations, we applied a non-uniform a-priori mesh grading to the HRTF calculations. This method reduces the number of elements in the mesh down to 10 000 while still yielding perceptually valid HRTFs (Ziegelwanger et al., 2016). With this method, HRTFs can be calculated in less than an hour.

    HRTF evaluation

    Given the large number of parameters in the numerical calculations, hundreds of calculated HRTF sets had to be tested. The evaluation of HRTF quality is a complex task because it involves many percepts such as directional sound localization, sound externalization, apparent source widening, distance perception, timbre changes, and others. Generally, one would like to have HRTFs that generate virtual auditory scenes as realistic as natural scenes. Since a model evaluating such a "degree of realism" was out of reach, we focused on a very important and well-explored aspect: directional sound localization.

    For sound localization in the lateral dimension (left/right), there are not many aspects requiring HRTF individualization. The listener-specific ITD, i.e., the broadband interaural difference in the sound's time of arrival, can contribute, though. Thus, we first created a 3-D model of the time of arrival that is able to describe the ITD with a few parameters based on a listener's HRTFs (Ziegelwanger and Majdak, 2014).
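
    To give a feeling for how a handful of parameters can describe the ITD, the sketch below uses the classic Woodworth spherical-head approximation. This is a stand-in chosen for illustration, not the time-of-arrival model developed in LocaPhoto; the default head radius and speed of sound are assumed values.

```python
import math

def woodworth_itd(lateral_angle_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """ITD in seconds for a far-field source and a rigid spherical head.

    Woodworth approximation: ITD = (a / c) * (theta + sin(theta)),
    valid for lateral angles between -90 and 90 degrees.
    The default radius and speed of sound are illustrative assumptions.
    """
    theta = math.radians(lateral_angle_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))

# The ITD grows from zero in the median plane towards the side of the head.
for angle in (0, 30, 60, 90):
    print(f"{angle:3d} deg: {woodworth_itd(angle) * 1e6:6.1f} microseconds")
```

    Here the only listener-specific parameter is the head radius; a model fitted to a listener's HRTFs can additionally account for properties such as the position of the ears on the head.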

    For sound localization in sagittal planes (top/down, front/back), individualization of HRTFs is a major issue. The whole process of sagittal-plane localization is still not completely understood, but the role of the dorsal cochlear nucleus (DCN) was already known at the beginning of LocaPhoto. Thus, in LocaPhoto, we developed a model that predicts sagittal-plane sound localization performance based on the spectral processing found in the DCN. It was rigorously evaluated in various conditions and was found to predict listener-specific localization performance quite well (Baumgartner et al., 2014).
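
    The general idea of comparing an incoming spectrum with a listener's stored spectral templates can be sketched as follows. This is a strongly simplified illustration, not the published model: the function names, the use of positive spectral gradients as a stand-in for DCN-like processing, the exponential similarity mapping, and the `sensitivity` parameter are all assumptions.

```python
import numpy as np

def positive_gradient(mag_db):
    """Keep only rising spectral edges across neighbouring frequency bands."""
    return np.maximum(np.diff(mag_db, axis=-1), 0.0)

def polar_angle_probabilities(target_db, templates_db, sensitivity=1.0):
    """Map spectral distances between a target and templates to probabilities.

    target_db:    magnitude spectrum in dB, shape (n_bands,)
    templates_db: one template spectrum per polar angle, shape (n_angles, n_bands)
    sensitivity:  hypothetical listener-specific parameter (smaller = sharper)
    """
    target_grad = positive_gradient(np.asarray(target_db, dtype=float))
    template_grad = positive_gradient(np.asarray(templates_db, dtype=float))
    # Mean absolute gradient difference per template (one value per angle).
    distance = np.mean(np.abs(template_grad - target_grad), axis=-1)
    # Assumed mapping: smaller distance gives larger similarity.
    similarity = np.exp(-distance / sensitivity)
    return similarity / similarity.sum()

# Toy data: five random template spectra; the target equals template 2.
rng = np.random.default_rng(0)
templates = rng.normal(0.0, 5.0, size=(5, 20))
probabilities = polar_angle_probabilities(templates[2], templates)
print("most likely template index:", int(np.argmax(probabilities)))
```

    Replacing the templates with HRTFs from another listener or from a numerical calculation, while keeping the listener's own templates, is the kind of comparison that makes such a model useful for HRTF evaluation.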

    In LocaPhoto, this model allowed us to evaluate many numerically calculated HRTFs. It also allowed us to uncover surprising properties of human sound localization (Majdak et al., 2014). It is implemented in the Auditory Modeling Toolbox (Søndergaard and Majdak, 2013). It has been used for various evaluations (Baumgartner et al., 2013), such as the positioning of loudspeakers in loudspeaker-based sound reproduction (Baumgartner and Majdak, 2015). It also serves as a basis for a 3-D sound localization model (Altoè et al., 2014) and a model addressing sensorineural hearing loss (Baumgartner et al., 2016).


    Austrian Science Fund (FWF, P 24124-N13)


    February 2012 - October 2016


    • Baumgartner, R., Majdak, P., Laback, B. (2016): Modeling the Effects of Sensorineural Hearing Loss on Sound Localization in the Median Plane, in: Trends in Hearing 20, 1-11.
    • Ziegelwanger, H., Kreuzer, W., Majdak, P. (2016): A priori mesh grading for the numerical calculation of the head-related transfer functions, in: Applied Acoustics 114, 99-110.
    • Baumgartner, R., Majdak, P. (2015): Modeling Localization of Amplitude-Panned Virtual Sources in Sagittal Planes, in: J. Audio Eng. Soc 63, 562-569.
    • Ziegelwanger, H., Kreuzer, W., Majdak, P. (2015a): Mesh2HRTF: An open-source software package for the numerical calculation of head-related transfer functions, in: Proceedings of the 22nd International Congress on Sound and Vibration (ICSV). Florence, Italy, 1-8.
    • Ziegelwanger, H., Majdak, P., Kreuzer, W. (2015b): Numerical calculation of head-related transfer functions and sound localization: Microphone model and mesh discretization, in: The Journal of the Acoustical Society of America 138, 208-222.
    • Altoè, A., Baumgartner, R., Majdak, P., Pulkki, V. (2014): Combining count-comparison and sagittal-plane localization models towards a three-dimensional representation of sound localization, in: Proceedings of the 7th Forum Acusticum. Krakow, Poland, 1-6.
    • Baumgartner, R., Majdak, P., Laback, B. (2014): Modeling Sound-Source Localization in Sagittal Planes for Human Listeners, in: The Journal of the Acoustical Society of America 136, 791-802.
    • Majdak, P., Baumgartner, R., Laback, B. (2014): Acoustic and non-acoustic factors in modeling listener-specific performance of sagittal-plane sound localization, in: Frontiers in Psychology 5, 319(1-10).
    • Baumgartner, R., Majdak, P., Laback, B. (2013): Assessment of sagittal-plane sound localization performance in spatial-audio applications, in: Blauert, J. (ed.), The Technology of Binaural Listening. Berlin, Heidelberg, New York (Springer), 93-119.
    • Reichinger, A., Majdak, P., Sablatnig, R., Maierhofer, S. (2013): Evaluation of Methods for Optical 3-D Scanning of Human Pinnas, in: Proceedings of the 3D Vision Conference 2013, Third Joint 3DIM/3DPVT Conference. Seattle, WA, 390-397.
    • Søndergaard, P., Majdak, P. (2013): The Auditory Modeling Toolbox, in: Blauert, J. (ed.), The Technology of Binaural Listening. Berlin, Heidelberg, New York (Springer), 33-56.

    Contact for more information:

    Piotr Majdak (Principal Investigator)

    Michael Mihocic (HRTF measurement)

  • Baumgartner et al. (2017a)

    Spatial hearing is important to monitor the environment for interesting or hazardous sounds and to selectively attend to them. The spatial separation between the two ears and the complex geometry of the human body provide auditory cues about the location of a sound source. Depending on where a sound is coming from, the pinna (or auricle) changes the sound spectrum before the sound reaches the eardrum. Since the shape of a pinna is highly individual (even more so than a fingerprint), it also affects the spectral cues in a very individual manner. In order to produce realistic auditory perception artificially, this individuality needs to be reflected as precisely as required, but the actual requirements are currently unclear. SpExCue therefore aimed to find electrophysiological measures and prediction models of how spatially realistic (“externalized”) a virtual sound source is perceived to be.

    Virtual and augmented reality (VR/AR) systems aim to immerse a listener in a well-externalized 3D auditory space. This requires a perceptually accurate simulation of the listener’s natural acoustic exposure. Appropriately representing the high-frequency spectral cues induced by the pinnae is particularly challenging. To simplify this task, we aim to develop a phenomenological computational model for sound externalization with a particular focus on spectral cues. The model will be designed to predict the listener’s degree of externalization based on binaural input signals and the listener’s individual head-related transfer functions (HRTFs) under static listening conditions.

    The naturally externalized auditory perception can be disrupted, for instance, when listening via headphones or hearing-assistive devices, and instead sounds are heard inside the head. Because of this change in externalization or perceived distance, our investigations of spectral cues also served to study the phenomenon of auditory looming bias (Baumgartner et al., 2017a): sounds approaching the listener are perceived more intensely than those that are receding from the listener. Previous studies demonstrated auditory looming bias exclusively by loudness changes (increasing/decreasing loudness used to simulate approaching/receding sounds). Hence, it was not clear whether this bias truly reflects perceptual differences in sensitivity to motion direction rather than changes in loudness. Our spectral cue changes were perceived as either approaching or receding at steady loudness and evoked auditory looming bias both on a behavioral level (approaching sounds easier to recognize than receding sounds) and an electrophysiological level (larger neural activity in response to approaching sounds). Therefore, our study demonstrated that the bias is truly about perceived motion in distance, not loudness changes.

    Further, SpExCue investigated how the combination of different auditory spatial cues affects attentional control in a speech recognition task with simultaneous talkers, which requires spatial selective attention as at a cocktail party (Deng et al., 2019). We found that natural combinations of auditory spatial cues caused larger neural activity in preparation for the test signal and optimized the neural processing of the attended speech.

    SpExCue also compared different computational modeling approaches that aim to predict the effect of spectral cue changes on how spatially realistic a sound is perceived to be (Baumgartner et al., 2017b). Although many previous experimental results could be predicted by at least one of the models, none of the models alone could explain all of these results. In order to assist the future design of more general computational models for spatial hearing, we finally created a conceptual cognitive model for the formation of auditory space (Majdak et al., 2019).


    Erwin-Schrödinger Fellowship from the Austrian Science Fund (FWF, J3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.

    Follow-up funding provided by Oculus VR, LLC, since March 2018. Project Investigator: Robert Baumgartner.


    • Baumgartner, R., Reed, D.K., Tóth, B., Best, V., Majdak, P., Colburn, H.S., Shinn-Cunningham, B. (2017a): Asymmetries in behavioral and neural responses to spectral cues demonstrate the generality of auditory looming bias, in: Proceedings of the National Academy of Sciences of the USA 114, 9743-9748. (article)
    • Baumgartner, R., Majdak, P., Colburn, H.S., Shinn-Cunningham, B. (2017b): Modeling Sound Externalization Based on Listener-specific Spectral Cues, presented at: Acoustics ‘17 Boston: The 3rd Joint Meeting of the Acoustical Society of America and the European Acoustics Association. Boston, MA, USA. (conference)
    • Deng, Y., Choi, I., Shinn-Cunningham, B.G., Baumgartner, R. (2019): Impoverished auditory cues fail to engage brain networks controlling spatial selective attention, in: bioRxiv, 533117. (preprint)
    • Majdak, P., Baumgartner, R., Jenny, C. (2019): Formation of three-dimensional auditory space, in: arXiv:1901.03990 [q-bio]. (preprint)