Robert Baumgartner

  • The FWF project "Time-Frequency Implementation of HRTFs" has started.

    Principal Investigator: Damian Marelli

    Co-Applicants: Peter Balazs, Piotr Majdak

  • AABBA is an open group of scientists collaborating on the development and applications of models of human spatial hearing.

    AABBA's goal is to promote exploration and development of binaural and spatial models and their applications.

    AABBA members are academic scientists willing to participate in our activities. We meet annually for open discussions and progress presentations, and we especially encourage members to bring students and young scientists associated with their projects to our meetings. Our activities culminate in joint publications and special sessions at international conferences. As a tangible outcome, we provide validated (source) code for published models of binaural and spatial hearing in our collection of auditory models, known as the Auditory Modeling Toolbox (AMT).

    Structure

    • Executive board: Piotr Majdak, Armin Kohlrausch, Ville Pulkki

    • Members:

      • Aachen: Janina Fels, ITA, RWTH Aachen
      • Bochum: Dorothea Kolossa & Jens Blauert, Ruhr-Universität Bochum
      • Cardiff: John Culling, School of Psychology, Cardiff University
      • Copenhagen: Torsten Dau & Tobias May, DTU, Lyngby
      • Dresden: Ercan Altinsoy, TU Dresden
      • Ghent: Sarah Verhulst, Ghent University
      • Guangzhou: Bosun Xie, South China University of Technology, Guangzhou
      • Helsinki: Ville Pulkki & Nelli Salminen, Aalto University
      • Ilmenau: Alexander Raake, TU Ilmenau
      • Kosice: Norbert Kopčo, Safarik University, Košice
      • Lyon: Mathieu Lavandier, Université de Lyon
      • Munich I: Werner Hemmert, TUM München
      • Munich II: Bernhard Seeber, TUM München 
      • Oldenburg: Bernd Meyer, Carl von Ossietzky Universität Oldenburg
      • Oldenburg-Eindhoven: Steven van de Par & Armin Kohlrausch, Universität Oldenburg
      • Patras: John Mourjopoulos, University of Patras
      • Rostock: Sascha Spors, Universität Rostock
      • Sheffield: Guy Brown, The University of Sheffield
      • Tabriz: Masoud Geravanchizadeh, University of Tabriz
      • Toulouse: Patrick Danès, Université de Toulouse
      • Troy: Jonas Braasch, Rensselaer Polytechnic Institute, Troy
      • Vienna: Bernhard Laback & Robert Baumgartner, Austrian Academy of Sciences, Wien
      • The AMT (Umbrella Project): Piotr Majdak
    AABBA group as of the 11th meeting 2019 in Vienna (group photo).

    Meetings

    Annual meetings are held at the beginning of each year:

    • 12th meeting: 16-17 January 2020, Vienna
    • 11th meeting: 19-20 February 2019, Vienna.
    • 10th meeting: 30-31 January 2018, Vienna.
    • 9th meeting: 27-28 February 2017, Vienna.
    • 8th meeting: 21-22 January 2016, Vienna.
    • 7th meeting: 22-23 February 2015, Berlin.
    • 6th meeting: 17-18 February 2014, Berlin.
    • 5th meeting: 24-25 January 2013, Berlin.
    • 4th meeting: 19-20 January 2012, Berlin.
    • 3rd meeting: 13-14 January 2011, Berlin.
    • 2nd meeting: 29-30 September 2009, Bochum.
    • 1st meeting: 23-26 March 2009, Rotterdam.

    Activities

    • Upcoming: Special Session "Binaural models: development and applications" at the ICA 2019, Aachen.
    • Special Session "Models and reproducible research" at the Acoustics'17 (EAA/ASA) 2017, Boston.
    • Structured Session "The Technology of Binaural Listening & Understanding" at the ICA 2016, Buenos Aires.
    • Structured Session "Applied Binaural Signal Processing" at the Forum Acusticum 2014, Kraków.

    Contact person: Piotr Majdak

  • Virtual Acoustics: Localization Model & Numeric Simulations (LocaPhoto)

    LocaPhoto consisted of three parts: geometry acquisition, HRTF calculation, and HRTF evaluation by means of a localization model.


    Geometry acquisition

    First, we evaluated the potential of various 3-D scanners by comparing the 3-D meshes obtained for several listeners (Reichinger et al., 2013). As a common reference for the comparison, we created "reference" meshes by taking silicone impressions of listeners' ears and scanning them in a high-energy computed-tomography scanner. While generally capable, not all 3-D scanners were able to produce meshes of the required quality, limiting their applicability in practical end-user situations.

    Furthermore, we worked on a procedure to generate 3-D meshes directly from 2-D photos by means of photogrammetric-reconstruction algorithms. Under selected conditions, we obtained 3-D meshes that allowed us to calculate perceptually valid HRTFs (publication in preparation).

    HRTF calculation

    While working on the geometry acquisition, we developed, implemented, and evaluated a procedure to efficiently calculate HRTFs from a 3-D mesh. The software package Mesh2HRTF consists of a Blender plugin for mesh preparation, an executable application based on the boundary-element method, and a Matlab tool for HRTF post-processing (Ziegelwanger et al., 2015a). For the evaluation, HRTFs calculated for the reference meshes were compared to acoustically measured HRTFs, and differences between the various conditions were assessed by means of model predictions and sound-localization experiments. We showed that, in the proximity of the ear canal, meshes with an average edge length of 1 mm or less are required, and that a small area for the virtual microphone used in the calculations yields the best results (Ziegelwanger et al., 2015b).
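
    As a side note, the 1-mm criterion can be checked directly on a candidate mesh. The following is a minimal sketch in Python/NumPy; the variable names and the mesh-loading step are assumptions for illustration and not part of Mesh2HRTF:

      import numpy as np

      def average_edge_length(vertices, faces):
          """Mean edge length (in mm, if vertices are in mm) of a triangle mesh."""
          # Gather the three edges of every triangle as vertex-index pairs.
          edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
          # Drop duplicates: interior edges are shared by two triangles.
          edges = np.unique(np.sort(edges, axis=1), axis=0)
          lengths = np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1)
          return lengths.mean()

      # Toy example: one triangle with two 1-mm legs.
      vertices = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
      faces = np.array([[0, 1, 2]])
      print(average_edge_length(vertices, faces))  # approx. 1.14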

    To further improve the calculations, we applied a non-uniform a-priori mesh grading to the HRTF calculations. This method reduces the number of elements in the mesh to 10,000 while still yielding perceptually valid HRTFs (Ziegelwanger et al., 2016). With this method, an HRTF set can be calculated in less than an hour.
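
    The idea behind such grading can be sketched as a target edge length that grows with distance from the ear canal: fine resolution where the geometry matters acoustically, coarse resolution elsewhere. The growth law and the constants below are illustrative placeholders, not the actual grading scheme of Ziegelwanger et al. (2016):

      import numpy as np

      def target_edge_length(vertex, ear_position, min_mm=1.0, max_mm=10.0, slope=0.1):
          """Illustrative grading target: fine (1 mm) near the ear, coarse far away."""
          distance = np.linalg.norm(np.asarray(vertex) - np.asarray(ear_position))
          return float(np.clip(min_mm + slope * distance, min_mm, max_mm))

      print(target_edge_length([0, 0, 0], [0, 0, 0]))    # 1.0 mm at the ear canal
      print(target_edge_length([0, 0, 200], [0, 0, 0]))  # 10.0 mm far from it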

    HRTF evaluation

    Given the large number of parameters in the numerical calculations, hundreds of calculated HRTF sets had to be tested. The evaluation of HRTF quality is a complex task because it involves many percepts such as directional sound localization, sound externalization, apparent source width, distance perception, and timbre. Ideally, HRTFs should generate virtual auditory scenes as realistic as natural scenes. While a model evaluating such a "degree of realism" was out of reach, we focused on a very important and well-explored aspect: directional sound localization.

    For sound localization in the lateral dimension (left/right), not many aspects require HRTF individualization. The listener-specific interaural time difference (ITD), i.e., the broadband interaural difference in the sound's time of arrival, can contribute, though. Thus, we first created a 3-D time-of-arrival model able to describe the ITD with a few parameters derived from a listener's HRTFs (Ziegelwanger and Majdak, 2014).
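
    As a simplified illustration (this is not the Ziegelwanger and Majdak model, which fits a geometric time-of-arrival model to the whole HRTF set), a broadband ITD can be estimated from a single pair of head-related impulse responses by cross-correlation:

      import numpy as np

      def broadband_itd(hrir_left, hrir_right, fs):
          """ITD in seconds; positive when the left-ear response arrives later."""
          xcorr = np.correlate(hrir_left, hrir_right, mode="full")
          lag = int(np.argmax(np.abs(xcorr))) - (len(hrir_right) - 1)  # lag in samples
          return lag / fs

      # Toy example at fs = 48 kHz: the right-ear response leads by 20 samples.
      fs = 48000.0
      h = np.r_[1.0, 0.5, np.zeros(98)]  # a crude 100-tap impulse response
      print(broadband_itd(np.r_[np.zeros(20), h], np.r_[h, np.zeros(20)], fs))
      # -> about 20/48000 s, i.e., roughly 0.42 ms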

    For sound localization in sagittal planes (up/down, front/back), the individualization of HRTFs is a major issue. The process of sagittal-plane localization is still not completely understood, but the role of the dorsal cochlear nucleus (DCN) was already known at the beginning of LocaPhoto. Thus, in LocaPhoto, we developed a model that predicts sagittal-plane sound-localization performance based on the spectral processing found in the DCN. The model was rigorously evaluated in various conditions and was found to predict listener-specific localization performance quite well (Baumgartner et al., 2014).
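
    The comparison stage of such a model can be caricatured in a few lines: positive across-frequency gradients are extracted from the target spectrum and from template spectra of known directions, and a distance is computed per template. This toy sketch only mirrors the general idea; the published model (Baumgartner et al., 2014) additionally includes peripheral (gammatone) filtering, a listener-specific sensitivity mapping, and a probabilistic response stage:

      import numpy as np

      def positive_spectral_gradient(mag_db):
          """Positive part of the across-frequency gradient of a magnitude spectrum (dB)."""
          return np.maximum(np.diff(mag_db), 0.0)

      def template_distances(target_db, templates_db):
          """Gradient distance between a target spectrum and each template direction."""
          g = positive_spectral_gradient(target_db)
          return np.array([np.mean(np.abs(g - positive_spectral_gradient(t)))
                           for t in templates_db])

      # The template with the smallest distance marks the most likely response direction.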

    In LocaPhoto, this model allowed us to evaluate many numerically calculated HRTFs. It also helped uncover surprising properties of human sound localization (Majdak et al., 2014). It is implemented in the Auditory Modeling Toolbox (Søndergaard and Majdak, 2013), has been used for various evaluations (Baumgartner et al., 2013) such as the positioning of loudspeakers in loudspeaker-based sound reproduction (Baumgartner and Majdak, 2015), and serves as the basis for a 3-D sound-localization model (Altoè et al., 2014) and a model addressing sensorineural hearing loss (Baumgartner et al., 2016).

    Funding:

    Austrian Science Fund (FWF, P 24124-N13)

    Duration:

    February 2012 - October 2016

    Publications:

    • Baumgartner, R., Majdak, P., Laback, B. (2016): Modeling the Effects of Sensorineural Hearing Loss on Sound Localization in the Median Plane, in: Trends in Hearing 20, 1-11.
    • Ziegelwanger, H., Kreuzer, W., Majdak, P. (2016): A priori mesh grading for the numerical calculation of the head-related transfer functions, in: Applied Acoustics 114, 99-110.
    • Baumgartner, R., Majdak, P. (2015): Modeling Localization of Amplitude-Panned Virtual Sources in Sagittal Planes, in: J. Audio Eng. Soc. 63, 562-569.
    • Ziegelwanger, H., Kreuzer, W., Majdak, P. (2015a): Mesh2HRTF: An open-source software package for the numerical calculation of head-related transfer functions, in: Proceedings of the 22nd International Congress on Sound and Vibration (ICSV). Florence, Italy, 1-8.
    • Ziegelwanger, H., Majdak, P., Kreuzer, W. (2015b): Numerical calculation of head-related transfer functions and sound localization: Microphone model and mesh discretization, in: The Journal of the Acoustical Society of America 138, 208-222.
    • Altoè, A., Baumgartner, R., Majdak, P., Pulkki, V. (2014): Combining count-comparison and sagittal-plane localization models towards a three-dimensional representation of sound localization, in: Proceedings of the 7th Forum Acusticum. Krakow, Poland, 1-6.
    • Baumgartner, R., Majdak, P., Laback, B. (2014): Modeling Sound-Source Localization in Sagittal Planes for Human Listeners, in: The Journal of the Acoustical Society of America 136, 791-802.
    • Majdak, P., Baumgartner, R., Laback, B. (2014): Acoustic and non-acoustic factors in modeling listener-specific performance of sagittal-plane sound localization, in: Frontiers in Psychology 5, 319(1-10).
    • Baumgartner, R., Majdak, P., Laback, B. (2013): Assessment of sagittal-plane sound localization performance in spatial-audio applications, in: Blauert, J. (ed.), The Technology of Binaural Listening. Berlin-Heidelberg-New York (Springer), 93-119.
    • Reichinger, A., Majdak, P., Sablatnig, R., Maierhofer, S. (2013): Evaluation of Methods for Optical 3-D Scanning of Human Pinnas, in: Proceedings of the 3D Vision Conference 2013, Third Joint 3DIM/3DPVT Conference. Seattle, WA, 390-397.
    • Søndergaard, P., Majdak, P. (2013): The Auditory Modeling Toolbox, in: Blauert, J. (ed.), The Technology of Binaural Listening. Berlin, Heidelberg, New York (Springer), 33-56.

    Contact for more information:

    Piotr Majdak (Principal Investigator)

    Michael Mihocic (HRTF measurement)

  • SpExCue

    Spatial hearing is important for constantly monitoring the environment for interesting or dangerous sounds and for directing attention to them. The spatial separation of the two ears and the complex geometry of the human body provide acoustic information about the location of a sound source. Depending on the direction of sound incidence, it is above all the pinna that modifies the sound spectrum before the sound reaches the eardrum. Since the shape of the pinna is highly individual (even more so than a fingerprint), its spectral filtering is highly individual as well. To artificially create realistic auditory percepts, this individuality must be represented as precisely as necessary, yet it has remained unclear what is actually necessary. SpExCue therefore investigated electrophysiological measures and predictive models of how spatially realistic ("externalized") a virtual sound source is perceived.

    Because artificial sources tend to be perceived inside the head, investigating these sound spectra was also well suited to studying a bias in auditory perception: sound events approaching the listener are perceived as more intense than those receding. Earlier studies demonstrated this bias exclusively via loudness changes (increasing/decreasing loudness was used to simulate approaching/receding sound events). It was therefore unclear whether the bias truly reflects perceptual differences with respect to the direction of motion or merely the different loudness levels. Our study demonstrated that spatial changes of timbre can elicit this bias (behaviorally and electrophysiologically) even at constant loudness, so a general perceptual bias can be assumed.

    Furthermore, SpExCue investigated how the combination of different spatial auditory cues affects attentional control in a speech-recognition task with simultaneous talkers, as at a cocktail party. We found that natural combinations of spatial auditory cues evoke more brain activity in preparation for the test signal, thereby optimizing the neural processing of the upcoming speech.

    SpExCue also compared different computational modeling approaches that aim to predict the spatial perception of sound changes. Although many previous experimental results could be predicted by at least one of the approaches, none of them could explain all of these results. To support the future development of more generally applicable computational models of spatial hearing, we finally devised a conceptual cognitive model for this purpose.

    Funding

    Erwin-Schrödinger Fellowship from the Austrian Science Fund (FWF, J3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.

    Follow-up funding provided by Oculus VR, LLC, since March 2018. Principal Investigator: Robert Baumgartner.

    Publications

    • Baumgartner, R., Reed, D.K., Tóth, B., Best, V., Majdak, P., Colburn H.S., Shinn-Cunningham B. (2017a): Asymmetries in behavioral and neural responses to spectral cues demonstrate the generality of auditory looming bias, in: Proceedings of the National Academy of Sciences of the USA 114, 9743-9748. (article)
    • Baumgartner, R., Majdak, P., Colburn H.S., Shinn-Cunningham B. (2017b): Modeling Sound Externalization Based on Listener-specific Spectral Cues, presented at: Acoustics ‘17 Boston: The 3rd Joint Meeting of the Acoustical Society of America and the European Acoustics Association. Boston, MA, USA. (conference)
    • Deng, Y., Choi, I., Shinn-Cunningham, B. G., Baumgartner, R. (2019): Impoverished auditory cues fail to engage brain networks controlling spatial selective attention, in: bioRxiv, 533117. (preprint)
    • Majdak, P., Baumgartner, R., Jenny, C. (2019): Formation of three-dimensional auditory space, in: arXiv:1901.03990 [q-bio]. (preprint)
  • Objectives:

    In the context of binaural virtual acoustics, a sound source is positioned in the free-field 3-D space around the listener by filtering it with head-related transfer functions (HRTFs). In a real-time application, numerous HRTFs need to be processed simultaneously. The long impulse responses of the HRTFs demand high computational power, which makes a direct implementation on current processors difficult in situations involving more than a few simultaneous sources.
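
    At its core, this rendering is a convolution of the source signal with the impulse-response pair of the desired direction. A minimal offline sketch in Python/SciPy (the function and variable names are illustrative assumptions):

      import numpy as np
      from scipy.signal import fftconvolve

      def render_binaural(mono, hrir_left, hrir_right):
          """Place a mono signal at the direction encoded by the given HRIR pair."""
          left = fftconvolve(mono, hrir_left)
          right = fftconvolve(mono, hrir_right)
          return np.stack([left, right], axis=-1)  # (samples, 2) binaural signal

      # Each simultaneous source needs its own HRIR pair, which is why the
      # computational load grows quickly with the number of sources.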

    Technically speaking, an HRTF is a linear time-invariant (LTI) system. An LTI system can be implemented in the time domain by direct convolution or recursive filtering, but this approach is computationally inefficient. A computationally efficient alternative implements the system in the frequency domain; however, this approach is not suitable for real-time applications because it introduces a very large delay. A compromise between the two approaches is provided by the family of segmented-FFT methods, which permit a trade-off between latency and computational complexity. As an alternative, the sub-band method can be applied to represent linear systems in the time-frequency domain. Recent work has shown that the sub-band method offers an even better trade-off between latency and computational complexity than segmented-FFT methods. However, the sub-band analysis is still mathematically challenging, and its optimal configuration depends on the application under consideration.
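
    The latency/complexity trade-off of the segmented-FFT family can be made concrete with a textbook overlap-add sketch in pure NumPy (scipy.signal.oaconvolve offers a production implementation). The block length sets the buffering latency, while the FFT work per output sample grows as the blocks shrink:

      import numpy as np

      def overlap_add_convolve(x, h, block):
          """Segmented-FFT (overlap-add) convolution of signal x with filter h."""
          n_fft = int(2 ** np.ceil(np.log2(block + len(h) - 1)))  # zero-padded FFT size
          H = np.fft.rfft(h, n_fft)                               # filter spectrum, precomputed
          y = np.zeros(len(x) + len(h) - 1)
          for start in range(0, len(x), block):                   # one iteration per buffered block
              seg = x[start:start + block]
              y_seg = np.fft.irfft(np.fft.rfft(seg, n_fft) * H, n_fft)
              y[start:start + len(seg) + len(h) - 1] += y_seg[:len(seg) + len(h) - 1]
          return y

      # Smaller "block" means lower input-to-output latency but more FFTs per second.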

    Methods:

    TF-VA involves developing and investigating new techniques for configuring the sub-band method by using advanced optimization methods in a functional analysis context. As a result, an optimization technique that minimizes the computational complexity of the sub-band method will be obtained.

    Two approaches will be considered: the first designs a time-frequency transform that minimizes the complexity of each HRTF individually. In the second approach, we will design a single time-frequency transform to be used for a joint implementation of all HRTFs of a listener. This will permit an efficient implementation of interpolation techniques while sources move spatially in real time. The results will be evaluated in subjective localization experiments and in terms of localization models.

    Status:

    • Main participant: Damian Marelli (University of Newcastle, Australia)
    • Co-applicants: Peter Balazs, Piotr Majdak
    • Project begin: November 2011
    • Funding: Lise-Meitner-Programm of the Austrian Science Fund (FWF) [M 1230-N13]