EAP

  • Objective:

    Head-related transfer functions (HRTFs) describe sound transmission from the free field to a point in the ear canal in terms of linear time-invariant systems. They contain spectral and temporal features that vary with the sound direction. Because these features differ among subjects, individual HRTFs must be measured for studies on localization in virtual environments. In this project, a system for HRTF measurement was developed and installed in the semi-anechoic room at the Austrian Academy of Sciences.

    Method:

    Measurement of an HRTF was considered a system identification of the electro-acoustic chain: sound source-room-HRTF-microphone. The sounds in the ear canals were captured using in-ear microphones. The direction of the sound source was varied horizontally by rotating the subject on a turntable, and vertically by accessing one of the 22 loudspeakers positioned in the median plane. An optimized form of system identification with sweeps, the multiple exponential sweep method (MESM), was used to measure transfer functions with satisfactory signal-to-noise ratios within a reasonable amount of time. Subjects' positions were tracked during the measurement to ensure sufficient measurement accuracy. Measurement of headphone transfer functions was included in the HRTF measurement procedure, which allows the headphone influence to be equalized during the presentation of virtual stimuli.

    Results:

    Multi-channel audio equipment has been installed in the semi-anechoic room, allowing simultaneous recording and stimulus presentation via 24 channels.

    The multiple exponential sweep method was developed, allowing fast transfer-function measurement of weakly nonlinear, time-invariant systems with multiple sources.
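    The principle behind sweep-based system identification can be sketched as follows: play an exponential sweep, record the system's response, and deconvolve. This is a minimal single-source illustration of the underlying idea, not the full MESM (which interleaves and overlaps sweeps for multiple sources); the sweep parameters, the simulated "system", and the regularization constant are all illustrative.

```python
import numpy as np

def exp_sweep(f1, f2, duration, fs):
    """Generate an exponential sine sweep from f1 to f2 Hz."""
    t = np.arange(int(duration * fs)) / fs
    L = duration / np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * L * (np.exp(t / L) - 1))

def impulse_response(recorded, sweep):
    """Recover the impulse response by regularized spectral division."""
    n = len(recorded) + len(sweep) - 1
    nfft = 1 << (n - 1).bit_length()
    R = np.fft.rfft(recorded, nfft)
    S = np.fft.rfft(sweep, nfft)
    eps = 1e-8 * np.max(np.abs(S))  # regularization outside the sweep band
    ir = np.fft.irfft(R * np.conj(S) / (np.abs(S) ** 2 + eps), nfft)
    return ir[:len(recorded)]

fs = 48000
sweep = exp_sweep(100, 20000, 1.0, fs)
# Simulated "system": a pure delay of 100 samples with gain 0.5.
recorded = 0.5 * np.concatenate([np.zeros(100), sweep])
ir = impulse_response(recorded, sweep)
peak = int(np.argmax(np.abs(ir)))  # should land at the simulated delay
```

The recovered response peaks at the simulated delay with roughly the simulated gain, which is the property the measurement procedure relies on.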

    The measurement procedure was developed and a database of HRTFs was created; so far, it contains the HRTFs of over 20 subjects, which are used to create virtual stimuli and present them via headphones.

    To virtually position sounds in space, the HRTFs are used to filter free-field sounds. This results in virtual acoustic stimuli (VAS). To create VAS and present them via headphones, the applications Virtual Sound Positioning (VSP) and Loca (part of our ExpSuite software project) have been implemented. They allow virtual sound positioning in a free-field environment using both stationary and moving sound sources.
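    The filtering step behind VAS creation can be sketched as follows. The HRIR pair here is a synthetic placeholder (a pure interaural delay and level difference), whereas VSP and Loca use measured HRTFs.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right HRIR pair to create a
    two-channel virtual acoustic stimulus (VAS)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Toy HRIRs: a 20-sample interaural time difference and a level
# difference, standing in for a measured HRTF pair.
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[20] = 0.6

rng = np.random.default_rng(0)
mono = rng.standard_normal(1000)   # placeholder free-field sound
vas = spatialize(mono, hrir_l, hrir_r)
```

Played over equalized headphones, the two output channels would evoke a lateralized virtual source; moving sources require interpolating between HRIRs over time.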

  • Objective:

    In this project, head-related transfer functions (HRTFs) are measured and prepared for localization tests with cochlear implant listeners. The method and apparatus used for the measurement is the same as used for the general HRTF measurement (see project HRTF-System); however, the place where sound is acquired is different. In this project, the microphones built into the behind-the-ear (BtE) processors of cochlear implantees are used. The processors are located on the pinna, and the unprocessed microphone signals are used to calculate the BtE-HRTFs for different spatial positions.

    The BtE-HRTFs are then used in localization tests like Loca BtE-CI.

  • Objective and Methods:

    This study investigates the effect of the number of frequency channels on vertical-plane sound localization, especially front/back discrimination. This is important to determine how many of the basal-most channels/electrodes of a cochlear implant (CI) are needed to encode spectral localization cues. Normal hearing subjects listening to a CI simulation (the newly developed GET vocoder) will perform the experiment using the localization method developed in the subproject "Loca Methods". Learning effects will be studied by providing visual feedback.
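    As an illustration of the general idea behind channel vocoding (not the GET vocoder itself), a minimal noise-band vocoder can be sketched as follows: the signal is split into log-spaced bands, each band's envelope is extracted, and the envelopes modulate band-limited noise carriers. All parameter values are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocoder(x, fs, n_channels, f_lo=200.0, f_hi=7000.0):
    """Simplified noise-band channel vocoder."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    rng = np.random.default_rng(1)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfilt(sos, x)               # analysis band
        env = np.abs(hilbert(band))          # Hilbert envelope
        carrier = sosfilt(sos, rng.standard_normal(len(x)))
        out += env * carrier                 # envelope-modulated noise
    return out

fs = 16000
t = np.arange(fs) / fs
# Placeholder "speech-like" input: a modulated tone.
speechlike = np.sin(2 * np.pi * 500 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
voc = noise_vocoder(speechlike, fs, n_channels=8)
```

Varying `n_channels` controls how much spectral detail survives, which is exactly the manipulation that determines whether spectral localization cues remain usable.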

    Results:

    Experiments are underway.

    Application:

    Knowing the number of channels required to encode spectral cues for localization in the vertical planes is an important step in the development of a 3-D localization strategy for CIs. 

    Funding:

    FWF (Austrian Science Fund): Project #P18401-B15

    Publications:

    • Goupell, M., Majdak, P., and Laback, B. (2010). Median-plane sound localization as a function of the number of spectral channels using a channel vocoder, J. Acoust. Soc. Am. 127, 990-1001.
  • Objective:

    This project investigated the perception of interaural intensity differences among cochlear implant (CI) listeners in relation to the spectral composition and the temporal structure of the signal.

    Method:

    The perception thresholds (just noticeable differences, JND) of CI listeners were examined using differently structured signals. The stimuli were applied directly to the clinical signal processing units, while the parameters of the ongoing stimulation were closely monitored.

    Results:

    JNDs of IIDs in CI listeners ranged from 1.5 to 2.5 dB at a detection level of 80 percent. The type of stimulus had little effect on detection performance, with the exception of a single signal type: a pulse train with a rate of 20 Hz. Thus, the JNDs of CI listeners are only marginally higher than those of normal hearing listeners. CI users are sensitive to IIDs, and their JNDs correspond to differences in arrival angle of 5-10 degrees. Since the JNDs are within the minimal level step sizes used by the CI system to transfer amplitudes, reducing the step size in future systems seems advisable.

    Publication:

    • Laback, B., Pok, S. M., Baumgartner, W. D., Deutsch, W. A., and Schmid, K. (2004). “Sensitivity to interaural level and envelope time differences of two bilateral cochlear implant listeners using clinical sound processors,” Ear and Hearing 25, 5, 488-500.
  • Objective and Methods:

    This project cluster includes several studies on the perception of interaural time differences (ITD) in cochlear implant (CI), hearing impaired (HI), and normal hearing (NH) listeners. Studying different groups of listeners allows for identification of the factors that are most important to ITD perception. Furthermore, the comparison between the groups allows for the development of strategies to improve ITD sensitivity in CI and HI listeners.

    Subprojects:

    • FsGd: Effects of ITD in Ongoing, Onset, and Offset in Cochlear Implant Listeners
    • ITD Sync: Effects of interaural time difference in fine structure and envelope on lateral discrimination in electric hearing
    • ITD Jitter CI: Recovery from binaural adaptation with cochlear implants
    • ITD Jitter NH: Recovery from binaural adaptation in normal hearing
    • ITD Jitter HI: Recovery from binaural adaptation with sensorineural hearing impairment
    • ITD CF: Effect of center frequency and rate on the sensitivity to interaural delay in high-frequency click trains
    • IID-CI: Perception of Interaural Intensity Differences by Cochlear Implant Listeners

       

  • French-Austrian bilateral research project funded by the French National Agency of Research (ANR) and the Austrian Science Fund (FWF, project no. I 1362-N30). The project involves two academic partners, namely the Laboratory of Mechanics and Acoustics (LMA - CNRS UPR 7051, France) and the Acoustics Research Institute. At the ARI, two research groups are involved in the project: the Mathematics and Signal Processing in Acoustics and the Psychoacoustics and Experimental Audiology groups.

    Principal investigators: Thibaud Necciari (ARI), Piotr Majdak (ARI) and Olivier Derrien (LMA).

    Running period: 2014-2017 (project started on March 1, 2014).

    Abstract:

    One of the greatest challenges in signal processing is to develop efficient signal representations. An efficient representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of binary data required for storage or transmission, it is desirable that the representation take into account human auditory perception and allow reconstruction with a controlled amount of perceived distortion. Over the last decades, many psychoacoustical studies investigated auditory masking, an important property of auditory perception. Masking refers to the degradation of the detection threshold of a sound in the presence of another sound. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency (t-f) masking effects in perceptual audio codecs. We recently conducted psychoacoustical studies on t-f masking; their results revealed the inaccuracy of such simple combinations. These new data on t-f masking represent a crucial basis to account for masking effects in t-f representations of sounds. Although t-f representations are standard tools in audio processing, the development of a t-f representation of audio signals that is mathematically founded, perception-based, perfectly invertible, and possibly with a minimum amount of redundancy remains a challenge. POTION thus addresses the following questions:

    1. To what extent is it possible to obtain a perception-based (i.e., as close as possible to “what we see is what we hear”), perfectly invertible, and possibly minimally redundant t-f representation of sound signals? Such a representation is essential for modeling complex masking interactions in the t-f domain and is expected to improve our understanding of auditory processing of real-world sounds. Moreover, it is of fundamental interest for many audio applications involving sound analysis-synthesis.
    2. Is it possible to improve current perceptual audio codecs by considering a joint t-f approach? To reduce the size of digital audio files, perceptual audio codecs like MP3 decompose sounds into variable-length time segments, apply a frequency transform, and use masking models to control the sub-quantization of transform coefficients within each segment. Thus, current codecs follow mainly a spectral approach, although temporal masking effects are taken into account in some implementations. By combining an efficient perception-based t-f transform with a joint t-f masking model in an audio codec, we expect to achieve significant performance improvements.
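    As a baseline for the perception-based transform sought in POTION, a standard (stationary) Gabor/STFT analysis-synthesis pair already achieves perfect invertibility when the window and hop size satisfy the COLA condition; a minimal sketch using SciPy (window and frame parameters are illustrative):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 44100
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)  # one second of noise as a test signal

# Analysis: uniform Gabor/STFT frame. A Hann window with 50% overlap
# satisfies the COLA condition, guaranteeing invertibility.
f, t, X = stft(x, fs=fs, window="hann", nperseg=1024, noverlap=512)

# Synthesis: the inverse STFT reconstructs the signal up to numerical error.
_, x_rec = istft(X, fs=fs, window="hann", nperseg=1024, noverlap=512)

err = np.max(np.abs(x - x_rec[:len(x)]))  # reconstruction error
```

POTION's non-stationary Gabor frames generalize this scheme by letting the t-f resolution vary, mimicking auditory analysis while preserving invertibility.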

    Working program:

    POTION is structured in three main tasks:

    1. Perception-based t-f representation of audio signals with perfect reconstruction: A linear and perfectly invertible t-f representation will be created by exploiting the recently developed non-stationary Gabor theory as a mathematical background. The transform will be designed so that its t-f resolution mimics the t-f analysis properties of the auditory system and, to maximize coding efficiency, introduces as little redundancy as possible.
    2. Development and implementation of a t-f masking model: Based on psychoacoustical data on t-f masking collected by the partners in previous projects and on literature data, a new, complex model of t-f masking will be developed and implemented in the computationally efficient representation built in task 1. Additional psychoacoustical data required for the development of the model, involving frequency, level, and duration effects in masking for either single or multiple maskers, will be collected. The resulting signal processing algorithm should represent and re-synthesize only the perceptually relevant components of the signal. It will be calibrated and validated by conducting listening tests with synthetic and real-world sounds.
    3. Optimization of perceptual audio codecs: This task represents the main application of POTION. It will consist in combining the new efficient representation built in task 1 with the new t-f masking model built in task 2 for implementation in a perceptual audio codec.

    More information on the project can be found on the POTION web page.

    Publications:

    • Chardon, G., Necciari, Th., Balazs, P. (2014): Perceptual matching pursuit with Gabor dictionaries and time-frequency masking, in: Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014). Florence, Italy, 3126-3130. (proceedings)

    Related topics investigated at the ARI:

  • Objective:

    The sensitivity of normal hearing listeners to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). Cochlear implant (CI) listeners show a similar limitation in ITD sensitivity with respect to the rate of unmodulated pulse trains containing ITD. Unfortunately, such high rates are needed to appropriately sample the modulation information of the acoustic signal. This study tests the ideas that (1) similar "binaural adaptation" mechanisms are limiting the performance in both subject groups, (2) the effect is related to the periodicity of pulse trains, and (3) introducing jitter (randomness) into the pulse timing causes a recovery from binaural adaptation and thus improves ITD sensitivity at higher pulse rates.

    Method and Results:

    These ideas have been studied by testing the ITD sensitivity of five CI listeners. The parameters pulse rate, amount of jitter (where the minimum represents the periodic condition), and ITD were varied. We showed that introducing binaurally synchronized jitter into the stimulation timing causes large improvements in ITD sensitivity at higher pulse rates (≥ 800 pps). Our experimental results demonstrate that a purely temporal trigger can cause recovery from binaural adaptation.
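    The stimulus manipulation can be sketched as follows. The jitter definition (uniform offsets scaled by a jitter factor) and the ITD value are illustrative assumptions, not the exact parameters of the study.

```python
import numpy as np

def jittered_pulse_times(rate, duration, jitter, rng):
    """Pulse times for a nominally periodic train with uniform jitter.
    jitter=0 gives the periodic condition; jitter=1 lets each pulse
    move up to half an interpulse interval in either direction."""
    period = 1.0 / rate
    nominal = np.arange(0.0, duration, period)
    offsets = rng.uniform(-0.5, 0.5, len(nominal)) * jitter * period
    return nominal + offsets

rng = np.random.default_rng(0)
times = jittered_pulse_times(rate=800, duration=0.3, jitter=0.5, rng=rng)

# Binaural synchronization: both ears receive the SAME jittered timing;
# the ITD is imposed by shifting the right-ear pulses.
itd = 400e-6                      # 400 µs ITD (illustrative value)
left, right = times, times + itd
```

Because the jitter is identical at the two ears, the interaural delay stays fixed at the intended ITD while the interpulse intervals become irregular.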

    Application:

    Applying binaural jitter in stimulation strategies may improve several aspects of binaural hearing in bilateral recipients of CIs, including localization of sound sources and speech segregation in noise.

    Funding:

    Internal

    Publications:

    • Laback, B., and Majdak, P. (2007). Binaural jitter improves interaural time-difference sensitivity of cochlear implantees at high pulse rates, Proc Natl Acad Sci USA (PNAS) 105, 2, 814-817.
    • Laback, B., and Majdak, P. (2008). Reply to van Hoesel: Binaural jitter with cochlear implants, improved interaural time-delay sensitivity, and normal hearing, letter to Proc Natl Acad Sci USA 12, 105, 32.
    • Laback, B., and Majdak, P. (2007). Binaural stimulation in neural auditory prostheses or hearing aids, provisional US and EP patent application (submitted 20.06.07).
  • Objective:

    The sensitivity of normal hearing (NH) listeners to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). In another study (Laback and Majdak, 2008), it was hypothesized that introducing binaural jitter may improve ITD sensitivity in bilateral cochlear implant (CI) listeners by avoiding periodicity. Indeed, the results showed large improvements at high rates (≥ 800 pps). This was interpreted as an indication for a recovery from binaural adaptation. 

    In this study, we further investigated this effect using NH subjects. We attempted to understand the underlying mechanisms by applying a well-established model of peripheral auditory processing. 

    Method and Results:

    Bandpass-filtered clicks (centered at 4 kHz) were used at a nominal pulse rate of 600 pulses per second (pps). It was found that randomly jittering the timing of the pulses significantly increases the detectability of the ITD. A second experiment was performed to observe the effect of place and rate for pulse trains. It showed that ITD sensitivity for jittered pulse trains at 1200 pps was significantly higher than for periodic pulse trains at 600 pps. Therefore, with the addition of jitter, listeners were not solely benefiting from the longest interpulse intervals and instances of reduced rate. A third experiment, using a 900 pps pulse train, confirmed the improvement in ITD sensitivity even when random amplitude modulation, a side effect of large amounts of jitter, was ruled out. A model of peripheral auditory processing up to the brain stem (cochlear nucleus) was applied to study the mechanisms underlying the improvements in ITD sensitivity. It was found that the irregular timing of the jittered pulses increases the synchrony of firing in the cochlear nucleus. These results suggest that a recovery from binaural adaptation, triggered by temporal irregularity, possibly occurs at the level of the cochlear nucleus.

    Application:

    Together with the results of Laback and Majdak (2008) on the effect of binaural jitter in CI listeners, these results suggest that the binaural adaptation effect first observed by Hafter and Dye (1983) is related to the synchrony of neural firings across auditory nerve (AN) fibers, which in turn innervate cochlear nucleus cells. At higher rates, periodic pulse trains result in little synchrony of the response to the ongoing signal, whereas jittering the pulse timing increases the probability of synchronous firing across AN fibers at certain instants of time. Further studies are required to determine whether other aspects of binaural adaptation can also be attributed to this explanation.

    Funding:

    Internal

    Publications:

    • Goupell, M. J., Laback, B., Majdak, P. (2009): Enhancing sensitivity to interaural time differences at high modulation rates by introducing temporal jitter, in: J. Acoust. Soc. Am. 126, 2511-2521.
    • Laback, B., and Majdak, P. (2007): Binaural jitter improves interaural time-difference sensitivity of cochlear implantees at high pulse rates, in: Proc. Natl. Acad. Sci. USA (PNAS) 105, 2, 814-817.
    • Laback, B., and Majdak, P. (2008): Reply to van Hoesel: Binaural jitter with cochlear implants, improved interaural time-delay sensitivity, and normal hearing, letter to Proc. Natl. Acad. Sci. USA 12, 105, 32.
  • Objective:

    Normal hearing (NH) listener sensitivity to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). In other studies (Laback and Majdak, 2008; Goupell et al., 2008), it has been shown that introducing binaural jitter improves ITD sensitivity at higher rates in bilateral cochlear implant (CI) listeners as well as in NH listeners. The results were interpreted in terms of a recovery from binaural adaptation. Sensorineural hearing impairment often results in reduced ITD sensitivity (e.g. Hawkins and Wightman, 1980). The present study investigates if a similar recovery from binaural adaptation, and thus an improvement in ITD sensitivity, can be achieved in hearing impaired listeners. 

    Method and Results:

    Bandpass-filtered clicks (4 kHz) with pulse rates of 400 and 600 pulses per second (pps) are used. Different amounts of jitter (the minimum representing the periodic condition) and different ITDs are tested. Listeners with a moderate cochlear hearing loss are selected. Additional stimuli tested are bandpass-filtered noise bands at 4 kHz and low-frequency stimuli at 500 Hz (sinusoids, SAMs, noise bands, and jittered pulse trains). The levels of the stimuli are adjusted in pretests to achieve a centered auditory image at a comfortable loudness.

    Data collected so far show improvements in ITD sensitivity in some individuals but not in others.

    Application:

    The results may lead to the design of a new hearing aid processing algorithm that attempts to improve ITD sensitivity.

    Funding:

    Internal

  • Objective and Methods:

    Spectral peaks and notches are important cues that normal hearing listeners use to localize sounds in the vertical planes (the front/back and up/down dimensions). This study investigates to what extent cochlear implant (CI) listeners are sensitive to spectral peaks and notches imposed upon a constant-loudness background. 

    Results:

    Listeners could always detect peaks, but not always notches. Increasing the bandwidth beyond two electrodes showed no improvement in thresholds. The high-frequency place was significantly worse than the low and middle places, although listeners showed highly individual tendencies. Thresholds decreased with an increase in the height of the peak. Thresholds for detecting a change in the frequency of a peak or notch were approximately one electrode. Level roving significantly increased thresholds. Thus, there is currently no indication that CI listeners can perform a "true" profile analysis. Future studies will explore whether adding temporal cues or roving the level in equal-loudness steps, instead of equal-current steps (as in the present study), is relevant for profile analysis.

    Application:

    Data on the sensitivity to spectral peaks and notches are required to encode spectral localization cues in future CI stimulation strategies. 

    Funding:

    FWF (Austrian Science Fund): Project #P18401-B15

    Publications:

    • Goupell, M., Laback, B., Majdak, P., and Baumgartner, W. D. (2008). Current-level discrimination and spectral profile analysis in multi-channel electrical stimulation, J. Acoust. Soc. Am. 124, 3142-57.
    • Goupell, M. J., Laback, B., Majdak, P., and Baumgartner, W-D. (2007). Sensitivity to spectral peaks and notches in cochlear implant listeners, presented at Conference on Implantable Auditory Prostheses (CIAP), Lake Tahoe.
  • The spatially oriented format for acoustics (SOFA) is dedicated to storing all kinds of acoustic information related to a specified geometrical setup. The main task is to describe simple HRTF measurements, but SOFA also aims to provide the functionality to store more complex measurements, such as BRIRs captured with a 64-channel microphone array in a multi-source excitation situation, or directivity measurements of a loudspeaker. The format is intended to be easily extendable, highly portable, and, at the moment of writing, the greatest common denominator of all publicly available HRTF databases.

    SOFA defines the structure of data and metadata and stores them in a numerical container. The data description is hierarchical, ranging from free-field HRTFs (simple setup) to more complex setups such as microphone-array measurements in reverberant spaces excited by a loudspeaker array. A global geometry description (related to the room) and a local geometry description (related to the listener/source) are used, without limiting the number of acoustic transmitters and receivers. Room descriptions will be available by linking a CAD file within SOFA. Networking support will also be provided, allowing remote access to HRTFs and BRIRs from client computers.
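    As an illustration of the data layout, the SimpleFreeFieldHRIR convention stores the impulse responses in an array Data.IR with dimensions [M, R, N]: M measurements (source positions), R receivers (two ears), and N samples per impulse response. The array sizes below are illustrative, not taken from a specific database.

```python
import numpy as np

# SOFA SimpleFreeFieldHRIR dimension convention:
#   M = number of measurements (source positions),
#   R = number of receivers (2 ears),
#   N = impulse-response length in samples.
M, R, N = 1550, 2, 256              # illustrative sizes for an HRTF set
data_ir = np.zeros((M, R, N))       # corresponds to the Data.IR variable
source_pos = np.zeros((M, 3))       # azimuth, elevation, radius per measurement

# Pulling out the HRIR pair for measurement m:
m = 0
hrir_left, hrir_right = data_ir[m, 0], data_ir[m, 1]
```

In practice the container is a netCDF-4 file, so standard netCDF/HDF5 tools can read these variables directly.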

    SOFA is being developed by many contributors worldwide. The development is coordinated at ARI by Piotr Majdak.

    Further information:

    www.sofaconventions.org.
  • Objective:

    Bilateral use of current cochlear implant (CI) systems allows for the localization of sound sources in the left-right dimension. However, localization in the front-back and up-down dimensions (within the so-called sagittal planes) is restricted as a result of insufficient transmission of the relevant information.

    Method:

    In normal hearing listeners, localization within the sagittal planes is mediated by the spectral coloring that the pinna (outer ear) imposes on incoming waveforms at higher frequencies. Current CI systems do not provide these so-called pinna cues (or spectral cues) because of the behind-the-ear microphone placement and the processor's limited analysis-frequency range.

    While these technical limitations are relatively manageable, some fundamental questions arise:

    • What is the minimum number of channels required to encode the pinna cues relevant to vertical plane localization?
    • To what extent can CI listeners learn to localize sound sources using pinna cues that are mapped to tonotopic regions associated with lower characteristic frequencies (according to the position of typically implanted electrodes)?
    • Which modifications of stimulation strategies are required to facilitate the localization of sound sources for CI listeners?

    Application:

    The improvement of sound source localization in the front-back dimension is regarded as an important aspect in daily traffic safety.

    Funding:

    FWF (Austrian Science Fund): Project #P18401-B15

    Status:

    Finished in Sept. 2010

    Subprojects:

    • ElecRang: Effects of upper-frequency boundary and spectral warping on speech intelligibility in electrical stimulation
    • SpecSens: Sensitivity to spectral peaks and notches
    • Loca-BtE-CI: Localization with behind-the-ear microphones
    • Loca Methods: Pointer method for localizing sound sources
    • Loca#Channels: Number of channels required for median place localization
    • SpatStrat: Development and evaluation of a spatialization strategy for cochlear implants
    • HRTF-Sim: Numerical simulation of HRTFs
  • Baumgartner et al. (2017a)

    Spatial hearing is important to monitor the environment for interesting or hazardous sounds and to selectively attend to them. The spatial separation between the two ears and the complex geometry of the human body provide auditory cues about the location of a sound source. Depending on where a sound is coming from, the pinna (or auricle) changes the sound spectrum before the sound reaches the eardrum. Since the shape of a pinna is highly individual (even more so than a fingerprint), it also affects the spectral cues in a very individual manner. In order to produce realistic auditory perception artificially, this individuality needs to be reflected as precisely as required, though the actual requirements are currently unclear. That is why SpExCue was about finding electrophysiological measures and prediction models of how spatially realistic ("externalized") a virtual sound source is perceived to be.

    Virtual and augmented reality (VR/AR) systems aim to immerse a listener into a well-externalized 3D auditory space. This requires a perceptually accurate simulation of the listener’s natural acoustic exposure. Particularly challenging is to appropriately represent the high-frequency spectral cues induced by the pinnae. To simplify this task, we aim at developing a phenomenological computational model for sound externalization with a particular focus on spectral cues. The model will be designed to predict the listener’s degree of externalization based on binaural input signals and the listener’s individual head-related transfer functions (HRTFs) under static listening conditions.

    The naturally externalized auditory perception can be disrupted, for instance, when listening via headphones or hearing-assistive devices, and instead sounds are heard inside the head. Because of this change in externalization or perceived distance, our investigations of spectral cues also served to study the phenomenon of auditory looming bias (Baumgartner et al., 2017 PNAS): sounds approaching the listener are perceived more intensely than those that are receding from the listener. Previous studies demonstrated auditory looming bias exclusively by loudness changes (increasing/decreasing loudness used to simulate approaching/receding sounds). Hence, it was not clear whether this bias truly reflects perceptual differences in sensitivity to motion direction rather than changes in loudness. Our spectral cue changes were perceived as either approaching or receding at steady loudness and evoked auditory looming bias both on a behavioral level (approaching sounds easier to recognize than receding sounds) and an electrophysiological level (larger neural activity in response to approaching sounds). Therefore, our study demonstrated that the bias is truly about perceived motion in distance, not loudness changes.

    Further, SpExCue investigated how the combination of different auditory spatial cues affects attentional control in a speech recognition task with simultaneous talkers, which requires spatial selective attention like in a cocktail party (Deng et al., in prep). We found that natural combinations of auditory spatial cues caused larger neural activity in preparation to the test signal and optimized the neural processing of the attended speech.

    SpExCue also compared different computational modeling approaches that aim to predict the effect of spectral cue changes on how spatially realistic a sound is perceived (Baumgartner et al., 2017 EAA-ASA). Although many previous experimental results could be predicted by at least one of the models, none of them alone could explain these results. In order to assist the future design of more general computational models for spatial hearing, we finally created a conceptual cognitive model for the formation of auditory space (Majdak et al., in press).

    Funding

    Erwin-Schrödinger Fellowship from Austrian Science Funds (FWF, J3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.

    Follow-up funding provided by Oculus VR, LLC, since March 2018. Project Investigator: Robert Baumgartner.

    Publications

    • Baumgartner, R., Reed, D.K., Tóth, B., Best, V., Majdak, P., Colburn H.S., Shinn-Cunningham B. (2017): Asymmetries in behavioral and neural responses to spectral cues demonstrate the generality of auditory looming bias, in: Proceedings of the National Academy of Sciences of the USA 114, 9743-9748. (article)
    • Baumgartner, R., Majdak, P., Colburn H.S., Shinn-Cunningham B. (2017): Modeling Sound Externalization Based on Listener-specific Spectral Cues, presented at: Acoustics ‘17 Boston: The 3rd Joint Meeting of the Acoustical Society of America and the European Acoustics Association. Boston, MA, USA. (conference)
    • Deng, Y., Choi, I., Shinn-Cunningham, B., Baumgartner, R. (2019): Impoverished auditory cues limit engagement of brain networks controlling spatial selective attention, in: Neuroimage 202, 116151. (article)
    • Baumgartner, R., Majdak, P. (2019): Predicting Externalization of Anechoic Sounds, in: Proceedings of ICA 2019. (proceedings)
    • Majdak, P., Baumgartner, R., Jenny, C. (2019): Formation of three-dimensional auditory space, in: arXiv:1901.03990 [q-bio]. (preprint)
  • Objectives:

    In the context of binaural virtual acoustics, a sound source is positioned in a free-field 3-D space around the listener by filtering it via head-related transfer functions (HRTFs). In a real-time application, numerous HRTFs need to be processed. The long impulse responses of the HRTFs require a high computational power, which is difficult to directly implement on current processors in situations involving more than a few simultaneous sources.

    Technically speaking, an HRTF is a linear time-invariant (LTI) system. An LTI system can be implemented in the time domain by direct convolution or recursive filtering, but this approach is computationally inefficient. A computationally efficient approach consists of implementing the system in the frequency domain; however, this approach is not suitable for real-time applications, since a very large delay is introduced. A compromise between both approaches is provided by the family of segmented-FFT methods, which permit a trade-off between latency and computational complexity. As an alternative, the sub-band method can be applied as a technique to represent linear systems in the time-frequency domain. Recent work has shown that the sub-band method offers an even better trade-off between latency and computational complexity than segmented-FFT methods. However, the sub-band analysis is still mathematically challenging, and its optimum configuration depends on the application under consideration.
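    The latency/complexity trade-off of the segmented-FFT family can be illustrated with a minimal overlap-add convolution sketch, in which output becomes available after one input block rather than after the entire signal has been received. Block and filter lengths are illustrative.

```python
import numpy as np

def overlap_add_convolve(x, h, block):
    """Segmented-FFT (overlap-add) convolution: the input is processed
    in blocks of length `block`, so the latency is one block instead of
    the whole signal, at the cost of one FFT/IFFT pair per block."""
    nfft = 1 << int(np.ceil(np.log2(block + len(h) - 1)))
    H = np.fft.rfft(h, nfft)                 # filter spectrum, computed once
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        y_seg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        n = min(nfft, len(y) - start)
        y[start:start + n] += y_seg[:n]      # overlap-add of block outputs
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
h = rng.standard_normal(256)                 # stand-in for an HRIR

y_blocked = overlap_add_convolve(x, h, block=128)  # ~128-sample latency
y_direct = np.convolve(x, h)                       # full-signal reference
err = np.max(np.abs(y_blocked - y_direct))
```

The blocked result matches direct convolution up to numerical error; shrinking `block` lowers latency but raises the per-sample FFT cost, which is the trade-off the sub-band method aims to improve upon.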

    Methods:

    TF-VA involves developing and investigating new techniques for configuring the sub-band method by using advanced optimization methods in a functional analysis context. As a result, an optimization technique that minimizes the computational complexity of the sub-band method will be obtained.

    Two approaches will be considered: The first approach designs the time-frequency transform for minimizing the complexity of each HRTF. In the second approach, we will design a unique time-frequency transform, which will be used for a joint implementation of all HRTFs of a listener. This will permit an efficient implementation of interpolation techniques while moving sources spatially in real-time. The results will be evaluated in subjective localization experiments and in terms of localization models.

    Status:

    • Main participator: Damian Marelli (University of Newcastle, Australia)
    • Co-applicants: Peter Balazs, Piotr Majdak
    • Project begin: November 2011
    • Funding: Lise-Meitner-Programm of the Austrian Science Fund (FWF) [M 1230-N13]