• PASS - Psychoacoustic Analysis of Rail-Traffic-Induced Noise Immissions

    The PASS project, carried out in cooperation with the IEW of TU Wien and psiacoustic GmbH, deals with the psychoacoustic evaluation of noise. Building on the results of the RELSKG project, high and low noise barriers are simulated numerically using the 2.5-dimensional boundary element method (2.5D BEM). Comparison with measurements shows that the assumption of an incoherent line source, as is possible with the 2.5D method, is required to reproduce the measurement results. In addition, rail web dampers are evaluated psychoacoustically on the basis of measurement data. The evaluation comprises two tests with 40 participants: the first compares relative annoyance, the second determines thresholds for "more annoying" and "less annoying". The results showed that, at equal A-weighted level, freight trains are rated as less annoying than passenger trains, and that noise behind a noise barrier is perceived as slightly more annoying. The project started in 2013 and runs until the end of 2014.

  • Perception of Interaural Intensity Differences by Cochlear Implant Listeners (IID-CI)


    This project investigated the perception of interaural intensity differences among cochlear implant (CI) listeners in relation to the spectral composition and the temporal structure of the signal.


    The perception thresholds (just noticeable differences, JND) of CI listeners were examined using differently structured signals. The stimuli were applied directly to the clinical signal processing units, while the parameters of the ongoing stimulation were closely monitored.


    JNDs of IIDs in CI listeners ranged from 1.5 to 2.5 dB at a detection level of 80 percent. The type of stimulus had little effect on detection performance, with the exception of a single signal type: a pulse train with a frequency of 20 Hz. This means that the JNDs of CI listeners are only marginally higher than those of normal-hearing listeners. CI users are thus sensitive to IIDs, and their JNDs correspond to differences in arrival angle of 5-10 degrees. Since the JNDs are within the minimal level step sizes at which the CI system transfers amplitudes, reducing the step size in future systems seems advisable.


    • Laback, B., Pok, S. M., Baumgartner, W. D., Deutsch, W. A., and Schmid, K. (2004). “Sensitivity to interaural level and envelope time differences of two bilateral cochlear implant listeners using clinical sound processors,” Ear and Hearing 25, 5, 488-500.
  • Perception of Interaural Time Differences (ITD)

    Objective and Methods:

    This project cluster includes several studies on the perception of interaural time differences (ITD) in cochlear implant (CI), hearing impaired (HI), and normal hearing (NH) listeners. Studying different groups of listeners allows for identification of the factors that are most important to ITD perception. Furthermore, the comparison between the groups allows for the development of strategies to improve ITD sensitivity in CI and HI listeners.


    • FsGd: Effects of ITD in Ongoing, Onset, and Offset in Cochlear Implant Listeners
    • ITD Sync: Effects of interaural time difference in fine structure and envelope on lateral discrimination in electric hearing
    • ITD Jitter CI: Recovery from binaural adaptation with cochlear implants
    • ITD Jitter NH: Recovery from binaural adaptation in normal hearing
    • ITD Jitter HI: Recovery from binaural adaptation with sensorineural hearing impairment
    • ITD CF: Effect of center frequency and rate on the sensitivity to interaural delay in high-frequency click trains
    • IID-CI: Perception of Interaural Intensity Differences by Cochlear Implant Listeners


  • Perfect Reconstruction Overlap Add Method (PROLA)


    In signal processing, synthesis is as important as analysis, especially when data are modified. For the Short-Time Fourier Transform, synthesis is often done using simple overlap-add (OLA), i.e., summing the outputs of the analysis filters; sometimes the output is re-weighted with the analysis window, as in the phase vocoder. It is often presumed that standard windows will give satisfactory results.

    Gabor frame theory provides a well-known construction of synthesis windows that guarantees perfect reconstruction. However, this method is rarely used in signal processing algorithms.


    In this project, we will systematically investigate if, and for which parameters, OLA synthesis with the original window gives good reconstruction. We will compare it to reconstruction with the dual window, which we introduce and motivate as perfect reconstruction overlap-add (PROLA). We will show that this method is always preferable and that it can be calculated very efficiently.
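    The difference can be illustrated with a minimal numpy sketch (a toy under our own assumptions, not the STx implementation): in the "painless" case, the canonical dual window is obtained by dividing the analysis window by the hop-shifted sum of its squares, and OLA synthesis with this dual window reconstructs the signal exactly, while re-using the analysis window generally does not.

```python
import numpy as np

def ola_resynthesis(x, win, hop, synth_win):
    """Chop x into hop-spaced frames weighted by `win`, then overlap-add
    the frames weighted by `synth_win`. The FFT/IFFT pair is omitted,
    since it cancels when the coefficients are not modified."""
    L = len(win)
    y = np.zeros(len(x))
    for start in range(0, len(x) - L + 1, hop):
        frame = x[start:start + L] * win            # analysis
        y[start:start + L] += frame * synth_win     # synthesis
    return y

def canonical_dual(win, hop):
    """Canonical dual window in the painless case: the analysis window
    divided by the hop-shifted sum of its squared translates."""
    L = len(win)
    denom = np.zeros(L)
    for n in range(L):
        # sum win(n - k*hop)^2 over all shifts inside the support
        for m in range(n % hop, L, hop):
            denom[n] += win[m] ** 2
    return win / denom
```

    With a Hann window at 75% overlap, synthesis with the analysis window itself scales the interior of the signal by roughly 1.5, whereas the dual window reconstructs it to machine precision.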


    This is currently being implemented in STx, where the phase vocoder will offer the option of guaranteed perfect reconstruction, using either dual or tight windows.


    Department of Mathematics, University of Wisconsin-Eau Claire

  • Phonetics and Phonology of the Viennese Dialect


    As is customary for urban varieties, the varieties of Vienna are predominantly social varieties. Education and social background form the primary factors which define the language behaviour of the speakers.

    The Viennese dialect belongs to the Middle Bavarian dialect group. Around the turn of the century, a sound change arose which monophthongized the diphthongs /aɛ/ and /ɑɔ/ to /æ:/ and /ɒ:/, respectively. This sound change was completed around 1950. As a result of the Viennese monophthongization, the palatal constriction location became overloaded. As early as the 1930s, Kranzmayer observed what he called the "e-confusion": speakers stopped distinguishing the /e/-vowels, so that "Segen" (blessing) and "sehen" (to see) became homophones: [se:ŋ].


    5 female and 5 male speakers of the Viennese dialect were asked to name pictures, to read sentences, and to speak spontaneously.


    As a consequence of the Viennese monophthongization and the resulting overcrowding of the palatal constriction location, speakers of the Viennese dialect developed two strategies. One group, as Kranzmayer observed, neutralized /e/ and /ɛ/ to /e/. This neutralization made room for the new palatal vowel /æ/.

    The other group, however, preserved /e/ and /ɛ/ but sometimes applied the two vowels incorrectly, i.e., produced /ɛ/ instead of /e/ and vice versa. Since no neutralization took place, the vowel /i/ was shifted to the pre-palatal constriction location. This shift created room on the palatal bar for the new vowel /æ/.

    Group I, consequently, discerns the following vowels:
    • palatal: /i:, i, e:, e, æ:/
    • velar: /u:, u/
    • uvular: /o:, o, ɔ:, ɔ/
    • pharyngeal: /ɑ:, ɑ, ɒ:/

    Group II discerns the vowels as follows:

    • pre-palatal: /i:, i/
    • palatal: /e:, e, ɛ:, ɛ, æ:/
    • velar: /u:, u/
    • uvular: /o:, o, ɔ:, ɔ/
    • pharyngeal: /ɑ:, ɑ, ɒ:/

    Lip rounding and duration are distinctive in both vowel systems.

  • Pitch Versus Timbre


    Pitch and timbre are closely interrelated; both determine the perception of complex tones, and both vary in realistic signals. In diphthongs in particular, pitch and timbre changes occur simultaneously and continuously.


    Slow (e.g., 0.5/s) triangular frequency modulation (range: one octave) of a harmonic sound with a fundamental frequency of 220 Hz produces a specific pitch phenomenon. If one of the resolved partials is accentuated by a sharp onset, this partial gives rise to a temporary spectral pitch according to its position on the frequency continuum. At the same time, the pitch movement of the complex tone continues. After a short transition period of approximately 100 ms, the partial loses its accentuated spectral pitch and is completely integrated into the timbre and pitch movement of the complex sound.
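    A stimulus of this kind can be sketched as follows (an illustrative reconstruction with numpy; the function name, parameter choices, and the way the accent is implemented are our assumptions, not the original experiment code):

```python
import numpy as np

def fm_complex_with_accent(fs=44100, dur=4.0, f0=220.0, n_harm=8,
                           mod_rate=0.5, accent_harm=4, accent_time=2.0):
    """Harmonic complex whose fundamental sweeps triangularly over one
    octave (f0..2*f0) at `mod_rate` sweeps/s; one partial receives a
    sharp onset (sudden level boost) at `accent_time`, which should
    evoke a transient spectral pitch."""
    t = np.arange(int(fs * dur)) / fs
    # triangular modulation of the fundamental between f0 and 2*f0
    tri = 1.0 + np.abs(((t * mod_rate * 2) % 2) - 1)
    inst_f0 = f0 * tri
    phase0 = 2 * np.pi * np.cumsum(inst_f0) / fs
    x = np.zeros_like(t)
    for h in range(1, n_harm + 1):
        amp = np.full_like(t, 1.0 / h)          # 1/h spectral rolloff
        if h == accent_harm:
            amp[t >= accent_time] *= 4.0        # sharp onset accent
        x += amp * np.sin(h * phase0)
    return x / np.max(np.abs(x))
```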


    The purpose of the present pilot study was to explore starting points for the determination and explanation of a new pitch-glide transition and pitch-ambiguity effect. The effect occurs when the continuously varying pitch percept of a complex tone is interrupted by onset transients of harmonic partials emerging in successive order, each followed by a momentarily dominating spectral pitch of the corresponding harmonic. Immediately after the spectral pitch appears, in competition with the pitch of the complex tone, the latter is re-established by integrating the harmonic into the timbre in a smoothly gliding manner.


    PACS: 43.66.Hg; Pitch perception.

  • Pole-Zero Model Estimation for Speech Analysis


    The identification of the parameters of the vocal tract system can be used for speaker identification.


    A preferred speech coding technique is the so-called Model-Based Speech Coding (MBSC), which involves modeling the vocal tract as a linear time-variant system (synthesis filter). The system's input is either white noise or a train of impulses. For coding purposes, the synthesis filter is assumed to be time-invariant during a short time interval (time slot) of typically 10-20 msec. Then, the signal is represented by the coefficients of the synthesis filter corresponding to each time slot.

    A successful MBSC method is the so-called Linear Prediction Coding (LPC). Roughly speaking, the LPC technique models the synthesis filter as an all-pole linear system. This all-pole linear system has coefficients obtained by adapting a predictor of the output signal, based on its own previous samples. The use of an all-pole model provides a good representation for the majority of speech sounds. However, the representation of nasal sounds, fricative sounds, and stop consonants requires the use of a zero-pole model. Also, the LPC technique is not adequate when the voice signal is corrupted by noise.
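    The all-pole estimation described above can be sketched with the classical autocorrelation method (a minimal numpy illustration of LPC, not the coding scheme proposed in this project; function names are ours):

```python
import numpy as np

def lpc_coefficients(x, order):
    """All-pole (LPC) coefficients via the autocorrelation method:
    solve the normal equations R a = r, where R is the Toeplitz
    autocorrelation matrix. The synthesis filter is 1/A(z) with
    A(z) = 1 - sum_k a_k z^{-k}."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def lpc_residual(x, a):
    """Prediction error e(n) = x(n) - sum_k a_k x(n-k); driving the
    all-pole synthesis filter with e reproduces x."""
    e = x.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * x[:-k]
    return e
```

    For voiced speech, the residual approximates the impulse-train excitation; for unvoiced speech, it approximates white noise, which is exactly the source model described above.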

    We propose a method for estimating a zero-pole model that provides the optimal synthesis filter coefficients. The method is numerically efficient and optimal with respect to a logarithmic criterion.


    In order to evaluate the perceptual relevance of the proposed method, we used the model estimated from a speech signal to re-synthesize it:

    Re-Synthesized Sound

    Original Sound


  • POTION: Perceptual Optimization of Audio Time-Frequency Representations and Coding.

    French-Austrian bilateral research project funded by the French National Agency of Research (ANR) and the Austrian Science Fund (FWF, project no. I 1362-N30). The project involves two academic partners, namely the Laboratory of Mechanics and Acoustics (LMA - CNRS UPR 7051, France) and the Acoustics Research Institute. At the ARI, two research groups are involved in the project: the Mathematics and Signal Processing in Acoustics and the Psychoacoustics and Experimental Audiology groups.

    Principal investigators: Thibaud Necciari (ARI), Piotr Majdak (ARI) and Olivier Derrien (LMA).

    Running period: 2014-2017 (project started on March 1, 2014).


    One of the greatest challenges in signal processing is to develop efficient signal representations. An efficient representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of binary data required for storage or transmission, it is desirable that the representation take human auditory perception into account and allow reconstruction with a controlled amount of perceived distortion. Over the last decades, many psychoacoustical studies investigated auditory masking, an important property of auditory perception. Masking refers to the degradation of the detection threshold of a sound in the presence of another sound. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency (t-f) masking effects in perceptual audio codecs. We recently conducted psychoacoustical studies on t-f masking that revealed the inaccuracy of such simple models. These new data on t-f masking represent a crucial basis for accounting for masking effects in t-f representations of sounds. Although t-f representations are standard tools in audio processing, the development of a t-f representation of audio signals that is mathematically founded, perception-based, perfectly invertible, and possibly minimally redundant remains a challenge. POTION thus addresses the following questions:

    1. To what extent is it possible to obtain a perception-based (i.e., as close as possible to “what we see is what we hear”), perfectly invertible, and possibly minimally redundant t-f representation of sound signals? Such a representation is essential for modeling complex masking interactions in the t-f domain and is expected to improve our understanding of auditory processing of real-world sounds. Moreover, it is of fundamental interest for many audio applications involving sound analysis-synthesis.
    2. Is it possible to improve current perceptual audio codecs by considering a joint t-f approach? To reduce the size of digital audio files, perceptual audio codecs like MP3 decompose sounds into variable-length time segments, apply a frequency transform, and use masking models to control the sub-quantization of transform coefficients within each segment. Thus, current codecs follow mainly a spectral approach, although temporal masking effects are taken into account in some implementations. By combining an efficient perception-based t-f transform with a joint t-f masking model in an audio codec, we expect to achieve significant performance improvements.

    Working program:

    POTION is structured in three main tasks:

    1. Perception-based t-f representation of audio signals with perfect reconstruction: A linear and perfectly invertible t-f representation will be created, exploiting the recently developed non-stationary Gabor theory as mathematical background. The transform will be designed so that its t-f resolution mimics the t-f analysis properties of the auditory system, while introducing as little redundancy as possible to maximize coding efficiency.
    2. Development and implementation of a t-f masking model: Based on psychoacoustical data on t-f masking collected by the partners in previous projects and on literature data, a new, complex model of t-f masking will be developed and implemented in the computationally efficient representation built in task 1. Additional psychoacoustical data required for the development of the model, involving frequency, level, and duration effects in masking for single and multiple maskers, will be collected. The resulting signal processing algorithm should represent and re-synthesize only the perceptually relevant components of the signal. It will be calibrated and validated in listening tests with synthetic and real-world sounds.
    3. Optimization of perceptual audio codecs: This task represents the main application of POTION. It will combine the efficient representation built in task 1 with the t-f masking model built in task 2 for implementation in a perceptual audio codec.

    More information on the project can be found on the POTION web page.


    • Chardon, G., Necciari, Th., Balazs, P. (2014): Perceptual matching pursuit with Gabor dictionaries and time-frequency masking, in: Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), Florence, Italy, 3126-3130.


  • Practical Time Frequency Analysis


    Numerous implementations and algorithms for time-frequency analysis can be found in the literature or on the internet. Most of them are either poorly documented or no longer maintained. P. Soendergaard therefore started to develop the Linear Time Frequency Toolbox for MATLAB. The goal of this project is to find typical uses of this toolbox in acoustic applications, and to incorporate successful, not-yet-implemented algorithms into STx.


    The linear time-frequency toolbox is a small open-source Matlab toolbox with functions for working with Gabor frames for finite sequences. It includes 1D Discrete Gabor Transform (sampled STFT) with inverse. It works with full-length windows and short windows. It computes the canonical dual and canonical tight windows.


    These algorithms are used for acoustic applications such as formant analysis, data compression, and de-noising. The implementations are compared to those in STx and will be incorporated into that software package where they improve its performance.


    • H. G. Feichtinger et al., NuHAG, Faculty of Mathematics, University of Vienna
    • B. Torrèsani, Groupe de Traitement du Signal, Laboratoire d'Analyse Topologie et Probabilités, LATP/ CMI, Université de Provence, Marseille
    • P. Soendergaard, Department of Mathematics, Technical University of Denmark
  • Principal Component Analysis (PCA) for the Estimation of the Acoustic Far-Field Level


    If measurements are possible only at the hull of a machine, a tool is needed to separate the dominating near-field components from the far-field components. This, in turn, allows the far-field levels to be estimated. The separation is often not possible using spectral methods, because both components have nearly the same frequency. Using a limited number of microphones, a modal separation is also impossible. Instead of a modal analysis, a principal component analysis is applied.


    The narrow-band Fourier transform method is used, and a separate analysis is conducted for each frequency. The cross-power matrix spanning all microphone positions is used. The components are then calculated using the PCA. As long as the modes at the microphone positions have different relative values, PCA can be used to separate them. In an initial test, the far field is observed and the transfer function for every component from the near field to the far field is estimated. These transfer functions are assumed to be constant in time. They are used for the estimation of the overall far-field level.
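    The per-frequency separation step can be sketched as follows (a minimal numpy illustration under the assumptions stated above; function and variable names are ours, not the project's code):

```python
import numpy as np

def principal_components(snapshots):
    """snapshots: complex Fourier coefficients at ONE frequency bin,
    shape (n_snapshots, n_mics). Builds the cross-power matrix
    R = E[x x^H] over the microphone positions and returns its
    eigenvalues (descending; the component powers) and eigenvectors
    (columns; the component shapes at the microphones)."""
    R = snapshots.conj().T @ snapshots / len(snapshots)
    w, V = np.linalg.eigh(R)            # ascending order for Hermitian R
    order = np.argsort(w)[::-1]
    return w[order].real, V[:, order]
```

    Separation succeeds when the components have linearly independent patterns across the microphones, which is the code-level counterpart of the requirement that the modes have different relative values at the microphone positions.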


    Observation of the far-field level of machines.

  • QWeight

    Reweighting of Binaural Cues: Generalizability and Applications in Cochlear Implant Listening

    Normal-hearing (NH) listeners use two binaural cues, the interaural time difference (ITD) and the interaural level difference (ILD), for sound localization in the horizontal plane. They apply frequency-dependent weights when combining them to determine the perceived azimuth of a sound source. Cochlear implant (CI) listeners, however, rely almost entirely on ILDs. This is partly due to the properties of current envelope-based CI-systems, which do not explicitly encode carrier ITDs. However, even if they are artificially conveyed via a research system, CI listeners perform worse on average than NH listeners. Since current CI-systems do not reliably convey ITD information, CI listeners might learn to ignore ITDs and focus on ILDs instead. A recent study in our lab provided first evidence that such reweighting of binaural cues is possible in NH listeners.

    This project aims to further investigate this phenomenon. First, we will test whether a changed ITD/ILD weighting generalizes to different frequency regions. Second, we will investigate the effect of ITD/ILD reweighting on spatial release from speech-on-speech masking, as listeners benefit particularly from ITDs in such tasks. Third, we will test whether CI listeners can also be trained to weight ITDs more strongly and whether this translates into an increase in ITD sensitivity. Additionally, we will explore and evaluate different training methods to induce ITD/ILD reweighting.

    The results are expected to shed further light on the plasticity of the binaural auditory system in acoustic and electric hearing.

    Start: October 2018

    Duration: 3 years

    Funding: uni:docs fellowship program for doctoral candidates of the University of Vienna

  • RAARA - Residential Area Augmented Reality Acoustics


    We gratefully acknowledge funding by the Austrian Research Promotion Agency (FFG), project number 873588. Noise means annoyance. Besides traffic and industry, it is emitted above all by heating and cooling devices: air-source heat pumps, re-coolers, and fans. To minimize the noise immissions on the population in urban areas, this project develops methods that enable a simple, intuitive, and at the same time accurate handling of sound emissions and their mitigation.



    The goal is to place noise sources virtually ON SITE in their real environment before installation, using augmented reality, to visualize their sound emissions with a color coding, and to make them audible. Obstacles and sound-insulation measures such as walls, fences, and masonry are detected automatically or can be added virtually. Achieving these goals requires extensive development of methods for efficient acoustic computation: frequency- and time-dependent behavior, absorption, and reflection. This unique approach simplifies the planning of renewable heating and cooling devices, increases their acceptance and thus the share of renewable energy, and lowers noise levels in cities.






  • Recovery from Binaural Adaptation in Cochlear Implant Listeners (ITD Jitter CI)


    The sensitivity of normal hearing listeners to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). Cochlear implant (CI) listeners show a similar limitation in ITD sensitivity with respect to the rate of unmodulated pulse trains containing ITD. Unfortunately, such high rates are needed to appropriately sample the modulation information of the acoustic signal. This study tests the ideas that (1) similar "binaural adaptation" mechanisms are limiting the performance in both subject groups, (2) the effect is related to the periodicity of pulse trains, and (3) introducing jitter (randomness) into the pulse timing causes a recovery from binaural adaptation and thus improves ITD sensitivity at higher pulse rates.

    Method and Results:

    These ideas have been studied by testing the ITD sensitivity of five CI listeners. The parameters pulse rate, amount of jitter (where the minimum represents the periodic condition), and ITD were varied. We showed that introducing binaurally synchronized jitter into the stimulation timing causes large improvements in ITD sensitivity at higher pulse rates (≥ 800 pps). Our experimental results demonstrate that a purely temporal trigger can cause recovery from binaural adaptation.
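    The stimulus manipulation can be sketched as follows (an illustrative numpy sketch; the exact jitter parameterization used in the published studies may differ, so the uniform inter-pulse-interval scheme below is our assumption):

```python
import numpy as np

def jittered_pulse_times(rate, dur, jitter, rng):
    """Pulse times with nominal rate `rate` (pps) and jitter parameter
    0 <= jitter < 1: each inter-pulse interval is drawn uniformly from
    [(1-jitter)/rate, (1+jitter)/rate]. jitter=0 gives the periodic
    condition."""
    T = 1.0 / rate
    times = []
    t = 0.0
    while t < dur:
        times.append(t)
        t += T * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    return np.array(times)

def binaural_pair(rate, dur, jitter, itd, seed=0):
    """Binaurally SYNCHRONIZED jitter: the same jittered timing in both
    ears, with the right ear delayed by `itd` seconds."""
    left = jittered_pulse_times(rate, dur, jitter,
                                np.random.default_rng(seed))
    return left, left + itd
```

    Using identical jitter in both ears is the crucial design choice: the temporal irregularity is shared binaurally, so the ITD itself stays constant while the periodicity of the pulse train is destroyed.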


    Applying binaural jitter in stimulation strategies may improve several aspects of binaural hearing in bilateral recipients of CIs, including the localization of sound sources and speech segregation in noise.




    • Laback, B., and Majdak, P. (2007). Binaural jitter improves interaural time-difference sensitivity of cochlear implantees at high pulse rates, Proc Natl Acad Sci USA (PNAS) 105, 2, 814-817.
    • Laback, B., and Majdak, P. (2008). Reply to van Hoesel: Binaural jitter with cochlear implants, improved interaural time-delay sensitivity, and normal hearing, letter to Proc Natl Acad Sci USA 12, 105, 32.
    • Laback, B., and Majdak, P. (2007). Binaural stimulation in neural auditory prostheses or hearing aids, provisional US und EP patent application (submitted 20.06.07).
  • Recovery from Binaural Adaptation in Normal Hearing Listeners (ITD Jitter NH)


    The sensitivity of normal hearing (NH) listeners to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). In another study (Laback and Majdak, 2008), it was hypothesized that introducing binaural jitter may improve ITD sensitivity in bilateral cochlear implant (CI) listeners by avoiding periodicity. Indeed, the results showed large improvements at high rates (≥ 800 pps). This was interpreted as an indication for a recovery from binaural adaptation. 

    In this study, we further investigated this effect using NH subjects. We attempted to understand the underlying mechanisms by applying a well-established model of peripheral auditory processing. 

    Method and Results:

    Bandpass-filtered clicks (centered at 4 kHz) were used at a nominal pulse rate of 600 pulses per second (pps). It was found that randomly jittering the timing of the pulses significantly increases the detectability of the ITD. A second experiment examined the effects of place and rate for pulse trains. ITD sensitivity for jittered pulse trains at 1200 pps was significantly higher than for periodic pulse trains at 600 pps; thus, with the addition of jitter, listeners were not merely benefiting from the longest interpulse intervals and instances of reduced rate. A third experiment, using a 900 pps pulse train, confirmed the improvement in ITD sensitivity even when random amplitude modulation, a side effect of large amounts of jitter, was ruled out. A model of peripheral auditory processing up to the brain stem (cochlear nucleus) was applied to study the mechanisms underlying the improvements in ITD sensitivity. It was found that the irregular timing of the jittered pulses increases the synchrony of firing at the cochlear nucleus. These results suggest that the recovery from binaural adaptation triggered by temporal irregularity possibly occurs at the level of the cochlear nucleus.


    Together with the results of Laback and Majdak (2008) on the effect of binaural jitter in CI listeners, these results suggest that the binaural adaptation effect first observed by Hafter and Dye (1983) is related to the synchrony of neural firings across auditory nerve fibers. The nerve fibers, in turn, innervate cochlear nucleus cells. At higher rates, periodic pulse trains result in little synchrony of the response to the ongoing signal. Jittering the pulse timing increases the probability of synchronous firing across AN fibers at certain instances of time. Further studies are required to determine if other aspects of binaural adaptation can also be attributed to this explanation. 




    • Goupell, M. J., Laback, B., Majdak, P. (2009): Enhancing sensitivity to interaural time differences at high modulation rates by introducing temporal jitter, in: J. Acoust. Soc. Am. 126, 2511-2521.
    • Laback, B., and Majdak, P. (2007): Binaural jitter improves interaural time-difference sensitivity of cochlear implantees at high pulse rates, in: Proc. Natl. Acad. Sci. USA (PNAS) 105, 2, 814-817.
    • Laback, B., and Majdak, P. (2008): Reply to van Hoesel: Binaural jitter with cochlear implants, improved interaural time-delay sensitivity, and normal hearing, letter to Proc. Natl. Acad. Sci. USA 12, 105, 32.
  • Recovery from Binaural Adaptation in Sensorineural Hearing Impairment (ITD Jitter HI)


    Normal hearing (NH) listener sensitivity to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). In other studies (Laback and Majdak, 2008; Goupell et al., 2008), it has been shown that introducing binaural jitter improves ITD sensitivity at higher rates in bilateral cochlear implant (CI) listeners as well as in NH listeners. The results were interpreted in terms of a recovery from binaural adaptation. Sensorineural hearing impairment often results in reduced ITD sensitivity (e.g. Hawkins and Wightman, 1980). The present study investigates if a similar recovery from binaural adaptation, and thus an improvement in ITD sensitivity, can be achieved in hearing impaired listeners. 

    Method and Results:

    Bandpass-filtered clicks (4 kHz) with pulse rates of 400 and 600 pulses per second (pps) are used. Different amounts of jitter (the minimum representing the periodic condition) and different ITDs are tested. Listeners with a moderate cochlear hearing loss are selected. Additional stimuli tested are bandpass-filtered noise bands at 4 kHz and low-frequency stimuli at 500 Hz (sinusoids, SAMs, noise bands, and jittered pulse trains). The levels of the stimuli are adjusted in pretests to achieve a centered auditory image at a comfortable loudness.

    Data collected so far show improvements in ITD sensitivity in some individuals but not in others.


    The results may lead to the design of a new hearing aid processing algorithm that attempts to improve ITD sensitivity.



  • Regular and Irregular Gabor Multiplier with Applications in Psychoacoustic Masking

    This project consists of three subprojects:

    1.1 Frame & Gabor Multiplier:

    Recently, Gabor multipliers have been used to implement time-variant filtering as Gabor filters. This idea can be generalized further. To investigate the basic properties of such operators, the concept of abstract, i.e., unstructured, frames is used. Such multipliers are operators in which a fixed mask, a so-called symbol, is applied to the coefficients of a frame analysis, after which synthesis is performed. The properties found for this general case can then be used for all kinds of frames, for example regular and irregular Gabor frames, wavelet frames, or auditory filterbanks.
    The basic definition of a frame multiplier, for frames (φ_k) and (ψ_k) and symbol m, is M f = Σ_k m_k ⟨f, ψ_k⟩ φ_k.
    As a special case of such multipliers, operators for irregular Gabor systems will be investigated and implemented. This corresponds to an irregularly sampled Short-Time Fourier Transform. As an application, an STFT corresponding to the Bark scale can be examined.
    This mathematical, basic-research-oriented project is important for many other projects, such as time-frequency masking or system identification.
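    In the regular, discrete setting, a Gabor multiplier can be sketched with a standard STFT (a toy illustration using scipy; the project itself also targets irregular Gabor systems, which a regular STFT does not cover):

```python
import numpy as np
from scipy.signal import stft, istft

def gabor_multiplier(x, mask, fs, nperseg=256, noverlap=192):
    """Time-variant filtering as a Gabor multiplier: frame analysis
    (STFT), pointwise multiplication of the coefficients by a fixed
    symbol (`mask`, same shape as the STFT coefficient grid), then
    synthesis (inverse STFT)."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    _, y = istft(X * mask, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return y[:len(x)]
```

    With the all-ones symbol the operator reduces to plain analysis-synthesis and reproduces the input; a symbol that is zero above some bin acts as a (time-invariant) low-pass, and a time-varying symbol gives a time-variant filter.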


    • O. Christensen, An Introduction To Frames And Riesz Bases, Birkhäuser Boston (2003)
    • M. Dörfler, Gabor Analysis for a Class of Signals called Music, Dissertation Univ. Wien (2002)
    • R.J. Duffin, A.C. Schaeffer, A Class of nonharmonic Fourier series, Trans.Amer.Math.Soc., vol.72, pp. 341-366 (1952)
    • H. G. Feichtinger, K. Nowak, A First Survey of Gabor Multipliers, in H. G. Feichtinger, T. Strohmer



  • RELSKG: Development of a computational method for noise barriers with a complex geometry


    Standard noise mapping software uses geometrical approaches to determine the insertion loss of a noise barrier. These methods are not well suited for evaluating complex geometries, e.g., curved noise barriers or noise barriers with multiple diffracting edges. Here, we aim at deriving frequency-dependent as well as source- and receiver-position-dependent adjustments using the boundary element method. Further, the effect of absorbing layers will be investigated as a function of the geometry. The results will be incorporated into a standard noise mapping software.


    The cross-sections of different geometries are first parameterized and discretized and then evaluated using two-dimensional boundary element simulations. The BEM code was developed at our institute. Different parameter sets are evaluated in order to derive the adjustments for the specific geometries compared to a straight noise barrier. To make the simulations more realistic, a grassland impedance model is used instead of a fully reflecting half plane. Simulations will also be evaluated using measurements from actual noise barriers.

    Figure: Effect of a T-shaped noise barrier ("T-Wand") at 800 Hz

    Project partners:

    • TAS Schreiner (measurements)
    • Soundplan (implementation in sound mapping software)


    This project is funded through the VIF2011 call of the FFG (BMVIT, ASFINAG, ÖBB).

  • Sensitivity to Spectral Peaks and Notches (SpecSens)

    Objective and Methods:

    Spectral peaks and notches are important cues that normal hearing listeners use to localize sounds in the vertical planes (the front/back and up/down dimensions). This study investigates to what extent cochlear implant (CI) listeners are sensitive to spectral peaks and notches imposed upon a constant-loudness background. 


    Listeners could always detect peaks, but not always notches. Increasing the bandwidth beyond two electrodes showed no improvement in thresholds. Thresholds at the high-frequency place were significantly worse than at the low- and middle-frequency places, although listeners showed highly individual tendencies. Thresholds decreased with increasing peak height. Thresholds for detecting a change in the frequency of a peak or notch were approximately one electrode. Level roving significantly increased thresholds. Thus, there is currently no indication that CI listeners can perform a "true" profile analysis. Future studies will explore whether adding temporal cues, or roving the level in equal-loudness steps instead of equal-current steps (as in the present study), is relevant for profile analysis.
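    The stimulus construction can be illustrated schematically: a flat background level across electrodes, a peak (or notch) imposed on one electrode, and an overall level rove to discourage listeners from using absolute level as a cue. This is a hypothetical sketch of the stimulus logic only, not the clinical stimulation code; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def peak_profile(n_electrodes, peak_electrode, peak_height_db,
                 base_level_db, rove_db=0.0):
    """Electrode-level profile: flat background with a spectral peak on one
    electrode; a single random offset roves the overall level (a stand-in
    for the equal-current-step roving used in the study)."""
    levels = np.full(n_electrodes, base_level_db, dtype=float)
    levels[peak_electrode] += peak_height_db       # impose the peak
    levels += rng.uniform(-rove_db, rove_db)       # overall level rove
    return levels
```

    A notch stimulus would use a negative peak height; the listener's task is to detect the peak or notch despite the roved overall level.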


    Data on the sensitivity to spectral peaks and notches are required to encode spectral localization cues in future CI stimulation strategies. 


    FWF (Austrian Science Fund): Project #P18401-B15


    • Goupell, M., Laback, B., Majdak, P., and Baumgartner, W. D. (2008). Current-level discrimination and spectral profile analysis in multi-channel electrical stimulation, J. Acoust. Soc. Am. 124, 3142-57.
    • Goupell, M. J., Laback, B., Majdak, P., and Baumgartner, W-D. (2007). Sensitivity to spectral peaks and notches in cochlear implant listeners, presented at Conference on Implantable Auditory Prostheses (CIAP), Lake Tahoe.
  • SOFA: Spatially Oriented Format for Acoustics

    The Spatially Oriented Format for Acoustics (SOFA) is dedicated to storing all kinds of acoustic information related to a specified geometrical setup. The main task is to describe simple HRTF measurements, but SOFA also aims to provide the functionality to store more complex measurements, such as BRIRs recorded with a 64-channel microphone array in a multi-source excitation situation, or directivity measurements of a loudspeaker. The format is intended to be easily extendable, highly portable, and, at the time of writing, the greatest common denominator of all publicly available HRTF databases.

    SOFA defines the structure of data and metadata and stores them in a numerical container. The data description is hierarchical, ranging from free-field HRTFs (simple setup) to more complex setups such as microphone-array measurements in reverberant spaces excited by a loudspeaker array (complex setup). We use a global geometry description (related to the room) and a local geometry description (related to the listener/source), without limiting the number of acoustic transmitters and receivers. Room descriptions will be available by linking a CAD file within SOFA. Networking support will be provided as well, allowing remote access to HRTFs and BRIRs from client computers.
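    For the simple HRTF case, SOFA's data layout stores the impulse responses in an array with dimensions M (measurements, i.e. source directions), R (receivers, i.e. the two ears), and N (samples), together with one source position per measurement. The following is a minimal in-memory stand-in using plain NumPy (the real format uses a netCDF container; the dictionary keys mirror SOFA naming but the helper itself is illustrative):

```python
import numpy as np

def make_hrir_container(irs, source_positions, fs):
    """Minimal stand-in for a SOFA-style HRIR record.
    irs: array of shape (M, R, N) -- M directions, R=2 ears, N samples.
    source_positions: (M, 3) coordinates (azimuth, elevation, radius).
    """
    irs = np.asarray(irs, dtype=float)
    source_positions = np.asarray(source_positions, dtype=float)
    M, R, N = irs.shape
    assert source_positions.shape == (M, 3)
    return {
        "Data.IR": irs,
        "Data.SamplingRate": fs,
        "SourcePosition": source_positions,
        "API": {"M": M, "R": R, "N": N},
    }
```

    More complex setups (array measurements, multiple emitters) extend this by enlarging R and adding emitter geometry, rather than by changing the basic structure.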

    SOFA is being developed by many contributors worldwide. The development is coordinated at ARI by Piotr Majdak.

    Further information:
  • softpinna: Non-Rigid Registration for the Calculation of HRTFs

    Millions of people use headphones every day for listening to music, watching movies, or communicating with others. Nevertheless, sounds presented via headphones are usually perceived inside the head rather than at their actual spatial position. This limitation is inherent and results in unrealistic listening situations.

    When listening to a sound without headphones, the acoustic information of the sound source is modified by our head and torso, an effect described by the head-related transfer functions (HRTFs). The shape of our ears contributes to that modification by filtering the sound depending on the source direction. But the ear is very listener-specific – its individuality is comparable to that of a fingerprint – and thus HRTFs are very listener-specific as well. When listening via headphones, this listener-specific filtering is usually not available. One of the main reasons is the difficulty of acquiring the ear shape of a person, and thus of calculating listener-specific HRTFs.
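    The filtering described above is what headphone rendering has to reproduce: convolving a mono source with the left- and right-ear head-related impulse responses (HRIRs) for a given direction yields the binaural signal. A minimal sketch (the function name is illustrative, and real HRIRs would come from a measured or calculated set):

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Render a mono signal at the HRIRs' direction by convolving it with
    the left- and right-ear impulse responses; returns (n, 2) stereo."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)
```

    With listener-specific HRIRs, this produces sound perceived at the intended external position; with generic HRIRs, localization errors and in-head perception are common.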

    In softpinna, we will therefore develop new methods for a better acquisition of listener-specific ear shapes. Specifically, we will investigate and improve so-called "non-rigid registration" (NRR) algorithms, applied to 3-D ear geometries calculated from 2-D photos of a person’s ears. The improved quality of the acquired 3-D ear geometries will allow computer programs to accurately calculate listener-specific HRTFs, enabling their incorporation in future headphone systems that provide a realistic presentation of spatial sounds. The new ear-shape acquisition method will vastly reduce the technical requirements for the accurate calculation of listener-specific HRTFs.

    This project is carried out in collaboration with Dreamwaves GmbH and is supported by the Bridge Programme of the FFG.