• LabEquip: Equipment and Facilities in the Lab

    The aim of this project is to maintain the experimental facilities in our institute's laboratory.

    The lab consists of four testing places:

    • GREEN and BLUE: Two sound-booths (IAC-1202A) are used for audio recording and psychoacoustic testing performed with headphones. Each of the booths is controlled from outside by a computer. Two bidirectional audio channels with sampling rates up to 192 kHz are available.
    • RED: A visually-separated corner can be used for experiments with cochlear implant listeners. A computer controls the experimental procedure using a bilateral, direct-electric stimulation.
    • YELLOW: A semi-anechoic room, with a size of 6 x 6 x 3 m, can be used for acoustic tests and measurements in a nearly-free field. As many as 24 bidirectional audio channels, virtual environments generated by a head mounted display, and audio and video surveillance are available for projects like HRTF measurement, localization tests or acoustic holography.

    The rooms are used not only for measurements and experiments: the Acoustics Phonetics group also makes speech recordings for dialect research and speaker identification, for example for survey reports. The facilities are also used for psychoacoustic validations.

    During the breaks in experiments, the subjects can use an Internet terminal or relax on a couch while sipping hot coffee...

  • LARS


    Rumble strips are (typically periodic) grooves placed at the side of the road. When a vehicle passes over a rumble strip, the noise and vibration in the car should alert the driver to the imminent danger of running off the road. Rumble strips have been shown to have a positive effect on traffic safety. Unfortunately, the use of rumble strips in the close vicinity of populated areas is problematic due to the increased noise burden.


    The aim of the project LARS (LärmArme RumpelStreifen or low noise rumble strips) was to find rumble strip designs that cause less noise in the environment without significantly affecting the alerting effect inside the vehicle. For this purpose, a number of conventional designs as well as three alternative concepts were investigated: conical grooves to guide the noise under the car, pseudo-random groove spacing to reduce tonality and thus annoyance, as well as sinusoidal depth profiles which should produce mostly vibration and only little noise and which are already used in practice.


    Two test tracks were established covering a range of different milling patterns in order to measure the effects of rumble strips for a car and a commercial vehicle running over them. Acoustic measurements using microphones and a head-and-torso simulator were made inside the vehicle as well as in the surroundings of the track. Furthermore, the vibrations of the steering wheel and the driver's seat were measured. Using the acoustic measurements, synthetic rumble strip noises were produced in order to cover a wider range of possible rumble strip designs than measurements alone would allow.

    Perception tests with 16 listeners were performed to determine the annoyance of the immissions as well as the urgency of, and reaction times to, the sounds generated in the vehicle interior. The synthetic stimuli were also used in these tests.

    LARS was funded by the FFG (project 840515) and the ASFINAG. The project was done in cooperation with the Research Center of Railway Engineering, Traffic Economics and Ropeways, Institute of Transportation, Vienna University of Technology, and ABF Strassensanierungs GmbH.

  • Lateral Variants of Bosnian Migrants Living in Vienna


    The aim of this study is to investigate the phonetics of second language acquisition and first language attrition, based on the acoustic and articulatory lateral realizations of Bosnian migrants living in Vienna. Bosnian has two lateral phonemes (a palatalized and an alveolar/velarized one), whereas Standard Austrian German (SAG) features only one lateral phoneme (an alveolar lateral). In the Viennese dialect (Vd), however, this phoneme also has a velarized variant.

    This phonetic investigation will be conducted with respect to the influence of language contact between Bosnian and SAG, and Bosnian and the Viennese dialect, as well as concerning the influence of gender and identity construction.


    The recordings will be conducted with female and male Bosnian speakers, aged between 20 and 35 years at the time of emigration, who came to Vienna during the Bosnian war (1992-1995). Additionally, control groups of monolingual L1 speakers of Bosnian, SAG and Vd will be recorded. All recordings will include reading tasks in order to elicit controlled speech, as well as spontaneous speech in the form of biographical interviews. The analyses will comprise quantitative and qualitative aspects. Quantitatively, the acoustic parameters formant frequencies (especially F2 and F3), duration and intensity of the laterals and their phonetic surroundings will be analyzed. Additionally, articulatory analyses will be performed using electropalatography (EPG) and ultrasound tongue imaging (UTI) data. Qualitatively, biographical information, language attitudes and social networks will be analyzed in order to obtain information about speaker-specific or group-specific characteristics.


    The results of this study are relevant to understanding the processes of sound realization and sound change in the domains of language contact (phonetic processes in second language acquisition and first language attrition), sociolinguistics, and the sociology of identity construction.

  • Measurement of Head-Related Transfer Functions (HRTFs)


    Head-related transfer functions (HRTFs) describe sound transmission from the free field to a place in the ear canal in terms of linear time-invariant systems. They contain spectral and temporal features that vary according to the sound direction. Differences among subjects require measuring each subject's individual HRTFs for studies on localization in virtual environments. In this project, a system for HRTF measurement was developed and installed in the semi-anechoic room at the Austrian Academy of Sciences.


    Measurement of an HRTF was considered a system identification of the electro-acoustic chain: sound source-room-HRTF-microphone. The sounds in the ear canals were captured using in-ear microphones. The direction of the sound source was varied horizontally by rotating the subject on a turntable, and vertically by accessing one of the 22 loudspeakers positioned in the median plane. An optimized form of system identification with sweeps, the multiple exponential sweep method (MESM), was used for the measurement of transfer functions with satisfactory signal-to-noise ratios occurring within a reasonable amount of time. Subjects' positions were tracked during the measurement to ensure sufficient measurement accuracy. Measurement of headphone transfer functions was included in the HRTF measurement procedure. This allows equalization of headphone influence during the presentation of virtual stimuli.
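
    The idea of identifying a system with a sweep can be sketched as follows. This is a minimal, simplified illustration and not the MESM itself: it excites a toy system (a pure delay with a gain, chosen here as an assumption) with one exponential sweep and recovers the impulse response by matched filtering, without the spectral compensation or the interleaving and overlapping of sweeps for multiple sources that the MESM relies on. All function names and parameter values are hypothetical.

```python
import math


def exponential_sweep(f_start, f_end, duration, fs):
    """Return an exponential sine sweep from f_start to f_end (in Hz)."""
    n = round(duration * fs)
    rate = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    return [math.sin(scale * (math.exp(rate * (i / fs) / duration) - 1.0))
            for i in range(n)]


def convolve(a, b):
    """Plain linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def identify(sweep, response):
    """Estimate an impulse response by correlating the response with the sweep.

    The result is normalized by the sweep energy, so a pure delay with gain g
    shows up as a peak of height g at the delay (in samples).
    """
    energy = sum(s * s for s in sweep)
    full = convolve(response, sweep[::-1])
    # Zero lag of the correlation sits at index len(sweep) - 1.
    return [v / energy for v in full[len(sweep) - 1:]]


# Toy system (assumption): 5 samples of delay and a gain of 0.5.
sweep = exponential_sweep(100.0, 3000.0, 0.1, 8000)
toy_system = [0.0] * 5 + [0.5]
recorded = convolve(sweep, toy_system)
estimate = identify(sweep, recorded)
peak = max(range(len(estimate)), key=lambda i: abs(estimate[i]))
print(peak, estimate[peak])
```

    In a real measurement the recorded signal would come from the in-ear microphones, and the estimate would be a full impulse response rather than a single peak.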


    Multi-channel audio equipment has been installed in the semi-anechoic room, giving access to recording and stimuli presentation via 24 channels simultaneously.

    The multiple exponential sweep method was developed, allowing fast transfer function measurement of weakly nonlinear time-invariant systems for multiple sources.

    The measurement procedure was developed and a database of HRTFs was created. To date, HRTFs of over 200 subjects have been published. The HRTFs can be used to create virtual stimuli and present them binaurally via headphones.

    To virtually position sounds in space, the HRTFs are used for filtering free-field sounds. This results in virtual acoustic stimuli (VAS). To create VAS and present them via headphones, applications called Virtual Sound Positioning (VSP) and Loca (part of our ExpSuite Software Project) have been implemented. They allow virtual sound positioning in a free-field environment using both stationary and moving sound sources.
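
    Filtering a free-field sound with a pair of HRTFs amounts, in the time domain, to convolving it with the left-ear and right-ear impulse responses. The sketch below illustrates this under simplifying assumptions; it is not the VSP or Loca implementation, the function names are hypothetical, and the impulse responses are made-up toy values rather than measured HRTFs.

```python
def fir_filter(signal, impulse_response):
    """Convolve a signal with an impulse response (linear convolution)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Return a list of (left, right) sample pairs for headphone playback."""
    left = fir_filter(mono, hrir_left)
    right = fir_filter(mono, hrir_right)
    n = max(len(left), len(right))
    # Zero-pad the shorter channel so both have the same length.
    left += [0.0] * (n - len(left))
    right += [0.0] * (n - len(right))
    return list(zip(left, right))


# Toy impulse responses (assumption): the sound reaches the right ear
# first and louder, as for a source on the listener's right side.
toy_hrir_left = [0.0, 0.0, 0.4]
toy_hrir_right = [1.0, 0.2]
stereo = render_binaural([1.0, 0.5, 0.25], toy_hrir_left, toy_hrir_right)
print(stereo)
```

    A moving source could be approximated by switching or interpolating between impulse responses for neighbouring directions over time, which this sketch does not attempt.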

  • MissiSIPI: Towards Improving Selective Hearing in Cochlear Implant Listeners

    Selective hearing refers to the ability of the human auditory system to selectively attend to a desired speaker while ignoring undesired, concurrent speakers. This is often referred to as the cocktail-party problem. In normal hearing, selective hearing is remarkably powerful. However, in so-called electric hearing, i.e., hearing with cochlear implants (CIs), selective hearing is severely degraded, close to not present at all. CIs are commonly used for treatment of severe-to-profound hearing loss or deafness because they provide good speech understanding in quiet. The reasons for the deficits in selective hearing are mainly twofold. First, they arise from structural limitations of current CI electrode designs which severely limit the spectral resolution. Second, they arise from a lack of salient timing cues, most importantly interaural time difference (ITD) and temporal pitch. The second limitation is assumed to be partly “software”-sided and conquerable with perception-driven signal processing. Yet, success achieved so far is at best moderate.

    A recently proposed approach to provide precise ITD and temporal-pitch cues in addition to speech understanding is to insert extra pulses with short inter-pulse intervals (so-called SIPI pulses) into periodic high-rate pulse trains. Results gathered so far in our previous project ITD PsyPhy in single-electrode configurations are encouraging in that both ITD and temporal-pitch sensitivity improved when SIPI pulses were inserted at the signals’ temporal-envelope peaks. Building on those results, this project aims to answer the most urgent research questions towards determining whether the SIPI approach improves selective hearing in CI listeners: Does the SIPI benefit translate into multi-electrode configurations? Does the multi-electrode SIPI approach harm speech understanding? Does the multi-electrode SIPI approach improve speech-in-speech understanding?

    Psychophysical experiments with CI listeners are planned to examine the research questions. To ensure high temporal precision and stimulus control, clinical CI signal processors will be bypassed by using a laboratory stimulation system directly connecting the CIs with a laboratory computer. The results are expected to shed light on parts of both electric and acoustic hearing that are still not fully understood to date, such as the role and the potential of temporal cues in selective hearing.

    Duration: May 2020 - April 2022

    Funding: DOC Fellowship Program of the Austrian Academy of Sciences (A-25606)

    PI: Martin Lindenbeck

    Supervisors: Bernhard Laback and Ulrich Ansorge (University of Vienna)

  • Optimal Gaussian Mixture Model (GMM) Initialization for Speaker Modeling


    The modeling step in speaker detection has an enormous influence on the classification task, because the quality of the model depends on the parameters chosen in this step. False classifications, false identifications, and false verifications can result from malformed speaker models. The initial model parameters have an influence on the final determined parameters of the speaker models. To obtain optimized speaker models, different initialization methods are explored.


    Speaker models are represented as Gaussian Mixture Models (GMMs). These models are mixtures of multivariate distributions that are parameterized by the means and the covariance matrices of the distributions and by the mixture weights. The parameters are estimated by the expectation-maximization (EM) algorithm, which maximizes the likelihood of the model. Initial model parameters have to be selected for this algorithm. Different initial parameters can cause the algorithm to converge to different local maxima. The effect of different initialization methods on the identification rate is analyzed.
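
    The sensitivity of EM to its starting point can be seen in a small one-dimensional example. The sketch below is only an illustration under stated assumptions: it uses scalar data and two components instead of multivariate speech features, the data and function names are hypothetical, and it does not reproduce any of the initialization methods studied in the project. Starting both components at the same mean leaves EM stuck in a poor solution, while a spread-out start finds both clusters.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def fit_gmm(data, init_means, init_var=1.0, iterations=50, var_floor=1e-6):
    """Run EM for a 1-D GMM from the given initial means."""
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [init_var] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var_floor, var)
    return weights, means, variances


# Toy data (assumption): two well-separated clusters around -2 and +2.
data = [-2.2, -2.1, -2.0, -1.9, -1.8, 1.8, 1.9, 2.0, 2.1, 2.2]
spread = fit_gmm(data, init_means=[-1.0, 1.0])
identical = fit_gmm(data, init_means=[0.0, 0.0])
print(sorted(spread[1]), log_likelihood(data, *spread))
print(sorted(identical[1]), log_likelihood(data, *identical))
```

    With identical starting means, both components receive the same responsibilities in every iteration and therefore never separate, so the fit behaves like a single broad Gaussian with a lower likelihood.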


    Optimized speaker models reflect the speech behavior of the speakers in an optimal way. The inter-speaker variability is maximized while the intra-speaker variability is minimized by avoidance of malformed speaker models. The usage of optimal initialization methods improves the robustness and the reliability of automatic speaker identification and verification systems.

  • Orthobem: Simulation of Vibrations in Tunnels


    Methods to predict the propagation of vibrations in soil are relatively undeveloped. Reasons for this include the complexity of the wave propagation in soil and the insufficient knowledge of material parameters. During this project a method was developed to simulate the propagation of vibrations that are caused by a load at the base of a tunnel.


    When dealing with the model of a tunnel in a semi-infinite domain such as soil, the boundary element method (BEM) seems to be an appropriate tool. Unfortunately, it cannot be applied directly to layered orthotropic media because no closed form of the Green's function, which is essential for BEM, is available. However, by transforming the whole system into the Fourier domain with respect to space and time, it is possible to numerically construct an approximation of this function on a predefined grid. With this approximation, the boundary integral equation that describes the propagation of waves caused by a vibrating load at the base of a tunnel can be solved.


    Models that can help to predict the propagation of vibrations inside soil layers are of great interest in earthquake sciences or when constructing railway lines and tunnels.

  • PAAB


    Railway vehicles passing through tight curves can produce a high pitched noise called curve squeal. Curve squeal is a very salient type of noise located in the high frequency range that can range between a tonal narrow band and a wide band noise. The reason for the tonal noise is lateral creepage on the top of the rail, which excites wheel vibration at frequencies corresponding to their modes. Wide band noise, however, is caused by wheel flanges touching the rail.


    The project PAAB aims to investigate the perceived annoyance of such noises in a perception test. The resulting perceptual characterization of curve squeal should help to consider this type of noise more adequately in noise mapping.


    Based on previous conventional large-scale emission measurements as well as new measurements at immission distances using a head-and-torso simulator, representative samples of curve squeal will be derived and used in a perception test. Synthetic, well-defined curve squeal noise will also be used.

    PAAB is funded by the FFG (project 860523) and the Austrian Federal Railways (ÖBB). The project is done in cooperation with the Research Center of Railway Engineering, Traffic Economics and Ropeways, Institute of Transportation, Vienna University of Technology (project leader), Kirisits Engineering Consultants, and psiacoustic Umweltforschung und Engineering GmbH.



  • Practical Time Frequency Analysis


    Numerous implementations and algorithms for time-frequency analysis can be found in the literature or on the internet. Most of them are either not well documented or no longer maintained. P. Soendergaard started to develop the Linear Time Frequency Toolbox for MATLAB. The goal of this project is to find typical uses of this toolbox in acoustic applications and to incorporate successful, not-yet-implemented algorithms in STx.


    The linear time-frequency toolbox is a small open-source Matlab toolbox with functions for working with Gabor frames for finite sequences. It includes 1D Discrete Gabor Transform (sampled STFT) with inverse. It works with full-length windows and short windows. It computes the canonical dual and canonical tight windows.
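
    To make the term concrete, the following sketch computes a discrete Gabor transform of a finite sequence directly from one common definition, c(m, n) = sum over l of f(l) * conj(g(l - a*n)) * exp(-2*pi*i*m*l/M), with indices taken modulo the signal length. It is an illustrative, unoptimized implementation written for this text and not the toolbox's code or interface; the function names, the full-length Gaussian window and the parameter values are assumptions.

```python
import cmath
import math


def dgt(signal, window, hop, channels):
    """Discrete Gabor transform of a finite sequence.

    signal, window: sequences of equal length L (the window is full-length).
    hop: time shift a between windows; channels: number M of frequency channels.
    Returns coeffs[m][n] for channel m and time shift n.
    """
    length = len(signal)
    if len(window) != length or length % hop or length % channels:
        raise ValueError("L must equal len(window) and be divisible by hop and channels")
    n_shifts = length // hop
    coeffs = [[0j] * n_shifts for _ in range(channels)]
    for n in range(n_shifts):
        for m in range(channels):
            acc = 0j
            for l in range(length):
                g = complex(window[(l - hop * n) % length])
                acc += signal[l] * g.conjugate() * cmath.exp(-2j * math.pi * m * l / channels)
            coeffs[m][n] = acc
    return coeffs


def periodic_gaussian(length, hop, channels):
    """Full-length Gaussian window, centered at index 0 and wrapped around."""
    return [math.exp(-math.pi * min(l, length - l) ** 2 / (hop * channels))
            for l in range(length)]


# Example (assumed values): a complex tone centered on channel 3 of 8.
L, a, M = 32, 4, 8
tone = [cmath.exp(2j * math.pi * 3 * l / M) for l in range(L)]
c = dgt(tone, periodic_gaussian(L, a, M), a, M)
strongest = [max(range(M), key=lambda m: abs(c[m][n])) for n in range(L // a)]
print(strongest)
```

    The triple loop costs on the order of L * M * (L / a) operations; practical implementations use FFT-based algorithms instead.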


    These algorithms are used for acoustic applications such as formant analysis, data compression, or de-noising. The implementations are compared to the ones in STx and will be integrated into that software package if they improve its performance.


    • H. G. Feichtinger et al., NuHAG, Faculty of Mathematics, University of Vienna
    • B. Torrèsani, Groupe de Traitement du Signal, Laboratoire d'Analyse Topologie et Probabilités, LATP/ CMI, Université de Provence, Marseille
    • P. Soendergaard, Department of Mathematics, Technical University of Denmark
  • QWeight

    Reweighting of Binaural Cues: Generalizability and Applications in Cochlear Implant Listening

    Normal-hearing (NH) listeners use two binaural cues, the interaural time difference (ITD) and the interaural level difference (ILD), for sound localization in the horizontal plane. They apply frequency-dependent weights when combining them to determine the perceived azimuth of a sound source. Cochlear implant (CI) listeners, however, rely almost entirely on ILDs. This is partly due to the properties of current envelope-based CI-systems, which do not explicitly encode carrier ITDs. However, even if they are artificially conveyed via a research system, CI listeners perform worse on average than NH listeners. Since current CI-systems do not reliably convey ITD information, CI listeners might learn to ignore ITDs and focus on ILDs instead. A recent study in our lab provided first evidence that such reweighting of binaural cues is possible in NH listeners.
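
    The two cues can be made concrete with a small sketch that estimates them from a pair of ear signals: the ITD as the lag that maximizes the interaural cross-correlation, and the ILD as the level ratio in dB. This is only an illustration; the function names, the sign conventions (positive values mean the left ear leads or is louder) and the test signal are assumptions and not part of the project's methods.

```python
import math


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def estimate_ild_db(left, right):
    """Interaural level difference in dB (positive: left ear louder)."""
    return 20.0 * math.log10(rms(left) / rms(right))


def estimate_itd(left, right, fs, max_lag):
    """Interaural time difference in seconds (positive: right ear lags).

    Searches lags from -max_lag to +max_lag samples for the maximum of
    the cross-correlation between the two ear signals.
    """
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for i, l in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                value += l * right[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag / fs


# Toy signals (assumption): the right ear receives the same burst
# 3 samples later and at half the amplitude.
fs = 48000
burst = [math.sin(0.3 * i) * math.exp(-0.01 * i) for i in range(200)]
left = [0.0] * 10 + burst + [0.0] * 10
right = [0.0] * 13 + [0.5 * b for b in burst] + [0.0] * 7
print(estimate_itd(left, right, fs, max_lag=8), estimate_ild_db(left, right))
```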

    This project aims to further investigate the phenomenon. First, we will test whether a changed ITD/ILD weighting generalizes to different frequency regions. Second, the effect of ITD/ILD reweighting on spatial release from speech-on-speech masking will be investigated, as listeners benefit particularly from ITDs in such tasks. Third, we will test whether CI listeners can also be trained to weight ITDs more strongly and whether that translates to an increase in ITD sensitivity. Additionally, we will explore and evaluate different training methods to induce ITD/ILD reweighting.

    The results are expected to shed further light on the plasticity of the binaural auditory system in acoustic and electric hearing.

    Start: October 2018

    Duration: 3 years

    Funding: uni:docs fellowship program for doctoral candidates of the University of Vienna

  • RAARA - Residential Area Augmented Reality Acoustics


    We thank the Austrian Research Promotion Agency (FFG) for funding this project, grant number 873588. Principal Investigator is the AIT. Noise means trouble. In addition to traffic and industry, it is mainly emitted by heating or cooling appliances: air heat pumps, recoolers and fans. In order to minimize noise immissions to the population in urban areas, the project is developing methods that enable simple, intuitive and at the same time accurate handling of noise emissions and their reduction.



    The aim is to use augmented reality to virtually place the noise sources in a real environment on site before they are installed, to display the sound emissions visually in colour, and to make them audible. Obstacles or soundproofing measures such as walls and fences are detected automatically or can be added virtually. In order to achieve these goals, comprehensive method developments for efficient acoustic calculation are required: frequency-dependent and time-dependent behaviour, absorption and reflection. This unique approach facilitates the planning of renewable heating and cooling appliances, increases the acceptance and thus the share of renewable energies, and lowers the noise level in cities.






  • SOFA: Spatially Oriented Format for Acoustics

    The spatially oriented format for acoustics (SOFA) is dedicated to storing all kinds of acoustic information related to a specified geometrical setup. The main task is to describe simple HRTF measurements, but SOFA also aims to provide the functionality to store more elaborate measurements, such as BRIRs measured with a 64-channel microphone array in a multi-source excitation situation, or the directivity measurement of a loudspeaker. The format is intended to be easily extendable, highly portable, and the greatest common denominator of all publicly available HRTF databases at the time of writing.

    SOFA defines the structure of data and metadata and stores them in a numerical container. The data description will be hierarchical, ranging from free-field HRTFs (simple setup) to more complex setups such as microphone-array measurements in reverberant spaces excited by a loudspeaker array (complex setup). We will use a global geometry description (related to the room) and a local geometry description (related to the listener/source) without limiting the number of acoustic transmitters and receivers. Room descriptions will be available by linking a CAD file within SOFA. Networking support will be provided as well, allowing remote access to HRTFs and BRIRs from client computers.

    SOFA is being developed by many contributors worldwide. The development is coordinated at ARI by Piotr Majdak.

  • softpinna: Non-Rigid Registration for the Calculation of HRTFs

    Millions of people use headphones every day for listening to music, for watching movies, or when communicating with others. Nevertheless, the sounds presented via headphones are usually perceived inside the head and not at their actual natural spatial position. This limited perception is inherent and results in unrealistic listening situations.

    When listening to a sound without headphones, the acoustic information of the sound source is modified by our head and our torso, an effect described by the head-related transfer functions (HRTFs). The shape of our ears contributes to that modification by filtering the sound depending on the source direction. But the ear is very listener-specific (its individuality is similar to that of a fingerprint), and thus HRTFs are very listener-specific. When listening to sounds via headphones, the listener-specific filtering is usually not available. One of the main reasons is the difficulty of acquiring the ear shape of a person, and thus of calculating listener-specific HRTFs.

    Thus, in softpinna, we will work on the development of new methods for a better acquisition of listener-specific ear shapes of a person. Specifically, we will investigate and improve the so-called "non-rigid registration" (NRR) algorithms, applied on 3-D ear geometries calculated from 2-D photos of a person’s ears. The improvement in the quality of the 3-D ear geometries acquisition will allow computer programs to accurately calculate the listener-specific HRTFs, thus enabling the incorporation of listener-specific HRTFs in future headphone systems providing realistic presentation of spatial sounds. The new ear-shape acquisition method will vastly reduce the technical requirements for accurate calculation of listener-specific HRTFs.

    This project is done in collaboration with Dreamwaves GmbH. It is supported by the Bridge Programme of the FFG.

  • Soziolekte in Wien - die mittelbairischen Varietäten (Sociolects in Vienna: the Central Bavarian Varieties)

  • Stochastic Transformation Methods (Acoustics and Vibration)


    In the past, an FWF project dealing with the basics of stochastic transformation methods was carried out at the ARI. Specifically, the Karhunen-Loève expansion and the polynomial chaos transformation were applied in the wave number domain. The procedure is based on the assumption of Gaussian-distributed variables. This assumption shall be generalized to arbitrary random variables.


    The assumption of a wave number domain limits the model to a horizontally layered half space. This limitation shall be overcome by using wavelet kernels instead of Fourier kernels in the transformation. The aim is to be able to calculate one-sided statistical distributions for the physical parameters and to handle arbitrary boundaries with the new method.

  • VarDiÖ: Variation and Change of dialect varieties in Austria (in apparent and real time)

    Project Part 02 of the special research area German in Austria. Variation - Contact - Perception funded by FWF (FWF6002) in cooperation with the University of Salzburg

    Principal Investigators: Stephan Elspaß, Hannes Scheutz, Sylvia Moosmüller

    Start of the project: 1st of January 2016

    Project description:

    The diversity and dynamics of the various dialects in Austria are the topic of this project. Based on a new survey, different research questions will be addressed in the coming years, such as: What are the differences and changes (e.g. through processes of convergence and divergence) that can be observed within and between the Austrian dialect regions? What are the alterations in dialect change between urban and rural areas? Are there noticeable generational and gender differences with regard to dialect change? What can a comprehensive comparison of ‘real-time’ and ‘apparent-time’ analyses contribute to a general theory of language change?

    To answer these questions, speech samples from a total of 160 dialect speakers, balanced for age and gender, will be collected and analysed within the first four years at 40 locations in Austria. Furthermore, samples from selected speakers will be recorded and evaluated under laboratory conditions to determine phonetic peculiarities as precisely as possible. In the second survey phase, complementary recordings will be carried out at another 100 locations in Austria in order to analyse differences and changes between the dialect landscapes in more detail. State-of-the-art dialectometric methods will be used to arrive at probabilistic statements regarding dialect variation and change in Austria. The analyses will include all linguistic levels from phonetics to syntax and lexis. The data will be documented in the first visual and ‘talking’ dialect atlas of Austria.

    Project page of the project partners in Salzburg


  • Vibrations in Random Layers


    One of the biggest problems encountered when building numerical models for layers is the lack of exact deterministic material parameters. Therefore, stochastic models should be used. However, these models have the general drawback of requiring large amounts of computer resources. This project developed a stochastic model with the ability to use a stochastic shear modulus in conjunction with a special iteration scheme allowing efficient implementation.


    With the Karhunen Loeve Expansion (KLE), it is possible to split the stochastic shear modulus, and therefore the whole system, into a deterministic and a stochastic part. These parts can then be transformed into a linear system of equations using finite elements and Chaos Polynomial Decomposition. Combining the KLE with the Fourier transformation and Plancherel's theorem enables decoupling of the deterministic part into smaller subsystems. An iteration scheme was developed which restricts the application of "costly" routines to these smaller deterministic subsystems instead of the whole higher-dimensional system matrix (up to a dimension of 10,000).


    As concerns about vibrations produced by machinery and traffic have increased in past years, models that can predict vibrations in soil have become more important. However, since material parameters for soil layers cannot be measured exactly in practice, it is reasonable to use stochastic models.
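
    For orientation, the split into a deterministic and a stochastic part can be written in the generic textbook form of a truncated KLE. The notation below is an assumption made for this sketch and is not taken from the project: G is the random shear modulus, \bar{G} its mean, (\lambda_i, f_i) the eigenvalues and eigenfunctions of its covariance kernel C, \xi_i uncorrelated random variables with zero mean and unit variance, and N the truncation order.

```latex
G(x, \theta) \approx \bar{G}(x) + \sum_{i=1}^{N} \sqrt{\lambda_i}\, \xi_i(\theta)\, f_i(x),
\qquad
\int C(x, y)\, f_i(y)\, \mathrm{d}y = \lambda_i\, f_i(x)
```

    The first term is the deterministic part; the sum carries all the randomness through the variables \xi_i.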

  • Vowel and consonant quantity in Southern German varieties

    Vowel and consonant quantity in Southern German varieties: D - A - CH project granted by DFG, FWF, SNF

    Principal investigators: Felicitas Kleber, Michael Pucher, Sylvia Moosmüller†, Stephan Schmid 

    Start of the project: 1st of June 2015

    Project description:


    The Central Bavarian varieties, to which the Viennese varieties belong, seem to have changed diachronically. From the first phonetic descriptions (Pfalz 1913) to more current descriptions (Moosmüller & Brandstätter 2014) the diachronic change becomes visible on several levels of the varieties.

    In this project we focus on the (in)stability of the timing system or, more precisely, the quantity relations in Vowel + Consonant sequences, and compare our results with those of the project partners in Zurich and Munich.


    The aims of this project are two-fold. The first aim is to develop a typology of the Vowel + Consonant quantities in Southern German varieties (Bavarian (Munich + Vienna) and Alemannic (Zurich)) in C1V1C2V2 contexts (where C2 can be a fricative, a nasal or a plosive) and in consonant cluster sequences with increasing initial and final consonant cluster complexity. The second aim is to investigate prosodic changes in an apparent-time study and to examine the influence of internal factors (e.g. speech rate) and external factors (e.g. language attitudes) on the production of speech.


    Recordings and analyses of 40 speakers of the Viennese varieties (balanced for age, gender, and educational background) will be conducted. During the recording sessions, the speakers are asked to read and repeat sentences at two speech rates. Furthermore, a subset of speakers is asked to participate in an articulatory recording with an electromagnetic articulograph (EMA). These recordings take place at our project partners’ laboratory in Munich.


    The results will not only provide insight into the current timing system of speakers of the Viennese varieties but also enable us to draw conclusions about sound changes in progress.


  • WiABahn - Acoustic Effect of Shielding Edges Near the Rail and Roofs Above Railway Platforms


    Railway platforms are located very close to the track and thus are assumed to alter the sound propagation. The degree of this effect, however, has not yet been investigated in detail.


    The aim of the project WiABahn was to investigate the shielding effect of railway platforms. One of the main questions was how to properly deal with the vicinity to the track, the platform’s large reflecting horizontal surface, and the often present canopy. It is unclear whether standard noise propagation prediction methods can be applied without modifications.


    Based on measurements directly at the platform as well as at a distance, the acoustic effect of low railway platforms was investigated, and suitable source models for the 2.5D boundary element method (BEM) as well as for standardized prediction methods were derived. The advantage of the 2.5D method, which was also used in the project PASS, is that a constant cross-section can be combined with point sources or incoherent line sources, which is not possible with pure 2D methods. 3D BEM is not feasible for such large structures.

    WiABahn was funded by the FFG (project 845678) and the ÖBB. The project was done in cooperation with the Austrian Institute of Technology (AIT, project leader) and Kirisits Engineering Consultants.