• Gestural coordination and speech rhythm

    Implications for pathological speech

    Coordinated Project 2016-17 Scuola Normale Superiore (SNS), Pisa – Acoustics Research Institute (ARI), Austrian Academy of Sciences, Vienna
    PIs: Chiara Celata (SNS), Sylvia Moosmueller (ARI)
    Research personnel: Chiara Meluzzi (SNS), Bettina Hobel (ARI)

    Short Description

    The project aims at modeling the impact of speech gesture coordination on the rhythmical properties of languages.

    Speech gestural structures are sets of gestures together with a specification of how they are temporally and spatially coordinated with respect to one another. Gestural anticipation, delay ("posticipation"), and overlap are the ingredients of coarticulation, i.e., the coordinative activity of speech movements that allows adjacent vowels and consonants to be produced simultaneously, resulting in one smooth whole.

    Rhythm is the systematic patterning of timing, accent, and grouping in sequences of events and encompasses both speech and music domains. We only become aware of how important it is in verbal communication when we listen to non-fluent speech. For example, deaf people with impaired or absent auditory feedback can be taught, after cochlear implantation and logopedic rehabilitation, to develop an “auditory” map for speech processing and imitation, but native-like patterns of gestural and rhythmical coordination are much more difficult to achieve.

    Both gestural coordination and rhythm thus contribute to the way fluent speech is programmed, produced, and even perceived.

    However, we still lack a global understanding of how the two dimensions of gestural coordination and speech rhythm interact in natural languages.

    Indeed, the gestural and the rhythmical approaches sometimes make different predictions. For example, we do not know whether the consonants composing heterosyllabic clusters are articulatorily independent from one another and are timed with respect to different vocalic nuclei, as some theoretical frameworks in the domain of gestural coordination would predict, or whether they are rather globally timed with the preceding vocalic nucleus, especially if it is stressed, as some proposals in the domain of speech rhythm assume. Also, we do not know if cross-linguistic differences in how heterosyllabic clusters are articulatorily coordinated to vocalic nuclei reflect or are reflected by cross-linguistic differences in the languages’ rhythmical properties.

    This project therefore tries to reconcile the gestural and the rhythmical perspectives in a unified research framework devoted to uncovering how inter-segmental coordination influences, and is influenced by, the rhythmical properties of supra-segmental entities.

    To that aim, we develop a series of cross-linguistic experiments on Italian and Standard Austrian German to clarify some critical aspects of speech organization in the two languages and to establish a link between language-specific phonotactic constraints and the temporal and spatial properties of segments’ production.

    The experiments, based on a reading task, include acoustic analyses for the identification of the temporal patterns and articulatory (ultrasound tongue imaging, UTI) analyses for the investigation of gestural coordination.

    In addition, the project aims to set the stage for an analysis of how the speech of cochlear-implanted speakers differs from normal speech with respect to gestural coordination and rhythmic patterns. Spontaneous conversations of both Italian and Standard Austrian German speakers will be recorded. The acoustic analyses will target the identification of the areas of most prominent difficulty concerning both the coarticulatory and the temporal aspects of spontaneous speech produced by CI speakers.

  • HASSIP: Harmonic Analysis and Statistics for Signal and Image Processing

    Basic Description:

    HASSIP is a Research Training Network funded by the European Commission within the Improving the Human Potential program. The aim of the HASSIP network is to develop research activities and systematic interactions in mathematical analysis and statistics that are directly connected to signal and image processing. Although the Acoustics Research Institute was not initially a partner of this network, P. Balazs became a fellow of this network through cooperation with the group NuHAG.


    • NuHAG, Faculty of Mathematics, University of Vienna
    • Groupe de Traitement du Signal, Laboratoire d'Analyse Topologie et Probabilités, LATP/ CMI, Université de Provence, Marseille
    • Modélisation, Synthèse et Contrôle des Signaux Sonores et Musicaux, LMA / CNRS Marseille
    • Unité de physique théorique et de physique mathématique – FYMA


    • Basic Properties of Bessel and Frame Multipliers: For Bessel sequences (ψ_k) and (φ_k), the investigation of operators M f = ∑_k m_k ⟨f, ψ_k⟩ φ_k is very natural and useful. Such operators M are called Bessel multipliers. The goal of this project is to set the mathematical basis for this kind of operator.
    • Best Approximation of Matrices by Frame Multipliers: Finding the best approximation by multipliers of matrices that represent time-variant systems gives a way to find efficient algorithms to implement such operators. 
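    A minimal finite-dimensional sketch of such a multiplier (the random frame, symbol, and sizes below are made up for illustration; the same sequence is used for analysis and synthesis for simplicity):

```python
import numpy as np

def frame_multiplier(f, psi, phi, m):
    """Apply M f = sum_k m_k <f, psi_k> phi_k for finite sequences.

    psi, phi: arrays whose rows are the analysis/synthesis frame elements.
    m: the symbol (one weight per frame element).
    """
    coeffs = psi.conj() @ f        # analysis coefficients <f, psi_k>
    return phi.T @ (m * coeffs)    # weight by the symbol, then synthesize

# Toy redundant frame: 6 random vectors in R^4 (illustrative only)
rng = np.random.default_rng(0)
psi = rng.standard_normal((6, 4))
f = rng.standard_normal(4)

# With the constant symbol m_k = 1 and phi = psi, M is the frame operator S
Mf = frame_multiplier(f, psi, psi, np.ones(6))
S = psi.T @ psi
assert np.allclose(Mf, S @ f)
```

    With a non-constant symbol, the same operator acts as a time-variant filter, which is the object studied in this project.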


    • P. Balazs, "Hilbert-Schmidt Operators and Frames - Classification, Approximation by Multipliers and Algorithms", International Journal of Wavelets, Multiresolution and Information Processing, Vol. 6, No. 2, pp. 315-330, March 2008.
    • P. Balazs, "Basic Definition and Properties of Bessel Multipliers", Journal of Mathematical Analysis and Applications, Vol. 325, No. 1, pp. 571-585, 2007. doi:10.1016/j.jmaa.2006.02.012


    This project ended on 01.01.2009. Its completion enabled the successful application for a 'High Potential' project of the WWTF; see MULAC.

  • HRTFMulAc: Improvement of Head-Related-Transfer-Function-Measurements


    Head-related transfer functions (HRTF) describe the sound transmission from the free field to a place in the ear canal in terms of linear time-invariant systems. Due to the physiological differences of the listeners' outer ears, the measurement of each subject's individual HRTFs is crucial for sound localization in virtual environments (virtual reality).
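    Viewed as linear time-invariant systems, HRTFs can be applied by convolution; a minimal sketch of binaural rendering (the short random impulse responses are placeholders for measured, listener-specific HRIRs):

```python
import numpy as np

# Render a virtual sound source by filtering a mono signal with the
# left- and right-ear head-related impulse responses (HRIRs).
# The random HRIRs below are placeholders for measured ones.
rng = np.random.default_rng(1)
signal = rng.standard_normal(1000)     # mono source signal
hrir_left = rng.standard_normal(128)   # placeholder left-ear HRIR
hrir_right = rng.standard_normal(128)  # placeholder right-ear HRIR

left = np.convolve(signal, hrir_left)
right = np.convolve(signal, hrir_right)
binaural = np.stack([left, right])     # 2-channel signal for headphones
```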

    The measurement of an HRTF can be considered a system identification of the weakly non-linear electro-acoustic chain from the sound source to the in-ear microphone. An optimized formulation of system identification with exponential sweeps, the "multiple exponential sweep method" (MESM), was used for the measurement of the transfer functions; with it, either the measurement duration or the signal-to-noise ratio can be optimized.
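    A sketch of the exponential sweep on which such sweep-based identification rests (the sampling rate, duration, and frequency range are illustrative, not the project's actual parameters):

```python
import numpy as np

# Exponential (logarithmic) sine sweep and its inverse filter.
fs = 48000                     # sampling rate in Hz (illustrative)
T = 2.0                        # sweep duration in seconds
f1, f2 = 50.0, 20000.0         # start and end frequencies in Hz

t = np.arange(int(T * fs)) / fs
L = T / np.log(f2 / f1)        # sweep rate parameter
sweep = np.sin(2 * np.pi * f1 * L * (np.exp(t / L) - 1.0))

# Deconvolution uses the time-reversed sweep with a +6 dB/octave
# amplitude correction (standard for exponential sweeps).
inverse = sweep[::-1] * np.exp(-t / L)
```

    Convolving the measured response with the inverse filter yields the system's impulse response, with harmonic distortion products separated in time.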

    Initial heuristic experiments have shown that using Gabor multipliers to extract the relevant sweeps in the MESM post-processing procedure improves the signal-to-noise ratio of the measured data even further. The objective of this project is to study, in detail, how frame multipliers can optimally be used during this post-processing procedure. In particular, wavelet frames, which best fit the structure of an exponential sweep, will be studied.


    Systematic numeric experiments will be conducted with simulated slowly time-variant, weakly non-linear systems. As the parameters of the involved signals are precisely known and controlled, an optimal symbol will automatically be created. Finally, the efficiency of the new method will be tested on a "real world" system, which was developed and installed in the semi-anechoic room of the Institute. It uses in-ear microphones, a subject turntable, 22 loudspeakers on a vertical arc, and a head tracker.


    The new method will be used for improved HRTF measurement.

  • Implementation of a Formant Synthesizer (Klatt) in STx


    A formant synthesizer based on the Klatt synthesizer is implemented that can be used both to generate stationary vowels and to produce time-variant formant and fundamental-frequency tracks. The implementation is realized as an SP atom.


    The synthesizer is integrated as a control tool into the applications Viewer2 (spectrogram and parameter plot) and SPEXL (segmentation tool). For this purpose, a graphical control interface is implemented that provides suitable functions for entering formant data (vowel synthesis) and for graphically selecting parameter sets (resynthesis of parameter tracks).

  • INSIGHT: Infinite Dimensional Signal Processing Techniques for Acoustic Applications

    General Information

    Funded by the Vienna Science and Technology Fund (WWTF) within the  "Mathematics and …2016"  Call (MA16-053)

    Principal Investigator: Georg Tauböck

    Co-Principal Investigator: Peter Balazs

    Project Team: Günther Koliander, José Luis Romero  

    Duration: 01.07.2017 – 01.07.2021


    Signal processing is a key technology that forms the backbone of important developments like MP3, digital television, mobile communications, and wireless networking and is thus of exceptional relevance to economy and society in general. The overall goal of the proposed project is to derive highly efficient signal processing algorithms and to tailor them to dedicated applications in acoustics. We will develop methods that are able to exploit structural properties in infinite-dimensional signal spaces, since typically ad hoc restrictions to finite dimensions do not sufficiently preserve physically available structure. The approach adopted in this project is based on a combination of the powerful mathematical methodologies frame theory (FT), compressive sensing (CS), and information theory (IT). In particular, we aim at extending finite-dimensional CS methods to infinite dimensions, while fully maintaining their structure-exploiting power, even if only a finite number of variables are processed. We will pursue three acoustic applications, which will strongly benefit from the devised signal processing techniques, i.e., audio signal restoration, localization of sound sources, and underwater acoustic communications. The project is set up as an interdisciplinary endeavor in order to leverage the interrelations between mathematical foundations, CS, FT, IT, time-frequency representations, wave propagation, transceiver design, the human auditory system, and performance evaluation.
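    The core compressive-sensing idea, recovering a structured (sparse) signal from far fewer measurements than unknowns, can be sketched with a toy example; the sizes, the random measurement matrix, and the orthogonal matching pursuit solver below are illustrative assumptions, not the project's methods:

```python
import numpy as np

# Toy sparse recovery via orthogonal matching pursuit (OMP).
rng = np.random.default_rng(3)
n, m, k = 64, 24, 3                            # ambient dim, measurements, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x                                      # compressed measurements (m << n)

support, r = [], y.copy()
for _ in range(k):
    # Greedily pick the column most correlated with the residual
    support.append(int(np.argmax(np.abs(A.T @ r))))
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    r = y - A[:, support] @ coef               # residual shrinks each iteration

x_hat = np.zeros(n)
x_hat[support] = coef
```

    Extending such finite-dimensional recovery guarantees to infinite-dimensional signal spaces is one of the mathematical challenges the project addresses.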


    Keywords: compressive sensing, frame theory, information theory, signal processing, super resolution, phase retrieval, audio, acoustics




  • IrregGabMul: Basic Properties of Irregular Gabor Multipliers


    So-called Gabor multipliers are particular cases of time-variant filters. Recently, Gabor systems on irregular grids have become a popular research topic. This project deals with Gabor multipliers on irregular grids as a specialization of frame multipliers.


    The initial stage of this project aims to investigate the continuous dependence of an irregular Gabor multiplier on its parameters, i.e., the symbol, the window, and the lattice. Furthermore, an algorithm to find the best approximation of any matrix (i.e., any time-variant system) by such an irregular Gabor multiplier is being developed.


    Gabor multipliers have been used implicitly for quite some time. Investigating the properties of these operators is a current topic for signal processing engineers. If the standard time-frequency grid does not suit the application, it is natural to work with irregular grids. An example is the use of non-linear frequency scales like the Bark scale.
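    A minimal sketch of a Gabor multiplier over an irregular point set (the tiny signal length, rectangular window, point set, and symbol are all illustrative assumptions):

```python
import numpy as np

def tf_shift(g, a, b, n):
    """Cyclically translate window g by a samples and modulate by bin b."""
    return np.roll(g, a) * np.exp(2j * np.pi * b * np.arange(n) / n)

def gabor_multiplier(f, g, points, m):
    """Apply G f = sum_k m_k <f, g_k> g_k over an (irregular) point set."""
    n = len(f)
    out = np.zeros(n, dtype=complex)
    for (a, b), mk in zip(points, m):
        atom = tf_shift(g, a, b, n)
        out += mk * np.vdot(atom, f) * atom   # np.vdot conjugates its 1st arg
    return out

n = 8
g = np.ones(n) / np.sqrt(n)                  # normalized rectangular window
points = [(0, 0), (3, 1), (5, 6)]            # no regular lattice structure
m = [1.0, 0.5, 0.25]                         # the symbol
f = np.exp(2j * np.pi * np.arange(n) / n)    # toy test signal
out = gabor_multiplier(f, g, points, m)
```

    The point set here deliberately lies on no regular time-frequency lattice, which is exactly the setting the project investigates.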


    H. G. Feichtinger, NuHAG, Faculty of Mathematics, University of Vienna


    This project ended on 28.02.2008 and was incorporated into the 'High Potential' project of the WWTF, MULAC (WWTF 2007).

  • Irregular Frames of Translates


    General frame theory can be specialized by imposing structure on the elements of the frame in question. One possible, very natural structure is a sequence of shifts of a single function. In this project, irregular shifts are investigated.


    In this project, the connection to irregular Gabor multipliers will be explored. Via the Kohn-Nirenberg correspondence, the space spanned by Gabor multipliers is just a space spanned by translates. Furthermore, the special connection between the Gramian function and the Gram matrix will be investigated for this case.


    A typical example of frames of translates is given by filter banks with constant filter shapes. For example, the phase vocoder corresponds to a filter bank with regular shifts. Introducing irregular shifts gives rise to a generalization of this analysis/synthesis system.


    • S. Heineken, Research Group on Real and Harmonic Analysis, University of Buenos Aires
  • IrrelevanceMask: Mathematical Foundation of the Irrelevance Model


    An irrelevance algorithm based on simultaneous masking is implemented in STx. In the years since its first development by Eckel, the efficiency of this algorithm has been clearly demonstrated. In this project, the irrelevance model will be grounded in modern mathematical and psychoacoustic theory.


    This algorithm can be described as a Gabor multiplier with an adaptive symbol. With existing related theory, it becomes clear that a high redundancy must be selected. This guarantees:

    • perfect reconstruction synthesis
    • an under-spread operator for good time-frequency localization
    • a smoothing-out of easily detectable quick on/off cycles

    Furthermore, it can be shown that the model used for the spreading function here is mathematically equivalent to the excitation pattern.
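    A sketch of such an adaptive 0/1 symbol (the simple offset-below-maximum rule is a placeholder for illustration, not the actual STx masking model):

```python
import numpy as np

def irrelevance_symbol(spec_db, offset_db=30.0):
    """Keep only time-frequency bins within offset_db of each frame's maximum.

    spec_db: 2-D array (frequency x time) of levels in dB.
    Returns a 0/1 symbol to be multiplied onto the Gabor coefficients.
    """
    threshold = spec_db.max(axis=0, keepdims=True) - offset_db
    return (spec_db >= threshold).astype(float)

# Toy 2x2 "spectrogram" in dB: only the -40 dB bin falls below its
# frame's threshold and is marked irrelevant.
spec_db = np.array([[0.0, -10.0],
                    [-40.0, -20.0]])
symbol = irrelevance_symbol(spec_db)
# Applying the multiplier: zero the masked coefficients, then resynthesize.
```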


    This algorithm has already been used for several years in applications such as:

    • automobile sound design
    • over-masking for background-foreground separation
    • improved speech recognition in noise
    • contrast increase for hearing-impaired persons


    • G. Eckel, Institut für Elektronische Musik und Akustik, Graz


    • P. Balazs, B. Laback, G. Eckel, W. Deutsch, "Introducing Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking", IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 7, in press (2009).


    This project ended on 01.01.2010 and led to a sub-project of the 'High Potential' project of the WWTF, MULAC.

  • ITD MultEl

    ITD MultEl: Binaural-Timing Sensitivity in Multi-Electrode Stimulation

    Binaural hearing is extremely important in everyday life, most notably for sound localization and for understanding speech embedded in competing sound sources (e.g., other speech sources). While bilateral implantation has been shown to provide cochlear implant (CI) listeners with some basic left/right localization ability, the performance with current CI systems is clearly reduced compared to normal hearing. Moreover, the binaural advantage in speech understanding in noise has been shown to be mediated mainly by the better-ear effect, while there is only very little binaural unmasking.

    There now exists a body of literature on the binaural sensitivity of CI listeners stimulated at a single interaural electrode pair. However, CI listeners' sensitivity to binaural cues under more realistic conditions, i.e., with stimulation at multiple electrodes, has not yet been systematically addressed in depth.

    This project attempts to fill this gap. In particular, given the high perceptual importance of ITDs, it focuses on the systematic investigation of ITD sensitivity under various conditions of multi-electrode stimulation, including interference from neighboring channels, integration of ITD information across channels, and the perceptually tolerable degree of degradation of binaural timing information.
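    The ITD itself can be estimated from the lag of the peak of the interaural cross-correlation; a toy sketch with a known, imposed delay (the signal, delay, and sampling rate are illustrative):

```python
import numpy as np

# Estimate the interaural time difference (ITD) between two ear signals
# from the peak of their cross-correlation.
fs = 48000
rng = np.random.default_rng(2)
x = rng.standard_normal(2000)

delay = 20                                      # impose a 20-sample delay
left = x
right = np.concatenate([np.zeros(delay), x[:-delay]])

xcorr = np.correlate(right, left, mode="full")
lag = int(np.argmax(xcorr)) - (len(left) - 1)   # recovers the imposed delay
itd_us = lag / fs * 1e6                         # ITD in microseconds
```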

    Involved people:

    Start: January 2013

    Duration: 3 years

    Funding: MED-EL

  • ITD PsyPhy


    Bilateral Cochlear Implants: Physiology and Psychophysics

    Current cochlear implants (CIs) are very successful in restoring speech understanding in individuals with profound or complete hearing loss by electrically stimulating the auditory nerve. However, the ability of CI users to localize sound sources and to understand speech in complex listening situations, e.g. with interfering speakers, is dramatically reduced compared to normal (acoustically) hearing listeners. From acoustic hearing studies it is known that interaural time difference (ITD) cues are essential for sound localization and speech understanding in noise. Users of current bilateral CI systems are, however, rather limited in their ability to perceive salient ITD cues. One particular problem is that their ITD sensitivity is especially low when stimulating at the relatively high pulse rates required for proper encoding of speech signals.

    In this project we combine psychophysical studies in human bilaterally implanted listeners and physiological studies in bilaterally implanted animals to find ways to improve ITD sensitivity in electric hearing. We build on the previous finding that ITD sensitivity can be enhanced by introducing temporal jitter (Laback and Majdak, 2008) or short inter-pulse intervals (Hancock et al., 2012) in high-rate pulse sequences. Physiological experiments, performed at the Eaton-Peabody Laboratories Neural Coding Group (Massachusetts Eye and Ear Infirmary, Harvard Medical School, PI: Bertrand Delgutte), are combined with matched psychoacoustic experiments, performed at the EAP group of ARI (PI: Bernhard Laback). The main project milestones are the following:

    • Aim 1: Effects of auditory deprivation and electric stimulation through a CI on neural ITD sensitivity. Physiological experiments examine whether chronic CI stimulation can reverse the effect of neonatal deafness on neural ITD sensitivity.

    • Aim 2: Improving the delivery of ITD information with high-rate strategies for CI processors.

      A. Improving ITD sensitivity at high pulse rates by introducing short inter-pulse intervals

      B. Using short inter-pulse intervals to enhance ITD sensitivity with “pseudo-syllable” stimuli.

    Co-operation partners:

    • External: Eaton-Peabody Laboratories Neural Coding Group of the Massachusetts Eye and Ear Infirmary at Harvard Medical School (PI: Bertrand Delgutte)

    • Internal: Mathematics and Signal Processing for Acoustics


    • This project is funded by the National Institutes of Health (NIH).

    • It is planned to run from 2014 to 2019.

    Press information:

    • Article in DER STANDARD

    • Article in DIE PRESSE

    • OEAW website


    See Also

    ITD MultEl

  • LabEquip: Equipment and Facilities in the Lab

    The aim of this project is to maintain the experimental facilities in our institute's laboratory.

    The lab consists of four testing places:

    • GREEN and BLUE: Two sound-booths (IAC-1202A) are used for audio recording and psychoacoustic testing performed with headphones. Each of the booths is controlled from outside by a computer. Two bidirectional audio channels with sampling rates up to 192 kHz are available.
    • RED: A visually-separated corner can be used for experiments with cochlear implant listeners. A computer controls the experimental procedure using a bilateral, direct-electric stimulation.
    • YELLOW: A semi-anechoic room, with a size of 6 x 6 x 3 m, can be used for acoustic tests and measurements in a nearly-free field. As many as 24 bidirectional audio channels, virtual environments generated by a head-mounted display, and audio and video surveillance are available for projects like HRTF measurement, localization tests, or acoustic holography.

    The rooms are used not only for measurements and experiments; the Acoustic Phonetics group also records speech there for dialect research and speaker identification, for example for survey reports. The facilities are further used for psychoacoustic validation studies.

    During the breaks in experiments, the subjects can use an Internet terminal or relax on a couch while sipping hot coffee...

  • LARS


    Rumble strips are (typically periodic) grooves placed at the side of the road. When a vehicle passes over a rumble strip, the noise and vibration in the car should alert the driver to the imminent danger of running off the road. Rumble strips have thus been shown to have a positive effect on traffic safety. Unfortunately, the use of rumble strips in the close vicinity of populated areas is problematic due to the increased noise burden.


    The aim of the project LARS (LärmArme RumpelStreifen or low noise rumble strips) was to find rumble strip designs that cause less noise in the environment without significantly affecting the alerting effect inside the vehicle. For this purpose, a number of conventional designs as well as three alternative concepts were investigated: conical grooves to guide the noise under the car, pseudo-random groove spacing to reduce tonality and thus annoyance, as well as sinusoidal depth profiles which should produce mostly vibration and only little noise and which are already used in practice.


    Two test tracks were established covering a range of different milling patterns in order to measure the effects of rumble strips on a car and a commercial vehicle running over them. Acoustic measurements using microphones and a head-and-torso simulator were made inside the vehicle as well as in the surroundings of the track. Furthermore, the vibration of the steering wheel and the driver seat was measured. Using the acoustic measurements, synthetic rumble-strip noises were produced in order to cover a wider range of possible rumble strip designs than measurements alone could provide.

    Perception tests with 16 listeners were performed in which the annoyance of the immissions, as well as the urgency of and the reaction times to the sounds generated in the interior, were determined, also using the synthetic stimuli.

    LARS was funded by the FFG (project 840515) and the ASFINAG. The project was done in cooperation with the Research Center of Railway Engineering, Traffic Economics and Ropeways, Institute of Transportation, Vienna University of Technology, and ABF Strassensanierungs GmbH.

  • Lateral Variants of Bosnian Migrants Living in Vienna


    The aim of this study is to investigate the phonetics of second language acquisition and first language attrition, based on the acoustic and articulatory lateral realizations of Bosnian migrants living in Vienna. Bosnian has two lateral phonemes (a palatalized and an alveolar/velarized one), whereas Standard Austrian German features only one lateral phoneme (an alveolar lateral). In the Viennese dialect, however, this phoneme also has a velarized variant.

    This phonetic investigation will be conducted with respect to the influence of language contact between Bosnian and SAG, and Bosnian and the Viennese dialect, as well as concerning the influence of gender and identity construction.


    The recordings will be conducted with female and male Bosnian speakers, aged between 20 and 35 years at the time of emigration, who came to Vienna during the Bosnian war of 1992-1995. Additionally, control groups of monolingual L1 speakers of Bosnian, SAG, and the Viennese dialect will be recorded. All recordings will include reading tasks in order to elicit controlled speech, as well as spontaneous speech in the form of biographical interviews. The analyses will comprise quantitative and qualitative aspects. Quantitatively, the acoustic parameters formant frequencies (especially F2 and F3), duration, and intensity of the laterals and their phonetic surroundings will be analyzed. Additionally, articulatory analyses will be performed using EPG and UTI data. Qualitatively, biographical information, language attitudes, and social networks will be analyzed in order to obtain information about speaker-specific or group-specific characteristics.


    The results of this study are relevant to understanding the processes of sound realization and sound change in the domains of language contact (phonetic processes in second language acquisition and first language attrition), sociolinguistics, and the sociology of identity construction.

  • LION - Localisation and Identification of Moving Noise Sources


    We thank the FWF for supporting the project – grant number I 4299-N32

    Sound source localisation methods are widely used in the automotive, railway, and aircraft industries. Many different methods are available for the analysis of sound sources at rest. However, methods for the analysis of moving sound sources still suffer from the complexities introduced by the Doppler frequency shift, the relatively short measuring times, and propagation effects in the atmosphere. The project LION combines the expertise of four research groups from three countries working in the field of sound source localisation: the Beuth Hochschule für Technik Berlin (Beuth), the Turbomachinery and Thermoacoustics chair at TU Berlin (TUB), the Acoustics Research Institute (ARI) of the Austrian Academy of Sciences in Vienna, and the Swiss Laboratory for Acoustics / Noise Control of EMPA. These institutions cooperate to improve and extend the existing methods for the analysis of moving sound sources. They want to increase the dynamic range as well as the spatial and frequency resolution of the methods and apply them to complex problems like the analysis of tonal sources with strong directivities or coherent and spatially distributed sound sources.



    The partners want to jointly develop and validate these methods, exploiting the synergy effects that arise from such a partnership. Beuth plans to extend the equivalent source method in frequency domain to moving sources located in a halfspace, taking into account the influence of the ground and sound propagation through an inhomogeneous atmosphere. ARI contributes acoustic holography, principal component analysis, and independent component analysis methods and wants to use its experience with pass-by measurements for trains to improve numerical boundary element methods including the transformation from fixed to moving coordinates. TUB develops optimization methods and model based approaches for moving sound sources and will contribute its data base of fly-over measurements with large microphone arrays as test cases. EMPA contributes a sound propagation model based on Time Variant Digital Filters with particular consideration of turbulence and ground effects and will also generate synthetic test cases for the validation of sound source localization algorithms. The project is planned for a period of three years. The work program is organized in four work packages: 1) the development of algorithms and methods, 2) the development of a virtual test environment for the methods, 3) the simulation of virtual test cases, and 4) the application of the new methods to existing test cases of microphone array measurements of trains and aircraft.


  • Localization of Sound Sources with Behind-the-Ear Microphones (Loca-BtE-CI)

    Objective and Method:

    Current cochlear implant (CI) systems are not designed for sound localization in the sagittal planes (front-back and up-down dimensions). Nevertheless, some of the spectral cues that are important for sagittal-plane localization in normal-hearing (NH) listeners might be audible for CI listeners. Here, we studied 3-D localization with bilateral CI listeners using "clinical" CI systems and with NH listeners. Noise sources were filtered with subject-specific head-related transfer functions, and a structured virtual environment was presented via a head-mounted display to provide feedback for learning.


    The CI listeners generally performed worse than the NH listeners, in both the horizontal and the vertical dimension. The localization error decreased with increasing duration of training. The front/back confusion rate of trained CI listeners was comparable to that of untrained (naive) NH listeners and two times higher than that of the trained NH listeners.


    The results indicate that some spectral localization cues are available to bilateral CI listeners, even though the localization performance is much worse than for NH listeners. These results clearly show the need for new strategies to encode spectral localization cues for CI listeners, and thus improve sagittal plane localization. Front-back discrimination is particularly important in traffic situations.


    FWF (Austrian Science Fund): Project # P18401-B15


    • Majdak, P., Goupell, M., and Laback, B. (2011). Two-Dimensional Localization of Virtual Sound Sources in Cochlear-Implant Listeners, Ear & Hearing.
    • Majdak, P., Laback, B., and Goupell, M. (2008). 3D-localization of virtual sound sources in normal-hearing and cochlear-implant listeners, presented at Acoustics '08  (ASA-EAA joint) conference, Paris
  • LocaMethods: Localization of Virtual Sound Sources


    Humans' ability to localize sound sources in a 3-D space was tested.


    The subjects listened to noises filtered with subject-specific head-related transfer functions (HRTFs). In the first experiment with new subjects, the conditions included a type of visual environment (darkness or structured virtual world) presented via head mounted display (HMD) and pointing method (head and finger/shooter pointing).


    The results show that the errors in the horizontal dimension were smaller when head pointing was used, whereas finger/shooter pointing yielded smaller errors in the vertical dimension. Generally, the differences between the two pointing methods were significant but small. The presence of a structured virtual visual environment significantly improved the localization accuracy in all conditions. This supports the idea that using a virtual visual environment in acoustic tasks like sound localization is beneficial. In Experiment II, the subjects were trained before performing the acoustic tasks used for data collection. Performance improved for all subjects over time, which indicates that training is necessary to obtain stable results in localization experiments.


    FWF (Austrian Science Fund): Project # P18401-B15


    • Majdak, P., Goupell, M., and Laback, B. (2010). 3-D localization of virtual sound sources: effects of visual environment, pointing method, and training, Attention, Perception, & Psychophysics 72, 454-469.
    • Majdak, P., Laback, B., Goupell, M., and Mihocic M. (2008). "The Accuracy of Localizing Virtual Sound Sources: Effects of Pointing Method and Visual Environment", presented at AES convention, Amsterdam.
  • LocaPhoto: Localization Model & Numeric Simulations

    Localization of sound sources is an important task of the human auditory system, and much research effort has been put into the development of audio devices for virtual acoustics, i.e., the reproduction of spatial sounds via headphones. Even though the process of sound localization is not yet completely understood, it is possible to simulate spatial sounds via headphones by using head-related transfer functions (HRTFs). HRTFs describe the filtering of the incoming sound by the head, torso, and particularly the pinna; thus, they strongly depend on the details of the listener's geometry. In general, for realistic spatial-sound reproduction via headphones, the individual HRTFs must be measured. As of 2012, the available HRTF acquisition method was acoustic measurement: a technically complex process involving placing microphones in the listener's ears and lasting for tens of minutes.

    In LocaPhoto, we were working on an easily accessible method to acquire and evaluate listener-specific HRTFs. The idea was to numerically calculate HRTFs based on a geometrical representation of the listener (3-D mesh) obtained from 2-D photos by means of photogrammetric reconstruction.

    As a result, we have developed a software package for numerical HRTF calculations, a method for geometry acquisition, and models able to evaluate HRTFs in terms of broadband ITDs and sagittal-plane sound localization performance.
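    To illustrate the broadband-ITD evaluation mentioned above, an ITD can be estimated from a pair of head-related impulse responses (HRIRs) via cross-correlation. The sketch below is a generic estimator for illustration only; the function name and parameters are assumptions, not the project's actual model.

    ```python
    import numpy as np

    def broadband_itd(hrir_left, hrir_right, fs):
        """Estimate the broadband interaural time difference (ITD, in s)
        between a left and a right HRIR as the lag that maximizes their
        cross-correlation. One of several common ITD estimators."""
        xcorr = np.correlate(hrir_left, hrir_right, mode="full")
        lag = np.argmax(xcorr) - (len(hrir_right) - 1)  # lag in samples
        return lag / fs

    # toy check: two impulses offset by 10 samples at 44.1 kHz
    fs = 44100
    left = np.zeros(64); left[20] = 1.0
    right = np.zeros(64); right[30] = 1.0  # arrives 10 samples later
    itd = broadband_itd(left, right, fs)
    ```

    For measured HRIRs, the signals are usually low-pass filtered first, since the ITD is dominated by low-frequency fine structure; that step is omitted here for brevity.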



  • MakMulAc: Mathematical Modelling of Auditory Time-Frequency Masking Functions


    It is known in psychoacoustics that not all information contained in a "real world" acoustic signal is processed by the human auditory system. More precisely, it turns out that some time-frequency components mask (overshadow) other components that are close in time or frequency.

    In the software S_TOOLS-STx developed by the Institute, an algorithm based on simultaneous masking has been implemented. This algorithm removes perceptually irrelevant time-frequency components. In this implementation, the model is described as a Gabor multiplier with an adaptive symbol.
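    A heavily simplified sketch of such an irrelevance filter, written as a Gabor (STFT) multiplier with a binary adaptive symbol: it keeps only the bins within a fixed dynamic range of each frame's maximum. The thresholding rule here is a placeholder assumption, not the actual simultaneous-masking model implemented in S_TOOLS-STx.

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def irrelevance_filter(x, fs, nperseg=512, threshold_db=40.0):
        """Zero all STFT bins more than threshold_db below the frame maximum.
        The 0/1 mask plays the role of the adaptive symbol of a Gabor
        multiplier; a real masking model would compute a per-bin threshold."""
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        mag = np.abs(Z)
        frame_max = mag.max(axis=0, keepdims=True)
        symbol = (mag >= frame_max * 10.0 ** (-threshold_db / 20.0)).astype(float)
        _, y = istft(Z * symbol, fs=fs, nperseg=nperseg)
        return y, symbol

    # strong tone plus very weak noise: most noise bins fall below threshold
    fs = 16000
    n = np.arange(fs)
    x = np.sin(2 * np.pi * 440 * n / fs) + 1e-4 * np.random.randn(fs)
    y, symbol = irrelevance_filter(x, fs)
    ```

    The point of the multiplier formulation is that analysis, masking, and resynthesis separate cleanly: only the symbol changes when the masking model is refined.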

    In this project, the masking model will be extended to a true time-frequency model, incorporating frequency and temporal masking.


    Experiments have been conducted (in cooperation with the Laboratory for Mechanics and Acoustics / CNRS Marseille) to test the time-frequency masking properties of a single Gaussian atom, and to study the additivity of these masking properties for several Gaussian atoms.

    The results of these experiments will be used, in combination with theoretical results obtained in the parallel projects studying the mathematical properties of frame multipliers, to approximate or identify the masking model by wavelet and Gabor multipliers.

    The obtained model will then be validated by appropriate psychoacoustical experiments.


    Efficient implementation of a masking filter offers many applications:

    • Sound / Data Compression
    • Sound Design
    • Background/Foreground Separation
    • Optimization of Speech and Music Perception

    After completing the testing phase, the algorithms are to be implemented in S_TOOLS-STx. 


    • P. Balazs, B. Laback, G. Eckel, W. Deutsch, "Introducing Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking", IEEE Transactions on Audio, Speech and Language Processing (2009), in press
    • B. Laback, P. Balazs, G. Toupin, T. Necciari, S. Savel, S. Meunier, S. Ystad and R. Kronland-Martinet, "Additivity of auditory masking using Gaussian-shaped tones", Acoustics'08, Paris, 29.06.-04.07.2008 (03.07.2008)
    • B. Laback, P. Balazs, T. Necciari, S. Savel, S. Ystad, S. Meunier and R. Kronland-Martinet, "Additivity of auditory masking for Gaussian-shaped tone pulses", preprint
  • Mask Suite: Additivity of Auditory Masking Using Gaussian-Shaped Tones


    This project is part of a project cluster that investigates time-frequency masking in the auditory system, in cooperation with the Laboratory for Mechanics and Acoustics / CNRS Marseille. While other subprojects study the spread of masking across the time-frequency plane using Gaussian-shaped tones, this subproject investigates how the masking produced by multiple Gaussian maskers distributed across the time-frequency plane adds up at a given time-frequency point. This question is important for determining the total masking effect resulting from the multiple time-frequency components (which can be modeled as Gaussian atoms) of a real-life signal.


    Both the maskers and the target are Gaussian-shaped tones with a frequency of 4 kHz. A two-stage approach is applied to measure the additivity of auditory masking. In the first stage, the levels of the maskers are adjusted so that each produces the same amount of masking of the target. In the second stage, various combinations of those maskers are tested to study their additivity.

    In the first study, the maskers are spread either in time or in frequency; in the second study, they are spread in both time and frequency.
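    For intuition, a standard textbook way to model such additivity is to assume that masking intensities add after a compressive power-law transformation; with an exponent p < 1, the combined masking exceeds linear intensity summation ("excess additivity"). The function and exponent below are illustrative assumptions, not the model used in this project.

    ```python
    import math

    def combined_masked_threshold(individual_thresholds_db, p=0.3):
        """Predict the combined masked threshold (dB) of several maskers
        from the thresholds each produces alone, assuming power-law
        additivity of masking intensities: I = (sum I_k**p)**(1/p).
        p = 1 gives linear intensity additivity; p < 1 gives excess additivity."""
        intensities = [10.0 ** (t_db / 10.0) for t_db in individual_thresholds_db]
        combined = sum(i ** p for i in intensities) ** (1.0 / p)
        return 10.0 * math.log10(combined)

    # two equal maskers, each masking the target up to 60 dB:
    linear = combined_masked_threshold([60.0, 60.0], p=1.0)  # ~63 dB
    excess = combined_masked_threshold([60.0, 60.0], p=0.3)  # ~70 dB
    ```

    Comparing such predictions against the measured combined thresholds is what allows the additivity experiments to constrain the compression exponent.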


    New insight into the coding of sound in the auditory system could help to design more efficient audio codecs. These codecs could take the additivity of time-frequency masking into account.


    WTZ (project AMADEUS)


    • Laback, B., Balazs, P., Toupin, G., Necciari, T., Savel, S., Meunier, S., Ystad, S., Kronland-Martinet, R. (2008). Additivity of auditory masking using Gaussian-shaped tones, presented at the Acoustics'08 conference, Paris.
  • Matrix Representation of Operators Using Frames


    Many problems in physics can be formulated as operator-theory problems, for example as differential or integral equations. To handle them numerically, the operators must be discretized. One way to achieve discretization is to find (possibly infinite) matrices describing these operators using orthonormal bases (ONBs). In this project, we investigate how to describe an operator as a matrix using frames.


    The standard matrix description of an operator O using an ONB (e_k) constructs a matrix M with the entries M_{j,k} = < O e_k, e_j >. In past publications, a concept describing an operator in a very similar way has been presented; however, that description used a frame and its canonical dual. Currently, a similar representation is being used for the description of operators using Gabor frames. In this project, we are going to develop and fully generalize this idea for Bessel sequences, frames, and Riesz sequences. We will also study the dual map that assigns an operator to a matrix.
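    The frame version of this construction can be checked numerically in a finite-dimensional toy case: with a frame (f_k), its canonical dual f~_k = S^{-1} f_k, and M_{j,k} = < O f_k, f~_j >, the operator is recovered as O = sum_{j,k} M_{j,k} f_j < . , f~_k >. The sketch below (variable names are my own) verifies this for a redundant frame in R^2.

    ```python
    import numpy as np

    # Mercedes-Benz frame: 3 vectors in R^2 (columns of F), a redundant frame
    F = np.array([[0.0, -np.sqrt(3) / 2, np.sqrt(3) / 2],
                  [1.0, -0.5,            -0.5]])

    S = F @ F.T                      # frame operator S = sum f_k <., f_k>
    F_dual = np.linalg.inv(S) @ F    # canonical dual frame f~_k = S^{-1} f_k

    O = np.array([[1.0, 2.0],        # an arbitrary operator on R^2
                  [3.0, -1.0]])

    # matrix of O with respect to the frame: M_{j,k} = <O f_k, f~_j>
    M = F_dual.T @ O @ F             # 3x3, although O acts on R^2

    # reconstruction: O = sum_{j,k} M_{j,k} f_j <., f~_k> = F M F_dual^T
    O_rec = F @ M @ F_dual.T
    ```

    Note that M is 3x3 while O acts on a 2-dimensional space: the frame description is redundant, which is precisely what the generalization to Bessel and Riesz sequences has to account for.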


    This "sampling of operators" is especially important in application areas where frames are heavily used, because it maintains the link between the model and its discretization. To facilitate implementations, operator equations can be transformed into finite, discrete problems with the finite section method, much as in the ONB case.


    • P. Balazs, "Matrix Representation of Operators Using Frames", Sampling Theory in Signal and Image Processing (STSIP) (2007, accepted), preprint.
    • P. Balazs, "Hilbert-Schmidt Operators and Frames - Classification, Approximation by Multipliers and Algorithms", International Journal of Wavelets, Multiresolution and Information Processing (2007, accepted), preprint.