French-Austrian bilateral research project funded by the French National Agency of Research (ANR) and the Austrian Science Fund (FWF, project no. I 1362-N30). The project involves two academic partners, namely the Laboratory of Mechanics and Acoustics (LMA - CNRS UPR 7051, France) and the Acoustics Research Institute. At the ARI, two research groups are involved in the project: the Mathematics and Signal Processing in Acoustics and the Psychoacoustics and Experimental Audiology groups.

Principal investigators: Thibaud Necciari (ARI), Piotr Majdak (ARI) and Olivier Derrien (LMA).

Running period: 2014-2017 (project started on March 1, 2014).


One of the greatest challenges in signal processing is to develop efficient signal representations. An efficient representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of binary data required for storage or transmission, it is desirable that the representation takes into account human auditory perception and allows reconstruction with a controlled amount of perceived distortion. Over the last decades, many psychoacoustical studies investigated auditory masking, an important property of auditory perception. Masking refers to the degradation of the detection threshold of a sound in the presence of another sound. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency (t-f) masking effects in perceptual audio codecs. We recently conducted psychoacoustical studies on t-f masking, which revealed the inaccuracy of such simple models. These new data on t-f masking represent a crucial basis to account for masking effects in t-f representations of sounds. Although t-f representations are standard tools in audio processing, the development of a t-f representation of audio signals that is mathematically founded, perception-based, perfectly invertible, and possibly with a minimum amount of redundancy, remains a challenge. POTION thus addresses the following questions:

  1. To what extent is it possible to obtain a perception-based (i.e., as close as possible to “what we see is what we hear”), perfectly invertible, and possibly minimally redundant t-f representation of sound signals? Such a representation is essential for modeling complex masking interactions in the t-f domain and is expected to improve our understanding of auditory processing of real-world sounds. Moreover, it is of fundamental interest for many audio applications involving sound analysis-synthesis.
  2. Is it possible to improve current perceptual audio codecs by considering a joint t-f approach? To reduce the size of digital audio files, perceptual audio codecs like MP3 decompose sounds into variable-length time segments, apply a frequency transform, and use masking models to control the sub-quantization of transform coefficients within each segment. Thus, current codecs follow mainly a spectral approach, although temporal masking effects are taken into account in some implementations. By combining an efficient perception-based t-f transform with a joint t-f masking model in an audio codec, we expect to achieve significant performance improvements.

Working program:

POTION is structured in three main tasks:

  1. Perception-based t-f representation of audio signals with perfect reconstruction: A linear and perfectly invertible t-f representation will be created by exploiting the recently developed non-stationary Gabor theory as a mathematical background. The transform will be designed so that its t-f resolution mimics the t-f analysis properties of the auditory system and, if possible, no redundancy is introduced, in order to maximize the coding efficiency.
  2. Development and implementation of a t-f masking model: Based on psychoacoustical data on t-f masking collected by the partners in previous projects and on literature data, a new, complex model of t-f masking will be developed and implemented in the computationally efficient representation built in task 1. Additional psychoacoustical data required for the development of the model, involving frequency, level, and duration effects in masking for either single or multiple maskers, will be collected. The resulting signal processing algorithm should represent and re-synthesize only the perceptually relevant components of the signal. It will be calibrated and validated by conducting listening tests with synthetic and real-world sounds.
  3. Optimization of perceptual audio codecs: This task represents the main application of POTION. It will consist in combining the new efficient representation built in task 1 with the new t-f masking model built in task 2 for implementation in a perceptual audio codec.

More information on the project can be found on the POTION web page.


  • Chardon, G., Necciari, Th., Balazs, P. (2014): Perceptual matching pursuit with Gabor dictionaries and time-frequency masking, in: Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014). Florence, Italy, 3126-3130.

Related topics investigated at the ARI:

START project of P. Balazs.



This international, multi-disciplinary and team-oriented project will expand the group Mathematics and Acoustical Signal Processing at the Acoustics Research Institute in cooperation with NuHAG Vienna (Hans G. Feichtinger, M. Dörfler, K. Gröchenig), Institute of Telecommunication Vienna (Franz Hlawatsch), LATP Marseille (Bruno Torrésani), LMA (Richard Kronland-Martinet), CAHR (Torsten Dau, Peter Soendergaard), the FYMA Louvain-la-Neuve (Jean-Pierre Antoine), AG Numerics (Stephan Dahlke), School of Electrical Engineering and Computer Science (Damian Marelli) as well as the BKA Wiesbaden (Timo Becker).

Within the institute, the groups Audiological Acoustics and Psychoacoustics, Computational Acoustics, Acoustic Phonetics and Software Development are involved in the project.

This project is funded by the FWF as a START prize. It is planned to run from May 2012 to April 2018.






General description:

We live in the age of information, where the analysis, classification, and transmission of information is of essential importance. Signal processing tools and algorithms form the backbone of important technologies like MP3, digital television, mobile phones and wireless networking. Many signal processing algorithms have been adapted for applications in audio and acoustics, also taking into account the properties of the human auditory system.

The mathematical concept of frames describes a theoretical background for signal processing. Frames are generalizations of orthonormal bases that give more freedom for the analysis and modification of information. However, this concept is still not firmly rooted in applied research. The link between the mathematical frame theory, the signal processing algorithms, their implementations and finally acoustical applications is a very promising, synergetic combination of research in different fields.
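As a finite-dimensional illustration of the frame concept described above, the following minimal sketch analyzes a vector in R^2 with three redundant vectors and reconstructs it perfectly with the canonical dual frame. The specific frame (three unit vectors spaced 120 degrees apart, often called the "Mercedes-Benz" frame), the function names, and the test vector are assumptions chosen for illustration; they are not taken from the project.

```python
import math

# Three unit vectors in R^2 at 120-degree spacing: redundant, not a basis.
# Illustrative choice only; not a frame used in the project.
FRAME = [(math.cos(2 * math.pi * k / 3 + math.pi / 2),
          math.sin(2 * math.pi * k / 3 + math.pi / 2)) for k in range(3)]

def analysis(x, frame):
    """Analysis operator: coefficients c_k = <x, f_k>."""
    return [x[0] * f[0] + x[1] * f[1] for f in frame]

def synthesis(coeffs, frame):
    """Synthesis operator: sum_k c_k f_k."""
    return (sum(c * f[0] for c, f in zip(coeffs, frame)),
            sum(c * f[1] for c, f in zip(coeffs, frame)))

def frame_operator(frame):
    """Frame operator S = sum_k f_k f_k^T as a symmetric 2x2 matrix."""
    a = sum(f[0] * f[0] for f in frame)
    b = sum(f[0] * f[1] for f in frame)
    d = sum(f[1] * f[1] for f in frame)
    return ((a, b), (b, d))

def canonical_dual(frame):
    """Canonical dual frame: apply S^{-1} to every frame vector."""
    (a, b), (_, d) = frame_operator(frame)
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))
    return [(inv[0][0] * f[0] + inv[0][1] * f[1],
             inv[1][0] * f[0] + inv[1][1] * f[1]) for f in frame]

x = (0.3, -1.2)                                   # arbitrary test vector
coeffs = analysis(x, FRAME)                       # 3 coefficients for 2 dimensions
x_rec = synthesis(coeffs, canonical_dual(FRAME))  # perfect reconstruction
```

The three coefficients describe a two-dimensional vector, so the representation is redundant. That redundancy is the "freedom" referred to above: other dual sequences besides the canonical one would also reconstruct `x`.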

Therefore the main goal of this multidisciplinary project is to

-> Establish Frame Theory as Theoretical Backbone of Acoustical Modeling

in particular in psychoacoustics, phonetic and computational acoustics as well as audio engineering.



Through this auspicious connection of disciplines, FLAME will produce substantial impact on both theory and applied research.

The theory-based part of FLAME consists of the following topics:

  • T1 Frame Analysis and Reconstruction Beyond Classical Approaches
  • T2 Frame Multipliers, Extended
  • T3 Novel Frame Representation of Operators Motivated by Computational Acoustics

The application-oriented part of FLAME consists of:

  • A1 Advanced Frame Methods for Perceptual Sparsity in the Time-Frequency Plane
  • A2 Advanced Frame Methods for the Analysis and Classification of Speech
  • A3 Advanced Frame Methods for Signal Enhancement and System Estimation

Press information:





From many previous applications, it is known that inverse problems often require a regularization that makes the inversion numerically stable. In this project, sequences that allow a bounded, injective analysis (that is not boundedly invertible) are investigated.


Even for general sequences, an analysis operator and a synthesis operator can be defined. The first part of this project will investigate the most general results of these definitions. For example, it can be shown that the analysis operator is always a closed operator. Although it can be shown that another sequence allowing perfect reconstruction cannot be bounded, the question of how to construct such a "dual sequence" will be investigated.


Such sequences have already found applications in wavelet analysis, in which dual sequences were constructed algorithmically. Also, the original system investigated by Gabor with a redundancy of 1 satisfies this condition.
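The kind of sequence described above can be sketched with a simple toy model. Assume the sequence psi_k = e_k / k built from an orthonormal basis (e_k): its analysis operator is bounded and injective, but the best lower bound of the first n elements is 1/n^2, which tends to zero, so the inverse is unbounded. The example below truncates to n elements and compares a naive inversion with a Tikhonov-regularized one. Both the toy sequence and the choice of Tikhonov regularization are assumptions made for illustration, not the method of the project.

```python
def analyze(x):
    """Analysis with psi_k = e_k / k (k = 1, 2, ...): c_k = x_k / k."""
    return [xk / k for k, xk in enumerate(x, start=1)]

def naive_inverse(c):
    """Exact inverse: multiplies coefficient k by k, so errors grow with k."""
    return [k * ck for k, ck in enumerate(c, start=1)]

def tikhonov_inverse(c, lam):
    """Regularized inverse: x_k ~ s_k * c_k / (s_k^2 + lam), with s_k = 1/k."""
    out = []
    for k, ck in enumerate(c, start=1):
        s = 1.0 / k
        out.append(s * ck / (s * s + lam))
    return out

def lower_bound(n):
    """Best lower bound for the first n elements: min_k 1/k^2 = 1/n^2."""
    return 1.0 / (n * n)

n = 1000
x = [0.0] * n
x[0] = 1.0                      # signal concentrated in the first component
c_noisy = analyze(x)
c_noisy[-1] += 1e-3             # small perturbation of the last coefficient

# Error in the last reconstructed component, with and without regularization.
err_naive = abs(naive_inverse(c_noisy)[-1] - x[-1])
err_reg = abs(tikhonov_inverse(c_noisy, 1e-4)[-1] - x[-1])
```

The naive inverse amplifies the perturbation by a factor of n, whereas the regularized inverse keeps it small at the cost of a slight bias in the well-conditioned components.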


  • M. El-Gebeily, Department of Mathematical Sciences, King Fahd University of Petroleum and Minerals, Saudi Arabia
  • J. P. Antoine, Unité de physique théorique et de physique mathématique – FYMA, Belgium


Railway vehicles passing through tight curves can produce a high-pitched noise called curve squeal. Curve squeal is a very salient type of noise located in the high-frequency range that can range from a tonal narrow-band noise to a wide-band noise. The reason for the tonal noise is lateral creepage on the top of the rail, which excites wheel vibration at frequencies corresponding to the wheel modes. Wide-band noise, however, is caused by wheel flanges touching the rail.


The project PAAB aims at investigating the effect of such noises on perceived annoyance using a perception test. The resulting perceptual characterization of curve squeal should aid in considering this type of noise more adequately in noise mapping.


Based on previous conventional large-scale emission measurements as well as new measurements at immission distances using a head-and-torso simulator, representative samples of curve squeal will be derived and used in a perception test. This will also be aided by using synthetic, well-defined curve squeal noise.

PAAB is funded by the FFG (project 860523) and the Austrian Federal Railways (ÖBB). The project is carried out in cooperation with the Research Center of Railway Engineering, Traffic Economics and Ropeways, Institute of Transportation, Vienna University of Technology (project leader), Kirisits Engineering Consultants, and psiacoustic Umweltforschung und Engineering GmbH.



Bilateral Cochlear Implants: Physiology and Psychophysics

Current cochlear implants (CIs) are very successful in restoring speech understanding in individuals with profound or complete hearing loss by electrically stimulating the auditory nerve. However, the ability of CI users to localize sound sources and to understand speech in complex listening situations, e.g. with interfering speakers, is dramatically reduced as compared to normal (acoustically) hearing listeners. From acoustic hearing studies it is known that interaural time difference (ITD) cues are essential for sound localization and speech understanding in noise. Users of current bilateral CI systems are, however, rather limited in their ability to perceive salient ITD cues. One particular problem is that their ITD sensitivity is especially low when stimulating at relatively high pulse rates, which are required for proper encoding of speech signals.

In this project we combine psychophysical studies in human bilaterally implanted listeners and physiological studies in bilaterally implanted animals to find ways to improve ITD sensitivity in electric hearing. We build on the previous finding that ITD sensitivity can be enhanced by introducing temporal jitter (Laback and Majdak, 2008) or short inter-pulse intervals (Hancock et al., 2012) in high-rate pulse sequences. Physiological experiments, performed at the Eaton-Peabody Laboratories Neural Coding Group (Massachusetts Eye and Ear Infirmary, Harvard Medical School, PI: Bertrand Delgutte), are combined with matched psychoacoustic experiments, performed at the EAP group of ARI (PI: Bernhard Laback). The main project milestones are the following:

·         Aim 1: Effects of auditory deprivation and electric stimulation through CI on neural ITD sensitivity. Physiological experiments study whether chronic CI stimulation can reverse the effect of neonatal deafness on neural ITD sensitivity.

·         Aim 2: Improving the delivery of ITD information with high-rate strategies for CI processors.

   A. Improving ITD sensitivity at high pulse rates by introducing short inter-pulse intervals

   B. Using short inter-pulse intervals to enhance ITD sensitivity with “pseudo-syllable” stimuli.
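The notions of a high-rate pulse sequence, temporal jitter, and an ITD used above can be made concrete with a minimal sketch that generates pulse onset times for two ears. All details here are assumptions for illustration: the uniform jitter model, the use of the same jittered pattern in both ears, the parameter values (1000 pulses per second, 400 microseconds ITD), and the function names are hypothetical and do not reproduce the stimuli of the cited studies.

```python
import random

def pulse_times(rate_pps, duration_s, jitter_fraction=0.0, seed=0):
    """Pulse onset times (in seconds) of a train with nominal rate rate_pps.

    Each inter-pulse interval is the nominal interval scaled by a factor drawn
    uniformly from [1 - jitter_fraction, 1 + jitter_fraction] (assumed model).
    With jitter_fraction = 0 the train is strictly periodic.
    """
    rng = random.Random(seed)
    nominal_ipi = 1.0 / rate_pps
    times, t = [], 0.0
    while t < duration_s:
        times.append(t)
        t += nominal_ipi * (1.0 + jitter_fraction * rng.uniform(-1.0, 1.0))
    return times

def apply_itd(times, itd_s):
    """Delay one ear by itd_s; both ears share the same timing pattern."""
    return [t + itd_s for t in times]

def intervals(times):
    """Inter-pulse intervals of a pulse train."""
    return [b - a for a, b in zip(times, times[1:])]

# Hypothetical parameter values, chosen only to make the example concrete.
periodic = pulse_times(rate_pps=1000, duration_s=0.3)
left = pulse_times(rate_pps=1000, duration_s=0.3, jitter_fraction=0.5)
right = apply_itd(left, itd_s=400e-6)
```

In the periodic train every interval equals the nominal 1 ms, whereas the jittered train contains both shorter and longer intervals while the interaural delay between `left` and `right` stays constant.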

Co-operation partners:

·         External: Eaton-Peabody Laboratories Neural Coding Group of the Massachusetts Eye and Ear Infirmary at Harvard Medical School (PI: Bertrand Delgutte)

·         Internal: Mathematics and Signal Processing for Acoustics


·      This project is funded by the National Institutes of Health (NIH).

·      It is planned to run from 2014 to 2019.

Press information:

·      Article in DER STANDARD:

·      Article in DIE PRESSE:

·      OEAW website:


See Also

ITD MultEl


The ability of listeners to discriminate literal meanings from figurative language, affective language, or rhetorical devices such as irony is crucial for successful social interaction. This discriminative ability might be reduced in listeners supplied with cochlear implants (CIs), widely used auditory prostheses that restore auditory perception in the deaf or hard-of-hearing. Compared to literal utterances, irony is acoustically characterised in particular by a lower fundamental frequency (F0), a lower intensity and a longer duration. In auditory perception experiments, listeners mainly rely on F0 and intensity values to distinguish between context-free ironic and literal utterances. As CI listeners have great difficulties in F0 perception, the use of frequency information for the detection of irony is impaired. However, irony is often additionally conveyed by characteristic facial expressions.


The aim of the project is two-fold: The first (“Production”) part of the project will study the role of paraverbal cues in verbal irony of Standard Austrian German (SAG) speakers under well-controlled experimental conditions without acoustic context information. The second (“Perception”) part will investigate the performance in recognizing irony in a normal-hearing control group and a group of CI listeners.


Recordings of speakers of SAG will be conducted. During the recording session, the participants will be presented with scenarios that evoke either a literal or an ironic utterance. The response utterances will be audio- and video-recorded. Subsequently, the thus obtained context-free stimuli will be presented in a discrimination test to normal-hearing and to postlingually deafened CI listeners in three modes: auditory only, auditory+visual, visual only.


The results will not only provide information on irony production in SAG and on multimodal irony perception and processing, but will, most importantly, identify the cues that need to be improved in cochlear implants in order to allow CI listeners full participation in daily life.

Project leader: Michael Pucher

Project start: February 1, 2019


To capture the current state of a language, the conventional approach is to analyze the speech of an old, rural, non-mobile man. To identify the developmental tendencies of a variety, however, one should study the speech of a young, educated woman in an urban area. The language use of young women is a particularly interesting field of research: they are regarded as initiators and driving forces of linguistic innovations in a language, both phonetic and lexical, which can spread from large cities into the wider language area. It is likewise assumed that open-minded young women adopt linguistic innovations more quickly than their male peers. They take on a new way of speaking faster and pass it on to their future children. Women also tend to use linguistic features as social identifiers to show that they belong to the same peer group, and can thereby contribute to language change.

The city of Vienna has changed considerably over the past 30 years: the population has grown by 15%, and with it the number of languages spoken. According to a survey by the Chamber of Labour (Arbeiterkammer), about 100 different languages are used in Vienna, and the city undeniably continues to be a melting pot of diverse languages and cultures in Central Europe. The starting point of this study is the assumption that these societal and sociopolitical changes are reflected not only in the lexical language use of the Viennese but also in their physiological voice.

In this study, the voice is understood as the physiologically produced sound, modulated in the vocal tract, that serves human vocal utterance. Beyond that, the voice can also be regarded as the embodied heart of spoken language, anchoring the body in social space through indexicality. As a vehicle of personal identity, the voice can reflect not only sociocultural but also sociopolitical characteristics (e.g., "women in leadership positions have a lower voice"). Sociophonetics plays a key role here, as it is an important instrument for linking social space and its socially relevant discourses with the individual.

Studies from the Anglo-American world suggest that the voice of young women is changing. The sociophonetic voice phenomenon vocal fry has meanwhile become a prominent speech feature of young, educated, urban women in the Anglo-American world.

Based on two corpora, a longitudinal study will be conducted that traces the extent to which the voice of young Viennese women has changed. Sociophonetic studies on women's voices do not exist in Austria, particularly not of the quality aimed for in this study. Owing to its longitudinal character, the study can show to what extent societal developments influence women's voices.

In addition, this study offers a unique opportunity to obtain a snapshot of the Viennese woman and her voice and to place it in a historical context.


Information on participation can be found here!

General Information

Funded by the Vienna Science and Technology Fund (WWTF) within the  "Mathematics and …2016"  Call (MA16-053)

Principal Investigator: Georg Tauböck

Co-Principal Investigator: Peter Balazs

Project Team: Günther Koliander, José Luis Romero  

Duration: 01.07.2017 – 01.07.2021


Signal processing is a key technology that forms the backbone of important developments like MP3, digital television, mobile communications, and wireless networking and is thus of exceptional relevance to economy and society in general. The overall goal of the proposed project is to derive highly efficient signal processing algorithms and to tailor them to dedicated applications in acoustics. We will develop methods that are able to exploit structural properties in infinite-dimensional signal spaces, since typically ad hoc restrictions to finite dimensions do not sufficiently preserve physically available structure. The approach adopted in this project is based on a combination of the powerful mathematical methodologies frame theory (FT), compressive sensing (CS), and information theory (IT). In particular, we aim at extending finite-dimensional CS methods to infinite dimensions, while fully maintaining their structure-exploiting power, even if only a finite number of variables are processed. We will pursue three acoustic applications, which will strongly benefit from the devised signal processing techniques, i.e., audio signal restoration, localization of sound sources, and underwater acoustic communications. The project is set up as an interdisciplinary endeavor in order to leverage the interrelations between mathematical foundations, CS, FT, IT, time-frequency representations, wave propagation, transceiver design, the human auditory system, and performance evaluation.


compressive sensing, frame theory, information theory, signal processing, super resolution, phase retrieval, audio, acoustics




Scientific and Technological Cooperation between Austria and Serbia (SRB 01/2018)

Duration of the project: 01.07.2018 - 30.06.2020


Project partners:

Acoustics Research Institute, ÖAW (Austria)

University of Vienna (Austria)

University of Novi Sad (Republic of Serbia)


Project website:

ITD MultEl: Binaural-Timing Sensitivity in Multi-Electrode Stimulation

Binaural hearing is extremely important in everyday life, most notably for sound localization and for understanding speech embedded in competing sound sources (e.g., other speech sources). While bilateral implantation has been shown to provide cochlear implant (CI) listeners with some basic left/right localization ability, the performance with current CI systems is clearly reduced compared to normal hearing. Moreover, the binaural advantage in speech understanding in noise has been shown to be mediated mainly by the better-ear effect, while there is only very little binaural unmasking.

There exists now a body of literature on binaural sensitivity of CI listeners stimulated at a single interaural electrode pair. However, the CI listener’s sensitivity to binaural cues under more realistic conditions, i.e., with stimulation at multiple electrodes, has not been systematically addressed in depth so far.

This project attempts to fill this gap. In particular, given the high perceptual importance of ITDs, this project focuses on the systematic investigation of the sensitivity to ITD under various conditions of multi-electrode stimulation, including interference from neighboring channels, integration of ITD information across channels, and the perceptually tolerable room for degradations of binaural timing information.

Involved people:

Start: January 2013

Duration: 3 years

Funding: MED-EL