French-Austrian bilateral research project funded by the French National Agency of Research (ANR) and the Austrian Science Fund (FWF, project no. I 1362-N30). The project involves two academic partners, namely the Laboratory of Mechanics and Acoustics (LMA - CNRS UPR 7051, France) and the Acoustics Research Institute. At the ARI, two research groups are involved in the project: the Mathematics and Signal Processing in Acoustics and the Psychoacoustics and Experimental Audiology groups.
Principal investigators: Thibaud Necciari (ARI), Piotr Majdak (ARI) and Olivier Derrien (LMA).
Running period: 2014-2017 (project started on March 1, 2014).
One of the greatest challenges in signal processing is to develop efficient signal representations. An efficient representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of binary data required for storage or transmission, it is desirable that the representation takes into account human auditory perception and allows reconstruction with a controlled amount of perceived distortion. Over the last decades, many psychoacoustical studies investigated auditory masking, an important property of auditory perception. Masking refers to the degradation of the detection threshold of a sound in the presence of another sound. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency (t-f) masking effects in perceptual audio codecs. We recently conducted psychoacoustical studies on t-f masking, which revealed the inaccuracy of such simple models. These new data on t-f masking represent a crucial basis to account for masking effects in t-f representations of sounds. Although t-f representations are standard tools in audio processing, the development of a t-f representation of audio signals that is mathematically founded, perception-based, perfectly invertible, and possibly with a minimum amount of redundancy, remains a challenge. POTION thus addresses the following questions:
POTION is structured in three main tasks:
More information on the project can be found on the POTION web page.
This international, multi-disciplinary and team-oriented project will expand the group Mathematics and Acoustical Signal Processing at the Acoustics Research Institute in cooperation with NuHAG Vienna (Hans G. Feichtinger, M. Dörfler, K. Gröchenig), Institute of Telecommunication Vienna (Franz Hlawatsch), LATP Marseille (Bruno Torrésani), LMA (Richard Kronland-Martinet), CAHR (Torsten Dau, Peter Soendergaard), the FYMA Louvain-la-Neuve (Jean-Pierre Antoine), AG Numerics (Stephan Dahlke), School of Electrical Engineering and Computer Science (Damian Marelli) as well as the BKA Wiesbaden (Timo Becker).
We live in the age of information where the analysis, classification, and transmission of information is of essential importance. Signal processing tools and algorithms form the backbone of important technologies like MP3, digital television, mobile phones and wireless networking. Many signal processing algorithms have been adapted for applications in audio and acoustics, also taking into account the properties of the human auditory system.
The mathematical concept of frames describes a theoretical background for signal processing. Frames are generalizations of orthonormal bases that give more freedom for the analysis and modification of information - however, this concept is still not firmly rooted in applied research. The link between the mathematical frame theory, the signal processing algorithms, their implementations and finally acoustical applications is a very promising, synergetic combination of research in different fields.
Therefore the main goal of this multidisciplinary project is to
-> Establish Frame Theory as Theoretical Backbone of Acoustical Modeling
in particular in psychoacoustics, phonetic and computational acoustics as well as audio engineering.
Through this auspicious connection of disciplines, FLAME will have a substantial impact on both theory and applied research.
The theory-based part of FLAME consists of the following topics:
The application-oriented part of FLAME consists of:
From many previous applications, it is known that inverse problems often require a regularization that makes the inversion numerically stable. In this project, sequences that allow a bounded, injective analysis (that is not boundedly invertible) are investigated.
Even for general sequences, an analysis operator and a synthesis operator can be defined. The first part of this project will investigate the most general results of these definitions. For example, it can be shown that the analysis operator is always a closed operator. Although it can be shown that another sequence that allows perfect reconstruction cannot be bounded, the question of how to construct such a "dual sequence" will be investigated.
Such sequences have already found applications in wavelet analysis, in which dual sequences were constructed algorithmically. Also, the original system investigated by Gabor with a redundancy of 1 satisfies this condition.
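The situation described above can be illustrated with a toy example that is not taken from the project: the sequence psi_k = e_k / k in l^2, truncated here to finitely many terms so that it can be run. Its analysis operator is bounded and injective, but the ratio ||Cx||^2 / ||x||^2 can be made arbitrarily small, so the inverse is not bounded. The sequence k * e_k still gives perfect reconstruction, yet the norms of its elements grow without bound. The following Python sketch assumes this particular sequence, and all function names are hypothetical.

```python
import math

def analysis(x):
    """Analysis coefficients c_k = <x, psi_k> for the toy sequence psi_k = e_k / k."""
    return [x_k / k for k, x_k in enumerate(x, start=1)]

def synthesis_with_dual(c):
    """Synthesis with the dual sequence k * e_k: returns sum_k c_k * (k * e_k)."""
    return [k * c_k for k, c_k in enumerate(c, start=1)]

def energy(v):
    """Squared Euclidean norm of a coefficient or signal vector."""
    return sum(t * t for t in v)

def unit_vector(n, dim):
    """Canonical basis vector e_n (1-based index) in dimension dim."""
    return [1.0 if k == n else 0.0 for k in range(1, dim + 1)]

dim = 1000  # truncation length, an assumption made only to obtain runnable code
x = [math.sin(k) for k in range(1, dim + 1)]

# Perfect reconstruction: analysis followed by synthesis with the dual sequence.
x_rec = synthesis_with_dual(analysis(x))
max_error = max(abs(a - b) for a, b in zip(x, x_rec))

# No positive lower bound: ||C e_n||^2 / ||e_n||^2 = 1 / n^2 shrinks with n.
ratios = {n: energy(analysis(unit_vector(n, dim))) for n in (1, 10, 100, 1000)}

# The dual elements are unbounded: ||n * e_n|| = n grows with n.
dual_norms = {n: math.sqrt(energy(synthesis_with_dual(unit_vector(n, dim))))
              for n in (1, 10, 100, 1000)}

print("max reconstruction error:", max_error)
print("energy ratios:", ratios)
print("dual norms:", dual_norms)
```

In the truncated setting every quantity is finite, so the sketch only shows the trend: the smallest energy ratio keeps shrinking and the largest dual norm keeps growing as the truncation length increases.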
The ability of listeners to discriminate literal meanings from figurative language, affective language, or rhetorical devices such as irony is crucial for successful social interaction. This discriminative ability might be reduced in listeners supplied with cochlear implants (CIs), widely used auditory prostheses that restore auditory perception in the deaf or hard-of-hearing. Acoustically, irony is characterised in particular by a lower fundamental frequency (F0), a lower intensity and a longer duration in comparison to literal utterances. In auditory perception experiments, listeners mainly rely on F0 and intensity values to distinguish between context-free ironic and literal utterances. As CI listeners have great difficulties in F0 perception, the use of frequency information for the detection of irony is impaired. However, irony is often additionally conveyed by characteristic facial expressions.
The aim of the project is two-fold: The first (“Production”) part of the project will study the role of paraverbal cues in verbal irony of Standard Austrian German (SAG) speakers under well-controlled experimental conditions without acoustic context information. The second (“Perception”) part will investigate the performance in recognizing irony in a normal-hearing control group and a group of CI listeners.
Recordings of speakers of SAG will be conducted. During the recording session, the participants will be presented with scenarios that evoke either a literal or an ironic utterance. The response utterances will be audio- and video-recorded. Subsequently, the context-free stimuli obtained in this way will be presented in a discrimination test to normal-hearing and to postlingually deafened CI listeners in three modes: auditory only, auditory+visual, visual only.
The results will not only provide information on irony production in SAG and on multimodal irony perception and processing, but will, most importantly, identify the cues that need to be improved in cochlear implants in order to allow CI listeners full participation in daily life.
Project lead: Michael Pucher
Project start: February 1, 2019
To document the current state of a language, conventional wisdom holds that one should analyse the speech of an old, rural, non-mobile man. To identify the developmental tendencies of a variety, however, one should examine the speech of a young, educated woman in an urban area. The language use of young women is a particularly interesting field of research: they are considered the initiators and driving forces of linguistic innovations, phonetic as well as lexical, which can spread from large cities into the wider language area. It is likewise assumed that open-minded young women adopt linguistic innovations more quickly than their male peers. They take up a new way of speaking faster and pass it on to their future children. Women also tend to use linguistic features as social identifiers to signal membership in a peer group, and can thereby contribute to language change.
The city of Vienna has changed considerably over the past 30 years: the population has grown by 15%, and with it the number of languages spoken. According to a survey by the Chamber of Labour (Arbeiterkammer), about 100 different languages are used in Vienna, and Vienna can undeniably still be regarded as a melting pot of diverse languages and cultures in Central Europe. The starting point of this study is the assumption that these social and socio-political changes are reflected not only in the lexical usage of the Viennese but also in their physiological voice.
In this study, the voice is understood as the physiological sound, modulated in the vocal tract, that humans use for vocalisation. Beyond that, the voice can also be regarded as the embodied heart of spoken language, which anchors the body in social space through indexicality. As a vehicle of personal identity, the voice can reflect not only sociocultural but also socio-political features (e.g. “women in leadership positions have a deeper voice”). Sociophonetics plays a central role here, as it is an important instrument that makes it possible to link the social space and its socially relevant discourses with the individual.
Studies from the Anglo-American region suggest that the voice of young women is undergoing a change. The sociophonetic voice phenomenon vocal fry has by now become a prominent speech feature of young, educated, urban women in the Anglo-American region.
Based on two corpora, a longitudinal study will be carried out that traces the extent to which the voice of young Viennese women has changed. Sociophonetic studies on women's voices do not exist in Austria, particularly not at the quality this study aims for. Owing to its longitudinal character, the study can show to what extent social developments influence women's voices.
In addition, this study offers a unique opportunity to obtain a snapshot of Viennese women and their voices and to place it in a historical context.
Funded by the Vienna Science and Technology Fund (WWTF) within the "Mathematics and …2016" Call (MA16-053)
Principal Investigator: Georg Tauböck
Co-Principal Investigator: Peter Balazs
Duration: 01.07.2017 – 01.07.2021
Signal processing is a key technology that forms the backbone of important developments like MP3, digital television, mobile communications, and wireless networking and is thus of exceptional relevance to economy and society in general. The overall goal of the proposed project is to derive highly efficient signal processing algorithms and to tailor them to dedicated applications in acoustics. We will develop methods that are able to exploit structural properties in infinite-dimensional signal spaces, since typically ad hoc restrictions to finite dimensions do not sufficiently preserve physically available structure. The approach adopted in this project is based on a combination of the powerful mathematical methodologies frame theory (FT), compressive sensing (CS), and information theory (IT). In particular, we aim at extending finite-dimensional CS methods to infinite dimensions, while fully maintaining their structure-exploiting power, even if only a finite number of variables are processed. We will pursue three acoustic applications, which will strongly benefit from the devised signal processing techniques, i.e., audio signal restoration, localization of sound sources, and underwater acoustic communications. The project is set up as an interdisciplinary endeavor in order to leverage the interrelations between mathematical foundations, CS, FT, IT, time-frequency representations, wave propagation, transceiver design, the human auditory system, and performance evaluation.
compressive sensing, frame theory, information theory, signal processing, super resolution, phase retrieval, audio, acoustics
Scientific and Technological Cooperation between Austria and Serbia (SRB 01/2018)
Duration of the project: 01.07.2018 - 30.06.2020
Acoustics Research Institute, ÖAW (Austria)
University of Vienna (Austria)
University of Novi Sad (Republic of Serbia)
Project website: http://nuhag.eu/anacres
The auditory system constantly monitors the environment to protect us from harmful events such as collisions with approaching objects. Auditory looming bias is an astoundingly fast perceptual bias favoring approaching compared to receding auditory motion and was demonstrated behaviorally even in four-month-old infants. The role of learning in developing this perceptual bias and its underlying mechanisms are yet to be investigated. Supervised learning and statistical learning are the two distinct mechanisms enabling neural plasticity. In the auditory system, statistical learning refers to the implicit ability to extract and represent regularities, such as frequently occurring sound patterns or frequent acoustic transitions, with or without attention, while supervised learning refers to the ability to attentively encode auditory events based on explicit feedback. It is currently unclear how these two mechanisms are involved in learning auditory spatial cues at different stages of life. While newborns already possess basic skills of spatial hearing, adults are still able to adapt to changing circumstances such as modifications of spectral-shape cues. Spectral-shape cues are naturally induced when the complex geometry, especially of the human pinna, shapes the spectrum of an incoming sound depending on its source location. Auditory stimuli lacking familiarized spectral-shape cues are often perceived to originate from inside the head rather than as naturally external sound sources. Changes in the salience or familiarity of spectral-shape cues can thus be used to elicit auditory looming bias. The importance of spectral-shape cues for both auditory looming bias and auditory plasticity makes these cues ideal for studying the two phenomena together.
Born2Hear will combine auditory psychophysics and neurophysiological measures in order to 1) identify auditory cognitive subsystems underlying auditory looming bias, 2) investigate principal cortical mechanisms for statistical and supervised learning of auditory spatial cues, and 3) reveal cognitive and neural mechanisms of auditory plasticity across the human lifespan. These general research questions will be addressed within three studies. Study 1 will investigate the differences in the bottom-up processing of different spatial cues and the top-down attention effects on auditory looming bias by analyzing functional interactions between brain regions in young adults and then test in newborns whether these functional interactions are innate. Study 2 will investigate the cognitive and neural mechanisms of supervised learning of spectral-shape cues in young and older adults based on an individualized perceptual training on sound source localization. Study 3 will focus on the cognitive and neural mechanisms of statistical learning of spectral-shape cues in infants as well as young and older adults.
Project investigator (PI): Robert Baumgartner
Project partner / Co-PI: Brigitta Tóth, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
Supported by the Austrian Science Fund (FWF, I 4294-B) and NKFIH.
Normal-hearing (NH) listeners use two binaural cues, the interaural time difference (ITD) and the interaural level difference (ILD), for sound localization in the horizontal plane. They apply frequency-dependent weights when combining them to determine the perceived azimuth of a sound source. Cochlear implant (CI) listeners, however, rely almost entirely on ILDs. This is partly due to the properties of current envelope-based CI-systems, which do not explicitly encode carrier ITDs. However, even if they are artificially conveyed via a research system, CI listeners perform worse on average than NH listeners. Since current CI-systems do not reliably convey ITD information, CI listeners might learn to ignore ITDs and focus on ILDs instead. A recent study in our lab provided first evidence that such reweighting of binaural cues is possible in NH listeners.
This project aims to further investigate the phenomenon: First, we will test whether a changed ITD/ILD weighting will generalize to different frequency regions. Second, the effect of ITD/ILD reweighting on spatial release from speech-on-speech masking will be investigated, as listeners benefit particularly from ITDs in such tasks. And third, we will test whether CI listeners can also be trained to weight ITDs more strongly and whether that translates to an increase in ITD sensitivity. Additionally, we will explore and evaluate different training methods to induce ITD/ILD reweighting.
The results are expected to shed further light on the plasticity of the binaural auditory system in acoustic and electric hearing.
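To make the two binaural cues concrete, the following Python sketch estimates an ITD and an ILD from a synthetic two-channel signal. It is only an illustration under stated assumptions and not the procedure used in the project: the test signal is a Gaussian pulse, the ITD is taken as the lag that maximizes the cross-correlation between the ears, the ILD is the broadband RMS level ratio in dB, and positive values mean that the left ear leads or is louder. All function names and parameter values are hypothetical.

```python
import math

def gaussian_pulse(length, center, width):
    """Synthetic test signal: a Gaussian pulse (an assumption for illustration)."""
    return [math.exp(-0.5 * ((n - center) / width) ** 2) for n in range(length)]

def delay_and_scale(signal, delay, gain):
    """Delay by a non-negative integer number of samples (zero-filled) and scale."""
    return [gain * signal[n - delay] if n >= delay else 0.0
            for n in range(len(signal))]

def best_lag(left, right, max_lag):
    """Lag (in samples) that maximizes sum_n left[n] * right[n + lag]."""
    best, best_value = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for n in range(len(left)):
            m = n + lag
            if 0 <= m < len(right):
                value += left[n] * right[m]
        if value > best_value:
            best, best_value = lag, value
    return best

def rms(signal):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

fs = 48000                                  # sampling rate in Hz (assumed)
left = gaussian_pulse(400, 100, 5.0)
right = delay_and_scale(left, 20, 0.5)      # right ear: 20 samples later, half amplitude

lag = best_lag(left, right, 40)
itd_us = 1e6 * lag / fs                     # positive: the left ear leads
ild_db = 20.0 * math.log10(rms(left) / rms(right))   # positive: the left ear is louder

print(f"ITD = {itd_us:.1f} microseconds, ILD = {ild_db:.2f} dB")
```

Real binaural signals would additionally require frequency-dependent analysis, since the weighting of ITD and ILD described above depends on frequency. The broadband estimate here ignores that on purpose.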
Start: October 2018
Duration: 3 years
Funding: uni:docs fellowship program for doctoral candidates of the University of Vienna
We thank the FWF for supporting the project – grant number I 4299-N32
Sound source localisation methods are widely used in the automotive, railway, and aircraft industries. Many different methods are available for the analysis of sound sources at rest. However, methods for the analysis of moving sound sources still suffer from the complexities introduced by the Doppler frequency shift, the relatively short measuring times, and propagation effects in the atmosphere. The project LION combines the expertise of four research groups from three countries working in the field of sound source localisation: the Beuth Hochschule für Technik Berlin (Beuth), the Turbomachinery and Thermoacoustics chair at TU-Berlin (TUB), the Acoustics Research Institute (ARI) of the Austrian Academy of Sciences in Vienna and the Swiss laboratory for Acoustics / Noise Control of EMPA. These institutions cooperate to improve and extend the existing methods for the analysis of moving sound sources. They want to increase the dynamic range as well as the spatial and frequency resolution of the methods and apply them to complex problems like the analysis of tonal sources with strong directivities or coherent and spatially distributed sound sources.
The partners want to jointly develop and validate these methods, exploiting the synergy effects that arise from such a partnership. Beuth plans to extend the equivalent source method in the frequency domain to moving sources located in a halfspace, taking into account the influence of the ground and sound propagation through an inhomogeneous atmosphere. ARI contributes acoustic holography, principal component analysis, and independent component analysis methods and wants to use its experience with pass-by measurements for trains to improve numerical boundary element methods including the transformation from fixed to moving coordinates. TUB develops optimization methods and model-based approaches for moving sound sources and will contribute its database of fly-over measurements with large microphone arrays as test cases. EMPA contributes a sound propagation model based on Time Variant Digital Filters with particular consideration of turbulence and ground effects and will also generate synthetic test cases for the validation of sound source localization algorithms. The project is planned for a period of three years. The work program is organized in four work packages: 1) the development of algorithms and methods, 2) the development of a virtual test environment for the methods, 3) the simulation of virtual test cases, and 4) the application of the new methods to existing test cases of microphone array measurements of trains and aircraft.
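The Doppler frequency shift named above as one complication for moving sources can be sketched for the simplest pass-by geometry. The Python example below is an illustration only and not one of the project's methods. It assumes a stationary microphone, still and homogeneous air, a source moving on a straight line at constant subsonic speed, and a speed of sound of 343 m/s. The function name and the example values are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius

def observed_frequency(f_source, speed, closest_distance, emission_time,
                       c=SPEED_OF_SOUND):
    """Frequency received from a tone emitted at emission_time by a source moving
    along the x-axis at constant speed; the microphone sits at (0, closest_distance)
    with closest_distance > 0, and the source passes x = 0 at emission_time = 0."""
    x = speed * emission_time               # source position at emission
    r = math.hypot(x, closest_distance)     # source-to-microphone distance
    v_toward = -speed * x / r               # velocity component toward the microphone
    return f_source * c / (c - v_toward)    # classical Doppler formula, moving source

# Hypothetical pass-by: a 1000 Hz tone on a train at 50 m/s, microphone 7.5 m away.
for t in (-2.0, -0.5, 0.0, 0.5, 2.0):
    f = observed_frequency(1000.0, 50.0, 7.5, t)
    print(f"emitted at t = {t:+.1f} s -> received at {f:.1f} Hz")
```

The received frequency is above the source frequency while the source approaches, equals it at the point of closest approach, and falls below it afterwards. The sketch is parameterized by emission time; mapping to reception time would additionally require the propagation delay r / c.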