Project

  • Sociolects in Vienna - the Middle Bavarian varieties

  • Speaker Clustering

    Objective:

    In speaker identification and speaker verification, misclassifications can result from high similarity between speakers, as reflected in the speaker models. These similarities can be explored by applying cluster analysis.

    Method:

    In speaker detection, every speaker is represented as a Gaussian Mixture Model (GMM). By using a dissimilarity measure for these models (e.g. cross-entropy), cluster analysis can be applied. Hierarchical agglomerative clustering methods are able to show structures in the form of a dendrogram.
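
    The clustering step can be sketched as follows. This is a minimal illustration with synthetic data: the "speaker models" are toy single-Gaussian models with a symmetrised KL divergence as the dissimilarity, standing in for real GMMs compared via cross-entropy.

```python
# Sketch: hierarchical agglomerative clustering of speaker models using a
# pairwise dissimilarity matrix. Toy single-Gaussian "speaker models" and
# synthetic data; real systems would use full GMMs and e.g. cross-entropy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def kl_gauss(m1, v1, m2, v2):
    # KL divergence between diagonal Gaussians N(m1, v1) || N(m2, v2)
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

rng = np.random.default_rng(0)
# Hypothetical speaker models: (mean, variance) per speaker, 4 features each
speakers = [(rng.normal(loc=i // 2, size=4), np.ones(4)) for i in range(6)]

n = len(speakers)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        (m1, v1), (m2, v2) = speakers[i], speakers[j]
        # symmetrised KL as the dissimilarity measure
        D[i, j] = D[j, i] = kl_gauss(m1, v1, m2, v2) + kl_gauss(m2, v2, m1, v1)

Z = linkage(squareform(D), method="average")     # dendrogram structure
labels = fcluster(Z, t=2, criterion="maxclust")  # groups of similar speakers
print(labels)
```

    The linkage matrix `Z` is exactly what `scipy.cluster.hierarchy.dendrogram` would plot to visualize the speaker structure.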

    Application:

    Structures in speech corpora can be visualized and can therefore be used to select groups of highly similar or dissimilar speakers. The investigation of the structures concerning the aspect of misclassification can lead to model generation improvements.

  • Spectral Cues in Auditory Localization with Cochlear Implants (CI HRTF)

    Objective:

    Bilateral use of current cochlear implant (CI) systems allows for the localization of sound sources in the left-right dimension. However, localization in the front-back and up-down dimensions (within the so-called sagittal planes) is restricted as a result of insufficient transmission of the relevant information.

    Method:

    In normal-hearing listeners, localization within the sagittal planes is mediated by the spectral coloring that the pinna (outer ear) imposes on incoming waveforms at higher frequencies. Current CI systems do not provide these so-called pinna cues (or spectral cues) because of the behind-the-ear microphone placement and the processor's limited analysis-frequency range.

    While these technical limitations are relatively manageable, some fundamental questions arise:

    • What is the minimum number of channels required to encode the pinna cues relevant to vertical plane localization?
    • To what extent can CI listeners learn to localize sound sources using pinna cues that are mapped to tonotopic regions associated with lower characteristic frequencies (according to the position of typically implanted electrodes)?
    • Which modifications of stimulation strategies are required to facilitate the localization of sound sources for CI listeners?

    Application:

    The improvement of sound source localization in the front-back dimension is regarded as an important aspect in daily traffic safety.

    Funding:

    FWF (Austrian Science Fund): Project #P18401-B15

    Status:

    Finished in Sept. 2010

    Subprojects:

    • ElecRang: Effects of upper-frequency boundary and spectral warping on speech intelligibility in electrical stimulation
    • SpecSens: Sensitivity to spectral peaks and notches
    • Loca-BtE-CI: Localization with behind-the-ear microphones
    • Loca Methods: Pointer method for localizing sound sources
    • Loca#Channels: Number of channels required for median place localization
    • SpatStrat: Development and evaluation of a spatialization strategy for cochlear implants
    • HRTF-Sim: Numerical simulation of HRTFs
  • Spatial Transform of Sound Fields (STSF)

    Objective:

    The Spatial Transform of Sound Fields (STSF) is an extension of acoustic holography that enables the handling of incoherent sound sources.

    Method:

    The Karhunen-Loève expansion, or principal component analysis (PCA), is used to separate the random field recorded at different microphone positions into coherent components. Acoustic holography is then used to transform each component from the measurement plane into a plane at arbitrary depth. If needed, the total incoherent sound field at the chosen depth can be reconstructed.
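
    A minimal numerical sketch of the PCA/Karhunen-Loève separation step, with a hypothetical 8-microphone array and two synthetic incoherent sources; the holographic propagation of each component is not shown.

```python
# Sketch: Karhunen-Loève / PCA separation of an incoherent field measured
# at M microphones into mutually coherent components. Array geometry and
# source signals are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
M, T = 8, 2000                       # microphones, time snapshots
mix = rng.normal(size=(M, 2))        # how 2 incoherent sources hit the array
src = rng.normal(size=(2, T))        # independent (incoherent) source signals
p = mix @ src                        # pressure snapshots at the array

C = (p @ p.conj().T) / T             # cross-spectral (covariance) matrix
w, V = np.linalg.eigh(C)             # KL expansion: eigen-decomposition
order = np.argsort(w)[::-1]          # sort eigenvalues descending
w, V = w[order], V[:, order]

# Each eigenvector is a fully coherent component. Each component can be
# propagated separately by acoustic holography; summing the propagated
# components energetically reconstructs the incoherent field at depth.
coherent_components = V[:, w > 1e-6 * w[0]]
print(coherent_components.shape)     # two significant components expected
```

    With two independent sources, the covariance matrix has rank two, so exactly two eigenvalues stand out above numerical noise.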

    Application:

    Localization of incoherent sound sources near the hull of the structure.

  • SpExCue: Role of spectral cues in sound externalization - objective measures & modeling

    Baumgartner et al. (2017a)

    Spatial hearing is important for continuously monitoring the environment for interesting or dangerous sounds and for directing attention to them. The spatial separation of the two ears and the complex geometry of the human body provide acoustic information about the location of a sound source. Depending on the direction of sound incidence, the pinna in particular modifies the sound spectrum before the sound reaches the eardrum. Since the shape of the pinna is highly individual (even more so than a fingerprint), its spectral coloring is also highly individual. For the artificial creation of realistic auditory perceptions, this individuality must be reproduced as precisely as necessary, but it has not yet been clarified what is really necessary. SpExCue therefore investigated electrophysiological measures and predictive models that can capture how spatially realistic ("externalized") a virtual source is perceived to be.

    Since artificial sources tend to be perceived inside the head, the investigation of these sound spectra was also suited to exploring a bias in auditory perception: sound events approaching the listener are perceived as more intense than those receding from the listener. Earlier studies demonstrated this bias exclusively through loudness changes (increasing/decreasing loudness was used to simulate approaching/receding sound events). It was therefore unclear whether the bias is really due to perceptual differences with respect to the direction of motion, or merely to the different loudness levels. Our study demonstrated that spatial changes in timbre can evoke these biases (both behaviorally and electrophysiologically) even at constant loudness, so a general perceptual bias can be assumed.

    Furthermore, SpExCue investigated how the combination of different spatial auditory cues influences attentional control in a speech-recognition task with simultaneous talkers, as at a cocktail party. We found that natural combinations of spatial auditory cues evoke more brain activity in preparation for the test signal, thereby optimizing the neural processing of the subsequent speech.

    SpExCue also compared different computational modeling approaches that aim to predict the spatial perception of sound changes. Although many previous experimental results could be predicted by at least one of the modeling approaches, none of them could explain all of these results. To support the future development of more generally valid computational models of spatial hearing, we finally developed a conceptual cognitive model for this purpose.

    Funding

    Erwin-Schrödinger Fellowship from Austrian Science Funds (FWF, J3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.

    Follow-up funding provided by Facebook Reality Labs, since March 2018. Project Investigator: Robert Baumgartner.

    Publications

    • Baumgartner, R., Reed, D.K., Tóth, B., Best, V., Majdak, P., Colburn H.S., Shinn-Cunningham B. (2017): Asymmetries in behavioral and neural responses to spectral cues demonstrate the generality of auditory looming bias, in: Proceedings of the National Academy of Sciences of the USA 114, 9743-9748. (article)
    • Baumgartner, R., Majdak, P., Colburn H.S., Shinn-Cunningham B. (2017): Modeling Sound Externalization Based on Listener-specific Spectral Cues, presented at: Acoustics ‘17 Boston: The 3rd Joint Meeting of the Acoustical Society of America and the European Acoustics Association. Boston, MA, USA. (conference)
    • Deng, Y., Choi, I., Shinn-Cunningham, B., Baumgartner, R. (2019): Impoverished auditory cues limit engagement of brain networks controlling spatial selective attention, in: Neuroimage 202, 116151. (article)
    • Baumgartner, R., Majdak, P. (2019): Predicting Externalization of Anechoic Sounds, in: Proceedings of ICA 2019. (proceedings)
    • Majdak, P., Baumgartner, R., Jenny, C. (2019): Formation of three-dimensional auditory space, in: arXiv:1901.03990 [q-bio]. (preprint)
  • Stochastic Transformation Methods (Acoustics and Vibration)

    Objective:

    In the past, an FWF project dealing with the basics of stochastic transformation methods was carried out at the ARI. Specifically, the Karhunen-Loève expansion and the polynomial chaos transformation were applied in the wavenumber domain. The procedure is based on the assumption of Gaussian-distributed variables. This assumption shall be generalized to arbitrary random variables.

    Method:

    The assumption of a wavenumber domain limits the model to a horizontally layered half-space. This limitation shall be overcome by using wavelet kernels instead of Fourier kernels in the transformation. The aim is to make it possible to calculate one-sided statistical distributions for the physical parameters and arbitrary boundaries with the new method.

  • STx Database Functions

    Project Objective:

    This project aims to implement generally applicable database functions in STOOLS-STx, for example access to sound data and the management of metadata for segments and annotations (list, sort, select, etc.). The essence (sound data), segmentations, and manually compiled and calculated annotations (e.g. wave band level) form the basis of an integrated sound database. An essential requirement is that the original sound data remain unchanged; the metadata therefore have to be stored in separate *.xml files. In this way, dynamic management of the metadata is possible, while the sound files are only ever opened in a write-protected way.
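
    A sidecar metadata file for a sound recording might look like the sketch below; the element and attribute names here are hypothetical and need not match STx's actual schema, but they illustrate the separation of read-only sound data from editable metadata.

```xml
<!-- hypothetical sidecar file recording.xml, stored next to recording.wav;
     the sound file itself is never modified -->
<stx-metadata soundfile="recording.wav">
  <segment id="seg001" start="1.250" length="0.730">
    <annotation type="transcription">guten Tag</annotation>
    <annotation type="band-level" unit="dB">62.4</annotation>
  </segment>
</stx-metadata>
```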

    Method:

    The segment lists implemented in STx keep all annotations linked to each individual sound segment. The segments remain in the context of the continuous sound recordings and enable segment addressing, i.e. the numerical identification, processing, and retrieval of the segments (including their acoustic surroundings).

    Application:

    Signal databases are the basis for practically all applications that use realistic sound material: time-frequency representations, statistics, principal component analysis, cluster analysis, etc. Furthermore, signal databases are used for the realization of subjective evaluations and psychoacoustic experiments. The STx databases can handle more than 100 sound files and thousands of segments in a very short time.

    Ref.:

    PACS: 43.50.Rq; Project: NOIDESc: Deskriptoren zur Bewertung von Lärmsignalen (FFG-809085, bmvit-isb2). PACS: 43.72.Fx; Project: Akustische Phonetik, Sprechererkennung.

  • STx Framework for the Analysis, Resynthesis, and Graphical Modification of Signals

    Description:

    Development of a framework for analysis and resynthesis (phase vocoder) and for the graphically supported modification of signals. The implementation is realized as an STx script.

    To realize this project, the STx graphics system is being extended with a general selection tool (selection of arbitrary regions). In addition, a file object is being developed that can be used simultaneously by the script (for data generation and modification) and by the graphics system.

  • STx Graph Rotation and Flip

    Objective:

    Extend the S_TOOLS-STx graphics library to enable graphs to be rotated and flipped horizontally or vertically without having to recalculate the data.

    Method:

    The base graphics object class was modified, adding rotation and flip settings. A new macro command was implemented to set a graph's rotation-and-flip values. Each graphic object was then modified to display itself correctly according to these settings.
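
    The display-time remapping can be illustrated with a small coordinate-transform sketch; the parameter names below are illustrative, not the actual STx settings, but they show why the underlying data never needs to be recalculated.

```python
# Sketch: mapping a data point to screen coordinates under rotation/flip
# settings, so the stored data stays untouched. Setting names are
# hypothetical; STx's real implementation differs.
def map_point(x, y, width, height, rot90=0, flip_h=False, flip_v=False):
    """Return the screen position of data point (x, y) in a width x height
    viewport after `rot90` quarter-turns and optional flips."""
    for _ in range(rot90 % 4):
        x, y = height - 1 - y, x          # rotate 90 degrees clockwise
        width, height = height, width     # viewport dimensions swap
    if flip_h:
        x = width - 1 - x                 # mirror horizontally
    if flip_v:
        y = height - 1 - y                # mirror vertically
    return x, y

print(map_point(0, 0, 100, 50, rot90=1))      # -> (49, 0)
print(map_point(0, 0, 100, 50, flip_h=True))  # -> (99, 0)
```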

    Application:

    Rotating and flipping graphical representations of data can now be achieved with one macro command, rather than having to modify the data itself.

  • STx Implementation of a Script Console

    Description:

    A console is extremely helpful when developing scripts and applications. This new script console can execute (almost) all STx commands and allows access to variables and objects. The console was implemented with the help of an extended edit control and a macro class. Command line oriented versions of current STx functionality will be developed as and when needed.

    The script debugger was extended during the console development, and a number of thread synchronisation problems were solved.

  • STx ManualScript: Script Programmer's Handbook

    Description:

    The "STx Script Programming" guide written for internal workshops has been integrated into the online help with the title "Becoming an STx Guru (in 538.338 simple steps)". This chapter contains a general description of the STx script language (including tips and tricks) and an overview of several important STx components and attributes (e.g. shell items).

  • STx MethodSprach: Speech Parameter Extraction

    Description:

    The methods for extracting speech parameters, most notably formants, have been improved. First, a new method was implemented which measures formants taking some formant attributes (a model) into account. The model uses frequency-range and rate-of-change attributes. Tracking is limited to the voiced parts of the signal. A test version of this method was integrated into the STx speech analysis application SPEXL.

  • STx Monitoring System

    Project Objective:

    The project aims to implement an automatic sound recording system that allows continuous sound recordings of any length (several weeks) without user intervention. Long-term recording of sound data is used for the observation of noise emission from machines in continuous operation and for the documentation of noise situations. The hardware and system complexity are to be restricted to standard measurement microphones and standard PCs or, for continuous operation, suitable laptops with external disk storage units.

    Method:

    The recorder contained in the standard S_TOOLS-STx software package is controlled by macro programming so that sound files (about one hour in length) are generated consecutively with date-and-time stamps and written to the hard disk. Intelligent segmentation algorithms insert automatic "tags" and annotations in real time or in post-processing. The segmentation data are administrated dynamically and enable the direct inclusion of the recorded sound events in sound-data files, signal-analysis processes, and statistical processes. Using currently available storage units, e.g. disk storage on the scale of 1.28 TB, continuous measuring (2-channel stereo, 44.1 kHz, 16-bit) over a period of about 2.5 months is possible.
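
    The quoted storage figure can be checked with simple arithmetic:

```python
# Check of the recording-duration figure: continuous 2-channel, 44.1 kHz,
# 16-bit (2 bytes/sample) recording onto 1.28 TB of disk.
bytes_per_second = 44_100 * 2 * 2          # rate * channels * bytes/sample
disk_bytes = 1.28e12
days = disk_bytes / bytes_per_second / 86_400
print(f"{days:.0f} days (~{days / 30:.1f} months)")  # -> 84 days (~2.8 months)
```

    About 84 days, i.e. roughly the 2.5 months stated above.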

    Application:

    For the investigation of noise emissions, e.g. traffic and environmental noise, permanent control stations that measure all sound sources in their temporal context are needed. Only with a broad analysis of the whole situation can noise pollution and health risks be assessed.

    Ref.:

    PACS: 43.50.Rq; Project: NOIDESc: Deskriptoren zur Bewertung von Lärmsignalen (FFG-809085, bmvit-isb2).

  • STx Sampling Rate Converter

    Description:

    Implement a method for up-sampling and down-sampling a signal. The conversion is done by resampling the original signal after applying an ideal low-pass filter (sin x/x). The method is implemented as an SPAtom and integrated into the SPEXL speech analysis script.
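
    The sinc-based conversion can be sketched with SciPy's polyphase resampler, which applies a windowed-sinc low-pass filter while resampling. This only illustrates the principle; it is not the STx SPAtom implementation.

```python
# Sketch: rate conversion via a low-pass (windowed-sinc) polyphase filter.
import numpy as np
from math import gcd
from scipy.signal import resample_poly

fs_in, fs_out = 22_050, 44_100               # example conversion: up by 2
t = np.arange(fs_in) / fs_in                 # one second of signal
x = np.sin(2 * np.pi * 440 * t)              # 440 Hz test tone

g = gcd(fs_out, fs_in)
y = resample_poly(x, fs_out // g, fs_in // g)  # resample by rational factor
print(len(x), len(y))                          # -> 22050 44100
```

    The same rational-factor approach covers down-sampling and the correction of 'incorrect' sampling rates.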

    Application:

    Convert recordings with unusual sampling rates, match recordings with different sampling rates, and correct 'incorrect' sampling rates.

  • STx SegmTrans: Segmentation and Transcription Tool

    Objective:

    STx includes a number of applications with which one can segment and transcribe signals. All of these applications, however, have additional functionality not necessary for transcription (e.g. parameter extraction), and have not been optimised with transcription in mind. Since transcription and segmentation represent a large part of the time spent on speech analysis, this project's aim is to develop an application which includes everything needed for the job and is easy to use. The application is included in the speech analysis script SPEXL.

    Features:

    • waveform and spectrogram signal representations
    • convenient signal bracketing, zoom and playback functionality
    • existing segments displayed in graphs and as list
    • direct input of text and metadata
    • configurable attributes (segment templates)
    • interface to all STx analysis applications
    • hotkeys for all essential functions
  • STx SpektrTrans: Integration of New Spectral Transformations

    Description:

    The time/frequency transformations (wavelets, Cohen's class distributions) developed in earlier projects (2006/2007) have been integrated into the spectrogram and parameter viewer in STx.

  • SysBahnlärm

    Objective:

    SysBahnLärm was a joint project of the ARI with TU Vienna, the Austrian Railways (ÖBB), and industrial partners, funded by the FFG as well as the ÖBB. The aim of the project was to create a handbook on the systemic reduction of railway noise. The ARI was responsible for the psychoacoustic evaluation of the effects of noise from wheels with different roughness and of different noise reduction systems, e.g. rail damping systems. Furthermore, the ARI investigated the emission pattern of the rail-wheel contact using our 64-channel microphone array.

    Method:

    Using measured train pass-by signals, a psychoacoustic testing procedure was developed and stimuli for this test were selected. Subjects had to rate the relative annoyance of different trains or different noise reduction systems with respect to each other.
    For investigating the rail-wheel contact, a beamforming technique was used in order to determine the point of the maximal emission relative to the top of the rail.
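
    A minimal delay-and-sum version of this localization idea is sketched below, with an entirely hypothetical geometry and synthetic signals: candidate source heights relative to the top of the rail are scanned, and the steering with maximal output power is picked.

```python
# Sketch: delay-and-sum beamforming with a vertical line array to find the
# height of maximal emission. Geometry, rate and signals are hypothetical.
import numpy as np

c, fs = 343.0, 48_000                       # speed of sound, sample rate
mics_y = np.linspace(-0.5, 0.5, 8)          # 8-element vertical line array
true_y, dist = 0.10, 3.0                    # source 10 cm above rail top

rng = np.random.default_rng(2)
s = rng.normal(size=4096)                   # broadband source signal

def delays(src_y):
    r = np.hypot(dist, mics_y - src_y)      # mic-to-source distances
    return (r - r.min()) / c                # relative delays in seconds

# Simulate array signals with integer-sample delays
d_true = np.round(delays(true_y) * fs).astype(int)
x = np.stack([np.roll(s, d) for d in d_true])

def power(steer_y):
    # undo the hypothesised delays and sum; coherent if steer matches source
    d = np.round(delays(steer_y) * fs).astype(int)
    aligned = np.stack([np.roll(x[m], -d[m]) for m in range(len(mics_y))])
    return np.sum(aligned.sum(axis=0) ** 2)

scan = np.linspace(-0.3, 0.3, 61)           # candidate heights, 1 cm grid
best = scan[np.argmax([power(y) for y in scan])]
print(f"estimated source height: {best:.2f} m")
```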

    Application:

    The handbook should act as a guideline for the different noise reduction measures and their respective advantages and problems.

  • TFMask: Time Frequency Masking

    Objective:

    A Gaussian atom is suitable as an ideal atom for the time-frequency representation of human auditory perception, not only because of the Gaussian atom's special mathematical features, but also because of results from existing psychoacoustic studies. Developing a time-frequency mask (occlusion) requires testing the time-frequency masking effects of this atom. So far, short time-limited signals have not been investigated in masking experiments, and only relatively few psychoacoustic experiments have combined time and frequency masking effects.

    Method:

    In cooperation with the Laboratory for Mechanics and Acoustics (LMA / CNRS Marseille), an experimental protocol was developed for testing the time-frequency masking produced by a single Gaussian atom. Experiments were conducted for the first time in 2006 and gave first results concerning the hearing threshold and the audibility of such a signal. The experiments on the masking threshold began as a PhD project in Marseille before the end of 2006.

    Application:

    Efficient implementation of a masking filter offers many applications:

    • Sound / Data Compression
    • Sound Design
    • Back-and-Foreground Separation
    • Optimization of Speech and Music Perception

    After completing the testing phase, the algorithms are to be implemented in S_TOOLS-STx.

    Subprojects:

    • Amadée: Time Frequency Representations and Auditory Perception
    • Cotutelle de thèse
    • Experiments studying additivity of masking for multiple maskers

    Funding:

    WTZ (project AMADEUS)

    Publications:

    • Laback, B., Balazs, P., Toupin, G., Necciari, T., Savel, S., Meunier, S., Ystad, S., Kronland-Martinet, R. (2008). Additivity of auditory masking using Gaussian-shaped tones, presented at the Acoustics '08 conference.
  • TFMaskEval: Time Frequency Masking: Gabor Multiplier Models and Evaluation

    Objective:

    The objective is to replace the very simple spreading function used in the current convolution-based irrelevance algorithm with a more elaborate 2-D kernel, developed from the first measurements of the time-frequency masking effects of a Gaussian atom.

    Method:

    An extension of the simultaneous irrelevance model is used as the most basic model for the convolution algorithm under investigation. A triangle-like function describes the masking effect in the frequency and time directions; combined, they result in a 2-D function that is convolved with the time-frequency coefficients of the given signal to calculate a threshold function. This can be implemented as a Gabor multiplier. This very simple function should be exchanged for a more elaborate 2-D kernel developed from the first time-frequency masking measurements of a Gaussian atom.

    Application:

    After thoroughly testing this algorithm in psychoacoustic experiments, it will be implemented in STx.

    Partners:

    • R. Kronland-Martinet, S. Ystad, T. Necciari, Modélisation, Synthèse et Contrôle des Signaux Sonores et Musicaux of the LMA / CNRS Marseille
    • S. Meunier, S. Savel, Acoustique perceptive et qualité de l'environnement sonore of the LMA / CNRS Marseille

    Project-completion:

    This project ended on 28.02.2008 and was incorporated into the WWTF 'High Potential' project MULAC.