Project

  • Objective:

    Estimating the parameters of the vocal tract system yields features that can be used for speaker identification.

    Method:

    A preferred speech coding technique is the so-called Model-Based Speech Coding (MBSC), which involves modeling the vocal tract as a linear time-variant system (synthesis filter). The system's input is either white noise or a train of impulses. For coding purposes, the synthesis filter is assumed to be time-invariant during a short time interval (time slot) of typically 10-20 msec. Then, the signal is represented by the coefficients of the synthesis filter corresponding to each time slot.

    A successful MBSC method is the so-called Linear Prediction Coding (LPC). Roughly speaking, the LPC technique models the synthesis filter as an all-pole linear system whose coefficients are obtained by adapting a predictor of the output signal based on its own previous samples. The use of an all-pole model provides a good representation for the majority of speech sounds. However, the representation of nasal sounds, fricative sounds, and stop consonants requires the use of a zero-pole model. Moreover, the LPC technique is not adequate when the voice signal is corrupted by noise.
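
    The all-pole estimation can be sketched as follows (autocorrelation method, solving the Yule-Walker normal equations; a minimal illustration, not the implementation used in this project):

```python
import numpy as np

def lpc(x, order):
    """Estimate all-pole (LPC) coefficients with the autocorrelation method.

    Solves the Yule-Walker normal equations R a = r, where R is the
    Toeplitz autocorrelation matrix of the (windowed) speech frame x.
    """
    x = np.asarray(x, dtype=float)
    # biased autocorrelation up to lag `order`
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])   # predictor coefficients
    err = r[0] - np.dot(a, r[1:])   # prediction-error power of the frame
    return a, err
```

    For a frame of an AR process x[n] = 0.9 x[n-1] + e[n], the first coefficient converges to 0.9; the synthesis filter is then 1 / (1 - sum_k a_k z^-k), driven by noise or an impulse train as described above.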

    We propose a method to estimate a zero-pole model that provides the optimal synthesis filter coefficients; the method is numerically efficient and optimal with respect to a logarithmic error criterion.

    Evaluation:

    In order to evaluate the perceptual relevance of the proposed method, we used the model estimated from a speech signal to re-synthesize it (audio examples: re-synthesized sound and original sound).


  • French-Austrian bilateral research project funded by the French National Agency of Research (ANR) and the Austrian Science Fund (FWF, project no. I 1362-N30). The project involves two academic partners, namely the Laboratory of Mechanics and Acoustics (LMA - CNRS UPR 7051, France) and the Acoustics Research Institute. At the ARI, two research groups are involved in the project: the Mathematics and Signal Processing in Acoustics and the Psychoacoustics and Experimental Audiology groups.

    Principal investigators: Thibaud Necciari (ARI), Piotr Majdak (ARI) and Olivier Derrien (LMA).

    Running period: 2014-2017 (project started on March 1, 2014).

    Abstract:

    One of the greatest challenges in signal processing is to develop efficient signal representations. An efficient representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of binary data required for storage or transmission, it is desirable that the representation take human auditory perception into account and allow reconstruction with a controlled amount of perceived distortion. Over the last decades, many psychoacoustical studies investigated auditory masking, an important property of auditory perception. Masking refers to the elevation of the detection threshold of a sound in the presence of another sound. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency (t-f) masking effects in perceptual audio codecs. We recently conducted psychoacoustical studies on t-f masking that revealed the inaccuracy of such simple combinations. These new data on t-f masking represent a crucial basis for accounting for masking effects in t-f representations of sounds. Although t-f representations are standard tools in audio processing, the development of a t-f representation of audio signals that is mathematically founded, perception-based, perfectly invertible, and possibly minimally redundant remains a challenge. POTION thus addresses the following questions:

    1. To what extent is it possible to obtain a perception-based (i.e., as close as possible to “what we see is what we hear”), perfectly invertible, and possibly minimally redundant t-f representation of sound signals? Such a representation is essential for modeling complex masking interactions in the t-f domain and is expected to improve our understanding of auditory processing of real-world sounds. Moreover, it is of fundamental interest for many audio applications involving sound analysis-synthesis.
    2. Is it possible to improve current perceptual audio codecs by considering a joint t-f approach? To reduce the size of digital audio files, perceptual audio codecs like MP3 decompose sounds into variable-length time segments, apply a frequency transform, and use masking models to control the sub-quantization of transform coefficients within each segment. Thus, current codecs follow mainly a spectral approach, although temporal masking effects are taken into account in some implementations. By combining an efficient perception-based t-f transform with a joint t-f masking model in an audio codec, we expect to achieve significant performance improvements.

    Working program:

    POTION is structured in three main tasks:

    1. Perception-based t-f representation of audio signals with perfect reconstruction: A linear and perfectly invertible t-f representation will be created by exploiting the recently developed non-stationary Gabor theory as a mathematical background. The transform will be designed so that its t-f resolution mimics the t-f analysis properties of the auditory system and so that as little redundancy as possible is introduced, in order to maximize the coding efficiency.
    2. Development and implementation of a t-f masking model: Based on psychoacoustical data on t-f masking collected by the partners in previous projects and on literature data, a new, complex model of t-f masking will be developed and implemented in the computationally efficient representation built in task 1. Additional psychoacoustical data required for the development of the model, involving frequency, level, and duration effects in masking for either single or multiple maskers will be collected. The resulting signal processing algorithm should represent and re-synthesize only the perceptually relevant components of the signal. It will be calibrated and validated by conducting listening tests with synthetic and real-world sounds.
    3. Optimization of perceptual audio codecs: This task represents the main application of POTION. It will consist in combining the new efficient representation built in task 1 with the new t-f masking model built in task 2 for implementation in a perceptual audio codec.

    More information on the project can be found on the POTION web page.

    Publications:

    • Chardon, G., Necciari, Th., Balazs, P. (2014): Perceptual matching pursuit with Gabor dictionaries and time-frequency masking, in: Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014). Florence, Italy, 3126-3130. (proceedings)

    Related topics investigated at the ARI:

  • Objective:

    Numerous implementations and algorithms for time-frequency analysis can be found in the literature or on the internet. Most of them are either not well documented or no longer maintained. P. Soendergaard therefore started to develop the Linear Time-Frequency Toolbox for MATLAB. The goal of this project is to find typical applications of this toolbox in acoustics and to incorporate successful, not-yet-implemented algorithms into STx.

    Method:

    The Linear Time-Frequency Toolbox is a small open-source MATLAB toolbox with functions for working with Gabor frames for finite sequences. It includes a 1D discrete Gabor transform (sampled STFT) with its inverse, works with both full-length and short windows, and computes the canonical dual and canonical tight windows.
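
    The core operations can be sketched in a few lines of numpy: a sampled STFT with perfect reconstruction via the dual window in the painless case (window length not exceeding the number of frequency bins). This is an illustration only, not code from the toolbox itself:

```python
import numpy as np

def dgt(x, g, a, M):
    """Sampled STFT (discrete Gabor transform): hop a, M bins, len(g) <= M."""
    L = len(g)
    n_frames = (len(x) - L) // a + 1
    return np.array([np.fft.fft(x[k*a:k*a+L] * g, M) for k in range(n_frames)])

def idgt(C, g, a, length):
    """Synthesis via the dual window in the painless case: weighted
    overlap-add normalized by the diagonal frame operator sum of g^2."""
    L = len(g)
    num = np.zeros(length)
    den = np.zeros(length)
    for k in range(C.shape[0]):
        seg = np.fft.ifft(C[k])[:L].real
        num[k*a:k*a+L] += seg * g
        den[k*a:k*a+L] += g ** 2
    den[den == 0] = 1.0
    return num / den
```

    With a window that is nonzero at its endpoints (e.g., Hamming) and hop a <= len(g), reconstruction is exact on the covered samples.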

    Application:

    These algorithms are used in acoustic applications such as formant analysis, data compression, and de-noising. The implementations are compared to those in STx and will be incorporated into this software package if they improve its performance.

    Partners:

    • H. G. Feichtinger et al., NuHAG, Faculty of Mathematics, University of Vienna
    • B. Torrèsani, Groupe de Traitement du Signal, Laboratoire d'Analyse Topologie et Probabilités, LATP/ CMI, Université de Provence, Marseille
    • P. Soendergaard, Department of Mathematics, Technical University of Denmark
  • Objective:

    If measurements are possible only at the hull of a machine, a tool is needed to separate the dominating near-field components from the far-field components. This, in turn, allows the far-field levels to be estimated. The separation is often not possible using spectral methods, because both components have nearly the same frequency. Using a limited number of microphones, a modal separation is also impossible. Instead of a modal analysis, a principal component analysis is applied.

    Method:

    The narrow-band Fourier transform method is used, and a separate analysis is conducted for each frequency. The cross-power matrix spanning all microphone positions is used. The components are then calculated using the PCA. As long as the modes at the microphone positions have different relative values, PCA can be used to separate them. In an initial test, the far field is observed and the transfer function for every component from the near field to the far field is estimated. These transfer functions are assumed to be constant in time. They are used for the estimation of the overall far-field level.
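
    The per-frequency separation step can be sketched as follows (a minimal sketch, assuming snapshots of narrow-band complex amplitudes per microphone at one analysis frequency; function names are illustrative):

```python
import numpy as np

def principal_components(X):
    """X: (snapshots, mics) narrow-band amplitudes at one analysis frequency.

    Builds the cross-power (sample covariance) matrix over all microphone
    positions and returns its eigenvalues (descending) and eigenvectors,
    i.e., the principal components of the recorded field.
    """
    C = X.conj().T @ X / X.shape[0]   # cross-power matrix across mics
    w, V = np.linalg.eigh(C)          # Hermitian eigendecomposition
    order = np.argsort(w)[::-1]
    return w[order], V[:, order]
```

    If two components have different relative values across the microphones, they appear as two dominant eigenvalues and can be separated, as described above.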

    Application:

    Observation of the far-field level of machines.

  • Objective:

    The sensitivity of normal hearing listeners to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). Cochlear implant (CI) listeners show a similar limitation in ITD sensitivity with respect to the rate of unmodulated pulse trains containing ITD. Unfortunately, such high rates are needed to appropriately sample the modulation information of the acoustic signal. This study tests the ideas that (1) similar "binaural adaptation" mechanisms are limiting the performance in both subject groups, (2) the effect is related to the periodicity of pulse trains, and (3) introducing jitter (randomness) into the pulse timing causes a recovery from binaural adaptation and thus improves ITD sensitivity at higher pulse rates.

    Method and Results:

    These ideas were studied by testing the ITD sensitivity of five CI listeners. The parameters pulse rate, amount of jitter (where the minimum represents the periodic condition), and ITD were varied. We showed that introducing binaurally synchronized jitter into the stimulation timing causes large improvements in ITD sensitivity at higher pulse rates (≥ 800 pps). Our experimental results demonstrate that a purely temporal trigger can cause a recovery from binaural adaptation.
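
    The stimulus manipulation can be sketched as follows. The parameterization (uniform jitter as a fraction of the nominal period, the function names, and the specific values) is hypothetical and only illustrates the idea of binaurally synchronized jitter with an imposed ITD:

```python
import numpy as np

def jittered_pulse_times(rate, duration, jitter, rng):
    """Pulse onset times (s) at a nominal rate with uniform timing jitter.

    jitter = 0 gives a periodic train; jitter = 1 lets each pulse move
    anywhere within +/- half its period (hypothetical parameterization).
    """
    period = 1.0 / rate
    n = int(duration * rate)
    nominal = (np.arange(n) + 0.5) * period
    offsets = rng.uniform(-0.5, 0.5, n) * jitter * period
    return nominal + offsets

def binaural_train(rate, duration, jitter, itd, rng):
    """Binaurally synchronized jitter: identical timing in both ears,
    with one ear's copy delayed by the ITD."""
    left = jittered_pulse_times(rate, duration, jitter, rng)
    right = left + itd
    return left, right
```

    The key property is that the jitter is identical in both ears, so the interaural delay of every pulse pair equals the imposed ITD.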

    Application:

    Applying binaural jitter in stimulation strategies may improve several aspects of binaural hearing in bilateral recipients of CIs, including the localization of sound sources and speech segregation in noise.

    Funding:

    Internal

    Publications:

    • Laback, B., and Majdak, P. (2007). Binaural jitter improves interaural time-difference sensitivity of cochlear implantees at high pulse rates, Proc Natl Acad Sci USA (PNAS) 105, 2, 814-817.
    • Laback, B., and Majdak, P. (2008). Reply to van Hoesel: Binaural jitter with cochlear implants, improved interaural time-delay sensitivity, and normal hearing, letter to Proc Natl Acad Sci USA 12, 105, 32.
    • Laback, B., and Majdak, P. (2007). Binaural stimulation in neural auditory prostheses or hearing aids, provisional US and EP patent application (submitted 20.06.07).
  • Objective:

    The sensitivity of normal hearing (NH) listeners to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). In another study (Laback and Majdak, 2008), it was hypothesized that introducing binaural jitter may improve ITD sensitivity in bilateral cochlear implant (CI) listeners by avoiding periodicity. Indeed, the results showed large improvements at high rates (≥ 800 pps). This was interpreted as an indication for a recovery from binaural adaptation. 

    In this study, we further investigated this effect using NH subjects. We attempted to understand the underlying mechanisms by applying a well-established model of peripheral auditory processing. 

    Method and Results:

    Bandpass-filtered clicks (centered at 4 kHz) were used at a nominal pulse rate of 600 pulses per second (pps). It was found that randomly jittering the timing of the pulses significantly increases the detectability of the ITD. A second experiment examined the effects of place and rate for pulse trains. ITD sensitivity for jittered pulse trains at 1200 pps was significantly higher than for periodic pulse trains at 600 pps; therefore, with the addition of jitter, listeners were not solely benefiting from the longest interpulse intervals and instances of reduced rate. A third experiment, using a 900-pps pulse train, confirmed the improvement in ITD sensitivity even when random amplitude modulation, a side effect of large amounts of jitter, was ruled out. A model of peripheral auditory processing up to the brain stem (cochlear nucleus) was applied to study the mechanisms underlying the improvements in ITD sensitivity. It was found that the irregular timing of the jittered pulses increases the synchrony of firing in the cochlear nucleus. These results suggest that a recovery from binaural adaptation triggered by temporal irregularity possibly occurs at the level of the cochlear nucleus.

    Application:

    Together with the results of Laback and Majdak (2008) on the effect of binaural jitter in CI listeners, these results suggest that the binaural adaptation effect first observed by Hafter and Dye (1983) is related to the synchrony of neural firings across auditory nerve fibers. The nerve fibers, in turn, innervate cochlear nucleus cells. At higher rates, periodic pulse trains result in little synchrony of the response to the ongoing signal. Jittering the pulse timing increases the probability of synchronous firing across AN fibers at certain instances of time. Further studies are required to determine if other aspects of binaural adaptation can also be attributed to this explanation. 

    Funding:

    Internal

    Publications:

    • Goupell, M. J., Laback, B., Majdak, P. (2009): Enhancing sensitivity to interaural time differences at high modulation rates by introducing temporal jitter, in: J. Acoust. Soc. Am. 126, 2511-2521.
    • Laback, B., and Majdak, P. (2007): Binaural jitter improves interaural time-difference sensitivity of cochlear implantees at high pulse rates, in: Proc. Natl. Acad. Sci. USA (PNAS) 105, 2, 814-817.
    • Laback, B., and Majdak, P. (2008): Reply to van Hoesel: Binaural jitter with cochlear implants, improved interaural time-delay sensitivity, and normal hearing, letter to Proc. Natl. Acad. Sci. USA 12, 105, 32.
  • Objective:

    Normal hearing (NH) listener sensitivity to interaural time differences (ITD) in the envelope of high-frequency carriers is limited with respect to the envelope modulation rate. Increasing the envelope rate reduces the sensitivity, an effect that has been termed binaural adaptation (Hafter and Dye, 1983). In other studies (Laback and Majdak, 2008; Goupell et al., 2008), it has been shown that introducing binaural jitter improves ITD sensitivity at higher rates in bilateral cochlear implant (CI) listeners as well as in NH listeners. The results were interpreted in terms of a recovery from binaural adaptation. Sensorineural hearing impairment often results in reduced ITD sensitivity (e.g. Hawkins and Wightman, 1980). The present study investigates if a similar recovery from binaural adaptation, and thus an improvement in ITD sensitivity, can be achieved in hearing impaired listeners. 

    Method and Results:

    Bandpass-filtered clicks (4 kHz) with pulse rates of 400 and 600 pulses per second (pps) are used. Different amounts of jitter (the minimum representing the periodic condition) and different ITDs are tested. Listeners with a moderate cochlear hearing loss are selected. Additional stimuli tested are bandpass-filtered noise bands at 4 kHz and low-frequency stimuli at 500 Hz (sinusoids, SAMs, noise bands, and jittered pulse trains). The levels of the stimuli are adjusted in pretests to achieve a centered auditory image at a comfortable loudness.

    Data collected so far show improvements in ITD sensitivity in some individuals but not in others.

    Application:

    The results may lead to the design of a new hearing aid processing algorithm that attempts to improve ITD sensitivity.

    Funding:

    Internal

  • This project consists of three subprojects:

    1.1 Frame & Gabor Multiplier:

    Recently, Gabor multipliers have been used to implement time-variant filtering as Gabor filters. This idea can be generalized further. To investigate the basic properties of such operators, the concept of abstract (i.e., unstructured) frames is used. Such multipliers are operators where a certain fixed mask, a so-called symbol, is applied to the coefficients of the frame analysis before synthesis is performed. The properties found for this case can then be used for all kinds of frames, for example regular and irregular Gabor frames, wavelet frames, or auditory filterbanks.
     
    The basic definition of a frame multiplier follows: given an analysis frame (ψ_k), a synthesis frame (φ_k), and a symbol (m_k), the multiplier is the operator M f = Σ_k m_k ⟨f, ψ_k⟩ φ_k.
    As a special case of such multipliers, operators for irregular Gabor systems will be investigated and implemented. This corresponds to an irregularly sampled short-time Fourier transform. As an application, an STFT corresponding to the Bark scale can be examined.
    This mathematical, basic-research-oriented project is important for many other projects, such as time-frequency masking or system identification.
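
    As a minimal illustration of the definition above, take the orthonormal DFT basis as both analysis and synthesis frame; the frame multiplier then reduces to a pointwise mask on the Fourier coefficients (any other frame follows the same analyze-mask-synthesize pattern):

```python
import numpy as np

def frame_multiplier(x, symbol):
    """M f = sum_k m_k <f, psi_k> phi_k, with the orthonormal DFT basis
    as both analysis and synthesis frame: a pointwise mask in frequency."""
    coeff = np.fft.fft(x)               # analysis: <f, psi_k>
    return np.fft.ifft(symbol * coeff)  # apply symbol, then synthesize
```

    For the identity symbol the operator is the identity; a 0/1 symbol acts as a (time-invariant, in this trivial frame) filter.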

    References:

    • O. Christensen, An Introduction To Frames And Riesz Bases, Birkhäuser Boston (2003)
    • M. Dörfler, Gabor Analysis for a Class of Signals called Music, Dissertation Univ. Wien (2002)
    • R.J. Duffin, A.C. Schaeffer, A Class of nonharmonic Fourier series, Trans.Amer.Math.Soc., vol.72, pp. 341-366 (1952)
    • H. G. Feichtinger, K. Nowak, A First Survey of Gabor Multipliers, in H. G. Feichtinger, T. Strohmer


  • Objective:

    Standard noise-mapping software uses geometrical approaches to determine the insertion loss of a noise barrier. These methods are not well suited for evaluating complex geometries, e.g., curved noise barriers or barriers with multiple diffracting edges. Here, we aim at deriving frequency-, source-position-, and receiver-position-dependent adjustments using the boundary element method (BEM). Furthermore, the effect of absorbing layers will be investigated as a function of the geometry. Results will be incorporated into standard noise-mapping software.

    Method:

    The cross-sections of different geometries are first parameterized and discretized and then evaluated using two-dimensional boundary element simulations. The BEM code was developed at our institute. Different parameter sets are evaluated in order to derive the adjustments for the specific geometries compared to a straight noise barrier. To make the simulations more realistic, a grassland impedance model is used instead of a fully reflecting half plane. Simulations will also be evaluated using measurements from actual noise barriers.

    Figure: Effect of a T-shaped noise barrier at 800 Hz.

    Project partners:

    • TAS Schreiner (measurements)
    • Soundplan (implementation in sound mapping software)

    Funding:

    This project is funded through the VIF2011 call of the FFG (BMVIT, ASFINAG, ÖBB).

  • Objective and Methods:

    Spectral peaks and notches are important cues that normal hearing listeners use to localize sounds in the vertical planes (the front/back and up/down dimensions). This study investigates to what extent cochlear implant (CI) listeners are sensitive to spectral peaks and notches imposed upon a constant-loudness background. 

    Results:

    Listeners could always detect peaks, but not always notches. Increasing the bandwidth beyond two electrodes showed no improvement in thresholds. The high-frequency place was significantly worse than the low and middle places, although listeners showed highly individual tendencies. Thresholds decreased with an increase in the height of the peak. Thresholds for detecting a change in the frequency of a peak or notch were approximately one electrode. Level roving significantly increased thresholds. Thus, there is currently no indication that CI listeners can perform a "true" profile analysis. Future studies will explore whether adding temporal cues or roving the level in equal-loudness steps, instead of equal-current steps (as in the present study), is relevant for profile analysis.

    Application:

    Data on the sensitivity to spectral peaks and notches are required to encode spectral localization cues in future CI stimulation strategies. 

    Funding:

    FWF (Austrian Science Fund): Project #P18401-B15

    Publications:

    • Goupell, M., Laback, B., Majdak, P., and Baumgartner, W. D. (2008). Current-level discrimination and spectral profile analysis in multi-channel electrical stimulation, J. Acoust. Soc. Am. 124, 3142-57.
    • Goupell, M. J., Laback, B., Majdak, P., and Baumgartner, W-D. (2007). Sensitivity to spectral peaks and notches in cochlear implant listeners, presented at Conference on Implantable Auditory Prostheses (CIAP), Lake Tahoe.
    The Spatially Oriented Format for Acoustics (SOFA) is dedicated to storing all kinds of acoustic information related to a specified geometrical setup. Its main task is to describe simple HRTF measurements, but SOFA also aims to provide the functionality to store more complex measurements, such as BRIRs captured with a 64-channel microphone array in a multi-source excitation situation, or the directivity of a loudspeaker. The format is intended to be easily extendable and highly portable, and it represents the greatest common denominator of all publicly available HRTF databases at the time of writing.

    SOFA defines the structure of data and metadata and stores them in a numerical container. The data description is hierarchical, ranging from free-field HRTFs (simple setup) to more complex setups such as microphone-array measurements in reverberant spaces excited by a loudspeaker array. We use a global geometry description (related to the room) and a local geometry description (related to the listener/source) without limiting the number of acoustic transmitters and receivers. Room descriptions can be provided by linking a CAD file within SOFA. Networking support allows client computers to remotely access HRTFs and BRIRs.
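
    For illustration, the core fields of a simple HRTF setup (following the SimpleFreeFieldHRIR convention) can be sketched as a plain data structure. This is a sketch only: actual SOFA files are netCDF containers, and the dimension sizes and positions below are made up:

```python
import numpy as np

M, R, N, fs = 3, 2, 256, 48000  # measurements, receivers (ears), taps, rate

hrtf = {
    # global metadata identifying the format and convention
    "GLOBAL:Conventions": "SOFA",
    "GLOBAL:SOFAConventions": "SimpleFreeFieldHRIR",
    # geometry: source positions (azimuth, elevation, radius), listener at origin
    "SourcePosition": np.array([[0, 0, 1.2], [90, 0, 1.2], [180, 0, 1.2]], float),
    "ListenerPosition": np.zeros((1, 3)),
    # data: one impulse response per measurement and receiver
    "Data.IR": np.zeros((M, R, N)),
    "Data.SamplingRate": fs,
}
```

    The separation of geometry (global vs. local) from the measured data is what lets the same container scale from this simple case to array measurements.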

    SOFA is being developed by many contributors worldwide. The development is coordinated at ARI by Piotr Majdak.

    Further information:

    www.sofaconventions.org.
  • Objective:

    Knowledge of sound absorption is essential for performing acoustic measurements and experiments under controlled acoustic conditions, especially when considering the acoustic influence of room boundaries.

    So-called "in-situ" methods allow the measurement of reflection and absorption coefficients under real conditions in a single measurement procedure. The proposed method captures the direct signal and the reflections in one measurement. The captured reflections include not only the interesting one from the tested surface, but also others from the surroundings. To isolate the reflection coming from the tested surface, the influence of the direct signal and of the other reflections must be cancelled.

    One known separation method uses a time-windowing technique to separate the direct signal from the reflections. When the impulse responses of the direct signal and of the reflections overlap in time, this method is no longer satisfactory, and a frequency-dependent windowing becomes necessary to separate the different parts of the signal. In the wavelet domain, such a separation of the interesting reflection can be observed.
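
    The time-windowing baseline can be sketched as follows, with hypothetical arrival times and amplitudes: an impulse response containing a direct pulse and a later surface reflection, a rectangular time window isolating the reflection, and a broadband reflection-coefficient estimate from the spectral magnitude ratio. When the two arrivals overlap, this rectangular window fails, which is what motivates the frequency-dependent (wavelet) approach:

```python
import numpy as np

fs = 48000                # sampling rate (assumed)
h = np.zeros(1024)        # measured impulse response (synthetic here)
h[100] = 1.0              # direct sound
h[400] = 0.6              # reflection from the tested surface (hypothetical)

# rectangular time windows isolating the two arrivals
reflection = np.zeros_like(h)
reflection[300:500] = h[300:500]
direct = np.zeros_like(h)
direct[:300] = h[:300]

# reflection-coefficient estimate: spectral magnitude ratio per frequency bin
ratio = np.abs(np.fft.rfft(reflection)) / np.abs(np.fft.rfft(direct))
```

    For these two clean impulses the ratio is flat at 0.6; with overlapping arrivals the windows leak into each other and the estimate degrades.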

    The objective of this project is to study how the use of wavelet multipliers could improve the efficiency of in-situ methods in this context.

    Method:

    A demonstrator system will be built to acquire the necessary measurements for the evaluation of absorption coefficients. This demonstrator will be used to evaluate the usefulness of the new methods in a semi-anechoic room.

    A systematic numeric study will be carried out on the acquired signals, in order to manually determine the symbol of a wavelet multiplier for the extraction of the reflected signal. The best parameters for optimal separation will then be investigated. This, in combination with the use of physical models, will help design a semi-automatic method for the calculation of the optimal multiplier symbol.

    Application:

    The improved measurement method will be available for the in-situ measurement of reflection and absorption coefficients.

  • Objective:

    In speaker identification and speaker verification, wrong classifications can result from a high similarity between speakers as represented in the speaker models. These similarities can be explored by applying cluster analysis.

    Method:

    In speaker detection, every speaker is represented by a Gaussian mixture model (GMM). By using a dissimilarity measure for these models (e.g., the cross-entropy), cluster analysis can be applied. Hierarchical agglomerative clustering methods can reveal structures in the form of a dendrogram.
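
    The dissimilarity computation can be sketched as follows, using single diagonal Gaussians in place of full GMMs so that a symmetric Kullback-Leibler divergence exists in closed form (for actual mixtures, cross-entropy-type measures are usually approximated numerically). Function names are illustrative:

```python
import numpy as np

def sym_kl_diag_gauss(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal Gaussians, used here
    as a simple stand-in for a GMM dissimilarity measure."""
    kl12 = 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1)
    kl21 = 0.5 * np.sum(np.log(var1 / var2) + (var2 + (mu2 - mu1) ** 2) / var1 - 1)
    return kl12 + kl21

def dissimilarity_matrix(models):
    """models: list of (mu, var) per speaker -> symmetric distance matrix,
    ready for hierarchical agglomerative clustering / dendrogram display."""
    n = len(models)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = sym_kl_diag_gauss(*models[i], *models[j])
    return D
```

    The resulting symmetric matrix is exactly the input expected by standard agglomerative clustering routines.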

    Application:

    Structures in speech corpora can be visualized and can therefore be used to select groups of highly similar or dissimilar speakers. The investigation of the structures concerning the aspect of misclassification can lead to model generation improvements.

  • Objective:

    Bilateral use of current cochlear implant (CI) systems allows for the localization of sound sources in the left-right dimension. However, localization in the front-back and up-down dimensions (within the so-called sagittal planes) is restricted as a result of insufficient transmission of the relevant information.

    Method:

    In normal-hearing listeners, localization within the sagittal planes is mediated by spectral cues: depending on the direction of sound incidence, the pinna (outer ear) imposes a spectral coloring on incoming waveforms at higher frequencies. Current CI systems do not provide these so-called pinna cues (or spectral cues) because of the behind-the-ear microphone placement and the processor's limited analysis frequency range.

    While these technical limitations are relatively manageable, some fundamental questions arise:

    • What is the minimum number of channels required to encode the pinna cues relevant to vertical plane localization?
    • To what extent can CI listeners learn to localize sound sources using pinna cues that are mapped to tonotopic regions associated with lower characteristic frequencies (according to the position of typically implanted electrodes)?
    • Which modifications of stimulation strategies are required to facilitate the localization of sound sources for CI listeners?

    Application:

    The improvement of sound source localization in the front-back dimension is regarded as an important aspect in daily traffic safety.

    Funding:

    FWF (Austrian Science Fund): Project #P18401-B15

    Status:

    Finished in Sept. 2010

    Subprojects:

    • ElecRang: Effects of upper-frequency boundary and spectral warping on speech intelligibility in electrical stimulation
    • SpecSens: Sensitivity to spectral peaks and notches
    • Loca-BtE-CI: Localization with behind-the-ear microphones
    • Loca Methods: Pointer method for localizing sound sources
    • Loca#Channels: Number of channels required for median place localization
    • SpatStrat: Development and evaluation of a spatialization strategy for cochlear implants
    • HRTF-Sim: Numerical simulation of HRTFs
  • Objective:

    The Spatial Transform of Sound Fields (STSF) is an extension of acoustic holography that enables the handling of incoherent sound sources.

    Method:

    The Karhunen-Loève expansion, or principal component analysis (PCA), is used to separate the random field recorded at different microphone positions into coherent components. Acoustic holography is then used to transform each component from the measurement plane into a plane at arbitrary depth. If needed, the total incoherent sound field at the chosen depth can be reconstructed.

    Application:

    Localization of incoherent sound sources near the hull of the structure.

  • Baumgartner et al. (2017a)

    Räumliches Hören ist wichtig, um die Umgebung ständig auf interessante oder gefährliche Geräusche zu überwachen und gezielt die Aufmerksam auf sie richten zu können. Die räumliche Trennung der beiden Ohren und die komplexe Geometrie des menschlichen Körpers liefern akustische Information über den Ort einer Schallquelle. Je nach Schalleinfallsrichtung verändert v.a. die Ohrmuschel das Klangspektrum, bevor der Schall das Trommelfell erreicht. Da die Ohrmuschel sehr individuell geformt ist (mehr noch als ein Fingerabdruck), ist auch deren Klangfärbung sehr individuell. Für die künstliche Erzeugung realistischer Hörwahrnehmungen muss diese Individualität so präzise wie nötig abgebildet werden, wobei bisher nicht geklärt ist, was wirklich nötig ist. SpExCue hat deshalb nach elektrophysiologischen Maßen und Vorhersagemodellen geforscht, die abbilden können, wie räumlich realistisch („externalisiert“) eine virtuelle Quelle empfunden wird.

    Da künstliche Quellen vorzugsweise im Kopf wahrgenommen werden, eignete sich die Untersuchung dieser Klangspektren zugleich zur Erforschung einer Verzerrung in der Hörwahrnehmung: Schallereignisse, die sich dem Hörer annähern, werden intensiver wahrgenommen als jene, die sich vom Hörer entfernen. Frühere Studien zeigten diese Verzerrung ausschließlich durch Lautheitsänderungen (zunehmende/abnehmende Lautheit wurde verwendet um sich nähernde/entfernende Schallereignisse zu simulieren). Es war daher unklar, ob die Verzerrung wirklich auf Wahrnehmungsunterschiede gegenüber der Bewegungsrichtung oder nur auf die unterschiedlichen Lautstärken zurück zu führen sind. Unsere Studie konnte nachweisen, dass räumliche Änderungen der Klangfarbe diese Verzerrungen (auf Verhaltensebene und elektrophysiologisch) auch bei gleichbleibender Lautstärke hervorrufen können und somit von einer allgemeinen Wahrnehmungsverzerrung auszugehen ist.

    Des Weiteren untersuchte SpExCue, wie die Kombination verschiedener räumlicher Hörinformation die Aufmerksamkeitskontrolle in einer Spracherkennungsaufgabe mit gleichzeitigen Sprechern, wie z.B. bei einer Cocktailparty, beeinflusst. Wir fanden heraus, dass natürliche Kombinationen räumlicher Hörinformation mehr Gehinraktivität in Vorbereitung auf das Testsignal herrufen und dadurch die neurale Verarbeitung der zu folgenden Sprache optimiert wird.

    SpExCue also compared different computational modeling approaches that aim to predict the spatial perception of sound changes. Although many previous experimental results could be predicted by at least one of the modeling approaches, none of them could explain all of these results. To support the future development of more general computational models of spatial hearing, we concluded by developing a conceptual cognitive model for this purpose.

    Funding

    Erwin Schrödinger Fellowship from the Austrian Science Fund (FWF, J 3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.

    Follow-up funding provided by Oculus VR, LLC, since March 2018. Principal Investigator: Robert Baumgartner.

    Publications

    • Baumgartner, R., Reed, D.K., Tóth, B., Best, V., Majdak, P., Colburn, H.S., Shinn-Cunningham, B. (2017): Asymmetries in behavioral and neural responses to spectral cues demonstrate the generality of auditory looming bias, in: Proceedings of the National Academy of Sciences of the USA 114, 9743-9748. (article)
    • Baumgartner, R., Majdak, P., Colburn, H.S., Shinn-Cunningham, B. (2017): Modeling Sound Externalization Based on Listener-specific Spectral Cues, presented at: Acoustics ‘17 Boston: The 3rd Joint Meeting of the Acoustical Society of America and the European Acoustics Association. Boston, MA, USA. (conference)
    • Deng, Y., Choi, I., Shinn-Cunningham, B., Baumgartner, R. (2019): Impoverished auditory cues limit engagement of brain networks controlling spatial selective attention, in: Neuroimage 202, 116151. (article)
    • Baumgartner, R., Majdak, P. (2019): Predicting Externalization of Anechoic Sounds, in: Proceedings of ICA 2019. (proceedings)
    • Majdak, P., Baumgartner, R., Jenny, C. (2019): Formation of three-dimensional auditory space, in: arXiv:1901.03990 [q-bio]. (preprint)
  • Objective:

    In the past, an FWF project dealing with the basics of stochastic transformation methods was carried out at the ARI. Specifically, the Karhunen-Loève expansion and the polynomial chaos transformation were applied in the wavenumber domain. The procedure is based on the assumption of Gaussian distributed variables. This assumption shall be generalized to arbitrary random variables.
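    To illustrate where the Gaussian assumption enters the procedure, the following is a minimal sketch (not the project's actual implementation) of a truncated Karhunen-Loève expansion of a 1-D random field with an assumed exponential covariance kernel; all function names and parameter values are illustrative:

    ```python
    import numpy as np

    def kl_expansion(n_points=200, corr_len=0.2, n_modes=10, seed=0):
        """One realization of a Gaussian random field via a truncated KL expansion."""
        x = np.linspace(0.0, 1.0, n_points)
        # Discretized covariance matrix C(x, y) = exp(-|x - y| / l)
        C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
        # Eigen-decomposition yields the KL modes (eigenvectors) and variances
        eigvals, eigvecs = np.linalg.eigh(C)
        # Keep the leading modes (largest eigenvalues)
        order = np.argsort(eigvals)[::-1][:n_modes]
        lam, phi = eigvals[order], eigvecs[:, order]
        # Gaussian coefficients: this is exactly the assumption to be
        # generalized -- for arbitrary random variables, another
        # distribution would be sampled here.
        rng = np.random.default_rng(seed)
        xi = rng.standard_normal(n_modes)
        # Truncated KL representation: sum_k sqrt(lam_k) * xi_k * phi_k(x)
        return x, phi @ (np.sqrt(lam) * xi)

    x, field = kl_expansion()
    ```

    The key point of the sketch is that the randomness is concentrated in the coefficients `xi`; replacing their distribution is what generalizing the method to non-Gaussian variables amounts to.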

    Method:

    Working in the wavenumber domain limits the model to a horizontally layered half-space. This limitation shall be overcome by using wavelet kernels instead of Fourier kernels in the transformation. The aim is to make it possible to calculate one-sided statistical distributions for the physical parameters, and to handle arbitrary boundaries, with the new method.
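    The contrast between the two kernel types can be sketched as follows: a Fourier kernel spreads a spatially localized feature over all wavenumbers, whereas a wavelet kernel keeps it localized. This is a toy illustration with a hand-written single-level Haar transform, not the project's method:

    ```python
    import numpy as np

    def haar_step(signal):
        """One level of the Haar wavelet transform: averages and details."""
        s = np.asarray(signal, dtype=float)
        approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)  # low-pass (scaling) part
        detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)  # high-pass (wavelet) part
        return approx, detail

    # A profile that is zero except for one localized jump
    sig = np.zeros(16)
    sig[9:] = 1.0

    approx, detail = haar_step(sig)     # detail is nonzero only at the jump
    fourier = np.fft.rfft(sig)          # the jump leaks into all frequencies
    ```

    The localization of the wavelet coefficients is what makes it plausible to represent non-layered geometries and arbitrary boundaries, which a global Fourier basis cannot do efficiently.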

  • Project Objective:

    This project aims to implement generally applicable database functions in STOOLS-STx, for example access to sound data and the management of metadata for segments and annotations (list, sort, select, etc.). The essence (sound data), the segmentations, and the manually compiled and calculated annotations (e.g., wave band levels) form the basis of an integrated sound database. An essential requirement is that the original sound data remain unchanged. The metadata therefore have to be stored in separate *.xml files. In this way, dynamic management of the metadata becomes possible, while the sound files are only ever opened write-protected.
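    The separation described above can be sketched as follows; this is an illustrative assumption about the layout (file names, XML tags, and attribute names are invented for the example), not the actual STx file format:

    ```python
    import wave
    import xml.etree.ElementTree as ET

    def load_segments(xml_path):
        """Parse segment annotations from a sidecar metadata XML file."""
        root = ET.parse(xml_path).getroot()
        segments = []
        for seg in root.iter("segment"):
            segments.append({
                "start": float(seg.get("start")),  # seconds
                "end": float(seg.get("end")),
                "label": seg.findtext("label", default=""),
            })
        return segments

    def open_sound(wav_path):
        # Read-only ('rb') access: the original sound data remain unchanged;
        # all edits happen in the metadata file, never in the essence.
        return wave.open(wav_path, "rb")
    ```

    Keeping the annotations in a separate, rewritable XML file while the essence stays read-only is what allows the metadata to be managed dynamically without ever risking the original recordings.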

    Method:

    The segment lists implemented in STx keep all annotations linked to each individual sound segment. The segments remain in the context of the continuous sound recordings and enable segment addressing, i.e., the numerical identification, processing, and extraction of segments (including their acoustic surroundings).

    Application:

    Signal databases are the basis for practically all applications that use realistic sound material, including time-frequency representation, statistics, principal component analysis, cluster analysis, etc. Furthermore, signal databases are used to carry out subjective evaluations and psychoacoustic experiments. The STx databases can handle more than 100 sound files and thousands of segments in a very short time.

    Ref.:

    PACS: 43.50.Rq; Project: NOIDESc: Deskriptoren zur Bewertung von Lärmsignalen (FFG-809085, bmvit-isb2). PACS: 43.72.Fx; Project: Akustische Phonetik, Sprechererkennung.