The auditory system constantly monitors the environment to protect us from harmful events such as collisions with approaching objects. Auditory looming bias is an astoundingly fast perceptual bias favoring approaching over receding auditory motion and has been demonstrated behaviorally even in four-month-old infants. The role of learning in developing this perceptual bias and its underlying mechanisms have yet to be investigated. Supervised learning and statistical learning are two distinct mechanisms enabling neural plasticity. In the auditory system, statistical learning refers to the implicit ability to extract and represent regularities, such as frequently occurring sound patterns or frequent acoustic transitions, with or without attention, whereas supervised learning refers to the ability to attentively encode auditory events based on explicit feedback. It is currently unclear how these two mechanisms are involved in learning auditory spatial cues at different stages of life. While newborns already possess basic skills of spatial hearing, adults are still able to adapt to changing circumstances such as modifications of spectral-shape cues. Spectral-shape cues are naturally induced when the complex geometry, especially that of the human pinna, shapes the spectrum of an incoming sound depending on its source location. Auditory stimuli lacking familiar spectral-shape cues are often perceived to originate from inside the head rather than from naturally external sound sources. Changes in the salience or familiarity of spectral-shape cues can thus be used to elicit auditory looming bias. The importance of spectral-shape cues for both auditory looming bias and auditory plasticity makes these cues ideal for studying the two phenomena together.
Born2Hear will combine auditory psychophysics and neurophysiological measures in order to 1) identify auditory cognitive subsystems underlying auditory looming bias, 2) investigate principal cortical mechanisms for statistical and supervised learning of auditory spatial cues, and 3) reveal cognitive and neural mechanisms of auditory plasticity across the human lifespan. These general research questions will be addressed within three studies. Study 1 will investigate the differences in the bottom-up processing of different spatial cues and the top-down attention effects on auditory looming bias by analyzing functional interactions between brain regions in young adults, and will then test in newborns whether these functional interactions are innate. Study 2 will investigate the cognitive and neural mechanisms of supervised learning of spectral-shape cues in young and older adults based on an individualized perceptual training on sound source localization. Study 3 will focus on the cognitive and neural mechanisms of statistical learning of spectral-shape cues in infants as well as young and older adults.
Project investigator (PI): Robert Baumgartner
Project partner / Co-PI: Brigitta Tóth, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
Collaboration partners:
Supported by Austrian Science Fund (FWF, I 4294-B) and NKFIH.
Machine learning has become an integral part of our everyday lives over the last few years. Whether we use a smartphone, shop online, consume media, drive a car or much more, machine learning (ML) and, more generally, artificial intelligence (AI) support, influence and analyze us in different life situations. In particular, deep learning methods based on artificial neural networks are used in many areas.
In the sciences, too, ML and AI have already provided important impulses, and this influence is expected to spread to an even wider range of scientific disciplines in the future.
This increases both the interest in a deeper, science-based understanding of ML methods and the need for scientists of various disciplines to develop a strong understanding of the application and design of such methods.
The Institute for Acoustic Research, which conducts application-oriented basic research in the field of acoustics, is rising to this challenge and has founded the Machine Learning research group.
The group sheds light on the different aspects of machine learning and artificial intelligence, with a particular focus on potential applications in acoustics. The collaboration of scientists from different disciplines in the areas of ML and AI will not only enable the Institute for Acoustic Research to make pioneering progress in all areas of sound research, but will also make essential contributions to theoretical issues in the highly topical research field of artificial intelligence.
AABBA's goal is to promote exploration and development of binaural and spatial models and their applications.
AABBA members are academic scientists willing to participate in our activities. We meet annually for an open discussion and progress presentation, and we especially encourage members to bring students and young scientists associated with their projects to our meetings. Our activities result in joint publications and special sessions at international conferences. As a tangible outcome, we contribute validated (source) codes for published models of binaural and spatial hearing to our collection of auditory models, known as the auditory modeling toolbox (AMT).
Executive board: Piotr Majdak, Armin Kohlrausch, Ville Pulkki
Members:
Annual meetings are held at the beginning of each year:
Contact person: Piotr Majdak
The Musicality and Bioacoustics group merges music and biology to study the origins of music through cross-species studies. Like language, music is found in all cultures around the world. Even isolated cultures have music, and all musical systems share important parallels such as the use of discrete notes and a steady beat.
Here we study other animals to try and understand what aspects of music are uniquely human and why humans may have developed these abilities. Specifically, here are some active research directions of the group:
The budgerigar laboratory facilities are currently in the Department of Cognitive Biology at the University of Vienna where we also have collaborations with some of the other species that are housed there.
This web page provides resources for the figures and the implementation of inversion of frame multipliers in the research manuscript:
"A survey on the unconditional convergence and the invertibility of multipliers with implementation"
Diana T. Stoeva and Peter Balazs
Abstract:
The paper presents a survey of frame multipliers and related concepts. In particular, it includes a short motivation of why multipliers are of interest, a review as well as an extension of recent results devoted to the unconditional convergence of multipliers, sufficient and/or necessary conditions for the invertibility of multipliers, and the representation of the inverse via Neumann-like series and via multipliers with particular parameters. Multipliers for frames with specific structure, namely Gabor and wavelet multipliers, are also considered. Some of the results for the representation of the inverse multiplier are implemented in Matlab codes and the implementations are described.
Here we provide:
- the scripts which were used to generate Fig. 1 and Fig. 2 in the paper;
- implementations of Propositions 8, 9, and 11, written in Matlab using the Matlab/Octave toolbox Large Time-Frequency Analysis (LTFAT) [2] (version ... and above).
To run the codes provided below, one first needs to install the LTFAT toolbox, freely available at Sourceforge.
I. Fig. 1 in the paper and the script, which was used to generate this figure (an illustrative example to visualize a multiplier).
Fig. 1. An illustrative example to visualize a multiplier. (TOP LEFT) The time-frequency representation of the music signal $f$. (TOP RIGHT) The symbol $m$, found by a (manual) estimation of the time-frequency region of the singer's voice. (BOTTOM LEFT) The multiplication in the TF domain. (BOTTOM RIGHT) Time-frequency representation of $M_{m,\widetilde \Psi,\Psi}f$.
Fig. 1 was produced via the script testGabMulExp_new.m using the original sound-file originalsignal.wav and the manually determined symbol Symbol6_BW.png.
The script also provides the modified signal (obtained by applying the symbol/mask to the original signal), and you can listen to it here.
II. Implementation of inversion of multipliers according to Section 3.2.3 of the paper.
II.1. Implementation of Proposition 8
(a) Implementation of inversion of multipliers $M_{m,\Phi,\Psi}$ (M1) and $M_{m,\Psi,\Phi}$ (M2) for positive m according to Proposition 8 is done in the program Prop8MultiplierInversionOp.m, which involves the function Prop8InvMultOp.m.
function [TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOp(c,r,TPhi,TG,m,e)
Running the program "Prop8MultiplierInversionOp.m", the user will be required to enter the following parameters (which are the input-parameters for the function Prop8InvMultOp.m):
c - the number of the frame vectors;
r - the number of the coordinates of the frame vectors;
TPhi - the synthesis matrix (rxc) of the frame $\Phi$;
TG - the synthesis matrix (rxc) of a frame G (with the meaning of $\Psi-\Phi$);
m - the symbol of the multiplier (c numbers in a row);
e - the desired error bound.
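To make the roles of these inputs concrete, the following Python sketch (not the authors' Matlab code) assembles the two multiplier matrices from small real-valued example data. It assumes the convention $M_{m,\Phi,\Psi}f=\sum_k m_k\langle f,\psi_k\rangle\phi_k$, i.e. $M_{m,\Phi,\Psi}=T_\Phi\,\mathrm{diag}(m)\,T_\Psi^*$ with $T_\Psi=T_\Phi+T_G$. The helper names and all numbers are hypothetical.

```python
# Sketch only: builds frame multipliers for small real-valued frames.
# Helper names (matmul, transpose, multiplier) are illustrative and
# do not come from the paper's code.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def multiplier(T_syn, m, T_ana):
    """Return T_syn * diag(m) * T_ana^T (real case), an r x r matrix."""
    scaled = [[T_syn[i][k] * m[k] for k in range(len(m))]
              for i in range(len(T_syn))]
    return matmul(scaled, transpose(T_ana))

r, c = 2, 3                        # r coordinates, c frame vectors
TPhi = [[1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0]]           # synthesis matrix (r x c) of Phi
TG = [[0.1, 0.0, 0.0],
      [0.0, 0.0, 0.1]]             # assumed perturbation G = Psi - Phi
m = [1.0, 2.0, 0.5]                # positive symbol with c entries

# Psi = Phi + G, entry by entry
TPsi = [[TPhi[i][k] + TG[i][k] for k in range(c)] for i in range(r)]
M1 = multiplier(TPhi, m, TPsi)     # M_{m,Phi,Psi}
M2 = multiplier(TPsi, m, TPhi)     # M_{m,Psi,Phi}
```

In this real-valued setting M2 is simply the transpose of M1, which is why the two multipliers can be handled side by side.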
Note:
- the program keeps prompting for m until a positive m is entered;
- after entering TPhi, TG, and positive m, the program checks if they satisfy the assumptions of Prop. 8 and if not,
the program adjusts TG by multiplication with an appropriate constant in order to be within the settings of Prop. 8.
The implementation is done using an iterative algorithm according to Prop. 8, until one reaches the desired error-bound e.
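The pattern of iterating until an error bound is reached can be illustrated with a plain Neumann series. The Python sketch below is not the algorithm of Prop. 8, which exploits the specific structure of the multipliers. It assumes only that $\|I-M\|<1$ in the Frobenius norm, uses $M^{-1}=\sum_{k\ge 0}(I-M)^k$, and stops once the geometric-series bound on the remainder falls to e. All function names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def fro(A):
    """Frobenius norm (an upper bound for the operator norm)."""
    return math.sqrt(sum(x * x for row in A for x in row))

def neumann_inverse(M, e):
    """Approximate M^{-1} by sum_{k<n} (I - M)^k; returns (inverse, n)."""
    d = len(M)
    I = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    R = [[I[i][j] - M[i][j] for j in range(d)] for i in range(d)]
    q = fro(R)
    if q >= 1.0:
        raise ValueError("this sketch requires ||I - M|| < 1")
    S = [[0.0] * d for _ in range(d)]    # partial sum of the series
    term = I                             # current power R^k, starting at R^0
    n = 0
    # The remainder after n terms is bounded by q^n / (1 - q).
    while q ** n / (1.0 - q) > e:
        S = [[S[i][j] + term[i][j] for j in range(d)] for i in range(d)]
        term = matmul(term, R)
        n += 1
    return S, n

M = [[1.0, 0.2],
     [0.1, 0.9]]                         # hypothetical, close to the identity
Minv, n = neumann_inverse(M, 1e-8)       # n = number of iteration steps
```

The returned n plays the same role as the output n of the program, and the bound $q^n/(1-q)$ makes the geometric convergence rate explicit.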
The output of the program "Prop8MultiplierInversionOp.m":
TPsi - the synthesis operator of $\Psi$,
M1 - the multiplier $M_{m,\Phi,\Psi}$,
M2 - the multiplier $M_{m,\Psi,\Phi}$,
M1inv - the iteratively inverted M1,
M2inv - the iteratively inverted M2,
M1invMatlab - the inversion of M1 using the Matlab command "inv" (for comparison),
M2invMatlab - the inversion of M2 using the Matlab command "inv" (for comparison),
n - the number of the iteration steps.
Note:
After presenting the output parameters, the program allows the user to
- either enter new $TG$ and new error-bound e, and repeat the inversion procedure,
- or to terminate the program by pressing zero.
A demo-file (applying "Prop8InvMultOp.m" with concrete parameters) is available in the script Prop8InvMultOpRun.m.
(b) Implementation of computation of $M_{m,\Phi,\Psi}^{-1}f$ and $M_{m,\Psi,\Phi}^{-1}f$ for given f (and for positive m) is done in the program Prop8MultiplierInversionf.m, which involves the function Prop8InvMultf.m.
function [TPsi,M1,M2,M1invf,M2invf,n] = Prop8InvMultf(c,r,TPhi,TG,m,f,e)
The implementation proceeds as in (a), requiring one more input, namely f, and using an appropriate modification of the iteration steps.
A demo-file (applying Prop8InvMultf.m with concrete parameters) is available in the script Prop8InvMultfRun.m.
(c) Implementation of computation of $M_{m,\Phi,\Psi}^{-1}$ and $M_{m,\Psi,\Phi}^{-1}$ for positive $m$ and Gabor frames $\Phi$ and $\Psi$ is done in the program Prop8MultiplierInversionOpGabor.m, which involves the function Prop8InvMultOpGabor.m.
function [TPhi,TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOpGabor(L,a, M,gPhi,gG,m,e)
The implementation of the inversion is like the one in (a), but using $\Phi$ and $\Psi$ which are Gabor frames.
The input parameters of "Prop8MultiplierInversionOpGabor.m":
L - the length of the transform,
a - the time-shift (should be a divisor of L),
M - the number of channels (should be a divisor of L and greater than or equal to a),
gPhi - the window function of the Gabor frame Phi,
gG - the window function of the Gabor frame G (with the meaning of Psi-Phi),
TPhi - the synthesis matrix ($rxc$) of the frame $\Phi$,
TG - the synthesis matrix ($rxc$) of a frame $G$ (with the meaning of $\Psi-\Phi$),
m - the symbol of the multiplier (ML/a positive numbers),
e - the desired error bound.
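The constraints on L, a, and M and the resulting symbol length ML/a can be checked with a few lines of Python. This is only an illustrative sketch with a hypothetical function name, not part of the provided Matlab codes. The redundancy M/a is added here as the standard ratio of the number of coefficients to the signal length.

```python
def gabor_symbol_length(L, a, M):
    """Validate (L, a, M) as listed above and return M*L/a,
    the number of entries the symbol m must have."""
    if L % a != 0:
        raise ValueError("a must be a divisor of L")
    if L % M != 0:
        raise ValueError("M must be a divisor of L")
    if M < a:
        raise ValueError("M must be greater than or equal to a")
    return M * (L // a)

L, a, M = 144, 8, 12                      # hypothetical example values
n_coeffs = gabor_symbol_length(L, a, M)   # 12 channels * 18 time shifts = 216
redundancy = M / a                        # 1.5 coefficients per signal sample
```

The condition that M is at least a corresponds to a redundancy of at least 1, without which the Gabor system has too few elements to be a frame.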
The output parameters of "Prop8MultiplierInversionOpGabor.m":
TPhi - the synthesis operator of the frame Phi,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
A demo-file (applying "Prop8InvMultOpGabor.m" with concrete parameters) is available in the code Prop8InvMultOpGaborRun.m.
For the convergence rate of this algorithm, see Fig.2 below and the script which was used to generate it.
II.2. Implementation of Proposition 9
Implementation of the inversion of $M_{m,\Phi,\Phi}$, $M_{m,\Phi,\Psi}$, and $M_{m,\Psi,\Phi}$ according to Proposition 9 is done in the program Prop9MultiplierInversionOp.m, which involves the function Prop9InvMultOp.m.
function [m,TPsi,M0,M1,M2,M0inv,n0,M1inv,M2inv,n] = Prop9InvMultOp(c,r,TPhi,TG,m,e)
Running the program "Prop9MultiplierInversionOp.m", the user will be required to enter the same parameters as the ones for "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
Note:
- the program checks whether the entered TPhi and m satisfy the assumptions of Prop. 9
and if not, the program adjusts m to be within the settings of Prop. 9;
- the program checks whether the entered TPhi, TG, and the adjusted m satisfy the assumptions of Prop. 9
and if not, the program adjusts TG by multiplication with an appropriate constant in order to be within the settings of Prop. 9.
The input parameters of "Prop9MultiplierInversionOp.m" are like the ones in "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
The output parameters of "Prop9MultiplierInversionOp.m":
m - the symbol of the multiplier,
M0 - the multiplier $M_{m,\Phi,\Phi}$,
M0inv - the iteratively inverted M0,
n0 - the number of the iteration steps for the inversion of M0,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
A demo-file (applying Prop9InvMultOp.m with concrete parameters) is available in the code Prop9InvMultOpRun.m.
II.3. Implementation of Proposition 11
Implementation of the inversion of $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ according to Proposition 11 is done in the program Prop11MultiplierInversionOp.m, which involves the function Prop11InvMultOp.m.
function [m,TPsi,M1,M2,M1inv,M2inv,n] = Prop11InvMultOp(c,r,TPhi,TPsi,m,e)
Running the program "Prop11MultiplierInversionOp.m", the user will be required to enter the following parameters (which are the input-parameters for the function "Prop11InvMultOp.m"):
c, r, TPhi, m, e - like the ones in "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
TPsi - the synthesis matrix (rxc) of an approximate dual $\Psi$ of the frame $\Phi$.
Note:
- using the entered TPhi and TPsi, the program checks whether $\Psi$ is an approximate dual of $\Phi$
and if not, the program replaces $\Psi$ with the canonical dual of $\Phi$;
- after that the program checks whether $\Phi$, $\Psi$, and m satisfy the assumptions of Prop. 11
and if not, the program adjusts m.
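The first check in this note can be sketched in Python as follows. The sketch assumes the usual definition that $\Psi$ is an approximate dual of $\Phi$ when $\|I-T_\Phi T_\Psi^*\|<1$, and it tests this with the Frobenius norm, which is only a sufficient condition because it bounds the operator norm from above. The function names are hypothetical, the data are real-valued, and the canonical dual is computed for r = 2 only. The actual Matlab program may proceed differently.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def is_approx_dual(TPhi, TPsi):
    """Sufficient check: Frobenius norm of I - TPhi*TPsi^T is below 1."""
    P = matmul(TPhi, transpose(TPsi))
    r = len(P)
    dev = math.sqrt(sum((P[i][j] - (1.0 if i == j else 0.0)) ** 2
                        for i in range(r) for j in range(r)))
    return dev < 1.0

def canonical_dual_2d(TPhi):
    """Canonical dual S^{-1}*TPhi for r = 2, with S = TPhi*TPhi^T."""
    S = matmul(TPhi, transpose(TPhi))
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    Sinv = [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]
    return matmul(Sinv, TPhi)

TPhi = [[1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0]]
TPsi = [[2.0, 0.0, 0.0],
        [0.0, 2.0, 0.0]]       # poor candidate: TPhi*TPsi^T equals 2*I

# Fall back to the canonical dual when the candidate fails the check.
if not is_approx_dual(TPhi, TPsi):
    TPsi = canonical_dual_2d(TPhi)
```

After the fallback, TPhi*TPsi^T is the identity, so the replaced $\Psi$ is an exact dual and passes the check trivially.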
The output parameters of the program "Prop11MultiplierInversionOp.m":
m - the symbol of the multiplier,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
A demo-file (applying Prop11InvMultOp.m with concrete parameters) is available in the code Prop11InvMultOpRun.m.
III. Fig. 2 in the paper and the script, which was used to generate this figure (the convergence rate of the algorithm in II.1.(c) above).
Fig. 2. The convergence rate of Alg. 3 using base-10 logarithmic scale in the vertical axis and a
linear scale in the horizontal axis. Here the absolute error in each iteration is plotted in red, and
the convergence value predicted in Proposition 8 is plotted in blue.
Fig. 2 was produced using the script Prop8InvMultOpGaborPlotFigure.m which involves the function Prop8InvMultOpGaborForFigure.m.
This is the companion Webpage of the manuscript:
Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdeněk Průša, Piotr Majdak, and Olivier Derrien.
Abstract: Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis-synthesis system is the reconstruction error; it has to be kept to a minimum to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of human auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis-synthesis system for audio applications. The proposed system, named Audlet, is based on an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing.
Sound examples for the source separation experiment: click on a system's acronym to hear the corresponding reconstruction.
Reference signals: original mixture -- target
Rt | β = 1 | β = 1 | β = 1 | β = 1/6 | β = 1/6 | β = 1/6 | 1024-point STFT |
---|---|---|---|---|---|---|---|
1.1 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
1.5 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
4.0 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
Spatial hearing is important to monitor the environment for interesting or hazardous sounds and to selectively attend to them. The spatial separation between the two ears and the complex geometry of the human body provide auditory cues about the location of a sound source. Depending on where a sound is coming from, the pinna (or auricle) changes the sound spectrum before the sound reaches the eardrum. Since the shape of a pinna is highly individual (even more so than a fingerprint), it also affects the spectral cues in a very individual manner. In order to produce realistic auditory perception artificially, this individuality needs to be reflected as precisely as required, but the actual requirements are currently unclear. That is why SpExCue was about finding electrophysiological measures and prediction models of how spatially realistic (“externalized”) a virtual sound source is perceived to be.
Virtual and augmented reality (VR/AR) systems aim to immerse a listener into a well-externalized 3D auditory space. This requires a perceptually accurate simulation of the listener’s natural acoustic exposure. Particularly challenging is to appropriately represent the high-frequency spectral cues induced by the pinnae. To simplify this task, we aim at developing a phenomenological computational model for sound externalization with a particular focus on spectral cues. The model will be designed to predict the listener’s degree of externalization based on binaural input signals and the listener’s individual head-related transfer functions (HRTFs) under static listening conditions.
The naturally externalized auditory perception can be disrupted, for instance, when listening via headphones or hearing-assistive devices, and instead sounds are heard inside the head. Because of this change in externalization or perceived distance, our investigations of spectral cues also served to study the phenomenon of auditory looming bias (Baumgartner et al., 2017 PNAS): sounds approaching the listener are perceived more intensely than those that are receding from the listener. Previous studies demonstrated auditory looming bias exclusively by loudness changes (increasing/decreasing loudness used to simulate approaching/receding sounds). Hence, it was not clear whether this bias truly reflects perceptual differences in sensitivity to motion direction rather than changes in loudness. Our spectral cue changes were perceived as either approaching or receding at steady loudness and evoked auditory looming bias both on a behavioral level (approaching sounds easier to recognize than receding sounds) and an electrophysiological level (larger neural activity in response to approaching sounds). Therefore, our study demonstrated that the bias is truly about perceived motion in distance, not loudness changes.
Further, SpExCue investigated how the combination of different auditory spatial cues affects attentional control in a speech recognition task with simultaneous talkers, which requires spatial selective attention as at a cocktail party (Deng et al., in prep). We found that natural combinations of auditory spatial cues caused larger neural activity in preparation for the test signal and optimized the neural processing of the attended speech.
SpExCue also compared different computational modeling approaches that aim to predict the effect of spectral cue changes on how spatially realistic a sound is perceived (Baumgartner et al., 2017 EAA-ASA). Although many previous experimental results could be predicted by at least one of the models, none of them alone could explain all of these results. In order to assist the future design of more general computational models for spatial hearing, we finally created a conceptual cognitive model for the formation of auditory space (Majdak et al., in press).
Erwin-Schrödinger Fellowship from the Austrian Science Fund (FWF, J3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.
Follow-up funding provided by Oculus VR, LLC, since March 2018. Project Investigator: Robert Baumgartner.
This page provides resources for the research article:
to appear in the book "Excursions in Harmonic Analysis" published by Springer.
Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant view-point for scientists in audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
The present ZIP archive features Matlab/Octave scripts that reproduce the results presented in Figures 7, 10, and 11 of the article.
IMPORTANT NOTE: The Matlab/Octave toolbox Large Time-Frequency Analysis (LTFAT, version 1.2.0 and above) must be installed to run the codes. This toolbox is freely available at Sourceforge.
If you encounter any issue with the files, please do not hesitate to contact the authors.
This page provides the sound files corresponding to the results of the perceptual matching pursuit algorithm presented in:
submitted to the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP2014).
Iterations | Matching Pursuit | Perceptual Matching Pursuit | Residual (MP) | Masked components (PMP) | Residual + masked components |
---|---|---|---|---|---|
10000 | wav | wav | wav | wav | wav |
20000 | wav | wav | wav | wav | wav |
40000 | wav | wav | wav | wav | wav |
80000 | wav | wav | wav | wav | wav |
Iterations | Matching Pursuit | Perceptual Matching Pursuit | Residual (MP) | Masked components (PMP) | Residual + masked components |
---|---|---|---|---|---|
10000 | wav | wav | wav | wav | wav |
20000 | wav | wav | wav | wav | wav |
40000 | wav | wav | wav | wav | wav |
80000 | wav | wav | wav | wav | wav |
ARI guest talk by Verena Haunschmid, Shreyan Chowdhury
16. October 2019
14.30
Seminar Room, Wohllebengasse 12-14 / Ground Floor
ARI guest talk by Georg Pichler
23. October 2019
14.30
Seminar Room, Wohllebengasse 12-14 / Ground Floor
14. October 2019
Anyone who has missed our event "Encounter of the Kempelenschen Sprechmaschinen" in September can listen to the MAKRO MIKRO Podcast of the ÖAW. Listen to the quality in which the first 18th century...
17. September 2019
We are proud to share the most recent findings of our research group Psychoacoustics and Experimental Audiology and the cooperation partners (Boston University, Carnegie Mellon University and...
17. September 2019
We congratulate our scientists Nicki Holighaus, Günther Koliander and Luis Daniel Abreu on the Dafx19 Best Paper Award. In their publication, they present an algorithm that renders it possible to...
27. June 2019
The Acoustics Research Institute and a number of other institutes of the Austrian Academy of Sciences want to close the gap between training in machine learning and modern data science and are ...
20. June 2019
Kernel theorems in mathematics state that "reasonable" systems (i.e., operators) can be represented as integrals, similar to a matrix representation. This allows a considerably simpler...