Research Topics/Groups

The auditory system constantly monitors the environment to protect us from harmful events such as collisions with approaching objects. Auditory looming bias is an astoundingly fast perceptual bias favoring approaching over receding auditory motion and has been demonstrated behaviorally even in four-month-old infants. The role of learning in developing this perceptual bias and its underlying mechanisms are yet to be investigated. Supervised learning and statistical learning are two distinct mechanisms enabling neural plasticity. In the auditory system, statistical learning refers to the implicit ability to extract and represent regularities, such as frequently occurring sound patterns or frequent acoustic transitions, with or without attention, while supervised learning refers to the ability to attentively encode auditory events based on explicit feedback. It is currently unclear how these two mechanisms are involved in learning auditory spatial cues at different stages of life. While newborns already possess basic skills of spatial hearing, adults are still able to adapt to changing circumstances such as modifications of spectral-shape cues. Spectral-shape cues are naturally induced when the complex geometry, especially of the human pinna, shapes the spectrum of an incoming sound depending on its source location. Auditory stimuli lacking familiarized spectral-shape cues are often perceived to originate from inside the head rather than from naturally external sound sources. Changes in the salience or familiarity of spectral-shape cues can thus be used to elicit auditory looming bias. The importance of spectral-shape cues for both auditory looming bias and auditory plasticity makes them ideal for studying the two together.

Born2Hear project overview

Born2Hear will combine auditory psychophysics and neurophysiological measures in order to 1) identify auditory cognitive subsystems underlying auditory looming bias, 2) investigate principal cortical mechanisms for statistical and supervised learning of auditory spatial cues, and 3) reveal cognitive and neural mechanisms of auditory plasticity across the human lifespan. These general research questions will be addressed within three studies. Study 1 will investigate the differences in the bottom-up processing of different spatial cues and the top-down attention effects on auditory looming bias by analyzing functional interactions between brain regions in young adults and then test in newborns whether these functional interactions are innate. Study 2 will investigate the cognitive and neural mechanisms of supervised learning of spectral-shape cues in young and older adults based on an individualized perceptual training on sound source localization. Study 3 will focus on the cognitive and neural mechanisms of statistical learning of spectral-shape cues in infants as well as young and older adults.

Project investigator (PI): Robert Baumgartner

Project partner / Co-PI: Brigitta Tóth, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary

Collaboration partners:

Supported by Austrian Science Fund (FWF, I 4294-B) and NKFIH.


Machine Learning

Machine learning has become an integral part of our everyday lives over the last few years. Whether we use a smartphone, shop online, consume media, drive a car or much more, machine learning (ML) and, more generally, artificial intelligence (AI) support, influence and analyze us in different life situations. In particular, deep learning methods based on artificial neural networks are used in many areas.

In the sciences, too, ML and AI have already provided important impetus, and this influence is expected to spread to an even wider range of scientific disciplines.

This increases both the interest in a deeper, science-based understanding of ML methods and the need for scientists of various disciplines to develop a strong understanding of the application and design of such methods.

The Institute for Acoustic Research, which conducts application-oriented basic research in the field of acoustics, is rising to this challenge and has founded the Machine Learning research group.

It sheds light on the different aspects of machine learning and artificial intelligence, with a particular focus on potential applications in acoustics. The collaboration of scientists from different disciplines in the areas of ML and AI will not only enable the Institute for Acoustic Research to make pioneering progress in all areas of sound research, but will also make essential contributions to theoretical issues in the highly topical research field of artificial intelligence.


AABBA is an intellectually open group of scientists collaborating on the development and applications of models of human spatial hearing.

AABBA's goal is to promote exploration and development of binaural and spatial models and their applications.

AABBA members are academic scientists willing to participate in our activities. We meet annually for an open discussion and progress presentation, and we especially encourage members to bring students and young scientists associated with their projects to our meetings. Our activities are consolidated in joint publications and special sessions at international conferences. As a tangible outcome, we contribute validated source code for published models of binaural and spatial hearing to our collection of auditory models, known as the Auditory Modeling Toolbox (AMT).


  • Executive board: Piotr Majdak, Armin Kohlrausch, Ville Pulkki

  • Members:

    • Aachen: Janina Fels, ITA, RWTH Aachen
    • Bochum: Dorothea Kolossa & Jens Blauert, Ruhr-Universität Bochum
    • Cardiff: John Culling, School of Psychology, Cardiff University
    • Copenhagen: Torsten Dau & Tobias May, DTU, Lyngby
    • Dresden: Ercan Altinsoy, TU Dresden
    • Ghent: Sarah Verhulst, Ghent University
    • Guangzhou: Bosun Xie, South China University of Technology, Guangzhou
    • Helsinki: Ville Pulkki & Nelli Salminen, Aalto University
    • Ilmenau: Alexander Raake, TU Ilmenau
    • Kosice: Norbert Kopčo, Safarik University, Košice
    • London: Lorenzo Picinali, Imperial College, London
    • Lyon: Mathieu Lavandier, Université de Lyon
    • Munich I: Werner Hemmert, TUM München
    • Munich II: Bernhard Seeber, TUM München 
    • Oldenburg I: Bernd Meyer, Carl von Ossietzky Universität Oldenburg
    • Oldenburg II: Mathias Dietz, Carl von Ossietzky Universität Oldenburg
    • Oldenburg-Eindhoven: Steven van de Par & Armin Kohlrausch, Universität Oldenburg
    • Paris: Brian Katz, Sorbonne Université
    • Patras: John Mourjopoulos, University of Patras
    • Rostock: Sascha Spors, Universität Rostock
    • Sheffield: Guy Brown, The University of Sheffield
    • Tabriz: Masoud Geravanchizadeh, University of Tabriz
    • Toulouse: Patrick Danès, Université de Toulouse
    • Troy: Jonas Braasch, Rensselaer Polytechnic Institute, Troy
    • Vienna: Bernhard Laback & Robert Baumgartner, Austrian Academy of Sciences, Wien
    • The AMT (Umbrella Project): Piotr Majdak
AABBA group as of the 11th meeting 2019 in Vienna.


Annual meetings are held at the beginning of each year:

  • 12th meeting: 16-17 January 2020, Vienna
  • 11th meeting: 19-20 February 2019, Vienna. Schedule.
  • 10th meeting: 30-31 January 2018, Vienna. Schedule. Group photo
  • 9th meeting: 27-28 February 2017, Vienna. Schedule.
  • 8th meeting: 21-22 January 2016, Vienna. Schedule.
  • 7th meeting: 22-23 February 2015, Berlin.
  • 6th meeting: 17-18 February 2014, Berlin.
  • 5th meeting: 24-25 January 2013, Berlin.
  • 4th meeting: 19-20 January 2012, Berlin.
  • 3rd meeting: 13-14 January 2011, Berlin.
  • 2nd meeting: 29-30 September 2009, Bochum.
  • 1st meeting: 23-26 March 2009, Rotterdam.


  • Upcoming: Structured Session "Binaural models: development and applications" at the Forum Acusticum 2020, Lyon.
  • Special Session "Binaural models: development and applications" at the ICA 2019, Aachen.
  • Special Session "Models and reproducible research" at the Acoustics'17 (EAA/ASA) 2017, Boston.
  • Structured Session "Applied Binaural Signal Processing" at the Forum Acusticum 2014, Kraków.
  • Structured Session "The Technology of Binaural Listening & Understanding" at the ICA 2016, Buenos Aires.

Contact person: Piotr Majdak

The Musicality and Bioacoustics group merges music and biology to study the origins of music through cross-species studies. Like language, music is found in all cultures around the world. Even isolated cultures have music, and all musical systems share important parallels such as the use of discrete notes and a steady beat.

Here we study other animals to try and understand what aspects of music are uniquely human and why humans may have developed these abilities. Specifically, here are some active research directions of the group:

  • Cross-species tests using operant conditioning to train and test human and non-human animal sound categorization
  • Cross-species tests of the preferences for different sounds
  • Bioacoustic analysis of animal vocalizations using linguistics and computational methods

The budgerigar laboratory facilities are currently located in the Department of Cognitive Biology at the University of Vienna, where we also collaborate on studies of some of the other species housed there.


This web page provides resources for the figures and the implementation of inversion of frame multipliers in the research manuscript:


"A survey on the unconditional convergence and the invertibility of multipliers with implementation"

Diana T. Stoeva and Peter Balazs



The paper presents a survey of frame multipliers and related concepts. In particular, it includes a short motivation of why multipliers are of interest, a review as well as an extension of recent results devoted to the unconditional convergence of multipliers, sufficient and/or necessary conditions for the invertibility of multipliers, and the representation of the inverse via Neumann-like series and via multipliers with particular parameters. Multipliers for frames with specific structure, namely Gabor and wavelet multipliers, are also considered. Some of the results for the representation of the inverse multiplier are implemented in Matlab code, and the implementations are described.


Here we provide:

- the scripts which were used to generate Fig. 1 and Fig. 2 in the paper;

- implementations of Propositions 8, 9, and 11, written in Matlab using the Matlab/Octave toolbox Large Time-Frequency Analysis (LTFAT) [2] (version ... and above).

To run the codes provided below, first install the LTFAT toolbox, which is freely available at Sourceforge.


I. Fig. 1 in the paper and the script, which was used to generate this figure (an illustrative example to visualize a multiplier).


              Fig. 1. An illustrative example to visualize a multiplier.
              (TOP LEFT) The time-frequency representation of the music signal $f$. (TOP RIGHT) The symbol $m$, found by a (manual) estimation of the
              time-frequency region of the singer's voice. (BOTTOM LEFT) The multiplication in the TF domain. (BOTTOM RIGHT) Time-frequency representation
              of $M_{m,\widetilde \Psi,\Psi}f$.


Fig. 1 was produced via the script testGabMulExp_new.m using the original sound file originalsignal.wav and the manually determined symbol Symbol6_BW.png.

The script also produces the modified signal (obtained by applying the symbol/mask to the original signal), which you can listen to here.
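
To make the idea of "applying a symbol" concrete, the following is a minimal finite-dimensional sketch. It does not reproduce testGabMulExp_new.m and does not use LTFAT. The frame (the three-vector "Mercedes-Benz" frame in R^2), the signal f, and the binary mask m are illustrative assumptions, and the convention $M_{m,\Phi,\Psi}f=\sum_k m_k \langle f,\psi_k\rangle \phi_k$ is assumed.

```matlab
% Sketch of a finite-dimensional frame multiplier (illustrative values only).
% Assumed convention: M_{m,Phi,Psi} f = sum_k m(k) * <f, psi_k> * phi_k.

% Synthesis matrix (r x c) of the "Mercedes-Benz" frame in R^2;
% the columns are the frame vectors.
TPhi = [0, -sqrt(3)/2, sqrt(3)/2; ...
        1, -1/2,       -1/2];

% Canonical dual frame: apply the inverse frame operator S = TPhi*TPhi'
% to every frame vector.
TPsi = (TPhi*TPhi') \ TPhi;

f = [2; -1];      % signal
m = [1, 0, 1];    % symbol: a binary mask that discards the 2nd coefficient

coef = TPsi' * f;               % analysis: coef(k) = <f, psi_k>
Mf   = TPhi * (m(:) .* coef);   % weight by the symbol, then synthesize with Phi

% The same operator written as an r x r matrix.
M1 = TPhi * diag(m) * TPsi';

% Sanity check: the constant symbol m = 1 with a dual pair gives the identity.
Mone = TPhi * diag(ones(1, 3)) * TPsi';
```

In the Gabor setting of Fig. 1 the coefficients form a time-frequency grid and the mask selects a region of that grid, but the three steps (analysis, pointwise weighting, synthesis) are the same.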


II. Implementation of inversion of multipliers according to Section 3.2.3 of the paper.

II.1. Implementation of Proposition 8

(a) Implementation of inversion of multipliers $M_{m,\Phi,\Psi}$ (M1) and $M_{m,\Psi,\Phi}$ (M2) for positive m according to Proposition 8 is done in the program Prop8MultiplierInversionOp.m, which involves the function Prop8InvMultOp.m.

   function [TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOp(c,r,TPhi,TG,m,e)

Running the program "Prop8MultiplierInversionOp.m", the user will be required to enter the following parameters (which are the input-parameters for the function Prop8InvMultOp.m):
   c - the number of the frame vectors;
   r - the number of the coordinates of the frame vectors;
   TPhi - the synthesis matrix (rxc) of the frame $\Phi$;
   TG - the synthesis matrix (rxc) of a frame G (with the meaning of $\Psi-\Phi$);
   m -  the symbol of the multiplier (c numbers in a row);
   e - the desired error bound.

- the program requests entries of m until a positive m is entered;
- after TPhi, TG, and a positive m have been entered, the program checks whether they satisfy the assumptions of Prop. 8; if not, it adjusts TG by multiplication with an appropriate constant so that the assumptions of Prop. 8 hold.

The implementation is done using an iterative algorithm according to Prop. 8, until one reaches the desired error-bound e.

The output of the program "Prop8MultiplierInversionOp.m":
   TPsi - the synthesis operator of $\Psi$,
   M1 - the multiplier $M_{m,\Phi,\Psi}$,
   M2 - the multiplier $M_{m,\Psi,\Phi}$,
   M1inv - the iteratively inverted M1,
   M2inv - the iteratively inverted M2,
   M1invMatlab - the inverse of M1 computed with the Matlab command "inv" (for comparison),
   M2invMatlab - the inverse of M2 computed with the Matlab command "inv" (for comparison),
   n - the number of iteration steps.

After presenting the output parameters, the program allows the user to
- either enter new $TG$ and new error-bound e, and repeat the inversion procedure,
- or to terminate the program by pressing zero.

A demo-file (applying "Prop8InvMultOp.m" with concrete parameters) is available in the script Prop8InvMultOpRun.m.
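
The flavour of such an iterative inversion can be sketched with a generic Neumann series. This is not the series of Prop. 8 and not the code of Prop8InvMultOp.m: it uses the elementary identity $M^{-1}=\alpha\sum_{k\ge 0}(I-\alpha M)^k$, valid when $\|I-\alpha M\|<1$. The relaxation parameter alpha and all numerical values are assumptions chosen for illustration; only the names TPhi, TG, m, e, and n follow the parameter list above.

```matlab
% Generic Neumann-series inversion of a multiplier matrix (illustrative sketch,
% NOT the iteration of Prop. 8). Stops once the a-priori tail bound is below e.

TPhi = [0, -sqrt(3)/2, sqrt(3)/2; ...
        1, -1/2,       -1/2];          % synthesis matrix of Phi (r x c)
TG   = 0.05 * [1, 0, -1; 0, 1, 0];     % small perturbation G = Psi - Phi
TPsi = TPhi + TG;                      % synthesis matrix of Psi
m    = [1, 2, 1.5];                    % positive symbol
e    = 1e-10;                          % desired error bound
alpha = 0.4;                           % hypothetical relaxation parameter

M1 = TPhi * diag(m) * TPsi';           % M_{m,Phi,Psi} as an r x r matrix
r  = size(M1, 1);
B  = eye(r) - alpha * M1;
q  = norm(B);
assert(q < 1, 'The Neumann series requires norm(I - alpha*M1) < 1.');

term  = alpha * eye(r);                % k = 0 term of the series
M1inv = term;                          % partial sum S_0
n     = 0;                             % number of iteration steps
% Tail bound: norm(inv(M1) - S_n) <= alpha * q^(n+1) / (1 - q).
while alpha * q^(n+1) / (1 - q) > e
    term  = B * term;                  % next term alpha * B^(n+1)
    M1inv = M1inv + term;
    n     = n + 1;
end
```

Comparing M1inv with inv(M1), as the program does via M1invMatlab, is a simple way to confirm that the requested error bound was reached.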

(b) Implementation of computation of $M_{m,\Phi,\Psi}^{-1}f$ and $M_{m,\Psi,\Phi}^{-1}f$ for given f (and for positive m) is done in the program Prop8MultiplierInversionf.m, which involves the function Prop8InvMultf.m.

   function [TPsi,M1,M2,M1invf,M2invf,n] = Prop8InvMultf(c,r,TPhi,TG,m,f,e)

The implementation proceeds as in (a), requiring one more input, namely f, and using an appropriately modified iteration step.

A demo-file (applying Prop8InvMultf.m with concrete parameters) is available in the script Prop8InvMultfRun.m.

(c) Implementation of computation of $M_{m,\Phi,\Psi}^{-1}$ and $M_{m,\Psi,\Phi}^{-1}$ for positive $m$ and Gabor frames $\Phi$ and $\Psi$ is done in the program Prop8MultiplierInversionOpGabor.m, which involves the function Prop8InvMultOpGabor.m.

   function [TPhi,TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOpGabor(L,a, M,gPhi,gG,m,e)

The implementation of the inversion is like the one in (a), but using $\Phi$ and $\Psi$ which are Gabor frames.

The input parameters of "Prop8MultiplierInversionOpGabor.m":
   L - the length of the transform,
   a - the time-shift (should be a divisor of L),
   M - the number of channels (should be a divisor of L and greater than or equal to a),
   gPhi - the window function of the Gabor frame $\Phi$,
   gG - the window function of the Gabor frame G (with the meaning of $\Psi-\Phi$),
   TPhi - the synthesis matrix (rxc) of the frame $\Phi$,
   TG - the synthesis matrix (rxc) of a frame $G$ (with the meaning of $\Psi-\Phi$),
   m - the symbol of the multiplier (ML/a positive numbers),
   e - the desired error bound.
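
The constraints on L, a, and M listed above can be checked before calling the function. The snippet below is a small sketch with assumed example values; the variable names N, nCoef, and red are hypothetical and do not come from the provided code.

```matlab
% Checking Gabor parameters against the stated constraints (example values).
L = 144;   % length of the transform
a = 8;     % time-shift
M = 12;    % number of channels

assert(mod(L, a) == 0, 'a should be a divisor of L.');
assert(mod(L, M) == 0, 'M should be a divisor of L.');
assert(M >= a,         'M should be greater than or equal to a.');

N     = L / a;       % number of time shifts
nCoef = M * N;       % M*L/a coefficients, i.e. the required length of the symbol m
red   = M / a;       % redundancy of the Gabor system

m = ones(1, nCoef);  % a positive symbol of the right size
```

With these values the symbol needs 216 entries and the redundancy is 1.5.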

The output parameters of "Prop8MultiplierInversionOpGabor.m":
   TPhi - the synthesis operator of the frame $\Phi$,
   and the rest are like the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).

A demo-file (applying "Prop8InvMultOpGabor.m" with concrete parameters) is available in the code Prop8InvMultOpGaborRun.m.

For the convergence rate of this algorithm, see Fig.2 below and the script which was used to generate it.

II.2. Implementation of Proposition 9

Implementation of the inversion of $M_{m,\Phi,\Phi}$, $M_{m,\Phi,\Psi}$, and $M_{m,\Psi,\Phi}$ according to Proposition 9 is done in the program Prop9MultiplierInversionOp.m, which involves the function Prop9InvMultOp.m.

   function [m,TPsi,M0,M1,M2,M0inv,n0,M1inv,M2inv,n] = Prop9InvMultOp(c,r,TPhi,TG,m,e)

Running the program "Prop9MultiplierInversionOp.m", the user will be required to enter the same parameters as the ones for "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).

- the program checks whether the entered TPhi and m satisfy the assumptions of Prop. 9; if not, it adjusts m so that the assumptions of Prop. 9 hold;
- the program checks whether the entered TPhi, TG, and the adjusted m satisfy the assumptions of Prop. 9; if not, it adjusts TG by multiplication with an appropriate constant so that the assumptions of Prop. 9 hold.

The input parameters of "Prop9MultiplierInversionOp.m" are like the ones in "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).

The output parameters of "Prop9MultiplierInversionOp.m":
   m - the symbol of the multiplier,
   M0 - the multiplier $M_{m,\Phi,\Phi}$,
   M0inv - the iteratively inverted M0,
   n0 - the number of iteration steps for the inversion of M0,
   and the rest are like the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).

A demo-file (applying Prop9InvMultOp.m with concrete parameters) is available in the code Prop9InvMultOpRun.m.


II.3. Implementation of Proposition 11

Implementation of the inversion of  $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ according to Proposition 11 is done in the program Prop11MultiplierInversionOp.m, which involves the function Prop11InvMultOp.m.

   function [m,TPsi,M1,M2,M1inv,M2inv,n] = Prop11InvMultOp(c,r,TPhi,TPsi,m,e)

Running the program "Prop11MultiplierInversionOp.m", the user will be required to enter the following parameters (which are the input-parameters for the function "Prop11InvMultOp.m"):
   c, r, TPhi, m, e - like the ones in "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
   TPsi - the synthesis matrix (rxc) of an approximate dual $\Psi$ of the frame $\Phi$.

- using the entered TPhi and TPsi, the program checks whether $\Psi$ is an approximate dual of $\Phi$; if not, it replaces $\Psi$ with the canonical dual of $\Phi$;
- after that, the program checks whether $\Phi$, $\Psi$, and m satisfy the assumptions of Prop. 11; if not, it adjusts m.
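
The first of these checks can be sketched as follows. The sketch assumes the common definition that $\Psi$ is an approximate dual of $\Phi$ when $\|I - T_\Phi T_\Psi^*\| < 1$; whether Prop11InvMultOp.m uses exactly this test is an assumption. The candidate TPsi is deliberately chosen to fail, and the names isApprox and dualErr are hypothetical.

```matlab
% Approximate-dual check with fallback to the canonical dual (illustrative sketch).
% Assumed criterion: Psi is an approximate dual of Phi if norm(I - TPhi*TPsi') < 1.

TPhi = [0, -sqrt(3)/2, sqrt(3)/2; ...
        1, -1/2,       -1/2];      % synthesis matrix of the frame Phi (r x c)
TPsi = -TPhi;                      % deliberately poor candidate for a dual

r = size(TPhi, 1);
isApprox = norm(eye(r) - TPhi * TPsi') < 1;

if ~isApprox
    % Canonical dual: inverse frame operator applied to every frame vector.
    TPsi = (TPhi * TPhi') \ TPhi;
end

% For the canonical dual, TPhi*TPsi' is the identity up to rounding.
dualErr = norm(eye(r) - TPhi * TPsi');
```

Here TPhi*TPhi' equals 1.5 times the identity, so the candidate -TPhi gives a norm of 2.5 and is replaced.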

The output parameters of the program "Prop11MultiplierInversionOp.m":
   m - the symbol of the multiplier,
   and the rest are like the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).

A demo-file (applying Prop11InvMultOp.m with concrete parameters) is available in the code Prop11InvMultOpRun.m.


III. Fig. 2 in the paper and the script which was used to generate this figure (the convergence rate of the algorithm in II.1.(c) above).

             Fig. 2. The convergence rate of Alg. 3 using base-10 logarithmic scale in the vertical axis and a
             linear scale in the horizontal axis. Here the absolute error in each iteration is plotted in red, and
             the convergence value predicted in Proposition 8 is plotted in blue.

Fig. 2 was produced using the script Prop8InvMultOpGaborPlotFigure.m which involves the function Prop8InvMultOpGaborForFigure.m.
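
A curve of this kind can be illustrated on a toy example. The sketch below does not reproduce Prop8InvMultOpGaborPlotFigure.m: it tracks the absolute error of a generic Neumann-series inversion and compares it with the generic geometric tail bound, which is not necessarily the value predicted in Proposition 8. The matrix M, the parameter alpha, and the number of iterations are assumptions.

```matlab
% Error per iteration versus a geometric bound (toy example, generic Neumann series).
M     = [2, 0.5; 0.3, 1.5];    % example matrix to invert
alpha = 0.5;                   % hypothetical relaxation parameter
nIter = 20;                    % number of iteration steps to record

B = eye(2) - alpha * M;
q = norm(B);                   % contraction factor, must be < 1
Mtrue = inv(M);                % reference inverse for measuring the error

term  = alpha * eye(2);        % k = 0 term
S     = term;                  % partial sum S_0
err   = zeros(1, nIter + 1);   % absolute error after n steps
bound = zeros(1, nIter + 1);   % tail bound alpha * q^(n+1) / (1 - q)
for n = 0:nIter
    err(n + 1)   = norm(Mtrue - S);
    bound(n + 1) = alpha * q^(n + 1) / (1 - q);
    term = B * term;
    S    = S + term;
end
```

Calling semilogy(0:nIter, err, 'r', 0:nIter, bound, 'b') afterwards draws both curves with a base-10 logarithmic vertical axis and a linear horizontal axis, the same layout as described in the caption of Fig. 2.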



  • [1] D. T. Stoeva and P. Balazs, "On the unconditional convergence and invertibility of multipliers", arXiv.
  • [2] Z. Průša,  P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Balazs, "The Large Time-Frequency Analysis Toolbox 2.0". In: Aramaki M., Derrien O., Kronland-Martinet R., Ystad S. (eds) Sound, Music, and Motion. CMMR 2013. Lecture Notes in Computer Science, vol 8905. Springer, Cham, (2014).

This is the companion Webpage of the manuscript:

Audlet Filter Banks: A Versatile Analysis/Synthesis Framework using Auditory Frequency Scales

Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdeněk Průša, Piotr Majdak, and Olivier Derrien.

Abstract: Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis-synthesis system is the reconstruction error; it has to be kept to a minimum to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of human auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis-synthesis system for audio applications. The proposed system, named Audlet, is based on an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing.

Sound examples for the source separation experiment: click on a system's acronym to hear the corresponding reconstruction.
Reference signals: original mixture -- target

Rt    β = 1                               β = 1/6                             1024-point STFT
1.1   trev_gfb  Audlet_gfb  Audlet_hann   trev_gfb  Audlet_gfb  Audlet_hann   STFT_hann
1.5   trev_gfb  Audlet_gfb  Audlet_hann   trev_gfb  Audlet_gfb  Audlet_hann   STFT_hann
4.0   trev_gfb  Audlet_gfb  Audlet_hann   trev_gfb  Audlet_gfb  Audlet_hann   STFT_hann

Baumgartner et al. (2017a)

Spatial hearing is important to monitor the environment for interesting or hazardous sounds and to selectively attend to them. The spatial separation between the two ears and the complex geometry of the human body provide auditory cues about the location of a sound source. Depending on where a sound is coming from, the pinna (or auricle) changes the sound spectrum before the sound reaches the eardrum. Since the shape of a pinna is highly individual (even more so than a fingerprint), it also affects the spectral cues in a very individual manner. In order to produce realistic auditory perception artificially, this individuality needs to be reflected as precisely as required, whereby the actual requirements are currently unclear. That is why SpExCue was about finding electrophysiological measures and prediction models of how spatially realistic (“externalized”) a virtual sound source is perceived to be.

Virtual and augmented reality (VR/AR) systems aim to immerse a listener into a well-externalized 3D auditory space. This requires a perceptually accurate simulation of the listener’s natural acoustic exposure. Particularly challenging is to appropriately represent the high-frequency spectral cues induced by the pinnae. To simplify this task, we aim at developing a phenomenological computational model for sound externalization with a particular focus on spectral cues. The model will be designed to predict the listener’s degree of externalization based on binaural input signals and the listener’s individual head-related transfer functions (HRTFs) under static listening conditions.

The naturally externalized auditory perception can be disrupted, for instance, when listening via headphones or hearing-assistive devices, and instead sounds are heard inside the head. Because of this change in externalization or perceived distance, our investigations of spectral cues also served to study the phenomenon of auditory looming bias (Baumgartner et al., 2017 PNAS): sounds approaching the listener are perceived more intensely than those that are receding from the listener. Previous studies demonstrated auditory looming bias exclusively by loudness changes (increasing/decreasing loudness used to simulate approaching/receding sounds). Hence, it was not clear whether this bias truly reflects perceptual differences in sensitivity to motion direction rather than changes in loudness. Our spectral cue changes were perceived as either approaching or receding at steady loudness and evoked auditory looming bias both on a behavioral level (approaching sounds easier to recognize than receding sounds) and an electrophysiological level (larger neural activity in response to approaching sounds). Therefore, our study demonstrated that the bias is truly about perceived motion in distance, not loudness changes.

Further, SpExCue investigated how the combination of different auditory spatial cues affects attentional control in a speech recognition task with simultaneous talkers, which requires spatial selective attention as at a cocktail party (Deng et al., 2019). We found that natural combinations of auditory spatial cues caused larger neural activity in preparation for the test signal and optimized the neural processing of the attended speech.

SpExCue also compared different computational modeling approaches that aim to predict the effect of spectral cue changes on how spatially realistic a sound is perceived (Baumgartner et al., 2017 EAA-ASA). Although many previous experimental results could be predicted by at least one of the models, none of them alone could explain all of these results. In order to assist the future design of more general computational models for spatial hearing, we finally created a conceptual cognitive model for the formation of auditory space (Majdak et al., 2019).


Erwin-Schrödinger Fellowship from the Austrian Science Fund (FWF, J3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.

Follow-up funding provided by Oculus VR, LLC, since March 2018. Project Investigator: Robert Baumgartner.


  • Baumgartner, R., Reed, D.K., Tóth, B., Best, V., Majdak, P., Colburn, H.S., Shinn-Cunningham, B. (2017): Asymmetries in behavioral and neural responses to spectral cues demonstrate the generality of auditory looming bias, in: Proceedings of the National Academy of Sciences of the USA 114, 9743-9748. (article)
  • Baumgartner, R., Majdak, P., Colburn, H.S., Shinn-Cunningham, B. (2017): Modeling Sound Externalization Based on Listener-specific Spectral Cues, presented at: Acoustics ‘17 Boston: The 3rd Joint Meeting of the Acoustical Society of America and the European Acoustics Association. Boston, MA, USA. (conference)
  • Deng, Y., Choi, I., Shinn-Cunningham, B., Baumgartner, R. (2019): Impoverished auditory cues limit engagement of brain networks controlling spatial selective attention, in: Neuroimage 202, 116151. (article)
  • Baumgartner, R., Majdak, P. (2019): Predicting Externalization of Anechoic Sounds, in: Proceedings of ICA 2019. (proceedings)
  • Majdak, P., Baumgartner, R., Jenny, C. (2019): Formation of three-dimensional auditory space, in: arXiv:1901.03990 [q-bio]. (preprint)

This page provides resources for the research article:

"Frame Theory for Signal Processing in Psychoacoustics"

by Peter Balazs, Nicki Holighaus, Thibaud Necciari, and Diana Stoeva

to appear in the book "Excursions in Harmonic Analysis" published by Springer.

Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant view-point for scientists in audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.

The present ZIP archive features Matlab/Octave scripts that reproduce the results presented in Figures 7, 10, and 11 of the article.

IMPORTANT NOTE: The Matlab/Octave toolbox Large Time-Frequency Analysis (LTFAT, version 1.2.0 and above) must be installed to run the codes. This toolbox is freely available at Sourceforge.

If you encounter any issue with the files, please do not hesitate to contact the authors.


This page provides the sound files corresponding to the results of the perceptual matching pursuit algorithm presented in:

"Perceptual Matching Pursuit with Gabor Dictionaries and Time-Frequency Masking"

Gilles Chardon, Thibaud Necciari, and Peter Balazs

submitted to the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014).

Musical excerpt 1: Bruno Maderna (piano concerto, Fig. 2a in the manuscript)


Iterations   Matching Pursuit   Perceptual Matching Pursuit   Residual (MP)   Masked components (PMP)   Residual + masked components
10000        wav                wav                           wav             wav                       wav
20000        wav                wav                           wav             wav                       wav
40000        wav                wav                           wav             wav                       wav
80000        wav                wav                           wav             wav                       wav


Musical excerpt 2: Suzanne Vega (Fig. 2b in the manuscript)


Iterations   Matching Pursuit   Perceptual Matching Pursuit   Residual (MP)   Masked components (PMP)   Residual + masked components
10000        wav                wav                           wav             wav                       wav
20000        wav                wav                           wav             wav                       wav
40000        wav                wav                           wav             wav                       wav
80000        wav                wav                           wav             wav                       wav


Upcoming Events

Explainable Models and Their Application in Music Emotion Recognition

ARI guest talk by Verena Haunschmid, Shreyan Chowdhury

16 October 2019


Seminar Room, Wohllebengasse 12-14 / Ground Floor


Blind Output Matching for Domain Adaptation in Segmentation Networks

ARI guest talk by Georg Pichler

23 October 2019


Seminar Room, Wohllebengasse 12-14 / Ground Floor
