Machine learning has become an integral part of our everyday lives over the last few years. Whether we use a smartphone, shop online, consume media, drive a car, or do much else, machine learning (ML) and, more generally, artificial intelligence (AI) support, influence, and analyze us in many situations. In particular, deep learning methods based on artificial neural networks are used in many areas.
In the sciences, too, ML and AI have already provided important impetus, and this influence is expected to spread to an even wider range of scientific disciplines.
This increases both the interest in a deeper, science-based understanding of ML methods and the need for scientists of various disciplines to develop a strong understanding of how such methods are applied and designed.
The Acoustics Research Institute, which conducts application-oriented basic research in the field of acoustics, is rising to this challenge and has founded the Machine Learning research group.
The group sheds light on the different aspects of machine learning and artificial intelligence, with a particular focus on potential applications in acoustics. The collaboration of scientists from different disciplines in the areas of ML and AI will not only enable the Acoustics Research Institute to make pioneering progress in all areas of sound research, but will also make essential contributions to theoretical issues in the highly topical research field of artificial intelligence.
AABBA's goal is to promote the exploration and development of binaural and spatial models and their applications.
AABBA members are academic scientists willing to participate in our activities. We meet annually for an open discussion and progress presentations, and we especially encourage members to bring students and young scientists associated with their projects to our meetings. Our activities result in joint publications and special sessions at international conferences. As a tangible outcome, we contribute validated source code for published models of binaural and spatial hearing to our collection of auditory models, the Auditory Modeling Toolbox (AMT).
Executive board: Piotr Majdak, Armin Kohlrausch, Ville Pulkki
Members:
Annual meetings are held at the beginning of each year:
Contact person: Piotr Majdak
The Musicality and Bioacoustics group merges music and biology to study the origins of music through cross-species studies. Like language, music is found in all cultures around the world. Even isolated cultures have music, and all musical systems share important parallels such as the use of discrete notes and a steady beat.
We study other animals to understand which aspects of music are uniquely human and why humans may have developed these abilities. The active research directions of the group include:
The budgerigar laboratory facilities are currently in the Department of Cognitive Biology at the University of Vienna where we also have collaborations with some of the other species that are housed there.
This web page provides resources for the figures and for the implementation of the inversion of frame multipliers in the research manuscript:
"A survey on the unconditional convergence and the invertibility of multipliers with implementation"
Diana T. Stoeva and Peter Balazs
Abstract:
The paper presents a survey of frame multipliers and related concepts. In particular, it includes a short motivation of why multipliers are of interest, a review and extension of recent results devoted to the unconditional convergence of multipliers, sufficient and/or necessary conditions for the invertibility of multipliers, and representations of the inverse via Neumann-like series and via multipliers with particular parameters. Multipliers for frames with specific structure, namely Gabor and wavelet multipliers, are also considered. Some of the results on representing the inverse multiplier are implemented in Matlab code, and the implementations are described.
Here we provide:
- the scripts used to generate Fig. 1 and Fig. 2 in the paper;
- an implementation of Propositions 8, 9, and 11 in Matlab code, using the Matlab/Octave toolbox Linear Time-Frequency Analysis (LTFAT) [2] (version ... and above).
To run the code provided below, first install the LTFAT toolbox, which is freely available at SourceForge.
I. Fig. 1 in the paper and the script used to generate it (an illustrative example visualizing a multiplier).
Fig. 1. An illustrative example to visualize a multiplier.
(TOP LEFT) The time-frequency representation of the music signal $f$. (TOP RIGHT) The symbol $m$, found by a (manual) estimation of the time-frequency region of the singer's voice. (BOTTOM LEFT) The multiplication in the TF domain. (BOTTOM RIGHT) Time-frequency representation of $M_{m,\widetilde \Psi,\Psi}f$.
Fig. 1 was produced via the script testGabMulExp_new.m using the original sound-file originalsignal.wav and the manually determined symbol Symbol6_BW.png.
The script also produces the modified signal (obtained by applying the symbol/mask to the original signal), which you can listen to here.
II. Implementation of inversion of multipliers according to Section 3.2.3 of the paper.
II.1. Implementation of Proposition 8
(a) Implementation of inversion of multipliers $M_{m,\Phi,\Psi}$ (M1) and $M_{m,\Psi,\Phi}$ (M2) for positive m according to Proposition 8 is done in the program Prop8MultiplierInversionOp.m, which involves the function Prop8InvMultOp.m.
function [TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOp(c,r,TPhi,TG,m,e)
When running the program "Prop8MultiplierInversionOp.m", the user is prompted for the following parameters (the input parameters of the function Prop8InvMultOp.m):
c - the number of frame vectors;
r - the number of coordinates of the frame vectors;
TPhi - the synthesis matrix (r×c) of the frame $\Phi$;
TG - the synthesis matrix (r×c) of a frame G (with the meaning of $\Psi-\Phi$);
m - the symbol of the multiplier (c numbers in a row);
e - the desired error bound.
Note:
- the program keeps asking for m until a positive m is entered;
- after TPhi, TG, and a positive m have been entered, the program checks whether they satisfy the assumptions of Prop. 8; if not,
the program rescales TG by an appropriate constant so that the assumptions of Prop. 8 are met.
The inversion is computed by an iterative algorithm based on Prop. 8, which runs until the desired error bound e is reached.
The output of the program "Prop8MultiplierInversionOp.m":
TPsi - the synthesis operator of $\Psi$,
M1 - the multiplier $M_{m,\Phi,\Psi}$,
M2 - the multiplier $M_{m,\Psi,\Phi}$,
M1inv - the iteratively inverted M1,
M2inv - the iteratively inverted M2,
M1invMatlab - the inverse of M1 computed with the Matlab command "inv" (for comparison),
M2invMatlab - the inverse of M2 computed with the Matlab command "inv" (for comparison),
n - the number of iteration steps.
Note:
After presenting the output parameters, the program allows the user to
- either enter a new $TG$ and a new error bound e and repeat the inversion procedure,
- or terminate the program by entering zero.
A demo-file (applying "Prop8InvMultOp.m" with concrete parameters) is available in the script Prop8InvMultOpRun.m.
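The following Python sketch illustrates the kind of iteration described in (a). It is not the Matlab code of Prop8InvMultOp.m, and it uses a generic Neumann series rather than the exact series of Prop. 8, which is not reproduced on this page. It makes three assumptions:

- the frames are real-valued and stored as the columns of r×c matrices;
- $M_{m,\Phi,\Psi} = T_\Phi\,\mathrm{diag}(m)\,T_\Psi^{T}$, that is, $M_{m,\Phi,\Psi}f=\sum_k m_k\langle f,\psi_k\rangle\varphi_k$;
- $\Psi = \Phi + G$ with a small G.

All helper names (`multiplier`, `neumann_inverse`, ...) are hypothetical.

```python
# Sketch (not the authors' code): Neumann-series inversion of a real frame multiplier.
# Assumed convention: M_{m,Phi,Psi} = TPhi * diag(m) * TPsi^T, frames stored as columns.


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def multiplier(T_syn, m, T_ana):
    """Matrix of T_syn * diag(m) * T_ana^T (real case)."""
    T_syn_m = [[t * mk for t, mk in zip(row, m)] for row in T_syn]
    return matmul(T_syn_m, transpose(T_ana))


def frobenius(A):
    return sum(x * x for row in A for x in row) ** 0.5


def neumann_inverse(M, e, max_iter=10_000):
    """Sum S = (1/lam) * sum_k (I - M/lam)^k until ||I - M S||_F < e.

    Returns (S, n), where n is the number of series terms summed.
    """
    r = len(M)
    lam = frobenius(M)  # ||M||_F is an upper bound for the spectral radius of M
    S = [[0.0] * r for _ in range(r)]
    for n in range(max_iter):
        MS = matmul(M, S)
        R = [[(1.0 if i == j else 0.0) - MS[i][j] for j in range(r)] for i in range(r)]
        if frobenius(R) < e:
            return S, n
        # The residual R equals (I - M/lam)^n, so adding R/lam appends the next term.
        S = [[S[i][j] + R[i][j] / lam for j in range(r)] for i in range(r)]
    raise RuntimeError("no convergence; try a smaller perturbation G")


# Hypothetical example data: r = 2, c = 3, Psi = Phi + G with a small G.
TPhi = [[1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0]]
TG = [[0.1, 0.0, 0.0],
      [0.0, 0.0, 0.1]]
TPsi = [[p + g for p, g in zip(rp, rg)] for rp, rg in zip(TPhi, TG)]
m = [1.0, 2.0, 1.0]  # positive symbol

M1 = multiplier(TPhi, m, TPsi)  # M_{m,Phi,Psi}
M2 = multiplier(TPsi, m, TPhi)  # M_{m,Psi,Phi}
M1inv, n1 = neumann_inverse(M1, 1e-10)
M2inv, n2 = neumann_inverse(M2, 1e-10)
```

With λ at least the spectral radius of M, the series converges whenever all eigenvalues of M are real and positive. This holds in this example, where M is a small perturbation of the positive definite $M_{m,\Phi,\Phi}$. Rescaling TG, as the program does, is one way of keeping such a perturbation small.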
(b) The computation of $M_{m,\Phi,\Psi}^{-1}f$ and $M_{m,\Psi,\Phi}^{-1}f$ for a given f (and positive m) is implemented in the program Prop8MultiplierInversionf.m, which involves the function Prop8InvMultf.m.
function [TPsi,M1,M2,M1invf,M2invf,n] = Prop8InvMultf(c,r,TPhi,TG,m,f,e)
The implementation is similar to (a); it requires one additional input, f, and modifies the iteration steps accordingly.
A demo-file (applying Prop8InvMultf.m with concrete parameters) is available in the script Prop8InvMultfRun.m.
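For (b), the inverse does not have to be formed explicitly. Applying the same kind of series to a vector gives a Richardson-type iteration, $x_{n+1} = x_n + (f - Mx_n)/\lambda$, whose iterates converge to $M^{-1}f$ under the same conditions. The sketch below is again only an illustration, not the code of Prop8InvMultf.m. It assumes real-valued frames stored as columns. The step bound $\lambda = \|T_\Phi\|_F\,\max_k|m_k|\,\|T_\Psi\|_F$ (an upper bound for $\|M\|$) and the function names are hypothetical choices.

```python
# Sketch (not the authors' code): compute M^{-1} f without forming M^{-1}.


def apply_multiplier(T_syn, m, T_ana, x):
    """M x = sum_k m_k <x, ana_k> syn_k for real frames stored as columns."""
    r, c = len(T_ana), len(m)
    coeffs = [m[k] * sum(T_ana[i][k] * x[i] for i in range(r)) for k in range(c)]
    return [sum(T_syn[i][k] * coeffs[k] for k in range(c)) for i in range(len(T_syn))]


def frobenius(A):
    return sum(v * v for row in A for v in row) ** 0.5


def richardson_inverse_apply(T_syn, m, T_ana, f, e, max_iter=100_000):
    """Iterate x <- x + (f - M x)/lam until ||f - M x|| < e; returns (x, n)."""
    lam = frobenius(T_syn) * max(abs(mk) for mk in m) * frobenius(T_ana)  # >= ||M||
    x = [0.0] * len(f)
    for n in range(max_iter):
        res = [fi - yi for fi, yi in zip(f, apply_multiplier(T_syn, m, T_ana, x))]
        if sum(v * v for v in res) ** 0.5 < e:
            return x, n
        x = [xi + ri / lam for xi, ri in zip(x, res)]
    raise RuntimeError("no convergence")


TPhi = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
TPsi = [[1.1, 0.0, 1.0], [0.0, 1.0, 1.1]]  # hypothetical Psi = Phi + G with a small G
m = [1.0, 2.0, 1.0]
f = [1.0, 2.0]
x1, n1 = richardson_inverse_apply(TPhi, m, TPsi, f, 1e-12)  # M_{m,Phi,Psi}^{-1} f
x2, n2 = richardson_inverse_apply(TPsi, m, TPhi, f, 1e-12)  # M_{m,Psi,Phi}^{-1} f
```

Each step only needs one analysis and one synthesis, which is why the vector form is preferable when f is long.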
(c) Implementation of computation of $M_{m,\Phi,\Psi}^{-1}$ and $M_{m,\Psi,\Phi}^{-1}$ for positive $m$ and Gabor frames $\Phi$ and $\Psi$ is done in the program Prop8MultiplierInversionOpGabor.m, which involves the function Prop8InvMultOpGabor.m.
function [TPhi,TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOpGabor(L,a,M,gPhi,gG,m,e)
The inversion proceeds as in (a), but with Gabor frames $\Phi$ and $\Psi$.
The input parameters of "Prop8MultiplierInversionOpGabor.m":
L - the length of the transform,
a - the time shift (should be a divisor of L),
M - the number of channels (should be a divisor of L and greater than or equal to a),
gPhi - the window function of the Gabor frame $\Phi$,
gG - the window function of the Gabor frame $G$ (with the meaning of $\Psi-\Phi$),
TPhi - the synthesis matrix (r×c) of the frame $\Phi$,
TG - the synthesis matrix (r×c) of a frame $G$ (with the meaning of $\Psi-\Phi$),
m - the symbol of the multiplier (ML/a positive numbers),
e - the desired error bound.
The output parameters of "Prop8MultiplierInversionOpGabor.m":
TPhi - the synthesis operator of the frame $\Phi$,
and the rest are as for the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
A demo file (applying "Prop8InvMultOpGabor.m" with concrete parameters) is available in the script Prop8InvMultOpGaborRun.m.
For the convergence rate of this algorithm, see Fig. 2 below and the script used to generate it.
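In the Gabor case (c), the synthesis matrix is determined by the window, the time shift a, and the number of channels M. It has ML/a columns, which matches the length of the symbol m listed above.

The Python sketch below builds such a matrix under an assumed frequency-invariant phase convention, $g_{n,k}[l] = g[(l-na) \bmod L]\, e^{2\pi i k l/M}$. LTFAT's own ordering and phase conventions may differ, and the function names are hypothetical.

As a sanity check, consider a "painless" window, one supported on at most M consecutive samples. Its frame operator $TT^{*}$ is diagonal with entries $M\sum_n |g[(l-na) \bmod L]|^2$.

```python
import cmath


def gabor_synthesis_matrix(g, a, M):
    """L x (M*L/a) matrix with columns g[(l - n*a) % L] * exp(2j*pi*k*l/M)."""
    L = len(g)
    if L % a or L % M:
        raise ValueError("a and M must both divide L")
    columns = [[g[(l - n * a) % L] * cmath.exp(2j * cmath.pi * k * l / M) for l in range(L)]
               for n in range(L // a) for k in range(M)]
    return [list(row) for row in zip(*columns)]  # rows are indexed by time l


def frame_operator(T):
    """S = T T^*, computed entry by entry."""
    cols = range(len(T[0]))
    return [[sum(T[i][c] * T[j][c].conjugate() for c in cols) for j in range(len(T))]
            for i in range(len(T))]


L, a, M = 8, 2, 4
g = [1.0, 0.5, 0.25, 0.5, 0.0, 0.0, 0.0, 0.0]  # hypothetical window, support length 4 = M
T = gabor_synthesis_matrix(g, a, M)
S = frame_operator(T)
```

In this example, S has 4.25 on the even-indexed diagonal entries and 2.0 on the odd-indexed ones, and zeros elsewhere.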
II.2. Implementation of Proposition 9
Implementation of the inversion of $M_{m,\Phi,\Phi}$, $M_{m,\Phi,\Psi}$, and $M_{m,\Psi,\Phi}$ according to Proposition 9 is done in the program Prop9MultiplierInversionOp.m, which involves the function Prop9InvMultOp.m.
function [m,TPsi,M0,M1,M2,M0inv,n0,M1inv,M2inv,n] = Prop9InvMultOp(c,r,TPhi,TG,m,e)
When running the program "Prop9MultiplierInversionOp.m", the user is prompted for the same parameters as for "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
Note:
- the program checks whether the entered TPhi and m satisfy the assumptions of Prop. 9,
and if not, the program adjusts m so that the assumptions of Prop. 9 are met;
- the program checks whether the entered TPhi, TG, and the adjusted m satisfy the assumptions of Prop. 9,
and if not, the program rescales TG by an appropriate constant so that the assumptions of Prop. 9 are met.
The input parameters of "Prop9MultiplierInversionOp.m" are the same as those of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
The output parameters of "Prop9MultiplierInversionOp.m":
m - the symbol of the multiplier,
M0 - the multiplier $M_{m,\Phi,\Phi}$,
M0inv - the iteratively inverted M0,
n0 - the number of the iteration steps for the inversion of M0,
and the rest are as for the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
A demo file (applying Prop9InvMultOp.m with concrete parameters) is available in the script Prop9InvMultOpRun.m.
II.3. Implementation of Proposition 11
Implementation of the inversion of $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ according to Proposition 11 is done in the program Prop11MultiplierInversionOp.m, which involves the function Prop11InvMultOp.m.
function [m,TPsi,M1,M2,M1inv,M2inv,n] = Prop11InvMultOp(c,r,TPhi,TPsi,m,e)
When running the program "Prop11MultiplierInversionOp.m", the user is prompted for the following parameters (the input parameters of the function "Prop11InvMultOp.m"):
c, r, TPhi, m, e - as in "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a));
TPsi - the synthesis matrix (r×c) of an approximate dual $\Psi$ of the frame $\Phi$.
Note:
- using the entered TPhi and TPsi, the program checks whether $\Psi$ is an approximate dual of $\Phi$
and if not, the program replaces $\Psi$ with the canonical dual of $\Phi$;
- after that the program checks whether $\Phi$, $\Psi$, and m satisfy the assumptions of Prop. 11
and if not, the program adjusts m.
The output parameters of the program "Prop11MultiplierInversionOp.m":
m - the symbol of the multiplier,
and the rest are as for the output parameters of "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
A demo file (applying Prop11InvMultOp.m with concrete parameters) is available in the script Prop11InvMultOpRun.m.
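The approximate-dual check with fallback to the canonical dual can be sketched as follows. The sketch assumes real-valued frames stored as columns and the usual definition of approximate duals, $\|I - T_\Psi T_\Phi^{*}\| < 1$.

The Frobenius norm bounds the operator norm from above, so the test below is only a sufficient condition. It may reject some pairs that are approximate duals in the operator norm.

The canonical dual is $\tilde\varphi_k = S_\Phi^{-1}\varphi_k$ with frame operator $S_\Phi = T_\Phi T_\Phi^{*}$. The function names are hypothetical, and this is not the code of Prop11InvMultOp.m.

```python
# Sketch (not the authors' code): approximate-dual check with canonical-dual fallback.


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def is_approximate_dual(TPhi, TPsi):
    """Sufficient test: ||I - TPsi TPhi^T||_F < 1 implies the operator-norm condition."""
    P = matmul(TPsi, transpose(TPhi))
    r = len(P)
    dist = sum(((1.0 if i == j else 0.0) - P[i][j]) ** 2 for i in range(r) for j in range(r))
    return dist ** 0.5 < 1.0


def canonical_dual(TPhi):
    """Columns S^{-1} phi_k, with frame operator S = TPhi TPhi^T."""
    return matmul(inverse(matmul(TPhi, transpose(TPhi))), TPhi)


def ensure_approximate_dual(TPhi, TPsi):
    return TPsi if is_approximate_dual(TPhi, TPsi) else canonical_dual(TPhi)


TPhi = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
dual = canonical_dual(TPhi)
TPsi_bad = ensure_approximate_dual(TPhi, TPhi)  # Phi itself fails the test here
```

Here the frame operator is [[2, 1], [1, 2]], so Φ itself is rejected and replaced by its canonical dual.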
III. Fig. 2 in the paper and the script used to generate it (the convergence rate of the algorithm in II.1(c) above).
Fig. 2. The convergence rate of Alg. 3, with a base-10 logarithmic scale on the vertical axis and a linear scale on the horizontal axis. The absolute error in each iteration is plotted in red, and the convergence value predicted in Proposition 8 is plotted in blue.
Fig. 2 was produced using the script Prop8InvMultOpGaborPlotFigure.m which involves the function Prop8InvMultOpGaborForFigure.m.
This is the companion webpage of the manuscript:
Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdeněk Průša, Piotr Majdak, and Olivier Derrien.
Abstract: Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis-synthesis system is the reconstruction error; it has to be kept to a minimum to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of human auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis-synthesis system for audio applications. The proposed system, named Audlet, is based on an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing.
Sound examples for the source separation experiment: click on a system's acronym to hear the corresponding reconstruction.
Reference signals: original mixture -- target
Rt | β = 1 | β = 1 | β = 1 | β = 1/6 | β = 1/6 | β = 1/6 | 1024-point STFT |
---|---|---|---|---|---|---|---|
1.1 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
1.5 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
4.0 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
Spatial hearing is important for monitoring the environment for interesting or hazardous sounds and for selectively attending to them. The spatial separation between the two ears and the complex geometry of the human body provide auditory cues about the location of a sound source. Depending on where a sound is coming from, the pinna (or auricle) changes the sound spectrum before the sound reaches the eardrum. Since the shape of a pinna is highly individual (even more so than a fingerprint), it also affects the spectral cues in a highly individual manner. To produce realistic auditory perception artificially, this individuality needs to be reflected as precisely as required, but the actual requirements are currently unclear. This is why SpExCue aimed to find electrophysiological measures and prediction models of how spatially realistic (“externalized”) a virtual sound source is perceived to be.
Virtual and augmented reality (VR/AR) systems aim to immerse a listener in a well-externalized 3D auditory space. This requires a perceptually accurate simulation of the listener’s natural acoustic exposure. Particularly challenging is the appropriate representation of the high-frequency spectral cues induced by the pinnae. To simplify this task, we aim to develop a phenomenological computational model for sound externalization with a particular focus on spectral cues. The model will be designed to predict the listener’s degree of externalization based on binaural input signals and the listener’s individual head-related transfer functions (HRTFs) under static listening conditions.
Naturally externalized auditory perception can be disrupted, for instance when listening via headphones or hearing-assistive devices, so that sounds are instead heard inside the head. Because of this change in externalization or perceived distance, our investigations of spectral cues also served to study the phenomenon of auditory looming bias (Baumgartner et al., 2017a): sounds approaching the listener are perceived more intensely than those receding from the listener. Previous studies had demonstrated auditory looming bias exclusively with loudness changes (increasing/decreasing loudness used to simulate approaching/receding sounds). Hence, it was unclear whether this bias truly reflects perceptual differences in sensitivity to motion direction rather than changes in loudness. Our spectral cue changes were perceived as either approaching or receding at constant loudness and evoked auditory looming bias on both a behavioral level (approaching sounds were easier to recognize than receding sounds) and an electrophysiological level (larger neural activity in response to approaching sounds). Therefore, our study demonstrated that the bias is truly about perceived motion in distance, not about loudness changes.
Further, SpExCue investigated how the combination of different auditory spatial cues affects attentional control in a speech recognition task with simultaneous talkers, which requires spatial selective attention, as at a cocktail party (Deng et al., in prep). We found that natural combinations of auditory spatial cues caused larger neural activity in preparation for the test signal and optimized the neural processing of the attended speech.
SpExCue also compared different computational modeling approaches that aim to predict the effect of spectral cue changes on how spatially realistic a sound is perceived to be (Baumgartner et al., 2017b). Although many previous experimental results could be predicted by at least one of the models, none of them alone could explain all of these results. To assist the future design of more general computational models for spatial hearing, we finally created a conceptual cognitive model for the formation of auditory space (Majdak et al., in prep.).
Erwin-Schrödinger Fellowship from the Austrian Science Fund (FWF, J3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.
Follow-up funding provided by Oculus VR, LLC, since March 2018. Project Investigator: Robert Baumgartner.
This page provides resources for the research article:
to appear in the book "Excursions in Harmonic Analysis" published by Springer.
Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one hand, the basic concepts of frame theory are presented, and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for scientists in audio signal processing. On the other hand, basic psychoacoustic concepts are presented to encourage mathematicians to apply their knowledge in this field.
The ZIP archive contains Matlab/Octave scripts that reproduce the results presented in Figures 7, 10, and 11 of the article.
IMPORTANT NOTE: The Matlab/Octave toolbox Large Time-Frequency Analysis (LTFAT, version 1.2.0 and above) must be installed to run the codes. This toolbox is freely available at Sourceforge.
If you encounter any issue with the files, please do not hesitate to contact the authors.
This page provides the sound files corresponding to the results of the perceptual matching pursuit algorithm presented in:
submitted to the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014).
Iterations | Matching Pursuit | Perceptual Matching Pursuit | Residual (MP) | Masked components (PMP) | Residual + masked components |
---|---|---|---|---|---|
10000 | wav | wav | wav | wav | wav |
20000 | wav | wav | wav | wav | wav |
40000 | wav | wav | wav | wav | wav |
80000 | wav | wav | wav | wav | wav |
Iterations | Matching Pursuit | Perceptual Matching Pursuit | Residual (MP) | Masked components (PMP) | Residual + masked components |
---|---|---|---|---|---|
10000 | wav | wav | wav | wav | wav |
20000 | wav | wav | wav | wav | wav |
40000 | wav | wav | wav | wav | wav |
80000 | wav | wav | wav | wav | wav |
This page provides resources for the visualizations and the algorithms in the research manuscript:
Certain mathematical objects appear in many scientific disciplines, such as physics, signal processing and, naturally, mathematics. In a general setting they can be described as frame multipliers, consisting of analysis, multiplication by a fixed sequence (called the symbol), and synthesis. They are not only interesting mathematical objects, but also important for applications, for example for the realization of time-varying filters. In this paper we show a surprising result about the inverse of such operators, if it exists, as well as new results about a core concept of frame theory, dual frames. We show that for semi-normalized symbols, the inverse of any invertible frame multiplier can always be represented as a frame multiplier with dual frames and reciprocal symbol. Furthermore, one of those dual frames is uniquely defined and the other one can be arbitrarily chosen. We investigate sufficient conditions for the special case when both dual frames can be chosen to be the canonical duals. In connection with the above, we show that the set of dual frames determines a frame uniquely. Furthermore, for a given frame, the union of all coefficients of its dual frames is dense in ℓ^2. We investigate invertible Gabor multipliers; we show that the inverse of every invertible lattice-invariant operator (in particular, every invertible Gabor frame multiplier with a constant symbol (1)) can be represented as a Gabor frame multiplier with a constant symbol (1). Finally, we give a numerical example of the invertibility of multipliers in the Gabor case.
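The representation of the inverse stated in the abstract can be checked numerically in finite dimensions. The Python sketch below assumes real-valued frames stored as columns and the convention $M_{m,\Phi,\Psi}f = \sum_k m_k \langle f,\psi_k\rangle \varphi_k$.

It uses one particular pair of duals:

- $\psi^d_k = m_k M^{-1}\varphi_k$, a dual of Ψ;
- $\varphi^d_k = m_k (M^{-1})^{*}\psi_k$, a dual of Φ.

For this pair, $M_{1/m,\Psi^d,\Phi^d} = M^{-1} M M^{-1} = M^{-1}$. This choice is an illustration, not necessarily the construction used in the paper, and the example frames and helper names are hypothetical.

```python
# Sketch: M^{-1} written as a multiplier with reciprocal symbol and dual frames.


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def scale_columns(T, m):
    """T * diag(m): multiply column k by m[k]."""
    return [[t * mk for t, mk in zip(row, m)] for row in T]


def multiplier(T_syn, m, T_ana):
    """Matrix of T_syn * diag(m) * T_ana^T (synthesis with T_syn, analysis with T_ana)."""
    return matmul(scale_columns(T_syn, m), transpose(T_ana))


def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))


TPhi = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
TPsi = [[1.1, 0.0, 1.0], [0.0, 1.0, 1.1]]
m = [1.0, 2.0, 0.5]  # semi-normalized: bounded away from 0 and from infinity

M = multiplier(TPhi, m, TPsi)
Minv = inverse(M)
TPsi_d = matmul(Minv, scale_columns(TPhi, m))             # columns m_k M^{-1} phi_k
TPhi_d = matmul(transpose(Minv), scale_columns(TPsi, m))  # columns m_k (M^{-1})^T psi_k
Minv_as_multiplier = multiplier(TPsi_d, [1.0 / mk for mk in m], TPhi_d)
```

Up to rounding, `Minv_as_multiplier` coincides with `Minv`. In addition, $T_{\Psi^d}T_\Psi^{T}$ and $T_\Phi T_{\Phi^d}^{T}$ are both the identity, which confirms the duality relations.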
In Figure 1 we show a visualization of a multiplier M_{m,Φ,Ψ} in the time-frequency plane. We consider a music signal f and the action of a multiplier M_{m,Φ,Ψ} on f. For f we use a 2-second excerpt of "Jump" by Van Halen (click here to listen to the signal). For the time-frequency representation of the musical signal f (TOP LEFT) we use a 'painless' Gabor frame Ψ (an 80 ms Hanning window with 12.5% overlap). By manual estimation, we determine the symbol m that describes the time-frequency region of the singer's voice. This region is multiplied by 0.01, the rest by 1 (TOP RIGHT) (see the symbol here). Finally, we show a time-frequency representation of the modified signal M_{m,Φ,Ψ}f (BOTTOM). To listen to the modified signal, click here.
Figure 1. (TOP LEFT) The time-frequency representation of the music signal f. (TOP RIGHT) The symbol m, found by a (manual) estimation of the time-frequency region of the singer's voice. (BOTTOM) Time-frequency representation of M_{m,Φ,Ψ}f.
Here we use the same signal f and the same multiplier M_{m,Φ,Ψ} as in Figure 1. Note that all elements of the symbol m satisfy m_{n,k} ∈ {1, 10^{-2}}. Since m is semi-normalized, the multiplier M is analytically invertible [1]. However, the operator is badly conditioned: its condition number is around 99. The signal f is approximately 2 seconds long and sampled at 44100 Hz; in the computation it is a 128148-dimensional vector.
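The reported condition number is consistent with a simple bound. For illustration, assume that the multiplier uses the same frame for analysis and synthesis ($\Phi = \Psi$), with frame bounds A and B, and that the symbol is positive. Whether the frames in the experiment satisfy these assumptions is not stated here. Under these assumptions:

```latex
\langle M_{m,\Phi,\Phi} f, f\rangle = \sum_k m_k\,|\langle f,\varphi_k\rangle|^2,
\qquad
A\,\min_k m_k\,\|f\|^2 \;\le\; \langle M_{m,\Phi,\Phi} f, f\rangle \;\le\; B\,\max_k m_k\,\|f\|^2,
\qquad
\kappa(M_{m,\Phi,\Phi}) \;\le\; \frac{B\,\max_k m_k}{A\,\min_k m_k}.
```

Since M is then self-adjoint and positive, its eigenvalues lie in $[A\min_k m_k,\, B\max_k m_k]$. For a tight frame (A = B) and $m_{n,k}\in\{1, 10^{-2}\}$, the bound equals 100, in line with the observed value of about 99.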
Starting from the modified signal, we compare two approaches numerically:
- the 'naive' inversion (to listen to it, click here);
- the 'iterative' inversion (to listen to it, click here).
Clearly, the naive approach produces strong artifacts. The error is especially large at the boundaries of the constant regions of the symbol. The chosen atoms are well localized in time-frequency, so within the interior of the constant regions this inversion works well. This is to be expected, as we show in the manuscript that constant symbols allow this kind of inversion for equivalent frames.
The iterative inversion works well, with an error of 3%. This could, naturally, be decreased by investing more computation time. But even in the chosen setting for the iterative inversion (100 iterations in iframemul [2]), no difference can be seen in the time-frequency representation, and no audible difference can be detected.
(TOP LEFT) The time-frequency representation of the result of the 'naive' inversion. (TOP RIGHT) The time-frequency representation of the error of the 'naive' inversion.
(BOTTOM LEFT) The time-frequency representation of the iterative inversion. (BOTTOM RIGHT) The time-frequency representation of the error of the iterative inversion.
The above visualizations were produced with algorithms from the Matlab/Octave toolbox Linear Time-Frequency Analysis (LTFAT) [2] (version 1.4.0 and above). To run the script provided below, first install the Matlab/Octave toolbox LTFAT, which is freely available at SourceForge.
The Matlab script for producing Figures 1 and 2 is available for download here. To run the script, you need to have the following two files in the same folder:
Running the script produces Figures 1 and 2.
The 3rd International Workshop on the History of Speech Communication Research
13–14 September 2019
Vienna, Austria
ARI guest talk by Michael R. Lomnitz
19 September 2019
14:30
Seminar Room, Wohllebengasse 12-14 / Ground Floor
27 June 2019
The Acoustics Research Institute and a number of other institutes of the Austrian Academy of Sciences want to close the gap between training in machine learning and modern data science and are ...
27 June 2019
We are proud to report that the HRTF database developed here at the Acoustics Research Institute is now being used in the numerical computing software MATLAB. Read more at:...
20 June 2019
Kernel theorems in mathematics state that 'reasonable' systems (i.e., operators) can be represented as integrals, similar to a matrix representation. This allows a considerably simpler...
19 June 2019
We are pleased to announce that José Luis Romero received a 2019 FWF START award for his project "Time Frequency Analysis, Randomness and Scanning". He works as a research associate at the...
12 June 2019
The Acoustics Research Institute (ARI) of the Austrian Academy of Sciences, Austria’s leading non-university research facility, is offering a PhD Student Position (m/f) (part time, 30 h/week)
09 May 2019
So far, the days of this year's merry month of May had been rather modest, but last Wednesday we were lucky: in the most glorious weather, we held our company outing in the Wachau....