This is the companion webpage of the manuscript:
Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdeněk Průša, Piotr Majdak, and Olivier Derrien.
Abstract: Many audio applications rely on filter banks (FBs) to analyze, process, and resynthesize sounds. For these applications, an important property of the analysis-synthesis system is the reconstruction error; it has to be kept to a minimum to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of human auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis-synthesis system for audio applications. The proposed system, named Audlet, is based on an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing.
Sound examples for the source separation experiment: click on a system's acronym to hear the corresponding reconstruction.
Reference signals: original mixture  target
Rt  β = 1  β = 1/6  1024-point STFT  
1.1  trev_gfb  Audlet_gfb  Audlet_hann  trev_gfb  Audlet_gfb  Audlet_hann  STFT_hann 
1.5  trev_gfb  Audlet_gfb  Audlet_hann  trev_gfb  Audlet_gfb  Audlet_hann  STFT_hann 
4.0  trev_gfb  Audlet_gfb  Audlet_hann  trev_gfb  Audlet_gfb  Audlet_hann  STFT_hann 
This page provides the sound files corresponding to the results of the perceptual matching pursuit algorithm presented in:
submitted to the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014).
Iterations  Matching Pursuit  Perceptual Matching Pursuit  Residual (MP)  Masked components (PMP)  Residual + masked components 

10000  wav  wav  wav  wav  wav 
20000  wav  wav  wav  wav  wav 
40000  wav  wav  wav  wav  wav 
80000  wav  wav  wav  wav  wav 
Iterations  Matching Pursuit  Perceptual Matching Pursuit  Residual (MP)  Masked components (PMP)  Residual + masked components 

10000  wav  wav  wav  wav  wav 
20000  wav  wav  wav  wav  wav 
40000  wav  wav  wav  wav  wav 
80000  wav  wav  wav  wav  wav 
This page provides resources for the visualizations and the algorithms in the research manuscript:
Certain mathematical objects appear in many scientific disciplines, such as physics, signal processing and, naturally, mathematics. In a general setting they can be described as frame multipliers, consisting of analysis, multiplication by a fixed sequence (called the symbol), and synthesis. They are not only interesting mathematical objects, but also important for applications, for example for the realization of time-varying filters. In this paper we show a surprising result about the inverse of such operators, if it exists, as well as new results about a core concept of frame theory, dual frames. We show that for seminormalized symbols, the inverse of any invertible frame multiplier can always be represented as a frame multiplier with dual frames and reciprocal symbol. Furthermore, one of those dual frames is uniquely defined and the other one can be arbitrarily chosen. We investigate sufficient conditions for the special case when both dual frames can be chosen to be the canonical duals. In connection with the above, we show that the set of dual frames determines a frame uniquely. Furthermore, for a given frame, the union of all coefficients of its dual frames is dense in l^{2}. We investigate invertible Gabor multipliers; we show that the inverse of every invertible lattice-invariant operator (in particular, every invertible Gabor frame multiplier with a constant symbol (1)) can be represented as a Gabor frame multiplier with a constant symbol (1). Finally, we give a numerical example for the invertibility of multipliers in the Gabor case.
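For reference, the three steps named in the abstract (analysis, multiplication by the symbol, synthesis) can be written as a single formula. This is a sketch in one common convention, which is assumed here: the index set K and the roles of Ψ (analysis) and Φ (synthesis) are not spelled out on this page.

```latex
M_{m,\Phi,\Psi}\, f \;=\; \sum_{k \in K} m_k \,\langle f, \psi_k \rangle\, \varphi_k ,
\qquad
\Phi = (\varphi_k)_{k \in K},\quad
\Psi = (\psi_k)_{k \in K},\quad
m = (m_k)_{k \in K}.
```

Under this convention, the main result stated in the abstract says that the inverse of an invertible multiplier with seminormalized symbol is again of this form, with symbol 1/m and suitable dual frames.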
In Figure 1 we show a visualization of a multiplier M_{m,Φ,Ψ} in the time-frequency plane. We consider a music signal f and the action of a multiplier M_{m,Φ,Ψ} on f. For f we use a 2-second excerpt of "Jump" by Van Halen (click here to listen to the signal). For a time-frequency representation of the musical signal f (TOP LEFT) we use a 'painless' Gabor frame Ψ (an 80 ms Hanning window with 12.5% overlap). By manual estimation, we determine the symbol m that should describe the time-frequency region of the singer's voice. This region is then multiplied by 0.01, the rest by 1 (TOP RIGHT) (see the symbol here). Finally, we show a time-frequency representation of the modified signal M_{m,Φ,Ψ}f (BOTTOM). To listen to the modified signal, click here.
(TOP LEFT) The time-frequency representation of the music signal f  (TOP RIGHT) The symbol m, found by a (manual) estimation of the time-frequency region of the singer's voice. 
(BOTTOM) Time-frequency representation of M_{m,Φ,Ψ}f. 
Here we use the same signal f and the same multiplier M_{m,Φ,Ψ} as in Figure 1. Note that all elements of the symbol m fulfill m_{n,k} ∊ {1, 10^{-2}}. Since m is seminormalized, the multiplier M is analytically invertible [1]. However, the operator is badly conditioned: its condition number is around 99. The signal f is approximately 2 seconds long at a sampling rate of 44100 Hz. Thus, the signal is a 128148-dimensional vector.
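As a rough sanity check on the size of this condition number, consider a simplified situation that is an assumption and not the setting of this page: Φ = Ψ is a Parseval frame and the symbol takes values between a = 0.01 and b = 1 (the two factors used for the symbol in Figure 1). The multiplier is then self-adjoint and positive, and in finite dimensions its condition number is bounded by the ratio b/a:

```latex
\langle M f, f \rangle \;=\; \sum_{k} m_k \,\lvert \langle f, \psi_k \rangle \rvert^{2},
\qquad
a\,\lVert f \rVert^{2} \;\le\; \langle M f, f \rangle \;\le\; b\,\lVert f \rVert^{2}
\;\;\Longrightarrow\;\;
\kappa(M) \;=\; \frac{\lambda_{\max}(M)}{\lambda_{\min}(M)} \;\le\; \frac{b}{a} \;=\; 100 .
```

The reported value of about 99 is compatible with this bound, although the frames used for the figures need not satisfy the simplifying assumption.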
Starting from
we compare two approaches numerically:
To listen to the 'naive' inversion, click here.
the 'iterative' inversion
To listen to the 'iterative' inversion, click here.
Clearly, the naive approach has strong artifacts. The error is especially large at the boundaries of the constant regions of the symbol. The chosen atoms are well localized in time-frequency, so that within the interior of the constant regions, this inversion works well. This could be expected, as we have shown in the manuscript that constant symbols allow this kind of inversion for equivalent frames.
The iterative inversion worked well, with an error of 3%. This could, naturally, be decreased by investing more computation time. Even in the chosen setting for the iterative inversion (100 iterations in iframemul [2]), no difference can be seen in the time-frequency representation and no audible difference can be detected.
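The contrast between the two approaches can be reproduced in a toy setting. The sketch below is not the script offered on this page and does not use LTFAT. It assumes a three-vector Parseval frame of R^2 used for both analysis and synthesis, a hypothetical symbol (1, 1, 0.01), and a plain Richardson iteration standing in for the iterative solver; iframemul may well use a different method.

```python
import math

def parseval_frame():
    """Three directions 120 degrees apart, scaled so that sum_k psi_k psi_k^T = I."""
    s = math.sqrt(2.0 / 3.0)
    angles = [math.pi / 2 + 2 * math.pi * k / 3 for k in range(3)]
    return [(s * math.cos(t), s * math.sin(t)) for t in angles]

def multiplier(symbol, frame, f):
    """Apply M f = sum_k m_k <f, psi_k> psi_k: analysis, weighting, synthesis."""
    out = [0.0, 0.0]
    for m_k, psi in zip(symbol, frame):
        coeff = f[0] * psi[0] + f[1] * psi[1]   # analysis coefficient
        out[0] += m_k * coeff * psi[0]          # weighted synthesis
        out[1] += m_k * coeff * psi[1]
    return out

def rel_error(x, ref):
    """Euclidean error of x relative to the norm of ref."""
    return math.hypot(x[0] - ref[0], x[1] - ref[1]) / math.hypot(ref[0], ref[1])

frame = parseval_frame()
symbol = [1.0, 1.0, 0.01]          # hypothetical symbol with values in {1, 0.01}
f = [1.0, 2.0]                     # toy "signal"
g = multiplier(symbol, frame, f)   # observed data g = M f

# 'Naive' inversion: same frame, reciprocal symbol.
f_naive = multiplier([1.0 / m for m in symbol], frame, g)

# 'Iterative' inversion: Richardson iteration x <- x + (g - M x).
# It converges here because the eigenvalues of M lie in (0, 1].
x = [0.0, 0.0]
for _ in range(100):
    Mx = multiplier(symbol, frame, x)
    x = [x[0] + (g[0] - Mx[0]), x[1] + (g[1] - Mx[1])]
f_iter = x

naive_error = rel_error(f_naive, f)
iter_error = rel_error(f_iter, f)
print("relative error, naive inversion:    ", naive_error)
print("relative error, iterative inversion:", iter_error)
```

In this toy case the naive inversion is inexact because the frame is redundant, so swapping the symbol for its reciprocal does not undo the multiplier, while the iteration recovers f to numerical precision.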
(TOP LEFT) The time-frequency representation of the result of the 'naive' inversion.  (TOP RIGHT) The time-frequency representation of the error of the 'naive' inversion. 
(BOTTOM LEFT) The time-frequency representation of the iterative inversion.  (BOTTOM RIGHT) The time-frequency representation of the error of the iterative inversion. 
The above visualizations were produced using algorithms in the Matlab/Octave toolbox Linear Time-Frequency Analysis (LTFAT) [2] (version 1.4.0 and above). To run the script provided below, you first need to install the LTFAT toolbox, freely available at SourceForge.
The Matlab script for producing Figures 1 and 2 is available for download here. To run the script, you need to have the following two files in the same folder:
Running the script produces Figures 1 and 2.
This page provides resources and complementary results for the research manuscript:
accepted for the special issue 'Time-Frequency Analysis and Applications' of the IEEE Signal Processing Magazine.
In this paper, we give an overview of linear time-frequency representations, focusing mainly on two fundamental aspects. The first one is the introduction of flexibility, more precisely the construction of time-frequency waveform systems that can be adapted to specific signals, or specific signal processing problems. To do this, we base the constructions on frame theory, which allows a lot of options, while still ensuring perfect reconstruction. The second aspect is the choice of the synthesis framework rather than the usual analysis framework. Instead of the correlation of the signal with the chosen waveforms, i.e. the inner product with them, we look at how the signals can be constructed using those waveforms, i.e. find the coefficients in their linear combination. We show how this point of view allows the easy introduction of prior information into the representation. We give an overview of methods for transform domain modeling, in particular those based on sparsity and structured sparsity. Finally, we present an illustrative application of these concepts: a denoising scheme.
All files are collected in a ZIP file.
This code requires several other packages:
Acknowledgments: P. Balazs is supported by the Austrian Science Fund (FWF) START project FLAME ('Frames and Linear Operators for Acoustical Modeling and Parameter Estimation'; Y 551-N13); M. Dörfler is supported by the WWTF project Audiominer (MA0924); B. Torrésani is supported by the European project UNLocX, grant number 255931, and by the ANR project Metason ANR-10-CORD-010; M. Kowalski benefited from the support of the "FMJH Program Gaspard Monge in optimization and operation research", and from the support to this program from EDF.
This page provides resources and complementary results for the research article:
presented at the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP2013). A PDF version of the article is available here for download.
Abstract: This paper describes a method for obtaining a perceptually motivated and perfectly invertible time-frequency representation of a sound signal. Based on frame theory and the recent non-stationary Gabor transform, a linear representation with resolution evolving across frequency is formulated and implemented as a non-uniform filterbank. To match the human auditory time-frequency resolution, the transform uses Gaussian windows equidistantly spaced on the psychoacoustic "ERB" frequency scale. Additionally, the transform features adaptable resolution and redundancy. Simulations showed that perfect reconstruction can be achieved using fast iterative methods and preconditioning even using one filter per ERB and a very low redundancy (1.08). Comparison with a linear gammatone filterbank showed that the ERBlet approximates well the auditory time-frequency resolution.
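To illustrate what "equidistantly spaced on the ERB scale" means, the sketch below places center frequencies exactly one ERB apart. It assumes the Glasberg and Moore (1990) formulas for the ERB-rate scale and the ERB bandwidth; the function names and the frequency range of 0 to 22050 Hz are illustrative choices and are not taken from the ERBlet code.

```python
import math

def hz_to_erb(f_hz):
    """ERB-rate value of a frequency in Hz (Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_to_hz(erb):
    """Inverse of hz_to_erb."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (1.0 + 0.00437 * f_hz)

def erb_spaced_centers(f_min, f_max, filters_per_erb=1.0):
    """Center frequencies in Hz, equidistant on the ERB-rate scale."""
    e_min, e_max = hz_to_erb(f_min), hz_to_erb(f_max)
    count = int(math.floor((e_max - e_min) * filters_per_erb)) + 1
    return [erb_to_hz(e_min + k / filters_per_erb) for k in range(count)]

# Hypothetical setting: one filter per ERB from 0 Hz up to 22050 Hz.
centers = erb_spaced_centers(0.0, 22050.0, filters_per_erb=1.0)
for fc in centers[:5]:
    print(f"center = {fc:8.1f} Hz, ERB = {erb_bandwidth(fc):6.1f} Hz")
print("number of center frequencies:", len(centers))
```

The spacing in Hz grows with frequency, which is how such a filterbank follows the coarser frequency resolution of hearing at high frequencies. Raising filters_per_erb increases the filter density and, with it, the redundancy.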
redundancy = 12, relative reconstruction error < 10^{-15}. redundancy = 11.80, relative reconstruction error < 10^{-15}. redundancy = 12, relative reconstruction error < 10^{-15}. redundancy = 12, relative reconstruction error < 10^{-15}.
Implementation in [1]. redundancy = 12, relative reconstruction error < 10^{-15}. redundancy = 128, relative reconstruction error = 1.4 for a delay of 4 ms and no post-processing correction of the filterbank delay. Accounting for the filterbank delay at the output of the resynthesizer module led to relative reconstruction errors of 4.11 x 10^{-1}, 1.01 x 10^{-1} and 2.86 x 10^{-3} for delays of 4, 8 and 16 ms, respectively. Implementation in [2].
IMPORTANT NOTE: The Matlab/Octave toolboxes Linear Time-Frequency Analysis (LTFAT, version 1.2.0 and above) [3] and Auditory Modeling (AM) must be installed to run the ERBlet codes. These toolboxes are freely available at SourceForge.
A mathematical background is very important and useful for all physical and engineering sciences. The connection between applied and mathematical research often leads to progress in both directions, due to natural synergy effects. The Acoustics Research Institute considers the investigation of the mathematical background of its numerous research projects, most prominently the signal processing aspects, as an important part of acoustic research.
Application-oriented mathematics develops theoretical results motivated by applications, in contrast to "applied mathematics", which focuses on tools for the applied sciences. The application-oriented approach provides results significant both for the applied sciences and for theoretical mathematics. The importance of application-oriented mathematics was acknowledged by the Viennese Technology and Science Fund, which arranged a specific research programme titled 'Mathematics and …', and it is a current research focus both of the Academy of Sciences and of the city of Vienna.
Complex experimental designs generate empirical data and often lead to heuristic models with a modest mathematical basis. Mathematically precise statements considerably enhance the precision and stability of established algorithms and can already be implemented at an early stage of model generation. Therefore, mathematics supports software development in the modelling stage as well as in the implementation stage (stability, precision).
The Acoustics Research Institute has strengthened its research in this area in recent years, and will continue to do so. The following goals are set:
The cooperation of the group 'Mathematics and Signal Processing' with the other groups of the Institute has proven very fruitful for all partners and will be further strengthened. While the other groups obtain theoretically well-founded methods to solve their problems, the mathematicians can address questions that are relevant for applications yet still interesting in theory. This dialog greatly increases the understanding of other fields. It enabled the successful application for the START project 'FLAME: Frames and Linear Operators for Acoustical Modeling and Parameter Estimation' in 2011.
Nicki Holighaus  Time-Frequency Frames and Applications to Audio Analysis  Part 1
Peter Balazs  February Fourier Talks 2014
Hans G. Feichtinger  Mathematical and Numerical Aspects of Frame Theory  Part 1 (showing the institute's own software STx!)
Georg Tauböck  WWTF Project INSIGHT
This page provides the sound files corresponding to the results of the irrelevance timescale filter reported in Necciari et al. "Perceptual optimization of audio representations based on time-frequency masking data for maximally-compact stimuli", presented at the AES 45th conference on Applications of Time-Frequency Processing in Audio, Helsinki, Finland, 2012 March 1-4.
This webpage is linked to the paper
Regarding Section 2.1, 'Phase vs. Amplitude Reconstruction in the STFT'
Audio Files:
This page provides resources for the research article:
to appear in the book "Excursions in Harmonic Analysis" published by Springer.
Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for scientists in audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
The present ZIP archive contains Matlab/Octave scripts that reproduce the results presented in Figures 7, 10, and 11 of the article.
IMPORTANT NOTE: The Matlab/Octave toolbox Large Time-Frequency Analysis (LTFAT, version 1.2.0 and above) must be installed to run the codes. This toolbox is freely available at SourceForge.
If you encounter any issue with the files, please do not hesitate to contact the authors.