Mathematics and Signal Processing in Acoustics

  • Basic Description:

    HASSIP is a Research Training Network funded by the European Commission within the Improving the Human Potential program. The aim of the HASSIP network is to develop research activities and systematic interactions in mathematical analysis and statistics that are directly connected to signal and image processing. Although the Acoustics Research Institute was not initially a partner of this network, P. Balazs became a fellow of this network through cooperation with the group NuHAG.

    Partners:

    • NuHAG, Faculty of Mathematics, University of Vienna
    • Groupe de Traitement du Signal, Laboratoire d'Analyse Topologie et Probabilités, LATP/ CMI, Université de Provence, Marseille
    • Modélisation, Synthèse et Contrôle des Signaux Sonores et Musicaux of the LMA / CNRS Marseille
    • Unité de physique théorique et de physique mathématique – FYMA

    Subprojects:

    • Basic Properties of Bessel and Frame Multipliers: For Bessel sequences (ψk) and (φk), the investigation of operators M f = ∑ mk < f , ψk > φk is very natural and useful. Such operators M are called Bessel multipliers. The goal of this project is to set the mathematical basis for this kind of operator.
    • Best Approximation of Matrices by Frame Multipliers: Finding the best approximation by multipliers of matrices that represent time-variant systems gives a way to find efficient algorithms to implement such operators. 
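
    In finite dimensions, the multiplier defined above is simply a matrix. The following is a minimal sketch, assuming NumPy, a randomly drawn frame for R^4, and its canonical dual as the synthesis sequence; the names `frame_multiplier`, `psi`, and `phi` are illustrative and not taken from the project's code.

```python
import numpy as np

def frame_multiplier(symbol, analysis, synthesis):
    """Matrix of f -> sum_k m_k <f, psi_k> phi_k on C^n.

    analysis, synthesis: arrays of shape (K, n); row k holds psi_k resp. phi_k.
    symbol: array of shape (K,) holding the weights m_k.
    """
    # <f, psi_k> = sum_t f[t] * conj(psi_k[t]), i.e. analysis.conj() @ f.
    return synthesis.T @ (symbol[:, None] * analysis.conj())

rng = np.random.default_rng(0)
n, K = 4, 7
psi = rng.standard_normal((K, n))        # random frame for R^4 (assumed full rank)
frame_op = psi.T @ psi                   # frame operator S = sum_k psi_k psi_k^T
phi = psi @ np.linalg.inv(frame_op)      # rows: canonical dual S^{-1} psi_k

# A constant symbol 1 with a dual pair reproduces the identity.
identity_like = frame_multiplier(np.ones(K), psi, phi)

# A non-trivial symbol weights each coefficient before synthesis.
m = rng.uniform(0.0, 1.0, K)
f = rng.standard_normal(n)
Mf = frame_multiplier(m, psi, phi) @ f
```

    With the symbol set to 1 and a dual pair of frames, the multiplier reduces to the identity, which is a convenient sanity check for any implementation.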

    Publications:

    • P. Balazs, "Hilbert-Schmidt Operators and Frames - Classification, Approximation by Multipliers and Algorithms" , International Journal of Wavelets, Multiresolution and Information Processing, Vol. 6, No. 2, pp. 315 - 330, March 2008, preprint, Codes and Pictures: here
    • P. Balazs, "Basic Definition and Properties of Bessel Multipliers", Journal of Mathematical Analysis and Applications, 325, 1: 571--585. (2007) doi:10.1016/j.jmaa.2006.02.012, preprint

    Project-completion:

    This project ended on 01.01.2009. Its completion allowed the successful application for a 'High Potential'-Project of the WWTF, see MULAC.

  • Objective:

    Head-related transfer functions (HRTF) describe the sound transmission from the free field to a place in the ear canal in terms of linear time-invariant systems. Due to the physiological differences of the listeners' outer ears, the measurement of each subject's individual HRTFs is crucial for sound localization in virtual environments (virtual reality).

    Measurement of an HRTF can be considered a system identification of the weakly non-linear electro-acoustic chain consisting of sound source, room, HRTF, and microphone. An optimized formulation of the system identification with exponential sweeps, called the "multiple exponential sweep method" (MESM), was used for the measurement of transfer functions. With this method, either the measurement duration or the signal-to-noise ratio can be optimized.
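
    As a rough illustration of system identification with a single exponential sweep (not the interleaved MESM itself), the sketch below assumes NumPy, a noise-free and purely linear system, and plain spectral division for the deconvolution. The sampling rate, sweep range, and the impulse response `h_true` are hypothetical values chosen for the example.

```python
import numpy as np

def exponential_sweep(f_start, f_stop, duration, fs):
    """Sine sweep whose instantaneous frequency rises exponentially."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f_stop / f_start)
    phase = 2 * np.pi * f_start * duration / rate * (np.exp(rate * t / duration) - 1)
    return np.sin(phase)

def estimate_impulse_response(excitation, response, n_taps):
    """Deconvolve by spectral division (no noise, no regularisation)."""
    n_fft = len(response)                 # long enough to avoid circular wrap-around
    spectrum = np.fft.rfft(response, n_fft) / np.fft.rfft(excitation, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n_taps]

fs = 8000
sweep = exponential_sweep(50.0, 3900.0, 0.5, fs)
h_true = np.array([1.0, 0.5, -0.25, 0.125])   # hypothetical linear system
recorded = np.convolve(sweep, h_true)         # simulated measurement
h_est = estimate_impulse_response(sweep, recorded, len(h_true))
```

    In a real measurement the division would need regularisation outside the sweep's frequency range, and the harmonic distortion products of a weakly non-linear system would appear at separate positions in the deconvolved response.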

    Initial heuristic experiments have shown that using Gabor multipliers to extract the relevant sweeps in the MESM post-processing procedure improves the signal-to-noise ratio of the measured data even further. The objective of this project is to study, in detail, how frame multipliers can optimally be used during this post-processing procedure. In particular, wavelet frames, which best fit the structure of an exponential sweep, will be studied.

    Method:

    Systematic numeric experiments will be conducted with simulated slowly time-variant, weakly non-linear systems. As the parameters of the involved signals are precisely known and controlled, an optimal symbol will automatically be created. Finally, the efficiency of the new method will be tested on a "real world" system, which was developed and installed in the semi-anechoic room of the Institute. It uses in-ear microphones, a subject turntable, 22 loudspeakers on a vertical arc, and a head tracker.

    Application:

    The new method will be used for improved HRTF measurement.

  • General Information

    Funded by the Vienna Science and Technology Fund (WWTF) within the  "Mathematics and …2016"  Call (MA16-053)

    Principal Investigator: Georg Tauböck

    Co-Principal Investigator: Peter Balazs

    Project Team: Günther Koliander, José Luis Romero  

    Duration: 01.07.2017 – 01.07.2021

    Abstract

    Signal processing is a key technology that forms the backbone of important developments like MP3, digital television, mobile communications, and wireless networking and is thus of exceptional relevance to economy and society in general. The overall goal of the proposed project is to derive highly efficient signal processing algorithms and to tailor them to dedicated applications in acoustics. We will develop methods that are able to exploit structural properties in infinite-dimensional signal spaces, since typically ad hoc restrictions to finite dimensions do not sufficiently preserve physically available structure. The approach adopted in this project is based on a combination of the powerful mathematical methodologies frame theory (FT), compressive sensing (CS), and information theory (IT). In particular, we aim at extending finite-dimensional CS methods to infinite dimensions, while fully maintaining their structure-exploiting power, even if only a finite number of variables are processed. We will pursue three acoustic applications, which will strongly benefit from the devised signal processing techniques, i.e., audio signal restoration, localization of sound sources, and underwater acoustic communications. The project is set up as an interdisciplinary endeavor in order to leverage the interrelations between mathematical foundations, CS, FT, IT, time-frequency representations, wave propagation, transceiver design, the human auditory system, and performance evaluation.

    Keywords

    compressive sensing, frame theory, information theory, signal processing, super resolution, phase retrieval, audio, acoustics

  • Objective:

    So-called Gabor multipliers are particular cases of time-variant filters. Recently, Gabor systems on irregular grids have become a popular research topic. This project deals with Gabor multipliers, as a specialization of frame multipliers on irregular grids.

    Method:

    The initial stage of this project aims to investigate the continuous dependence of an irregular Gabor multiplier on its parameter (i.e. the symbol), window, and lattice. Furthermore, an algorithm to find the best approximation of any matrix (i.e. any time-variant system) by such an irregular Gabor multiplier is being developed.
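
    One generic way to phrase the best-approximation problem is as a least-squares fit in the Frobenius (Hilbert-Schmidt) norm. The sketch below assumes NumPy, Gaussian atoms on irregularly placed time-frequency points, and the same atoms for analysis and synthesis; `gabor_atom`, `rank_one_dictionary`, and `best_symbol` are illustrative names, and this is not necessarily the algorithm developed in the project.

```python
import numpy as np

def gabor_atom(n, shift, freq, width=2.0):
    """Gaussian window on Z_n, shifted in time and modulated in frequency."""
    t = np.arange(n)
    dist = (t - shift + n / 2) % n - n / 2       # circular distance to the centre
    return np.exp(-0.5 * (dist / width) ** 2) * np.exp(2j * np.pi * freq * t / n)

def rank_one_dictionary(atoms):
    """Column k is the vectorised operator f -> <f, g_k> g_k."""
    return np.stack([np.outer(g, g.conj()).ravel() for g in atoms], axis=1)

def best_symbol(T, atoms):
    """Symbol minimising the Frobenius norm of T - sum_k m_k P_k."""
    symbol, *_ = np.linalg.lstsq(rank_one_dictionary(atoms), T.ravel(), rcond=None)
    return symbol

rng = np.random.default_rng(1)
n, K = 16, 10
points = rng.uniform(0, n, size=(K, 2))          # irregular time-frequency points
atoms = [gabor_atom(n, s, f) for s, f in points]

# Build a matrix that is exactly a multiplier, then recover it.
m_true = rng.uniform(0.5, 1.5, K)
T = (rank_one_dictionary(atoms) @ m_true).reshape(n, n)
T_hat = (rank_one_dictionary(atoms) @ best_symbol(T, atoms)).reshape(n, n)
```

    For a matrix that is not itself a multiplier, the same call returns the symbol of the closest multiplier in the Frobenius norm.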

    Application:

    Gabor multipliers have been used implicitly for quite some time. Investigating the properties of these operators is a current topic for signal processing engineers. If the standard time-frequency grid is not useful to the application, it is natural to work with irregular grids. An example of this is the usage of non-linear frequency scales, such as the Bark scale.

    Partners:

    H. G. Feichtinger, NuHAG, Faculty of Mathematics, University of Vienna

    Project-completion:

    This project ended on 28.02.2008 and is incorporated into the 'High Potential'-Project of the WWTF, MULAC (WWTF 2007).

  • Objective:

    General frame theory can be more specialized if a structure is imposed on the elements of the frame in question. One possible, very natural structure is sequences of shifts of the same function. In this project, irregular shifts are investigated.

    Method:

    In this project, the connection to irregular Gabor multipliers will be explored. Using the Kohn-Nirenberg correspondence, the space spanned by Gabor multipliers is just a space spanned by translates. Furthermore, the special connection of the Gramian function and the Gram matrix for this case will be investigated.
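
    For translates of a single function, every Gram matrix entry depends only on the difference of two shifts. The sketch below illustrates this with NumPy for a Gaussian generator, where the inner products are known in closed form; the shift positions are hypothetical.

```python
import numpy as np

def gram_matrix_of_translates(shifts):
    """Gram matrix of g(t - x_k) for g(t) = exp(-t^2 / 2).

    <g(. - a), g(. - b)> = sqrt(pi) * exp(-(a - b)^2 / 4), so every entry
    is the autocorrelation of g evaluated at a difference of shifts.
    """
    diff = shifts[:, None] - shifts[None, :]
    return np.sqrt(np.pi) * np.exp(-diff ** 2 / 4)

shifts = np.array([0.0, 0.9, 2.3, 3.1, 4.8])     # hypothetical irregular shifts
gram = gram_matrix_of_translates(shifts)

# Cross-check against a Riemann sum of the inner products.
t = np.arange(-20.0, 25.0, 0.01)
translates = np.exp(-0.5 * (t[None, :] - shifts[:, None]) ** 2)
gram_numeric = translates @ translates.T * 0.01

# The extreme eigenvalues are the Riesz bounds of this finite family.
riesz_lower, riesz_upper = np.linalg.eigvalsh(gram)[[0, -1]]
```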

    Application:

    A typical example of frames of translates is filter banks, which have constant shapes. For example, the phase vocoder corresponds to a filter bank with regular shifts. Introducing an irregular shift gives rise to a generalization of this analysis / synthesis system.

    Partners:

    • S. Heineken, Research Group on Real and Harmonic Analysis, University of Buenos Aires
  • Objective:

    An irrelevance algorithm based on simultaneous masking is implemented in STx. In the years following its first development by Eckel, the efficiency of this algorithm has been clearly shown. In this project, this irrelevance model will be based on modern mathematical and psychoacoustic theories and knowledge.

    Method:

    This algorithm can be described as a Gabor multiplier with an adaptive symbol. With existing related theory, it becomes clear that a high redundancy must be selected. This guarantees:

    • perfect reconstruction synthesis
    • an under-spread operator for good time-frequency localization
    • a smoothing-out of easily detectable quick on/off cycles

    Furthermore, it can be shown that the model used for the spreading function here is mathematically equivalent to the excitation pattern.
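
    To make the idea of an adaptive symbol concrete, the following toy sketch derives a 0/1 symbol for one spectral frame from the frame itself. It assumes NumPy and a triangular spreading function in dB with made-up parameters (`offset_db`, `slope_db_per_bin`); it is not the algorithm implemented in STx.

```python
import numpy as np

def toy_irrelevance_symbol(level_db, offset_db=20.0, slope_db_per_bin=10.0):
    """0/1 symbol: keep a bin only if it exceeds the threshold set by all other bins."""
    n = len(level_db)
    distance = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    # masked[j, k]: threshold that the component in bin k induces in bin j.
    masked = level_db[None, :] - offset_db - slope_db_per_bin * distance
    np.fill_diagonal(masked, -np.inf)            # a component does not mask itself
    threshold = masked.max(axis=1)
    return (level_db > threshold).astype(float)

level_db = np.full(64, -100.0)                   # hypothetical spectrum in dB
level_db[10], level_db[11], level_db[30] = 80.0, 40.0, 40.0
symbol = toy_irrelevance_symbol(level_db)
```

    In this example the 40 dB component next to the 80 dB masker is removed, while the equally strong but distant component in bin 30 is kept.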

    Application:

    This algorithm has been used for several years already for things such as:

    • automobile sound design
    • over-masking for background-foreground separation
    • improved speech recognition in noise
    • contrast increase for hearing-impaired persons

    Partners:

    • G. Eckel, Institut für Elektronische Musik und Akustik, Graz

    Publications:

    • P. Balazs, B. Laback, G. Eckel, W. Deutsch, "Introducing Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking", IEEE Transactions on Audio, Speech and Language Processing, Vol. 17 (7) , in press (2009) , preprint

    Project-completion:

    This project ended on 01.01.2010, and leads to a sub-project of the 'High Potential'-Project of the WWTF, MULAC.

  • Objective:

    It is known in psychoacoustics that not all information contained in a "real world" acoustic signal is processed by the human auditory system. More precisely, it turns out that some time-frequency components mask (overshadow) other components that are close in time or frequency.

    In the software S_TOOLS-STx developed by the Institute, an algorithm based on simultaneous masking has been implemented. This algorithm removes perceptually irrelevant time-frequency components. In this implementation, the model is described as a Gabor multiplier with an adaptive symbol.

    In this project, the masking model will be extended to a true time-frequency model, incorporating frequency and temporal masking.

    Method:

    Experiments have been conducted (in cooperation with the Laboratory for Mechanics and Acoustics / CNRS Marseille) to test the time-frequency masking properties of a single Gaussian atom, and to study the additivity of these masking properties for several Gaussian atoms.

    The results of these experiments will be used, in combination with theoretical results obtained in the parallel projects studying the mathematical properties of frame multipliers, to approximate or identify the masking model by wavelet and Gabor multipliers.

    The obtained model will then be validated by appropriate psychoacoustical experiments.

    Application:

    Efficient implementation of a masking filter offers many applications:

    • Sound / Data Compression
    • Sound Design
    • Back-and-Foreground Separation
    • Optimization of Speech and Music Perception

    After completing the testing phase, the algorithms are to be implemented in S_TOOLS-STx. 

    Publications:

    • P. Balazs, B. Laback, G. Eckel, W. Deutsch, "Introducing Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking", IEEE Transactions on Audio, Speech and Language Processing (2009), in press
    • B. Laback, P. Balazs, G. Toupin, T. Necciari, S. Savel, S. Meunier, S. Ystad and R. Kronland-Martinet, "Additivity of auditory masking using Gaussian-shaped tones", Acoustics'08, Paris, 29.06.-04.07.2008 (03.07.2008)
    • B. Laback, P. Balazs, T. Necciari, S. Savel, S. Ystad, S. Meunier and R. Kronland-Martinet, "Additivity of auditory masking for Gaussian-shaped tone pulses", preprint
  • Objective:

    This project is part of a project cluster that investigates time-frequency masking in the auditory system, in cooperation with the Laboratory for Mechanics and Acoustics / CNRS Marseille. While other subprojects study the spread of masking across the time-frequency plane using Gaussian-shaped tones, this subproject investigates how multiple Gaussian maskers distributed across the time-frequency plane create masking that adds up at a given time-frequency point. This question is important in determining the total masking effect resulting from the multiple time-frequency components (that can be modeled as Gaussian Atoms) of a real-life signal.

    Method:

    Both the maskers and the target are Gaussian-shaped tones with a frequency of 4 kHz. A two-stage approach is applied to measure the additivity of auditory masking. In the first stage, the levels of the maskers are adjusted to cause the same amount of masking in the target. In the second stage, various combinations of those maskers are tested to study their additivity.

    In the first study, the maskers are spread either in time OR in frequency. In the second study, the maskers are spread in time AND in frequency.

    Application:

    New insight into the coding of sound in the auditory system could help to design more efficient audio codecs. These codecs could take the additivity of time-frequency masking into account.

    Funding:

    WTZ (project AMADEUS)

    Publications:

    • Laback, B., Balazs, P., Toupin, G., Necciari, T., Savel, S., Meunier, S., Ystad, S., Kronland-Martinet, R. (2008). Additivity of auditory masking using Gaussian-shaped tones, presented at Acoustics'08 conference, Paris.
  • Objective:

    Many problems in physics can be formulated as operator theory problems, such as in differential or integral equations. To function numerically, the operators must be discretized. One way to achieve discretization is to find (possibly infinite) matrices describing these operators using ONBs. In this project, we will use frames to investigate a way to describe an operator as a matrix.

    Method:

    The standard matrix description of operators O using an ONB (e_k) involves constructing a matrix M with the entries M_{j,k} = < O e_k, e_j>. In past publications, a concept that described operator R in a very similar way has been presented. However, this description of R used a frame and its canonical dual. Currently, a similar representation is being used for the description of operators using Gabor frames. In this project, we are going to develop and completely generalize this idea for Bessel sequences, frames, and Riesz sequences. We will also look at the dual function that assigns an operator to a matrix.
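
    In finite dimensions, the frame-based matrix description and its inverse mapping can be written in a few lines. The sketch below assumes NumPy, a real random frame, and its canonical dual; `operator_to_matrix` and `matrix_to_operator` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 3, 5
frame = rng.standard_normal((K, n))              # rows: frame vectors psi_k (real)
dual = frame @ np.linalg.inv(frame.T @ frame)    # rows: canonical dual vectors

def operator_to_matrix(O, frame):
    """K x K matrix with entries M[j, k] = <O psi_k, psi_j>."""
    return frame @ O @ frame.T

def matrix_to_operator(M, dual):
    """Operator sum_{j,k} M[j, k] <., dual_k> dual_j built from the dual frame."""
    return dual.T @ M @ dual

O = rng.standard_normal((n, n))                  # hypothetical operator on R^3
M = operator_to_matrix(O, frame)
O_back = matrix_to_operator(M, dual)
```

    Unlike in the ONB case, the matrix M is larger than the operator it describes (here 5 x 5 for an operator on R^3), which reflects the redundancy of the frame.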

    Application:

    This "sampling of operators" is especially important for application areas where frames are heavily used, so that the link between model and discretization is maintained. To facilitate implementations, operator equations can be transformed into a finite, discrete problem with the finite section method (much in the same way as in the ONB case).

    Publications:

    • P. Balazs, "Matrix Representation of Operators Using Frames", Sampling Theory in Signal and Image Processing (STSIP) (2007, accepted), preprint
    • P. Balazs, "Hilbert-Schmidt Operators and Frames - Classification, Approximation by Multipliers and Algorithms" , International Journal of Wavelets, Multiresolution and Information Processing, (2007, accepted)  preprint, Codes and Pictures: here
  • Objective:

    The Multiple Exponential Sweep Method (MESM) is an optimized method for the semi-simultaneous system identification of multiple systems. It uses an appropriate overlapping of the excitation signals. This leads to a faster identification of weakly nonlinear systems when only the linear impulse response is to be retrieved. Using a Gabor multiplier in the post-processing procedure of the system identification may reduce the measurement noise. This may further improve the signal-to-noise ratio of the measured data.

    Method:

    A Gabor multiplier is used to cut the interesting parts out of the measured signals in the time-frequency plane. This allows a specific optimization of signal parts, independent of the frequency. Initial tests applying a Gabor multiplier to simulated data showed that the depth of spectral notches could be raised. A systematic investigation of this method is the main goal of this project.

    Application:

    This method may improve the signal-to-noise ratio in system identification tasks of any weakly nonlinear system, such as those involving acoustic measurements with electric equipment.

    Publications:

    • P. Majdak, P. Balazs, B. Laback, "Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions", Journal of the Audio Engineering Society, Vol. 55, No. 7/8, July/August 2007, Pages 623 - 637 (2007)

    Project-completion:

    This project ended on 28.02.2008 and is incorporated into the 'High Potential'-Project of the WWTF, MULAC (q.v.).

  • Basic Description:

    Time-variant filters are gaining importance in today's signal processing applications. Gabor multipliers in particular are popular in current scientific investigations. These multipliers are a specialization of Bessel multipliers to Gabor frames. These operators are interesting in regard to both theory and application:

    Theory of Multipliers:

    • Bessel and Frame Multipliers in Banach Spaces: In this project, the concept of frame multipliers should be generalized to work with Banach spaces.
    • Theory of Wavelet Multipliers: The concept of multipliers can be easily extended to wavelet frames. The influence of the special structures of these sequences will be investigated.
    • Basic Properties of Irregular Gabor Multipliers: Here multipliers for Gabor frames on irregular lattices are investigated.

    Application of Multipliers:

    • Time Frequency Masking: Gabor Multiplier Models and Evaluation: The symbol for the Gabor multiplier is calculated adaptively and the resulting model incorporates both time and frequency masking components. The goal is to obtain an algorithm using 2-D convolution.
    • Improving the Multiple Exponential Sweep Method (MESM) using Gabor Multipliers: The MESM is an efficient system identification method. Initial tests have shown that this method can be improved with a Gabor multiplier applied as a mask for the original sweep.
    • Wavelet Multipliers and Their Application to Reflection Measurements: One method to calculate the absorption coefficient of a sound proof wall requires separation of the impulse responses of different reflections. They can be easily separated in a scalogram and they can be extracted using a wavelet multiplier.
    • Mathematical Foundation of the Irrelevance Model: In this project, the theoretical foundation of the irrelevance algorithms implemented in STx is being developed.

    Partners:

    • H.G. Feichtinger, K. Gröchenig et al., NuHAG, Faculty of Mathematics, University of Vienna
    • R. Kronland-Martinet, S. Ystad, T. Necciari, Modélisation, Synthèse et Contrôle des Signaux Sonores et Musicaux of the LMA / CNRS Marseille
    • S. Meunier, S. Savel, Acoustique perceptive et qualité de l’environnement sonore of the LMA / CNRS Marseille

    Publications:

    • P. Balazs, B. Laback, G. Eckel, W. Deutsch, "Introducing Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking", IEEE Transactions on Audio, Speech and Language Processing, Vol. 17 (7) , in press (2009) , preprint
    • P. Majdak, P. Balazs, B. Laback, "Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions", Journal of the Audio Engineering Society, Vol. 55, No. 7/8, July/August 2007, Pages 623 - 637 (2007)

    Project-completion:

    This project ended on 01.01.2010; most subprojects ended on 28.02.2008 and are incorporated into the 'High Potential'-Project of the WWTF, MULAC.

  • Basic Description:

    Signal processing is now present in a broad range of areas, from mobile phones, UMTS, xDSL, and digital television to scientific research such as psychoacoustic modeling, acoustic measurements, and hearing prostheses. Such applications often use time-invariant filters by applying the Fourier transform to calculate the complex spectrum. The spectrum is then multiplied by a function, the so-called transfer function. Such an operator can therefore be called a Fourier multiplier. Real-life signals, however, are seldom stationary. Quasi-stationarity and fast time variance characterize the majority of speech signals, transients in music, or environmental sounds, and therefore imply the need for non-stationary system models. Considerable progress can be achieved by reaching beyond traditional Fourier techniques and improving current time-variant filter concepts through application of the basic mathematical concepts of frame multipliers.
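
    A Fourier multiplier in the sense described above takes only a few lines. The sketch assumes NumPy and a periodic (circular) signal model; the ideal low-pass `transfer` and the test signal are hypothetical.

```python
import numpy as np

def fourier_multiplier(x, transfer):
    """Time-invariant filter: transform, multiply by the transfer function, invert."""
    return np.fft.irfft(np.fft.rfft(x) * transfer, n=len(x))

n = 256
t = np.arange(n)
low = np.sin(2 * np.pi * 5 * t / n)              # component in bin 5
high = 0.5 * np.sin(2 * np.pi * 60 * t / n)      # component in bin 60
transfer = (np.arange(n // 2 + 1) <= 20).astype(float)   # keep bins 0..20 only
filtered = fourier_multiplier(low + high, transfer)
```

    Because the transfer function does not depend on time, the same weighting is applied to the whole signal; this is the restriction that time-variant filters such as frame multipliers remove.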

    Several transforms, such as the Gabor transform (the sampled version of the Short-Time Fourier Transform), the wavelet transform, and the Bark, Mel, and Gammatone filter banks are already in use in a large number of signal processing applications. Generalization of these techniques can be obtained via mathematical frame theory. The advantage of introducing frame theory consists particularly in the interpretability of filter and analysis coefficients in terms of frequency and time localization, as opposed to techniques based on orthonormal bases.

    One possibility to construct time-variant filters exists through the use of Gabor multipliers. For these operators the result of a Gabor transform is multiplied by a given function, called the time-frequency mask or symbol, followed by re-synthesis. These operators are already used implicitly in engineering applications, and have been investigated as Gabor filters in the fields of mathematics and signal processing theory. If alternative transforms are used, the concept of multipliers can be extended appropriately. So, for example, the concept of wavelet multipliers could be investigated for a wavelet transform.
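
    The analysis, masking, and re-synthesis steps of a Gabor multiplier can be sketched as a matrix on a short periodic signal. The example assumes NumPy, a periodic Hann window of length 8, hop size 4, and 8 frequency channels on signals of length 32; all of these values, and the names `gabor_system` and `gabor_multiplier`, are choices made for the illustration.

```python
import numpy as np

L, hop, channels = 32, 4, 8
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(channels) / channels)
t = np.arange(L)

def gabor_system():
    """All time-frequency shifts of the window, one atom per row (shift-major order)."""
    atoms = []
    for shift in range(0, L, hop):
        g = np.zeros(L)
        g[(np.arange(channels) + shift) % L] = window
        for m in range(channels):
            atoms.append(g * np.exp(2j * np.pi * m * t / channels))
    return np.array(atoms)

def gabor_multiplier(symbol, atoms):
    """Analysis with the atoms, weighting by the symbol, synthesis with the dual."""
    frame_op = atoms.T @ atoms.conj()
    dual = np.linalg.solve(frame_op, atoms.T)    # columns: canonical dual atoms
    return dual @ (symbol.ravel()[:, None] * atoms.conj())

atoms = gabor_system()
all_pass = gabor_multiplier(np.ones((L // hop, channels)), atoms)

# Time-frequency mask that removes channels 1 to 3 at every time position.
mask = np.ones((L // hop, channels))
mask[:, 1:4] = 0.0
tone = np.exp(2j * np.pi * 2 * t / channels)     # lives in channel 2 and its neighbours
masked_tone = gabor_multiplier(mask, atoms) @ tone
```

    Since the symbol is indexed by time position and channel, it could just as well remove the tone only during part of the signal, which a Fourier multiplier cannot do.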

    Different kinds of applications call for different frames. Multipliers can be generalized to the abstract level of frames without any further structure. This concept will be further investigated in this project. Its feasibility will be evaluated in acoustic applications using special cases of Gabor and wavelet systems.

    The project goal is to study both the mathematical theory of frame multipliers and their application among selected problems in acoustics. The project is divided into the following subprojects:

    Theory of Multipliers:

    1. General Frame Multiplier Theory
    2. Analytic and Numeric Properties of Gabor Multipliers
    3. Analytic and Numeric Properties of Wavelet Multipliers

    Application of Multipliers:

    1. Mathematical Modeling of Auditory Time-Frequency Masking Functions
    2. Improvement of Head-Related Transfer Function Measurements
    3. Advanced Method of Sound Absorption Measurements

    Partners:

    • H.G. Feichtinger et al., NuHAG, Faculty of Mathematics, University of Vienna
    • R. Kronland-Martinet et al., Modélisation, Synthèse et Contrôle des Signaux Sonores et Musicaux of the LMA / CNRS Marseille
    • B. Torrésani et al., LATP Université de Provence / CNRS Marseille
    • J.P. Antoine et al., FYMA Université Catholique de Louvain

    Publications:

    • P. Balazs, J.-P. Antoine, A. Gryboś, "Weighted and Controlled Frames: Mutual relationship and first Numerical Properties",  accepted for publication in International Journal of Wavelets, Multiresolution and Information Processing (2009), preprint
    • P. Balazs, "Matrix Representation of Bounded Linear Operators By Bessel Sequences, Frames and Riesz Sequence", SampTA'09, 8th International Conference on Sampling and Applications, May 2009, Marseille, France
    • A. Rahimi, P. Balazs, "Multipliers for  p-Bessel sequences in Banach spaces", submitted (2009)
    • D. Stoeva, P. Balazs, "Unconditional convergence and Invertibility of Multipliers", preprint (2009)
    • Monika Dörfler and Bruno Torrésani, “Representation of operators in the time-frequency domain and generalized Gabor multipliers”, J. Fourier Anal. Appl., 2009 (in press)
    • Yohan Frutiger: "Multiplicateurs de Gabor pour les transformations sonores" (Gabor Multipliers for sound transformations) Master thesis under the supervision of R. Kronland-Martinet, June 2008 
    • F. Jaillet, P. Balazs, M. Dörfler and N. Engelputzeder, “On the Structure of the Phase around the Zeros of the Short-Time Fourier Transform”, NAG/DAGA 2009, International Conference on Acoustics, March 2009, Rotterdam, Netherlands
    • F. Jaillet, P. Balazs and M. Dörfler, “Nonstationary Gabor Frames”, SampTA'09, 8th International Conference on Sampling and Applications, May 2009, Marseille, France
    • P. Balazs, B. Laback, G. Eckel, W. Deutsch, "Introducing Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking", IEEE Transactions on Audio, Speech and Language Processing (2009), in press
    •  B. Laback, P. Balazs, G. Toupin, T. Necciari, S. Savel, S. Meunier, S. Ystad and R. Kronland-Martinet, "Additivity of auditory masking using Gaussian-shaped tones", Acoustics'08, Paris, 29.06.-04.07.2008 (03.07.2008)
    • B. Laback, P. Balazs, T. Necciari, S. Savel, S. Ystad, S. Meunier and R. Kronland-Martinet, "Additivity of auditory masking for Gaussian-shaped tone pulses", preprint
    • Anaïk Olivero: "Expérimentation des multiplicateurs temps-échelle" (On the time-scale multipliers) Master thesis under the supervision of R. Kronland-Martinet and B. Torrésani, June 2008
  • Objective:

    During the current project of efficiently calculating a resynthesis window and an iterative scheme for a finite element method algorithm for vibrations in soils and liquids, it became apparent that block matrices are a powerful tool to find numerically efficient algorithms.

    Method:

    In this project, the focus should be the investigation of the numeric features of block matrices. How can this structure be used to calculate or approximate the inverse of a matrix or its norm? How can this be used to speed up iterative schemes?
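
    One standard way in which block structure helps with inversion is the Schur complement formula, which replaces one large inverse by two smaller ones. The sketch below assumes NumPy and a well-conditioned random test matrix; it illustrates the general principle and is not specific to the algorithms of the projects mentioned.

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Inverse of [[A, B], [C, D]] via the Schur complement of A."""
    A_inv = np.linalg.inv(A)
    schur = D - C @ A_inv @ B
    schur_inv = np.linalg.inv(schur)
    top_left = A_inv + A_inv @ B @ schur_inv @ C @ A_inv
    top_right = -A_inv @ B @ schur_inv
    bottom_left = -schur_inv @ C @ A_inv
    return np.block([[top_left, top_right], [bottom_left, schur_inv]])

rng = np.random.default_rng(3)
full = rng.standard_normal((6, 6)) + 10 * np.eye(6)   # hypothetical test matrix
A, B, C, D = full[:4, :4], full[:4, 4:], full[4:, :4], full[4:, 4:]
inverse = block_inverse(A, B, C, D)
```

    The formula requires A and its Schur complement to be invertible; when A has additional structure (for example diagonal or circulant blocks), its inverse is cheap and the overall cost drops accordingly.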

    Application:

    The results will be used for the two projects mentioned below:

    • double preconditioning for Gabor frames
    • vibrations in random layers
  • Basic Description:

    Practical experience quickly revealed that the concept of an orthonormal basis is not always useful. This led to the concept of frames. Models in physics and other application areas (for example sound vibration analysis) are mostly continuous models. Many continuous model problems can be formulated as operator theory problems, such as in differential or integral equations. Operators provide an opportunity to describe scientific models, and frames provide a way to discretize them.

    Physical models often use sequences that allow only numerically unstable re-synthesis. Such a sequence can be called an "unbounded frame". How this inversion can be regularized is being investigated. For many applications, a certain frame is very useful in describing the model. Therefore, it is also beneficial to use the same sequence to find a discretization of involved operators.

    Subprojects:

    Frames in Finite Dimensional Spaces:

    In this project, the theory of frames in the finite discrete case is investigated further.
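
    In the finite discrete case, the frame bounds are the extreme eigenvalues of the frame operator. The sketch below assumes NumPy and real vectors, and uses the well-known "Mercedes-Benz" frame of three unit vectors in the plane as an example.

```python
import numpy as np

def frame_bounds(vectors):
    """Lower and upper frame bound of the rows of a real (K, n) array."""
    frame_operator = vectors.T @ vectors         # S = sum_k v_k v_k^T
    eigenvalues = np.linalg.eigvalsh(frame_operator)
    return eigenvalues[0], eigenvalues[-1]

# Three unit vectors in R^2, 120 degrees apart.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
mercedes_benz = np.column_stack([np.cos(angles), np.sin(angles)])
lower, upper = frame_bounds(mercedes_benz)
```

    Both bounds equal 3/2 here, so the frame is tight: it behaves like an orthonormal basis up to a constant factor, although its three vectors are linearly dependent.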

    Matrix Representation of Operators using Frames:

    The standard matrix description of operators using orthonormal bases is extended to the more general case of frames.

    Weighted and Controlled Frames:

    Weighted and controlled frames were introduced to speed up the inversion algorithm for the frame matrix of a wavelet frame. In this project, these kinds of frames are investigated further.

    Basic Properties of Unbounded Frames

    Irregular Frames of Translates:

    In this project, sequences of irregular shifts of a single function are investigated.

    Partners:

    • S. Heineken, Research Group on Real and Harmonic Analysis, University of Buenos Aires
    • J. P. Antoine, Unité de physique théorique et de physique mathématique – FYMA
    • M. El-Gebeily,  Department of Mathematical Sciences, King Fahd University of Petroleum and Minerals, Saudi Arabia
  • Objective:

    In signal processing, synthesis is important in addition to analysis. This is especially true for the modification of data. For the Short-Time Fourier Transformation, the synthesis is often done using a simple overlap add (OLA), which is the sum of the outputs of the filter. Also, the output is re-weighted with the analysis window, such as occurs when using the phase vocoder. It is often presumed that with standard windows this will give satisfactory results.

    From Gabor frame theory, a construction of synthesis windows that guarantees perfect reconstruction is well known. However, this method is not used often in signal processing algorithms.

    Method:

    In this project, we will systematically investigate if and for which parameters the respective OLA synthesis with the original window gives good reconstruction. We will compare it to the reconstruction with the dual window, introducing and motivating it as perfect reconstruction overlap add (PROLA). We will show that this method is always preferable to others and that it can be calculated very efficiently.
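
    The difference between the two synthesis variants can be seen in a small experiment. The sketch assumes NumPy, circular frames, an FFT length equal to the window length (in which case the dual window has a simple closed form), and a periodic Hann window with 50% overlap; the helper names are illustrative.

```python
import numpy as np

def frame_indices(n, width, hop):
    """Sample indices of circularly shifted frames, shape (n // hop, width)."""
    return (np.arange(0, n, hop)[:, None] + np.arange(width)[None, :]) % n

def stft(x, window, hop):
    return np.fft.fft(x[frame_indices(len(x), len(window), hop)] * window, axis=1)

def overlap_add(coeffs, synthesis_window, hop, n):
    """Invert each frame, weight it with the synthesis window, and sum the frames."""
    frames = np.fft.ifft(coeffs, axis=1).real * synthesis_window
    y = np.zeros(n)
    np.add.at(y, frame_indices(n, len(synthesis_window), hop), frames)
    return y

def dual_window(window, hop):
    """Window that makes overlap_add invert stft in this setting."""
    residue = np.arange(len(window)) % hop
    energy = np.zeros(hop)
    np.add.at(energy, residue, window ** 2)      # sum of squared shifted windows
    return window / energy[residue]

n, width, hop = 64, 16, 8
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(width) / width)
x = np.random.default_rng(4).standard_normal(n)
coeffs = stft(x, window, hop)
y_ola = overlap_add(coeffs, window, hop, n)                      # original window
y_prola = overlap_add(coeffs, dual_window(window, hop), hop, n)  # dual window
ola_error = np.linalg.norm(y_ola - x) / np.linalg.norm(x)
```

    With these parameters the squared Hann windows do not sum to a constant, so synthesis with the original window leaves an amplitude modulation, whereas the dual window reconstructs the signal up to rounding errors.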

    Application:

    This is currently being implemented in STx. There the phase vocoder will have the option to guarantee perfect reconstruction, either with dual or tight windows.

    Partners:

    Department of Mathematics, University of Wisconsin-Eau Claire

  • Objective:

    The identification of the parameters of the vocal tract system can be used for speaker identification.

    Method:

    A preferred speech coding technique is the so-called Model-Based Speech Coding (MBSC), which involves modeling the vocal tract as a linear time-variant system (synthesis filter). The system's input is either white noise or a train of impulses. For coding purposes, the synthesis filter is assumed to be time-invariant during a short time interval (time slot) of typically 10-20 msec. Then, the signal is represented by the coefficients of the synthesis filter corresponding to each time slot.

    A successful MBSC method is the so-called Linear Prediction Coding (LPC). Roughly speaking, the LPC technique models the synthesis filter as an all-pole linear system. This all-pole linear system has coefficients obtained by adapting a predictor of the output signal, based on its own previous samples. The use of an all-pole model provides a good representation for the majority of speech sounds. However, the representation of nasal sounds, fricative sounds, and stop consonants requires the use of a zero-pole model. Also, the LPC technique is not adequate when the voice signal is corrupted by noise.
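
    The adaptation of the predictor can be illustrated with the classical autocorrelation method, which solves a small Toeplitz system. The sketch assumes NumPy and a synthetic signal from a known second-order all-pole system driven by white noise; the coefficients in `a_true` are hypothetical, and this is the textbook all-pole estimate, not the zero-pole method proposed in this project.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """Predictor coefficients a with x[n] ~ sum_i a[i] * x[n - 1 - i]."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    return np.linalg.solve(r[lags], r[1:])       # Toeplitz normal equations

rng = np.random.default_rng(5)
a_true = np.array([1.3, -0.6])                   # stable all-pole system
excitation = rng.standard_normal(20000)          # white-noise input
x = np.zeros_like(excitation)
for i in range(len(x)):
    x[i] = excitation[i]
    if i >= 1:
        x[i] += a_true[0] * x[i - 1]
    if i >= 2:
        x[i] += a_true[1] * x[i - 2]

a_est = lpc_autocorrelation(x, order=2)
```

    In speech coding the same estimate would be computed per time slot, so that the coefficients follow the slowly changing vocal tract.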

    We propose a method to estimate a zero-pole model that provides synthesis filter coefficients which are optimal with respect to a logarithmic criterion and which can be computed in a numerically efficient way.

    Evaluation:

    In order to evaluate the perceptual relevance of the proposed method, we used the model estimated from a speech signal to re-synthesize it:

    Re-Synthesized Sound

    Original Sound

  • French-Austrian bilateral research project funded by the French National Agency of Research (ANR) and the Austrian Science Fund (FWF, project no. I 1362-N30). The project involves two academic partners, namely the Laboratory of Mechanics and Acoustics (LMA - CNRS UPR 7051, France) and the Acoustics Research Institute. At the ARI, two research groups are involved in the project: the Mathematics and Signal Processing in Acoustics and the Psychoacoustics and Experimental Audiology groups.

    Principal investigators: Thibaud Necciari (ARI), Piotr Majdak (ARI) and Olivier Derrien (LMA).

    Running period: 2014-2017 (project started on March 1, 2014).

    Abstract:

    One of the greatest challenges in signal processing is to develop efficient signal representations. An efficient representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of binary data required for storage or transmission, it is desirable that the representation takes into account human auditory perception and allows reconstruction with a controlled amount of perceived distortion. Over the last decades, many psychoacoustical studies investigated auditory masking, an important property of auditory perception. Masking refers to the degradation of the detection threshold of a sound in the presence of another sound. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency (t-f) masking effects in perceptual audio codecs. We recently conducted psychoacoustical studies on t-f masking. They revealed the inaccuracy of such simple models. These new data on t-f masking represent a crucial basis to account for masking effects in t-f representations of sounds. Although t-f representations are standard tools in audio processing, the development of a t-f representation of audio signals that is mathematically founded, perception-based, perfectly invertible, and possibly with a minimum amount of redundancy, remains a challenge. POTION thus addresses the following questions:

    1. To what extent is it possible to obtain a perception-based (i.e., as close as possible to “what we see is what we hear”), perfectly invertible, and possibly minimally redundant t-f representation of sound signals? Such a representation is essential for modeling complex masking interactions in the t-f domain and is expected to improve our understanding of auditory processing of real-world sounds. Moreover, it is of fundamental interest for many audio applications involving sound analysis-synthesis.
    2. Is it possible to improve current perceptual audio codecs by considering a joint t-f approach? To reduce the size of digital audio files, perceptual audio codecs like MP3 decompose sounds into variable-length time segments, apply a frequency transform, and use masking models to control the sub-quantization of transform coefficients within each segment. Thus, current codecs follow mainly a spectral approach, although temporal masking effects are taken into account in some implementations. By combining an efficient perception-based t-f transform with a joint t-f masking model in an audio codec, we expect to achieve significant performance improvements.

    Working program:

    POTION is structured in three main tasks:

    1. Perception-based t-f representation of audio signals with perfect reconstruction: A linear and perfectly invertible t-f representation will be created by exploiting the recently developed non-stationary Gabor theory as a mathematical background. The transform will be designed so that its t-f resolution mimics the t-f analysis properties of the auditory system and, if possible, so that no redundancy is introduced, in order to maximize the coding efficiency.
    2. Development and implementation of a t-f masking model: Based on psychoacoustical data on t-f masking collected by the partners in previous projects and on literature data, a new, complex model of t-f masking will be developed and implemented in the computationally efficient representation built in task 1. Additional psychoacoustical data required for the development of the model, involving frequency, level, and duration effects in masking for either single or multiple maskers will be collected. The resulting signal processing algorithm should represent and re-synthesize only the perceptually relevant components of the signal. It will be calibrated and validated by conducting listening tests with synthetic and real-world sounds.
    3. Optimization of perceptual audio codecs: This task represents the main application of POTION. It will consist in combining the new efficient representation built in task 1 with the new t-f masking model built in task 2 for implementation in a perceptual audio codec.

    More information on the project can be found on the POTION web page.

    Publications:

    • Chardon, G., Necciari, Th., Balazs, P. (2014): Perceptual matching pursuit with Gabor dictionaries and time-frequency masking, in: Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014). Florence, Italy, 3126-3130. (proceedings) ICASSP 2014: Perceptual matching pursuit results

    Related topics investigated at the ARI:

  • Objective:

    Numerous implementations and algorithms for time-frequency analysis can be found in the literature or on the internet. Most of them are either not well documented or no longer maintained. P. Soendergaard started to develop the Linear Time Frequency Toolbox for MATLAB. The goal of this project is to identify typical uses of this toolbox in acoustics and to incorporate successful, not-yet-implemented algorithms into STx.

    Method:

    The Linear Time Frequency Toolbox is a small open-source MATLAB toolbox with functions for working with Gabor frames for finite sequences. It includes a 1D Discrete Gabor Transform (sampled STFT) and its inverse, works with both full-length and short windows, and computes the canonical dual and canonical tight windows.
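    To make the notions of discrete Gabor transform and canonical dual window concrete, the following is a minimal NumPy sketch that builds all Gabor atoms explicitly and inverts the frame operator directly. It is not the toolbox's implementation (which is written for MATLAB and uses fast algorithms); the function names, the phase convention, and the parameters `L`, `a`, `M` are assumptions made for this example, with `a` and `M` both dividing `L`.

```python
import numpy as np

def gabor_atoms(g, a, M):
    """Return an (L, M*N) matrix whose columns are the atoms
    g_{m,n}[l] = g[(l - n*a) mod L] * exp(2*pi*i*m*l/M), with N = L // a."""
    L = len(g)
    N = L // a
    l = np.arange(L)
    atoms = np.empty((L, M * N), dtype=complex)
    for n in range(N):
        shifted = np.roll(g, n * a)  # circular time shift by n*a samples
        for m in range(M):
            atoms[:, n * M + m] = shifted * np.exp(2j * np.pi * m * l / M)
    return atoms

def dgt(f, g, a, M):
    """Analysis: inner products of f with every atom."""
    return gabor_atoms(g, a, M).conj().T @ f

def idgt(c, gd, a, M):
    """Synthesis: weighted sum of the atoms generated by the window gd."""
    return gabor_atoms(gd, a, M) @ c

def canonical_dual(g, a, M):
    """Canonical dual window S^{-1} g, with S the Gabor frame operator."""
    G = gabor_atoms(g, a, M)
    S = G @ G.conj().T
    return np.linalg.solve(S, g.astype(complex))

# Hypothetical parameters: length 48, time step 4, 8 channels (redundancy 2)
L, a, M = 48, 4, 8
centered = (np.arange(L) + L // 2) % L - L // 2
g = np.exp(-np.pi * centered**2 / (a * M))  # periodized Gaussian window

gd = canonical_dual(g, a, M)
rng = np.random.default_rng(1)
f = rng.standard_normal(L)
c = dgt(f, g, a, M)
f_rec = idgt(c, gd, a, M)
print(np.allclose(f_rec, f))
```

    Analysis with `g` followed by synthesis with `gd` reproduces the signal because the frame operator commutes with the time-frequency shifts on the lattice. The dense matrices are only practical for short signals.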

    Application:

    These algorithms are used for acoustic applications, like formants, data compression, or de-noising. These implementations are compared to the ones in STx, and will be implemented in this software package if they improve its performance.

    Partners:

    • H. G. Feichtinger et al., NuHAG, Faculty of Mathematics, University of Vienna
    • B. Torrèsani, Groupe de Traitement du Signal, Laboratoire d'Analyse Topologie et Probabilités, LATP/ CMI, Université de Provence, Marseille
    • P. Soendergaard, Department of Mathematics, Technical University of Denmark
  • This project consists of three subprojects:

    1.1 Frame & Gabor Multiplier:

    Recently, Gabor multipliers have been used to implement time-variant filtering as Gabor filters. This idea can be generalized further. To investigate the basic properties of such operators, the concept of abstract, i.e. unstructured, frames is used. Such multipliers are operators in which a fixed mask, the so-called symbol, is applied to the coefficients of the frame analysis, after which synthesis is performed. The properties found for this case can then be used for all kinds of frames, for example regular and irregular Gabor frames, wavelet frames, or auditory filterbanks.
     
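    The basic definition of a frame multiplier follows:
    M f = ∑ m_k ⟨f, ψ_k⟩ φ_k, where (ψ_k) is the analysis sequence, (φ_k) the synthesis sequence, and (m_k) the symbol.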
    As a special case, such operators will be investigated and implemented for irregular Gabor systems. This corresponds to an irregularly sampled Short-Time Fourier Transform (STFT). As an application, an STFT corresponding to the Bark scale can be examined.
    This mathematical, basic-research-oriented project is important for many other projects, such as time-frequency masking or system identification.
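    In finite dimensions, a frame multiplier (analysis, multiplication of the coefficients by the symbol, then synthesis) is simply a product of three matrices. The sketch below illustrates this with a small tight frame of three vectors in the plane; the function name `frame_multiplier` and the example frame are assumptions made for illustration and do not come from the project's software.

```python
import numpy as np

def frame_multiplier(symbol, analysis, synthesis):
    """Matrix of the operator f -> sum_k symbol[k] * <f, psi_k> * phi_k.

    The columns of `analysis` are the psi_k, the columns of `synthesis`
    are the phi_k, and `symbol` holds one weight m_k per frame element.
    """
    symbol = np.asarray(symbol)
    # Scale each synthesis column by its weight, then apply the analysis step
    return (synthesis * symbol) @ analysis.conj().T

# Hypothetical example: three unit vectors at 120 degrees in the plane.
# Their frame operator is (3/2) * identity, so the dual frame is 2/3 * psi.
psi = np.array([[0.0, -np.sqrt(3) / 2, np.sqrt(3) / 2],
                [1.0, -0.5, -0.5]])
phi = (2.0 / 3.0) * psi

identity_like = frame_multiplier([1.0, 1.0, 1.0], psi, phi)  # constant symbol
masked = frame_multiplier([1.0, 0.0, 0.0], psi, phi)         # keep one coefficient

print(np.round(identity_like, 12))
print(np.linalg.matrix_rank(masked))
```

    With a constant symbol of ones and a dual pair of frames, the multiplier reduces to the identity. A symbol that keeps a single coefficient gives a rank-one operator, which shows how the symbol acts as a mask on the coefficients.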

    References:

    • O. Christensen, An Introduction To Frames And Riesz Bases, Birkhäuser Boston (2003)
    • M. Dörfler, Gabor Analysis for a Class of Signals called Music, Dissertation Univ. Wien (2002)
    • R.J. Duffin, A.C. Schaeffer, A Class of nonharmonic Fourier series, Trans.Amer.Math.Soc., vol.72, pp. 341-366 (1952)
    • H. G. Feichtinger, K. Nowak, A First Survey of Gabor Multipliers, in H. G. Feichtinger, T. Strohmer

    Documents:

    Cooperations:

  • Objective:

    Measuring sound absorption is essential to performing acoustic measurements and experiments under controlled acoustic conditions, especially considering the acoustic influence of room boundaries.

    So-called "in-situ" methods allow measurement of the reflection and absorption coefficients under real conditions in a single measurement procedure. The proposed method captures the direct signal and the reflections in one measurement. These reflections include not only the reflection of interest, coming from the tested surface, but also others from the surroundings. To isolate the reflection coming from the tested surface, the influence of the direct signal and of the other reflections must be cancelled.

    One known separation method uses a time-windowing technique to separate the direct signal from the reflections. When the impulse responses of the direct signal and the reflections overlap in time, this method is no longer satisfactory, and frequency-dependent windowing is necessary to separate the different parts of the signal. In the wavelet domain, however, the reflection of interest can be observed separately.
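    The time-windowing idea can be sketched on an idealized impulse response in which the direct sound and a single reflection are well separated in time. Everything in this toy example is an assumption: the sample positions, the amplitudes, the rectangular windows, and the omission of any correction for geometric spreading. It only shows the principle and is not the measurement procedure of the project.

```python
import numpy as np

def windowed_spectrum(h, start, stop, nfft):
    """Cut h[start:stop] with a rectangular window and return its spectrum."""
    segment = np.zeros(nfft)
    segment[: stop - start] = h[start:stop]
    return np.fft.rfft(segment)

# Idealized impulse response: direct sound at sample 20 (amplitude 1.0),
# one reflection at sample 60 (amplitude 0.5), nothing else.
nfft = 256
h = np.zeros(nfft)
h[20] = 1.0
h[60] = 0.5

direct = windowed_spectrum(h, 10, 40, nfft)
reflected = windowed_spectrum(h, 50, 80, nfft)

# Magnitude of the reflection factor and the resulting absorption coefficient
reflection_magnitude = np.abs(reflected) / np.abs(direct)
absorption = 1.0 - reflection_magnitude**2

print(reflection_magnitude[:3], absorption[:3])
```

    As soon as the two windows would have to overlap, this separation breaks down, which is the situation that motivates the frequency-dependent approach described above.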

    The objective of this project is to study how the use of wavelet multipliers could improve the efficiency of in-situ methods in this context.

    Method:

    A demonstrator system will be built to acquire the necessary measurements for the evaluation of absorption coefficients. This demonstrator will be used to evaluate the usefulness of the new methods in a semi-anechoic room.

    A systematic numeric study will be carried out on the acquired signals, in order to manually determine the symbol of a wavelet multiplier for the extraction of the reflected signal. The best parameters for optimal separation will then be investigated. This, in combination with the use of physical models, will help design a semi-automatic method for the calculation of the optimal multiplier symbol.

    Application:

    The improved measurement method will be available for in-situ measurement of reflection and absorption coefficients