Mathematics and Signal Processing in Acoustics - Current Projects

French-Austrian bilateral research project funded by the French National Research Agency (ANR) and the Austrian Science Fund (FWF, project no. I 1362-N30). The project involves two academic partners, namely the Laboratory of Mechanics and Acoustics (LMA - CNRS UPR 7051, France) and the Acoustics Research Institute. At the ARI, two research groups are involved in the project: the Mathematics and Signal Processing in Acoustics group and the Psychoacoustics and Experimental Audiology group.

Principal investigators: Thibaud Necciari (ARI), Piotr Majdak (ARI) and Olivier Derrien (LMA).

Running period: 2014-2017 (project started on March 1, 2014).

Abstract:

One of the greatest challenges in signal processing is to develop efficient signal representations. An efficient representation extracts relevant information and describes it with a minimal amount of data. In the specific context of sound processing, and especially in audio coding, where the goal is to minimize the size of binary data required for storage or transmission, it is desirable that the representation takes into account human auditory perception and allows reconstruction with a controlled amount of perceived distortion. Over the last decades, many psychoacoustical studies have investigated auditory masking, an important property of auditory perception. Masking refers to the degradation of the detection threshold of a sound in the presence of another sound. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency (t-f) masking effects in perceptual audio codecs. We recently conducted psychoacoustical studies on t-f masking, which revealed the inaccuracy of such simple models. These new data on t-f masking represent a crucial basis to account for masking effects in t-f representations of sounds. Although t-f representations are standard tools in audio processing, the development of a t-f representation of audio signals that is mathematically founded, perception-based, perfectly invertible, and possibly with a minimum amount of redundancy, remains a challenge. POTION thus addresses the following questions:

  1. To what extent is it possible to obtain a perception-based (i.e., as close as possible to “what we see is what we hear”), perfectly invertible, and possibly minimally redundant t-f representation of sound signals? Such a representation is essential for modeling complex masking interactions in the t-f domain and is expected to improve our understanding of auditory processing of real-world sounds. Moreover, it is of fundamental interest for many audio applications involving sound analysis-synthesis.
  2. Is it possible to improve current perceptual audio codecs by considering a joint t-f approach? To reduce the size of digital audio files, perceptual audio codecs like MP3 decompose sounds into variable-length time segments, apply a frequency transform, and use masking models to control the sub-quantization of transform coefficients within each segment. Thus, current codecs mainly follow a spectral approach, although temporal masking effects are taken into account in some implementations. By combining an efficient perception-based t-f transform with a joint t-f masking model in an audio codec, we expect to achieve significant performance improvements.
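The conventional codec structure described in question 2 can be sketched as follows. This is a minimal illustration in Python with NumPy under stated assumptions, not MP3 and not the POTION codec: segments have a fixed length instead of a variable one, a plain DFT replaces the transform actually used in MP3, and the hypothetical `masking_threshold` function (a fixed offset below a smoothed power spectrum) stands in for a real psychoacoustic model. Entropy coding of the quantized values is omitted.

```python
import numpy as np

def masking_threshold(power, offset_db=-20.0, spread=3):
    """Hypothetical stand-in for a psychoacoustic model: smooth the power
    spectrum across neighbouring bins and lower it by a fixed offset."""
    kernel = np.hanning(2 * spread + 3)[1:-1]          # positive spreading kernel
    smoothed = np.convolve(power, kernel / kernel.sum(), mode="same")
    return smoothed * 10 ** (offset_db / 10)

def encode_segment(segment):
    spectrum = np.fft.rfft(segment)                    # frequency transform
    threshold = masking_threshold(np.abs(spectrum) ** 2)
    # Rounding real and imaginary parts with step `step` adds noise of power
    # about step**2 / 6 per bin; choose the step so this matches the threshold.
    step = np.sqrt(6 * threshold) + 1e-12
    q = np.round(spectrum.real / step) + 1j * np.round(spectrum.imag / step)
    return q, step                                     # integer-valued parts

def decode_segment(q, step, n):
    return np.fft.irfft(q * step, n)

def codec(signal, seg_len=512):
    """Encode and decode segment by segment (fixed-length segments)."""
    pad = -len(signal) % seg_len                       # zero-pad to whole segments
    x = np.pad(np.asarray(signal, dtype=float), (0, pad))
    out = np.zeros_like(x)
    for start in range(0, len(x), seg_len):
        q, step = encode_segment(x[start:start + seg_len])
        out[start:start + seg_len] = decode_segment(q, step, seg_len)
    return out[:len(signal)]
```

Because the step size follows the threshold in each bin, louder spectral regions are quantized more coarsely. In this sketch the threshold depends only on the current segment, which is exactly the purely spectral view that question 2 calls into question.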

Working program:

POTION is structured in three main tasks:

  1. Perception-based t-f representation of audio signals with perfect reconstruction: A linear and perfectly invertible t-f representation will be created by exploiting the recently developed non-stationary Gabor theory as a mathematical background. The transform will be designed so that its t-f resolution mimics the t-f analysis properties of the auditory system and as little redundancy as possible is introduced, to maximize the coding efficiency.
  2. Development and implementation of a t-f masking model: Based on psychoacoustical data on t-f masking collected by the partners in previous projects and on literature data, a new, complex model of t-f masking will be developed and implemented in the computationally efficient representation built in task 1. Additional psychoacoustical data required for the development of the model, involving frequency, level, and duration effects in masking for either single or multiple maskers will be collected. The resulting signal processing algorithm should represent and re-synthesize only the perceptually relevant components of the signal. It will be calibrated and validated by conducting listening tests with synthetic and real-world sounds.
  3. Optimization of perceptual audio codecs: This task represents the main application of POTION. It will consist of combining the new efficient representation built in task 1 with the new t-f masking model built in task 2 for implementation in a perceptual audio codec.
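To illustrate the kind of construction task 1 builds on, the following is a minimal sketch of a nonstationary Gabor transform in the so-called painless case, in Python with NumPy. It is not the POTION transform: the window layout (Hann-type windows whose lengths cycle through a hypothetical list `lengths`, each starting half a window after the previous one on a circular signal) is an assumption for illustration, whereas a perception-based design would adapt the resolution to an auditory frequency scale. Because every window fits into its FFT length, the frame operator is diagonal and perfect reconstruction only needs a pointwise division.

```python
import numpy as np

def positive_hann(L):
    # Hann window without its zero end points, so every covered sample has weight > 0
    return np.hanning(L + 2)[1:-1]

def nsg_windows(N, lengths):
    """Hypothetical layout: windows whose lengths cycle through `lengths`,
    each starting half a window after the previous one (circular signal)."""
    windows, start, n = [], 0, 0
    while start < N:
        L = lengths[n % len(lengths)]
        idx = (start + np.arange(L)) % N        # circular support of window n
        windows.append((idx, positive_hann(L)))
        start += L // 2
        n += 1
    return windows

def nsgt(f, windows):
    # Painless case: one FFT whose length equals the window length (M_n = L_n)
    return [np.fft.fft(f[idx] * np.conj(g)) for idx, g in windows]

def insgt(coeffs, windows, N):
    num = np.zeros(N, dtype=complex)
    den = np.zeros(N)
    for c, (idx, g) in zip(coeffs, windows):
        seg = np.fft.ifft(c)                    # recovers f[idx] * conj(g)
        np.add.at(num, idx, g * seg)            # overlap-add with the analysis window
        np.add.at(den, idx, np.abs(g) ** 2)     # diagonal of the frame operator
    return num / den                            # pointwise inverse = dual windows g / den
```

With `lengths=[64, 128, 256]` the number of coefficients is about twice the signal length because of the half-window spacing; bringing the redundancy closer to 1, as task 1 aims for, requires a more careful window and sampling design.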

More information on the project can be found on the POTION web page.

Publications:

  • Chardon, G., Necciari, Th., Balazs, P. (2014): Perceptual matching pursuit with Gabor dictionaries and time-frequency masking, in: Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), Florence, Italy, pp. 3126-3130.

Related topics investigated at the ARI:

START project of P. Balazs.

FLAME

 

This page is a project description and as such is written in English.

This international, multi-disciplinary and team-oriented project will expand the group Mathematics and Acoustical Signal Processing at the Acoustics Research Institute in cooperation with NuHAG Vienna (Hans G. Feichtinger, M. Dörfler, K. Gröchenig), the Institute of Telecommunication Vienna (Franz Hlawatsch), LATP Marseille (Bruno Torrésani), LMA (Richard Kronland-Martinet), CAHR (Torsten Dau, Peter Soendergaard), FYMA Louvain-la-Neuve (Jean-Pierre Antoine), AG Numerics (Stephan Dahlke) and the School of Electrical Engineering and Computer Science (Damian Marelli), as well as the BKA Wiesbaden (Timo Becker).

Within the institute, the groups Audiological Acoustics and Psychoacoustics, Computational Acoustics, Acoustic Phonetics and Software Development are involved in the project.

This project is funded by the FWF as a START Prize. It is planned to run from May 2012 to April 2018.

 

Workshops:

 

Multipliers

 

General description:

We live in the age of information, where the analysis, classification, and transmission of information is of essential importance. Signal processing tools and algorithms form the backbone of important technologies like MP3, digital television, mobile phones and wireless networking. Many signal processing algorithms have been adapted for applications in audio and acoustics, also taking into account the properties of the human auditory system.

The mathematical concept of frames provides a theoretical background for signal processing. Frames are generalizations of orthonormal bases that give more freedom for the analysis and modification of information; however, this concept is still not firmly rooted in applied research. The link between mathematical frame theory, signal processing algorithms, their implementations and, finally, acoustical applications is a very promising, synergetic combination of research in different fields.
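A small standard example makes the difference from an orthonormal basis concrete. The Python/NumPy sketch below is a textbook illustration, not code from FLAME: it uses three unit vectors at 120 degrees in the plane, the so-called Mercedes-Benz frame. The representation is redundant (three coefficients for a two-dimensional vector), yet every vector is recovered exactly by applying the inverse frame operator, which here is simply 2/3 times the identity.

```python
import numpy as np

# Three unit vectors at 120 degrees in R^2: more vectors than dimensions,
# so the expansion is redundant, unlike an orthonormal basis.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
Phi = np.stack([np.cos(angles), np.sin(angles)])   # columns phi_0, phi_1, phi_2

def analysis(f):
    return Phi.T @ f                 # frame coefficients <f, phi_k>

def synthesis(c):
    return Phi @ c                   # sum_k c_k phi_k

S = Phi @ Phi.T                      # frame operator; equals (3/2) * identity here

def reconstruct(c):
    # f = S^{-1} sum_k <f, phi_k> phi_k (reconstruction with the canonical dual frame)
    return np.linalg.solve(S, synthesis(c))
```

For instance, `reconstruct(analysis(np.array([0.3, -1.2])))` returns the input vector up to rounding errors.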

Therefore the main goal of this multidisciplinary project is to

-> Establish Frame Theory as Theoretical Backbone of Acoustical Modeling

in particular in psychoacoustics, phonetic and computational acoustics as well as audio engineering.

Overview

 

Through this auspicious connection of disciplines, FLAME will have a substantial impact on both theoretical and applied research.

The theory-based part of FLAME consists of the following topics:

  • T1 Frame Analysis and Reconstruction Beyond Classical Approaches
  • T2 Frame Multipliers, Extended
  • T3 Novel Frame Representation of Operators Motivated by Computational Acoustics

The application-oriented part of FLAME consists of:

  • A1 Advanced Frame Methods for Perceptual Sparsity in the Time-Frequency Plane
  • A2 Advanced Frame Methods for the Analysis and Classification of Speech
  • A3 Advanced Frame Methods for Signal Enhancement and System Estimation

Press information:

 

 

 

Objective:

The identification of the parameters of the vocal tract system can be used for speaker identification.

Method:

A preferred speech coding technique is the so-called Model-Based Speech Coding (MBSC), which involves modeling the vocal tract as a linear time-variant system (synthesis filter). The system's input is either white noise or a train of impulses. For coding purposes, the synthesis filter is assumed to be time-invariant during a short time interval (time slot) of typically 10-20 msec. Then, the signal is represented by the coefficients of the synthesis filter corresponding to each time slot.
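The source-filter structure described above can be sketched in Python with NumPy as follows. It is a minimal illustration under assumptions, not the coder used in this project: the sampling rate, slot length and pitch are hypothetical parameters, the synthesis filter is all-pole with the common sign convention A(z) = 1 + sum_k a_k z^-k, and the filter coefficients are simply given for each slot.

```python
import numpy as np

def mbsc_synthesize(slot_coeffs, fs=8000, slot_ms=20, pitch_hz=None, seed=0):
    """Hypothetical model-based synthesis with one all-pole filter per time slot.

    slot_coeffs: list of arrays a = [a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k.
    pitch_hz:    None -> white-noise excitation (unvoiced sounds),
                 number -> impulse train at that rate (voiced sounds).
    """
    slot_len = int(fs * slot_ms / 1000)
    n_total = slot_len * len(slot_coeffs)
    if pitch_hz is None:
        excitation = np.random.default_rng(seed).standard_normal(n_total)
    else:
        excitation = np.zeros(n_total)
        excitation[::int(fs / pitch_hz)] = 1.0
    y = np.zeros(n_total)
    for s, a in enumerate(slot_coeffs):
        # The filter is held fixed (time-invariant) within slot s;
        # its memory, i.e. the past outputs, carries over between slots.
        for n in range(s * slot_len, (s + 1) * slot_len):
            past = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            y[n] = excitation[n] - past
    return y
```

The coded representation is then just the list `slot_coeffs` plus the excitation type (and pitch) per slot.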

A successful MBSC method is the so-called Linear Predictive Coding (LPC). Roughly speaking, the LPC technique models the synthesis filter as an all-pole linear system. This all-pole linear system has coefficients obtained by adapting a predictor of the output signal, based on its own previous samples. The use of an all-pole model provides a good representation for the majority of speech sounds. However, the representation of nasal sounds, fricative sounds, and stop consonants requires the use of a zero-pole model. Also, the LPC technique is not adequate when the voice signal is corrupted by noise.
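The predictor adaptation described above is commonly done with the autocorrelation method and the Levinson-Durbin recursion. The following Python/NumPy sketch shows that standard textbook variant (without the windowing or pre-emphasis a real coder would apply); it illustrates classical all-pole LPC only and is not the zero-pole method proposed in this project.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err) with a = [a1, ..., ap] such that the prediction error
    e[n] = x[n] + sum_k a_k x[n-k] has minimal energy err, i.e. the all-pole
    synthesis filter is 1 / A(z) with A(z) = 1 + sum_k a_k z^-k.
    """
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient of stage i + 1
        k = -(r[i + 1] + np.dot(a[:i], r[i:0:-1])) / err
        a[:i] = a[:i] + k * a[:i][::-1]      # update the lower-order coefficients
        a[i] = k
        err *= 1 - k * k                     # remaining prediction-error energy
    return a, err
```

Applied to a long realization of the AR(2) process x[n] = 1.3 x[n-1] - 0.6 x[n-2] + w[n], `lpc(x, 2)` returns coefficients close to [-1.3, 0.6].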

We propose a numerically efficient method to estimate a zero-pole model that provides the optimal synthesis filter coefficients with respect to a logarithmic criterion.

Evaluation:

In order to evaluate the perceptual relevance of the proposed method, we used the model estimated from a speech signal to re-synthesize it:

Re-Synthesized Sound

Original Sound

Publications:

Objective:

Gabor multipliers are an efficient tool for time-variant filtering. They are used implicitly in many engineering applications in signal processing. For these operators, the result of a Gabor transform (the sampled version of the Short Time Fourier Transform) is multiplied by a fixed function (called the time-frequency mask or symbol). Then the result is synthesized.
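A minimal sketch of such a Gabor multiplier in Python with NumPy follows. It assumes a circular signal, a Hann-type window and a painless setting in which the number of frequency channels equals the window length, so that synthesis with the canonical dual window reduces to overlap-add followed by a pointwise division. The window length and hop size are hypothetical defaults, not values used in the project.

```python
import numpy as np

def gabor_multiplier(f, mask, win_len=64, hop=16):
    """Sketch of a Gabor multiplier on a circular signal.

    mask has shape (number of frames, win_len) and multiplies the
    Gabor coefficients (frames x frequency bins) before synthesis.
    """
    N = len(f)
    g = np.hanning(win_len + 2)[1:-1]                   # strictly positive window
    starts = np.arange(0, N, hop)
    idx = (starts[:, None] + np.arange(win_len)) % N    # circular window supports
    coeffs = np.fft.fft(f[idx] * g, axis=1)             # analysis: sampled STFT
    coeffs = coeffs * mask                              # apply the t-f mask (symbol)
    segs = np.fft.ifft(coeffs, axis=1)                  # back to windowed segments
    out = np.zeros(N, dtype=complex)
    norm = np.zeros(N)
    np.add.at(out, idx, segs * g)                       # overlap-add with the window
    np.add.at(norm, idx, np.tile(g ** 2, (len(starts), 1)))
    return out / norm                                   # dual window = g / norm
```

With a mask of ones the operator is the identity (perfect reconstruction); zeroing mask entries in selected frames and frequency bins gives a time-variant filter. For a real input, the output stays real when the mask is real and symmetric under bin k -> win_len - k.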

Other transforms beyond the Gabor transform, for example the wavelet transform, are more suitable for certain applications. The concept of multipliers can easily be extended to these transforms. More precisely, the concept of multipliers can be applied to general frames without any further structure. This results in the introduction of operators called frame multipliers, which will be investigated in detail in this project in order to precisely define their mathematical properties and optimize their use in applications.
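In finite dimensions, a frame multiplier is simply a matrix product: synthesis with one sequence, a diagonal matrix holding the symbol, and analysis with another sequence. The Python/NumPy sketch below illustrates that definition; it is not the project's implementation, and the random frame and symbol used to check it are hypothetical test data.

```python
import numpy as np

def frame_multiplier(Phi, Psi, m):
    """Matrix of M f = sum_k m_k <f, psi_k> phi_k.

    Phi, Psi: arrays of shape (n, K) whose columns are the synthesis
    elements phi_k and the analysis elements psi_k; m: symbol of length K.
    """
    analysis = Psi.conj().T          # C_Psi: f -> (<f, psi_k>)_k
    synthesis = Phi                  # D_Phi: c -> sum_k c_k phi_k
    return synthesis @ np.diag(m) @ analysis

def canonical_dual(Phi):
    S = Phi @ Phi.conj().T           # frame operator S = D_Phi C_Phi
    return np.linalg.solve(S, Phi)   # columns S^{-1} phi_k
```

With the canonical dual as analysis sequence and the constant symbol 1, the multiplier reduces to the identity, which is the frame reconstruction formula; other symbols weight the individual frame elements.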

Method:

The problem will be approached using modern frame theory, functional analysis, numerical tools, and linear algebra. Systematic numerical experiments will be conducted to observe the different properties of frame multipliers. These observations will support the analytical formulation and proof of these properties.

The following topics will be investigated in the project:

  • Eigenvalues and eigenvectors of frame multipliers
  • Invertibility, injectivity, and surjectivity of frame multipliers
  • Reproducing kernel invariance
  • Generalization of multipliers to Banach frames and p-frames
  • Connection of frame multipliers to weighted frames
  • Discretization and implementation of frame multipliers
  • Best approximation of operators by frame multipliers and identification of frame multipliers
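As an example of the kind of numerical experiment mentioned above, the following Python/NumPy sketch (not the project's code; the random frame and symbol are hypothetical test data) checks one simple property. For a Parseval frame and a real symbol m, the multiplier M = sum_k m_k <., phi_k> phi_k is self-adjoint with <Mf, f> = sum_k m_k |<f, phi_k>|^2 and sum_k |<f, phi_k>|^2 = ||f||^2, so its eigenvalues lie in [min m, max m], and M is invertible whenever min m > 0.

```python
import numpy as np

def parseval_frame(n, K, seed=0):
    """Random Parseval frame: the columns of S^{-1/2} Phi for a random Phi."""
    Phi = np.random.default_rng(seed).standard_normal((n, K))
    w, V = np.linalg.eigh(Phi @ Phi.T)            # S = V diag(w) V^T
    return V @ np.diag(w ** -0.5) @ V.T @ Phi     # S^{-1/2} Phi

def multiplier_spectrum(Phi, m):
    """Eigenvalues of M = sum_k m_k <., phi_k> phi_k (self-adjoint for real m)."""
    M = Phi @ np.diag(m) @ Phi.T
    return np.linalg.eigvalsh(M)
```

Repeating such checks over many random frames and symbols, and for non-Parseval or non-matching analysis and synthesis sequences, is one way to generate conjectures on invertibility before proving them.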

Application:

The applications of frame multipliers in signal processing are numerous and include any application requiring time-variant filtering. Some applications of frame multipliers will be investigated further in the following parallel projects:

  • Mathematical Modeling of Auditory Time-Frequency Masking Functions
  • Improvement of Head-Related Transfer Function Measurements
  • Advanced Method of Sound Absorption Measurements

Publications:

  • P. Balazs, "Matrix Representation of Bounded Linear Operators By Bessel Sequences, Frames and Riesz Sequence", SampTA'09, 8th International Conference on Sampling Theory and Applications, May 2009, Marseille, France
  • P. Balazs, J.-P. Antoine, A. Grybos, "Weighted and Controlled Frames: Mutual relationship and first Numerical Properties", accepted for publication in International Journal of Wavelets, Multiresolution and Information Processing (2009), preprint
  • A. Rahimi, P. Balazs, "Multipliers for p-Bessel sequences in Banach spaces", submitted (2009)
  • D. Stoeva, P. Balazs, "Unconditional convergence and Invertibility of Multipliers", preprint (2009)

Objective:

From many previous applications, it is known that inverse problems often require a regularization that makes the inversion numerically stable. In this project, sequences that allow a bounded, injective analysis that is not boundedly invertible are investigated.

Method:

Even for general sequences, the analysis and synthesis operators can be defined. The first part of this project will investigate the most general consequences of these definitions. For example, it can be shown that the analysis operator is always a closed operator. Although it can be shown that another sequence allowing perfect reconstruction cannot be bounded, the question of how to construct such a "dual sequence" will be investigated.
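The instability can be seen in a truncated toy example, which is an assumption for illustration and not a sequence studied in the project: for phi_k = e_k / k, the analysis f -> (<f, phi_k>)_k = (f_k / k)_k is bounded and injective, but its inverse multiplies the k-th coefficient by k and is unbounded. The Python/NumPy sketch compares naive inversion of noisy coefficients with Tikhonov regularization, one standard regularization technique.

```python
import numpy as np

def analysis(f):
    k = np.arange(1, len(f) + 1)
    return f / k                            # <f, phi_k> with phi_k = e_k / k

def naive_inverse(c):
    k = np.arange(1, len(c) + 1)
    return c * k                            # exact inverse; amplifies noise by k

def tikhonov_inverse(c, alpha):
    # argmin_f ||C f - c||^2 + alpha ||f||^2 for the diagonal C = diag(1/k),
    # i.e. f_k = (c_k / k) / (1 / k^2 + alpha)
    k = np.arange(1, len(c) + 1)
    return (c / k) / (1 / k ** 2 + alpha)
```

Without noise the naive inverse is exact, but already small noise on the coefficients produces large reconstruction errors, which the regularized inverse keeps under control at the price of a small bias.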

Application:

Such sequences have already found applications in wavelet analysis, in which dual sequences were constructed algorithmically. Also, the original system investigated by Gabor with a redundancy of 1 satisfies this condition.

Partners:

  • M. El-Gebeily, Department of Mathematical Sciences, King Fahd University of Petroleum and Minerals, Saudi Arabia
  • J. P. Antoine, Unité de physique théorique et de physique mathématique – FYMA, Belgium

S&T cooperation project 'Amadee' Austria-France 2013-14, "Frame Theory for Sound Processing and Acoustic Holophony", FR 16/2013

Project Partner: The Institut de recherche et coordination acoustique/musique (IRCAM)

General Information

Funded by the Vienna Science and Technology Fund (WWTF) within the "Mathematics and …2016" Call (MA16-053)

Principal Investigator: Georg Tauböck

Co-Principal Investigator: Peter Balazs

Project Team: Günther Koliander, José Luis Romero  

Duration: 01.07.2017 – 01.07.2021

Abstract

Signal processing is a key technology that forms the backbone of important developments like MP3, digital television, mobile communications, and wireless networking and is thus of exceptional relevance to economy and society in general. The overall goal of the proposed project is to derive highly efficient signal processing algorithms and to tailor them to dedicated applications in acoustics. We will develop methods that are able to exploit structural properties in infinite-dimensional signal spaces, since typically ad hoc restrictions to finite dimensions do not sufficiently preserve physically available structure. The approach adopted in this project is based on a combination of the powerful mathematical methodologies frame theory (FT), compressive sensing (CS), and information theory (IT). In particular, we aim at extending finite-dimensional CS methods to infinite dimensions, while fully maintaining their structure-exploiting power, even if only a finite number of variables are processed. We will pursue three acoustic applications, which will strongly benefit from the devised signal processing techniques, i.e., audio signal restoration, localization of sound sources, and underwater acoustic communications. The project is set up as an interdisciplinary endeavor in order to leverage the interrelations between mathematical foundations, CS, FT, IT, time-frequency representations, wave propagation, transceiver design, the human auditory system, and performance evaluation.
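Compressive sensing recovers a sparse vector from fewer linear measurements than unknowns. As a finite-dimensional baseline of the kind the project aims to extend to infinite dimensions, here is a Python/NumPy sketch of Orthogonal Matching Pursuit (OMP), one standard CS recovery algorithm. The choice of OMP, the random Gaussian measurement matrix and the problem sizes in the check are assumptions for illustration, not the methods developed in the project.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal Matching Pursuit: greedy recovery of a sparse x with y = A x.

    A should have unit-norm columns; sparsity (>= 1) is the number of
    nonzero entries to select.
    """
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(sparsity):
        # Select the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the selected columns, then update the residual
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x
```

Here all 200 unknowns live in a fixed finite dimension; the infinite-dimensional setting targeted by the project is exactly where such an ad hoc truncation may lose structure.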

Keywords

compressive sensing, frame theory, information theory, signal processing, super resolution, phase retrieval, audio, acoustics


 

Scientific and Technological Cooperation between Austria and Serbia (SRB 01/2018)

Duration of the project: 01.07.2018 - 30.06.2020

 

Project partners:

Acoustics Research Institute, ÖAW (Austria)

University of Vienna (Austria)

University of Novi Sad (Republic of Serbia)

 

Project website: http://nuhag.eu/anacres

This project consists of three subprojects:

1.1 Frame & Gabor Multiplier:

Recently, Gabor multipliers have been used to implement time-variant filtering as Gabor filters. This idea can be generalized further. To investigate the basic properties of such operators, the concept of abstract, i.e. unstructured, frames is used. Such multipliers are operators where a certain fixed mask, a so-called symbol, is applied to the coefficients of the frame analysis, after which synthesis is performed. The properties found in this case can then be used for all kinds of frames, for example regular and irregular Gabor frames, wavelet frames or auditory filterbanks.
 
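The basic definition of a frame multiplier is as follows: for an analysis sequence $\Psi = (\psi_k)_k$, a synthesis sequence $\Phi = (\phi_k)_k$ and a symbol $m = (m_k)_k$,
$$M_{m,\Phi,\Psi} f = \sum_k m_k \langle f, \psi_k \rangle \phi_k .$$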
As a special case, multipliers for irregular Gabor systems will be investigated and implemented. This corresponds to an irregularly sampled Short-Time Fourier Transform. As an application, an STFT corresponding to the Bark scale can be examined.
This mathematical and basic research-oriented project is important for many other projects like time-frequency-masking or system-identification.
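Choosing the irregular frequency sampling points of such a Bark-scale STFT can be sketched as follows. The Python/NumPy code uses the Zwicker-Terhardt analytic approximation of the Bark scale; this formula, the number of channels and the frequency range are assumptions for illustration, not choices made in the project.

```python
import numpy as np

def hz_to_bark(f):
    """Zwicker & Terhardt approximation of the Bark scale (f in Hz)."""
    f = np.asarray(f, dtype=float)
    return 13 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500) ** 2)

def bark_spaced_frequencies(n_channels, f_max, n_grid=100000):
    """Centre frequencies equally spaced in Bark between 0 Hz and f_max.

    hz_to_bark is increasing, so it is inverted numerically by interpolation.
    """
    grid = np.linspace(0, f_max, n_grid)
    targets = np.linspace(0, hz_to_bark(f_max), n_channels)
    return np.interp(targets, hz_to_bark(grid), grid)
```

The resulting centre frequencies are dense at low frequencies and sparse at high frequencies; in an irregular Gabor system each of them would get its own modulated window, with a bandwidth growing accordingly.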

References:

  • O. Christensen, An Introduction To Frames And Riesz Bases, Birkhäuser Boston (2003)
  • M. Dörfler, Gabor Analysis for a Class of Signals called Music, Dissertation Univ. Wien (2002)
  • R.J. Duffin, A.C. Schaeffer, A Class of nonharmonic Fourier series, Trans.Amer.Math.Soc., vol.72, pp. 341-366 (1952)
  • H. G. Feichtinger, K. Nowak, A First Survey of Gabor Multipliers, in H. G. Feichtinger, T. Strohmer

Documents:

Collaborations:

This project ended in September 2011.

Media Coverage:


Meetings:

The final MulAc Meeting was in Vienna from 29th to 30th of August 2011.

The ARI Mulac Frame Meeting was held on Tuesday, June 15th 2010 at ARI.

The MULAC Mid-term Meeting was held in Marseille from 12 to 13 April 2010. See the Registration-Webpage or the Program.

The FYMA Mulac seminar was held in Louvain-la-Neuve on 11 March 2010 (talks by Jean-Pierre Antoine, Jean-Pierre Gazeau, Diana Stoeva and Peter Balazs).

The MULAC - Kick-Off Meeting took place at ARI in Vienna from September 23rd to 24th 2008.


This international, multi-disciplinary and team-oriented project allowed P. Balazs to form a small group, 'Mathematics and Acoustical Signal Processing', at the Acoustics Research Institute in cooperation with NuHAG Vienna (Hans G. Feichtinger), LMA (Richard Kronland-Martinet) and LATP Marseille (Bruno Torrésani), as well as FYMA Louvain-la-Neuve (Jean-Pierre Antoine).

Within the institute the groups 'Audiological Acoustics' and 'Software Development' are involved.

This project is funded by the WWTF. It will run for 3.5 years; post-docs will be employed for a total of six years and master's students for a total of 36 months.

In December 2007, the Austrian Academy of Sciences presented 'mathematics in ...' as the topic of the month. This included 'mathematics at the Acoustics Research Institute', which describes this project.

General description:

"Frame multipliers" are a promising mathematical concept, which can be applied to retrieve desired information from acoustic signals. P. Balazs introduced them by successfully generalizing existing time-variant filter approaches. This project aims to establish new results in the mathematical theory of frame multipliers, to integrate them in efficient digital signal processing algorithms and to make them available for use in 'real-world' acoustical applications. A multi-disciplinary and international cooperation has been established and will be extended in the project to create new significant impulses for the involved disciplines: mathematics, numerics, engineering, physics and cognitive sciences. Various acoustical applications like modelling of auditory perception, measurement of sound absorption coefficients and system identification of head-related transfer functions are included. The results of the project will allow their future integration into practical areas such as audio coding, noise abatement, sound quality design, virtual reality and hearing aids.

Media coverage: