Millions of people use headphones every day for listening to music, watching movies, or communicating with others. Nevertheless, the sounds presented via headphones are usually perceived inside the head rather than at their natural spatial positions. This limited perception is inherent to conventional headphone playback and results in unrealistic listening situations.
When listening to a sound without headphones, the acoustic information of the sound source is modified by our head and torso, an effect described by the head-related transfer functions (HRTFs). The shape of our ears contributes to this modification by filtering the sound depending on the source direction. But the ear is highly listener-specific (its individuality is similar to that of a fingerprint), and thus HRTFs are highly listener-specific as well. When listening to sounds via headphones, this listener-specific filtering is usually not available. One of the main reasons is the difficulty of acquiring a person's ear shape and, consequently, of calculating listener-specific HRTFs.
In softpinna, we will therefore develop new methods for a better acquisition of a person's listener-specific ear shape. Specifically, we will investigate and improve so-called "non-rigid registration" (NRR) algorithms, applied to 3-D ear geometries calculated from 2-D photos of a person's ears. The improved quality of the acquired 3-D ear geometries will allow computer programs to accurately calculate listener-specific HRTFs, enabling their incorporation in future headphone systems that provide a realistic presentation of spatial sounds. The new ear-shape acquisition method will vastly reduce the technical requirements for the accurate calculation of listener-specific HRTFs.
This project is done in collaboration with Dreamwaves GmbH. It is supported by the Bridge Programme of the FFG.
Machine learning has become an integral part of our everyday lives over the last few years. Whether we use a smartphone, shop online, consume media, or drive a car, machine learning (ML) and, more generally, artificial intelligence (AI) support, influence, and analyze us in many situations of life. In particular, deep learning methods based on artificial neural networks are used in many areas.
ML and AI have also generated important impulses in the sciences, and this influence is expected to spread in the future to an even wider range of scientific disciplines.
This increases both the interest in a deeper, science-based understanding of ML methods and the need for scientists of various disciplines to develop a strong understanding of the application and design of such methods.
The Institute for Acoustic Research, which conducts application-oriented basic research in the field of acoustics, is rising to this challenge and has founded the Machine Learning research group.
The group sheds light on the different aspects of machine learning and artificial intelligence, with a particular focus on potential applications in acoustics. The collaboration of scientists from different disciplines in the areas of ML and AI will not only enable the Institute for Acoustic Research to make pioneering progress in all areas of sound research, but will also contribute to theoretical questions in the rapidly evolving research field of artificial intelligence.
AABBA's goal is to promote exploration and development of binaural and spatial models and their applications.
AABBA members are academic scientists willing to participate in our activities. We meet annually for open discussion and progress presentations, especially encouraging members to bring students and young scientists associated with their projects to our meetings. Our activities consolidate into joint publications and special sessions at international conferences. As a relevant tangible outcome, we provide validated (source) code for published models of binaural and spatial hearing in our collection of auditory models, known as the Auditory Modeling Toolbox (AMT).
Executive board: Piotr Majdak, Armin Kohlrausch, Ville Pulkki
Honorary member and founder: Jens Blauert
Annual meetings are held at the beginning of each year:
Contact person: Piotr Majdak
The Musicality and Bioacoustics group merges music and biology to study the origins of music through cross-species studies. Like language, music is found in all cultures around the world. Even isolated cultures have music, and all musical systems share important parallels such as the use of discrete notes and a steady beat.
Here we study other animals to try to understand which aspects of music are uniquely human and why humans may have developed these abilities. Specifically, here are some active research directions of the group:
The budgerigar laboratory facilities are currently located in the Department of Cognitive Biology at the University of Vienna, where we also collaborate on studies with some of the other species housed there.
This web page provides resources for the figures and the implementation of inversion of frame multipliers in the research manuscript:
"A survey on the unconditional convergence and the invertibility of multipliers with implementation"
Diana T. Stoeva and Peter Balazs
Abstract:
The paper presents a survey of frame multipliers and related concepts. In particular, it includes a short motivation of why multipliers are of interest, a review and extension of recent results on the unconditional convergence of multipliers, sufficient and/or necessary conditions for the invertibility of multipliers, and representations of the inverse via Neumann-like series and via multipliers with particular parameters. Multipliers for frames with specific structure, namely Gabor and wavelet multipliers, are also considered. Some of the results on the representation of the inverse multiplier are implemented in Matlab, and the implementations are described.
Here we provide:
- the scripts which were used to generate Fig. 1 and Fig. 2 in the paper;
- implementations of Propositions 8, 9, and 11, written in Matlab using the Matlab/Octave Large Time-Frequency Analysis Toolbox (LTFAT) [2] (version ... and above).
In order to run the codes provided below, one first needs to install the LTFAT toolbox, freely available at Sourceforge.
I. Fig. 1 in the paper and the script, which was used to generate this figure (an illustrative example to visualize a multiplier).
Fig.1 An illustrative example to visualize a multiplier.
(TOP LEFT) The time-frequency representation of the music signal $f$. (TOP RIGHT) The symbol $m$, found by a (manual) estimation of the
time-frequency region of the singer's voice. (BOTTOM LEFT) The multiplication in the TF domain. (BOTTOM RIGHT) Time-frequency representation
of $M_{m,\widetilde \Psi,\Psi}f$.
Fig. 1 was produced via the script testGabMulExp_new.m using the original sound-file originalsignal.wav and the manually determined symbol Symbol6_BW.png.
The script also provides the modified signal (obtained by applying the symbol/mask to the original signal), and you can listen to it here.
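The masking step behind Fig. 1 can be sketched generically: a multiplier analyzes the signal, multiplies the time-frequency coefficients pointwise by the symbol, and re-synthesizes. The following minimal Python/NumPy sketch is our own illustration, not the paper's Matlab/LTFAT implementation; scipy's STFT stands in for the Gabor system $\Psi$:

```python
import numpy as np
from scipy.signal import stft, istft

def tf_multiplier(f, symbol, nperseg=256, noverlap=192):
    """Time-frequency multiplier: analyze with the STFT, multiply the
    coefficients pointwise by the symbol, and re-synthesize."""
    _, _, F = stft(f, nperseg=nperseg, noverlap=noverlap)
    _, g = istft(symbol * F, nperseg=nperseg, noverlap=noverlap)
    return g[:len(f)]

# With the all-ones symbol, the multiplier reduces to the identity of
# the analysis-synthesis pair (perfect reconstruction).
rng = np.random.default_rng(0)
f = rng.standard_normal(2048)
_, _, F = stft(f, nperseg=256, noverlap=192)
g = tf_multiplier(f, np.ones(F.shape))
print(np.allclose(g, f))  # True
```

A binary mask such as the one estimated manually for Fig. 1 would instead suppress the masked time-frequency region rather than leave the signal unchanged.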
II. Implementation of inversion of multipliers according to Section 3.2.3 of the paper.
II.1. Implementation of Proposition 8
(a) Implementation of inversion of multipliers $M_{m,\Phi,\Psi}$ (M1) and $M_{m,\Psi,\Phi}$ (M2) for positive m according to Proposition 8 is done in the program Prop8MultiplierInversionOp.m, which involves the function Prop8InvMultOp.m.
function [TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOp(c,r,TPhi,TG,m,e)
Running the program "Prop8MultiplierInversionOp.m", the user will be required to enter the following parameters (which are the input-parameters for the function Prop8InvMultOp.m):
c - the number of the frame vectors;
r - the number of the coordinates of the frame vectors;
TPhi - the synthesis matrix (rxc) of the frame $\Phi$;
TG - the synthesis matrix (rxc) of a frame G (with the meaning of $\Psi-\Phi$);
m - the symbol of the multiplier (c numbers in a row);
e - the desired error bound.
Note:
- the program requests entries of m until a positive m is entered;
- after entering TPhi, TG, and positive m, the program checks if they satisfy the assumptions of Prop. 8 and if not,
the program adjusts TG by multiplication with an appropriate constant in order to be within the settings of Prop. 8.
The implementation is done using an iterative algorithm according to Prop. 8, until one reaches the desired error-bound e.
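The flavor of such an iterative inversion can be sketched with a generic Neumann series: if $\|I - M\| < 1$, then $M^{-1} = \sum_k (I-M)^k$, and truncating the series at the desired error bound gives the iteration. The following Python/NumPy sketch is a hedged illustration only (the names `multiplier` and `neumann_inverse` are ours; the actual algorithm of Prop. 8 is in the Matlab files above):

```python
import numpy as np

def multiplier(m, TPhi, TPsi):
    """Frame multiplier M_{m,Phi,Psi}: analyze with Psi, multiply the
    coefficients by the symbol m, synthesize with Phi."""
    return TPhi @ np.diag(m) @ TPsi.conj().T

def neumann_inverse(M, tol=1e-10, maxit=10000):
    """Invert M via the Neumann series M^{-1} = sum_k (I - M)^k,
    valid when ||I - M|| < 1; returns the inverse and the step count."""
    I = np.eye(M.shape[0])
    R = I - M
    X, term = I.copy(), I.copy()
    for n in range(1, maxit + 1):
        term = R @ term          # term = R^n
        X += term                # partial sum of the series
        if np.linalg.norm(term) < tol:
            return X, n
    raise RuntimeError("Neumann series did not converge")

# A well-conditioned toy example: a frame close to an orthonormal basis
# with a symbol close to 1, so that ||I - M|| < 1 holds.
rng = np.random.default_rng(1)
TPhi = np.eye(4) + 0.05 * rng.standard_normal((4, 4))
m = np.array([1.0, 0.9, 1.1, 1.0])
M = multiplier(m, TPhi, TPhi)
Minv, n = neumann_inverse(M)
print(np.allclose(Minv @ M, np.eye(4)))  # True
```

The rescaling step mentioned in the note above plays the same role as our assumption $\|I - M\| < 1$: it brings the operator into the regime where the series converges.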
The output of the program "Prop8MultiplierInversionOp.m'':
TPsi - the synthesis operator of $\Psi$,
M1 - the multiplier $M_{m,\Phi,\Psi}$,
M2 - the multiplier $M_{m,\Psi,\Phi}$,
M1inv - the iteratively inverted M1,
M2inv - the iteratively inverted M2,
M1invMatlab - the inversion of M1 using the Matlab command ``inv'' (for comparison),
M2invMatlab - the inversion of M2 using the Matlab command ``inv'' (for comparison),
n - the number of the iteration steps.
Note:
After presenting the output parameters, the program allows the user to
- either enter new $TG$ and new error-bound e, and repeat the inversion procedure,
- or to terminate the program by pressing zero.
A demo-file (applying "Prop8InvMultOp.m" with concrete parameters) is available in the script Prop8InvMultOpRun.m.
(b) Implementation of computation of $M_{m,\Phi,\Psi}^{-1}f$ and $M_{m,\Psi,\Phi}^{-1}f$ for given f (and for positive m) is done in the program Prop8MultiplierInversionf.m, which involves the function Prop8InvMultf.m.
function [TPsi,M1,M2,M1invf,M2invf,n] = Prop8InvMultf(c,r,TPhi,TG,m,f,e)
The implementation goes in a similar way as in (a), requiring one more input, namely f, and using appropriate modification of the iteration steps.
A demo-file (applying Prop8InvMultf.m with concrete parameters) is available in the script Prop8InvMultfRun.m.
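The vector variant can likewise be sketched: instead of accumulating the inverse matrix, the same Neumann-type iteration is applied directly to f, i.e. $x_{n+1} = f + (I-M)x_n$, whose fixed point solves $Mx = f$. A hedged Python/NumPy sketch (our own illustration under the assumption $\|I - M\| < 1$, not the Matlab code):

```python
import numpy as np

def neumann_apply_inverse(M, f, tol=1e-12, maxit=10000):
    """Compute M^{-1} f without forming the inverse matrix, via the
    fixed-point iteration x_{n+1} = f + (I - M) x_n (valid when
    ||I - M|| < 1); returns the solution and the step count."""
    R = np.eye(M.shape[0]) - M
    x = f.copy()
    for n in range(1, maxit + 1):
        x_new = f + R @ x
        if np.linalg.norm(x_new - x) < tol:
            return x_new, n
        x = x_new
    raise RuntimeError("iteration did not converge")

# a small well-conditioned example
M = np.array([[1.2, 0.1], [0.0, 0.9]])
f = np.array([1.0, 2.0])
x, n = neumann_apply_inverse(M, f)
print(np.allclose(M @ x, f))  # True
```

Avoiding the explicit inverse is what makes this variant cheaper when only $M^{-1}f$ for a single f is needed.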
(c) Implementation of computation of $M_{m,\Phi,\Psi}^{-1}$ and $M_{m,\Psi,\Phi}^{-1}$ for positive $m$ and Gabor frames $\Phi$ and $\Psi$ is done in the program Prop8MultiplierInversionOpGabor.m, which involves the function Prop8InvMultOpGabor.m.
function [TPhi,TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOpGabor(L,a, M,gPhi,gG,m,e)
The implementation of the inversion is like the one in (a), but using $\Phi$ and $\Psi$ which are Gabor frames.
The input parameters of "Prop8MultiplierInversionOpGabor.m'':
L - the length of the transform,
a - the time-shift (should be a divisor of L),
M - the number of channels (should be a divisor of L and greater than or equal to a),
gPhi - the window function of the Gabor frame Phi,
gG - the window function of the Gabor frame G (with the meaning of Psi-Phi),
m - the symbol of the multiplier (ML/a positive numbers),
e - the desired error bound.
The output parameters of "Prop8MultiplierInversionOpGabor.m'':
TPhi - the synthesis operator of the frame Phi,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).
A demo-file (applying "Prop8InvMultOpGabor.m" with concrete parameters) is available in the code Prop8InvMultOpGaborRun.m.
For the convergence rate of this algorithm, see Fig.2 below and the script which was used to generate it.
II.2. Implementation of Proposition 9
Implementation of the inversion of $M_{m,\Phi,\Phi}$, $M_{m,\Phi,\Psi}$, and $M_{m,\Psi,\Phi}$ according to Proposition 9 is done in the program Prop9MultiplierInversionOp.m, which involves the function Prop9InvMultOp.m.
function [m,TPsi,M0,M1,M2,M0inv,n0,M1inv,M2inv,n] = Prop9InvMultOp(c,r,TPhi,TG,m,e)
Running the program "Prop9MultiplierInversionOp.m'', the user will be required to enter the same parameters as the ones for "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).
Note:
- the program checks whether the entered TPhi and m satisfy the assumptions of Prop. 9
and if not, the program adjusts m to be within the settings of Prop. 9;
- the program checks whether the entered TPhi, TG, and the adjusted m satisfy the assumptions of Prop. 9
and if not, the program adjusts TG by multiplication with an appropriate constant in order to be within the settings of Prop. 9.
The input parameters of "Prop9MultiplierInversionOp.m'' are like the ones in "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).
The output parameters of "Prop9MultiplierInversionOp.m'':
m - the symbol of the multiplier,
M0 - the multiplier $M_{m,\Phi,\Phi}$,
M0inv - the iteratively inverted M0,
n0 - the number of the iteration steps for the inversion of M0,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).
A demo-file (applying Prop9InvMultOp.m with concrete parameters) is available in the code Prop9InvMultOpRun.m.
II.3. Implementation of Proposition 11
Implementation of the inversion of $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ according to Proposition 11 is done in the program Prop11MultiplierInversionOp.m, which involves the function Prop11InvMultOp.m.
function [m,TPsi,M1,M2,M1inv,M2inv,n] = Prop11InvMultOp(c,r,TPhi,TPsi,m,e)
Running the program "Prop11MultiplierInversionOp.m'', the user will be required to enter the following parameters (which are the input-parameters for the function "Prop11InvMultOp.m"):
c, r, TPhi, m, e - like the ones in "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).
TPsi - the synthesis matrix (rxc) of an approximate dual $\Psi$ of the frame $\Phi$.
Note:
- using the entered TPhi and TPsi, the program checks whether $\Psi$ is an approximate dual of $\Phi$
and if not, the program replaces $\Psi$ with the canonical dual of $\Phi$;
- after that the program checks whether $\Phi$, $\Psi$, and m satisfy the assumptions of Prop. 11
and if not, the program adjusts m.
The output parameters of the program "Prop11MultiplierInversionOp.m'':
m - the symbol of the multiplier,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).
A demo-file (applying Prop11InvMultOp.m with concrete parameters) is available in the code Prop11InvMultOpRun.m.
III. Fig. 2 in the paper and the script, which was used to generate this figure (the convergence rate of the algorithm in II.1.(c) above).
Fig. 2. The convergence rate of Alg. 3 using base-10 logarithmic scale in the vertical axis and a
linear scale in the horizontal axis. Here the absolute error in each iteration is plotted in red, and
the convergence value predicted in Proposition 8 is plotted in blue.
Fig. 2 was produced using the script Prop8InvMultOpGaborPlotFigure.m which involves the function Prop8InvMultOpGaborForFigure.m.
This is the companion webpage of the manuscript:
Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdeněk Průša, Piotr Majdak, and Olivier Derrien.
Abstract: Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis-synthesis system is the reconstruction error; it has to be kept to a minimum to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of human auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis-synthesis system for audio applications. The proposed system, named Audlet, is based on an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing.
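The perfect-reconstruction property at the core of the framework can be illustrated with a toy frame-theoretic analysis-synthesis system, where synthesizing with the canonical dual frame (the pseudo-inverse of the analysis operator) recovers the signal exactly. This Python/NumPy sketch is a generic illustration under our own assumptions, not the Audlet construction itself:

```python
import numpy as np

# Toy analysis-synthesis system: the rows of A are the analysis atoms
# (think of sampled filter responses); perfect reconstruction is
# obtained by synthesizing with the canonical dual frame, i.e. the
# pseudo-inverse of the analysis operator.
rng = np.random.default_rng(0)
L, N = 32, 48                    # signal length, number of atoms (redundancy 1.5)
A = rng.standard_normal((N, L))  # analysis operator (a frame with high probability)
S = np.linalg.pinv(A)            # canonical dual synthesis operator

x = rng.standard_normal(L)
c = A @ x                        # analysis coefficients
x_rec = S @ c                    # re-synthesis
print(np.allclose(x_rec, x))     # True: perfect reconstruction
```

In the Audlet framework, the analysis operator is a structured, oversampled filter bank on an auditory frequency scale rather than a random matrix, but the reconstruction principle via a dual system is the same.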
Sound examples for the source separation experiment: click on a system's acronym to hear the corresponding reconstruction.
Reference signals: original mixture -- target
Rt | β = 1 | | | β = 1/6 | | | 1024-point STFT
1.1 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
1.5 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
4.0 | trev_gfb | Audlet_gfb | Audlet_hann | trev_gfb | Audlet_gfb | Audlet_hann | STFT_hann |
Spatial hearing is important for monitoring the environment for interesting or hazardous sounds and for selectively attending to them. The spatial separation between the two ears and the complex geometry of the human body provide auditory cues about the location of a sound source. Depending on where a sound is coming from, the pinna (or auricle) changes the sound spectrum before the sound reaches the eardrum. Since the shape of a pinna is highly individual (even more so than a fingerprint), it also affects the spectral cues in a very individual manner. In order to artificially produce realistic auditory perception, this individuality needs to be reflected as precisely as required, although the actual requirements are currently unclear. SpExCue was therefore about finding electrophysiological measures and prediction models of how spatially realistic ("externalized") a virtual sound source is perceived to be.
Virtual and augmented reality (VR/AR) systems aim to immerse a listener into a well-externalized 3D auditory space. This requires a perceptually accurate simulation of the listener’s natural acoustic exposure. Particularly challenging is to appropriately represent the high-frequency spectral cues induced by the pinnae. To simplify this task, we aim at developing a phenomenological computational model for sound externalization with a particular focus on spectral cues. The model will be designed to predict the listener’s degree of externalization based on binaural input signals and the listener’s individual head-related transfer functions (HRTFs) under static listening conditions.
The naturally externalized auditory perception can be disrupted, for instance, when listening via headphones or hearing-assistive devices, and instead sounds are heard inside the head. Because of this change in externalization or perceived distance, our investigations of spectral cues also served to study the phenomenon of auditory looming bias (Baumgartner et al., 2017 PNAS): sounds approaching the listener are perceived more intensely than those that are receding from the listener. Previous studies demonstrated auditory looming bias exclusively by loudness changes (increasing/decreasing loudness used to simulate approaching/receding sounds). Hence, it was not clear whether this bias truly reflects perceptual differences in sensitivity to motion direction rather than changes in loudness. Our spectral cue changes were perceived as either approaching or receding at steady loudness and evoked auditory looming bias both on a behavioral level (approaching sounds easier to recognize than receding sounds) and an electrophysiological level (larger neural activity in response to approaching sounds). Therefore, our study demonstrated that the bias is truly about perceived motion in distance, not loudness changes.
Further, SpExCue investigated how the combination of different auditory spatial cues affects attentional control in a speech recognition task with simultaneous talkers, which requires spatial selective attention as at a cocktail party (Deng et al., in prep). We found that natural combinations of auditory spatial cues caused larger neural activity in preparation for the test signal and optimized the neural processing of the attended speech.
SpExCue also compared different computational modeling approaches that aim to predict the effect of spectral cue changes on how spatially realistic a sound is perceived (Baumgartner et al., 2017 EAA-ASA). Although many previous experimental results could be predicted by at least one of the models, none of them alone could explain these results. In order to assist the future design of more general computational models for spatial hearing, we finally created a conceptual cognitive model for the formation of auditory space (Majdak et al., in press).
Erwin-Schrödinger Fellowship from Austrian Science Funds (FWF, J3803-N30) awarded to Robert Baumgartner. Duration: May 2016 - November 2017.
Follow-up funding provided by Oculus VR, LLC, since March 2018. Project Investigator: Robert Baumgartner.
This page provides resources for the research article:
to appear in the book "Excursions in Harmonic Analysis" published by Springer.
Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant view-point for scientists in audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
The present ZIP archive features Matlab/Octave scripts that allow reproducing the results presented in Figures 7, 10, and 11 of the article.
IMPORTANT NOTE: The Matlab/Octave toolbox Large Time-Frequency Analysis (LTFAT, version 1.2.0 and above) must be installed to run the codes. This toolbox is freely available at Sourceforge.
If you encounter any issue with the files, please do not hesitate to contact the authors.
14th March 2020
10:00 am – 6:00 pm
Prechtl-Saal, TU Wien
Karlsplatz 13, 1040 Wien
23 March 2020
Imitation of novel conspecific and human speech sounds in the killer whale (Orcinus orca) - José Francisco Zamorano Abramson
2:00 pm,
Seminar Room, Wohllebengasse 12-14 / Ground Floor
31 March 2020
Causal Inference in Neuroimaging - Moritz Grosse-Wentrup
2:00 pm,
Seminar Room, Wohllebengasse 12-14 / Ground Floor
Open Day of the ÖAW Acoustics Research Institute
29 April 2020, 9:30 am - 5:30 pm
Sep 8th - 12th, 2020
Vienna, AUSTRIA
https://www.facebook.com/DAFx2020/
11 February 2020
Today, Eva Reinisch from the ÖAW Acoustics Research Institute talks about the recently published study that dealt, among other things, with the overconfidence of learners of English. In the...
31 January 2020
In the following interview with Peter Balazs, there is much of interest to learn about current acoustics research and this year's Year of Sound. The opening event of the International Year of Sound...
4 February 2020
The Austrian Academy of Sciences (ÖAW), Austria's central non-university research and science institution, is offering a position as an Academy Scientist (f*m) (40 hours per week) at the Acoustics...
9 January 2020
On Thursday, 9 January 2020, Peter Balazs gives his public lecture on mathematics and acoustics at TU Wien as part of the Forum Mathematik...
15 January 2020
Dr. Peter Balazs' lecture on mathematics and acoustics, held at the Technical University of Vienna on 9 January 2020.
27 November 2019
Last Thursday, the time had come: the opening of the exhibition Synesthesia of Hermann Nitsch's Orgien Mysterien Theater took place there. The spectrogram by Peter Balazs and Anton Noll with the institute's...
17 September 2019
We are proud to share the most recent findings of our research group Psychoacoustics and Experimental Audiology showing that unrealistically isolated acoustic spatial cues adversely affect auditory...