Project

  • Boundary Element Method in the Time Domain

    Objective

    The boundary element method (BEM) is often used to simulate acoustic radiation and reflection problems. In general, a frequency-domain formulation is employed; however, when short impulse responses or a coupling with nonlinear structural behaviour are of interest, a time-domain formulation is more effective.

    Method:

    The boundary integral equations and fundamental solutions required for the BEM are derived via the inverse Fourier transform of the equivalent frequency-domain formulations. These equations are then discretized using a Galerkin method in space and collocation in time. The marching-on-in-time (MOT) method is used to solve the linear system of equations resulting from the discretization. The well-known stability problems of the MOT method are addressed by means of a Burton-Miller formulation in space and higher interpolation orders in time.
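
    As a minimal sketch of the MOT recursion, assuming the discretization delivers interaction matrices Z_0, ..., Z_J and right-hand sides b_k (all names here are hypothetical, not part of the project code):

        import numpy as np

        def marching_on_in_time(Z, b):
            # Z: list of interaction matrices [Z_0, ..., Z_J] from the
            #    space-time discretization; b: one right-hand side per step
            Z0_inv = np.linalg.inv(Z[0])      # Z_0 is reused in every step
            x = []
            for k, bk in enumerate(b):
                rhs = np.array(bk, dtype=float)
                # subtract the influence of the already computed time history
                for j in range(1, min(k, len(Z) - 1) + 1):
                    rhs -= Z[j] @ x[k - j]
                x.append(Z0_inv @ rhs)        # solution at time step k
            return x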

    In addition, it is planned to increase the efficiency of the code by means of a modified plane-wave time-decomposition algorithm.

  • Pole-Zero Vocal Tract Model

    Description

    Computer models for speech production and speech analysis have been of scientific interest since the 1960s. Many models replace the vocal tract by a segmented tube; however, when nasals such as /n/ and /m/ or nasalized vowels are to be considered, single-tube models are no longer sufficient, because the nose couples an additional resonating cavity to the vocal tract. It is therefore necessary to consider a branched tube model, in which the determination of the cross-sectional areas from a given speech signal is no longer trivial and, in general, requires the solution of a nonlinear system of equations.

    The system of equations is overdetermined, and we introduce additional conditions, e.g. by means of probabilistic approaches (Bayesian statistics), such as upper and lower bounds on the area functions or smoothness assumptions.
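
    A minimal sketch of such a constrained inversion, assuming a hypothetical forward model forward_model(areas) that maps cross-sectional areas to the observed speech features; the bounds and the smoothness penalty stand in for the probabilistic prior:

        import numpy as np
        from scipy.optimize import least_squares

        def residuals(areas, observed, forward_model, smooth_weight=0.1):
            fit = forward_model(areas) - observed        # data mismatch
            smooth = smooth_weight * np.diff(areas)      # smoothness prior
            return np.concatenate([fit, smooth])

        def invert_area_function(observed, forward_model, n_sections=20):
            x0 = np.full(n_sections, 4.0)    # initial guess: 4 cm^2 per section
            result = least_squares(
                residuals, x0,
                args=(observed, forward_model),
                bounds=(0.1, 20.0),          # lower/upper area bounds in cm^2
            )
            return result.x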

  • RailVib - Vibrations of Railway Tunnels

    Description

    Railway tunnels prevent direct acoustic annoyance from rail traffic. However, vibrations from tunnels propagate through the ground and cause disturbance through perceptible low-frequency vibrations.

    The aim of this project is to develop and implement a mathematical model that accounts for a moving, oscillating load. In addition, the surrounding soil is modelled as an anisotropic material consisting of arbitrarily oriented layers.

     

    Methods

    The propagation of vibrations inside the tunnel is computed using a finite element method (FEM), in which the superstructure of the tunnel and the track system can also be taken into account. Vibrations outside the tunnel, in the surrounding soil, are modelled by the boundary element method (BEM). For a detailed model of the complete system, both approaches have to be coupled.

  • Start of the FWF Project "Time-Frequency Implementation of HRTFs"

    The FWF project "Time-Frequency Implementation of HRTFs" has started.

    Principal Investigator: Damian Marelli

    Co-Applicants: Peter Balazs, Piotr Majdak

  • Wavelets and Frames for the Representation of Acoustic Wave Fields in the Space-Time-Frequency Plane

    Computers keep getting faster, and the rapid development of audio interface and audio transmission technologies has led to a new era of audio systems that can reproduce spatial sound experiences via surround loudspeaker setups.

    Many of these applications require an accurate, efficient, and robust representation of sound in the space-time-frequency plane. This joint project of ISF and IRCAM combines the mathematical concepts used and developed at the ARI with IRCAM's profound expertise in real-time signal processing. The project seeks to answer fundamental questions in both research fields and aims at developing improved methods for the applications mentioned above.

    Specific questions to be answered in this project are:

    • Can wavelets and frames provide an efficient space-time-frequency representation of wave fields that is more robust than currently existing methods?
    • Is it possible to adapt frame-based methods to a (spherical) loudspeaker or microphone array with a given arrangement of loudspeakers or microphones (e.g. the 64-channel array at IRCAM)?
    • How can the acoustic field on a sphere be represented with frames in order to obtain a better space-time-frequency representation of the acoustic field on particular parts of the sphere?
    • Is it possible to use this multi-resolution space-time-frequency representation for room recordings made with a spherical multichannel microphone array (e.g. to achieve a higher spatial resolution of early room reflections)?
  • ZK Dynamates: Dynamics of Auditory Prediction in Human and Other Primates

    Imagine being in dense road traffic, surrounded by pedestrians, cyclists, and cars all moving in different directions. In this and many other situations, it is vital to know exactly where and when events take place in our environment. To react to external stimuli as quickly and correctly as possible, our brain constantly generates predictions about future events, for example, where an approaching car will be at the moment we want to cross the street. Such predictions are not only central for us humans. Other primates may use similar mechanisms, for instance when moving through dense jungle. To what extent evolution has shaped these mechanisms in humans compared to other species is still unclear.

    Furthermore, our sensory information is often ambiguous, so our brain generates several parallel interpretations and predictions and ultimately has to commit to one of them. At present, most of our knowledge about these perceptual processes stems from studies of vision. Comparatively little is known about them for our sense of hearing, which is, however, equally central to our survival and social behaviour.

    The Zukunftskolleg Dynamates aims to close these central knowledge gaps in auditory perception by testing the predictive mechanisms of closely related species in realistic but highly controllable virtual acoustic environments, and by capturing them in computational models. In addition, Dynamates will use high-resolution electroencephalography (EEG) in humans to investigate the neural basis of the underlying processes. The project is thus based on an interdisciplinary collaboration between experts in computational modelling (Robert Baumgartner), neuroscience (Ulrich Pomper), and cognitive biology (Michelle Spierings).

    Dynamates will thus carry out the first systematic comparison of dynamic auditory prediction and decision processes between humans and non-human primates. A better understanding of the neural basis of these processes may find application in the treatment of people with impaired perceptual and decision behaviour (e.g. in autism or schizophrenia). The resulting mathematical models can also be tested on other species or on more complex decision processes (e.g. in social interactions) in the future, and may find direct application in the development of artificial intelligence and virtual realities.

    In the following online lecture, Robert Baumgartner explains further background to this research: ÖAW Science Bites: Gefahr - wie wir sie hören.

    Open positions (to be filled immediately, but open until suitable candidates are found; first evaluation round on September 1, 2020):

    • Student assistant for data collection (mainly EEG)
    • Doctoral researcher focusing on computational models for comparative behavioural analyses
    • Postdoctoral researcher focusing on computational models for causal relationships between neural activity (EEG) and behaviour in humans

  • 04.02.2015 Master Studentship offer at the ARI

    Proposal for a Master studentship (f/m)

     

    Title: Measurements of auditory time-frequency masking kernels for various masker frequencies and levels.

     

    Duration: 6 months, working time = 20 hours/week.

     

    Starting date: ASAP.

     

    Closing date for applications: until the position is filled.

    Description

     

    Background: Over the last decades, many psychoacoustical studies have investigated auditory masking, an important property of auditory perception. Masking refers to the degradation of the detection of a sound (referred to as the “target”) in the presence of another sound (the “masker”). In the literature, masking has been extensively investigated with simultaneous (spectral masking) and non-simultaneous (temporal masking) presentation of masker and target. The results were used to develop models of either spectral or temporal masking. Attempts were made to simply combine these models to account for time-frequency masking in perceptual audio codecs like mp3. However, a recent study on time-frequency masking conducted at our lab [1] revealed the inaccuracy of such simple models. The development of an efficient model of time-frequency masking for short-duration and narrow-band signals remains a challenge. For instance, such a model is crucial for the prediction of masking in time-frequency representations of sounds and is expected to improve current perceptual audio codecs.

     

    In the previous study [1], the time-frequency masking kernel for a 10-ms Gaussian-shaped sinusoid was measured at a frequency of 4 kHz and a sensation level of 60 dB. A Gaussian envelope is used because it allows for maximum compactness in the time-frequency domain. While these data constitute a crucial basis for the development of an efficient model of time-frequency masking, additional psychoacoustical data are required, particularly the time-frequency masking kernels for different Gaussian masker frequencies and sensation levels.
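
    For illustration, such a Gaussian-shaped sinusoid can be generated as follows (a sketch only; the exact envelope parametrization and level calibration used in [1] may differ, and fs is an assumed sampling rate):

        import numpy as np

        def gaussian_tone(f0=4000.0, duration=0.010, fs=44100):
            # Gaussian-windowed sinusoid: maximally compact in the
            # time-frequency plane; 'duration' acts as the effective width
            t = np.arange(-3 * duration, 3 * duration, 1.0 / fs)
            envelope = np.exp(-np.pi * (t / duration) ** 2)
            return envelope * np.sin(2 * np.pi * f0 * t)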

     

    The proposed work is part of the ongoing research project POTION: “Perceptual Optimization of audio representaTIONs and coding”, jointly funded by the Austrian Science Fund (FWF) and the French National Research Agency (ANR).

     

    Aims: The first goal of the work is to conduct psychoacoustical experiments to measure the time-frequency masking kernels for three masker sensation levels (20, 40, and 60 dB) and three masker frequencies (0.75, 4.0, and 8.0 kHz), following the methods in [1]. This part will consist of experimental design, programming, and data collection. The second goal of the work is to interpret the data and compare them to literature data for maskers with various spectro-temporal shapes. This step shall involve the use of state-of-the-art models of the auditory periphery to predict the data.

     

    Applications: The data will be used to develop a new model of time-frequency masking that should later be implemented and tested in a perceptual audio codec.

     

    Required skills: Qualification for a Master thesis, knowledge of psychophysical methods and psychoacoustics (experience with auditory models would be a plus), Matlab programming, good communication skills, proper spoken/written English.

     

    Gross salary: 948.80€/month.

     

    Supervisors: Thibaud Necciari and Bernhard Laback
    Tel: +43 1 51581-2538

     

    Reference:

    [1] T. Necciari. Auditory time-frequency masking: Psychoacoustical measures and application to the analysis-synthesis of sound signals. PhD thesis, Aix-Marseille I University, France, October 2010. Available online.

  • AABBA - Aural Assessment By means of Binaural Algorithms

    AABBA is an open group of scientists collaborating on the development and application of models of human spatial hearing.

    AABBA's goal is to promote exploration and development of binaural and spatial models and their applications.

    AABBA members are academic scientists willing to participate in our activities. We meet annually for an open discussion and progress presentations, and we especially encourage members to bring students and young scientists associated with their projects to the meetings. Our activities consolidate in joint publications and special sessions at international conferences. As a tangible outcome, we provide validated (source) code for published models of binaural and spatial hearing in our collection of auditory models, known as the auditory modeling toolbox (AMT).

    Structure

    • Executive board: Piotr Majdak, Armin Kohlrausch, Ville Pulkki

    • Regular members:
      • Aachen: Janina Fels, ITA, RWTH Aachen
      • Bochum: Dorothea Kolossa, Ruhr-Universität Bochum
      • Cardiff: John Culling, School of Psychology, Cardiff University
      • Copenhagen: Torsten Dau & Tobias May, DTU, Lyngby
      • Dresden: Ercan Altinsoy, TU Dresden
      • Ghent: Sarah Verhulst & Alejandro Osses, Ghent University
      • Guangzhou: Bosun Xie, South China University of Technology, Guangzhou
      • Helsinki: Ville Pulkki & Nelli Salminen, Aalto University
      • Ilmenau: Alexander Raake, TU Ilmenau
      • Kosice: Norbert Kopčo, Safarik University, Košice
      • London: Lorenzo Picinali, Imperial College, London
      • Lyon: Mathieu Lavandier, Université de Lyon
      • Munich I: Werner Hemmert, TUM München
      • Munich II: Bernhard Seeber, TUM München 
      • Oldenburg I: Bernd Meyer, Carl von Ossietzky Universität Oldenburg
      • Oldenburg II: Mathias Dietz, Carl von Ossietzky Universität Oldenburg
      • Oldenburg-Eindhoven: Steven van de Par & Armin Kohlrausch, Universität Oldenburg
      • Paris: Brian Katz, Sorbonne Université
      • Patras: John Mourjopoulos, University of Patras
      • Rostock: Sascha Spors, Universität Rostock
      • Sheffield: Guy Brown, The University of Sheffield
      • Tabriz: Masoud Geravanchizadeh, University of Tabriz
      • Toulouse: Patrick Danès, Université de Toulouse
      • Troy: Jonas Braasch, Rensselaer Polytechnic Institute, Troy
      • Vienna: Bernhard Laback & Robert Baumgartner, Austrian Academy of Sciences, Wien
      • The AMT (Umbrella Project): Piotr Majdak
    • Honorary member and founder: Jens Blauert

    AABBA Group 2020
    AABBA group as of the 12th meeting 2020 in Vienna.

    Meetings

    Annual meetings are held at the beginning of each year:

    • 12th meeting: 16-17 January 2020, Vienna. Schedule. Group photo
    • 11th meeting: 19-20 February 2019, Vienna. Schedule. Group photo
    • 10th meeting: 30-31 January 2018, Vienna. Schedule. Group photo
    • 9th meeting: 27-28 February 2017, Vienna. Schedule.
    • 8th meeting: 21-22 January 2016, Vienna. Schedule.
    • 7th meeting: 22-23 February 2015, Berlin.
    • 6th meeting: 17-18 February 2014, Berlin.
    • 5th meeting: 24-25 January 2013, Berlin.
    • 4th meeting: 19-20 January 2012, Berlin.
    • 3rd meeting: 13-14 January 2011, Berlin.
    • 2nd meeting: 29-30 September 2009, Bochum.
    • 1st meeting: 23-26 March 2009, Rotterdam.

    Activities

    • Upcoming: Structured Session "Binaural models: development and applications" at the Forum Acusticum 2020, Lyon.
    • Special Session "Binaural models: development and applications" at the ICA 2019, Aachen.
    • Special Session "Models and reproducible research" at the Acoustics'17 (EAA/ASA) 2017, Boston.
    • Structured Session "The Technology of Binaural Listening & Understanding" at the ICA 2016, Buenos Aires.
    • Structured Session "Applied Binaural Signal Processing" at the Forum Acusticum 2014, Kraków.

    Contact person: Piotr Majdak

  • Acoustic cues in production and perception of irony in normal-hearing and CI-listeners

    Introduction:

    The ability of listeners to discriminate literal meanings from figurative language, affective language, or rhetorical devices such as irony is crucial for successful social interaction. This discriminative ability might be reduced in listeners supplied with cochlear implants (CIs), widely used auditory prostheses that restore auditory perception in deaf or hard-of-hearing people. Compared to literal utterances, irony is acoustically characterised in particular by a lower fundamental frequency (F0), a lower intensity, and a longer duration. In auditory perception experiments, listeners mainly rely on F0 and intensity values to distinguish between context-free ironic and literal utterances. As CI listeners have great difficulties in F0 perception, their use of frequency information for the detection of irony is impaired. However, irony is often additionally conveyed by characteristic facial expressions.

    Objective:

    The aim of the project is two-fold: The first (“Production”) part of the project will study the role of paraverbal cues in verbal irony of Standard Austrian German (SAG) speakers under well-controlled experimental conditions without acoustic context information. The second (“Perception”) part will investigate the performance in recognizing irony in a normal-hearing control group and a group of CI listeners.

    Method:

    Recordings of speakers of SAG will be conducted. During the recording session, the participants will be presented with scenarios that evoke either a literal or an ironic utterance. The response utterances will be audio- and video-recorded. Subsequently, the thus obtained context-free stimuli will be presented in a discrimination test to normal-hearing and to postlingually deafened CI listeners in three modes: auditory only, auditory+visual, visual only.

    Application:

    The results will not only provide information on irony production in SAG and on multimodal irony perception and processing, but will, most importantly, identify the cues that need to be improved in cochlear implants in order to allow CI listeners full participation in daily life.

  • Acoustic Features for Speaker Models

    Objective:

    Speaker models generated from training recordings of different speakers should discriminate between those speakers. The models are estimated from feature vectors based on acoustic observations. The feature vectors themselves should therefore show high inter-speaker variability and low intra-speaker variability.

    Method:

    Cepstral coefficients of transformed short-time spectra (e.g. Mel-Frequency Cepstral Coefficients, MFCC) are experimentally developed features that are widely used in automatic speech and speaker recognition. Because of the many possible parameter settings of the feature extraction process and the lack of theoretically motivated guidance for choosing them, only a stepwise investigation of the extraction process can lead to stable acoustic features.
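
    As one possible realization of this extraction chain (framing, short-time spectrum, mel filterbank, logarithm, DCT), here is a sketch using the librosa library; all parameter values are common defaults, not the project's prescribed choices:

        import librosa

        def extract_mfcc(wav_path, n_mfcc=13):
            # load at the native sampling rate, then compute one MFCC
            # vector per short-time frame (columns of the result)
            y, sr = librosa.load(wav_path, sr=None)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)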

    Application:

    Optimized acoustic features for the representation of speakers enables the improvement of automatic speaker identification and verification. Additionally, the development of methods for forensic investigation of speakers (manually and automatically) is supported.

  • Acoustic Holography

    Objective:

    Acoustic holography is a mathematical tool for the localization of sources in a coherent sound field.

    Method:

    Using the information of the sound pressure in one plane, the whole three-dimensional sound field is reconstructed. The sound field must be coherent and the half-space in which the sources are situated must be known.
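
    A minimal sketch of the planar propagation step (angular spectrum method), assuming the complex pressure p has been measured on a regular grid with spacing dx; dz is the propagation distance and k the acoustic wavenumber. Back-propagation toward the sources (negative dz) amplifies evanescent components and requires regularization in practice:

        import numpy as np

        def propagate_plane(p, dx, dz, k):
            # angular spectrum (2-D FFT) of the measured pressure field
            ny, nx = p.shape
            kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
            ky = 2 * np.pi * np.fft.fftfreq(ny, d=dx)
            KX, KY = np.meshgrid(kx, ky)
            # axial wavenumber; imaginary for evanescent components
            kz = np.sqrt((k**2 - KX**2 - KY**2).astype(complex))
            P = np.fft.fft2(p)
            return np.fft.ifft2(P * np.exp(1j * kz * dz))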

    Application:

    Acoustic holography is used to calculate the sound field in planes parallel to the measurement plane. Normally, a plane near the hull of the structure is chosen. Regions of concentrated sound pressure in this plane are assumed to indicate the noise sources.

  • Acoustic Measurement Tool at Acoustics Research Institute (AMTatARI)

    Objective:

    The Acoustic Measurement Tool at the Acoustics Research Institute (AMTatARI) has been developed for the automatic measurement of system properties of electro-acoustic systems like loudspeakers and microphones. As a special function, this tool allows an automatic measurement of Head Related Transfer Functions (HRTF). 

    Measurement of the following features has been implemented so far:

    • total harmonic distortion (THD)
    • signal in noise and distortions (SINAD)
    • impulse response

    The impulse responses can be measured with maximum length sequences (MLS) or with exponential sweeps. For sweeps, the new multiple exponential sweep method (MESM) is available; this method is also used to measure HRTFs with AMTatARI.
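
    A minimal sketch of the exponential-sweep measurement in the style of Farina's method, which underlies MESM (all names and parameters here are illustrative): the impulse response is obtained by convolving the recorded response with the time-reversed, amplitude-compensated sweep.

        import numpy as np
        from scipy.signal import fftconvolve

        def exp_sweep(f1, f2, duration, fs):
            # exponential sine sweep from f1 to f2 Hz
            t = np.arange(int(duration * fs)) / fs
            rate = np.log(f2 / f1)
            return np.sin(2 * np.pi * f1 * duration / rate
                          * (np.exp(t / duration * rate) - 1.0))

        def impulse_response(recorded, sweep, f1, f2, fs):
            # inverse filter: time-reversed sweep with an exponentially
            # decaying envelope compensating the sweep's energy per octave
            t = np.arange(len(sweep)) / fs
            rate = np.log(f2 / f1)
            inverse = sweep[::-1] * np.exp(-t * rate / (len(sweep) / fs))
            return fftconvolve(recorded, inverse, mode="full")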

  • Adaptive Audio-Visual Speech Synthesis of Dialects (AVDS)

    Objective:  

    The aim of this project is to conduct basic research on the audio-visual speech synthesis of Austrian dialects. The project extends our previous work on

    Method:

    10 speakers (5 male and 5 female) will be recorded for each dialect. The recordings comprise spontaneous speech, read speech and naming tasks, eliciting substantial phonemic distinctions and phonotactics. Consequently, a detailed acoustic-phonetic and phonological analysis will be performed for each dialect. Based on the acoustic-phonetic and phonological data analysis, 600 phonetically balanced sentences will be created and recorded with 4 speakers (2 male, 2 female) for each dialect. In these recordings the acoustic and the visual signal, resulting from the same speech production process, will be recorded jointly to account for the multimodal nature of human speech. The recorded material will serve as a basis for the development, training, and testing of speech synthesizers at the Telecommunications Research Center.

    Funding:

    FWF (Wissenschaftsfonds): 2011-2013

    Project Manager: Michael Pucher, Telecommunications Research Center, Vienna

    Project Partner: Sylvia Moosmüller, Acoustics Research Institute, Austrian Academy of Sciences, Vienna

  • ADesc Command in STx

    Overview:

    ADesc is a facility (technically, a class library) for storing numeric parameters with an unlimited number of independent and dependent axes and a large, theoretically unlimited, amount of data. It was developed as a part of the Noidesc project, whose large amounts of numeric data were expected to stress the existing, purely XML-based APar class to and beyond its limits. In practice, ADesc has proven to be highly efficient with parameters consisting of hundreds of thousands of values, thereby fully meeting the demands of Noidesc. It is expected to meet the demands of challenging future projects as well.

    ADesc fits into the existing STx design by offering an alternative to the existing APar class. Just like APar, the new ADesc stores parameters in the existing STx XML database. There are two ways of storing the numeric data:

    1. In-place in the XML database: This is the conventional way. It keeps all the benefits of XML storage (readable and editable, simple export and import to/from other software) without impairing performance for small and medium-sized parameters.
    2. Binary storage: For large parameters, there is an optional feature for binary storage. With ADesc binary storage, the parameter itself is still part of the XML database, keeping the advantages of the XML organization fully intact. Only the numeric data values of the axes themselves are stored as an external binary file. The XML axis data contains only a reference to that file and the position within the file. This keeps the XML database small and allows for very fast random access to data values.

    The user must decide which kind of storage to use. For large parameters containing hundreds of thousands of numerical values, the performance gain of binary storage may be significant (up to a factor of three for loading and saving the data). At the same time, saving space in the XML database by about the same factor increases the speed of the general handling of the XML database.

    Aside from performance, the main design criteria for the ADesc class library were flexibility and ease of use. ADesc provides automatic unit conversion for the most commonly used, predefined domains and units. More unusual situations may be handled with user-defined converter classes. There is even room for completely user-defined axes, thereby enabling features such as dynamically supplied data (e.g. a live spectrogram) or data calculated on the fly.

    As a result of the positive experiences with the ADesc class and its performance, plans are in place to fully replace the existing APar class over time.

    Object Model:

    Each parameter is modeled by an instance of the ADesc class or of one of its derivations. There are several such classes derived from ADesc, each one optimized for a number of common cases. At this time, the following ADesc classes exist:

    1. ADesc: ADesc is the most general parameter class. It handles parameters with an arbitrary number of independent and dependent axes. It is also prepared to handle infinite axes and dynamic axes, i.e. axes whose values are supplied or computed at run-time.
    2. ADescX: ADescX is a simpler, less general variation of ADesc, supporting neither infinite nor dynamic axes. Its internal storage is organized such that it matches the current way STx handles large tables. In the long run, the STx table handling is expected to be optimized, thereby possibly rendering ADescX redundant.
    3. ADesc0: ADesc0 models the special case of parameters without any independent axes.
    4. ADesc1: ADesc1 optimizes handling of parameters with exactly one independent axis and an arbitrary number of dependent axes. Storage organization is much simpler, rendering ADesc1 by far the fastest kind of ADesc parameter.
    5. ADesc2: ADesc2 efficiently handles parameters with exactly two independent axes and an arbitrary number of dependent axes. Storage organization is simpler and hence faster than with the general classes. The dedicated ADesc2 class has been supplied, because most parameters encountered so far have proven to have two axes.

    The axes of a parameter are modeled by classes derived from AAxis. In general, each axis has a domain (e.g. time or frequency), a unit (e.g. ms or Hz) and, if applicable, a reference value, i.e. a constant value relative to which the axis values are computed. At this time, the following kinds of axes exist:

    1. AOrderAxis: The AOrderAxis is the only axis without a domain and unit. Its only property is its cardinality.
    2. AIncrementAxis: The AIncrementAxis has a fixed start value, a fixed offset, and a cardinality. Each value of the axis equals the sum of its predecessor and the offset value.
    3. AEnumerationAxis: The AEnumerationAxis stores a finite number of arbitrary values.
    4. ASegmentIncrementAxis: The ASegmentIncrementAxis is an AIncrementAxis whose values are relative to the beginning of a given STx audio segment.
    5. ASegmentEnumerationAxis: The ASegmentEnumerationAxis is an AEnumerationAxis whose values are relative to the beginning of a given STx audio segment.
    6. ADependentAxis: Each dependent axis of a parameter is modeled by an instance of an ADependentAxis. The number of dependent axes and their data are restricted by the choice of the respective ADesc class used.

    The hierarchy of the most important classes making up the ADesc library is the following:

    • ADesc
      • ADesc0, ADesc1, ADesc2, ADescX
    • AAxis
      • AOrderAxis
      • AIncrementAxis
        • ASegmentIncrementAxis
      • AEnumerationAxis
        • ASegmentEnumerationAxis
      • ADependentAxis

    Programming Interface:

    The ADesc programming interface is designed to be as orthogonal as possible. The basic access functions are called getValue, setValue, getValues, setValues, getValueMatrix, setValueMatrix, getNativeValues, and setNativeValues. They are available both for the whole parameter and for its individual axes. Depending on the object they are called on, they set or retrieve one or more values of the desired axis or axes.

    If the parameter modeled by ADesc is considered to be an n-dimensional space (n being the number of independent axes), each point in this space is uniquely described by an n-tuple of coordinates which is the argument to the respective get and set function. The coordinates may be supplied either as an STx vector or as a textual list.

    If there is only one dependent axis, the value for each given coordinate is the value of this axis at the respective coordinate. If there is more than one dependent axis, the value for a given coordinate is a vector of length m, such that m is the number of dependent axes. By specifying the index or the name of a desired dependent axis, the user gets the value of this axis at the respective coordinates. By not specifying this information, the caller gets the whole vector of dependent values at the respective coordinates. This maximizes the flexibility for the ADesc user and requires awareness of fewer distinct functions.

    In addition to the functions retrieving one or more parameter values at a specific coordinate, there are functions for accessing larger amounts of data at once. For example, for two-dimensional parameters (i.e. parameters with exactly two independent axes), the functions getValueMatrix and setValueMatrix efficiently get and set all of the data of a dependent axis. For all parameters with at least one independent axis, the functions getValueVector and setValueVector access an axis as a whole.
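
    To make the access pattern concrete, the following self-contained Python sketch models the semantics described above (coordinates as n-tuples, dependent values selected by axis name). It is an illustration of the concept only, not actual STx/ADesc syntax, and all identifiers are hypothetical:

        import numpy as np

        class Param2D:
            # toy model of an ADesc2-like parameter: two independent
            # axes (time, frequency) and named dependent axes
            def __init__(self, times, freqs, dependent_names):
                self.times, self.freqs = list(times), list(freqs)
                self.names = list(dependent_names)
                self.data = np.zeros((len(self.names), len(self.times),
                                      len(self.freqs)))

            def set_value(self, coord, values):
                i = self.times.index(coord[0])
                j = self.freqs.index(coord[1])
                self.data[:, i, j] = values   # one value per dependent axis

            def get_value(self, coord, axis=None):
                i = self.times.index(coord[0])
                j = self.freqs.index(coord[1])
                if axis is None:              # whole vector of dependent values
                    return self.data[:, i, j]
                return self.data[self.names.index(axis), i, j]

        p = Param2D([0.0, 0.01], [1000.0, 4000.0], ["level", "phase"])
        p.set_value((0.01, 4000.0), [60.0, 0.5])
        print(p.get_value((0.01, 4000.0), axis="level"))   # -> 60.0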

  • Amadee: Frame Theory for Sound Processing and Acoustic Holophony

    S&T cooperation project 'Amadee' Austria-France 2013-14, "Frame Theory for Sound Processing and Acoustic Holophony", FR 16/2013

    Project Partner: The Institut de recherche et coordination acoustique/musique (IRCAM)

  • Amadee: Time-Frequency Representation and Audio Perception

    Objective:

    For many important applications - like virtual reality, communication, sound design, audio compression & coding, and hearing aids - a mathematical representation that matches or approximates the perception of the human auditory system is needed. For solving this critical and prominent problem, a trans-disciplinary approach is necessary. The goals of this project are:

    • design, development and evaluation of new representations of audio signals,
    • development of new tools based on the mathematical theory of time-frequency (or time-scale) representations (Gabor, wavelet, or others),
    • development of the mathematical background for applications in audio perception and psychoacoustics,
    • and evaluation of these representations with auditory and psychoacoustic tests.

    Partners:

    • R. Kronland-Martinet, S. Ystad, T. Necciari, Modélisation, Synthèse et Contrôle des Signaux Sonores et Musicaux, LMA / CNRS Marseille
    • S. Meunier, S. Savel, Acoustique perceptive et qualité de l’environnement sonore, LMA / CNRS Marseille

    This is a partner project of the ANR project senSons.

  • ANACRES - Analysis and Acoustics Research

    Scientific and Technological Cooperation between Austria and Serbia (SRB 01/2018)

    Duration of the project: 01.07.2018 - 30.06.2020

     

    Project partners:

    Acoustics Research Institute, ÖAW (Austria)

    University of Vienna (Austria)

    University of Novi Sad (Republic of Serbia)

     

    Link to the project website: http://nuhag.eu/anacres

  • Automatic Speaker Identification

    Objective:

    The generation of speaker models is based on acoustic features obtained from speech corpora. From a closed set of speakers, the target speaker has to be identified in an unsupervised identification task.

    Method:

    Training and comparison recordings exist for every speaker. The training set is used to generate parametric speaker models (Gaussian Mixture Models, GMMs), while the test set is needed for the comparisons of all test recordings to all models. The model with the highest similarity (maximum likelihood) is chosen as the target speaker. The efficiency of the identification task is measured as the identification rate (i.e. the proportion of correctly identified target speakers).
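
    A minimal sketch of this closed-set identification using scikit-learn as one possible implementation (the features are assumed to be short-time feature frames, e.g. MFCCs, one row per frame):

        from sklearn.mixture import GaussianMixture

        def train_models(training_features, n_components=16):
            # one GMM per speaker, fit on that speaker's feature frames
            return {speaker: GaussianMixture(n_components=n_components).fit(feats)
                    for speaker, feats in training_features.items()}

        def identify(test_features, models):
            # maximum likelihood: score() returns the mean log-likelihood
            # of the test frames under each speaker model
            return max(models, key=lambda s: models[s].score(test_features))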

    Application:

    Aside from biometric commercial applications, the forensic domain is another important field where speaker identification is used. Because speaker identification is a closed-set classification task, it is useful in cases where a target speaker has to be selected from a set of known speakers (e.g. in the case of hidden observations).

  • Automatic Speaker Verification

    Objective:

    The offender recording (a speaker recorded at the scene of a crime) is verified by determining both its similarity to a recording of a suspect and its typicality with respect to a reference population.

    Method:

    A universal background model (UBM) is generated via the training of a parametric Gaussian Mixture Model (GMM) that reflects the distribution of feature vectors in a reference population. Every comparison recording is used to derive a GMM from the UBM by adaptation of the model parameters. Similarity is measured through computation of the likelihoods of the offender recordings in the GMM while typicality is measured by computation of the likelihoods of the offender recordings in the UBM. The verification is expressed as the likelihood ratio of these likelihood values.
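
    A minimal sketch of the likelihood-ratio computation with scikit-learn GMMs (for simplicity, the suspect model is assumed to be already fitted; the adaptation of the GMM from the UBM is not shown). A positive log-ratio supports the suspect hypothesis, a negative one the reference population:

        def log_likelihood_ratio(offender_features, suspect_gmm, ubm):
            # suspect_gmm and ubm are fitted sklearn GaussianMixture models;
            # score() returns the mean log-likelihood per feature frame,
            # so the total log-LR scales with the number of frames
            n_frames = len(offender_features)
            return n_frames * (suspect_gmm.score(offender_features)
                               - ubm.score(offender_features))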

    Application:

    While fully unsupervised automatic verification, performed as a binary decision using a likelihood-ratio threshold, is used in commercial biometric applications, the use of the likelihood ratio as an expression of the strength of the evidence has become an important issue in forensic speaker verification.

  • BanachFrameMul: Bessel and Frame Multipliers in Banach Spaces

    Objective:

    Another project investigated the basic properties of frame and Bessel multipliers. This project aims to generalize the concept to Banach spaces.

    Method:

    As the Gram matrix plays an important role in the investigation of multipliers, it is quite natural to look at the connection to localized frames and multipliers. The dependency of the operator class on the symbol class can be researched.
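
    For reference, a frame multiplier T_a with symbol a = (a_n), analysis sequence (g_n), and synthesis sequence (f_n) is the operator commonly defined in the literature as

        T_a f \;=\; \sum_n a_n \, \langle f, g_n \rangle \, f_n ,

    so the theorems below concern the boundedness and invertibility of this operator beyond the Hilbert-space setting.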

    The following statements will be investigated:

    • Theorem: If G is a localized frame and a is a bounded sequence, then the frame multiplier Ta is bounded on all associated Banach spaces (the associated co-orbit spaces).
    • Theorem: If G is a localized frame and a is a bounded sequence, such that the frame multiplier Ta is invertible on the Hilbert space H, then Ta is simultaneously invertible on the associated Banach spaces.

    The applications of these results to Gabor frames and Gabor multipliers will be further investigated.

    Application:

    Although Banach spaces are a more general concept than Hilbert spaces, Banach-space theory has found concrete applications. For example, if any norm other than L2 (least-squares error) is used for approximation, tools from Banach-space theory have to be applied.

    Partners:

    • K. Gröchenig, NuHAG, Faculty of Mathematics, University of Vienna

    Project-completion:

    This project ended on 28.02.2008 and was incorporated into MULAC, the 'High Potential' project of the WWTF.