Completed

  • Objective:

    The boundary element method (BEM) is a widely used tool for numerically solving acoustic radiation and reflection problems. In most cases, a formulation in the frequency domain can be used. However, for short impulses or when the acoustic simulation is coupled with non-linear behaviour caused by structural deformation, a formulation in the time domain is necessary.

    Method:

    The boundary integral equations and the fundamental solution necessary for the BEM in the time domain are derived by inverse Fourier transformation of the corresponding formulations in the frequency domain. These equations are then discretized using the Galerkin method in the spatial dimensions and the collocation method in the time dimension. The MOT (Marching-On-in-Time) method is used to solve the resulting system of equations. The well-known stability problem of the MOT method is handled by using the Burton-Miller approach in combination with the Galerkin method in the spatial discretization and high-order temporal interpolations; these measures are known to enhance the stability of MOT.

    Additionally it is planned to enhance the efficiency of the method by using a modified plane wave time decomposition (PWTD) algorithm.

  • The FWF project "Time-Frequency Implementation of HRTFs" has started.

    Principal Investigator: Damian Marelli

    Co-Applicants: Peter Balazs, Piotr Majdak

  • Objective:

    Speaker models generated from training recordings of different speakers should differentiate between speakers. These models are estimated using feature vectors that are based on acoustic observations. So, the feature vectors should themselves show a high degree of inter-speaker variability and a low degree of intra-speaker variability.

    Method:

    Cepstral coefficients of transformed short-time spectra (e.g. Mel-Frequency Cepstral Coefficients - MFCC) are experimentally developed features that are widely used in the domain of automatic speech and speaker detection. Because the feature extraction process offers many possible parameter settings and there is no theoretically motivated way of determining them, only a stepwise investigation of the extraction process can lead to stable acoustic features.

    Application:

    Optimized acoustic features for the representation of speakers enable the improvement of automatic speaker identification and verification. Additionally, the development of methods for the forensic investigation of speakers (manual and automatic) is supported.

  • Objective:

    Acoustic holography is a mathematical tool for the localization of sources in a coherent sound field.

    Method:

    Using the information of the sound pressure in one plane, the whole three-dimensional sound field is reconstructed. The sound field must be coherent and the half-space in which the sources are situated must be known.

    Application:

    Acoustic holography is used to calculate the sound field in planes parallel to the measured plane. Normally, a plane near the hull of the structure is chosen. Concentrations in the plane are assumed to be the noise source.
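
    The reconstruction of parallel planes described above is commonly done with an angular-spectrum (spatial Fourier) propagator. The following is a minimal sketch of that general technique, not the project's own implementation. It assumes a single frequency, an exp(-iωt) time convention, and a regularly sampled measurement plane; the function name `propagate_plane` and all parameter names are hypothetical.

```python
import numpy as np

def propagate_plane(p0, dx, freq, dz, c=343.0):
    """Propagate the complex pressure p0, sampled on a plane with grid
    spacing dx (m), to a parallel plane at distance dz (m).

    dz > 0 moves away from the sources (stable). dz < 0 moves toward the
    sources, amplifies evanescent components, and would need regularization,
    which this sketch does not include.
    """
    k = 2.0 * np.pi * freq / c                       # acoustic wavenumber
    ny, nx = p0.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)      # spatial wavenumbers
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)
    KX, KY = np.meshgrid(kx, ky)
    # kz is real for propagating waves and imaginary for evanescent waves
    kz = np.sqrt((k**2 - KX**2 - KY**2).astype(complex))
    spectrum = np.fft.fft2(p0)
    return np.fft.ifft2(spectrum * np.exp(1j * kz * dz))

# Usage: a plane wave travelling normal to the measurement plane
p0 = np.ones((16, 16), dtype=complex)
p1 = propagate_plane(p0, dx=0.05, freq=1000.0, dz=0.1)
```

    For this test field, the result only acquires the phase factor exp(i k dz), since all energy sits in the zero spatial frequency.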

  • Objective:

    The Acoustic Measurement Tool at the Acoustics Research Institute (AMTatARI) has been developed for the automatic measurement of system properties of electro-acoustic systems like loudspeakers and microphones. As a special function, this tool allows an automatic measurement of Head Related Transfer Functions (HRTF). 

    Measurement of the following features has been implemented so far:

    • total harmonic distortion (THD)
    • signal-to-noise-and-distortion ratio (SINAD)
    • impulse response

    The impulse responses can be measured with Maximum Length Sequences (MLS) or with exponential sweeps. For sweeps, the new multiple exponential sweep method (MESM) is also available. This method is also used to measure HRTFs with AMTatARI.
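
    As an illustration of one of the measured features, total harmonic distortion can be estimated from the spectrum of a recorded sine tone. This is a minimal sketch, not AMTatARI code. It assumes coherent sampling (an integer number of periods in the record, so no window is needed); the function name `thd` and its parameters are hypothetical.

```python
import numpy as np

def thd(signal, fs, f0, n_harmonics=5):
    """Ratio of the RMS sum of harmonic amplitudes to the fundamental."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    def amplitude(f):
        # magnitude of the FFT bin closest to frequency f
        return spectrum[np.argmin(np.abs(freqs - f))]

    fundamental = amplitude(f0)
    harmonics = [amplitude(n * f0)
                 for n in range(2, n_harmonics + 2)
                 if n * f0 < fs / 2]                 # stay below Nyquist
    return np.sqrt(np.sum(np.square(harmonics))) / fundamental

# Usage: 1 kHz tone with a second harmonic at one tenth of its amplitude
fs = 48000
t = np.arange(fs) / fs                               # one second of signal
x = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 2000 * t)
thd_value = thd(x, fs, 1000.0)                       # approximately 0.1
```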

  • Objective:  

    The aim of this project is to conduct basic research on the audio-visual speech synthesis of Austrian dialects. The project extends our previous work on

    Method:

    10 speakers (5 male and 5 female) will be recorded for each dialect. The recordings comprise spontaneous speech, read speech and naming tasks, eliciting substantial phonemic distinctions and phonotactics. Subsequently, a detailed acoustic-phonetic and phonological analysis will be performed for each dialect. Based on the acoustic-phonetic and phonological data analysis, 600 phonetically balanced sentences will be created and recorded with 4 speakers (2 male, 2 female) for each dialect. In these recordings the acoustic and the visual signal, resulting from the same speech production process, will be recorded jointly to account for the multimodal nature of human speech. The recorded material will serve as a basis for the development, training, and testing of speech synthesizers at the Telecommunications Research Center.

    Funding:

    FWF (Wissenschaftsfonds): 2011-2013

    Project Manager: Michael Pucher, Telecommunications Research Center, Vienna

    Project Partner: Sylvia Moosmüller, Acoustics Research Institute, Austrian Academy of Sciences, Vienna

  • Overview:

    ADesc is a facility (technically, a class library) for storing numeric parameters with an unlimited number of independent and dependent axes and a large, theoretically unlimited, amount of data. It has been developed as a part of the Noidesc project, whose large amounts of numeric data were expected to stress the existing, purely XML-based APar class to and beyond its limits. In practice, ADesc has proven to be highly efficient with parameters consisting of hundreds of thousands of values, thereby fully meeting the demands of Noidesc. It is expected to meet the demands of challenging future projects as well.

    ADesc fits into the existing STx design by offering an alternative to the existing APar class. Just like APar, the new ADesc stores parameters in the existing STx XML database. There are two ways of storing the numeric data:

    1. In-place in the XML database: This is the conventional way. It keeps all the benefits of XML storage (readable and editable, simple export and import to/from other software) without impairing performance for small and medium-sized parameters.
    2. Binary storage: For large parameters, there is an optional feature for binary storage. With ADesc binary storage, the parameter itself is still part of the XML database, keeping the advantages of the XML organization fully intact. Only the numeric data values of the axes themselves are stored in an external binary file. The XML axis data contains only a reference to that file and the position within the file. This keeps the XML database small and allows for very fast random access to data values.

    The user must decide which kind of storage to use. For large parameters containing hundreds of thousands of numerical values, the performance gain of binary storage may be significant (up to a factor of three for loading and saving the data). At the same time, the saving of space in the XML database by about the same factor (or, more accurately, quotient) increases the speed of the general handling of the XML database.

    Aside from performance, the main design criteria for the ADesc class library were flexibility and ease of use. ADesc provides for automatic unit conversion with most regularly used and predefined domains and units. More unusual situations may be handled with user-defined converter classes. There is even room for completely user-defined axes, thereby enabling things such as dynamically supplied data (e.g. live spectrogram) or data calculated on-the-fly.

    As a result of the positive experiences with the ADesc class and its performance, plans are in place to fully replace the existing APar class over time.

    Object Model:

    Each parameter is modeled by an instance of the ADesc class or of one of its derivations. There are several such classes derived from ADesc, each one optimized for a number of common cases. At this time, the following ADesc classes exist:

    1. ADesc: ADesc is the most general parameter class. It handles parameters with an arbitrary number of independent and dependent axes. It is also prepared for handling even infinite axes and dynamic axes, like axes whose values are supplied or computed at run-time.
    2. ADescX: ADescX is a simpler, less general variation of the most general ADesc, supporting neither infinite nor dynamic axes. Its internal storage is organized such that it matches the current way STx handles large tables. In the long run, it is expected to optimize the STx table handling, thereby possibly rendering ADescX redundant.
    3. ADesc0: ADesc0 models the special case of parameters without any independent axes.
    4. ADesc1: ADesc1 optimizes handling of parameters with exactly one independent axis and an arbitrary number of dependent axes. Storage organization is much simpler, rendering ADesc1 by far the fastest kind of ADesc parameter.
    5. ADesc2: ADesc2 efficiently handles parameters with exactly two independent axes and an arbitrary number of dependent axes. Storage organization is simpler and hence faster than with the general classes. The dedicated ADesc2 class has been supplied, because most parameters encountered so far have proven to have two axes.

    The axes of a parameter are modeled by classes derived from AAxis. In general, each axis has a domain (e.g. time or frequency), a unit (e.g. ms or Hz) and, if applicable, a reference value, i.e. a constant value based upon the axis values that have been computed. At this time, the following kinds of axes exist:

    1. AOrderAxis: The AOrderAxis is the only axis without a domain and unit. Its only property is its cardinality.
    2. AIncrementAxis: The AIncrementAxis has a fixed start value, a fixed offset, and a cardinality. Each value of the axis equals the sum of its predecessor and the offset value.
    3. AEnumerationAxis: The AEnumerationAxis stores a finite number of arbitrary values.
    4. ASegmentIncrementAxis: The ASegmentIncrementAxis is an AIncrementAxis whose values are relative to the beginning of a given STx audio segment.
    5. ASegmentEnumerationAxis: The ASegmentEnumerationAxis is an AEnumerationAxis whose values are relative to the beginning of a given STx audio segment.
    6. ADependentAxis: Each dependent axis of a parameter is modeled by an instance of an ADependentAxis. The number of dependent axes and their data are restricted by the choice of the respective ADesc class used.
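
    How the different axis kinds produce their values can be illustrated outside of STx. The following Python sketch only mimics the behaviour described in the list above; it is not the ADesc API, and all function names in it are hypothetical.

```python
import numpy as np

def increment_axis_values(start, offset, cardinality):
    """Increment-style axis: each value is its predecessor plus the offset."""
    return start + offset * np.arange(cardinality)

def enumeration_axis_values(values):
    """Enumeration-style axis: a finite list of arbitrary values."""
    return np.asarray(values, dtype=float)

def segment_to_absolute(relative_values, segment_start):
    """Segment-relative axis: values count from the segment's beginning,
    so absolute positions are obtained by adding the segment start."""
    return segment_start + np.asarray(relative_values, dtype=float)

# Usage: a time axis in ms with 10 ms steps, and an arbitrary frequency axis
time_ms = increment_axis_values(start=0.0, offset=10.0, cardinality=4)
freq_hz = enumeration_axis_values([125.0, 250.0, 1000.0])
absolute_ms = segment_to_absolute(time_ms, segment_start=1500.0)
```

    An increment axis needs only three numbers regardless of its length, whereas an enumeration axis has to store every value.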

    The hierarchy of the most important classes making up the ADesc library is the following:

    Programming Interface:

    The ADesc programming interface is as orthogonal a design as possible. The basic access functions are called getValue, setValue, getValues, setValues, getValueMatrix, setValueMatrix, getNativeValues, and setNativeValues. They are available both for the whole parameter and for its individual axes. Depending on which object they are called upon, they also set or retrieve one or more values of the desired axis or axes.

    If the parameter modeled by ADesc is considered to be an n-dimensional space (n being the number of independent axes), each point in this space is uniquely described by an n-tuple of coordinates which is the argument to the respective get and set function. The coordinates may be supplied either as an STx vector or as a textual list.

    If there is only one dependent axis, the value for each given coordinate is the value of this axis at the respective coordinate. If there is more than one dependent axis, the value for a given coordinate is a vector of length m, such that m is the number of dependent axes. By specifying the index or the name of a desired dependent axis, the user gets the value of this axis at the respective coordinates. By not specifying this information, the caller gets the whole vector of dependent values at the respective coordinates. This maximizes the flexibility for the ADesc user and requires awareness of fewer distinct functions.

    Other than functions for retrieving one or more parameter values for a specific coordinate, there are also functions for retrieving a larger number of data at the same time. For example, with two-dimensional parameters (i.e. parameters with exactly two independent axes), there are the functions getValueMatrix and setValueMatrix for efficiently setting all of the data of an independent axis. For all parameters with at least one independent axis, there are the functions getValueVector and setValueVector for accessing the whole of an axis.

  • S&T cooperation project 'Amadee' Austria-France 2013-14, "Frame Theory for Sound Processing and Acoustic Holophony", FR 16/2013

    Project Partner: The Institut de recherche et coordination acoustique/musique (IRCAM)

  • Objective:

    The generation of speaker models is based on acoustic features obtained from speech corpora. From a closed set of speakers, the target speaker has to be identified in an unsupervised identification task.

    Method:

    Training and comparison recordings exist for every speaker. The training set is used to generate parametric speaker models (Gaussian Mixture Models – GMMs), while the test set is needed for the comparisons of all test recordings to all models. The model with the highest similarity (maximum likelihood) is chosen as the target speaker. The efficiency of the identification task is measured as the identification rate (i.e. the number of correctly chosen target speakers).

    Application:

    Aside from biometric commercial applications, the forensic domain is another important field where speaker identification is used. Because speaker identification is a closed-set classification task, it is useful in cases where a target speaker has to be selected from a set of known speakers (e.g. in the case of hidden observations).

  • Objective:

    The offender recording (a speaker recorded at the scene of a crime) is verified by comparing its similarity to a recording of a suspect with its typicality in a reference population.

    Method:

    A universal background model (UBM) is generated via the training of a parametric Gaussian Mixture Model (GMM) that reflects the distribution of feature vectors in a reference population. Every comparison recording is used to derive a GMM from the UBM by adaptation of the model parameters. Similarity is measured through computation of the likelihoods of the offender recordings in the GMM while typicality is measured by computation of the likelihoods of the offender recordings in the UBM. The verification is expressed as the likelihood ratio of these likelihood values.
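
    The similarity and typicality computation can be sketched as follows. This is a minimal illustration, assuming diagonal-covariance GMMs and a per-frame average of the log-likelihood; the log domain is used only for numerical stability. The function names are hypothetical, and the toy model parameters are made up rather than adapted from a real UBM.

```python
import numpy as np

def gmm_avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of feature vectors x (n_frames, dim)
    under a diagonal-covariance GMM with K components.
    weights: (K,), means: (K, dim), variances: (K, dim)."""
    x = np.atleast_2d(x)
    diff = x[:, None, :] - means[None, :, :]                      # (n, K, dim)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm - 0.5 * np.sum(diff**2 / variances, axis=2)
    log_weighted = log_comp + np.log(weights)                     # (n, K)
    peak = log_weighted.max(axis=1, keepdims=True)                # log-sum-exp
    frame_ll = peak[:, 0] + np.log(np.sum(np.exp(log_weighted - peak), axis=1))
    return frame_ll.mean()

def log_likelihood_ratio(x, suspect_gmm, ubm):
    """Similarity (suspect model) minus typicality (UBM), in the log domain."""
    return gmm_avg_loglik(x, *suspect_gmm) - gmm_avg_loglik(x, *ubm)

# Usage with toy one-dimensional, single-component models
ubm = (np.array([1.0]), np.array([[0.0]]), np.array([[4.0]]))
suspect = (np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]))
offender_frames = np.array([[1.0], [1.0]])
llr = log_likelihood_ratio(offender_frames, suspect, ubm)
```

    A positive value means the offender recording is better explained by the suspect model than by the reference population; a negative value means the opposite.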

    Application:

    While fully unsupervised automatic verification is performed with a binary decision using a likelihood ratio threshold and is used in biometric commercial applications, the usage of the likelihood ratio as an expression of the strength of the evidence in forensic speaker verification has become an important issue.

  • Objective:

    Another project has investigated the basic properties of frame and Bessel multipliers. This project aims to generalize this concept so that it will work with Banach spaces also.

    Method:

    As the Gram matrix plays an important role in the investigation of multipliers, it is quite natural to look at the connection to localized frames and multipliers. The dependency of the operator class on the symbol class can be researched.

    The following statements will be investigated:

    • Theorem: If G is a localized frame and a is a bounded sequence, then the frame multiplier Ta is bounded on all associated Banach spaces (the associated co-orbit spaces).
    • Theorem: If G is a localized frame and a is a bounded sequence, such that the frame multiplier Ta is invertible on the Hilbert space H, then Ta is simultaneously invertible on the associated Banach spaces.

    The applications of these results to Gabor frames and Gabor multipliers will be further investigated.

    Application:

    Although Banach spaces are a more general concept than Hilbert spaces, Banach theory has found applications. For example, if any norm other than L2 (least square error) is used for approximation, Banach theory tools have to be applied.

    Partners:

    • K. Gröchenig, NuHAG, Faculty of Mathematics, University of Vienna

    Project-completion:

    This project ended on 28.02.2008 and is incorporated into the 'High Potential'-Project of the WWTF, MULAC.

  • Objective:

    The applications involving signal processing algorithms (like adaptive or time variant filters) are numerous. If the STFT, the Short Time Fourier Transformation, is used in its sampled version, the Gabor transform, the use of Gabor multipliers creates a possibility to construct a time-variant filter. The Gabor transform is used to calculate time frequency coefficients, which are multiplied with a fixed time-frequency mask. Then the result is synthesized. If another way of calculating these coefficients is chosen or if another synthesis is used, many modifications can still be implemented as multipliers. For example, it seems quite natural to define wavelet multipliers. Therefore, for this case, it is quite natural to continue generalizing and look at multipliers with frames lacking any further structure.

    Method:

    Therefore, for Bessel sequences, the investigation of operators

    $M f = \sum_k m_k \langle f, \psi_k \rangle \varphi_k$

    where the analysis coefficients, $\langle f, \psi_k \rangle$, are multiplied by a fixed symbol $m_k$ before resynthesis (with $\varphi_k$), is very natural and useful. These are the Bessel multipliers investigated in this project. The goal of this project is to set the mathematical basis to unify the approach to the Bessel multipliers for all possible analysis / synthesis sequences that form a Bessel sequence.
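
    In finite dimensions, the operator above can be written as a matrix product. The following is a minimal sketch, assuming the analysis and synthesis vectors are stored as the columns of two arrays; the function name `multiplier_matrix` and the example frame are illustrative and not taken from the project.

```python
import numpy as np

def multiplier_matrix(symbol, analysis, synthesis):
    """Matrix of f -> sum_k m_k <f, psi_k> phi_k.

    analysis:  array whose columns are the analysis vectors psi_k
    synthesis: array whose columns are the synthesis vectors phi_k
    symbol:    the weights m_k
    """
    # <f, psi_k> is the k-th entry of analysis^H f
    return synthesis @ np.diag(symbol) @ analysis.conj().T

# Usage: three unit-spaced vectors in the plane, scaled to a Parseval frame
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
frame = np.sqrt(2.0 / 3.0) * np.vstack([np.cos(angles), np.sin(angles)])

identity = multiplier_matrix(np.ones(3), frame, frame)    # constant symbol 1
weighted = multiplier_matrix(np.array([1.0, 2.0, 3.0]), frame, frame)
```

    With the constant symbol 1 and a Parseval frame, the multiplier reduces to the identity; any other symbol weights the individual analysis coefficients before resynthesis.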

    Application:

    Bessel sequences and frames are used in many applications. They have the big advantage of allowing the possibility to interpret the analysis coefficients. This makes the formulation of a multiplier concept for other analysis / synthesis systems very profitable. One such system involves gamma tone filter banks, which are mainly used for analysis based on the auditory system.

    Publications:

    • Balazs, P. (2007), "Basic Definition and Properties of Bessel Multipliers", Journal of Mathematical Analysis and Applications, 325, 1: 571--585. doi:10.1016/j.jmaa.2006.02.012, preprint

    Project-completion:

    This project ended on 01.01.2007. Its completion allowed the successful application for a 'High Potential'-Project of the WWTF, see MULAC.

  • Objective:

    Practical experience has shown that the concept of an orthonormal basis is not always useful. This led to the concept of frames. Models in physics and other application areas, including sound vibration analysis, are mostly continuous models. Many continuous model problems can be formulated as operator theory problems, as in differential or integral equations. An interesting class of operators is the Hilbert Schmidt class. This project aims to find the best approximation of any matrix by a frame multiplier, using the Hilbert Schmidt norm.

    Method:

    In finite dimensions, every sequence is a frame sequence, so the best approximation of any element can be found simply via the frame operator, using the dual frame for synthesis. The present best approximation algorithm involves the following steps: 1) The Hilbert-Schmidt inner products of the matrix with the projection operators involved are calculated in an efficient way; 2) The pseudo-inverse of the Gram matrix is computed, which avoids the explicit calculation of the dual frame; 3) The pseudo-inverse is applied to the coefficients found in step 1 to obtain the lower symbol of the multiplier.
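
    The steps above can be sketched in a few lines for the finite-dimensional case. This is an illustration under stated assumptions, not the published code: the frame vectors are stored as array columns, the rank-one operators are taken to be phi_k psi_k^H, and the function name `best_multiplier_symbol` is hypothetical.

```python
import numpy as np

def best_multiplier_symbol(T, analysis, synthesis):
    """Symbol of the frame multiplier closest to T in the Hilbert-Schmidt norm.
    Columns of `analysis` are psi_k, columns of `synthesis` are phi_k."""
    # Step 1: <T, phi_k psi_k^H>_HS = <T psi_k, phi_k>, without forming
    # the rank-one matrices explicitly
    coeffs = np.einsum('ik,ij,jk->k', synthesis.conj(), T, analysis)
    # Step 2: Gram matrix of the rank-one operators, as an entrywise product
    # of the Gram matrices of the two sequences
    gram = (synthesis.conj().T @ synthesis) * (analysis.conj().T @ analysis).conj()
    # Step 3: apply the pseudo-inverse to the coefficients
    return np.linalg.pinv(gram) @ coeffs

# Usage: recover the symbol of an exact multiplier for a frame of three
# vectors in the plane
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
frame = np.sqrt(2.0 / 3.0) * np.vstack([np.cos(angles), np.sin(angles)])
T = frame @ np.diag([1.0, 2.0, 3.0]) @ frame.T
recovered = best_multiplier_symbol(T, frame, frame)
```

    In this example the three rank-one operators are linearly independent, so the symbol is recovered exactly. For a matrix that is not itself a multiplier, the same call returns the least-squares symbol.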

    Application:

    Finding the best approximation of matrices via multipliers provides a way to find efficient algorithms to implement such operators. Any time-variant linear system can be modeled by a matrix. Time-invariant systems can be described as circulant matrices. Slowly time-varying linear systems have a good chance of closely resembling Gabor multipliers. Other matrices can be well approximated by a "diagonalization" using other frames.

    Publications:

    • P. Balazs, "Hilbert-Schmidt Operators and Frames - Classification, Approximation by Multipliers and Algorithms" , International Journal of Wavelets, Multiresolution and Information Processing, (2007, accepted)  preprint, Codes and Pictures: here

    Project-completion:

    This project ended on 01.01.2009. Its completion allowed the successful application for a 'High Potential'-Project of the WWTF, see MULAC.

  • Objective:

    The dependency of perceived loudness on electrical current in Cochlear Implant (CI) stimulation has been investigated in several existing studies. This investigation has two main goals:

    1. To study the efficiency of an adaptive method to determine the loudness function.
    2. To measure the loudness function in binaural as well as monaural stimulation.

    Method:

    Loudness functions are measured at single electrodes (or interaural electrode pairs) using the method of categorical loudness scaling. The efficiency of this method for hearing-impaired listeners has been demonstrated in previous studies (Brand and Hohmann, JASA 112, p.1597-1604). Both an adaptive method and the method of constant stimuli are used. Binaural functions are measured after the monaural functions, including monaural measurements as control conditions.

    Application:

    The results indicate the suitability and efficiency of the adaptive categorical loudness scaling method as a tool for the fast determination of the loudness function. This can be applied to the clinical fitting of implant processors as well as for pre-measurements in psychoacoustic CI studies. The measurement results also provide new insights into monaural and binaural loudness perception of CI listeners.

    Funding:

    internal

    Publications:

    • Wippel, F., Majdak, P., and Laback, B. (2007). Monaural and binaural categorical loudness scaling in electric hearing, presented at Conference on Implantable Auditory Prostheses (CIAP), Lake Tahoe.
    • Wippel, F. (2007). Monaural and binaural loudness scaling with cochlea implant listeners, master's thesis, Technical University Vienna, Austrian Academy of Sciences (in German)
  • Objective:

    In order to numerically calculate individual head-related transfer functions (HRTFs), a boundary element model (BEM) was developed. This model makes it possible to calculate the sound pressure at the head that is caused by different external sound sources with frequencies up to 20,000 Hz.

    Method:

    In engineering, the traditional BEM is widely used for solving problems. However, the computational effort of the BEM grows quadratically with the number of unknowns. This is one reason why the traditional BEM cannot be used for large models, even on highly advanced computers. In order to calculate the sound pressure at the head at high frequencies, very fine meshes need to be used. These meshes result in large systems of equations. Nevertheless, to be able to use the BEM, the equations must be combined with the Fast Multipole Method (FMM). With the FMM, the resulting matrices can be kept smaller, thus allowing the numeric solving of the Helmholtz equation with feasible effort and almost no accuracy loss as compared to the traditional BEM.

    Application:

    The geometry of the head (especially the form of the outer ear or pinna) acts as a kind of filter. This geometry is very important in localizing sound in the vertical direction and distinguishing between sounds coming from the front or the back. The BEM model can be used to numerically calculate these filter functions, which are dependent on the position and the frequency of the sound source.

    Funding:

    FWF (Austrian Science Fund): Project #P18401-B15

    Publications:

    • Kreuzer, W., Majdak, P., Chen, Z. (2009): Fast multipole boundary element method to calculate head-related transfer functions for a wide frequency range, in: J. Acoust. Soc. Am. 126, 1280-1290.
    • Kreuzer, W. and Chen, Z. S. (2008). "A Fast Multipole Boundary Element Method for calculating HRTFs," AES preprint 7020, AES Convention, Vienna.
  • Objective:

    Initial investigations indicate that the new outward-curved noise barrier designs have an improved noise-shielding effect if absorbing material is applied. Further investigation is intended to confirm this effect. Numeric simulations and measurements are being carried out.

    Method:

    Advanced boundary element methods (BEM) in two dimensions are used to assess the noise-shielding ability of the sound barrier. Different curved and straight designs are compared with respect to their shielding effect across the spectrum. Measurements at existing walls are processed and compared. Measurements are conducted without a noise barrier. A simulated softening effect of the noise barrier walls is used to simulate the noise signal behind the new barriers.

    Application:

    Calma Tec has patented the designs and will offer new designs in practice.

    List of Deliverables:

    01. Traffic Noise Recording Plan. 02. Sound Data Storage, Retrieval and Spectrographic Description. 03. Descriptive Noise Statistics. 04. Principal Component Analysis. 05. Sound Barrier Mesh Models. 06. Simulation, Transfer Functions & Clustering. 07. Visualization. 08. Psychoacoustic Irrelevance. 09. Modulation Effects. 10. Subjective Preference Tests. 11. Conclusions.

  • Objective:

    A recently developed stimulation strategy for cochlear implants attempts to encode temporal fine structure information, which is known to be important in perceiving pitch and interaural time differences (ITD). So-called "sequences" of pulses are triggered with each zero-crossing of the acoustic input waveform. It is expected that adaptation effects at the auditory nerve level limit the information flow. The goal of this project is to find optimum parameter values for this new stimulation strategy, which is intended to be applied in clinical applications.

    Method:

    The effects of three parameters on ITD perception are studied systematically: the pulse rate within each sequence, the number of sequences per second, and the temporal shape of the sequence.

    Application:

    The optimum parameter values determined in the experiments are intended to be used in the clinical application of the new stimulation strategy.

  • HIGH SPEED TRAINS

    The Austrian company OeBB-HL-AG performed tests with the ICE-S high-speed train in 2004. A test rail section was adapted for the tests for a period of one week. The train was driven at speeds from 200 to over 300 km/h.

    We had the opportunity to record the noise emissions caused by the train. This was a great chance to test our equipment, such as the microphone array and the outdoor microphone recording system.

  • Objective:

    Redesign the Institute's homepage using a Content Management System (CMS) to facilitate easy updating by all Institute employees, easy extension of the homepage functionality, and a consistent style.

    Method:

    The CMS 'Mambo' (today: Joomla) was chosen from the available open-source systems. The homepage was redesigned. The homepage content was transferred.

    Application:

    If employees can easily update their content from any web browser, the homepage will be more up-to-date.

  • Objective:

    Tsetse flies (genus Glossina) are carriers of sleeping sickness and of nagana, a disease that affects ungulates. Over the past years, the diseases carried by tsetse flies have spread so rapidly that intensified disease-fighting measures have become necessary. One of the most effective methods is the release of sterile males. The flies are raised on a large scale, sterilized using radiation, and then released. During rearing, continuous quality control of the flies is necessary. The objective of this project is to develop an acoustic quality check for the sterile males. The research has demonstrated that the quality control is only possible using the sound activity of the flies.

    Method:

    The tsetse fly uses its flight apparatus to produce sounds in addition to flying. Whereas the flight noise consists mainly of low-frequency components (<2000 Hz) with only a few tonal parts, the "singing" consists of tonal components in the range of approximately 1-8 kHz. To detect the singing, a high-pass-filtered spectrum of the frequency range of interest is calculated (using the DCT). From this spectrum, three parameters are extracted (energy in local peaks, 95 percent energy bandwidth, variance of the amplitudes), which are suitable for detecting sounds with distinctive components. Each parameter is converted into a weight value between 0 and 1 by a trigger function, and the weights are then merged. The thresholds of the trigger functions are determined from the background signal in a separate measuring run. A test version of this method was implemented in STx.
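
    The conversion of parameters into weights and their merging can be illustrated as follows. The text does not specify the shape of the trigger functions or the merging rule, so this sketch assumes a piecewise-linear ramp between two thresholds and a product of the weights. All names, parameter values and thresholds are made up for illustration.

```python
import math

def trigger(value, low, high):
    """Piecewise-linear trigger: weight 0 at `low`, 1 at `high`, clipped to
    [0, 1]. Passing low > high gives a descending ramp, for parameters where
    small values indicate singing."""
    return max(0.0, min(1.0, (value - low) / (high - low)))

def singing_score(params, thresholds):
    """Merge the per-parameter weights into one score (assumed: product)."""
    weights = [trigger(params[name], *thresholds[name]) for name in params]
    return math.prod(weights)

# Usage with hypothetical parameter values for one analysis frame
thresholds = {
    "peak_energy": (0.2, 0.6),         # more energy in peaks -> more tonal
    "bandwidth_hz": (4000.0, 1000.0),  # narrower bandwidth -> more tonal
    "amp_variance": (0.1, 0.5),        # larger variance -> more tonal
}
params = {"peak_energy": 0.4, "bandwidth_hz": 1000.0, "amp_variance": 0.9}
score = singing_score(params, thresholds)
```

    With a product as merging rule, a single parameter that clearly speaks against singing (weight 0) vetoes the detection; a mean would be more lenient.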

    Application:

    The program will be tested on the tsetse flies at the laboratory in the 2006/2007 winter semester. As a result of initial tests, it will probably be enhanced. As of 2007, the program is planned to be put into practice in an African institute.