Project

  • Project Objective:

    This project aims to implement generally applicable database functions in S_TOOLS-STx, for example access to the sound data and the management of segment and annotation metadata (list, sort, select, etc.). The essence (the sound data itself), the segmentations, and both manually compiled and calculated annotations (e.g. wave band level) form the basis of an integrated sound database. The essential requirement is that the original sound data remain unchanged. Therefore, the metadata have to be stored in separate *.xml files. In this way, the metadata can be managed dynamically, while the sound files themselves are only opened in write-protected mode.

    Method:

    The segment lists implemented in STx keep all annotations linked to each individual sound segment. The segments remain in the context of the continuous sound recordings, and segment addressing allows the segments to be evaluated numerically, processed, and extracted (including their acoustic surroundings).
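    A minimal Python sketch of the separation described above: segment annotations are written to and read from a separate XML file, and the sound file is only referenced by name, never opened for writing. The element and attribute names (`segmentlist`, `segment`, `attr`) and all function names are hypothetical and do not reproduce the actual STx XML format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One annotated segment, addressed by sample positions in an unchanged sound file."""
    start: int
    end: int
    attributes: dict = field(default_factory=dict)


def segments_to_xml(sound_file: str, segments: list) -> str:
    """Serialise segment metadata; the sound file is only referenced by name."""
    root = ET.Element("segmentlist", {"soundfile": sound_file})
    for seg in segments:
        node = ET.SubElement(root, "segment", {"start": str(seg.start), "end": str(seg.end)})
        for name, value in seg.attributes.items():
            ET.SubElement(node, "attr", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_segments(text: str) -> list:
    """Parse the metadata file back into Segment objects."""
    root = ET.fromstring(text)
    segments = []
    for node in root.findall("segment"):
        attrs = {a.get("name"): a.text for a in node.findall("attr")}
        segments.append(Segment(int(node.get("start")), int(node.get("end")), attrs))
    return segments


def select(segments: list, name: str, value: str) -> list:
    """Database-style selection of segments by attribute, sorted by start position."""
    hits = [s for s in segments if s.attributes.get(name) == value]
    return sorted(hits, key=lambda s: s.start)


# Usage: write the metadata, read it back, select all "train" segments.
segs = [Segment(44100, 88200, {"label": "train"}), Segment(0, 44100, {"label": "car"})]
xml_text = segments_to_xml("recording.wav", segs)
trains = select(xml_to_segments(xml_text), "label", "train")
```

    Because all list, sort and select operations work on the parsed metadata, the sound file can stay read-only throughout.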

    Application:

    Signal databases are the basis for practically all applications that use realistic sound material, such as time-frequency representations, statistics, principal component analysis, and cluster analysis. Furthermore, signal databases are used for subjective evaluations and psychoacoustic experiments. STx databases can handle more than 100 sound files and thousands of segments with very short access times.

    Ref.:

    PACS: 43.50.Rq; Project: NOIDESc: Deskriptoren zur Bewertung von Lärmsignalen (FFG-809085, bmvit-isb2). PACS: 43.72.Fx; Project: Akustische Phonetik, Sprechererkennung.

  • Description:

    Development of a framework for analysis and resynthesis (phase vocoder) and for the graphically assisted modification of signals. It is implemented as an STx script.

    To realise this project, the STx graphics are being extended with a general selection tool (for selecting arbitrary regions). In addition, a file object is being developed that can be used simultaneously by the script (for generating or modifying data) and by the graphics.
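    The analysis-modification-resynthesis skeleton underlying a phase vocoder can be sketched in Python as follows. The frame size, the periodic Hann window with 50 % overlap, the naive DFT and the `modify` step (a stand-in for graphical editing) are all assumptions for illustration, not the STx implementation; a full phase vocoder would additionally process the phases for time or pitch modification.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2), adequate for a sketch)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT keeping the real part (exact for conjugate-symmetric spectra)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def analyse(x, n=16):
    """STFT with a periodic Hann window and 50 % overlap; returns (magnitude, phase) per bin."""
    hop = n // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    padded = [0.0] * hop + list(x) + [0.0] * (hop + (-len(x)) % hop)
    frames = []
    for start in range(0, len(padded) - n + 1, hop):
        spectrum = dft([padded[start + i] * window[i] for i in range(n)])
        frames.append([(abs(c), cmath.phase(c)) for c in spectrum])
    return frames


def modify(frames, gains):
    """Stand-in for graphical editing: scale each bin's magnitude, keep its phase.

    gains[k] should equal gains[n - k] so that the resynthesised signal stays real.
    """
    return [[(mag * g, ph) for (mag, ph), g in zip(frame, gains)] for frame in frames]


def resynthesise(frames, length, n=16):
    """Inverse DFT per frame plus overlap-add; shifted periodic Hann windows sum to one."""
    hop = n // 2
    out = [0.0] * (hop * (len(frames) + 1))
    for f, frame in enumerate(frames):
        for i, value in enumerate(idft([cmath.rect(mag, ph) for mag, ph in frame])):
            out[f * hop + i] += value
    return out[hop:hop + length]
```

    With unit gains, `resynthesise(analyse(x), len(x))` returns the original samples, which is the property that makes edits in the time-frequency plane predictable.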

  • Objective:

    Extend the S_TOOLS-STx graphics library to enable graphs to be rotated and flipped horizontally or vertically without having to recalculate the data.

    Method:

    The base graphics object class was modified, adding rotation and flip settings. A new macro command was implemented to set a graph's rotation-and-flip values. Each graphic object was then modified to display itself correctly according to these settings.
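    One way to realise display-time rotation and flipping is to apply the settings only in the coordinate mapping used for drawing. The sketch below assumes normalised plot coordinates in the unit square and rotation in 90-degree steps; `Orientation` and `to_display` are hypothetical names, not the STx graphics classes or the macro command mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orientation:
    """Hypothetical per-graph display setting: quarter turns plus flips."""
    quarter_turns: int = 0        # counter-clockwise rotation in steps of 90 degrees
    flip_horizontal: bool = False
    flip_vertical: bool = False


def to_display(x: float, y: float, o: Orientation) -> tuple:
    """Map normalised data coordinates (0..1) to display coordinates.

    The data are never recalculated; only this drawing-time mapping changes.
    Flips are applied first, then the rotation about the centre of the plot area.
    """
    if o.flip_horizontal:
        x = 1.0 - x
    if o.flip_vertical:
        y = 1.0 - y
    for _ in range(o.quarter_turns % 4):
        x, y = 1.0 - y, x         # 90 degrees counter-clockwise inside the unit square
    return x, y
```

    Each graphic object would pass every point it draws through such a mapping, so changing the orientation only requires a redraw.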

    Application:

    Rotating and flipping graphical representations of data can now be achieved with a single macro command, rather than by modifying the data itself.

  • Description:

    A console is extremely helpful when developing scripts and applications. This new script console can execute (almost) all STx commands and allows access to variables and objects. The console was implemented with the help of an extended edit control and a macro class. Command-line-oriented versions of current STx functionality will be developed as and when needed.

    The script debugger was extended during the console development, and a number of thread synchronisation problems were solved.

  • Description:

    The "STx Script Programming" guide written for internal workshops has been integrated into the online help with the title "Becoming an STx Guru (in 538.338 simple steps)". This chapter contains a general description of the STx script language (including tips and tricks) and an overview of several important STx components and attributes (e.g. shell items).

  • Description:

    The methods for extracting speech parameters, most notably formants, have been improved. A new method was implemented that measures formants while taking certain formant attributes (a model) into account. The model uses frequency-range and rate-of-change attributes. Tracking is limited to the voiced parts of the signal. A test version of this method was integrated into the STx speech analysis application SPEXL.
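    A greedy version of constraint-based formant tracking might look like this. The model values (ranges and maximum change per frame), the candidate lists and the nearest-candidate rule are assumptions for illustration; the source only states that the STx method uses frequency-range and rate-of-change attributes and is restricted to voiced frames.

```python
from typing import List, Optional

# Hypothetical formant model: permitted range (Hz) and maximum change per frame (Hz).
MODEL = {
    "F1": {"range": (200.0, 1000.0), "max_step": 80.0},
    "F2": {"range": (600.0, 2800.0), "max_step": 150.0},
}


def track(candidates: List[List[float]], voiced: List[bool], name: str,
          model=MODEL) -> List[Optional[float]]:
    """Greedy formant track obeying range and rate-of-change constraints.

    candidates: per-frame candidate frequencies, e.g. LPC pole frequencies
    voiced:     per-frame voicing decisions; unvoiced frames give None
    """
    low, high = model[name]["range"]
    max_step = model[name]["max_step"]
    previous: Optional[float] = None
    result: List[Optional[float]] = []
    for freqs, is_voiced in zip(candidates, voiced):
        choice = None
        if is_voiced:
            allowed = [f for f in freqs if low <= f <= high]
            if previous is not None:
                # Drop candidates implying an implausibly fast formant movement,
                # then prefer the one closest to the previous value.
                allowed = sorted((f for f in allowed if abs(f - previous) <= max_step),
                                 key=lambda f: abs(f - previous))
            choice = allowed[0] if allowed else None
        result.append(choice)
        if choice is not None:
            previous = choice
        elif not is_voiced:
            previous = None   # restart the track after an unvoiced stretch
    return result
```

    A frame whose only in-range candidate jumps too far is left empty rather than forced onto the track, which is the practical benefit of the rate-of-change attribute.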

  • Project Objective:

    The project aims to implement an automatic sound recording system that allows continuous sound recording for any length of time (several weeks) without user intervention. Long-term sound data are used to observe noise emissions from machines in continuous operation and to document noise situations. The hardware and system complexity are to be restricted to standard measurement microphones and standard PCs, or suitable laptops capable of continuous operation, with external disk storage units.

    Method:

    The recorder contained in the standard S_TOOLS-STx software package is controlled by macro programming so that sound files of about one hour each are generated consecutively, labelled with date and time, and written to the hard disk. Intelligent segmentation algorithms insert automatic "tags" and annotations in real time or in post-processing. The segmentation data are managed dynamically and allow the recorded sound events to be included directly in sound data files, signal-analysis processes, and statistical processes. With currently available storage units, e.g. disk storage on the scale of 1.28 TByte, continuous measurement (2-channel stereo, 44.1 kHz, 16 bit) over a period of 2.5 months is possible.
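    A quick check of the storage figure, together with one possible naming scheme for the consecutive hourly files. The file-name pattern and the helper names are hypothetical; the capacity is taken as 1.28 × 10^12 bytes of uncompressed 16-bit PCM, with no file-system or header overhead.

```python
from datetime import datetime, timedelta

SAMPLE_RATE = 44_100      # Hz
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16 bit


def recording_days(capacity_bytes: float) -> float:
    """Days of continuous uncompressed PCM that fit into the given capacity."""
    bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE   # 176,400 B/s
    return capacity_bytes / bytes_per_second / 86_400


def hourly_file_names(start: datetime, hours: int, prefix: str = "rec") -> list:
    """Consecutive one-hour file names carrying date and time (hypothetical scheme)."""
    return [f"{prefix}_{(start + timedelta(hours=h)):%Y%m%d_%H%M%S}.wav" for h in range(hours)]
```

    Under these assumptions `recording_days(1.28e12)` gives roughly 84 days (about 2.8 months) before any overhead is accounted for.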

    Application:

    For the investigation of noise emissions, e.g. traffic and environmental noise, permanent monitoring stations are needed that record all sound sources in their temporal context. Only a broad analysis of the whole situation makes it possible to assess noise pollution and health risks.

    Ref.:

    PACS: 43.50.Rq; Project: NOIDESc: Deskriptoren zur Bewertung von Lärmsignalen (FFG-809085, bmvit-isb2).

  • Description:

    A method for up-sampling and down-sampling a signal was implemented. The conversion is done by resampling the original signal by means of an ideal low-pass (TP) filter (sin x/x). The method is implemented as an SPAtom and integrated into the SPEXL speech analysis script.
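    Band-limited (sin x/x) interpolation can be written directly as a sum of shifted sinc functions. This pure-Python sketch truncates the ideal filter to the available samples and lowers the cutoff when down-sampling; it is a textbook illustration, not the SPAtom implementation, and costs O(N·M) operations, so it only suits short signals.

```python
import math


def sinc(u: float) -> float:
    """Normalised sinc, sin(pi u) / (pi u)."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def resample(x, fs_in: float, fs_out: float):
    """Resample by ideal low-pass (sinc) interpolation.

    The cutoff is the lower of the two Nyquist frequencies, so down-sampling
    also acts as an anti-aliasing filter. The infinitely long ideal filter is
    simply truncated to the available samples.
    """
    ratio = fs_in / fs_out              # input samples per output sample
    cutoff = min(1.0, fs_out / fs_in)   # relative to the input Nyquist frequency
    n_out = int(len(x) / ratio)
    out = []
    for m in range(n_out):
        t = m * ratio                   # output instant in input-sample units
        out.append(sum(x[n] * cutoff * sinc(cutoff * (t - n)) for n in range(len(x))))
    return out
```

    When up-sampling by an integer factor, every original sample reappears unchanged in the output, because the sinc is zero at all other integer offsets.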

    Application:

    Convert recordings with unusual sampling rates, match recordings with different sampling rates, and correct 'incorrect' sampling rates.

  • Objective:

    STx includes a number of applications with which one can segment and transcribe signals. All of these applications, however, have additional functionality not necessary for transcription (e.g. parameter extraction) and have not been optimised with transcription in mind. Since transcription and segmentation represent a large part of the time spent on speech analysis, this project's aim is to develop an application that includes everything needed for the job and is easy to use. The application is included in the speech analysis script SPEXL.

    Features:

    • waveform and spectrogram signal representations
    • convenient signal bracketing, zoom and playback functionality
    • existing segments displayed in graphs and as list
    • direct input of text and metadata
    • configurable attributes (segment templates)
    • interface to all STx analysis applications
    • hotkeys for all essential functions
  • Description:

    The time/frequency transformations (wavelets, Cohen's class distributions) developed in earlier projects (2006/2007) have been integrated into the spectrogram and parameter viewer in STx.

  • Objective:

    SysBahnLärm was a joint project of the ARI with the TU Vienna, the Austrian Railways, and industrial partners, funded by the FFG as well as the ÖBB. The aim of the project was to create a handbook on the systemic reduction of railway noise. The ARI was responsible for the psychoacoustic evaluation of the effects of noise from wheels with different roughness and of different noise reduction systems, e.g. rail damping systems. Furthermore, the ARI investigated the emission pattern of the rail-wheel contact using our 64-channel microphone array.

    Method:

    Using measured train pass-by signals, a psychoacoustic testing procedure was developed and stimuli for this test were selected. Subjects had to rate the relative annoyance of different trains or different noise reduction systems with respect to each other.
    For investigating the rail-wheel contact, a beamforming technique was used in order to determine the point of the maximal emission relative to the top of the rail.
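    The beamforming step can be illustrated with a narrow-band delay-and-sum beamformer: the phase delay from each candidate focus point to every microphone is compensated and the microphone signals are summed, so the output is largest when the focus coincides with the source. The vertical line array, the single analysis frequency, the simulated monopole source and the scan grid are all assumptions; the source does not describe the geometry or processing used with the ARI's 64-channel array.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def mic_spectrum(mics, source, freq):
    """Simulated complex pressure of a monopole source at one frequency."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND
    return [cmath.exp(-1j * k * math.dist(m, source)) / math.dist(m, source) for m in mics]


def beam_power(pressures, mics, focus, freq):
    """Delay-and-sum output power for one focus point (phase-only steering)."""
    k = 2 * math.pi * freq / SPEED_OF_SOUND
    # Undo the propagation phase from `focus` to each microphone, then sum coherently.
    total = sum(p * cmath.exp(1j * k * math.dist(m, focus)) for p, m in zip(pressures, mics))
    return abs(total) ** 2


def locate(mics, pressures, candidates, freq):
    """Candidate focus point with the highest beamformer output."""
    return max(candidates, key=lambda c: beam_power(pressures, mics, c, freq))


# Usage: 16 microphones on a vertical line 0.1 m apart, source 3 m away at 0.2 m height.
mics = [(0.0, 0.1 * m) for m in range(16)]
pressures = mic_spectrum(mics, (3.0, 0.2), 1000.0)
heights = [(3.0, round(-0.5 + 0.01 * i, 2)) for i in range(201)]
best = locate(mics, pressures, heights, 1000.0)
```

    Scanning only heights in the plane of the track mirrors the question asked in the project, namely where the emission maximum lies relative to the top of the rail.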

    Application:

    The handbook should act as a guideline for the different noise reduction measures and their respective advantages and problems.

  • Objective:

    A Gaussian atom is suitable as an ideal atom for time-frequency representations of human auditory perception. This is not only because of the Gaussian atom's special mathematical properties, but also because of results from existing psychoacoustic studies. Developing a time-frequency mask requires testing the time-frequency masking effects of this atom. So far, short, band-limited signals have not been investigated in masking experiments. Only relatively few psychoacoustic experiments have investigated time and frequency masking effects in combination.

    Method:

    In cooperation with the Laboratory for Mechanics and Acoustics / CNRS Marseille, an experimental protocol was developed for testing the time-frequency masking of a single Gaussian atom. Experiments were first carried out in 2006 and gave the first results concerning the hearing threshold and the existence of such a signal. Experiments on the masking threshold began in Marseille as a PhD project before the end of 2006.
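    For reference, a Gaussian atom is a sinusoid with a Gaussian envelope. The sketch below generates one in Python; the parameterisation exp(-π(t/Γ)²) is an assumption chosen because its equivalent rectangular duration is Γ and the equivalent rectangular bandwidth of its spectral envelope is 1/Γ. The example values are illustrative and are not the stimuli used in the experiments.

```python
import math


def gaussian_atom(f0: float, gamma: float, fs: float, duration: float):
    """Sampled Gaussian atom exp(-pi (t / gamma)^2) * cos(2 pi f0 t), centred in the buffer.

    gamma (s) sets the temporal width; the spectral envelope is again Gaussian,
    proportional to exp(-pi (gamma (f - f0))^2).
    """
    n = int(round(duration * fs))
    centre = (n - 1) / 2
    out = []
    for i in range(n):
        t = (i - centre) / fs
        out.append(math.exp(-math.pi * (t / gamma) ** 2) * math.cos(2 * math.pi * f0 * t))
    return out


# Usage: a 4 kHz atom with gamma = 0.6 ms in a 10 ms buffer at 44.1 kHz.
atom = gaussian_atom(4000.0, 0.0006, 44100.0, 0.01)
```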

    Application:

    Efficient implementation of a masking filter offers many applications:

    • Sound / Data Compression
    • Sound Design
    • Back-and-Foreground Separation
    • Optimization of Speech and Music Perception

    After the testing phase has been completed, the algorithms are to be implemented in S_TOOLS-STx.

    Subprojects:

    • Amadée: Time Frequency Representations and Auditory Perception
    • Cotutelle de thèse
    • Experiments studying additivity of masking for multiple maskers

    Funding:

    WTZ (project AMADEUS)

    Publications:

    • Laback, B., Balazs, P., Toupin, G., Necciari, T., Savel, S., Meunier, S., Ystad, S., Kronland-Martinet, R. (2008). Additivity of auditory masking using Gaussian-shaped tones. Presented at the Acoustics'08 conference.
  • Objective:

    The aim of this project is to replace the very simple triangle-like masking function of a basic time-frequency masking model (an extension of the simultaneous irrelevance model that can be implemented as a Gabor multiplier) with a more elaborate 2-D kernel developed from the first time-frequency masking measurements of a Gaussian atom.

    Method:

    An extension of the simultaneous irrelevance model is used as the most basic model for the convolution algorithm under investigation. A triangle-like function describes the masking effect in the frequency direction and in the time direction. Combined, they result in a 2-D function, which is convolved with the time-frequency coefficients of the given signal to calculate a threshold function. This can be implemented as a Gabor multiplier. This very simple function is to be replaced by a more elaborate 2-D kernel developed from the first time-frequency masking measurements of a Gaussian atom.
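    A minimal sketch of the basic model: time-frequency energies are convolved with a separable 2-D triangle kernel, the result is lowered by a fixed offset to give the threshold, and a 0/1 Gabor-multiplier symbol keeps only coefficients at or above it. Kernel widths, the 10 dB offset and the binary symbol are assumptions; the source does not specify the parameters of its simultaneous irrelevance model.

```python
def triangle(half_width: int) -> dict:
    """Triangle weights for offsets -half_width..half_width (peak 1 at offset 0)."""
    return {d: 1.0 - abs(d) / (half_width + 1) for d in range(-half_width, half_width + 1)}


def masking_threshold(energy, freq_half_width=2, time_half_width=3, offset_db=10.0):
    """Convolve TF energies with a separable 2-D triangle kernel, then lower by offset_db."""
    tri_f, tri_t = triangle(freq_half_width), triangle(time_half_width)
    n_freq, n_time = len(energy), len(energy[0])
    scale = 10 ** (-offset_db / 10)
    threshold = [[0.0] * n_time for _ in range(n_freq)]
    for k in range(n_freq):
        for n in range(n_time):
            acc = 0.0
            for dk, wf in tri_f.items():
                for dn, wt in tri_t.items():
                    kk, nn = k - dk, n - dn
                    if 0 <= kk < n_freq and 0 <= nn < n_time:
                        acc += wf * wt * energy[kk][nn]
            threshold[k][n] = scale * acc
    return threshold


def gabor_multiplier_mask(coefficients):
    """0/1 symbol: keep coefficients whose energy reaches the masking threshold."""
    energy = [[abs(c) ** 2 for c in row] for row in coefficients]
    thr = masking_threshold(energy)
    return [[1.0 if e >= t else 0.0 for e, t in zip(er, tr)] for er, tr in zip(energy, thr)]
```

    Multiplying the coefficients element-wise by this symbol and resynthesising would remove the components judged inaudible; a measured kernel would simply replace `masking_threshold`.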

    Application:

    After thoroughly testing this algorithm in psychoacoustic experiments, it will be implemented in STx.

    Partners:

    • R. Kronland-Martinet, S. Ystad, T. Necciari, Modélisation, Synthèse et Contrôle des Signaux Sonores et Musicaux of the LMA / CNRS Marseille
    • S. Meunier, S. Savel, Acoustique perceptive et qualité de l’environnement sonore of the LMA / CNRS Marseille

    Project-completion:

    This project ended on 28.02.2008 and has been incorporated into MULAC, the 'High Potential' project of the WWTF.

  • Objective:

    Up to now, a thorough phonetic-acoustic and phonological description of the vowels and the vowel system of Standard Austrian German has not been provided.

    Method:

    Approximately 11,000 vowels of three female and three male speakers of Standard Austrian German have been segmented and analyzed acoustically.

    Results:

    Standard Austrian German distinguishes 13 vowels at five places of constriction:

    • pre-palatal for the /i/ and the /y/ vowels
    • mid-palatal for the /e/ and the /ø/ vowels
    • velar for the /u/ vowels
    • upper pharyngeal for the /o/ vowels
    • lower pharyngeal for /ɑ/

    Each vowel pair consists of a constricted and an unconstricted vowel. The front vowels (pre-palatal and mid-palatal) additionally distinguish rounded and unrounded vowels. The following articulatory features sufficiently discriminate all vowels:

    • [± constricted]
    • [± front]
    • [± prepalatal]
    • [± pharyngeal]
    • [± round]

    Contrary to general assumptions, F1 and F2 are not sufficient to distinguish the vowels of Standard Austrian German; F3 is necessary as well. This discriminatory ability is maintained across all speaking styles and prosodic positions.

  • Multilateral Scientific and Technological Cooperation in the Danube Region 2017-2018
    Austria, Czech Republic, Republic of Serbia, and Slovak Republic
    Project duration: 01.01.2017 - 31.12.2018

    Project website: nuhag.eu/tifmofus

  • Objectives:

    In the context of binaural virtual acoustics, a sound source is positioned in a free-field 3-D space around the listener by filtering it with head-related transfer functions (HRTFs). In a real-time application, numerous HRTFs need to be processed. The long impulse responses of the HRTFs demand high computational power, which is difficult to provide on current processors when more than a few sources are rendered simultaneously.

    Technically speaking, an HRTF is a linear time-invariant (LTI) system. An LTI system can be implemented in the time domain by direct convolution or recursive filtering, but this approach is computationally inefficient. A computationally efficient approach implements the system in the frequency domain; however, this approach is not suitable for real-time applications because it introduces a very large delay. The family of segmented-FFT methods offers a compromise between the two approaches, permitting a trade-off between latency and computational complexity. As an alternative, the sub-band method can be applied to represent linear systems in the time-frequency domain. Recent work has shown that the sub-band method offers an even better trade-off between latency and computational complexity than segmented-FFT methods. However, sub-band analysis is still mathematically challenging, and its optimum configuration depends on the application under consideration.
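    Uniformly partitioned overlap-save convolution is one member of the segmented-FFT family: the impulse response is cut into blocks of B taps, and the input-output latency is B samples instead of the full filter length. This is a generic pure-Python sketch (recursive radix-2 FFT, so B must be a power of two), not the sub-band method investigated in TF-VA.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (len(x) must be a power of two); the inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def partitioned_convolve(x, h, block=4):
    """Uniformly partitioned overlap-save convolution; latency is `block` samples.

    h is cut into partitions of `block` taps, each transformed once with a 2*block FFT.
    Every input block is transformed once and pushed into a frequency-domain delay
    line, so partition p is always paired with the input spectrum p blocks old.
    """
    n_fft = 2 * block
    parts = [h[i:i + block] for i in range(0, len(h), block)]
    spectra_h = [fft([complex(v) for v in p] + [0j] * (n_fft - len(p))) for p in parts]
    delay_line = [[0j] * n_fft for _ in spectra_h]
    total = len(x) + len(h) - 1
    padded = list(x) + [0.0] * (-(-total // block) * block - len(x))
    previous = [0.0] * block
    out = []
    for start in range(0, len(padded), block):
        current = padded[start:start + block]
        delay_line = [fft([complex(v) for v in previous + current])] + delay_line[:-1]
        acc = [sum(hp[k] * xp[k] for hp, xp in zip(spectra_h, delay_line)) for k in range(n_fft)]
        # Overlap-save: only the second half of the circular result is valid output.
        out.extend(v.real / n_fft for v in fft(acc, inverse=True)[block:])
        previous = current
    return out[:total]
```

    Halving `block` halves the latency but doubles the number of partitions whose spectra must be multiplied per output block, which is the latency/complexity trade-off described above.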

    Methods:

    TF-VA involves developing and investigating new techniques for configuring the sub-band method by using advanced optimization methods in a functional analysis context. As a result, an optimization technique that minimizes the computational complexity of the sub-band method will be obtained.

    Two approaches will be considered: The first approach designs the time-frequency transform for minimizing the complexity of each HRTF. In the second approach, we will design a unique time-frequency transform, which will be used for a joint implementation of all HRTFs of a listener. This will permit an efficient implementation of interpolation techniques while moving sources spatially in real-time. The results will be evaluated in subjective localization experiments and in terms of localization models.

    Status:

    • Main participant: Damian Marelli (University of Newcastle, Australia)
    • Co-applicants: Peter Balazs, Piotr Majdak
    • Project begin: November 2011
    • Funding: Lise-Meitner-Programm of the Austrian Science Fund (FWF) [M 1230-N13]
  • Project Part 02 of the special research area "German in Austria. Variation - Contact - Perception", funded by the FWF (FWF6002), in cooperation with the University of Salzburg

    Principal Investigators: Stephan Elspaß, Hannes Scheutz, Sylvia Moosmüller

    Start of the project: 1st of January 2016

    Project description:

    The diversity and dynamics of the various dialects in Austria are the topic of this project. Based on a new survey, different research questions will be addressed in the coming years, such as: What differences and changes (e.g. through processes of convergence and divergence) can be observed within and between the Austrian dialect regions? How does dialect change differ between urban and rural areas? Are there noticeable generational and gender differences with regard to dialect change? What can a comprehensive comparison of ‘real-time’ and ‘apparent-time’ analyses contribute to a general theory of language change?

    To answer these questions, speech samples from a total of 160 dialect speakers, balanced for age and gender, are collected at 40 locations in Austria and analysed within the first four years. Furthermore, samples from selected speakers will be recorded and evaluated under laboratory conditions to determine phonetic peculiarities as precisely as possible. In the second survey phase, complementary recordings are carried out at another 100 locations in Austria in order to analyse differences and changes between the dialect landscapes in more detail. State-of-the-art dialectometric methods will be used to arrive at probabilistic statements regarding dialect variation and change in Austria. The analyses will include all linguistic levels from phonetics to syntax and lexis. The data will be documented in the first visual and ‘talking’ dialect atlas of Austria.

    Project page of the project partners in Salzburg


  • Objective:

    One of the biggest problems encountered when building numerical models for soil layers is the lack of exact deterministic material parameters. Therefore, stochastic models should be used. However, these models generally have the drawback of demanding large computational resources. This project developed a stochastic model with a stochastic shear modulus, in conjunction with a special iteration scheme that allows an efficient implementation.

    Method:

    With the Karhunen-Loève expansion (KLE), it is possible to split the stochastic shear modulus, and therefore the whole system, into a deterministic and a stochastic part. These parts can then be transformed into a linear system of equations using finite elements and polynomial chaos decomposition. Combining the KLE with the Fourier transform and Plancherel's theorem enables the deterministic part to be decoupled into smaller subsystems. An iteration scheme was developed that restricts the "costly" routines to these smaller deterministic subsystems instead of the whole higher-dimensional system matrix (up to a dimension of 10,000).
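    The Karhunen-Loève step can be illustrated in its discrete form: the covariance matrix of the shear modulus on a grid is decomposed into eigenpairs, and a realisation is the mean plus a few eigenvectors weighted by √λ and independent random variables. The 1-D grid, the exponential covariance, the Gaussian variables and the use of power iteration are assumptions for illustration; the finite element, polynomial chaos and Fourier decoupling steps of the project are not shown.

```python
import math
import random


def exponential_covariance(points, sigma, corr_length):
    """C_ij = sigma^2 exp(-|x_i - x_j| / corr_length), a common textbook choice."""
    return [[sigma ** 2 * math.exp(-abs(a - b) / corr_length) for b in points] for a in points]


def leading_eigenpairs(matrix, count, iterations=500, seed=0):
    """Largest eigenpairs of a symmetric positive definite matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # random start avoids symmetry traps
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(c * c for c in w))
            v = [c / norm for c in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflation: remove the found mode so the next run converges to the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def sample_field(mean, pairs, rng):
    """One realisation: mean + sum_k sqrt(lambda_k) * xi_k * phi_k with xi_k ~ N(0, 1)."""
    xi = [rng.gauss(0.0, 1.0) for _ in pairs]
    n = len(pairs[0][1])
    return [mean + sum(math.sqrt(lam) * z * phi[i] for (lam, phi), z in zip(pairs, xi))
            for i in range(n)]
```

    Truncating after a few modes is what reduces the stochastic dimension; the deterministic mean part is then handled separately, as in the method described above.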

    Application:

    As concerns about vibrations produced by machinery and traffic have increased in recent years, models that can predict vibrations in soil have become more important. However, since material parameters for soil layers cannot be measured exactly in practice, it is reasonable to use stochastic models.

  • Vowel and consonant quantity in Southern German varieties: D - A - CH project granted by DFG, FWF, SNF

    Principal investigators: Felicitas Kleber, Michael Pucher, Sylvia Moosmüller†, Stephan Schmid 

    Start of the project: 1st of June 2015

    Project description:

    Introduction:

    The Central Bavarian varieties, to which the Viennese varieties belong, seem to have changed diachronically. From the first phonetic descriptions (Pfalz 1913) to more current descriptions (Moosmüller & Brandstätter 2014) the diachronic change becomes visible on several levels of the varieties.

    In this project we focus on the (in)stability of the timing system, or, more precisely, the quantity relations in Vowel + Consonant sequences, and compare our results with those of our project partners in Zurich and Munich.

    Aims:

    The aims of this project are two-fold. The first aim is to develop a typology of the Vowel + Consonant quantities in Southern German varieties (Bavarian (Munich + Vienna) and Alemannic (Zurich)) in C1V1C2V2 contexts (where C2 can be a fricative, a nasal, or a plosive) and in consonant cluster sequences with increasing initial and final consonant cluster complexity. The second aim is to investigate prosodic changes in an apparent-time study and to examine the influence of internal factors (e.g. speech rate) and external factors (language attitudes) on the production of speech.

    Method:

    Recordings and analyses of 40 speakers of the Viennese varieties (balanced for age, gender, and educational background) will be conducted. During the recording sessions, the speakers are asked to read and repeat sentences at two speech rates. Furthermore, a subset of speakers is asked to participate in an articulatory recording with an electromagnetic articulograph (EMA). These recordings take place at our project partners’ laboratory in Munich.
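    Quantity relations in V1C2 sequences are often summarised by the proportion of the vowel in the vowel-plus-consonant duration. The sketch below assumes that this measure fits the project's annotations; the `Token` record, the speech-rate labels and the durations are hypothetical, and the source does not state which measure is used.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Token:
    """Hypothetical annotation of one V1C2 sequence; durations in milliseconds."""
    item: str
    vowel_ms: float   # duration of V1
    cons_ms: float    # duration of C2
    rate: str         # speech-rate condition, e.g. "normal" or "fast"


def vowel_proportion(token: Token) -> float:
    """V / (V + C): share of the V1C2 sequence taken up by the vowel."""
    return token.vowel_ms / (token.vowel_ms + token.cons_ms)


def mean_proportion_by_rate(tokens):
    """Average vowel proportion per speech-rate condition."""
    groups = {}
    for token in tokens:
        groups.setdefault(token.rate, []).append(vowel_proportion(token))
    return {rate: mean(values) for rate, values in groups.items()}


# Usage with made-up durations (not project data).
tokens = [Token("item1", 120.0, 80.0, "normal"), Token("item2", 60.0, 140.0, "normal"),
          Token("item1", 90.0, 60.0, "fast")]
summary = mean_proportion_by_rate(tokens)
```

    A ratio measure of this kind is less sensitive to overall tempo than absolute durations, which is why it suits a comparison across the two speech rates.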

    Application:

    The results will not only provide insight into the current timing system of speakers of the Viennese varieties but also enable us to draw conclusions about sound changes in progress.


  • Objective:

    This project describes and compares the vowel systems of several languages acoustically. Its main focus is on languages whose acoustic descriptions have so far been insufficient, e.g. Albanian, Romanian, Fula, Mandinka, or Crioulo.

    Method:

    Selected speakers are asked to perform a reading task and to speak spontaneously. Vowels in all positions are segmented, labeled, and analyzed. Formant frequencies (F1, F2, F3) are extracted and the vowel systems are defined.
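    Formant frequencies are commonly estimated by linear prediction: the angles of the complex roots of the LPC polynomial give the resonance frequencies. This pure-Python sketch (autocorrelation method, Levinson-Durbin recursion, Durand-Kerner root finding) omits pre-emphasis, windowing and bandwidth checks, and is not necessarily the method used in STx.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; returns a with A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err   # reflection coefficient
        a = [a[j] + k * a[i - j] if 1 <= j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a


def polynomial_roots(coeffs, iterations=500):
    """Durand-Kerner iteration for a monic polynomial, highest power first."""
    n = len(coeffs) - 1
    roots = [(0.4 + 0.9j) ** i for i in range(n)]

    def p(z):
        value = 0j
        for c in coeffs:
            value = value * z + c
        return value

    for _ in range(iterations):
        new = []
        for i, z in enumerate(roots):
            denom = 1 + 0j
            for j, w in enumerate(roots):
                if i != j:
                    denom *= z - w
            new.append(z - p(z) / denom)
        roots = new
    return roots


def formants(x, fs, order=4):
    """Resonance frequencies (Hz) from the upper-half-plane roots of the LPC polynomial."""
    a = levinson_durbin(autocorrelation(x, order), order)
    freqs = [cmath.phase(z) * fs / (2 * math.pi) for z in polynomial_roots(a) if z.imag > 0]
    return sorted(freqs)
```

    For real speech, the prediction order is usually chosen somewhat above twice the number of expected formants, and roots with very wide bandwidths are discarded before F1, F2 and F3 are assigned.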

    Language specificity affects not only the number of vowels and their features, but also the extent of variability and stability of certain vowels. A given vowel might be quite stable in language A, whereas the same vowel might show high variability in language B. Likewise, vowels might be distinguished differently. For example, pre-palatal /i/ and mid-palatal /e/ are distinguished by F3 in Standard Austrian German (see diagram on SAG), whereas both mid-palatal /i/ and /e/ are predominantly distinguished by F2 in Modern Standard Albanian (see diagram on MSA).

    Application:

    In forensic speaker identification, detailed descriptions of the languages in question are often needed in order to conduct a thorough comparison.