Denoising (Spectral Subtraction)

From STX Wiki

Noise reduction systems frequently use spectral averaging and adaptive filters (spectral subtraction). It is advisable to prewhiten the broadband background noise by applying its inverted spectral magnitude as a filter. The frequency spectrum of the background noise is obtained by averaging the short-time spectra of silent segments of a recording. If system transfer functions, such as the horn resonances of historical sound recording devices, are known, corresponding correction filters can be included in this processing step.
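The noise estimation and prewhitening described above can be sketched in a few lines of Python with NumPy. This is only an illustration of the principle, not the STx implementation: the function names, the Hann window and the small magnitude floor are assumptions made for the example.

```python
import numpy as np

def average_noise_spectrum(noise, n_fft=8192, hop=1024):
    """Average the short-time magnitude spectra of a signal-free segment."""
    noise = np.asarray(noise, dtype=float)
    window = np.hanning(n_fft)  # assumed window; not specified in the text
    frames = [noise[i:i + n_fft] * window
              for i in range(0, len(noise) - n_fft + 1, hop)]
    if not frames:
        raise ValueError("noise segment is shorter than one FFT frame")
    mags = np.abs(np.fft.rfft(np.asarray(frames), axis=1))
    return mags.mean(axis=0)

def prewhitening_gain(noise_mag, floor=1e-12):
    """Inverted noise magnitude, normalized so that the largest gain is 1."""
    inv = 1.0 / np.maximum(noise_mag, floor)  # floor avoids division by zero
    return inv / inv.max()
```

Multiplying each short-time spectrum of the recording by the returned gain flattens the averaged noise floor. A correction filter for a known system transfer function could be multiplied into the same gain curve.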

The STx denoising module exploits the difference between the statistical characteristics of the noise and those of the signal. Noise, such as the surface noise of a historical sound recording, is additive and varies randomly from frame to frame. The signal is assumed to remain locally stable to the extent that its amplitude spectra are similar from frame to frame. The gliding (moving) spectral average is used to generate an adaptive filter, which is applied to the prewhitened input signal. The degree of noise reduction depends on the number of averaging steps and the length of the short-time frames. A balance between the range of averaging and the non-stationarity of the signal has to be found in order to avoid time-smearing effects.
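The idea of deriving an adaptive filter from a gliding spectral average can be illustrated as follows. The sketch assumes a matrix of short-time magnitudes (frames by frequency bins) of the prewhitened signal and a simple subtraction-type gain rule. The gain rule, the parameter names `n_avg` and `min_gain`, and the centered averaging window are assumptions for illustration and are not taken from STx.

```python
import numpy as np

def gliding_average(mags, n_avg):
    """Average every frequency bin over n_avg neighbouring frames."""
    kernel = np.ones(n_avg) / n_avg
    # mode="same" keeps the frame count; the first and last frames are
    # averaged against implicit zeros and are therefore underestimated.
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, mags)

def adaptive_gain(mags, noise_level, n_avg=5, min_gain=0.1):
    """Subtraction-type gain computed from the frame-averaged magnitudes."""
    avg = gliding_average(np.asarray(mags, dtype=float), n_avg)
    gain = 1.0 - noise_level / np.maximum(avg, 1e-12)
    return np.clip(gain, min_gain, 1.0)
```

Bins in which the averaged magnitude stays close to the noise level are attenuated down to `min_gain`, while bins carrying a locally stable signal component keep a gain near 1. A larger `n_avg` smooths the random noise more strongly but also smears signal onsets in time, which is the trade-off mentioned above.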

Figure: functional diagram of the STx denoising module using spectral averaging and spectral subtraction.

Denoising processing steps

Based on practical experience with processing noisy speech, the following steps and parameter settings are proposed for a sampling rate of 44100 Hz:

  • Normalize the noisy input signal to 0.8 peak.
  • Compute the stationary noise spectrum from those parts of the waveform in which definitely no signal is present. Average as many noise samples as possible. Use an FFT length of 8192 points or larger (~5 Hz bandwidth) to estimate the noise amplitude spectrum. Save the averaged spectrum in the STx DataSet.
  • Prewhiten the signal: apply an STx spectrum filter using the averaged noise spectrum, with an appropriate filter length (8192), the Phase Vocoder, Input/Output Shift = 1024 / 1024 samples, gain = 0, normalize (filter), invert (filter). This step prewhitens the noise floor. Store the prewhitened signal as a new segment appended to the sound file, or save it to a new sound file.
  • Apply STx Noise Reduction to the prewhitened signal: select Phase Vocoder, 8192 / 1024 / 1024 points, offset = 10 dB, range = 5 dB; define the main frequency range of the signal (such as 100–4000 Hz) and of the noise (4000–20000 Hz). Store the denoised signal as a new segment appended to the sound file, or save it to a new sound file.
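Two of the numbers in the steps above can be checked with simple arithmetic: the peak normalization to 0.8 and the "~5 Hz bandwidth" of an 8192-point FFT at 44100 Hz. The helper names below are hypothetical and only serve this illustration.

```python
def normalize_peak(samples, target_peak=0.8):
    """Scale the samples so that the largest absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent input: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]

def bin_spacing_hz(sample_rate, n_fft):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate / n_fft

def frame_overlap(n_fft, shift):
    """Fraction by which consecutive frames overlap for a given shift."""
    return 1.0 - shift / n_fft

print(bin_spacing_hz(44100, 8192))  # about 5.38 Hz
print(frame_overlap(8192, 1024))    # 0.875, i.e. 87.5 % overlap
```

With a frame length of 8192 samples and a shift of 1024 samples, each frame covers roughly 186 ms of signal and consecutive frames overlap by 87.5 %.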

Figure: Spectral subtraction is widely used in digital signal processing for enhancing corrupted signals. Hum and broadband noise are frequently present in real-life sound recordings. If a proper distinction between stationary background noise and the signal can be made, the noise spectrum is estimated from signal-free parts of a sound recording. Broadband noise is additionally reduced by locally averaging amplitude spectra. STx provides tools for "prewhitening" and denoising.

Note that for sampling rates other than 44100 Hz the parameter settings given above have to be adjusted accordingly. Changing the offset / range parameter values to 3 / 5 dB in step 4 (noise reduction) may result in the well-known moderate "pumping" effect, which may or may not be desirable. To find the optimum parameter settings for step 4, start with short test signals, because the computation may take twice the total signal duration, depending on the processing speed of your computer. Currently this procedure is available for offline sound file processing only.

Denoising generally makes signals perceptually more acceptable. Whether or not the intelligibility of a speech signal can be increased by denoising depends strongly on the character of the noise and on the signal-to-noise ratio (SNR). The denoising procedure described here is designed to eliminate stationary hum, hiss and broadband noise; it reduces high-amplitude impulsive distortions only to a certain extent.
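The text does not say how the settings should be adjusted for other sampling rates. One plausible approach, shown here purely as an assumption, is to keep the frame duration roughly constant by scaling the FFT length with the sampling rate, rounding to the nearest power of two, and keeping the 8:1 ratio between frame length and shift. The function name and the rounding rule are hypothetical.

```python
import math

REF_RATE = 44100   # sampling rate the proposed settings refer to
REF_FFT = 8192     # frame / FFT length at the reference rate
REF_SHIFT = 1024   # input/output shift at the reference rate

def scaled_settings(sample_rate):
    """Return (fft_length, shift) with about the same frame duration as at 44100 Hz."""
    target = REF_FFT * sample_rate / REF_RATE
    # Assumption: the FFT length should remain a power of two.
    n_fft = 2 ** round(math.log2(target))
    shift = n_fft * REF_SHIFT // REF_FFT  # keep the 8:1 frame-to-shift ratio
    return n_fft, shift

for rate in (22050, 44100, 96000):
    print(rate, scaled_settings(rate))
```

The frequency ranges for signal and noise would also need to be checked against the new Nyquist frequency; for example, a noise range of up to 20000 Hz cannot be used at a sampling rate of 22050 Hz.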