EP2006841A1 - Signal processing method and device and training method and device - Google Patents


Info

Publication number
EP2006841A1
Authority
EP
European Patent Office
Prior art keywords
signal
quantized
spectrum
vad
wanted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP06007389A
Other languages
German (de)
French (fr)
Inventor
Suhadi Suhadi
Sorel Stan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BenQ Corp
Original Assignee
BenQ Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BenQ Corp
Priority to EP06007389A
Priority to PCT/EP2007/003189 (published as WO2007115823A1)
Publication of EP2006841A1
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Definitions

  • The invention relates to a signal processing method and a signal processing device. It further relates to a training method and a training device.
  • Wiener filters are characterized by the assumption that the signal and the additive noise are stochastic processes with known spectral characteristics or known autocorrelation or cross-correlation. They are further characterized by a performance criterion such as the minimum mean-square error, and an optimal filter of this kind may be determined from a solution based on scalar methods. The goal of the Wiener filter is to filter out, by statistical means, noise that has corrupted a signal.
  • A further approach is to model the spectra of clean speech and noise using probability density functions (PDF).
  • PDF: probability density function.
  • The probability density functions of the real and imaginary parts of the clean speech spectrum may be modelled as Gaussian, as disclosed in [2,3]; more recent work, however, shows that a Gamma PDF [insert paper 5!] or a super-Gaussian PDF [insert paper 6] leads to better results.
  • A signal processing method and a corresponding signal processing device comprise the step of acquiring an audio signal. They further comprise periodically digitizing the audio signal, resulting in frames of the digitized audio signal.
  • A noisy audio signal spectrum is determined for each frame of the digitized audio signal.
  • Quantized a priori and a posteriori signal to noise ratios are determined, depending on the noisy audio signal spectrum, for the provided discrete frequencies of each frame.
  • For the provided discrete frequencies, given associated Perceptual scale gain values are determined dependent on the quantized a priori and a posteriori signal to noise ratios.
  • The given Perceptual scale gain values may be provided on a Bark scale for respective Bark scale subbands.
  • The Bark scale is a psychoacoustical scale.
  • The scale ranges from 1 to 24 and corresponds to the first 24 critical bands of hearing.
  • The band edges, in hertz, are 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000 and 15500.
  • The Perceptual scale gain values may, however, also be provided on a Mel scale or some other type of perceptual scale.
  • The given Perceptual scale gain values are provided on a Perceptual scale for respective Perceptual scale subbands.
  • The respective spectral values of the noisy audio signal spectrum of the respective frame are multiplied with the determined respective Perceptual scale gain values, resulting in estimated wanted spectrum values.
  • An estimated digitized wanted signal is determined dependent on the estimated wanted spectrum values.
  • The associated Perceptual scale gain values are determined from an approximating function associated with the respective quantized a posteriori signal to noise ratio.
  • The approximating function depends on the respective quantized a priori signal to noise ratio.
  • The approximating function is a polynomial function. This uses the insight that a polynomial function is typically well suited for approximating the Perceptual scale gain values associated with a respective a posteriori signal to noise ratio. It is particularly advantageous if the approximating function consists of a polynomial function and a saturation level. This uses the insight that, for a given quantized a posteriori signal to noise ratio, the Perceptual scale gain values typically reach a saturation level beyond a certain point and may therefore simply be approximated by that saturation level.
  • The polynomial function has an order between 4 and 12. In this range a reasonable trade-off between performance and storage requirements is obtained.
  • The estimated digitized wanted signal is a digitized speech signal and the estimated wanted spectrum is an estimated speech spectrum. This enables the enhancement of a speech signal.
  • The quantized a priori signal to noise ratio and/or the quantized a posteriori signal to noise ratio are quantized on a logarithmic scale. This further limits the required memory space.
  • The Perceptual scale gain values are determined depending on a wanted signal activity detector. This further improves the noise reduction and the overall signal quality.
  • The training method and a corresponding training device comprise the steps of providing frames of a digitized audio signal, providing frames of a digitized wanted signal and providing frames of a digitized noise.
  • The digitized audio signal, the digitized wanted signal and the digitized noise are recorded in an environment in which the signal processing method is to be conducted.
  • A noisy audio signal spectrum is determined for each frame of the digitized audio signal.
  • A wanted signal spectrum is determined for each frame of the digitized wanted signal.
  • A noise spectrum is determined for each frame of the digitized noise.
  • Quantized a priori and a posteriori signal to noise ratios are determined for the provided discrete frequencies of each frame, depending on the noisy audio signal spectrum and on the wanted signal spectrum.
  • Gain values for the provided discrete frequencies are determined depending on the noise spectra and wanted signal spectra associated with the respective discrete frequencies.
  • The quantized a priori and a posteriori signal to noise ratios of the respective discrete frequencies are associated with the respective gain values for the provided discrete frequencies.
  • Perceptual scale gain values associated with the quantized a priori and a posteriori signal to noise ratios of the respective discrete frequencies are determined depending on the gain values associated with those discrete frequencies that fall within the respective Perceptual scale subband.
  • Advantage is also taken of the relatively low number of Perceptual scale subbands when determining the Perceptual scale gain values, which greatly reduces the memory space needed for storing them without a subjective loss in the enhancement of the signal when the Perceptual scale gain values are used for the signal processing.
  • Parameters of an approximating function are determined by curve fitting of the Perceptual scale gain values associated with a respective quantized a posteriori signal to noise ratio. In this way the memory space can be reduced further.
  • The approximating function is a polynomial function.
  • The approximating function consists of a polynomial function and a saturation level. In this respect it is particularly advantageous if the polynomial function has an order between 4 and 12.
  • The quantized a priori signal to noise ratio and the quantized a posteriori signal to noise ratio are quantized on a logarithmic scale.
  • The estimated digitized wanted signal is a digitized speech signal and the estimated wanted spectrum is an estimated speech spectrum. It is particularly advantageous if the Perceptual scale gain values are determined depending on a wanted signal activity detector. It is also particularly advantageous if the parameters of the approximating function are determined depending on a wanted signal activity detector.
  • A computer program product is provided, comprising a computer readable medium embodying program instructions executable by a computer in order to conduct the signal processing method according to the first aspect of the invention.
  • A computer program product is provided, comprising a computer readable medium embodying program instructions executable by a computer in order to conduct the training method according to the second aspect of the invention.
  • Figure 1 shows a signal processing device. It comprises a block B1, which is operable to sense an audio signal A1 y(t) and may be embodied as a microphone.
  • Block B2 comprises an analog/digital converter ADC, block B3 comprises single sample processing and block B4 comprises echo cancellation.
  • The output of block B4 is a digitized audio signal A4 y_l(n).
  • The audio signal A1 y(t) is periodically digitized, resulting in frames l of the digitized audio signal A4 y_l(n).
  • Each frame l therefore comprises a set of values of the digitized audio signal A4 y_l(n).
  • The reference symbol l for a frame is also used as an index.
  • n is a placeholder for the respective sample of the digitized audio signal A4 y_l(n).
  • The echo cancellation in block B4 may be accomplished by a preprocessing filter suitable for echo cancellation.
  • A block B5 is operable to conduct noise reduction and is described in further detail with the aid of Figure 6. Further blocks may follow; a block B6 comprises an encoder which may encode the estimated digitized wanted signal A48 x̂_l(n), for example in order to transmit it via an antenna.
  • The signal processing device may be embodied in a cell phone; it may, however, also be part of a hands-free speaking system or be embodied in another mobile communication device. It may also be embodied in a non-mobile communication device or in another device known to a person skilled in the art.
  • The signal processing device comprises a storage device for storing data and program code that runs on a processor of the signal processing device during its operation.
  • The processor preferably comprises a digital signal processor (DSP).
  • FIG. 2 shows a block diagram of a training device.
  • It comprises a block B10 with a speech database containing the wanted signal and a block B12 with a noise database.
  • The speech database may in general comprise any wanted signal, which is not limited to a speech signal. It is preferably a speech signal, but may also be of a different kind, for example a music signal.
  • The noise database preferably comprises typical car noise; such car noise signals may be taken, for example, from the NTT and NTT-AT databases [[11] NTT-AT Speech Database, "Multi-Lingual Speech Database for Telephonometry 1994," http://www.ntt-at.com/produets_e/speech/index.html, 1994; [12] NTT-AT Noise Database, "Ambient Noise Database for Telephonometry 1996," http://www.ntt-at.com/products_e/noise-DB/index.html, 1996].
  • If speech is the wanted signal, the speech database may comprise various utterances spoken by different speakers, in particular male and female speakers.
  • Block B14 may comprise an analog/digital converter ADC and provide the functionality of single sample processing and echo cancellation.
  • Frames l of the digitized wanted signal A34 x_l(n) and frames l of a digitized noise A38 n_l(n) are determined.
  • Each frame l preferably has the same given length, for example 200 samples.
  • Each frame l of the digitized audio signal A4 y_l(n) is obtained by summing the respective frames l of the digitized noise A38 n_l(n) and of the digitized wanted signal A34 x_l(n).
  • This may also be accomplished in a block B16, which is designed to determine gain values A26 G_VAD(k) and is explained in further detail with the aid of Figure 3 below.
  • A block B18 is designed to determine Perceptual scale gain values A28 G_VAD^Bark(m).
  • A block B20 is provided for determining parameters of an approximating function by curve fitting and is described further with the aid of Figure 5 below.
  • The determined parameters are then stored in a block B20, which may be part of a data storage device.
  • The parameters are then preferably stored in the respective storage device of the signal processing device for conducting the noise reduction of block B5.
  • The respective frames l of the digitized noise A38 n_l(n), the digitized wanted signal A34 x_l(n) and the digitized audio signal A4 y_l(n) are all subjected to a discrete Fourier transformation DFT in a block B24.
  • The outputs of block B24 are the noise spectra A40 N_l(k), the wanted signal spectra A36 X_l(k) and the noisy audio signal spectra A6 Y_l(k), each associated with the respective frame l.
  • k represents the respective discrete frequency.
  • The squared amplitudes (powers) of the noise spectra A40 N_l(k), the wanted signal spectra A36 X_l(k) and the noisy audio signal spectra A6 Y_l(k) are determined by taking the respective absolute values and squaring them.
  • Respective ideal gains A22 G_l^id(k) are then determined with the aid of the formula shown in block B28.
  • A block B30 is operable to conduct minimum statistics.
  • The output of block B30 is a noise estimate A18 for each frame l and discrete frequency k.
  • Minimum statistics is conducted by searching, at each given discrete frequency k, for the minimum of the values of the noisy audio signal spectra A6 Y_l(k) across the provided frames l. In this way stationary and non-stationary noise may be estimated with relatively high quality.
  • A block B32 is operable to determine a quantized a priori signal to noise ratio A14 ξ̃_l(k) and a quantized a posteriori signal to noise ratio A16 γ̃_l(k).
  • An a posteriori signal to noise ratio A12 γ̂_l(k) is preferably determined with the aid of the formula shown in block B34.
  • The quantized a posteriori signal to noise ratio A16 γ̃_l(k) is then obtained in a block B36 by quantizing the a posteriori signal to noise ratio A12 γ̂_l(k), preferably on a logarithmic scale, with adjacent quantization steps preferably separated by a distance A52 of, e.g., 1 dB.
  • An interim a priori signal to noise ratio A54 is preferably obtained by the formula shown in block B38.
  • w denotes a weighting factor, which may for example have a value of 0.98.
  • max denotes a maximum value function; it ensures that the interim a priori signal to noise ratio A54 is not calculated with a negative value of the a posteriori signal to noise ratio A12 γ̂_l(k), which might occur due to an error in the noise estimate A18.
  • An a priori signal to noise ratio A10 ξ̂_l(k) is also determined in block B38, preferably with the aid of the formula shown there, which comprises a maximum value function and a limitation value A56 ξ_min.
  • The limitation value is set such that it corresponds to -15 dB on the logarithmic quantization scale.
  • The a priori signal to noise ratio A10 ξ̂_l(k) is then quantized in a block B40 in the same way as in block B36, resulting in the quantized a priori signal to noise ratio A14 ξ̃_l(k).
  • A wanted signal activity detector VAD is determined in a block B42.
  • If the wanted signal is a speech signal, a speech absence probability is then determined.
  • VAD: wanted signal activity detector, which may be a voice activity detector or be based on a speech absence probability.
  • The quantized a priori signal to noise ratio A14 ξ̃_l(k) of the respective frame is preferably smoothed and then compared with a given threshold representative of the presence or absence of the wanted signal.
  • The wanted signal activity detector VAD is preferably assigned a value of either 1 or 0.
  • A value of 1 represents the presence of the wanted signal, and a value of 0 preferably represents its absence.
  • In a block B44, the respective ideal gain A22 G_l^id(k), the quantized a posteriori signal to noise ratio A16 γ̃_l(k) and the quantized a priori signal to noise ratio A14 ξ̃_l(k) of the respective frame l are associated with each other for each discrete frequency k and preferably buffered in a buffer shown in block B46.
  • A separate buffer is provided for each value of the wanted signal activity detector VAD.
  • Respective triplets of the ideal gain A22 G_l^id(k), the quantized a priori signal to noise ratio A14 ξ̃_l(k) and the quantized a posteriori signal to noise ratio A16 γ̃_l(k) are determined in blocks B24 to B46 for all discrete frequencies k and all frames l.
  • A24 G_{l,VAD}^id(k) refers to the ideal gain associated with wanted signal absence or presence, respectively.
  • Gain values A26 G_VAD(k) associated with the respective quantized a priori signal to noise ratios A14 ξ̃_l(k) and the respective quantized a posteriori signal to noise ratios A16 γ̃_l(k) are then determined for each discrete frequency k and for each value of the quantized a priori signal to noise ratio A14 ξ̃_l(k) and the quantized a posteriori signal to noise ratio A16 γ̃_l(k).
  • The quantized a priori signal to noise ratios A14 ξ̃_l(k) and the quantized a posteriori signal to noise ratios A16 γ̃_l(k) range from -15 dB to 20 dB with a resolution of 1 dB.
  • The gain values A26 G_VAD(k) are preferably determined by averaging all associated ideal gains A24 G_{l,VAD}^id(k).
  • The resulting value range of the gain value A26 G_VAD(k) is shown for one given discrete frequency and one value of the wanted signal activity detector. The gain values A26 G_VAD(k) for all other discrete frequencies k, separated by the value of the wanted signal activity detector VAD, are determined in the same way.
  • Perceptual scale gain values A28 G_VAD^Bark(m) are determined.
  • The Perceptual scale is a psychoacoustical scale. It has up to 24 subbands m and corresponds to the first 24 critical bands of hearing. f_s represents the sampling frequency used to obtain the digitized audio signal A4 y_l(n), the digitized noise A38 n_l(n) and the digitized wanted signal A34 x_l(n); it may, for example, be 8 kHz.
  • A "G" directly followed by brackets containing a placeholder for the discrete frequency k represents a matrix of the gain values A26 G_VAD(k) associated with the respective discrete frequencies k.
  • The Perceptual scale gain values A28 G_VAD^Bark(m) for the respective Perceptual scale subbands m are then determined for all values of the quantized a priori signal to noise ratio A14 ξ̃_l(k) and the quantized a posteriori signal to noise ratio A16 γ̃_l(k), depending on the gain values A26 G_VAD(k) of those discrete frequencies k that fall within the respective subband m and that are associated with the same quantized a priori and a posteriori signal to noise ratios. This is preferably achieved by averaging the respective gain values A26 G_VAD(k).
  • A capital G with a raised 'Perceptual' superscript followed by a placeholder represents the matrix of the Perceptual scale gain values A28 G_VAD^Bark(m) for the respective Perceptual scale subband m.
  • A parameterisation of the Perceptual scale gain values A28 G_VAD^Bark(m) is accomplished in block B20.
  • Parameters of an approximating function are determined by curve fitting of the Perceptual scale gain values A28 G_VAD^Bark(m) associated with a respective quantized a posteriori signal to noise ratio A16 γ̃_l(k).
  • The polynomial coefficients A30 C_{γ̃_l(k)} are preferably determined by a curve-fitting method known to the person skilled in the art, in particular by minimizing the mean square error (least squares).
  • The saturation level A32 H_sat may be determined by searching for the maximum of the respective Perceptual scale gain values A28 G_VAD^Bark(m), which then also determines the second range parameter A46 b_{γ̃_l(k)}.
  • The curve fitting of the Perceptual scale gain values A28 G_VAD^Bark(m) is then conducted in the range of values of the quantized a priori signal to noise ratio A14 ξ̃_l(k) between the first range parameter A44 a_{γ̃_l(k)} and the second range parameter A46 b_{γ̃_l(k)}.
  • The Perceptual scale gain values A28 G_VAD^Bark(m) in speech absence do not completely suppress the wanted signal.
  • A non-zero weighting rule value, that is a non-zero Perceptual scale gain value A28, during wanted signal absence may help to preserve the naturalness of the wanted signal and of the noise, in particular in the transition from wanted signal presence to wanted signal absence or vice versa.
  • A capital P stands for a polynomial obtained from the respective polynomial coefficients A30 C_{γ̃_l(k)}.
  • A capital P followed by brackets with a placeholder for the subband m represents the respective polynomial associated with subband m.
  • FIG. 6 shows block B5 of Figure 4 in more detail.
  • A block B50 is operable to conduct a discrete Fourier transformation DFT of the respective frame l of the digitized audio signal A4 y_l(n). The output of block B50 is the respective noisy audio signal spectrum A6 Y_l(k) associated with the frame l.
  • The squared amplitude of the noisy audio signal spectrum A6 Y_l(k) at the respective discrete frequency k is computed by taking its absolute value and squaring it. This is done for all discrete frequencies.
  • A block B54 conducts minimum statistics in order to obtain the noise estimate A18; it operates in the same way as block B30.
  • In a block B56 the quantized a posteriori signal to noise ratio A16 γ̃_l(k) and the quantized a priori signal to noise ratio A14 ξ̃_l(k) are obtained.
  • The a posteriori signal to noise ratio A12 γ̂_l(k) is calculated and quantized with the formulas of blocks B34 and B36.
  • The a priori signal to noise ratio A10 ξ̂_l(k) is calculated with the formulas of block B58, which differ from those of block B38 in that, instead of the wanted signal spectrum A36 X_l(k), an estimated wanted signal spectrum A50 X̂_l(k) is used, which is obtained recursively by the procedure of the subsequent blocks within block B5.
  • The quantized a priori signal to noise ratio A14 ξ̃_l(k) is then obtained by applying the formula of block B40.
  • The wanted signal activity detector VAD is estimated in the same way as in block B42.
  • The approximating function for the Perceptual scale gain values A28 G_VAD^Bark(m) is determined depending on the quantized a posteriori signal to noise ratio A16 γ̃_l(k) and the wanted signal activity detector VAD by retrieving the associated parameters of the approximating function, preferably the respective polynomial coefficients A30 C_{γ̃_l(k)} and the respective saturation level A32 H_sat, preferably together with the first and second range parameters A44 a_{γ̃_l(k)} and A46 b_{γ̃_l(k)}.
  • In a block B64, the Perceptual scale gain value A28 G_VAD^Bark(m) associated with the current quantized a priori signal to noise ratio A14 ξ̃_l(k) is calculated and then multiplied, at a multiplication point M1, with the respective value of the noisy audio signal spectrum A6 Y_l(k). This is done for all discrete frequencies k of the respective frame l.
  • The values obtained in this way represent the estimated wanted signal spectrum A50 X̂_l(k) and are subjected to an inverse discrete Fourier transformation IDFT, which results in an estimated digitized wanted signal A48 x̂_l(n).
  • The input of block B66 is the estimated wanted signal spectrum A50 X̂_l(k) for the respective frame l.
  • The input data for the training device provided by blocks B10 and B12 may consist of four different utterances spoken by different speakers, four male and four female, and of 84 car noise signals taken, for example, from the NTT-AT databases.
  • The polynomial function used for approximation preferably has an order between 4 and 12; it may, however, also have an order higher than 12 if enough memory space is available.
  • The wanted signal activity detector may also be referred to as wanted signal activity detection. In a particular case it may be a voice activity detector or a wanted signal absence probability.
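The SNR steps of blocks B34 to B40 can be sketched in Python as follows. Because the formulas of blocks B34 and B38 are only shown in the figures, this minimal sketch assumes the usual decision-directed form: the a posteriori SNR is the noisy power divided by the noise estimate, the interim a priori SNR weights a reference clean power (w = 0.98) against max(γ - 1, 0), the result is floored at ξ_min = -15 dB, and both ratios are rounded to a 1 dB grid clipped to -15 to 20 dB. The function names are hypothetical. As a simplification, the reference clean power is passed in directly (during training it would come from the wanted signal spectrum A36, at runtime in block B58 from the recursively estimated spectrum A50 of the previous frame), and the current noise estimate is used in both terms.

```python
import math

def db(x: float) -> float:
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(x)

def quantize_db(ratio: float, step_db: float = 1.0,
                lo_db: float = -15.0, hi_db: float = 20.0) -> float:
    """Quantize a power ratio on a logarithmic grid with spacing step_db,
    clipped to [lo_db, hi_db]."""
    if ratio <= 0.0:
        return lo_db
    q = round(db(ratio) / step_db) * step_db
    return min(max(q, lo_db), hi_db)

def snr_estimates(noisy_power, noise_power, ref_clean_power,
                  w: float = 0.98, xi_min_db: float = -15.0):
    """Return (quantized a priori SNR, quantized a posteriori SNR) in dB per bin.

    Assumed decision-directed form (the exact formulas of blocks B34/B38
    are not reproduced in the text):
      gamma      = |Y|^2 / noise
      xi_interim = w * ref_clean / noise + (1 - w) * max(gamma - 1, 0)
      xi         = max(xi_interim, xi_min)
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)
    out = []
    for y2, n2, x2 in zip(noisy_power, noise_power, ref_clean_power):
        gamma = y2 / n2                                   # a posteriori SNR
        xi_interim = w * x2 / n2 + (1.0 - w) * max(gamma - 1.0, 0.0)
        xi = max(xi_interim, xi_min)                      # floor at xi_min
        out.append((quantize_db(xi), quantize_db(gamma)))
    return out
```

For example, a bin with noisy power 10, noise estimate 1 and no clean-power history gives an a posteriori SNR of 10 dB and an interim a priori SNR of 0.18, which quantizes to -7 dB.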
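The minimum-statistics search of blocks B30 and B54, a per-frequency minimum of the noisy power across frames, can be illustrated with the bare-bones tracker below. The sliding window length is a hypothetical parameter, and practical minimum-statistics estimators additionally smooth the periodogram and compensate the bias of the minimum; this sketch only shows the search itself.

```python
from collections import deque

def min_tracking_noise(power_frames, window: int = 50):
    """Simplified minimum-statistics noise estimate.

    power_frames: list of frames, each a list of |Y_l(k)|^2 values per bin k.
    For every bin k, the noise estimate of frame l is the minimum of the
    noisy power over the last `window` frames (including frame l).
    """
    num_bins = len(power_frames[0])
    history = [deque(maxlen=window) for _ in range(num_bins)]
    estimates = []
    for frame in power_frames:
        est = []
        for k, p in enumerate(frame):
            history[k].append(p)        # oldest value drops out automatically
            est.append(min(history[k]))
        estimates.append(est)
    return estimates
```

With a window of two frames, the powers [[4, 1], [2, 3], [5, 0.5]] yield the estimates [[4, 1], [2, 1], [2, 0.5]].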
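The table-building steps of blocks B44, B46, B16 and B18 (collecting triplets, averaging ideal gains into G_VAD(k), then averaging over the bins of each Perceptual scale subband) might look roughly like this. The ideal-gain formula of block B28 is not given in the text, so the sketch consumes precomputed gains; the tuple layout, the dictionary keys and the function names are assumptions.

```python
from collections import defaultdict
from statistics import mean

def average_gains(triplets):
    """Build G_VAD(k): average the ideal gains per (vad, k, xi_q, gamma_q).

    `triplets` is an iterable of (vad, k, xi_q, gamma_q, ideal_gain) tuples
    collected over all frames l and discrete frequencies k.
    """
    buckets = defaultdict(list)
    for vad, k, xi_q, gamma_q, g in triplets:
        buckets[(vad, k, xi_q, gamma_q)].append(g)
    return {key: mean(vals) for key, vals in buckets.items()}

def bark_average(gain_table, bin_to_band):
    """Build G_VAD^Bark(m): average G_VAD(k) over all bins k of subband m
    that share the same VAD value and quantized SNR pair."""
    buckets = defaultdict(list)
    for (vad, k, xi_q, gamma_q), g in gain_table.items():
        buckets[(vad, bin_to_band[k], xi_q, gamma_q)].append(g)
    return {key: mean(vals) for key, vals in buckets.items()}
```

The second function averages the per-bin gains G_VAD(k) without weighting them by how many training triplets each bin received, which matches the averaging of the gain values described above.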
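The curve fitting of block B20, a polynomial over the quantized a priori SNR range from a to b plus a saturation level H_sat, can be sketched with numpy.polyfit, which performs a least-squares fit. This assumes one gain curve per combination of VAD value, subband m and quantized a posteriori SNR, takes a as the smallest a priori SNR in the data, clamps inputs below a to a, and uses order 6 as one choice inside the 4 to 12 range; all names are hypothetical.

```python
import numpy as np

def fit_poly_with_saturation(xi_q, gains, order=6):
    """Fit gains(xi_q) by a polynomial on [a, b] and a saturation level above b.

    b is the a priori SNR at which the gain reaches its maximum H_sat.
    Returns (coeffs, h_sat, a, b).
    """
    xi_q = np.asarray(xi_q, dtype=float)
    gains = np.asarray(gains, dtype=float)
    i_max = int(np.argmax(gains))
    h_sat = float(gains[i_max])                # saturation level H_sat
    b = float(xi_q[i_max])                     # second range parameter
    a = float(xi_q.min())                      # first range parameter (assumed)
    mask = xi_q <= b
    deg = min(order, int(mask.sum()) - 1)      # never exceed the point count
    coeffs = np.polyfit(xi_q[mask], gains[mask], deg)  # least-squares fit
    return coeffs, h_sat, a, b

def eval_gain(xi, coeffs, h_sat, a, b):
    """Evaluate the approximating function at a quantized a priori SNR xi."""
    if xi >= b:
        return h_sat
    return float(np.polyval(coeffs, max(xi, a)))
```

Only the coefficients, H_sat, a and b need to be stored per curve instead of one gain value for every quantized a priori SNR, which is where the additional memory saving comes from.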
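The VAD decision described above, a smoothed a priori SNR compared against a threshold, could be sketched as follows. The text does not specify how the smoothing is done, so this assumes the quantized a priori SNR (in dB) is averaged over all bins of a frame and then smoothed recursively across frames; the smoothing factor alpha and the 3 dB threshold are hypothetical values.

```python
def vad_decisions(xi_q_frames, alpha=0.9, threshold_db=3.0):
    """Return one VAD value per frame: 1 = wanted signal present, 0 = absent.

    xi_q_frames: list of frames, each a list of quantized a priori SNRs in dB.
    """
    smoothed = None
    decisions = []
    for frame in xi_q_frames:
        frame_mean = sum(frame) / len(frame)          # average over bins k
        if smoothed is None:
            smoothed = frame_mean
        else:                                         # first-order recursive smoothing
            smoothed = alpha * smoothed + (1.0 - alpha) * frame_mean
        decisions.append(1 if smoothed > threshold_db else 0)
    return decisions
```

The smoothing delays the switch to "present" by a frame or two, which avoids toggling the gain tables on isolated noisy frames.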

Abstract

A signal processing method comprises the steps of
- acquisition of an audio signal (A1 y(t)),
- periodically digitizing the audio signal (A1 y(t)), resulting in frames (l) of the digitized audio signal (A4 y_l(n)),
- determining a noisy audio signal spectrum (A6 Y_l(k)) for each frame (l) of the digitized audio signal (A4 y_l(n)),
- determining quantized a priori and a posteriori signal to noise ratios (A14 ξ̃_l(k), A16 γ̃_l(k)) depending on the noisy audio signal spectrum (A6 Y_l(k)) for the provided discrete frequencies (k) of each frame (l),
- determining, for the provided discrete frequencies (k), given associated Perceptual scale gain values (A28 G_VAD^Bark(m)) dependent on the quantized a priori and a posteriori signal to noise ratios (A14 ξ̃_l(k), A16 γ̃_l(k)), the given Perceptual scale gain values (A28 G_VAD^Bark(m)) being provided on a Perceptual scale for respective Perceptual scale subbands (m),
- multiplying the respective spectral values of the noisy audio signal spectrum (A6 Y_l(k)) of the respective frame (l) with the determined respective Perceptual scale gain values (A28 G_VAD^Bark(m)), resulting in estimated wanted spectrum values (A50 X̂_l(k)), and
- determining an estimated digitized wanted signal (A48 x̂_l(n)) dependent on the estimated wanted spectrum values (A50 X̂_l(k)).
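A rough per-frame sketch of the method steps above, assuming NumPy's real FFT for the DFT and IDFT, a decision-directed a priori SNR with w = 0.98 and a -15 dB floor, and 1 dB quantization on the -15 to 20 dB range. gain_lookup is a hypothetical callable standing in for evaluating the stored approximating function for the current VAD value, and the noise estimate is assumed to be supplied by a separate minimum-statistics stage.

```python
import numpy as np

def quantize_db(ratio, lo_db=-15.0, hi_db=20.0):
    """Round power ratios to a 1 dB grid and clip them to [lo_db, hi_db]."""
    return np.clip(np.round(10.0 * np.log10(np.maximum(ratio, 1e-12))), lo_db, hi_db)

def enhance_frame(frame, noise_power, prev_clean_power, band_of_bin, gain_lookup,
                  w=0.98, xi_min_db=-15.0):
    """Enhance one frame y_l(n); returns x_hat_l(n) and |X_hat_l(k)|^2.

    noise_power:      noise estimate per rfft bin (e.g. from minimum statistics)
    prev_clean_power: |X_hat_{l-1}(k)|^2 of the previous frame
    band_of_bin:      perceptual subband index m for each rfft bin k
    gain_lookup:      hypothetical callable (m, xi_q_db, gamma_q_db) -> gain
    """
    Y = np.fft.rfft(frame)                          # noisy spectrum Y_l(k)
    noisy_power = np.abs(Y) ** 2
    gamma = noisy_power / noise_power               # a posteriori SNR
    xi = w * prev_clean_power / noise_power + (1.0 - w) * np.maximum(gamma - 1.0, 0.0)
    xi = np.maximum(xi, 10.0 ** (xi_min_db / 10.0)) # floor at xi_min
    xi_q, gamma_q = quantize_db(xi), quantize_db(gamma)
    gains = np.array([gain_lookup(band_of_bin[k], xi_q[k], gamma_q[k])
                      for k in range(len(Y))])
    X_hat = gains * Y                               # estimated wanted spectrum
    x_hat = np.fft.irfft(X_hat, n=len(frame))       # back to the time domain
    return x_hat, np.abs(X_hat) ** 2
```

The returned power spectrum is what the next call would receive as prev_clean_power, which is how the recursion of the a priori SNR estimate is closed.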

Description

  • The invention relates to a signal processing method and a signal processing device. It further relates to a training method and a training device.
  • Noise reduction is an important task in signal processing, in particular for speech enhancement when a speech signal containing a certain amount of noise is processed. For example, when a mobile phone is operated in a car via a hands-free speaking system, the background noise of the car may add a substantial amount of noise to the speech signal and thereby decrease its quality. A common approach to speech enhancement by noise reduction is the Wiener filter. Wiener filters are characterized by the assumption that the signal and the additive noise are stochastic processes with known spectral characteristics or known autocorrelation or cross-correlation. They are further characterized by a performance criterion such as the minimum mean-square error, and an optimal filter of this kind may be determined from a solution based on scalar methods. The goal of the Wiener filter is to filter out, by statistical means, noise that has corrupted a signal.
  • Environmental noise degrades both speech quality and intelligibility in voice calls from mobile phones. Methods for speech enhancement aim at reducing the noise to a reasonable level while keeping the speech signal as undistorted as possible.
  • One approach to achieving this is to apply a weighting rule to the noisy speech spectral amplitudes in order to estimate the clean speech component. The derivation of the weighting rule may be formulated as an optimization problem using criteria such as the minimum mean square error of spectral amplitudes or of log-spectral amplitudes, or perceptually motivated variants of these. Such approaches have been disclosed in:
    [1] P. Scalart and J.V. Filho, "Speech Enhancement Based on A Priori Signal to Noise Estimation," in Proc. of ICASSP'96, Atlanta, GA, May 1996, pp. 629-632.
    [2] Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 32, no. 6, pp. 1109-1121, Dec. 1984.
    [3] Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 33, no. 2, pp. 443-445, Apr. 1985.
    [4] P.C. Loizou, "Speech Enhancement Based on Perceptually Motivated Bayesian Estimators of the Magnitude Spectrum," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 857-869, Sept. 2005.
  • A further approach is to model the spectra of clean speech and noise using probability density functions (PDF). The probability density functions of the real and imaginary parts of the clean speech spectrum may be modelled as Gaussian, as disclosed in [2,3]; more recent work, however, shows that a Gamma PDF [insert paper 5!] or a super-Gaussian PDF [insert paper 6] leads to better results.
  • The selection of the error criterion and of the PDF modelling the clean speech spectrum is critical, since wrong choices lead to higher residual noise and to speech distortion. To circumvent this problem, general estimators have been derived that compute a weighting rule from training speech data instead of from an explicit formulation of the clean speech spectrum PDF [[7] J.E. Porter and S.F. Boll, "Optimal Estimators for Spectral Restoration of Noisy Speech," in Proc. of ICASSP'84, San Diego, California, Mar. 1984, pp. 18A.2.1-18A.2.4.].
  • It is an object of the invention to create a signal processing method and a signal processing device that require only a feasible amount of memory. According to a further aspect of the invention, it is an object to provide a training method and a training device designed to enable signal enhancement with a feasible amount of memory.
  • The object is achieved by the features of the independent claims.
  • According to a first aspect of the invention, a signal processing method and a corresponding signal processing device are provided. The signal processing method comprises the step of acquiring an audio signal. It further comprises periodically digitizing the audio signal, resulting in frames of the digitized audio signal. A noisy audio signal spectrum is determined for each frame of the digitized audio signal. Quantized a priori and a posteriori signal to noise ratios are determined depending on the noisy audio signal spectrum for the provided discrete frequencies of each frame. For the provided discrete frequencies, given associated Perceptual scale gain values are determined dependent on the quantized a priori and a posteriori signal to noise ratios. The given Perceptual scale gain values are provided on a Perceptual scale for respective Perceptual scale subbands. They may be provided on a Bark scale for respective Bark scale subbands. The Bark scale is a psychoacoustical scale. It ranges from 1 to 24 and corresponds to the first 24 critical bands of hearing. The successive band edges, in hertz, are 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000 and 15500. The Perceptual scale gain values may, however, also be provided on a Mel scale or some other type of perceptual scale. The respective spectral values of the noisy audio signal spectrum of the respective frame are multiplied with the determined respective Perceptual scale gain values, resulting in estimated wanted spectrum values. An estimated digitized wanted signal is determined dependent on the estimated wanted spectrum values.
  • Depending on the sampling frequency of the digitized audio signal, not all subbands of the Perceptual scale may be used. Even if all subbands are used, the Bark scale has at most 24 subbands, so the number of Perceptual scale gain values is much lower than if gain values were associated directly with each discrete frequency. Therefore the memory needed for storing the Perceptual scale gain values is fairly low.
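To put rough numbers on this saving, the sketch below counts the stored gains of a lookup table indexed by VAD state, band, and quantized a priori and a posteriori SNR. All figures are assumptions for illustration: two VAD states, 36 levels per SNR axis (-15 dB to 20 dB in 1 dB steps, the range given later in this description), a hypothetical 256-point DFT with 129 non-negative bins, and the 19 Bark subbands below 4 kHz at fs = 8 kHz.

```python
def table_entries(n_bands: int, n_levels: int, n_vad_states: int = 2) -> int:
    """Stored gains for a table indexed by (VAD state, band, quantized xi, quantized gamma)."""
    return n_vad_states * n_bands * n_levels * n_levels


n_levels = len(range(-15, 21))           # -15 dB .. 20 dB in 1 dB steps -> 36 levels
per_bin = table_entries(129, n_levels)   # hypothetical 256-point DFT: bins k = 0 .. 128
per_bark = table_entries(19, n_levels)   # first 19 Bark subbands at fs = 8 kHz
print(per_bin, per_bark, round(per_bin / per_bark, 1))  # 334368 49248 6.8
```

Under these assumptions the per-subband table is smaller by the ratio of bins to subbands, here roughly a factor of 6.8.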
  • According to a preferred embodiment of the first aspect of the invention, the associated Perceptual scale gain values are determined from an approximating function associated with the respective quantized a posteriori signal to noise ratio. The approximating function depends on the respective quantized a priori signal to noise ratio. This can reduce the memory needed to store the data required for the signal processing even further.
  • According to a further preferred embodiment, the approximating function is a polynomial function. This uses the insight that a polynomial function is typically well suited for approximating the Perceptual scale gain values associated with a respective a posteriori signal to noise ratio. It is particularly advantageous if the approximating function is a polynomial function combined with a saturation level. This uses the insight that, beyond a given point, the Perceptual scale gain values for a given quantized a posteriori signal to noise ratio typically reach a saturation level and may therefore simply be approximated by that level.
  • According to a further preferred embodiment the polynomial function has an order of between 4 and 12. In this range a reasonable trade-off between performance and storage requirements is obtained.
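A piecewise approximation of this kind might be evaluated as sketched below. The argument names, the coefficient layout (highest power first) and the behaviour below the first range parameter a (here the value P(a) is held) are assumptions for illustration; the description only states that a polynomial is combined with a saturation level.

```python
def approx_gain(xi_db: float, coeffs: list[float], a: float, b: float,
                h_sat: float) -> float:
    """Piecewise approximation of one Perceptual scale gain curve (sketch).

    coeffs: polynomial coefficients, highest power first (assumed layout)
    a, b:   first and second range parameters of the polynomial part
    h_sat:  saturation level used for xi_db >= b
    """
    if xi_db >= b:
        return h_sat
    x = max(xi_db, a)  # assumption: hold the polynomial value P(a) below a
    y = 0.0
    for c in coeffs:   # Horner's scheme
        y = y * x + c
    return y
```

As an example of the storage trade-off: with an order-8 polynomial, the parameters per subband, VAD value and quantized a posteriori SNR amount to 9 coefficients plus a, b and Hsat, i.e. 12 numbers, compared with 36 gain values on a 1 dB grid from -15 dB to 20 dB.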
  • According to a further preferred embodiment, the estimated digitized wanted signal is a digitized speech signal and the estimated wanted spectrum is an estimated speech spectrum. This allows a speech signal to be enhanced.
  • According to a further preferred embodiment, the quantization of the quantized a priori signal to noise ratio and/or the quantized a posteriori signal to noise ratio is on a logarithmic scale. This further helps to limit the necessary memory space.
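Quantizing an SNR on a logarithmic scale can be sketched as follows, assuming the 1 dB step and the -15 dB to 20 dB range mentioned elsewhere in this description. Rounding to the nearest step, clipping at the range limits and the small floor that avoids the logarithm of zero are assumptions of this sketch.

```python
import math


def quantize_snr_db(snr_linear: float, step_db: float = 1.0,
                    lo_db: float = -15.0, hi_db: float = 20.0) -> float:
    """Quantize a linear SNR onto a dB grid with spacing step_db, clipped to [lo_db, hi_db]."""
    snr_db = 10.0 * math.log10(max(snr_linear, 1e-12))  # floor avoids log10(0)
    q = step_db * round(snr_db / step_db)                # nearest grid point
    return min(max(q, lo_db), hi_db)


print(quantize_snr_db(2.0), quantize_snr_db(1000.0))  # 3.0 20.0
```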
  • According to a further preferred embodiment, the Perceptual scale gain values are determined depending on a wanted signal activity detector. This further improves the noise reduction and the overall signal quality.
  • According to a second aspect of the invention, a training method and a corresponding training device are provided. The method comprises the steps of providing frames of a digitized audio signal, frames of a digitized wanted signal and frames of a digitized noise. Preferably, the digitized audio signal, the digitized wanted signal and the digitized noise are recorded in the environment in which the signal processing method is to be conducted.
  • A noisy audio signal spectrum is determined for each frame of the digitized audio signal, a wanted signal spectrum for each frame of the digitized wanted signal and a noise spectrum for each frame of the digitized noise. Quantized a priori and a posteriori signal to noise ratios are determined for the provided discrete frequencies of each frame depending on the noisy audio signal spectrum and on the wanted signal spectrum. Gain values for the provided discrete frequencies are determined depending on the noise spectra and wanted signal spectra associated with the respective discrete frequencies. The quantized a priori and a posteriori signal to noise ratios of the respective discrete frequencies are associated with the respective gain values. Perceptual scale gain values associated with the quantized a priori and a posteriori signal to noise ratios are then determined from the gain values of those discrete frequencies that fall within the respective Perceptual scale subband and are associated with the same quantized a priori and a posteriori signal to noise ratios.
  • Here, too, advantage is taken of the relatively low number of Perceptual scale subbands, which greatly reduces the memory needed for storing the Perceptual scale gain values without a subjective loss in signal enhancement when the Perceptual scale gain values are used for the signal processing.
  • According to a preferred embodiment of the second aspect of the invention parameters of an approximating function are determined by curve fitting of Perceptual scale gain values associated to a respective quantized a posteriori signal to noise ratio. In this way the memory space can further be reduced. In this respect it is particularly advantageous if the approximating function is a polynomial function. According to a further preferred embodiment the approximating function is a polynomial function and a saturation level. In this respect it is particularly advantageous if the polynomial function has an order between 4 and 12.
  • According to a further preferred embodiment, the quantization of the quantized a priori signal to noise ratio and of the quantized a posteriori signal to noise ratio is on a logarithmic scale.
  • According to a further preferred embodiment the estimated digitized wanted signal is a digitized speech signal and the estimated wanted spectrum is an estimated speech spectrum. It is in particular advantageous if the Perceptual scale gain values are determined depending on a wanted signal activity detector. It is also in particular advantageous if the parameters of the approximating function are determined depending on a wanted signal activity detector.
  • According to a further aspect of the invention, a computer program product is provided comprising a computer readable medium embodying program instructions executable by a computer in order to conduct the signal processing method according to the first aspect of the invention.
  • According to a further aspect of the invention, a computer program product is provided comprising a computer readable medium embodying program instructions executable by a computer in order to conduct the training method according to the second aspect of the invention.
  • Exemplary embodiments of the invention are explained in the following with the aid of schematic drawings. These are as follows:
  • Figure 1, a block diagram of a signal processing device,
    Figure 2, a block diagram of a training device,
    Figure 3, a detailed block diagram of the training device,
    Figure 4, a further detailed block diagram of further parts of the training device,
    Figure 5, a further block diagram of further parts of the training device,
    Figure 6, a detailed block diagram of parts of the signal processing device,
    Figures 7A to 7D, diagrams of Perceptual scale gain values,
    Figure 8, a further Perceptual scale gain value diagram,
    Figure 9, a further Perceptual scale gain value diagram,
    Figures 10A and 10B, original gain values,
    Figures 10C and 10D, approximated gain values,
    Figures 10E and 10F, approximation errors,
    Figure 11, segmental SSDRs and
    Figure 12, segmental SSDRs in speech presence.
  • Elements of the same design or function that appear in different illustrations are identified with the same reference characters.
  • Figure 1 shows a signal processing device. It comprises a block B1, which is operable to sense an audio signal A1 y(t) and may be embodied as a microphone. Block B2 comprises an analog/digital converter ADC, block B3 performs single sample processing and block B4 performs echo cancellation. The output of block B4 is a digitized audio signal A4 y1(n). The audio signal A1 y(t) is periodically digitized, resulting in frames 1 of the digitized audio signal A4 y1(n). Each frame 1 therefore comprises a set of values of the digitized audio signal A4 y1(n). The reference numeral 1 for a frame is also used as a frame index, and n is a placeholder for the respective sample of the digitized audio signal A4 y1(n). The echo cancellation in block B4 may be accomplished by a preprocessing filter suitable for echo cancellation.
  • A block B5 is operable to conduct noise reduction and is described in further detail with the aid of Figure 6. Further blocks may follow; a further block B6 comprises an encoder which may encode the estimated digitized wanted signal A48 x̂_l(n), for example in order to transmit it via an antenna.
  • The signal processing device may be embodied in a cell phone; it may, however, also be part of a hands-free speaking system or be embodied in another mobile communication device. It may also be embodied in a non-mobile communication device or another device known to a person skilled in the art.
  • The signal processing device comprises a storage device for storing data and a program code being run on a processor of the signal processing device during operation of the signal processing device. The processor may preferably comprise a digital signal processor (DSP).
  • Figure 2 shows a block diagram of a training device. It comprises a block B10 with a speech database containing the wanted signal and a block B12 with a noise database. The speech database may in general contain any wanted signal, which is not limited to a speech signal. The wanted signal is preferably a speech signal but may also be of a different kind, for example a music signal. The noise database preferably contains typical car noise; such car noise signals may be taken, for example, from the NTT and NTT-AT databases [[11] NTT-AT Speech Database, "Multi-Lingual Speech Database for Telephonometry 1994," http://www.ntt-at.com/produets_e/ speech/index.html, 1994.; [12] NTT-AT Noise Database, "Ambient Noise Database for Telephonometry 1996," http://www.ntt-at.com/ products_e/noise-DB/index.html, 1996].
  • The speech database may, in the case of speech being the wanted signal, comprise various utterances spoken by different speakers, in particular male and female.
  • Block B14 may comprise an analog/digital converter ADC as well as the functionality of single sample processing and echo cancellation. In block B14, frames 1 of the digitized wanted signal (A34 x1(n)) and frames 1 of the digitized noise (A38 n1(n)) are determined. Each frame 1 preferably has the same given length, for example 200 samples. Preferably, each frame 1 of the digitized audio signal (A4 y1(n)) is obtained by summing the respective frames 1 of the digitized noise (A38 n1(n)) and of the digitized wanted signal (A34 x1(n)). This may also be accomplished in a block B16, which is designed to determine gain values A26 GVAD(k) and is explained in further detail with the aid of Figure 3 below. A block B18 is designed to determine Perceptual scale gain values A28 G_VAD^Bark(m).
  • A block B20 is provided for determining parameters of an approximating function by curve fitting and is further described by the aid of Figure 5 below. The determined parameters are then stored in a block B20, which may be part of a data storage device. The parameters are then preferably stored in the respective storage device of the signal processing device for conducting the noise reduction of block B5.
  • The respective frames 1 of the digitized noise A38 n1(n), the digitized wanted signal A34 x1(n) and the digitized audio signal A4 y1(n) are all subjected to a discrete Fourier transformation DFT in a block B24. The outputs of block B24 are the noise spectra A40 N1(k), the wanted signal spectra A36 X1(k) and the noisy audio signal spectra A6 Y1(k), each associated with the respective frame 1. The index k denotes the respective discrete frequency.
  • Preferably, the squared magnitudes of the noise spectra A40 N1(k), the wanted signal spectra A36 X1(k) and the noisy audio signal spectra A6 Y1(k) are also determined by taking the respective absolute values and squaring them. In a block B26, respective ideal gains A22 G_l^id(k) are then determined with the aid of the formula shown in block B28. The index k is always a placeholder for the discrete frequency; depending on the number of samples in the respective frame, it takes values from k = 0 up to K-1.
  • A block B30 performs minimum statistics noise estimation. The output of block B30 is a noise estimate A18 λ̂_N,l(k). Preferably, the minimum statistics are obtained by searching, at each given discrete frequency k, for the minimum of the noisy audio signal spectra A6 Y1(k) across the provided frames 1. In this way, both stationary and non-stationary noise may be estimated with relatively high quality. A block B32 determines a quantized a priori signal to noise ratio A14 ξ̃_l(k) and a quantized a posteriori signal to noise ratio A16 γ̃_l(k). An a posteriori signal to noise ratio A12 γ̂_l(k) is preferably determined with the aid of the formula shown in block B34. The quantized a posteriori signal to noise ratio A16 γ̃_l(k) is then obtained in a block B36 by quantizing the a posteriori signal to noise ratio A12 γ̂_l(k), preferably on a logarithmic scale with a step size A52 Δ of, for example, 1 dB.
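The section names minimum statistics but does not spell out the algorithm or the formula of block B34. The sketch below is a deliberately simplified stand-in: it assumes recursive smoothing of |Y_l(k)|² with a hypothetical factor alpha and a sliding minimum over a hypothetical window of frames, and it omits the bias compensation used in full minimum statistics methods. The a posteriori SNR is assumed to take the common form |Y_l(k)|²/λ̂_N,l(k).

```python
from collections import deque


def min_statistics_noise(power_frames: list[list[float]], window: int = 50,
                         alpha: float = 0.85) -> list[list[float]]:
    """Simplified noise estimate for every frame l and bin k (sketch, no bias compensation).

    The periodogram |Y_l(k)|^2 is smoothed recursively with factor alpha; the noise
    estimate is the minimum of the smoothed values over the last `window` frames.
    """
    smoothed: list[float] = []
    history: deque = deque(maxlen=window)  # keeps only the most recent `window` frames
    estimates = []
    for l, frame in enumerate(power_frames):
        if l == 0:
            smoothed = list(frame)
        else:
            smoothed = [alpha * s + (1.0 - alpha) * p for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        estimates.append([min(h[k] for h in history) for k in range(len(frame))])
    return estimates


def a_posteriori_snr(power: list[float], noise: list[float]) -> list[float]:
    """Assumed form of block B34: gamma_l(k) = |Y_l(k)|^2 / lambda_N,l(k)."""
    return [p / max(n, 1e-12) for p, n in zip(power, noise)]
```

Because the estimate follows the minimum rather than the mean, a short burst of wanted signal (a large |Y_l(k)|² in a few frames) leaves the noise estimate largely unaffected.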
  • An interim a priori signal to noise ratio A54 ζ̂_l(k) is preferably obtained with the formula shown in block B38. Here w denotes a weighting factor, which may for example have a value of 0.98. max denotes the maximum function; it ensures that the interim a priori signal to noise ratio A54 ζ̂_l(k) is not calculated with a negative value of the a posteriori signal to noise ratio A12 γ̂_l(k), which might occur due to an error in the noise estimate A18 λ̂_N,l(k).
  • An a priori signal to noise ratio A10 ξ̂_l(k) is also determined in block B38, preferably with the aid of the formula shown, which comprises a maximum function with a limitation value A56 ζmin. The limitation value is set such that, expressed on a logarithmic scale, it corresponds to -15 dB.
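The formula of block B38 is not reproduced in the text. A common form consistent with the description (a weighting factor w, a max() that keeps the second term non-negative, and a floor ζmin at -15 dB) is the decision-directed estimate introduced in [2]; the sketch below assumes that form. In training, prev_clean_power would be |X(k)|² of the wanted signal; in block B58 it would be the estimate |X̂(k)|². Which frame the first term refers to is also an assumption here.

```python
ZETA_MIN = 10.0 ** (-15.0 / 10.0)  # limitation value A56 zeta_min: -15 dB as a linear ratio


def a_priori_snr(prev_clean_power: float, noise_power: float,
                 gamma: float, w: float = 0.98) -> float:
    """Assumed decision-directed a priori SNR estimate for one bin k (linear scale)."""
    noise_power = max(noise_power, 1e-12)
    # interim estimate: weighted clean-power term plus max(gamma - 1, 0) term
    zeta = w * prev_clean_power / noise_power + (1.0 - w) * max(gamma - 1.0, 0.0)
    return max(zeta, ZETA_MIN)  # floor at -15 dB
```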
  • The a priori signal to noise ratio A10 ξ̂_l(k) is then quantized in a block B40 in the same way as in block B36, resulting in a quantized a priori signal to noise ratio A14 ξ̃_l(k).
  • In a block B42, a wanted signal activity detector VAD is determined. In the preferred case of a speech signal being the wanted signal, a speech absence probability is determined. To determine it, the quantized a priori signal to noise ratio A14 ξ̃_l(k) of the respective frame is preferably smoothed and then compared with a given threshold representative of wanted signal presence or absence. The wanted signal activity detector VAD is preferably assigned a value of either 1 or 0, where 1 preferably represents the presence and 0 the absence of the wanted signal.
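A smoothed-threshold decision in the spirit of block B42 might look as follows. Averaging the quantized a priori SNR over all bins of a frame, the recursive smoothing factor beta and the 3 dB threshold are hypothetical choices for illustration only.

```python
def vad_decision(xi_quant_db: list[float], state_db: float,
                 beta: float = 0.9, threshold_db: float = 3.0) -> tuple[int, float]:
    """Return (vad, new_state_db); vad = 1 means wanted signal present, 0 absent."""
    frame_mean_db = sum(xi_quant_db) / len(xi_quant_db)        # average over bins k
    state_db = beta * state_db + (1.0 - beta) * frame_mean_db  # recursive smoothing
    return (1 if state_db > threshold_db else 0), state_db
```

The caller carries state_db from frame to frame; the smoothing delays the switch to presence after a pause, which avoids toggling on isolated high-SNR frames.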
  • In a block B44, the respective ideal gain A22 G_l^id(k), the quantized a posteriori signal to noise ratio A16 γ̃_l(k) and the quantized a priori signal to noise ratio A14 ξ̃_l(k) of the respective frame 1 are associated with each other for each discrete frequency k and are preferably buffered in a buffer shown in block B46. Preferably, there is a separate buffer for each value of the wanted signal activity detector VAD. Respective triplets of the ideal gain A22 G_l^id(k), the quantized a priori signal to noise ratio A14 ξ̃_l(k) and the quantized a posteriori signal to noise ratio A16 γ̃_l(k) are determined in blocks B24 to B46 for all discrete frequencies k of all frames 1. A24 G_l,VAD^id(k) refers to the ideal gain associated with wanted signal absence or presence, respectively.
  • In block B46, gain values A26 GVAD(k) associated with the respective quantized a priori signal to noise ratios A14 ξ̃_l(k) and quantized a posteriori signal to noise ratios A16 γ̃_l(k) are then determined for each discrete frequency k and for each value of the quantized a priori and a posteriori signal to noise ratios. Preferably, the quantized a priori signal to noise ratios A14 ξ̃_l(k) and the quantized a posteriori signal to noise ratios A16 γ̃_l(k) range from -15 dB to 20 dB with a resolution of 1 dB. The gain values A26 GVAD(k) are preferably determined by averaging all ideal gains A24 G_l,VAD^id(k) that are associated with the respective wanted signal activity detector value, the respective discrete frequency k and the respective quantized a priori and a posteriori signal to noise ratios A14 ξ̃_l(k), A16 γ̃_l(k). Block B46 shows the resulting value range of the gain value A26 GVAD(k) for one given discrete frequency and one value of the wanted signal activity detector. For all other discrete frequencies k, and separately for each value of the wanted signal activity detector VAD, the respective gain values A26 GVAD(k) are determined in the same way.
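The accumulation in blocks B44 and B46 can be sketched as a dictionary keyed by (VAD, k, ξ̃, γ̃) that averages the ideal gains falling into each cell. The formula of block B28 is not given in the text, so the ideal gain below assumes a Wiener-type form |X|²/(|X|² + |N|²) built from the wanted signal and noise power spectra; treat it as a placeholder. The function names are hypothetical.

```python
from collections import defaultdict


def ideal_gain(clean_power: float, noise_power: float) -> float:
    """Assumed ideal (Wiener-type) gain |X(k)|^2 / (|X(k)|^2 + |N(k)|^2)."""
    total = clean_power + noise_power
    return clean_power / total if total > 0 else 0.0


def average_gain_table(samples):
    """samples: iterable of (vad, k, xi_q, gamma_q, gain) tuples from all training frames.

    Returns {(vad, k, xi_q, gamma_q): mean of the ideal gains in that cell}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vad, k, xi_q, gamma_q, gain in samples:
        key = (vad, k, xi_q, gamma_q)
        sums[key] += gain
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Cells that never receive a training sample simply do not appear in the result; how such gaps are filled is not described in this section.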
  • In block B18, Perceptual scale gain values A28 G_VAD^Bark(m) are determined. The Perceptual scale is a psychoacoustical scale; in the case of the Bark scale, it has up to 24 subbands m, corresponding to the first 24 critical bands of hearing. fs denotes the sampling frequency used to obtain the digitized audio signal A4 y1(n), the digitized noise A38 n1(n) and the digitized wanted signal A34 x1(n); it may, for example, be 8 kHz.
  • Depending on the sampling frequency fs, only some of the subbands of the Perceptual scale may be used; at a sampling frequency of 8 kHz, for example, the first nineteen subbands m of the Perceptual scale may be used. A "G" directly followed by brackets containing a placeholder for the discrete frequency k represents a matrix of the gain values A26 GVAD(k) associated with the respective discrete frequencies k. The Perceptual scale gain values A28 G_VAD^Bark(m) for the respective Perceptual scale subbands m are then determined for all associated quantized a priori signal to noise ratios A14 ξ̃_l(k) and quantized a posteriori signal to noise ratios A16 γ̃_l(k) of the respective discrete frequencies k, dependent on those gain values A26 GVAD(k) that are associated with the discrete frequencies k falling within the respective Perceptual scale subband m and with the respective quantized a priori and a posteriori signal to noise ratios. This is preferably achieved by averaging the respective gain values A26 GVAD(k).
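The averaging of block B18 over the bins of each subband might be sketched as follows, taking a per-bin table keyed by (VAD, k, ξ̃, γ̃) and a list mapping each bin k to its subband m. This sketch uses an unweighted mean over the populated cells; weighting each cell by its number of training observations would be an equally plausible choice, and which one is intended is not stated. The function name is hypothetical.

```python
from collections import defaultdict


def bark_gain_table(gain_table, bin_to_band):
    """Average per-bin gains G_VAD(k) over the bins k belonging to each subband m.

    gain_table:  {(vad, k, xi_q, gamma_q): gain}
    bin_to_band: list with bin_to_band[k] = subband m of bin k
    Returns {(vad, m, xi_q, gamma_q): mean gain over the bins of subband m}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (vad, k, xi_q, gamma_q), g in gain_table.items():
        key = (vad, bin_to_band[k], xi_q, gamma_q)
        sums[key] += g
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```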
  • A capital G followed by a raised 'Perceptual' and a placeholder represents the matrix of the Perceptual scale gain values A28 G_VAD^Bark(m) for the respective Perceptual scale subband m.
  • A parameterisation of the Perceptual scale gain values A28 G_VAD^Bark(m) is accomplished in block B20. In block B20, parameters of an approximating function are determined by curve fitting of the Perceptual scale gain values A28 G_VAD^Bark(m) associated with a respective quantized a posteriori signal to noise ratio A16 γ̃_l(k).
  • Figures 7A to 7D show that such a parameterisation may well be accomplished by a polynomial function, preferably within a given range from a first range parameter A44 a_γ̃l(k) to a second range parameter A46 b_γ̃l(k), complemented by a saturation level A32 Hsat.
  • The polynomial coefficients A30 C_γ̃l(k) are preferably determined by a curve-fitting method known to the person skilled in the art, in particular by minimizing the mean square error (least squares). The saturation level A32 Hsat may be determined by searching for the maximum of the respective Perceptual scale gain values A28 G_VAD^Bark(m), which then also determines the second range parameter A46 b_γ̃l(k). The curve fitting of the Perceptual scale gain values A28 G_VAD^Bark(m) is then conducted in the range of values of the quantized a priori signal to noise ratio A14 ξ̃_l(k) between the first range parameter A44 a_γ̃l(k) and the second range parameter A46 b_γ̃l(k). For quantized a posteriori signal to noise ratios A16 γ̃_l(k) with an effective a posteriori signal to noise ratio A58 of, for example, 22 or higher, a single given value may be associated with all the respective Perceptual scale gain values A28 G_VAD^Bark(m). The same is true for values of 1 and lower of the quantized a posteriori signal to noise ratio A16 γ̃_l(k).
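The curve fitting of block B20 for one subband, one VAD value and one quantized a posteriori SNR might be sketched as follows: Hsat and b are taken from the maximum of the gain curve, as described above, and a polynomial is fitted by least squares on a ≤ ξ̃ ≤ b. The first range parameter a is passed in, since the text does not say how it is chosen. Solving the normal equations keeps this sketch dependency-free; for orders near 12 they become badly conditioned, and a QR- or SVD-based solver (for example NumPy's polynomial fitting routines) would be preferable. Function names are hypothetical.

```python
def polyfit_lstsq(xs: list[float], ys: list[float], order: int) -> list[float]:
    """Least-squares polynomial fit via the normal equations; coefficients highest power first."""
    n = order + 1
    # Normal equations A c = r with A[i][j] = sum(x^(i+j)) and r[i] = sum(y * x^i)
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for j in range(col, n):
                A[row][j] -= f * A[col][j]
            r[row] -= f * r[col]
    # Back substitution; c[i] multiplies x^i
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (r[i] - sum(A[i][j] * c[j] for j in range(i + 1, n))) / A[i][i]
    return c[::-1]


def fit_gain_curve(xi_db: list[float], gains: list[float], order: int, a: float):
    """Return (coeffs, a, b, h_sat) for one gain curve over the quantized a priori SNR."""
    i_max = max(range(len(gains)), key=lambda i: gains[i])
    h_sat, b = gains[i_max], xi_db[i_max]  # saturation level and second range parameter
    pts = [(x, g) for x, g in zip(xi_db, gains) if a <= x <= b]
    if len(pts) <= order:
        raise ValueError("not enough points in [a, b] for the requested order")
    coeffs = polyfit_lstsq([x for x, _ in pts], [g for _, g in pts], order)
    return coeffs, a, b, h_sat
```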
  • In Figures 7A to 7D, it may be noted that the Perceptual scale gain values A28 G_VAD^Bark(m) in speech absence do not completely suppress the wanted signal. A non-zero weighting rule value, that is a non-zero Perceptual scale gain value A28, in wanted signal absence may help to preserve the naturalness of the wanted signal and the noise, in particular in the transition from wanted signal presence to wanted signal absence or vice versa.
  • It may also be noted that during a wanted signal pause the Perceptual scale gain value A28 G_VAD^Bark(m) at Perceptual subband m = 1 exhibits lower values than at subband m = 14, indicating that noise is suppressed more strongly at lower frequencies than at higher frequencies. This is explained by the fact that car noise in particular is concentrated at low frequencies.
  • A capital P denotes the polynomial obtained by the approximation, defined by the respective polynomial coefficients A30 C_γ̃l(k).
  • A capital P followed by brackets containing a placeholder for the subband m represents the polynomial associated with the respective subband m.
  • Figure 6 shows block B5 of Figure 1 in more detail. A block B50 performs a discrete Fourier transformation DFT of the respective frame 1 of the digitized audio signal A4 y1(n). The output of block B50 is the respective noisy audio signal spectrum A6 Y1(k) associated with the respective frame 1. In a block B52, the squared magnitude of the noisy audio signal spectrum A6 Y1(k) is computed for the respective discrete frequency k by taking its absolute value and squaring it. This is done for all other discrete frequencies as well.
  • A block B54 performs minimum statistics in order to obtain the noise estimate A18 λ̂_N,l(k) and operates in the same way as block B30.
  • In a block B56, the quantized a posteriori signal to noise ratio A16 γ̃_l(k) and the quantized a priori signal to noise ratio A14 ξ̃_l(k) are obtained. The a posteriori signal to noise ratio A12 γ̂_l(k) is calculated with the formula of block B34 and quantized as in block B36.
  • The a priori signal to noise ratio is calculated with the formulas of block B58, which differ from those of block B38 in that, instead of the wanted signal spectrum A36 X1(k), an estimated wanted signal spectrum A50 X̂_l(k) is used, which is obtained recursively by the subsequent blocks within block B5. The quantized a priori signal to noise ratio A14 ξ̃_l(k) is then obtained by applying the formula of block B40.
  • In a block B60, the wanted signal activity detector VAD is estimated in the same way as in block B42. In a block B62, the approximating function for the Perceptual scale gain values A28 G_VAD^Bark(m) is determined depending on the quantized a posteriori signal to noise ratio A16 γ̃_l(k) and the wanted signal activity detector VAD, by retrieving the associated parameters of the approximating function, preferably the respective polynomial coefficients A30 C_γ̃l(k) and the respective saturation level A32 Hsat, preferably together with the first and second range parameters A44 a_γ̃l(k) and A46 b_γ̃l(k).
  • In a block B64, the Perceptual scale gain value A28 G_VAD^Bark(m) associated with the current quantized a priori signal to noise ratio A14 ξ̃_l(k) is then calculated and multiplied, at a multiplication point Ml, with the respective value of the noisy audio signal spectrum A6 Y1(k); this is done for all discrete frequencies k of the respective frame 1. The values thus obtained represent the estimated wanted signal spectrum A50 X̂_l(k) of the respective frame 1, which forms the input of a block B66. There they are subjected to an inverse discrete Fourier transformation IDFT, which results in the estimated digitized wanted signal A48 x̂_l(n).
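The per-frame processing of blocks B50, Ml and B66 can be sketched as below. It uses a direct O(N²) DFT for self-containment, expects the caller to supply one gain per non-negative bin (looked up, for example, as G_VAD^Bark(m(k)) at ξ̃_l(k) and γ̃_l(k)), and reuses each gain for the mirrored bin n-k so that the output stays real. Windowing and overlap-add are not described in this section and are omitted; the function names are hypothetical.

```python
import cmath


def dft(x: list[complex]) -> list[complex]:
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: list[complex]) -> list[complex]:
    """Direct O(N^2) inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def enhance_frame(y: list[float], gains: list[float]) -> list[float]:
    """Estimate the wanted signal of one frame: x_hat = IDFT(G(k) * Y(k)).

    gains[k] is given for k = 0 .. len(y) // 2; bins above n/2 reuse the gain of
    their mirror bin n - k, keeping the spectrum conjugate-symmetric.
    """
    n = len(y)
    Y = dft(y)
    X_hat = [gains[min(k, n - k)] * Y[k] for k in range(n)]
    return [v.real for v in idft(X_hat)]
```

For instance, a frame of 200 samples would need 101 gains, one for each bin k = 0 .. 100.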
  • As an example, the input data for the training device provided by blocks B10 and B12 may consist of four different utterances spoken by different speakers, four male and four female, and 84 car noise signals, taken for example from the NTT-AT databases. These signals are split into two sets of equal size for training and testing. After combination, 20 x 42 = 840 noisy speech utterances at a sampling frequency of 8 kHz are obtained for a training and testing session.
  • The polynomial function used for approximation purposes preferably has an order between 4 and 12, it may however also have an order higher than 12 if enough memory space is available.
  • The wanted signal activity detector may also be referred to as wanted signal activity detection. In particular, it may be a voice activity detector or a wanted signal absence probability.

Claims (22)

  1. Signal processing method comprising the steps of
    - acquisition of an audio signal (A1 y(t)),
    - periodically digitizing the audio signal (A1 y(t)) resulting in frames (1) of the digitized audio signal (A4 y1(n)),
    - determining a noisy audio signal spectrum (A6 Y1(k)) for each frame (1) of the digitized audio signal (A4 y1(n)),
    - determining quantized a priori and a posteriori signal to noise ratios (A14 ξ̃_l(k), A16 γ̃_l(k)) depending on the noisy audio signal spectrum (A6 Y1(k)) for the provided discrete frequencies (k) of each frame (1),
    - determining for the provided discrete frequencies (k) given associated Perceptual scale gain values (A28 G_VAD^Bark(m)) dependent on the quantized a priori and a posteriori signal to noise ratios (A14 ξ̃_l(k), A16 γ̃_l(k)), the given Perceptual scale gain values (A28 G_VAD^Bark(m)) being provided on a Perceptual scale for respective Perceptual scale subbands (m),
    - multiplying the respective spectral values of the noisy audio signal spectrum (A6 Y1(k)) of the respective frame (1) with the determined respective Perceptual scale gain values (A28 G_VAD^Bark(m)), resulting in estimated wanted spectrum values (A50 X̂_l(k)), and
    - determining an estimated digitized wanted signal (A48 x̂_l(n)) dependent on the estimated wanted spectrum values (A50 X̂_l(k)).
  2. Signal processing method according to claim 1 comprising determining the associated Perceptual scale gain values (A28 G_VAD^Bark(m)) from an approximating function associated to the respective quantized a posteriori signal to noise ratio (A16 γ̃_l(k)), the approximating function being dependent on the respective quantized a priori signal to noise ratio (A14 ξ̃_l(k)).
  3. Signal processing method according to claim 2 with the approximating function being a polynomial function (P).
  4. Signal processing method according to claim 3 with the approximating function being a polynomial function (P) and a saturation level (A32 Hsat).
  5. Signal processing method according to one of the claims 3 or 4, with the polynomial function (P) having an order between four and twelve.
  6. Signal processing method according to one of the previous claims, with the quantization of the quantized a priori signal to noise ratio (A14 ξ̃l (k)) and/or the quantized a posteriori signal to noise ratio (A16 γ̃ l (k)) being on a logarithmic scale.
  7. Signal processing method according to one of the previous claims, with the estimated digitized wanted signal (A48 x̂_l(n)) being a digitized speech signal and with the estimated wanted spectrum (A50 X̂_l(k)) being an estimated speech spectrum.
  8. Signal processing method according to one of the previous claims comprising determining the Perceptual scale gain values (A28 G_VAD^Bark(m)) depending on a wanted signal activity detector (VAD).
  9. Signal processing device being operable to conduct a signal processing method according to one of the previous claims.
  10. Training method comprising the steps of:
    - provision of frames (l) of a digitized audio signal (A4 y l (n)),
    - provision of frames (l) of a digitized wanted signal (A34 x l (n)),
    - provision of frames (l) of a digitized noise (A38 n l (n)),
    - determining a noisy audio signal spectrum (A6 Y l (k)) for each frame (l) of the digitized audio signal (A4 y l (n)),
    - determining a wanted signal spectrum (A36 X l (k)) for each frame (l) of the digitized wanted signal (A34 x l (n)),
    - determining a noise spectrum (A40 N l (k)) for each frame (l) of the digitized noise (A38 n l (n)),
    - determining quantized a priori and a posteriori signal to noise ratios (A14 ξ̃ l (k), A16 γ̃ l (k)) depending on the noisy audio signal spectrum (A6 Y l (k)) for the provided discrete frequencies (k) of each frame (l) and depending on the wanted signal spectrum (A36 X l (k)) for the provided discrete frequencies (k) of each frame (l),
    - determining gain values (A26 GVAD(k)) for the provided discrete frequencies (k) dependent on the noise spectra (A40 N l (k)) and wanted signal spectra (A36 X l (k)) associated to the respective discrete frequencies (k),
    - associating the quantized a priori and a posteriori signal to noise ratios (A14 ξ̃ l (k), A16 γ̃ l (k)) of respective discrete frequencies (k) to the respective gain values (A26 GVAD(k)) for the provided discrete frequencies (k),
    - determining Perceptual scale gain values (A28 G VAD Bark m) associated to the quantized a priori and a posteriori signal to noise ratios (A14 ξ̃ l (k), A16 γ̃ l (k)) of respective discrete frequencies (k) dependent on the respective gain values (A26 GVAD(k)) being associated to the respective discrete frequencies (k) falling within the respective Perceptual scale subband (m) and being associated to the quantized a priori and a posteriori signal to noise ratios (A14 ξ̃ l (k), A16 γ̃ l (k)) of respective discrete frequencies (k).
  11. Training method according to claim 10 comprising determining parameters of an approximating function by curve fitting of Perceptual scale gain values (A28 G VAD Bark m) associated to a respective quantized a posteriori signal to noise ratio (A16 γ̃ l (k)).
  12. Training method according to claim 11 with the approximating function being a polynomial function (P).
  13. Training method according to claim 12 with the approximating function being a polynomial function (P) and a saturation level (A32 Hsat).
  14. Training method according to one of the claims 12 or 13, with the polynomial function (P) having an order between 4 and 12.
  15. Training method according to one of the claims 10 to 14 with the quantization of the quantized a priori signal to noise ratio (A14 ξ̃ l (k)) and the quantized a posteriori signal to noise ratio (A16 γ̃ l (k)) being on a logarithmic scale.
  16. Training method according to one of the claims 10 to 15 with the estimated digitized wanted signal (A48 x̂ l (n)) being a digitized speech signal and with the estimated wanted spectrum (A50 X̂ l (k)) being an estimated speech spectrum.
  17. Training method according to one of the claims 10 to 16, comprising determining the Perceptual scale gain values (A28 G VAD Bark m) depending on a wanted signal activity detector (VAD).
  18. Training method according to one of the claims 10 to 17 comprising determining the parameters of the approximating function depending on a wanted signal activity detector (VAD).
  19. Training method according to one of the claims 10 to 18, comprising determining a noise estimate for each frame (l) dependent on the respective noisy audio signal spectrum (A6 Y l (k)), and determining quantized a priori and a posteriori signal to noise ratios (A14 ξ̃ l (k), A16 γ̃ l (k)) depending on the noise estimate, on the noisy audio signal spectrum (A6 Y l (k)) for the provided discrete frequencies (k) of each frame (l) and on the wanted signal spectrum (A36 X l (k)) for the provided discrete frequencies (k) of each frame (l).
  20. Training device being operable to conduct a training method according to one of the claims 10 to 18.
  21. Computer program product comprising a computer readable medium embodying program instructions executable by a computer in order to conduct a signal processing method according to one of the claims 1 to 8.
  22. Computer program product comprising a computer readable medium embodying program instructions executable by a computer in order to conduct a training method according to one of the claims 10 to 18.
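Claims 6 and 15 place the quantization of the a priori and a posteriori signal to noise ratios on a logarithmic scale, and claims 10 and 19 compute those ratios from the wanted, noisy and noise spectra. The following is a minimal Python/NumPy sketch under stated assumptions: it uses the common definitions ξ = |X|²/λ_N (a priori) and γ = |Y|²/λ_N (a posteriori) for a given noise power estimate λ_N, and a uniform dB grid whose limits and step (`SNR_MIN_DB`, `SNR_STEP_DB`, `N_LEVELS`) are hypothetical values, not figures from the patent.

```python
import numpy as np

# Hypothetical quantizer settings (not from the patent): 1.5 dB steps, -15 dB .. +30 dB.
SNR_MIN_DB = -15.0
SNR_STEP_DB = 1.5
N_LEVELS = 31


def quantize_snr_db(snr, min_db=SNR_MIN_DB, step_db=SNR_STEP_DB, n_levels=N_LEVELS):
    """Map linear SNR values to integer indices on a uniform dB (logarithmic) grid."""
    snr_db = 10.0 * np.log10(np.maximum(snr, 1e-12))  # guard against log(0)
    idx = np.round((snr_db - min_db) / step_db).astype(int)
    return np.clip(idx, 0, n_levels - 1)  # out-of-range values clamp to the grid ends


def snr_pair(noisy_power, wanted_power, noise_psd):
    """Training-time a priori (xi) and a posteriori (gamma) SNRs per frequency bin."""
    noise_psd = np.maximum(noise_psd, 1e-12)
    xi = wanted_power / noise_psd     # a priori: wanted power over noise power
    gamma = noisy_power / noise_psd   # a posteriori: noisy power over noise power
    return xi, gamma
```

Indices below the grid clamp to 0 and indices above it clamp to `N_LEVELS - 1`, so every bin lands in some table cell.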
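The claims group the discrete frequencies k into Perceptual scale subbands m, and the gain symbol (G VAD Bark m) points to Bark bands. The claims do not state which Hz-to-Bark mapping is used; this sketch assumes the Zwicker and Terhardt approximation and takes the integer part of the Bark value as the subband index. `hz_to_bark` and `bin_to_subband` are hypothetical helper names.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark scale (one common choice, assumed here)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def bin_to_subband(n_fft, fs):
    """Return the Bark subband index m for every rFFT bin k = 0 .. n_fft // 2."""
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft  # centre frequency of each bin in Hz
    return np.floor(hz_to_bark(freqs)).astype(int)
```

For a 256-point FFT at 8 kHz this yields 129 bins spread over subbands 0 to 17, with the 1 kHz bin falling in subband 8.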
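Claim 10 associates per-bin gain values with the quantized (ξ̃, γ̃) pair and then derives one Perceptual scale gain per subband and SNR cell. Below is a minimal sketch of that averaging step. It assumes the per-bin target is the Wiener-style ideal gain |X|²/(|X|² + |N|²), computed from the separately available wanted and noise spectra (the claims do not give the formula), and that "dependent on" means a plain mean over all bins and frames falling into the cell. `train_gain_table` is a hypothetical name.

```python
import numpy as np


def train_gain_table(wanted_power, noise_power, xi_idx, gamma_idx, band_of_bin,
                     n_bands, n_levels):
    """Average per-bin target gains into a table indexed by (band m, xi index, gamma index).

    All per-bin inputs have shape (n_frames, n_bins); band_of_bin has shape (n_bins,).
    The target gain |X|^2 / (|X|^2 + |N|^2) is an assumption, not the patent's formula.
    """
    target = wanted_power / np.maximum(wanted_power + noise_power, 1e-12)
    bands = np.broadcast_to(band_of_bin, target.shape)  # same band map for every frame
    sums = np.zeros((n_bands, n_levels, n_levels))
    counts = np.zeros_like(sums)
    # Unbuffered scatter-add so repeated (m, xi, gamma) cells accumulate correctly.
    np.add.at(sums, (bands, xi_idx, gamma_idx), target)
    np.add.at(counts, (bands, xi_idx, gamma_idx), 1.0)
    table = np.full_like(sums, np.nan)  # NaN marks cells never seen during training
    seen = counts > 0
    table[seen] = sums[seen] / counts[seen]
    return table, counts
```

The returned `counts` array shows how well each cell is populated, which matters when deciding whether a cell is reliable enough to fit.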
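Claims 11 to 14 (and, on the processing side, claims 2 to 5) replace the trained gains by an approximating function of the quantized a posteriori SNR: a polynomial P of order 4 to 12, optionally combined with a saturation level Hsat. The sketch below fits one slice (fixed subband and a priori index) with `numpy.polyfit`. Treating Hsat as an upper clip and flooring the gain at 0 are assumptions about how the saturation level is applied.

```python
import numpy as np


def fit_gain_curve(gamma_db, gains, order=6):
    """Least-squares polynomial fit G ~ P(gamma_dB) for one (band, xi-index) slice.

    NaN cells (unseen during training) are skipped. Order 6 is an arbitrary pick
    inside the 4..12 range named in claims 5 and 14.
    """
    ok = ~np.isnan(gains)
    return np.polyfit(gamma_db[ok], gains[ok], order)


def eval_gain_curve(coeffs, gamma_db, h_sat):
    """Evaluate P and cap it at the saturation level h_sat (assumed: simple upper clip),
    with a floor at 0 so the gain stays non-negative."""
    return np.clip(np.polyval(coeffs, gamma_db), 0.0, h_sat)
```

At run time only the coefficients and `h_sat` per slice need to be stored; the gain for any quantized γ̃ is then one polynomial evaluation.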
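The last steps of claim 1 multiply the noisy spectrum by the gains to obtain the estimated wanted spectrum X̂ l (k) and convert it back into the estimated digitized wanted signal x̂ l (n). The framing scheme is not given in the claims; this sketch assumes a periodic Hann analysis window with 50 % overlap and plain overlap-add synthesis. `enhance` and its `gain_fn` callback are hypothetical, with `gain_fn` standing in for the table or polynomial lookup.

```python
import numpy as np


def enhance(y, gain_fn, n_fft=256):
    """Frame y with a periodic Hann window at 50 % overlap, multiply each rFFT
    spectrum Y_l(k) by gains G_l(k) and overlap-add the inverse FFTs.

    gain_fn(l, Y) must return n_fft // 2 + 1 real gains for frame l.
    A periodic Hann window at hop n_fft / 2 sums to 1, so unit gains reproduce
    every sample that is covered by two frames.
    """
    hop = n_fft // 2
    n = np.arange(n_fft)
    win = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / n_fft)  # periodic Hann
    out = np.zeros(len(y))
    for l, start in enumerate(range(0, len(y) - n_fft + 1, hop)):
        Y = np.fft.rfft(win * y[start:start + n_fft])    # noisy spectrum Y_l(k)
        X_hat = gain_fn(l, Y) * Y                       # estimated wanted spectrum
        out[start:start + n_fft] += np.fft.irfft(X_hat, n_fft)
    return out
```

With all gains equal to 1 the output matches the input except in the first and last half-frame, where only one window contributes.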
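Claims 8, 17 and 18 make the gains (or the fitted parameters) depend on a wanted signal activity detector. One simple reading, sketched here as an assumption, is to train separate tables for active and inactive frames and pick one per frame at run time. The energy-threshold detector (`frame_is_active`, `threshold_db`) and `select_gain_table` are hypothetical stand-ins; any VAD could supply the decision.

```python
import numpy as np


def frame_is_active(noisy_power, noise_psd, threshold_db=3.0):
    """Crude energy-based VAD (assumption): a frame counts as 'wanted signal active'
    if its mean a posteriori SNR exceeds threshold_db."""
    gamma = np.mean(noisy_power / np.maximum(noise_psd, 1e-12))
    return 10.0 * np.log10(max(gamma, 1e-12)) > threshold_db


def select_gain_table(active, table_speech, table_noise):
    """Pick the gain table (or polynomial parameter set) trained for the VAD state."""
    return table_speech if active else table_noise
```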
EP06007389A 2006-04-07 2006-04-07 Signal processing method and device and training method and device Withdrawn EP2006841A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP06007389A EP2006841A1 (en) 2006-04-07 2006-04-07 Signal processing method and device and training method and device
PCT/EP2007/003189 WO2007115823A1 (en) 2006-04-07 2007-04-10 Signal processing method and device and training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP06007389A EP2006841A1 (en) 2006-04-07 2006-04-07 Signal processing method and device and training method and device

Publications (1)

Publication Number Publication Date
EP2006841A1 (en) 2008-12-24

Family

ID=36787926

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06007389A Withdrawn EP2006841A1 (en) 2006-04-07 2006-04-07 Signal processing method and device and training method and device

Country Status (2)

Country Link
EP (1) EP2006841A1 (en)
WO (1) WO2007115823A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110164467B (en) * 2018-12-18 2022-11-25 腾讯科技(深圳)有限公司 Method and apparatus for speech noise reduction, computing device and computer readable storage medium
CN110491407B (en) * 2019-08-15 2021-09-21 广州方硅信息技术有限公司 Voice noise reduction method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6643619B1 (en) * 1997-10-30 2003-11-04 Klaus Linhard Method for reducing interference in acoustic signals using an adaptive filtering method involving spectral subtraction
EP1635331A1 (en) * 2004-09-14 2006-03-15 Siemens Aktiengesellschaft Method for estimating a signal to noise ratio

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SETIAWAN PANJI ET AL: "Robust speech recognition for mobile devices in car noise", EUR. CONF. SPEECH COMMUN. TECHNOL.; 9TH EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY; 9TH EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY, EUROSPEECH INTERSPEECH 2005, 4 September 2005 (2005-09-04), pages 2673 - 2676, XP002395155 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130054232A1 (en) * 2011-08-24 2013-02-28 Texas Instruments Incorporated Method, System and Computer Program Product for Attenuating Noise in Multiple Time Frames
US9666206B2 (en) * 2011-08-24 2017-05-30 Texas Instruments Incorporated Method, system and computer program product for attenuating noise in multiple time frames

Also Published As

Publication number Publication date
WO2007115823A1 (en) 2007-10-18

Similar Documents

Publication Publication Date Title
US11694711B2 (en) Post-processing gains for signal enhancement
US6768979B1 (en) Apparatus and method for noise attenuation in a speech recognition system
US6377637B1 (en) Sub-band exponential smoothing noise canceling system
EP3040991B1 (en) Voice activation detection method and device
US11475907B2 (en) Method and device of denoising voice signal
US10049678B2 (en) System and method for suppressing transient noise in a multichannel system
US8571231B2 (en) Suppressing noise in an audio signal
US6173258B1 (en) Method for reducing noise distortions in a speech recognition system
EP1547061B1 (en) Multichannel voice detection in adverse environments
Martin Bias compensation methods for minimum statistics noise power spectral density estimation
CN108464015B (en) Microphone array signal processing system
US8275611B2 (en) Adaptive noise suppression for digital speech signals
US6952482B2 (en) Method and apparatus for noise filtering
US20100198588A1 (en) Signal bandwidth extending apparatus
EP2851898B1 (en) Voice processing apparatus, voice processing method and corresponding computer program
EP4273861A2 (en) Voice activity detection methods and apparatuses
Borowicz et al. Signal subspace approach for psychoacoustically motivated speech enhancement
CN101802910A (en) Speech enhancement with voice clarity
CN101719969A (en) Method and system for judging double-end conversation and method and system for eliminating echo
CN101802909A (en) Speech enhancement with noise level estimation adjustment
AU2009203194A1 (en) Noise spectrum tracking in noisy acoustical signals
EP3316256A1 (en) Voice activity modification frame acquiring method, and voice activity detection method and apparatus
US20140337018A1 (en) Method and device for adaptively adjusting sound effect
US7885810B1 (en) Acoustic signal enhancement method and apparatus
US20030187637A1 (en) Automatic feature compensation based on decomposition of speech and noise

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

AX Request for extension of the European patent

Extension state: AL BA HR MK YU

AKX Designation fees paid
STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20090625

REG Reference to a national code

Ref country code: DE

Ref legal event code: 8566