EP0335521A1 - Voice activity detection - Google Patents

Voice activity detection

Info

Publication number
EP0335521A1
Authority
EP
European Patent Office
Prior art keywords
measure
speech
signal
noise
voice activity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP89302422A
Other languages
German (de)
French (fr)
Other versions
EP0335521B1 (en)
Inventor
Daniel Kenneth Freeman
Ivan Boyd
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LG Electronics Inc
Original Assignee
British Telecommunications PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB888805795A external-priority patent/GB8805795D0/en
Priority claimed from GB888813346A external-priority patent/GB8813346D0/en
Priority claimed from GB888820105A external-priority patent/GB8820105D0/en
Application filed by British Telecommunications PLC filed Critical British Telecommunications PLC
Priority to AT89302422T priority Critical patent/ATE97757T1/en
Priority to AT93200015T priority patent/ATE229683T1/en
Priority to EP93200015A priority patent/EP0548054B1/en
Publication of EP0335521A1 publication Critical patent/EP0335521A1/en
Application granted granted Critical
Publication of EP0335521B1 publication Critical patent/EP0335521B1/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L25/84 - Detection of presence or absence of voice signals for discriminating voice from noise

Abstract

Voice activity detector (VAD) for use in an LPC coder in a mobile radio system uses autocorrelation coefficients R₀, R₁, ... of the input signal, weighted and combined, to provide a measure M which depends on the power within that part of the spectrum containing no noise, and which is thresholded against a variable threshold to provide a speech/no-speech logic output. The measure is
M = Σᵢ₌₀ᴺ HᵢRᵢ
where Hi are the autocorrelation coefficients of the impulse response of an Nth order FIR inverse noise filter derived from LPC analysis of previous non-speech signal frames. Threshold adaption and coefficient update are controlled by a second VAD responsive to rate of spectral change between frames.

Description

  • A voice activity detector is a device which is supplied with a signal with the object of detecting periods of speech, or periods containing only noise. Although the present invention is not limited thereto, one application of particular interest for such detectors is in mobile radio telephone systems, where the knowledge as to the presence or otherwise of speech can be exploited by a speech coder to improve the efficient utilisation of radio spectrum, and where also the noise level (from a vehicle-mounted unit) is likely to be high.
  • The essence of voice activity detection is to locate a measure which differs appreciably between speech and non-speech periods. In apparatus which includes a speech coder, a number of parameters are readily available from one or other stage of the coder, and it is therefore desirable to economise on processing needed by utilising some such parameter. In many environments, the main noise sources occur in known defined areas of the frequency spectrum. For example, in a moving car much of the noise (eg, engine noise) is concentrated in the low frequency regions of the spectrum. Where such knowledge of the spectral position of noise is available, it is desirable to base the decision as to whether speech is present or absent upon measurements taken from that portion of the spectrum which contains relatively little noise. It would, of course, be possible in practice to pre-filter the signal before analysing to detect speech activity, but where the voice activity detector follows the output of a speech coder, prefiltering would distort the voice signal to be coded.
  • According to a first aspect of the invention there is provided voice activity detection apparatus comprising means for receiving an input signal, means for estimating the noise signal component of the input signal, means for continually forming a measure M of the spectral similarity between a portion of the input signal and the noise signal, and means for comparing a parameter derived from the measure M with a threshold value T to produce an output to indicate the presence or absence of speech in dependence upon whether or not that value is exceeded.
  • According to a second aspect of the invention there is provided voice activity detection apparatus comprising: means for continually forming a spectral distortion measure of the similarity between a portion of the input signal and earlier portions of the input signal, and means for comparing the degree of variation between successive values of the measure with a threshold value to produce an output indicating the presence or absence of speech in dependence upon whether or not that value is exceeded.
  • Preferably, the measure is the Itakura-Saito Distortion Measure.
  • Other aspects of the present invention are as defined in the claims.
  • Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
    • Figure 1 is a block diagram of a first embodiment of the invention;
    • Figure 2 shows a second embodiment of the invention;
    • Figure 3 shows a third, preferred embodiment of the invention.
  • The general principle underlying a first Voice Activity Detector according to a first embodiment of the invention is as follows.
  • A frame of n signal samples (s₀, s₁, s₂, s₃, s₄ ... sn-1 ) will, when passed through a notional fourth order finite impulse response (FIR) digital filter of impulse response (1, h₀, h₁, h₂, h₃), result in a filtered signal (ignoring samples from previous frames)
    s′=
    (s₀),
    (s₁ + h₀s₀),
    (s₂ + h₀s₁ + h₁s₀),
    (s₃ + h₀s₂ + h₁s₁ + h₂s₀),
    (s₄ + h₀s₃ + h₁s₂ + h₂s₁ + h₃s₀),
    (s₅ + h₀s₄ + h₁s₃ + h₂s₂ + h₃s₁),
    (s₆ + h₀s₅ + h₁s₄ + h₂s₃ + h₃s₂),
    (s₇ ... )
    The zero order autocorrelation coefficient is the sum of each term squared, which may be normalized i.e. divided by the total number of terms (for constant frame lengths it is easier to omit the division); that of the filtered signal is thus
    R′₀ = Σᵢ s′ᵢ²
    and this is therefore a measure of the power of the notional filtered signal s′ - in other words, of that part of the signal s which falls within the passband of the notional filter.
    Expanding, neglecting the first 4 terms,
    R′₀ = (s₄ + h₀s₃ + h₁s₂ + h₂s₁ + h₃s₀)²
    + (s₅ + h₀s₄ + h₁s₃ + h₂s₂ + h₃s₁)²
    + ...
    = s₄² + h₀s₄s₃ + h₁s₄s₂ + h₂s₄s₁ + h₃s₄s₀
    + h₀s₄s₃ + h₀²s₃² + h₀h₁s₃s₂ + h₀h₂s₃s₁ + h₀h₃s₃s₀
    + h₁s₄s₂ + h₀h₁s₃s₂ + h₁²s₂² + h₁h₂s₂s₁ + h₁h₃s₂s₀
    + h₂s₄s₁ + h₀h₂s₃s₁ + h₁h₂s₂s₁ + h₂²s₁² + h₂h₃s₁s₀
    + h₃s₄s₀ + h₀h₃s₃s₀ + h₁h₃s₂s₀ + h₂h₃s₁s₀ + h₃²s₀²
    + ...
    = R₀ (1 + h₀² + h₁² + h₂² + h₃²)
    + R₁ (2h₀ + 2h₀h₁ + 2h₁h₂ + 2h₂h₃)
    + R₂ (2h₁ + 2h₁h₃ + 2h₀h₂)
    + R₃ (2h₂ + 2h₀h₃)
    + R₄ (2h₃)
    So R′₀ can be obtained from a combination of the autocorrelation coefficients Ri, weighted by the bracketed constants, which determine the frequency band to which the value of R′₀ is responsive. In fact, the bracketed terms are the autocorrelation coefficients of the impulse response of the notional filter, so that the expression above may be simplified to
    R′₀ = Σᵢ₌₀ᴺ HᵢRᵢ      (Equation 1)
    where N is the filter order and Hi are the (un-normalised) autocorrelation coefficients of the impulse response of the filter.
  • In other words, the effect on the signal autocorrelation coefficients of filtering a signal may be simulated by producing a weighted sum of the autocorrelation coefficients of the (unfiltered) signal, using the impulse response that the required filter would have had.
  • Thus, a relatively simple algorithm, involving a small number of multiplication operations, may simulate the effect of a digital filter requiring typically a hundred times this number of multiplication operations.
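  • By way of illustration only, the relationship of Equation 1 can be checked numerically. The following Python sketch compares direct filtering of a frame with the weighted sum of the unfiltered frame's autocorrelation coefficients; the filter coefficients and frame length are arbitrary, and the factor of two for the symmetric lags is folded into the weights, as in the bracketed terms of the expansion above.

    import numpy as np

    def autocorr(x, nlags):
        # Un-normalised autocorrelation coefficients at lags 0..nlags of a finite sequence
        return np.array([np.dot(x[:len(x) - i], x[i:]) for i in range(nlags + 1)])

    # Notional FIR filter with impulse response (1, h0, h1, h2, h3); values are illustrative only
    h = np.array([1.0, -0.9, 0.5, -0.3, 0.1])
    N = len(h) - 1                        # filter order

    rng = np.random.default_rng(0)
    s = rng.standard_normal(160)          # one frame of n = 160 samples

    # Direct route: filter the frame (ignoring samples from previous frames) and take
    # the power of the filtered signal, i.e. its zero order autocorrelation coefficient
    s_filtered = np.convolve(s, h)[:len(s)]
    R0_direct = float(np.dot(s_filtered, s_filtered))

    # Equation 1 route: weighted sum of the unfiltered signal's autocorrelation coefficients
    R = autocorr(s, N)
    H = autocorr(h, N)
    H[1:] *= 2.0                          # fold in the factor 2 for the symmetric lags
    R0_weighted = float(np.dot(H, R))

    print(R0_direct, R0_weighted)         # approximately equal; the difference comes from the neglected frame-edge terms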
  • This filtering operation may alternatively be viewed as a form of spectrum comparison, with the signal spectrum being matched against a reference spectrum (the inverse of the response of the notional filter). Since the notional filter in this application is selected so as to approximate the inverse of the noise spectrum, this operation may be viewed as a spectral comparison between speech and noise spectra, and the zeroth autocorrelation coefficient thus generated (i.e. the energy of the inverse filtered signal) as a measure of dissimilarity between the spectra. The Itakura-Saito distortion measure is used in LPC to assess the match between the predictor filter and the input spectrum, and in one form is expressed as
    Σᵢ₌₀ᴺ AᵢRᵢ
    where A₀ etc are the autocorrelation coefficients of the LPC parameter set. It will be seen that this is closely similar to the relationship derived above, and when it is remembered that the LPC coefficients are the taps of an FIR filter having the inverse spectral response of the input signal so that the LPC coefficient set is the impulse response of the inverse LPC filter, it will be apparent that the Itakura-Saito Distortion Measure is in fact merely a form of equation 1, wherein the filter response H is the inverse of the spectral shape of an all-pole model of the input signal.
  • In fact, it is also possible to transpose the spectra, using the LPC coefficients of the test spectrum and the autocorrelation coefficients of the reference spectrum, to obtain a different measure of spectral similarity.
  • The I-S Distortion measure is further discussed in "Speech Coding based upon Vector Quantisation" by A Buzo, A H Gray, R M Gray and J D Markel, IEEE Trans on ASSP, Vol ASSP-28, No 5, October 1980.
  • Since the frames of signal have only a finite length, and a number of terms (N, where N is the filter order) are neglected, the above result is an approximation only; it gives, however, a surprisingly good indicator of the presence or absence of speech and thus may be used as a measure M in speech detection. In an environment where the noise spectrum is well known and stationary, it is quite possible to simply employ fixed h₀, h₁ etc coefficients to model the inverse noise filter.
  • However, apparatus which can adapt to different noise environments is much more widely useful.
  • Referring to Figure 1, in a first embodiment, a signal from a microphone (not shown) is received at an input 1 and converted to digital samples s at a suitable sampling rate by an analogue to digital converter 2. An LPC analysis unit 3 (in a known type of LPC coder) then derives, for successive frames of n (eg 160) samples, a set of N (eg 8 or 12) LPC filter coefficients Li which are transmitted to represent the input speech. The speech signal s also enters a correlator unit 4 (normally part of the LPC coder 3 since the autocorrelation vector Ri of the speech is also usually produced as a step in the LPC analysis although it will be appreciated that a separate correlator could be provided). The correlator 4 produces the autocorrelation vector Ri, including the zero order correlation coefficient R₀ and at least 2 further autocorrelation coefficients R₁, R₂, R₃. These are then supplied to a multiplier unit 5.
  • A second input 11 is connected to a second microphone located distant from the speaker so as to receive only background noise. The input from this microphone is converted to a digital input sample train by AD convertor 12 and LPC analysed by a second LPC analyser 13. The "noise" LPC coefficients produced from analyser 13 are passed to correlator unit 14, and the autocorrelation vector thus produced is multiplied term by term with the autocorrelation coefficients Ri of the input signal from the speech microphone in multiplier 5 and the weighted coefficients thus produced are combined in adder 6 according to Equation 1, so as to apply a filter having the inverse shape of the noise spectrum from the noise-only microphone (which in practice is the same as the shape of the noise spectrum in the signal-plus-noise microphone) and thus filter out most of the noise. The resulting measure M is thresholded by thresholder 7 to produce a logic output 8 indicating the presence or absence of speech; if M is high, speech is deemed to be present.
  • This embodiment does, however, require two microphones and two LPC analysers, which adds to the expense and complexity of the equipment necessary.
  • Alternatively, another embodiment uses a corresponding measure formed using the autocorrelations from the noise microphone 11 and the LPC coefficients from the main microphone 1, so that an extra autocorrelator rather than an LPC analyser is necessary.
  • These embodiments are therefore able to operate within different environments having noise at different frequencies, or within a changing noise spectrum in a given environment.
  • Referring to Figure 2, in the preferred embodiment of the invention, there is provided a buffer 15 which stores a set of LPC coefficients (or the autocorrelation vector of the set) derived from the microphone input 1 in a period identified as being a "non speech" (ie noise only) period. These coefficients are then used to derive a measure using equation 1, which also of course corresponds to the Itakura-Saito Distortion Measure, except that a single stored frame of LPC coefficients corresponding to an approximation of the inverse noise spectrum is used, rather than the present frame of LPC coefficients.
  • The LPC coefficient vector Li output by analyser 3 is also routed to a correlator 14, which produces the autocorrelation vector of the LPC coefficient vector. The buffer memory 15 is controlled by the speech/non-speech output of thresholder 7, in such a way that during "speech" frames the buffer retains the "noise" autocorrelation coefficients, but during "noise" frames a new set of LPC coefficients may be used to update the buffer, for example by a multiple switch 16, via which outputs of the correlator 14, carrying each autocorrelation coefficient, are connected to the buffer 15. It will be appreciated that correlator 14 could be positioned after buffer 15. Further, the speech/no-speech decision for coefficient update need not be from output 8, but could be (and preferably is) otherwise derived.
  • Since frequent periods without speech occur, the LPC coefficients stored in the buffer are updated from time to time, so that the apparatus is thus capable of tracking changes in the noise spectrum. It will be appreciated that such updating of the buffer may be necessary only occasionally, or may occur only once at the start of operation of the detector, if (as is often the case) the noise spectrum is relatively stationary over time, but in a mobile radio environment frequent updating is preferred.
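  • As a concrete, purely illustrative sketch of this arrangement, the following Python fragment performs per-frame autocorrelation, forms the Equation 1 measure against a buffered set of noise-period coefficients, and updates that buffer only when speech is judged absent. The frame handling, filter order, fixed threshold and starting high-pass coefficients are assumptions made for the sketch; the patent itself prefers an adaptive threshold and an independently derived update decision, as described later.

    import numpy as np

    def autocorr(x, nlags):
        # Un-normalised autocorrelation coefficients at lags 0..nlags
        return np.array([np.dot(x[:len(x) - i], x[i:]) for i in range(nlags + 1)])

    def lpc_inverse_filter(R):
        # Levinson-Durbin recursion: impulse response (1, a1, ..., aN) of the LPC
        # inverse (prediction error) filter, derived from autocorrelations R[0..N]
        N = len(R) - 1
        a = np.zeros(N + 1)
        a[0] = 1.0
        err = R[0]
        for m in range(1, N + 1):
            k = -(R[m] + np.dot(a[1:m], R[m - 1:0:-1])) / err
            new_a = a.copy()
            for i in range(1, m):
                new_a[i] = a[i] + k * a[m - i]
            new_a[m] = k
            a = new_a
            err *= 1.0 - k * k
        return a

    def eq1_weights(h):
        # Weights Hi of Equation 1: autocorrelation of the inverse filter's impulse
        # response, with the factor 2 for the symmetric lags folded in
        H = autocorr(h, len(h) - 1)
        H[1:] *= 2.0
        return H

    ORDER = 8          # assumed filter order N
    THRESHOLD = 40.0   # assumed fixed threshold; an adaptive threshold is preferred (see below)

    def vad_frame(frame, noise_weights):
        """One speech/no-speech decision; returns (speech_present, M, updated noise weights)."""
        R = autocorr(frame, ORDER)                 # correlator 4
        M = float(np.dot(noise_weights, R))        # weighting and adding means 5, 6 (Equation 1)
        speech_present = M > THRESHOLD             # thresholder 7
        if not speech_present:
            # switch 16: the stored "noise" coefficients may only be updated during noise frames
            noise_weights = eq1_weights(lpc_inverse_filter(R))   # analyser 3, correlator 14, buffer 15
        return speech_present, M, noise_weights

    # Initial buffer contents: a simple fixed high-pass inverse filter, as in the
    # modification described below, until the first noise frame has been processed
    noise_weights = eq1_weights(np.array([1.0, -0.95] + [0.0] * (ORDER - 1)))
    # usage: speech, M, noise_weights = vad_frame(frame_samples, noise_weights)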
  • In a modification of this embodiment, the system initially employs equation 1 with coefficient terms corresponding to a simple fixed high pass filter, and then subsequently starts to adapt by switching over to using "noise period" LPC coefficients. If, for some reason, speech detection fails, the system may return to using the simple high pass filter.
  • It is possible to normalise the above measure by dividing through by R₀, so that the expression to be thresholded has the form
    (Σᵢ₌₀ᴺ HᵢRᵢ) / R₀
    This measure is independent of the total signal energy in a frame and is thus compensated for gross signal level changes, but gives rather less marked contrast between "noise" and "speech" levels and is hence preferably not employed in high-noise environments.
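  • In terms of the illustrative Python sketch above, the normalised variant is a one-line change (assuming H holds the Equation 1 weights and R the frame's autocorrelation coefficients):

    import numpy as np

    def normalised_measure(H, R):
        # Equation 1 measure divided by R0: independent of gross signal level, but with
        # a less marked contrast between "noise" and "speech" values
        return float(np.dot(H, R)) / R[0]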
  • Instead of employing LPC analysis to derive the inverse filter coefficients of the noise signal (from either the noise microphone or noise only periods, as in the various embodiments described above), it is possible to model the inverse noise spectrum using an adaptive filter of known type; as the noise spectrum changes only slowly (as discussed below) a relatively slow coefficient adaption rate common for such filters is acceptable. In one embodiment, which corresponds to Figure 1, LPC analysis unit 13 is simply replaced by an adaptive filter (for example a transversal FIR or lattice filter), connected so as to whiten the noise input by modelling the inverse filter, and its coefficients are supplied as before to autocorrelator 14.
  • In a second embodiment, corresponding to that of Figure 2, LPC analysis means 3 is replaced by such an adaptive filter, and buffer means 15 is omitted, but switch 16 operates to prevent the adaptive filter from adapting its coefficients during speech periods.
  • A second Voice Activity Detector in accordance with another aspect of the invention will now be described.
  • From the foregoing, it will be apparent that the LPC coefficient vector is simply the impulse response of an FIR filter which has a response approximating the inverse spectral shape of the input signal. When the Itakura-Saito Distortion Measure between adjacent frames is formed, this is in fact equal to the power of the signal, as filtered by the LPC filter of the previous frame. So if spectra of adjacent frames differ little, a correspondingly small amount of the spectral power of a frame will escape filtering and the measure will be low. Correspondingly, a large interframe spectral difference produces a high Itakura-Saito Distortion Measure, so that the measure reflects the spectral similarity of adjacent frames. In a speech coder, it is desirable to minimise the data rate, so frame length is made as long as possible; in other words, if the frame length is long enough, then a speech signal should show a significant spectral change from frame to frame (if it does not, the coding is redundant). Noise, on the other hand, has a slowly varying spectral shape from frame to frame, and so in a period where speech is absent from the signal then the Itakura-Saito Distortion Measure will correspondingly be low - since applying the inverse LPC filter from the previous frame "filters out" most of the noise power.
  • Typically, the Itakura-Saito Distortion Measure between adjacent frames of a noisy signal containing intermittent speech is higher during periods of speech than periods of noise; the degree of variation (as illustrated by the standard deviation) is higher, and less intermittently variable.
  • It is noted that the standard deviation of the standard deviation of M is also a reliable measure; the effect of taking each standard deviation is essentially to smooth the measure.
  • In this second form of Voice Activity Detector, the measured parameter used to decide whether speech is present is preferably the standard deviation of the Itakura-Saito Distortion Measure, but other measures of variance and other spectral distortion measures (based for example on FFT analysis) could be employed.
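  • A minimal sketch of this second detector follows (Python, illustrative only). It assumes the speech coder supplies, for each frame, the impulse response of that frame's LPC inverse filter; the block length and threshold are arbitrary, and the interframe measure is energy-normalised by R₀, as suggested later in the text.

    import numpy as np
    from collections import deque

    def autocorr(x, nlags):
        # Un-normalised autocorrelation coefficients at lags 0..nlags
        return np.array([np.dot(x[:len(x) - i], x[i:]) for i in range(nlags + 1)])

    def eq1_weights(h):
        # Autocorrelation of an inverse filter's impulse response, factor 2 folded in
        H = autocorr(h, len(h) - 1)
        H[1:] *= 2.0
        return H

    BLOCK = 16           # number of frames over which the variation is assessed (assumed)
    THRESHOLD = 0.5      # assumed threshold on the standard deviation

    recent_measures = deque(maxlen=BLOCK)
    previous_weights = None

    def second_vad(frame, inverse_filter):
        """Return True if the degree of interframe spectral variation indicates speech."""
        global previous_weights
        R = autocorr(frame, len(inverse_filter) - 1)
        if previous_weights is not None:
            # Itakura-Saito style distortion between this frame and the previous frame's
            # spectrum, normalised by R0: low while the spectrum changes slowly (noise)
            m = float(np.dot(previous_weights, R)) / R[0]
            recent_measures.append(m)
        previous_weights = eq1_weights(inverse_filter)
        return len(recent_measures) == BLOCK and float(np.std(recent_measures)) > THRESHOLD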
  • It is found advantageous to employ an adaptive threshold in voice activity detection. Such thresholds must not be adjusted during speech periods or the speech signal will be thresholded out. It is accordingly necessary to control the threshold adapter using a speech/non-speech control signal, and it is preferable that this control signal should be independent of the output of the threshold adapter.
    The threshold T is adaptively adjusted so as to keep the threshold level just above the level of the measure M when noise only is present. Since the measure will in general vary randomly when noise is present, the threshold is varied by determining an average level over a number of blocks, and setting the threshold at a level proportional to this average. In a noisy environment this is not usually sufficient, however, and so an assessment of the degree of variation of the parameter over several blocks is also taken into account.
  • The threshold value T is therefore preferably calculated according to
    T = M′ + K.d
    where M′ is the average value of the measure over a number of consecutive frames, d is the standard deviation of the measure over those frames, and K is a constant (which may typically be 2).
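  • A direct transcription of this rule is shown below (Python; K = 2 follows the text, while the use of the population standard deviation and the choice of window are assumptions):

    import statistics

    def adapt_threshold(noise_measures, K=2.0):
        # T = M' + K.d, computed over the measure values of a number of consecutive noise-only frames
        M_mean = statistics.fmean(noise_measures)
        d = statistics.pstdev(noise_measures)
        return M_mean + K * d

    # e.g. T = adapt_threshold(recent_noise_measures); called only while speech is indicated absent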
  • In practice, it is preferred not to resume adaptation immediately after speech is indicated to be absent, but to wait to ensure the fall is stable (to avoid rapid repeated switching between the adapting and non-adapting states).
  • Referring to Figure 3, in a preferred embodiment of the invention incorporating the above aspects, an input 1 receives a signal which is sampled and digitised by analogue to digital converter (ADC) 2, and supplied to the input of an inverse filter analyser 3, which in practice is part of a speech coder with which the voice activity detector is to work, and which generates coefficients Li (typically 8) of a filter corresponding to the inverse of the input signal spectrum.
  • The digitised signal is also supplied to an autocorrelator 4 (which is part of analyser 3) which generates the autocorrelation vector Ri of the input signal (or at least as many low order terms as there are LPC coefficients). Operation of these parts of the apparatus is as described for Figures 1 and 2.
  • Preferably, the autocorrelation coefficients Ri are then averaged over several successive speech frames (typically 5-20 ms long) to improve their reliability. This may be achieved by storing each set of autocorrelation coefficients output by autocorrelator 4 in a buffer 4a, and employing an averager 4b to produce a weighted sum of the current autocorrelation coefficients Ri and those from previous frames stored in and supplied from buffer 4a.
  • The averaged autocorrelation coefficients Rai thus derived are supplied to weighting and adding means 5, 6, which receive also the autocorrelation vector Ai of stored noise-period inverse filter coefficients Li from an autocorrelator 14 via buffer 15, and form from Rai and Ai a measure M preferably defined as:
    M = Σᵢ₌₀ᴺ AᵢRaᵢ
  • This measure is then thresholded by thresholder 7 against a threshold level, and the logical result provides an indication of the presence or absence of speech at output 8.
  • In order that the inverse filter coefficients Li correspond to a fair estimate of the inverse of the noise spectrum, it is desirable to update these coefficients during periods of noise (and, of course, not to update during periods of speech). It is, however, preferable that the speech/non-speech decision on which the updating is based does not depend upon the result of the updating, or else a single wrongly identified frame of signal may result in the voice activity detector subsequently going "out of lock" and wrongly identifying following frames.
  • Preferably, therefore, there is provided a control signal generating circuit 20, effectively a separate voice activity detector, which forms an independent control signal indicating the presence or absence of speech to control inverse filter analyser 3 (or buffer 8) so that the inverse filter autocorrelation coefficients Ai used to form the measure M are only updated during "noise" periods.
  • The control signal generator circuit 20 includes LPC analyser 21 (which again may be part of a speech coder and, specifically, may be performed by analyser 3), which produces a set of LPC coefficients Mi corresponding to the input signal, and an autocorrelator 21a (which may be performed by autocorrelator 3a) which derives the autocorrelation coefficients Bi of Mi. If analyser 21 is performed by analyser 3, then Mi = Li and Bi = Ai. These autocorrelation coefficients are then supplied to weighting and adding means 22, 23 (equivalent to 5, 6) which receive also the autocorrelation vector Ri of the input signal from autocorrelator 4.
  • A measure of the spectral similarity between the input speech frame and the preceding speech frame is thus calculated; this may be the Itakura-Saito distortion measure between Ri of the present frame and Bi of the preceding frame, as disclosed above, or it may instead be derived by calculating the Itakura-Saito distortion measure for Ri and Bi of the present frame, and subtracting (in subtractor 25) the corresponding measure for the previous frame stored in buffer 24, to generate a spectral difference signal (in either case, the measure is preferably energy-normalised by dividing by R₀). The buffer 24 is then, of course, updated.
  • This spectral difference signal, when thresholded by a thresholder 26, is, as discussed above, an indicator of the presence or absence of speech. We have found, however, that although this measure is excellent for distinguishing noise from unvoiced speech (a task of which prior art systems are generally incapable), it is in general rather less able to distinguish noise from voiced speech.
  • Accordingly, there is preferably further provided within circuit 20 a voiced speech detection circuit comprising a pitch analyser 27 (which in practice may operate as part of a speech coder, and in particular may measure the long term predictor lag value produced in a multipulse LPC coder). The pitch analyser 27 produces a logic signal which is "true" when voiced speech is detected, and this signal, together with the thresholded measure derived from thresholder 26 (which will generally be "true" when unvoiced speech is present), are supplied to the inputs of a NOR gate 28 to generate a signal which is "false" when speech is present and "true" when noise is present. This signal is supplied to buffer 8 (or to inverse filter analyser 3) so that the inverse filter coefficients Li are only updated during noise periods.
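  • The gating performed by NOR gate 28 amounts to the following (illustrative Python; the two boolean inputs are taken to be the outputs of pitch analyser 27 and thresholder 26 described above):

    def noise_period(voiced_speech_detected, spectral_change_above_threshold):
        # NOR gate 28: "true" only when neither the pitch analyser (voiced speech) nor the
        # spectral-change thresholder (unvoiced speech) indicates speech, i.e. only then
        # may the inverse filter coefficients Li (and hence Ai) be updated
        return not (voiced_speech_detected or spectral_change_above_threshold)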
  • Threshold adapter 29 is also connected to receive the non-speech signal control output of control signal generator circuit 20. The output of the threshold adapter 29 is supplied to thresholder 7. The threshold adapter operates to increment or decrement the threshold in steps which are a proportion of the instant threshold value, until the threshold approximates the noise power level (which may conveniently be derived from, for example, weighting and adding circuits 22, 23). When the input signal is very low, it may be desirable that the threshold is automatically set to a fixed, low, level since at the low signal levels the effect of signal quantisation produced by ADC 2 can produce unreliable results.
  • There may be further provided "hangover" generating means 30, which operates to measure the duration of indications of speech after thresholder 7 and, when the presence of speech has been indicated for a period in excess of a predetermined time constant, the output is held high for a short "hangover" period. In this way, clipping of the middle of low-level speech bursts is avoided, and appropriate selection of the time constant prevents triggering of the hangover generator 30 by short spikes of noise which are falsely indicated as speech. It will of course be appreciated that all the above functions may be executed by a single suitably programmed digital processing means such as a Digital Signal Processing (DSP) chip, as part of an LPC codec thus implemented (this is the preferred implementation), or as a suitably programmed microcomputer or microcontroller chip with an associated memory device.
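  • A hangover stage of the kind described might be sketched as follows (Python; the particular frame counts are illustrative assumptions):

    MIN_SPEECH_FRAMES = 4   # speech must persist this long before the hangover is armed (assumed)
    HANGOVER_FRAMES = 6     # frames for which the output is held high afterwards (assumed)

    speech_run = 0
    hangover_left = 0

    def with_hangover(raw_speech_flag):
        """Apply hangover to the raw thresholder decision for one frame."""
        global speech_run, hangover_left
        if raw_speech_flag:
            speech_run += 1
            if speech_run >= MIN_SPEECH_FRAMES:
                # arm the hangover only for sustained speech, so that short noise
                # spikes falsely indicated as speech do not trigger it
                hangover_left = HANGOVER_FRAMES
            return True
        speech_run = 0
        if hangover_left > 0:
            hangover_left -= 1
            return True      # hold the output high for the hangover period
        return False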
  • Conveniently, as described above, the voice detection apparatus may be implemented as part of an LPC codec. Alternatively, where autocorrelation coefficients of the signal or related measures (partial correlation, or "parcor", coefficients) are transmitted to a distant station the voice detection may take place distantly from the codec.

Claims (24)

1. Voice activity detection apparatus comprising means for receiving an input signal, means for estimating the noise signal component of the input signal, means for continually forming a measure M of the spectral similarity between a portion of the input signal and the noise signal component, and means for comparing a parameter derived from the measure M with a threshold value T to produce an output to indicate the presence or absence of speech in dependence upon whether or not that value is exceeded.
2. Apparatus according to claim 1, in which the noise estimating means comprises means for computing the autocorrelation coefficients Ai of the impulse response of an FIR filter having a response approximating the inverse of the short term spectrum of the noise signal component, and the measure forming means comprises means for computing the autocorrelation coefficients Ri of the signal, means connected to receive Ri and Ai, and to calculate M therefrom, the parameter being the value of M.
3. Apparatus according to claim 2, in which
M = Σᵢ₌₀ᴺ AᵢRᵢ
4. Apparatus according to claim 2, in which
M = (Σᵢ₌₀ᴺ AᵢRᵢ) / R₀
5. Apparatus according to any one of claims 2 to 4, further comprising an input arranged to receive a second signal, similarly subject to noise, from which speech is absent, in which the Ai computing means comprises LPC analysis means for deriving values of Ai from the second signal.
6. Apparatus according to any one of claims 2 to 4, further comprising a buffer connected to store data from which the autocorrelation coefficients Ai of the said filter response may be derived, in which the said filter response is periodically calculated from the signal by LPC analysis means, the apparatus being so connected and controlled that the measure M is calculated using the said stored data, and the said stored data is updated only from periods in which speech is indicated to be absent.
7. Apparatus according to any one of claims 1 to 4 in which the noise estimating means includes an adaptive filter.
8. Apparatus according to any one of claims 2 to 6 characterised in that the means for computing the autocorrelation coefficients of the signal are arranged to do so in dependence upon the autocorrelation coefficients of several successive portions of the signal.
9. Apparatus according to claim 1 in which the measure M is a spectral distortion measure.
10. Apparatus according to claim 9 in which the measure M is the Itakura-Saito Distortion measure.
11. Apparatus according to any one of the preceding claims, further comprising means for adjusting the said predetermined threshold T during periods when speech is indicated to be absent.
12. Apparatus according to claim 11, further comprising second voice activity detection means arranged to prevent adjustment of the threshold value when speech is present.
13. Apparatus as claimed in claim 11 or claim 12, in which the threshold value T is, when adjusted, adjusted to be equal to the mean of the measure plus a term which is a function of the standard deviation of the measure.
14. Voice activity detection apparatus comprising: means for continually forming a spectral distortion measure of the similarity between a portion of the input signal and earlier portions of the input signal and means for comparing the degree of variation between successive values of the measure with a threshold value to produce an output indicating the presence or absence of speech in dependence upon whether or not that value is exceeded.
15. Apparatus as claimed in claim 14, wherein the degree of variation is measured as the standard deviation of a block of successive values of the measure.
16. Apparatus according to Claim 6 further comprising means for indicating the absence of speech to control the updating of the said stored data, the means for indicating the absence of speech being a second voice activity detection means.
17. Apparatus according to Claim 16 and Claim 13 in which the said second voice activity detection means controls both threshold adaption and data updating.
18. Apparatus according to Claim 13 or Claim 16 or Claim 17 in which said second voice activity detection means is apparatus according to Claim 14 or Claim 15.
19. A method of detecting speech activity in a signal, comprising the steps of comparing the signal spectrum with an estimated noise spectrum, forming a variable measure of the spectral similarity therebetween, and comparing that measure with a threshold.
20. A method of detecting speech activity in a signal, comprising the steps of comparing the signal spectrum with a preceding portion of the signal, forming a variable measure of the spectral similarity therebetween, and comparing the degree of variation between successive values of the measure with a threshold.
21. Voice activity detection apparatus substantially as herein described, with reference to Figure 1 or Figure 2 or Figure 3.
22. Apparatus for encoding speech signals including apparatus according to any preceding claim.
23. Mobile telephone apparatus including apparatus according to any preceding claim.
24. A method of detecting speech substantially as herein described.
EP89302422A 1988-03-11 1989-03-10 Voice activity detection Expired - Lifetime EP0335521B1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AT89302422T ATE97757T1 (en) 1988-03-11 1989-03-10 DETECTION FOR THE PRESENCE OF A VOICE SIGNAL.
AT93200015T ATE229683T1 (en) 1988-03-11 1989-03-10 ARRANGEMENT FOR DETERMINING THE PRESENCE OF SPEECH SOUNDS
EP93200015A EP0548054B1 (en) 1988-03-11 1989-03-10 Voice activity detector

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
GB8805795 1988-03-11
GB888805795A GB8805795D0 (en) 1988-03-11 1988-03-11 Voice activity detector
GB888813346A GB8813346D0 (en) 1988-06-06 1988-06-06 Voice activity detection
GB8813346 1988-06-06
GB888820105A GB8820105D0 (en) 1988-08-24 1988-08-24 Voice activity detection
GB8820105 1988-08-24

Related Child Applications (2)

Application Number Title Priority Date Filing Date
EP93200015A Division EP0548054B1 (en) 1988-03-11 1989-03-10 Voice activity detector
EP93200015.1 Division-Into 1993-01-06

Publications (2)

Publication Number Publication Date
EP0335521A1 true EP0335521A1 (en) 1989-10-04
EP0335521B1 EP0335521B1 (en) 1993-11-24

Family

ID=27263821

Family Applications (2)

Application Number Title Priority Date Filing Date
EP89302422A Expired - Lifetime EP0335521B1 (en) 1988-03-11 1989-03-10 Voice activity detection
EP93200015A Expired - Lifetime EP0548054B1 (en) 1988-03-11 1989-03-10 Voice activity detector

Family Applications After (1)

Application Number Title Priority Date Filing Date
EP93200015A Expired - Lifetime EP0548054B1 (en) 1988-03-11 1989-03-10 Voice activity detector

Country Status (16)

Country Link
EP (2) EP0335521B1 (en)
JP (2) JP3321156B2 (en)
KR (1) KR0161258B1 (en)
AU (1) AU608432B2 (en)
BR (1) BR8907308A (en)
CA (1) CA1335003C (en)
DE (2) DE68910859T2 (en)
DK (1) DK175478B1 (en)
ES (2) ES2188588T3 (en)
FI (2) FI110726B (en)
HK (1) HK135896A (en)
IE (1) IE61863B1 (en)
NO (2) NO304858B1 (en)
NZ (1) NZ228290A (en)
PT (1) PT89978B (en)
WO (1) WO1989008910A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5241692A (en) * 1991-02-19 1993-08-31 Motorola, Inc. Interference reduction system for a speech recognition device
IN184794B (en) * 1993-09-14 2000-09-30 British Telecomm
DE10052626A1 (en) * 2000-10-24 2002-05-02 Alcatel Sa Adaptive noise level estimator
US7155388B2 (en) * 2004-06-30 2006-12-26 Motorola, Inc. Method and apparatus for characterizing inhalation noise and calculating parameters based on the characterization
US7139701B2 (en) * 2004-06-30 2006-11-21 Motorola, Inc. Method for detecting and attenuating inhalation noise in a communication system
US8708702B2 (en) * 2004-09-16 2014-04-29 Lena Foundation Systems and methods for learning using contextual feedback
US8954324B2 (en) * 2007-09-28 2015-02-10 Qualcomm Incorporated Multiple microphone voice activity detector
US8175871B2 (en) 2007-09-28 2012-05-08 Qualcomm Incorporated Apparatus and method of noise and echo reduction in multiple microphone audio systems
US8223988B2 (en) 2008-01-29 2012-07-17 Qualcomm Incorporated Enhanced blind source separation algorithm for highly correlated mixtures
ES2371619B1 (en) * 2009-10-08 2012-08-08 Telefónica, S.A. VOICE SEGMENT DETECTION PROCEDURE.
WO2011049516A1 (en) * 2009-10-19 2011-04-28 Telefonaktiebolaget Lm Ericsson (Publ) Detector and method for voice activity detection
CN108985277B (en) * 2018-08-24 2020-11-10 广东石油化工学院 Method and system for filtering background noise in power signal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3509281A (en) * 1966-09-29 1970-04-28 Ibm Voicing detection system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4052568A (en) * 1976-04-23 1977-10-04 Communications Satellite Corporation Digital voice switch
US4358738A (en) * 1976-06-07 1982-11-09 Kahn Leonard R Signal presence determination method for use in a contaminated medium
GB2061676A (en) * 1979-08-31 1981-05-13 Marconi Co Ltd Voice detector
US4688256A (en) * 1982-12-22 1987-08-18 Nec Corporation Speech detector capable of avoiding an interruption by monitoring a variation of a spectrum of an input signal
EP0127718A1 (en) * 1983-06-07 1984-12-12 International Business Machines Corporation Process for activity detection in a voice transmission system
EP0178933A2 (en) * 1984-10-17 1986-04-23 Sharp Kabushiki Kaisha Auto-correlation filter

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
1977 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, and SIGNAL PROCESSING, Hartford, Connecticut, 9th-11th May 1977, pages 425-428, IEEE, New York, US; R.J. McAULAY: "Optimum speech classification and its application to adaptive noise cancellation" *
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 22, no. 7, December 1979, pages 2624-2625, New York, US; R.J. JOHNSON et al.: "Speech detector" *
ICASSP'81 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Atlanta, 30th March - 1st April 1981, vol. 3, pages 1082-1085, IEEE, New York, US; C.K. UN et al.: "Improving LPC analysis of noisy speech by autocorrelation subtraction method" *
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-25, no. 4, August 1977, pages 338-343, New York, US; L.R. RABINER et al.: "Application of an LPC distance measure to the voiced-unvoiced-silence detection problem" *
IEEE TRANSACTIONS ON COMMUNICATIONS, vol. COM-26, no. 1, January 1978, pages 140-145, IEEE, New York, US; P.G. DRAGO et al.: "Digital dynamic speech detectors" *

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0435458A1 (en) * 1989-11-28 1991-07-03 Nec Corporation Speech/voiceband data discriminator
EP0451796A1 (en) * 1990-04-09 1991-10-16 Kabushiki Kaisha Toshiba Speech detection apparatus with influence of input level and noise reduced
US5293588A (en) * 1990-04-09 1994-03-08 Kabushiki Kaisha Toshiba Speech detection apparatus not affected by input energy or background noise levels
FR2697101A1 (en) * 1992-10-21 1994-04-22 Sextant Avionique Speech detection method
EP0594480A1 (en) * 1992-10-21 1994-04-27 Sextant Avionique Speech detection method
US5572623A (en) * 1992-10-21 1996-11-05 Sextant Avionique Method of speech detection
US5632004A (en) * 1993-01-29 1997-05-20 Telefonaktiebolaget Lm Ericsson Method and apparatus for encoding/decoding of background sounds
EP0625774A2 (en) * 1993-05-19 1994-11-23 Matsushita Electric Industrial Co., Ltd. A method and an apparatus for speech detection
EP0625774A3 (en) * 1993-05-19 1996-10-30 Matsushita Electric Ind Co Ltd A method and an apparatus for speech detection.
US5611019A (en) * 1993-05-19 1997-03-11 Matsushita Electric Industrial Co., Ltd. Method and an apparatus for speech detection for determining whether an input signal is speech or nonspeech
WO1994028542A1 (en) * 1993-05-26 1994-12-08 Telefonaktiebolaget Lm Ericsson Discriminating between stationary and non-stationary signals
AU681551B2 (en) * 1993-05-26 1997-08-28 Telefonaktiebolaget Lm Ericsson (Publ) Discriminating between stationary and non-stationary signals
US5579432A (en) * 1993-05-26 1996-11-26 Telefonaktiebolaget Lm Ericsson Discriminating between stationary and non-stationary signals
AU670383B2 (en) * 1993-05-26 1996-07-11 Telefonaktiebolaget Lm Ericsson (Publ) Discriminating between stationary and non-stationary signals
EP0633658A2 (en) * 1993-07-06 1995-01-11 Hughes Aircraft Company Voice activated transmission coupled AGC circuit
EP0633658A3 (en) * 1993-07-06 1996-01-17 Hughes Aircraft Co Voice activated transmission coupled AGC circuit.
AU672934B2 (en) * 1993-11-02 1996-10-17 Telefonaktiebolaget Lm Ericsson (Publ) Discriminating between stationary and non-stationary signals
WO1995012879A1 (en) * 1993-11-02 1995-05-11 Telefonaktiebolaget Lm Ericsson Discriminating between stationary and non-stationary signals
US5579435A (en) * 1993-11-02 1996-11-26 Telefonaktiebolaget Lm Ericsson Discriminating between stationary and non-stationary signals
EP1703493A2 (en) * 1994-08-10 2006-09-20 Qualcomm Incorporated Method and apparatus for selecting an encoding rate in a variable rate vocoder
EP1703493A3 (en) * 1994-08-10 2007-02-14 Qualcomm Incorporated Method and apparatus for selecting an encoding rate in a variable rate vocoder
FR2727236A1 (en) * 1994-11-22 1996-05-24 Alcatel Mobile Comm France DETECTION OF VOICE ACTIVITY
EP0714088A1 (en) * 1994-11-22 1996-05-29 Alcatel Mobile Phones Voice activity detection
US5732141A (en) * 1994-11-22 1998-03-24 Alcatel Mobile Phones Detecting voice activity
WO1996034382A1 (en) * 1995-04-28 1996-10-31 Northern Telecom Limited Methods and apparatus for distinguishing speech intervals from noise intervals in audio signals
US5774847A (en) * 1995-04-28 1998-06-30 Northern Telecom Limited Methods and apparatus for distinguishing stationary signals from non-stationary signals
GB2317084B (en) * 1995-04-28 2000-01-19 Northern Telecom Ltd Methods and apparatus for distinguishing speech intervals from noise intervals in audio signals
GB2317084A (en) * 1995-04-28 1998-03-11 Northern Telecom Ltd Methods and apparatus for distinguishing speech intervals from noise intervals in audio signals
GB2306010A (en) * 1995-10-04 1997-04-23 Univ Wales Medicine A method of classifying signals
US5812965A (en) * 1995-10-13 1998-09-22 France Telecom Process and device for creating comfort noise in a digital speech transmission system
EP0768770A1 (en) * 1995-10-13 1997-04-16 France Telecom Method and arrangement for the creation of comfort noise in a digital transmission system
FR2739995A1 (en) * 1995-10-13 1997-04-18 Massaloux Dominique METHOD AND DEVICE FOR CREATING A COMFORT NOISE IN A DIGITAL SPEECH TRANSMISSION SYSTEM
EP0786760A3 (en) * 1996-01-29 1998-09-16 Texas Instruments Incorporated Speech coding
US6427134B1 (en) 1996-07-03 2002-07-30 British Telecommunications Public Limited Company Voice activity detector for calculating spectral irregularity measure on the basis of spectral difference measurements
WO1998001847A1 (en) * 1996-07-03 1998-01-15 British Telecommunications Public Limited Company Voice activity detector
US6618701B2 (en) 1999-04-19 2003-09-09 Motorola, Inc. Method and system for noise suppression using external voice activity detection
WO2000063887A1 (en) * 1999-04-19 2000-10-26 Motorola Inc. Noise suppression using external voice activity detection
WO2005048619A1 (en) * 2003-11-12 2005-05-26 Koninklijke Philips Electronics N.V. Method and apparatus for transferring no-speech data in voice channel
CN101010722B (en) * 2004-08-30 2012-04-11 诺基亚西门子网络公司 Device and method of detection of voice activity in an audio signal
EP1887559A3 (en) * 2006-08-10 2009-01-14 STMicroelectronics Asia Pacific Pte Ltd. Yule walker based low-complexity voice activity detector in noise suppression systems
US8775168B2 (en) 2006-08-10 2014-07-08 Stmicroelectronics Asia Pacific Pte, Ltd. Yule walker based low-complexity voice activity detector in noise suppression systems
US8244528B2 (en) 2008-04-25 2012-08-14 Nokia Corporation Method and apparatus for voice activity determination
US8275136B2 (en) 2008-04-25 2012-09-25 Nokia Corporation Electronic device speech enhancement
US8611556B2 (en) 2008-04-25 2013-12-17 Nokia Corporation Calibrating multiple microphones
US8682662B2 (en) 2008-04-25 2014-03-25 Nokia Corporation Method and apparatus for voice activity determination

Also Published As

Publication number Publication date
FI115328B (en) 2005-04-15
NO903936L (en) 1990-11-09
FI110726B (en) 2003-03-14
BR8907308A (en) 1991-03-19
NO304858B1 (en) 1999-02-22
DE68910859T2 (en) 1994-12-08
DK215690D0 (en) 1990-09-07
NZ228290A (en) 1992-01-29
KR900700993A (en) 1990-08-17
EP0548054B1 (en) 2002-12-11
KR0161258B1 (en) 1999-03-20
ES2047664T3 (en) 1994-03-01
DE68929442T2 (en) 2003-10-02
WO1989008910A1 (en) 1989-09-21
NO316610B1 (en) 2004-03-08
DK215690A (en) 1990-09-07
NO982568L (en) 1990-11-09
HK135896A (en) 1996-08-02
NO903936D0 (en) 1990-09-10
DK175478B1 (en) 2004-11-08
EP0548054A3 (en) 1994-01-12
PT89978A (en) 1989-11-10
DE68929442D1 (en) 2003-01-23
PT89978B (en) 1995-03-01
JPH03504283A (en) 1991-09-19
FI904410A0 (en) 1990-09-07
CA1335003C (en) 1995-03-28
JP2000148172A (en) 2000-05-26
IE61863B1 (en) 1994-11-30
FI20010933A (en) 2001-05-04
JP3321156B2 (en) 2002-09-03
DE68910859D1 (en) 1994-01-05
EP0335521B1 (en) 1993-11-24
EP0548054A2 (en) 1993-06-23
NO982568D0 (en) 1998-06-04
JP3423906B2 (en) 2003-07-07
ES2188588T3 (en) 2003-07-01
AU608432B2 (en) 1991-03-28
AU3355489A (en) 1989-10-05
IE890774L (en) 1989-09-11

Similar Documents

Publication Publication Date Title
EP0335521B1 (en) Voice activity detection
US5276765A (en) Voice activity detection
US4630304A (en) Automatic background noise estimator for a noise suppression system
US5579435A (en) Discriminating between stationary and non-stationary signals
US5091948A (en) Speaker recognition with glottal pulse-shapes
KR950000842B1 (en) Pitch detector
CA1123955A (en) Speech analysis and synthesis apparatus
US5970441A (en) Detection of periodicity information from an audio signal
JPH09212195A (en) Device and method for voice activity detection and mobile station
US5579432A (en) Discriminating between stationary and non-stationary signals
US5632004A (en) Method and apparatus for encoding/decoding of background sounds
JPH08221097A (en) Detection method of audio component
US4972490A (en) Distance measurement control of a multiple detector system
Vahatalo et al. Voice activity detection for GSM adaptive multi-rate codec
JP2002258881A (en) Device and program for detecting voice
US6993478B2 (en) Vector estimation system, method and associated encoder
JPH01502858A (en) Apparatus and method for detecting the presence of fundamental frequencies in audio frames
NZ286953A (en) Speech encoder/decoder: discriminating between speech and background sound
Prasad et al. A 2.4 Kilobits Per Second Linear Prediction Vocoder
JPH0827637B2 (en) Voice / silence judgment circuit

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE ES FR GB GR IT LI LU NL SE

17P Request for examination filed

Effective date: 19900306

17Q First examination report despatched

Effective date: 19910201

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE CH DE ES FR GB GR IT LI LU NL SE

REF Corresponds to:

Ref document number: 97757

Country of ref document: AT

Date of ref document: 19931215

Kind code of ref document: T

ITF It: translation for an EP patent filed

Owner name: JACOBACCI CASETTA & PERANI S.P.A.

REF Corresponds to:

Ref document number: 68910859

Country of ref document: DE

Date of ref document: 19940105

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2047664

Country of ref document: ES

Kind code of ref document: T3

ET Fr: translation filed
REG Reference to a national code

Ref country code: GR

Ref legal event code: FG4A

Free format text: 3010629

EPTA Lu: last paid annual fee
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
EAL Se: european patent in force in sweden

Ref document number: 89302422.4

REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

NLS Nl: assignments of ep-patents

Owner name: LG ELECTRONICS INC.

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

REG Reference to a national code

Ref country code: FR

Ref legal event code: TP

REG Reference to a national code

Ref country code: CH

Ref legal event code: NV

Representative's name: SAEGER & PARTNER

Ref country code: CH

Ref legal event code: PUE

Owner name: LG ELECTRONICS INC.

Free format text: BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY#81 NEWGATE STREET#LONDON EC1A 7AJ (GB) -TRANSFER TO- LG ELECTRONICS INC.#LG TWIN TOWERS 20, YEOUIDO-DONG YEONGDEUNGPO-GU#SEOUL, 150-721 (KR)

REG Reference to a national code

Ref country code: ES

Ref legal event code: PC2A

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: CH

Payment date: 20080228

Year of fee payment: 20

REG Reference to a national code

Ref country code: CH

Ref legal event code: PCAR

Free format text: MANFRED SAEGER;POSTFACH 5;7304 MAIENFELD (CH)

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: LU

Payment date: 20080313

Year of fee payment: 20

Ref country code: IT

Payment date: 20080327

Year of fee payment: 20

Ref country code: SE

Payment date: 20080306

Year of fee payment: 20

Ref country code: GB

Payment date: 20080305

Year of fee payment: 20

Ref country code: NL

Payment date: 20080203

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: AT

Payment date: 20080313

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20080311

Year of fee payment: 20

Ref country code: ES

Payment date: 20080418

Year of fee payment: 20

Ref country code: DE

Payment date: 20080306

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: BE

Payment date: 20080916

Year of fee payment: 20

Ref country code: GR

Payment date: 20080215

Year of fee payment: 20

BE20 Be: patent expired

Owner name: *LG ELECTRONICS INC.

Effective date: 20090310

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20

Expiry date: 20090309

NLV7 Nl: ceased due to reaching the maximum lifetime of a patent

Effective date: 20090310

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20090311

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20090310

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20090309

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20090311