WO2002032356A1 - Transient processing for communication system - Google Patents

Transient processing for communication system

Info

Publication number
WO2002032356A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
cabin
signal
voice
noise
Prior art date
Application number
PCT/US2001/032455
Other languages
French (fr)
Other versions
WO2002032356A8 (en
Inventor
Alan M. Finn
Saligrama R. Venkatesh
Ronald R. Reich
Philip Lemay
Original Assignee
Lear Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/692,531 external-priority patent/US7171003B1/en
Priority claimed from US09/692,268 external-priority patent/US7039197B1/en
Priority claimed from US09/691,869 external-priority patent/US6748086B1/en
Priority claimed from US09/691,928 external-priority patent/US6674865B1/en
Priority claimed from US09/692,725 external-priority patent/US7117145B1/en
Application filed by Lear Corporation filed Critical Lear Corporation
Priority to AU2002224413A priority Critical patent/AU2002224413A1/en
Publication of WO2002032356A1 publication Critical patent/WO2002032356A1/en
Publication of WO2002032356A8 publication Critical patent/WO2002032356A8/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/02 Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00 Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10 General applications
    • H04R2499/13 Acoustic transducers and sound field adaptation in vehicles

Definitions

  • the present invention relates to improvements in voice amplification and clarification in a noisy environment, such as a cabin communication system, which enables a voice spoken within the cabin to be increased in volume for improved understanding while minimizing any unwanted noise amplification.
  • the present invention also relates to a movable cabin that advantageously includes such a cabin communication system for this purpose.
  • the term "movable cabin" is intended to be embodied by a car, truck or any other wheeled vehicle, an airplane or helicopter, a boat, a railroad car and indeed any other enclosed space that is movable and wherein a spoken voice may need to be amplified or clarified.
  • an echo cancellation apparatus such as an acoustic echo cancellation apparatus, can be coupled between the microphone and the loudspeaker to remove the portion of the picked-up signal corresponding to the voice component output by the loudspeaker.
  • the speech and noise occupy the same bandwidth, and therefore cannot be separated by band-limited filters.
  • different people speak differently, and therefore it is harder to properly identify the speech components in the mixed signal.
  • the noise characteristics vary rapidly and unpredictably, due to the changing sources of noise as the vehicle moves.
  • the speech signal is not stationary, and therefore constant adaptation to its characteristics is required.
  • One prior art approach to speech intelligibility enhancement is filtering. As noted above, since speech and noise occupy the same bandwidth, simple band-limited filtering will not suffice. That is, the overlap of speech and noise in the same frequency band means that filtering based on frequency separation will not work.
  • filtering may be based on the relative orthogonality between speech and noise waveforms.
  • highly non-stationary nature of speech necessitates adaptation to continuously estimate a filter to subtract the noise.
  • the filter will also depend on the noise characteristics, which in this environment are time-varying on a slower scale than speech and depend on such factors as vehicle speed, road surface and weather.
  • Fig. 1 is a simplified block diagram of a conventional cabin communication system (CCS) 100 using only a microphone 102 and a loudspeaker 104.
  • CCS cabin communication system
  • an echo canceller 106 and a conventional speech enhancement filter (SEF) 108 are connected between the microphone 102 and loudspeaker 104.
  • a summer 110 subtracts the output of the echo canceller 106 from the input of the microphone 102, and the result is input to the SEF 108 and used as a control signal therefor.
  • the output of the SEF 108, which is the output of the loudspeaker 104, is the input to the echo canceller 106.
  • online identification of the transfer function of the acoustic path (including the loudspeaker 104 and the microphone 102) is performed, and the signal contribution from the acoustic path is subtracted.
  • the two problems of removing echoes and removing noise are addressed separately and the loss in performance resulting from coupling of the adaptive SEF and the adaptive echo canceller is usually insignificant. This is because speech and noise are correlated only over a relatively short period of time. Therefore, the signal coming out of the loudspeaker can be made to be uncorrelated from the signal received directly at the microphone by adding adequate delay into the SEF. This ensures robust identification of the echo canceller and in this way the problems can be completely decoupled. The delay does not pose a problem in large enclosures, public address systems and telecommunication systems such as automobile hands-free telephones.
  • the acoustics of relatively smaller movable cabins dictate that processing be completed in a relatively short time to prevent the perception of an echo from direct and reproduced paths.
  • the reproduced voice output from the loudspeaker should be heard by the listener at substantially the same time as the original voice from the speaker is heard.
  • the acoustic paths are such that an addition of delay beyond approximately 20ms will sound like an echo, with one version coming from the direct path and another from the loudspeaker. This puts a limit on the total processing time, which means a limit both on the amount of delay and on the length of the signal that can be processed.
  • conventional adaptive filtering applied to a cabin communication system may reduce voice quality by introducing distortion or by creating artifacts such as tones or echoes. If the echo cancellation process is coupled with the speech extraction filter, it becomes difficult to accurately estimate the acoustic transfer functions, and this in turn leads to poor estimates of noise spectrum and consequently poor speech intelligibility at the loudspeaker.
  • An advantageous approach to overcoming this problem is disclosed below, as are the structure and operation of an advantageous adaptive SEF.
  • filters are known for use in the task of speech intelligibility enhancement. These filters can be broadly classified into two main categories: (1) filters based on a Wiener filtering approach and (2) filters based on the method of spectral subtraction. Two other approaches, i.e. Kalman filtering and H-infinity filtering, have also been tried, but will not be discussed further herein.
  • Spectral subtraction has been subjected to rigorous analysis, and it is well known, at least as it currently stands, not to be suitable for low SNR (signal-to-noise) environments because it results in "musical tone" artifacts and in unacceptable degradation in speech quality.
  • the movable cabin in which the present invention is intended to be used is just such a low SNR environment.
  • the present invention is an improvement on Wiener filtering, which has been widely applied for speech enhancement in noisy environments.
  • the Wiener filtering technique is statistical in nature, i.e. it constructs the optimal linear estimator (in the sense of minimizing the expected squared error) of an unknown desired stationary signal, n, from a noisy observation, y, which is also stationary.
  • the optimal linear estimator is in the form of a convolution operator in the time domain, which is readily converted to a multiplication in the frequency domain.
  • the Wiener filter can be applied to estimate noise, and then the resulting estimate can be subtracted from the noisy speech to give an estimate for the speech signal.
  • Wiener filtering requires the solution, h, to the following Wiener-Hopf equation:
  • $R_{ny}$ is the cross-correlation matrix of the noise-only signal with
  • $R_{yy}$ is the auto-correlation matrix of the noisy speech
  • h is the
  • m is the length of the data window.
  • $S_{nn}$ and $S_{yy}$ are the Fourier Transforms, or equivalently the power
  • PSDs spectral densities
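As a rough illustration of the frequency-domain form described above, the sketch below applies a per-bin Wiener gain to estimate the noise and then subtracts that estimate from the noisy spectrum. It assumes that speech and noise are uncorrelated, so that the cross-spectrum of noise and noisy speech reduces to the noise PSD; the function name, the `floor` guard and the sample values are hypothetical and are not taken from the patent.

```python
def wiener_speech_estimate(noisy_spectrum, s_nn, s_yy, floor=1e-12):
    """Estimate the speech spectrum by subtracting a Wiener noise estimate.

    noisy_spectrum: complex FFT bins Y(f) of the noisy speech
    s_nn, s_yy:     noise and noisy-speech PSDs per bin (same length)
    floor:          hypothetical guard against division by zero
    """
    speech = []
    for y, nn, yy in zip(noisy_spectrum, s_nn, s_yy):
        # Per-bin noise-estimating gain H(f) = S_nn(f) / S_yy(f), kept in [0, 1].
        h = min(1.0, max(0.0, nn / max(yy, floor)))
        noise_hat = h * y             # Wiener estimate of the noise in this bin
        speech.append(y - noise_hat)  # subtract it from the noisy speech
    return speech


est = wiener_speech_estimate([2 + 0j, 4 + 0j], s_nn=[1.0, 1.0], s_yy=[4.0, 2.0])
```

Clamping the gain to the range zero to one is a design choice of this sketch; it keeps a poor PSD estimate from amplifying a bin or inverting its sign.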
  • AEC adaptive acoustic echo canceller
  • the echo cancellation has to be adaptive because the acoustics of a cabin change due to temperature, humidity and passenger movement. It has also been recognized that
  • a CCS couples the echo cancellation process with the SEF.
  • the present invention is different from the prior art in addressing the coupled on-line
  • One such aspect relates to an improved AGC in accordance with the present invention controls amplification volume and related functions in the CCS, including the generation of appropriate gain control signals for
  • Such volume control should have an
  • any microphone in a cabin will detect not only the ambient noise, but also sounds purposefully introduced into the cabin.
  • sounds include, for example, sounds from the entertainment system (radio, CD player or even
  • a further aspect of the present invention is directed to an improved user interface installed in the cabin for improving the ease and flexibility of the CCS.
  • the user interface enables customized use of the plural microphones and loudspeakers.
  • an object of the invention to provide an adaptive speech extraction filter (SEF) that avoids the problems of the prior art. It is another object of the invention to provide an adaptive SEF that
  • a cabin communication system incorporating an advantageous adaptive AEC for enhancing speech intelligibility in the moving vehicle.
  • gain control that provides both an overall gain control signal and a dither control signal.
  • one aspect of the present invention is
  • the cabin communication system comprising a microphone for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into an audio signal, the audio
  • the speech enhancement filter removing the second component from the audio signal to provide a filtered audio signal
  • the speech enhancement filter removing the second component by processing the audio signal by a method taking into account elements of psycho-acoustics of a human ear, and a loudspeaker for outputting a clarified voice in response to the
  • Another aspect of the present invention is directed to a cabin
  • the cabin communication system comprising an adaptive
  • speech enhancement filter for receiving an audio signal that includes a first component indicative of the spoken voice, a second component indicative of a feedback echo of
  • Enhancement filter filtering the audio signal by removing the third component to
  • the speech enhancement filter adapting to the audio signal at a first adaptation rate
  • an adaptive acoustic echo cancellation system for receiving the filtered audio signal and removing the second component in the filtered
  • the echo cancellation signal to provide an echo-cancelled audio signal, the echo cancellation signal
  • adaptation rate and the second adaptation rate are different from each other so that the speech enhancement filter does not adapt in response to operation of the echo-cancellation system and the echo-cancellation system does not adapt in response to
  • Another aspect of the present invention is directed to an automatic gain
  • the automatic gain control for a cabin communication system for improving clarity of a voice spoken within a movable interior cabin having ambient noise
  • the automatic gain control comprising a microphone for receiving the spoken voice and the ambient noise and for
  • the first audio signal to provide a filtered audio signal
  • an acoustic echo canceller for
  • the first automatic gain control signal controlling a first gain of the dither signal supplied to the filter, the control signal
  • a loudspeaker for outputting a reproduced voice in response to the echo-cancelled audio signal with a second gain controlled by the second automatic gain control signal.
  • Another aspect of the present invention is directed to an automatic gain
  • control for a cabin communication system for improving clarity of a voice spoken within a movable interior cabin having ambient noise, the ambient noise intermittently
  • the automatic gain control comprising a microphone for receiving the spoken voice and the ambient noise and for converting
  • the spoken voice and the ambient noise into a first audio signal, the first audio signal
  • a filter for filtering the first audio signal to provide a filtered audio signal
  • a loudspeaker for outputting a reproduced voice in response to the filtered audio signal with a variable gain at a second location
  • control signal generating circuit for generating an automatic gain
  • control signal in response to the decision logic, wherein when the decision logic
  • control signal generating circuit decides that the second component corresponds to an undesirable transient signal, the control signal generating circuit generates the automatic gain control signal so as to
  • Another aspect of the present invention is directed to an improved user
  • FIG. 1 is a simplified block diagram of a conventional cabin
  • Fig. 2 is an illustrative drawing of a vehicle incorporating a first
  • Fig. 3 is a block diagram explanatory of the multi-input, multi-output
  • Fig. 4 is an experimentally derived acoustic budget for implementation
  • Fig. 5 is a block diagram of filtering in the present invention.
  • Fig. 6 is a block diagram of the SEF of the present invention.
  • Fig. 7 is a plot of Wiener filtering performance by the SEF of Fig. 6.
  • Fig. 8 is a plot of speech plus noise.
  • Fig. 9 is a plot of the speech plus noise of Fig. 8 after Wiener filtering
  • Fig. 10 is a plot of actual test results.
  • Fig. 11 is a block diagram of an embodiment of the AEC of the present
  • Fig. 12 is a block diagram of a single input-single output CCS with radio cancellation.
  • Fig. 13 illustrates an algorithm for Recursive Least Squares (RLS)
  • Fig. 14 is an illustration of the relative contribution of errors in temperature compensation.
  • Fig. 15 is a first plot of the transfer function from a right rear loudspeaker to a right rear microphone using the AEC of the invention.
  • Fig. 16 is a second plot of the transfer function from a right rear
  • Fig. 17 is a schematic diagram of a first embodiment of the automatic
  • Fig. 18 illustrates an embodiment of a device for generating a first
  • Fig. 19 illustrates an embodiment of a device for generating a second advantageous agc signal.
  • Fig. 20 is a schematic diagram of a second embodiment of the
  • Fig. 21 is a schematic diagram illustrating a transient processing
  • Fig. 22 illustrates the determination of a simple threshold.
  • Fig. 23 illustrates the behavior of the automatic gain control for the
  • Fig. 24 is a detail of Fig. 23 illustrating the graceful fade-out.
  • Fig. 25 illustrates the determination of a simple template.
  • Fig. 26 is a schematic diagram of an embodiment of the user interface
  • Fig. 27 is a diagram illustrating the incorporation of the inventive user interface in the inventive CCS.
  • Fig. 28 is a schematic diagram illustrating the interior structure of a
  • FIG. 2 illustrates a first embodiment of the present invention as
  • the mini-van 10 includes a driver's
  • the microphone layout may include a right and a left microphone for each seat.
  • the spoken voice from the location where it originates e.g. the passenger or driver
  • beamformed phase array or more generally, by providing plural microphones whose signals are processed in combination to be more sensitive to the location of the spoken voice, or even more generally to preferentially detect sound from a limited physical
  • the plural microphones can be directional microphones or omnidirectional
  • the system can be any suitable microphones, whose combined signals define the detecting location.
  • the microphones 18-22 are advantageously located in the headliner 24 of the mini-van 10. Also located within the cabin of the mini-van 10 are plural loudspeakers 26, 28. While three microphones and two loudspeakers are shown in Fig. 2, it will be recognized that the number of microphones and loudspeakers and their respective locations may be changed to suit any particular cabin layout. If the microphones 18, 20, 22 are directional or form an
  • each will have a respective beam pattern 30, 32, 34 indicative of the direction in
  • the input signals from the microphones 18-22 are all sent to a digital signal
  • DSP signal processor
  • the DSP 36 may
  • FIG. 3 illustrates a block diagram explanatory of elements in this embodiment, having two microphones, $\text{mic}_1$ and $\text{mic}_2$, and two loudspeakers $l_1$ and $l_2$.
  • Microphone $\text{mic}_1$ picks up six signal components, including first voice $v_1$ with a
  • Microphone $\text{mic}_1$ also picks up the output $s_1$ of
  • loudspeaker $l_1$ with a transfer function of $H_{11}$, and the output $s_2$ of loudspeaker $l_2$ with a transfer function $H_{21}$.
  • Microphone $\text{mic}_2$ picks up six corresponding signal
  • the microphone signal from microphone $\text{mic}_1$ is echo cancelled ($-\hat{H}_{11} s_1 -$
  • the total signal at point A in Fig. 3 is $(H_{11} - \hat{H}_{11}) s_1 + (H_{21} - \hat{H}_{21}) s_2 + V_{11} v_1 + V_{21} v_2 + N_{11} n_1 + N_{21} n_2$.
  • the CCS uses a number of such echo cancellers equal to the
  • the system is run open loop (switches in Fig. 3 are open)
  • the inventors have determined that a minimum of 20 dB SNR provides comfortable intelligibility for front to rear communication in a mini-van.
  • the SNR is measured as
  • the microphones used in a test of the CCS gave a 5 dB SNR at 65 mph, with the SNR decreasing with increasing speed.
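To make the size of the gap between the two quoted figures concrete, the short calculation below compares the 20 dB target with the 5 dB measured at the microphones. It uses the standard power-ratio definition of the decibel, which is an assumption here because the text's own SNR definition is not reproduced; the constant and function names are hypothetical.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)


REQUIRED_SNR_DB = 20.0  # comfortable intelligibility, as stated in the text
MEASURED_SNR_DB = 5.0   # microphone SNR at 65 mph, as stated in the text

gap_db = REQUIRED_SNR_DB - MEASURED_SNR_DB  # improvement the processing must supply
power_ratio = 10.0 ** (gap_db / 10.0)       # the same gap as a linear power ratio
```

Under these assumptions the processing chain has to recover 15 dB, which corresponds to a power ratio of roughly 32.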
  • the system may be designed to provide 20 dB each. Similarly, at least
  • Fig. 4 illustrates an advantageous experimentally derived acoustic budget. The overall
  • the present invention differs from the prior art in
  • Microphone independence can be achieved by small beamforming arrays over each
  • Another aspect of acceptable psycho-acoustics is good voice quality.
  • FIG. 5 is a block diagram of filtering circuitry provided in a CCS
  • the first two elements are
  • the final element is an analog LPF 4-pole filter 46.
  • AGC automatic gain control
  • voice amplification is desirably greater than the natural acoustic attenuation.
  • distinct echoes result when the total CCS and audio delays exceed 20 ms.
  • the delays advantageously are limited to 17 ms.
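The delay limit translates directly into a bound on how many samples can be buffered and processed. The arithmetic below is a minimal sketch that assumes the 5 kHz sampling rate quoted for the noisy speech signal applies to the whole delay budget; the function and constant names are hypothetical.

```python
def samples_within(delay_ms, sample_rate_hz):
    """Whole samples that fit in a delay of delay_ms at sample_rate_hz."""
    return (delay_ms * sample_rate_hz) // 1000


ECHO_THRESHOLD_MS = 20  # delays beyond this are heard as a distinct echo
DELAY_LIMIT_MS = 17     # advantageous limit stated in the text
SAMPLE_RATE_HZ = 5000   # assumed: the sampling rate quoted for the speech signal

budget_samples = samples_within(DELAY_LIMIT_MS, SAMPLE_RATE_HZ)
margin_ms = ECHO_THRESHOLD_MS - DELAY_LIMIT_MS
```

At that rate the 17 ms limit leaves room for 85 samples in total, with 3 ms of margin before the echo threshold.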
  • weights ⁇ k are advantageously chosen as the inverse of the
  • the system here corresponds to a causal operation (as opposed to the input speech), so that the noise at any instant of
  • the present invention exploits this difference in causality by solving an appropriate causal filtering problem, i.e. a causal Wiener filtering approach.
  • straightforward causal filtering has severe drawbacks.
  • Equation (3) with the addition of constraints on causality and minimization of the residual power spectrum.
  • Equation 5 fails to satisfy this requirement. The reason
  • the present invention resolves these problems by formulating a
  • FFT Fast Fourier Transform
  • Variants of Equation (7) can also be used wherein a smoothed weight
  • the reduced-length filter may be of an a priori fixed length, or the length may be adaptive, for example based on the filter
  • the filter may be normalized, e.g. for unity DC gain.
  • the denominator in Equation 7 for the causal filter is an instantaneous value of the power spectrum of the noisy speech signal, and therefore it tends to have a large variance compared to the numerator,
  • the speech signal is weighted with a cos² weighting function in
  • H n (f) ⁇ H n (f) + (1 - ⁇ ) H n _,(f)
  • weights, w can be frequency dependent.
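The recursive smoothing of the filter response from one block to the next can be sketched as a weighted blend of the newly computed response and the previous one. The weighting symbol is not legible in the source, so the name `weight`, its example values and the convention that it multiplies the new response are all assumptions of this sketch.

```python
def smooth_filter(previous, current, weight):
    """Blend the newly computed response with the previous block's response.

    weight may be a single number or a per-bin list (frequency dependent).
    """
    if isinstance(weight, (int, float)):
        weight = [weight] * len(current)
    return [w * c + (1.0 - w) * p for p, c, w in zip(previous, current, weight)]


h_prev = [1.0, 0.0]
h_new = [0.0, 1.0]
h_smoothed = smooth_filter(h_prev, h_new, 0.25)          # one weight for all bins
h_per_bin = smooth_filter(h_prev, h_new, [1.0, 0.0])     # frequency-dependent weights
```

A small weight keeps the filter close to its previous value and so reduces block-to-block variance, at the cost of slower tracking.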
  • VAD voice activity detector
  • VAD is not even necessary, since the duration of speech, even when multiple people
  • Fig. 6 illustrates the
  • the noisy speech signal is sampled at a frequency of 5 kHz.
  • a buffer
  • the noisy speech is first mel-filtered in mel-filter 302. This results in improving the SNR at high frequencies.
  • a typical situation is shown in Fig. 7, where mel-filtering with the SEF 300 primarily improves
  • An optimization tool 308 inputs the
  • This causal filter update is
  • causal filter 310 determines the current noise estimate. This noise estimate is
  • Fig. 9 illustrates the corresponding Wiener-filtered speech signal, both for the period of 12 seconds. A comparison of the two
  • the Wiener filter sample window has been increased to 128 points while keeping the
  • Wiener filtered signal is of the order of 15 dB below the noise-only part of the noise
  • Audio recordings were taken at 5 kHz.
  • the reproduced loudspeaker signals had between 24 dB
  • Fig. 10 illustrates the results. Therefore
  • Fig. 11 illustrates a block diagram
  • the signal from microphone 200 is fed to a summer 210, which also receives a processed output signal, so that its output is an error signal (e).
  • the error signal is fed to a multiplier 402.
  • the multiplier also receives a parameter μ (mu)
  • the multiplier 402 also receives the regressor (x) and produces an output that is added to a feedback output in summer 404, with the sum being fed to an accumulator 406 for storing the coefficients (h) of the transfer
  • the output of the accumulator 406 is the feedback output fed to summer 404. This same output is then fed to a combination delay circuit or Finite Impulse
  • FIR Fast
  • mu controls how fast the AEC 400 adapts. It is an important feature of the present invention that mu is advantageously set in relation to
  • the present invention also recognizes that the AEC 400 does not need to adapt rapidly.
  • the most dynamic aspect of the cabin acoustics found so far is
  • acoustic parameters such as the number and movement of passengers, change
  • the adaptation rate of the echo canceller should be slow.
  • the filtered error sum is monitored until it no longer decreases, where the filtered error sum is a sufficiently low-pass filtered sum of the squared changes in transfer
  • Mu is progressively set smaller while there is no change in the filtered error sum until reaching a sufficiently small value. Then the dither is set to its
  • the actual convergence rate of the LMS filter is made a submultiple of
  • the step size mu for the AEC 400 is set to 0.01, based on empirical studies.
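The error, multiplier, accumulator and FIR structure described for the AEC can be illustrated with a plain LMS update. This is a minimal sketch under stated assumptions rather than the patent's implementation: the three-tap `true_path`, the white-noise excitation standing in for a dither signal, and the function name are hypothetical; only the step size of 0.01 comes from the text.

```python
import random


def lms_identify(x, mic, num_taps, mu):
    """Identify FIR coefficients h so that sum(h[k] * x[n-k]) tracks mic[n]."""
    h = [0.0] * num_taps
    regressor = [0.0] * num_taps  # most recent samples of x, newest first
    for x_n, mic_n in zip(x, mic):
        regressor = [x_n] + regressor[:-1]
        echo_hat = sum(hk * xk for hk, xk in zip(h, regressor))
        e = mic_n - echo_hat  # error signal at the summer output
        # Accumulate mu * e * x into the stored coefficients.
        h = [hk + mu * e * xk for hk, xk in zip(h, regressor)]
    return h


rng = random.Random(0)
true_path = [0.5, -0.3, 0.1]  # hypothetical acoustic path
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
mic = [sum(true_path[k] * x[n - k] for k in range(3) if n - k >= 0)
       for n in range(len(x))]
h_est = lms_identify(x, mic, num_taps=3, mu=0.01)
```

With a small step size the estimate converges slowly but smoothly, which matches the stated preference for a slowly adapting echo canceller.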
  • beta is one of the overall limiting parameters of the CCS, since it controls the rate of adaptation of the long term noise estimate. It has been found that it is important for good CCS performance that beta and mu be related as:
  • k is the value of the variable update-every for the AEC 400 (2 in
  • n is the number of samples accumulated before block processing by
  • Wiener filter noise estimate should be outside the range of these parameters.
  • the adaptive algorithms must be separated in rate as much as possible.
  • n(t) is the noise
  • s(t) is the speech signal from a passenger, i.e. the spoken voice, received at the microphone
  • H is the acoustic transfer function
  • u is a function of past values of s and n.
  • n(t) could be correlated with u(t).
  • s(t) is colored for the time scale of interest, which implies again that u(t) and
  • the first step is to cancel the signal from the car stereo system, since the radio signal can be directly measured.
  • the only unknown is the gain, but this can be estimated using any estimator, such as a conventional single tap
  • Fig. 12 illustrates the single input-single output CCS with radio cancellation.
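Because the radio signal is directly measurable and only its gain is unknown, the cancellation step can be sketched with a closed-form single-tap least-squares estimate. The text allows any estimator, so the batch least-squares choice, the function names and the sample values below are assumptions made for illustration.

```python
def estimate_radio_gain(radio, mic):
    """Least-squares single-tap gain g minimizing sum((mic - g * radio) ** 2)."""
    energy = sum(r * r for r in radio)
    if energy == 0.0:
        return 0.0  # no radio signal present, nothing to cancel
    return sum(m * r for m, r in zip(mic, radio)) / energy


def cancel_radio(radio, mic):
    """Subtract the gain-scaled radio signal from the microphone signal."""
    g = estimate_radio_gain(radio, mic)
    return [m - g * r for m, r in zip(mic, radio)]


radio = [1.0, -2.0, 3.0, -1.0]
mic = [0.5, -1.0, 1.5, -0.5]  # hypothetical pickup: the radio at half gain
gain = estimate_radio_gain(radio, mic)
residual = cancel_radio(radio, mic)
```

In a running system the same gain would more likely be tracked sample by sample, but the batch form shows what quantity is being estimated.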
  • the CCS 500 includes a microphone 200 with the input signal s(t)
  • the CCS 500 also includes an
  • the output of the second summer 508 is also the signal u(t) fed to
  • the random noise is input at summer 508 to provide a known source of uncorrelated noise.
  • This random noise r(t) is used as a
  • the random noise r(t) is entered as a dither signal.
  • random dither is independent of both noise and speech. Moreover, since it is a known signal, it is removed, or blocked, by the Wiener SEF 300. As a result, identification of the system can now be performed based on the dither signal, since the system looks
  • the dither signal must be sufficiently small so that it does not introduce objectionable noise into the acoustic environment, but at the
  • the dither volume is adjusted by the same automatic volume control used to modify the CCS volume control.
  • RLS recursive least squares
  • SEF is the speech extraction filter 300 and d accounts for time
  • d is a truncation operator that extracts the d impulse response coefficients and sets the others to zero, and d is less than the filter delay plus the
  • The last three terms in Equation 14 are uncorrelated from the first term, which is the required feature. It should also be noted that only the first d coefficients
  • coefficients of H The coefficients from d+1 onwards can either be processed in a
  • H 2d tH H 2d t + ⁇ ⁇ r d t . d (y[fj - (u d t )H d , - (u 2d , d )H 2d t )
  • $H_{2d,t+1}$ denotes the update at time t+1.
  • $H_{2d,t}$ is a column vector of the acoustic transfer function H containing the coefficients from d to 2d-1.
  • $u_{d,t}$ denotes a column vector [u[t], u[t-1], ..., u[t-d+1]]'.
  • H 3d t is estimated in a
  • d is advantageously between 10 and 40.
  • c is the speed of sound.
  • the transfer function at a frequency ⁇ can be estimated using any of
  • Fig. 8 illustrates the noisy speech
  • Fig. 9 illustrates the corresponding Wiener-filtered speech signal, both for the period of 12 seconds.
  • a comparison of the two plots demonstrates substantial noise attenuation. Also tested was a dBase implementation of the algorithm in which the
  • Wiener filter sample window has been increased to 128 points while keeping the
  • One such aspect relates to an improved AGC in accordance with the present invention that is particularly appropriate in a CCS
  • the present invention provides a novel and
  • sounds include, for example, sounds from the entertainment system (radio, CD player or even
  • the present invention provides an advantageous way to
  • both the SEF 300 and the AEC 400 are used in combination with the AGC in accordance with the present invention, although the use
  • the AGC 600 receives two input signals: a signal gain-pot
  • signal agc-signal 604 which is a signal from the vehicle control system that is
  • the AGC 600 represents a further aspect of the present invention.
  • Fig. 18 is similar to Fig. 1, but shows the use of the SEF 300 and the AEC 400, as well as the addition of a noise estimator 700 that generates the agc-signal 604.
  • the agc-signal 604 is generated in noise estimator 700 from a noise output of the SEF 300. As described above in connection with Fig. 6, the primary output signal
  • output from filter F 0 312 is the speech signal from which all noise has been
  • This current noise estimate is illustrated as noise 702 in Fig. 18.
  • noise 702 is an improvement for this purpose over noise estimates in prior art systems in that it reflects the superior noise estimation of the SEF 300, with the speech effectively removed. It further reflects the advantageous operation of the AEC 400
  • this output includes speech content
  • the present invention goes beyond the improved noise estimation that would occur if the noise 702 were used for the agc-signal 604 by
  • noise 702 which is a feedback signal
  • such feed forward signals advantageously include a speed signal 704 from a speed sensor (not illustrated) and/or a window position signal 706 from a window position sensor (not illustrated).
  • one or more windows are opened.
  • a superior agc-signal 604 can be generated as the output 708 of noise estimator 700.
  • the superior AGC signal may actually decrease the system gain
  • the agc-signal 604 is considered to be the desired one of the noise 702 and the output 708.
  • the structure of the AGC 600 is itself novel and unobvious and constitutes an aspect of the present invention, it is possible to alternatively use a more conventional signal, such as the
  • the agc-signal 604 is then processed, advantageously in
  • the agc-signal 604 is, by its very nature, noisy. Therefore, it is first
  • AGC-LIMIT in a limiter 610.
  • AGC-LIMIT is 0.8 on a scale of zero to one. Then the signal is filtered with a one-
  • this filter should be fast enough to track vehicle speed changes, but slow enough that the variation of the filtered signal does not introduce noise by amplitude modulation.
  • a suitable value for ALPHA-AGC is 0.0001.
  • the output of the filter 612 is the filt-agc-signal.
  • the filt-agc-signal is used both to modify the overall system gain and to provide automatic gain control for the dither signal, as discussed above.
  • This linear function has a slope of AGC-GAIN,
  • AGC-GAIN 0.8. The result is a signal agc, which advantageously
  • This component is formed by filtering the signal gain-pot 602 from the user's volume control. Like agc-signal 604, gain-pot 602 is very noisy and therefore is filtered in low-pass filter 618 under the control of variable ALPHA-GAIN-POT.
  • a suitable value for ALPHA-GAIN-POT is 0.0004.
  • the filtered output is stored in the
  • variable var-gain The overall front to rear gain is the product of the variable var-gain
  • variable gain-r (not shown).
  • a suitable value for gain-r is 3.0.
  • the overall rear to front gain (not shown) is the product of the variable var-gain and a
  • variable gain-f also having a suitable value of 3.0 in consideration of power amplifier
  • the overall system gain 606 is formed by multiplying, in multiplier 620, the var-gain output from filter 618 by the signal agc output from the summer 616.
  • the gain control signal rand-val 608 for the dither signal is similarly
  • a suitable value for rand-val-mult is 45.
  • the output of summer 624 is multiplied by variable rand-amp, a suitable value of which is 0.0001.
  • the result is the signal rand-val 608.
  • the AGC 600 is tuned by setting appropriate values for AGC-LIMIT and ALPHA-AGC based on the analog AGC hardware and the electrical noise. In the
  • rand-val for the dither signal is further tuned by setting rand-amp and rand-val-mult.
  • first rand-amp is set to the largest value that is imperceptible in system on/off under open loop, idle, windows and doors
  • variable rand-val-mult is set to the largest value that is
  • rand-amp 0.0001
  • rand-val-mult 45
  • Fig. 19 illustrates the generation of the signal agc by a quadratic
  • the filt-agc-signal from low pass filter 612 in Fig. 17 is multiplied in multiplier 628 by AGC-GAIN and added, in summer 630, to one.
  • summer 630 also adds to these terms a filt-agc-signal squared term from square multiplier 632
  • structure implements a preferred agc signal that is a quadratic function of the filt-agc-
  • the interior noise of a vehicle cabin is influenced by ambient factors
  • interior noise further depends on unpredictable factors such as rain and nearby traffic.
  • estimator 700 of Fig. 18 may be modified to accept inputs such as Door Open and
  • the Door Open signal (e.g. one for each
  • the Window Open signal (e.g. one for each window) are used to increase the
  • Fig. 20 is an illustration of the uses of the input from the SEF 300 to
  • the SEF 300 can operate for each microphone to enhance speech by
  • the noise estimator accepts the instantaneous noise estimates for each microphone, integrates them in integrators 750a, 750b, ...750i and
  • weights in multipliers 752a, 752b,...752i are preferably precomputed to compensate for individual microphone volume and local noise conditions, but the weights could be computed adaptively at the expense of additional computation.
  • weighted noise estimates are then added in adder 754 to calculate a cabin ambient
  • the cabin ambient noise estimate is compared to the noise level
  • the SEF 300 provides excellent noise removal in part by treating the noise as being of
  • noise elements that are of relatively short duration, comparable to the speech components, for example the sound of the mini-van's tire
  • transient signal detection techniques consisting of parameter estimation and
  • the parameter estimation and decision logic includes
  • the system shuts off adaptation for a suitable length of time corresponding to the duration of the transient and the associated cabin ring-down time and the system outputs (e.g. the
  • fade-in is accomplished by any suitable smooth transition, e.g. by an exponential or
  • one threshold might represent the maximum decibel level for any speech
  • This parameter might be used to identify any speech component exceeding this decibel level as an undesirable
  • a group of parameters might establish a template to identify
  • the sound of the wheel hitting a pothole might be characterized by a certain duration, a certain band of frequencies and a certain amplitude envelope. If these characteristics can be adequately described by a
  • thresholds and templates are mentioned as specific examples, it will be apparent to those of ordinary
  • Fig. 21 illustrates the overall operation of the transient processing system 800 in accordance with the present invention.
  • signals from the microphones in the cabin are provided to a parameter estimation processor 802. It will be recalled that the outputs of the loudspeakers will reflect the content of the sounds picked up by the microphones to the extent that those sounds are not
  • the processor 802 Based on these signals, the processor 802
  • transient noise to be handled by fading-out the loudspeaker outputs.
  • Such parameters may be determined either from a single sampling of the microphone signals at one
  • One or more such parameters for example a parameter based on a
  • the parameters may be updated continuously, at set time intervals, or in response
  • decision logic 804 which applies these parameters to actually decide whether a sound is the undesirable transient or not. For example, if one parameter is a maximum
  • the decision logic 804 can decide that the sound is an
  • the decision logic 804 can decide that the
  • decision in decision logic 804 can be based upon a
  • time history comparisons may include differential (spike) techniques, integral (energy) techniques, frequency domain techniques and time-frequency techniques, as well as
  • transient may additionally or alternatively be based on the loudspeaker signals.
  • loudspeaker signals would be provided to a parameter estimation processor 806 for
  • processor 806 would ordinarily be generally similar to, or identical to, the structure
  • processor 802 although different parameter estimations may be appropriate to take into account the specifics of the microphones or loudspeakers, for example. Similarly,
  • decision logic 808 would ordinarily be similar to, or identical to,
  • transient is not limited to fade-out.
  • a simple threshold is shown in Fig. 22. For this determination, a recording is made of the loudest voice signals for normal
  • Fig. 22 shows the microphone signals for such a recording.
  • This example signal consists of a loud, undesirable noise followed by a loud, acceptable
  • a threshold is chosen such that the loudest voice falls below the
  • AGC activation may be chosen empirically, as in the example at 1.5 times the maximum level of speech, or it may be determined statistically to balance incorrect AGC activation
  • FIG. 23 The undesirable noise rapidly exceeds the threshold and is eliminated by the AGC.
  • A detail of the AGC graceful shutdown from Fig. 23 is shown in Fig. 24, wherein the microphone signal is multiplied by a factor at each
  • Another example of a threshold is provided by comparing the absolute value
  • the microphone is 4th-order Butterworth bandpass limited between 300 Hz and 3 kHz.
  • the maximum the bandpassed signal can change is approximately 43% of the largest acceptable step change input to the bandpass filter.
  • a difference between successive samples that exceeds a threshold of 0.43 should activate the AGC. This threshold may also be determined empirically, since normal voice signals rarely
  • loudspeaker signal containing speech exhibits a characteristic power spectrum, as seen
  • the power spectrum is determined from a short time
  • the template in this example is determined as a Lognormal
  • the template in this example causes AGC activation for tonal noise or broadband noise particularly above about 1.8 kHz.
  • a transient is detected when any microphone or loudspeaker voltage reaches init-mic-threshold or init-spkr-threshold, respectively.
  • the thresholds should be set to preclude any sounds above the maximum
  • This number of samples is defined by a variable adapt-off-count, and
  • This ring down time is
  • TAPS is the length of time it takes for the mini-van to ring down when the sample rate is Fs. For an echo to decay 20 dB, this was found to be approximately 40 ms. TAPS increases linearly with Fs.
  • TAPS represents the size of the Least Mean
  • variable adapt-off-count is reset to 2*TAPS if multiple transients occur. At the end of a transient, the SEF 300 is also reset. Finally, when the output is being shut off due to a transient (fade-out),
  • a parameter OUTPUT-DECAY-RATE is used as a multiplier of the loudspeaker value
  • a suitable value is 0.8, which provides an exponential decay that
  • a corresponding ramp-on at the end of the transient may also be provided for fade-in.
  • the advantageous AGC provides improved control to aid voice clarity and preclude the amplification of undesirable noises.
  • aspect of the present invention is directed to an improved user interface installed in the cabin for improving the ease and flexibility of the CCS.
  • user interface enables customized use of the plural microphones and loudspeakers.
  • the CCS employing the SEF 300 and the AEC 400, wherein superior microphone independence, echo cancellation and noise elimination are provided.
  • the CCS of the present invention provides plural
  • microphones including, for example, one directed to pick up speech from the driver's
  • the CCS may provide a respective loudspeaker for each of the driver's seat and the passengers' seats to provide an output directed to the person in the seat. Accordingly, since the
  • the advantageous user interface of the present invention enables such an operation.
  • loudspeakers may be adjusted, or the pickup of a microphone may be reduced to give the occupant of the respective seat more privacy.
  • the pickup of one microphone might be supplied for output to only a selected one or more of the
  • a recorder may be actuated from the various seats to
  • one or more of the cabin's occupants can participate in a hands-free telephone call without
  • Fig. 26 illustrates the overall structure of the user interface in accordance with the present invention. As shown therein, each position within the cabin can have its own subsidiary interface, with the subsidiary interfaces being
  • the overall interface 900 includes a front interface
  • middle interfaces may be provided, or each of the front, middle and rear interfaces may be formed as respective left and right
  • the front interface 910 includes a manual control 912 for recording a
  • the rear interface 930 correspondingly includes a manual control 932 for recording a voice memo, a manual control 934 for playing back the voice memo, a manual control 936 for talking from the rear of the cabin to the front of the cabin, a
  • the middle interface 950 has a corresponding construction, as do any
  • The incorporation of the user interface 900 in the CCS is illustrated in Fig. 27, wherein the elements of the user interface are contained in box 960 (labeled "K1"), box 962 (labeled "K2") and box 964 (labeled "Voice Memo").
  • connections may advantageously be entirely symmetric for any number of users.
  • a two input, two output vehicle system such as the one in Fig. 3 and the one in Fig.
  • the structure is symmetric from front to back and from back to front.
  • this symmetry holds for any number of inputs and outputs. It
  • K2 962 and the lower half of Voice Memo 964 are symmetrically identical thereto.
  • this output is fed to an amplifier 1002 with a fixed gain K1.
  • the output of amplifier 1002 is connected to a summer 1004 under the control of a user interface three-way switch 1006. This switch 1006 allows or disallows connection of voice
  • user interface switch control 936 allows or disallows connection of voice from front to rear.
  • the most recently operated switch control has precedence in allowing or
  • the output of the summer 1004 is connected to the volume control
  • volume control 920 which is in the form of a variable amplifier for effecting volume control for a user in the rear position.
  • This volume control 920 is limited by a gain limiter 1010 to
  • the output of the amplifier 1002 may also be sent to a cell phone via
  • control 922 When activated, an amplified and noise filtered voice from the front microphone is sent to the cell phone for transmission to a remote receiver.
  • cell phone signals may be routed to the rear via control 942.
  • control 942 In a preferred embodiment
  • the Voice Memo function consists of user interface controls, control
  • the voice storage device 1014 is a digital random access memory (RAM).
  • RAM digital random access memory
  • EEPROM electrically erasable programmable read-only memory
  • ferro-electric digital memory devices may be used if preservation of the stored voice is desired in the event of a power loss.
  • the voice storage control logic 1012 operates under user interface
  • control 934 a voice message stored in the voice storage device 1014.
  • the activation of control 912 stores the current digital voice sample from the front microphone in the voice storage device at an address specified by an address
  • the activation of the playback control 934 resets the address counter, reads the voice sample at the counter's address for output via a summer 1016 to the rear
  • loudspeaker increments the address counter and checks for more voice samples
  • the voice storage logic 1012 allows the storage of logically separate
  • the symmetric controls allow any user to record and playback from his own location.
  • the voice storage logic 1012 may also provide feedback to the use of
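The transient handling summarized in the items above (threshold detection on the microphone and loudspeaker signals, the adapt-off-count hold-off of 2*TAPS samples, and the OUTPUT-DECAY-RATE fade-out) can be sketched per sample as follows. This is only an illustrative sketch, not the patented implementation: the 8 kHz sample rate behind TAPS = 320, the exponential form of the fade-in, and the names `TransientState`, `step` and `adaptation_enabled` are assumptions introduced here.

```python
from dataclasses import dataclass

OUTPUT_DECAY_RATE = 0.8  # multiplier applied to the loudspeaker value during fade-out (from the text)
TAPS = 320               # assumption: ~40 ms ring-down at an assumed 8 kHz sample rate


@dataclass
class TransientState:
    adapt_off_count: int = 0  # samples remaining during which adaptation stays off
    out_gain: float = 1.0     # current multiplier on the loudspeaker output


def adaptation_enabled(state: TransientState) -> bool:
    """Adaptive filters should only update when no transient hold-off is active."""
    return state.adapt_off_count == 0


def step(state, mic, spkr, mic_threshold, spkr_threshold, fade_in_rate=0.8):
    """Process one sample pair and return the loudspeaker sample to emit."""
    # A transient is flagged when either signal reaches its threshold;
    # repeated transients simply restart the hold-off at 2*TAPS.
    if abs(mic) >= mic_threshold or abs(spkr) >= spkr_threshold:
        state.adapt_off_count = 2 * TAPS

    if state.adapt_off_count > 0:
        state.adapt_off_count -= 1
        state.out_gain *= OUTPUT_DECAY_RATE  # exponential fade-out
    else:
        # Assumed fade-in: shrink the remaining distance to unity gain each sample.
        state.out_gain = 1.0 - (1.0 - state.out_gain) * fade_in_rate
    return spkr * state.out_gain
```

In a fuller system, the point at which `adaptation_enabled` becomes true again would also be the point at which the SEF is reset, as the text describes.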

Abstract

An automatic gain control for a cabin communication system improves the clarity of voice spoken within a movable interior cabin. The automatic gain control includes a microphone (102) for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into a first audio signal, the first audio signal including a first component corresponding to the spoken voice and a second component corresponding to the ambient noise, a parameter estimation processor (802 and 806) for receiving the first audio signal and for determining parameters for deciding whether or not the second component corresponds to an undesirable transient noise, decision logic (804 and 808) for deciding, based on the parameters, whether or not the second component corresponds to an undesirable transient signal, a filter (300) for filtering the first audio signal to provide a filtered audio signal, a loudspeaker (104) for outputting a reproduced voice in response to the filtered audio signal with a variable gain at a second location in the cabin, and a control signal generating circuit for generating an automatic gain control signal in response to the decision logic.

Description

TRANSIENT PROCESSING FOR COMMUNICATION SYSTEM
TECHNICAL FIELD
The present invention relates to improvements in voice amplification and clarification in a noisy environment, such as a cabin communication system, which enables a voice spoken within the cabin to be increased in volume for improved understanding while minimizing any unwanted noise amplification. The present invention also relates to a movable cabin that advantageously includes such a cabin communication system for this purpose. In this regard, the term "movable cabin" is intended to be embodied by a car, truck or any other wheeled vehicle, an airplane or helicopter, a boat, a railroad car and indeed any other enclosed space that is movable and wherein a spoken voice may need to be amplified or clarified.
BACKGROUND ART
As anyone who has ridden in a mini-van, sedan or sport utility vehicle will know, communication among the passengers in the cabin of such a vehicle is difficult. For example, in such a vehicle, it is frequently difficult for words spoken by, for example, a passenger in a back seat to be heard and understood by the driver, or vice versa, due to the large amount of ambient noise caused by the motor, the wind, other vehicles, stationary structures passed by etc., some of which noise is caused by the movement of the cabin and some of which occurs even when the cabin is stationary, and due to the cabin acoustics which may undesirably amplify or damp out different sounds. Even in relatively quiet vehicles, communication between passengers is a problem due to the distance between passengers and the intentional use of sound-absorbing materials to quiet the cabin interior. The communication problem may be compounded by the simultaneous use of high-fidelity stereo systems for entertainment.
To amplify the spoken voice, it may be picked up by a microphone and played back by a loudspeaker. However, if the spoken voice is simply picked up and played back, there will be a positive feedback loop that results from the output of the loudspeaker being picked up again by the microphone and added to the spoken voice to be once again output at the loudspeaker. When the output of the loudspeaker is substantially picked up by a microphone, the loudspeaker and the microphone are said to be acoustically coupled. To avoid an echo due to the reproduced voice itself, an echo cancellation apparatus, such as an acoustic echo cancellation apparatus, can be coupled between the microphone and the loudspeaker to remove the portion of the picked-up signal corresponding to the voice component output by the loudspeaker. This is possible because the audio signal at the microphone corresponding to the original spoken voice is theoretically highly correlated to the audio signal at the microphone corresponding to the reproduced voice component in the output of the loudspeaker. One advantageous example of such an acoustic echo cancellation apparatus is described in commonly-assigned U.S. Patent Application No. 08/868,212. Another advantageous acoustic echo cancellation apparatus is described hereinbelow. On the other hand, any reproduced noise components may not be so highly correlated and need to be removed by other means. However, while systems for noise reduction generally are well known, enhancing speech intelligibility in a noisy cabin environment poses a challenging problem due to constraints peculiar to this environment. It has been determined in developing the present invention that the challenges arise principally, though not exclusively, from the following five causes. First, the speech and noise occupy the same bandwidth, and therefore cannot be separated by band-limited filters. 
Second, different people speak differently, and therefore it is harder to properly identify the speech components in the mixed signal. Third, the noise characteristics vary rapidly and unpredictably, due to the changing sources of noise as the vehicle moves. Fourth, the speech signal is not stationary, and therefore constant adaptation to its characteristics is required. Fifth, there are psycho-acoustic limits on speech quality, as will be discussed further below. One prior art approach to speech intelligibility enhancement is filtering. As noted above, since speech and noise occupy the same bandwidth, simple band-limited filtering will not suffice. That is, the overlap of speech and noise in the same frequency band means that filtering based on frequency separation will not work. Instead, filtering may be based on the relative orthogonality between speech and noise waveforms. However, the highly non-stationary nature of speech necessitates adaptation to continuously estimate a filter to subtract the noise. The filter will also depend on the noise characteristics, which in this environment are time-varying on a slower scale than speech and depend on such factors as vehicle speed, road surface and weather.
Fig. 1 is a simplified block diagram of a conventional cabin communication system (CCS) 100 using only a microphone 102 and a loudspeaker 104. As shown in the figure, an echo canceller 106 and a conventional speech enhancement filter (SEF) 108 are connected between the microphone 102 and loudspeaker 104. A summer 110 subtracts the output of the echo canceller 106 from the input of the microphone 102, and the result is input to the SEF 108 and used as a control signal therefor. The output of the SEF 108, which is the output of the loudspeaker 104, is the input to the echo canceller 106. In the echo canceller 106, online identification of the transfer function of the acoustic path (including the loudspeaker 104 and the microphone 102) is performed, and the signal contribution from the acoustic path is subtracted.
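The echo canceller structure of Fig. 1 (identify the loudspeaker-to-microphone path online, then subtract its predicted contribution at the summer) can be illustrated with a normalized LMS (NLMS) adaptive filter. The choice of NLMS, the function name `nlms_echo_cancel`, and the parameters `taps`, `mu` and `eps` are assumptions made for this sketch; the text above does not specify the adaptation algorithm. The sketch is also open loop, whereas in the CCS the loudspeaker signal is itself derived from the microphone signal.

```python
import numpy as np


def nlms_echo_cancel(spkr, mic, taps=64, mu=0.5, eps=1e-8):
    """Estimate the acoustic path from spkr to mic and subtract the predicted echo.

    Returns the echo-reduced signal (the summer output) and the final
    estimate of the path impulse response.
    """
    w = np.zeros(taps)           # running estimate of the acoustic path
    buf = np.zeros(taps)         # most recent loudspeaker samples, newest first
    err = np.zeros(len(mic))
    for k in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = spkr[k]
        echo_hat = w @ buf       # predicted echo at the microphone
        err[k] = mic[k] - echo_hat              # summer: microphone minus predicted echo
        w += mu * err[k] * buf / (buf @ buf + eps)  # normalized LMS update
    return err, w
```

With a synthetic three-tap path and white-noise excitation, the estimate `w` should converge toward the true path and the residual `err` toward zero; with real speech, convergence would be slower and near-end speech would disturb the update.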
In a conventional acoustic echo and noise cancellation system, the two problems of removing echos and removing noise are addressed separately and the loss in performance resulting from coupling of the adaptive SEF and the adaptive echo canceller is usually insignificant. This is because speech and noise are correlated only over a relatively short period of time. Therefore, the signal coming out of the loudspeaker can be made to be uncorrelated from the signal received directly at the microphone by adding adequate delay into the SEF. This ensures robust identification of the echo canceller and in this way the problems can be completely decoupled. The delay does not pose a problem in large enclosures, public address systems and telecommunication systems such as automobile hands-free telephones. However, it has been recognized in developing the present invention that the acoustics of relatively smaller movable cabins dictate that processing be completed in a relatively short time to prevent the perception of an echo from direct and reproduced paths. In other words, the reproduced voice output from the loudspeaker should be heard by the listener at substantially the same time as the original voice from the speaker is heard.
In particular, in the cabin of a moving vehicle, the acoustic paths are such that an addition of delay beyond approximately 20 ms will sound like an echo, with one version coming from the direct path and another from the loudspeaker. This puts a limit on the total processing time, which means a limit both on the amount of delay and on the length of the signal that can be processed.
Thus, conventional adaptive filtering applied to a cabin communication system may reduce voice quality by introducing distortion or by creating artifacts such as tones or echos. If the echo cancellation process is coupled with the speech extraction filter, it becomes difficult to accurately estimate the acoustic transfer functions, and this in turn leads to poor estimates of noise spectrum and consequently poor speech intelligibility at the loudspeaker. An advantageous approach to overcoming this problem is disclosed below, as are the structure and operation of an advantageous adaptive SEF.
Several adaptive filters are known for use in the task of speech intelligibility enhancement. These filters can be broadly classified into two main categories: (1) filters based on a Wiener filtering approach and (2) filters based on the method of spectral subtraction. Two other approaches, i.e. Kalman filtering and H-infinity filtering, have also been tried, but will not be discussed further herein.
Spectral subtraction has been subjected to rigorous analysis, and it is well known, at least as it currently stands, not to be suitable for low SNR (signal-to-noise) environments because it results in "musical tone" artifacts and in unacceptable degradation in speech quality. The movable cabin in which the present invention is intended to be used is just such a low SNR environment.
Accordingly, the present invention is an improvement on Wiener filtering, which has been widely applied for speech enhancement in noisy environments. The Wiener filtering technique is statistical in nature, i.e. it constructs the optimal linear estimator (in the sense of minimizing the expected squared error) of an unknown desired stationary signal, n, from a noisy observation, y, which is also stationary. The optimal linear estimator is in the form of a convolution operator in the time domain, which is readily converted to a multiplication in the frequency domain. In the context of a noisy speech signal, the Wiener filter can be applied to estimate noise, and then the resulting estimate can be subtracted from the noisy speech to give an estimate for the speech signal.
To be concrete, let y be the noisy speech signal and let the noise be n. Then Wiener filtering requires the solution, h, to the following Wiener-Hopf equation:
R_{ny}(t) = \sum_{s=-\infty}^{\infty} h(s)\, R_{yy}(t-s)
...(1)
Here, Rny is the cross-correlation matrix of the noise-only signal with
the noisy speech, Ryy is the auto-correlation matrix of the noisy speech, and h is the
Wiener filter.
Although this approach is mathematically correct, it is not immediately
amenable to implementation. First, since speech and noise are uncorrelated, the cross-correlation between n and y, i.e. Rny, is the same as the auto-correlation of the noise, Rnn. Second, both noise and speech are non-stationary, and therefore the infinite-length cross-correlation of the solution of Equation 1 is not useful. Obviously, infinite data is not available, and furthermore the time constraint of echo avoidance applies.
Therefore, the following truncated equation is solved instead:
R_{nn}(t) = \sum_{s=-m}^{m} h(s)\, R_{yy}(t-s)
...(2)
Here, m is the length of the data window. This equation can be readily solved in the frequency domain by taking
Fourier Transforms, as follows:
S_{nn}(f) = H(f)\, S_{yy}(f)
...(3)
Here, Snn and Syy are the Fourier Transforms, or equivalently the power
spectral densities (PSDs), of the noise and the noisy speech signal, respectively. The auto-correlation of the noise can only be estimated, since there is no noise-only signal.
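As a rough illustration of Equation 3, the filter can be formed per frequency bin as H(f) = Snn(f)/Syy(f), applied to the noisy spectrum to estimate the noise, and the estimate subtracted to leave a speech estimate. The sketch below is illustrative only: using a single-frame periodogram for Syy, clipping H to the range [0, 1], and the names `wiener_noise_filter` and `enhance_frame` are assumptions of this example, and the noise PSD `s_nn` is taken as given even though, as noted above, it can only be estimated in practice.

```python
import numpy as np


def wiener_noise_filter(s_nn, s_yy, floor=1e-12):
    """Per-bin solution of S_nn(f) = H(f) S_yy(f), clipped to [0, 1] (assumption)."""
    h = s_nn / np.maximum(s_yy, floor)   # floor avoids division by zero
    return np.clip(h, 0.0, 1.0)


def enhance_frame(y_frame, s_nn):
    """Subtract the Wiener noise estimate from one frame of noisy speech.

    s_nn must have len(y_frame)//2 + 1 bins and use the same scaling as
    the unnormalized periodogram |Y(f)|^2 computed here.
    """
    Y = np.fft.rfft(y_frame)
    s_yy = np.abs(Y) ** 2                # crude single-frame PSD estimate
    h = wiener_noise_filter(s_nn, s_yy)
    noise_hat = h * Y                    # estimated noise spectrum
    return np.fft.irfft(Y - noise_hat, n=len(y_frame))
```

Two limiting cases give a quick check: a zero noise PSD leaves the frame unchanged, and a noise PSD far above the observed power suppresses the frame entirely.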
However, there are problems in this approach, which holds only in an
approximate sense. First, the statistics of noise have to be continuously updated.
Second, this approach fails to take into account the psycho-acoustics of the human ear,
which is extremely sensitive to processing artifacts at even extremely low decibel levels. Neither does this approach take into account the anti-causal nature of speech
or the relative stationarity of the noise. While several existing Wiener filtering
techniques make use of ad hoc, non-linear processing of the Wiener filter coefficients
in the hope of maintaining and improving speech intelligibility, these techniques do
not work well and do not effectively address the practical problem of interfacing a Wiener filtering technique with the psycho-acoustics of speech.
As noted above, another aspect of the present invention is directed to
the structure and operation of an advantageous adaptive acoustic echo canceller (AEC) for use with an SEF as disclosed herein. Of course, other adaptive SEFs may be used in the present invention provided they cooperate with the advantageous echo canceller in the manner disclosed below.
To realistically design a cabin communication system (CCS) that is
appropriate for a relatively small, movable cabin, it has been recognized that the echo cancellation has to be adaptive because the acoustics of a cabin change due to temperature, humidity and passenger movement. It has also been recognized that
noise characteristics are also time varying depending on several factors such as road
and wind conditions, and therefore the SEF also has to continuously adapt to the changing conditions. A CCS couples the echo cancellation process with the SEF. The present invention is different from the prior art in addressing the coupled on-line
identification and control problem in a closed loop.
There are other aspects of the present invention that contribute to the
improved functioning of the CCS. One such aspect relates to an improved AGC that, in accordance with the present invention, controls amplification volume and related functions in the CCS, including the generation of appropriate gain control signals for
overall gain and a dither gain and the prevention of amplification of undesirable
transient signals.
It is well known that it is necessary for customer comfort, convenience
and safety to control the volume of amplification of certain audio signals in audio communication systems such as the CCS. Such volume control should have an
automatic component, although a user's manual control component is also desirable.
The prior art recognizes that any microphone in a cabin will detect not only the ambient noise, but also sounds purposefully introduced into the cabin. Such sounds include, for example, sounds from the entertainment system (radio, CD player or even
movie soundtracks) and passengers' speech. These sounds interfere with the
microphone's receiving just a noise signal for accurate noise estimation.
Prior art AGC systems failed to deal with these additional sounds adequately. In particular, prior art AGC systems would either ignore these sounds or attempt to compensate for the sounds. In contrast, the present invention provides an
advantageous way to supply a noise signal to be used by the AGC system that has had
these additional noises eliminated therefrom.
A further aspect of the present invention is directed to an improved user interface installed in the cabin for improving the ease and flexibility of the CCS.
In particular, while the CCS is intended to incorporate sufficient automatic control to
operate satisfactorily once the initial settings are made, it is of course desirable to
incorporate various manual controls to be operated by the driver and passengers to customize its operation. In this aspect of the present invention, the user interface enables customized use of the plural microphones and loudspeakers.
OBJECTS AND SUMMARY OF THE INVENTION
Accordingly, it is an object of the invention to provide an adaptive speech extraction filter (SEF) that avoids the problems of the prior art. It is another object of the invention to provide an adaptive SEF that
interfaces Wiener filtering techniques with the psycho-acoustics of speech.
It is yet another object of the invention to provide an adaptive SEF that
is advantageously used in a cabin communication system of a moving vehicle.
It is a further object of the invention to provide a cabin communication
system incorporating an advantageous adaptive SEF for enhancing speech
intelligibility in a moving vehicle.
It is yet a further object of the invention to provide a moving vehicle including a cabin communication system incorporating an advantageous adaptive SEF for enhancing speech intelligibility in the moving vehicle. It is still a further object of the invention to provide a cabin
communication system with an adaptive SEF that increases intelligibility and ease of
passenger communication with little or no increase in ambient noise.
It is even a further object of the present invention to provide a cabin
communication system with an adaptive SEF that provides acceptable psychoacoustics,
ensures passenger comfort by not amplifying transient sounds and does not interfere
with audio entertainment systems.
It is also an object of the invention to provide an adaptive AEC that avoids the problems of the prior art. It is another object of the invention to provide an adaptive AEC that
interfaces with adaptive Wiener filtering techniques.
It is yet another object of the invention to provide an adaptive AEC that is advantageously used in a cabin communication system of a moving vehicle.
It is a further object of the invention to provide a cabin communication
system incorporating an advantageous adaptive AEC for enhancing speech
intelligibility in a moving vehicle.
It is yet a further object of the invention to provide a moving vehicle
including a cabin communication system incorporating an advantageous adaptive AEC for enhancing speech intelligibility in the moving vehicle.
It is still a further object of the invention to provide a cabin
communication system with an adaptive AEC that increases intelligibility and ease of passenger communication with little or no increase in ambient noise or echos. It is even a further object of the present invention to provide a cabin communication system with an adaptive AEC that does not interfere with audio
entertainment systems.
It is also an object of the present invention to provide an automatic gain
control that avoids the difficulties of the prior art.
It is another object of the present invention to provide an automatic
gain control that provides both an overall gain control signal and a dither control signal.
It is yet another object of the present invention to provide an automatic
gain control that precludes the amplification or reproduction of undesirable transient
sounds.
It is also an object of the present invention to provide a user interface that facilitates the customized use of the inventive cabin communication system.
In accordance with these objects, one aspect of the present invention is
directed to a cabin communication system for improving clarity of a voice spoken
within an interior cabin having ambient noise, the cabin communication system comprising a microphone for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into an audio signal, the audio
signal having a first component corresponding to the spoken voice and a second
component corresponding to the ambient noise, a speech enhancement filter for
removing the second component from the audio signal to provide a filtered audio signal, the speech enhancement filter removing the second component by processing the audio signal by a method taking into account elements of psycho-acoustics of a human ear, and a loudspeaker for outputting a clarified voice in response to the
filtered audio signal.
Another aspect of the present invention is directed to a cabin
communication system for improving clarity of a voice spoken within an interior
cabin having ambient noise, the cabin communication system comprising an adaptive
speech enhancement filter for receiving an audio signal that includes a first component indicative of the spoken voice, a second component indicative of a feedback echo of
the spoken voice and a third component indicative of the ambient noise, the speech
enhancement filter filtering the audio signal by removing the third component to
provide a filtered audio signal, the speech enhancement filter adapting to the audio signal at a first adaptation rate, and an adaptive acoustic echo cancellation system for receiving the filtered audio signal and removing the second component in the filtered
audio signal to provide an echo-cancelled audio signal, the echo cancellation system
adapting to the filtered audio signal at a second adaptation rate, wherein the first
adaptation rate and the second adaptation rate are different from each other so that the speech enhancement filter does not adapt in response to operation of the echo-cancellation system and the echo-cancellation system does not adapt in response to
operation of the speech enhancement filter.
Another aspect of the present invention is directed to an automatic gain
control for a cabin communication system for improving clarity of a voice spoken within a movable interior cabin having ambient noise, the automatic gain control comprising a microphone for receiving the spoken voice and the ambient noise and for
converting the spoken voice and the ambient noise into a first audio signal having a first component corresponding to the spoken voice and a second component corresponding to the ambient noise, a filter for removing the second component from
the first audio signal to provide a filtered audio signal, an acoustic echo canceller for
receiving the filtered audio signal in accordance with a supplied dither signal and
providing an echo-cancelled audio signal, a control signal generating circuit for
generating a first automatic gain control signal in response to a noise signal that corresponds to a current speed of the cabin, the first automatic gain control signal controlling a first gain of the dither signal supplied to the filter, the control signal
generating circuit also for generating a second automatic gain control signal in
response to the noise signal, and a loudspeaker for outputting a reproduced voice in response to the echo-cancelled audio signal with a second gain controlled by the second automatic gain control signal.
Another aspect of the present invention is directed to an automatic gain
control for a cabin communication system for improving clarity of a voice spoken within a movable interior cabin having ambient noise, the ambient noise intermittently
including an undesirable transient noise, the automatic gain control comprising a microphone for receiving the spoken voice and the ambient noise and for converting
the spoken voice and the ambient noise into a first audio signal, the first audio signal
including a first component corresponding to the spoken voice and a second component corresponding to the ambient noise, a parameter estimation processor for
receiving the first audio signal and for determining parameters for deciding whether or not the second component corresponds to an undesirable transient noise, decision
logic for deciding, based on the parameters, whether or not the second component
corresponds to an undesirable transient signal, a filter for filtering the first audio signal to provide a filtered audio signal, a loudspeaker for outputting a reproduced voice in response to the filtered audio signal with a variable gain at a second location
in the cabin, and a control signal generating circuit for generating an automatic gain
control signal in response to the decision logic, wherein when the decision logic
decides that the second component corresponds to an undesirable transient signal, the control signal generating circuit generates the automatic gain control signal so as to
gracefully set the gain of the loudspeaker to zero for fade-out.
Another aspect of the present invention is directed to an improved user
interface installed in the cabin for improving the ease and flexibility of the CCS.
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of the
preferred embodiments taken in connection with the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a simplified block diagram of a conventional cabin
communication system.
Fig. 2 is an illustrative drawing of a vehicle incorporating a first
embodiment of the present invention.
Fig. 3 is a block diagram explanatory of the multi-input, multi-output
interaction of system elements in accordance with the embodiment of Fig. 2.
Fig. 4 is an experimentally derived acoustic budget for implementation
of the present invention.
Fig. 5 is a block diagram of filtering in the present invention.
Fig. 6 is a block diagram of the SEF of the present invention.
Fig. 7 is a plot of Wiener filtering performance by the SEF of Fig. 6.
Fig. 8 is a plot of speech plus noise.
Fig. 9 is a plot of the speech plus noise of Fig. 8 after Wiener filtering
by the SEF of Fig. 6.
Fig. 10 is a plot of actual test results.
Fig. 11 is a block diagram of an embodiment of the AEC of the present
invention.
Fig. 12 is a block diagram of a single input-single output CCS with radio cancellation.
Fig. 13 illustrates an algorithm for Recursive Least Squares (RLS)
block processing in the AEC.
Fig. 14 is an illustration of the relative contribution of errors in temperature compensation.
Fig. 15 is a first plot of the transfer function from a right rear loudspeaker to a right rear microphone using the AEC of the invention.
Fig. 16 is a second plot of the transfer function from a right rear
loudspeaker to a right rear microphone using the AEC of the invention.
Fig. 17 is a schematic diagram of a first embodiment of the automatic
gain control in accordance with the present invention.
Fig. 18 illustrates an embodiment of a device for generating a first
advantageous AGC signal.
Fig. 19 illustrates an embodiment of a device for generating a second advantageous AGC signal.
Fig. 20 is a schematic diagram of a second embodiment of the
automatic gain control in accordance with the present invention.
Fig. 21 is a schematic diagram illustrating a transient processing
system in accordance with the present invention.
Fig. 22 illustrates the determination of a simple threshold.
Fig. 23 illustrates the behavior of the automatic gain control for the
signal and threshold of Fig. 22.
Fig. 24 is a detail of Fig. 23 illustrating the graceful fade-out.
Fig. 25 illustrates the determination of a simple template.
Fig. 26 is a schematic diagram of an embodiment of the user interface
in accordance with the present invention.
Fig. 27 is a diagram illustrating the incorporation of the inventive user interface in the inventive CCS.
Fig. 28 is a schematic diagram illustrating the interior structure of a
portion of the interface unit of Fig. 26.
BEST MODE OF CARRYING OUT THE INVENTION
Before addressing the specific mathematical implementation of the
SEF in accordance with the present invention, it is helpful to understand the context
wherein it operates. Fig. 2 illustrates a first embodiment of the present invention as
implemented in a mini-van 10. As shown in Fig. 2, the mini-van 10 includes a driver's
seat 12 and first and second passenger seats 14, 16. Associated with each of the seats is a respective microphone 18, 20, 22 adapted to pick up the spoken voice of a passenger sitting in the respective seat. Advantageously, but not necessarily, the microphone layout may include a right and a left microphone for each seat. In
developing the present invention, it has been found that it is advantageous in enhancing the clarity of the spoken voice to use two or more microphones to pick up
the spoken voice from the location where it originates, e.g. the passenger or driver
seat, although a single microphone for each user may be provided within the scope of
the invention. This can be achieved by beamforming the microphones into a
beamformed phased array, or more generally, by providing plural microphones whose signals are processed in combination to be more sensitive to the location of the spoken voice, or even more generally to preferentially detect sound from a limited physical
area. The plural microphones can be directional microphones or omnidirectional
microphones, whose combined signals define the detecting location. The system can
use the plural signals in processing to compensate for differences in the responses of the microphones. Such differences may arise, for example, from the different travel paths to the different microphones or from different response characteristics of the
microphones themselves. As a result, omnidirectional microphones, which are
substantially less expensive than directional microphones or physical beamformed
arrays, can be used. When providing the cabin communication system in possibly millions of cars, such a practical consideration as cost can be a most significant factor.
The use of such a system of plural microphones is therefore advantageous in a
movable vehicle cabin, wherein a large, delicate and/or costly system may be
undesirable. Referring again to Fig. 2, the microphones 18-22 are advantageously located in the headliner 24 of the mini-van 10. Also located within the cabin of the
mini-van 10 are plural loudspeakers 26, 28. While three microphones and two loudspeakers are shown in Fig. 2, it will be recognized that the number of
microphones and loudspeakers and their respective locations may be changed to suit any particular cabin layout. If the microphones 18, 20, 22 are directional or form an
array, each will have a respective beam pattern 30, 32, 34 indicative of the direction in
which the respective microphone is most sensitive to sound. If the microphones 18-22
are omnidirectional, it is well known in the art to provide processing of the combined
signals so that the omnidirectional microphones have effective beam patterns when
used in combination.
The input signals from the microphones 18-22 are all sent to a digital
signal processor (DSP) 36 to be processed so as to provide output signals to the loudspeakers 26, 28. The DSP 36 may be part of the general electrical module of the
vehicle, part of another electrical system or provided independently. The DSP 36 may
be embodied in hardware, software or a combination of the two. It will be recognized
that one of ordinary skill in the art, given the processing scheme discussed below, would be able to construct a suitable DSP from hardware, software or a combination without undue experimentation.
Thus, the basic acoustic system embodied in the layout of Fig. 2
consists of multiple microphones and loudspeakers in a moderately resonant
enclosure. Fig. 3 illustrates a block diagram explanatory of elements in this embodiment, having two microphones, mic1 and mic2, and two loudspeakers $l_1$ and $l_2$.
Microphone mic1 picks up six signal components, including first voice $v_1$ with a
transfer function $V_{11}$ from the location of a first person speaking to microphone mic1, second voice $v_2$ with a transfer function $V_{21}$ from the location of a second person speaking to microphone mic1, first noise $n_1$ with a transfer function $N_{11}$, and second noise $n_2$ with a transfer function $N_{21}$. Microphone mic1 also picks up the output $s_1$ of
loudspeaker $l_1$ with a transfer function $H_{11}$, and the output $s_2$ of loudspeaker $l_2$ with a transfer function $H_{21}$. Microphone mic2 picks up six corresponding signal
components. The microphone signal from microphone mic1 is echo cancelled ($-\hat{H}_{11}s_1 - \hat{H}_{21}s_2$), using an echo canceller such as the one disclosed herein, Wiener filtered ($W_1$)
using the advantageous Wiener filtering technique disclosed below, amplified ($K_1$) and output through the remote loudspeaker $l_2$. As a result, for example, the total signal at point A in Fig. 3 is $(H_{11}-\hat{H}_{11})s_1 + (H_{21}-\hat{H}_{21})s_2 + V_{11}v_1 + V_{21}v_2 + N_{11}n_1 + N_{21}n_2$.
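The superposition at point A can be checked numerically. The following is a minimal sketch, assuming each transfer function is reduced to a single complex gain at one frequency (a simplification, since the real transfer functions are frequency dependent). The function name `signal_at_point_a` and its argument layout are hypothetical and are not taken from the specification.

```python
def signal_at_point_a(H, H_hat, V, N, s, v, n):
    """Residual echo plus voice plus noise at one microphone, at one frequency.

    H, H_hat: true and estimated loudspeaker-to-microphone gains, one per loudspeaker
    V, N:     voice-path and noise-path gains, one per source
    s, v, n:  loudspeaker outputs, voices and noises (complex amplitudes)
    All values are illustrative single-frequency gains (an assumption).
    """
    # Echo left over after subtracting the estimated echo paths.
    echo_residual = sum((h - h_hat) * si for h, h_hat, si in zip(H, H_hat, s))
    voice = sum(Vi * vi for Vi, vi in zip(V, v))
    noise = sum(Ni * ni for Ni, ni in zip(N, n))
    return echo_residual + voice + noise
```

When the estimates equal the true echo paths, the echo term vanishes and only the voice and noise terms remain, which is the condition the echo cancellers aim for.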
Certain aspects of the advantageous CCS shown in Fig. 3 are disclosed
in concurrently filed, commonly assigned applications. For example, each of the
blocks LMS identifies the adaptation of echo cancellers as in the commonly-assigned application mentioned above, or advantageously an echo cancellation system as
described below. The CCS uses a number of such echo cancellers equal to the
product of the number of acoustically independent loudspeakers and the number of
acoustically independent microphones, so that the product here is four.
Additionally, random noises rand, and rand2 are injected and used to
identify the open loop acoustic transfer functions. This happens under two circumstances: initial system identification and during steady state operation. During
initial system identification, the system is run open loop (switches in Fig. 3 are open)
and only the open loop system is identified. Proper system operation depends on adaptive identification of the open loop acoustic transfer functions as the acoustics change. However, during steady state operation, the system runs closed loop. While
normal system identification techniques would identify the closed loop system, the
random noise is effectively blocked by the advantageous Wiener SEF, so that the open loop system is still the one identified. Further details of the random noise processing are disclosed in another concurrently filed, commonly assigned application.
A CCS also has certain acoustic requirements. Thus, the present
inventors have determined that a minimum of 20 dB SNR provides comfortable intelligibility for front to rear communication in a mini-van. The SNR is measured as
20 log10 of the ratio of the peak voice voltage to the peak noise voltage. Therefore, the amount of
amplification and the amount of ambient road noise reduction will depend on the SNR
of the microphones used. For example, the microphones used in a test of the CCS gave a 5 dB SNR at 65 mph, with the SNR decreasing with increasing speed.
Therefore, at least 15 dB of amplification and 15 dB of ambient road noise reduction
is required. To provide a margin for differences in people's speech and hearing,
advantageously the system may be designed to provide 20 dB each. Similarly, at least
20 dB of acoustic echo cancellation is required, and 25 dB is advantageously supplied. Fig. 4 illustrates an advantageous experimentally derived acoustic budget. The overall
system performance is highly dependent on the SNR and the quality of the raw
microphone signal. Considerable attention must be given to microphone mounting,
vibration isolation, noise rejection and microphone independence. However, such factors are often closely dependent on the particular vehicle cabin layout.
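The arithmetic behind this acoustic budget can be sketched as follows. This is only an illustration of the SNR definition and of the shortfall between the microphone SNR and the 20 dB target stated above. The function names are hypothetical, and the default target is the 20 dB figure from the text.

```python
import math

def snr_db(peak_voice_v, peak_noise_v):
    """SNR as defined in the text: 20*log10 of peak voice voltage over peak noise voltage."""
    return 20.0 * math.log10(peak_voice_v / peak_noise_v)

def required_improvement_db(mic_snr_db, target_snr_db=20.0):
    """Shortfall in dB between the raw microphone SNR and the target SNR (never negative)."""
    return max(0.0, target_snr_db - mic_snr_db)
```

For the 5 dB microphone SNR measured at 65 mph, `required_improvement_db(5.0)` gives the 15 dB figure quoted in the text.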
As noted above, the present invention differs from the prior art in
expressly considering psycho-acoustics. One self-imposed aspect of that is that
passengers should not hear their own amplified voices from nearby loudspeakers.
This imposes requirements on the accuracy of echo cancellation and on the rejection of the direct path from a person to a remote microphone, i.e. microphone independence. The relative amplitude at multiple microphones for the same voice sample is a measure of microphone independence. A lack of microphone
independence results in a person hearing his own speech from a nearby loudspeaker because it was received and sufficiently amplified from a remote microphone.
Microphone independence can be achieved by small beamforming arrays over each
seat, or by single directional microphones or by appropriately interrelated
omnidirectional microphones. However, the latter two options provide reduced beamwidth, which results in significant changes in the microphone SNR as a
passenger turns his head from side to side or toward the floor.
Another aspect of acceptable psycho-acoustics is good voice quality.
In the absence of an acceptable metric of good voice quality, which is as yet unavailable, the voice quality is assessed heuristically as the amount of distortion and the perceptibility of echos. Voice distortion and echos result from both analog and
digital CCS filtering. Fig. 5 is a block diagram of filtering circuitry provided in a CCS
incorporating the SEF according to the present invention. The first two elements are
analog, using a High Pass Filter (HPF) 2-pole filter 38 and a Low Pass Filter (LPF) 4- pole filter 40. The next four elements are digital, including a sampler 42, a 4th order Band Pass Filter (BPF) 44, the Wiener SEF 300 in accordance with the present
invention and an interpolator 44. The final element is an analog LPF 4-pole filter 46.
The fixed analog and digital bandpass filters and the sample rate impose bandwidth restrictions on the processed voice. It has been found in developing the present invention that intelligibility is greatly improved with a bandwidth as low as 1.7 KHz,
but that good voice quality may require a bandwidth as high as 4.0 KHz. Another
source of distortion is the quantization by the A/D and D/A converters (not
illustrated). While the quantization effects have not been fully studied, it is believed that A/D and D/A converters with a dynamic range of 60 dB from quietest to loudest signals will avoid significant quantization effects. The dynamic range of the A/D and D/A converters could be reduced by use of an automatic gain control (AGC). This is not preferred due to the additional cost, complexity and potential algorithm instability with the use of A/D and D/A AGC.
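As a rough guide to what a 60 dB dynamic range implies for converter word length, the sketch below uses the standard rule that an ideal uniform quantizer spans 20 log10(2^N) dB, about 6.02 dB per bit. This rule is an assumption brought in for illustration. The specification does not state a word length, and real converters lose some range to noise and nonlinearity.

```python
import math

def dynamic_range_db(bits):
    """Ideal dynamic range of a uniform quantizer with the given word length."""
    return 20.0 * math.log10(2 ** bits)

def min_bits_for(range_db):
    """Smallest word length whose ideal dynamic range covers range_db."""
    return math.ceil(range_db / (20.0 * math.log10(2)))
```

Under this idealization, `min_bits_for(60.0)` evaluates to 10.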
In addition, there will always be a surround sound effect, since the
voice amplification is desirably greater than the natural acoustic attenuation. As noted above, distinct echos result when the total CCS and audio delays exceed 20 ms. The
CCS delays arise from both filtering and buffering. In the preferred embodiment of
the invention, the delays advantageously are limited to 17 ms.
Having described the context of the present invention, the following
discussion will set forth the operation and elements of the novel SEF 300. In designing this SEF 300, it is unique to the present invention's speech enhancement by Wiener filtering to exploit the human perception of sound (mel-filtering), the anti-
causal nature of speech (causal noise filtering), and the (relative) stationarity of the
noise (temporal and frequency filtering).
First, it is commonly known that the human ear perceives sound at
different frequencies on a non-linear scale called the mel-scale. In other words, the frequency resolution of the human ear degrades with frequency. This effect is
significant in the speech band (300 Hz to 4 KHz) and therefore has a fundamental
bearing on the perception of speech. A better SNR can be obtained by smoothing the noisy speech spectrum over larger windows at higher frequencies. This operation is
performed as follows: if Y(f) is the frequency spectrum of noisy speech at frequency f, then the mel-filtering consists of computing:
$$\bar{Y}(f_0) = \frac{\sum_{k=-L}^{L} \pi_k\, Y(f_0 + k)}{\sum_{k=-L}^{L} \pi_k} \tag{4}$$
Here, the weights $\pi_k$ are advantageously chosen as the inverse of the
noise power spectral densities at the frequency. The length L progressively increases
with frequency in accordance with the mel-scale. The resulting output $\bar{Y}(f_0)$ has a
high SNR at high frequencies with negligible degradation in speech quality or intelligibility.
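The weighted smoothing of Equation (4) can be sketched as below. This is a minimal illustration under several assumptions that are not in the specification: the spectrum is given as a list of real magnitudes, the per-bin half-width L is supplied by the caller (increasing with frequency), and windows are simply truncated at the band edges. The name `mel_smooth` is hypothetical.

```python
def mel_smooth(Y, noise_psd, half_width):
    """Smooth spectrum Y with inverse-noise-PSD weights over 2L+1 neighbouring bins.

    Y:          spectral magnitudes, one per frequency bin (assumed real)
    noise_psd:  noise power spectral density per bin (all positive)
    half_width: half-width L for each bin, chosen by the caller to grow with frequency
    """
    n_bins = len(Y)
    out = []
    for f0 in range(n_bins):
        L = half_width[f0]
        lo = max(0, f0 - L)             # truncate the window at the band edges (assumption)
        hi = min(n_bins - 1, f0 + L)
        weights = [1.0 / noise_psd[k] for k in range(lo, hi + 1)]
        total = sum(w * Y[k] for w, k in zip(weights, range(lo, hi + 1)))
        out.append(total / sum(weights))
    return out
```

Because the weights are normalized by their sum, a flat spectrum passes through unchanged whatever the noise profile, while narrow peaks in wide high-frequency windows are averaged down.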
Second, speech, as opposed to many other types of sound and in
particular noise, is anti-causal or anticipatory. This is well known from the wide-
spread use of tri-phone and bi-phone models of speech. In other words, each sound in turn is not independent, but rather depends on the context, so that the pronunciation of a particular phoneme often depends on a future phoneme that has yet to be
pronounced. As a result, the spectral properties of speech also depend on context.
This is in direct contrast to noise generation, where it is well known that noise can be
modeled as white noise passing through a system. The system here corresponds to a causal operation (as opposed to the input speech), so that the noise at any instant of
time does not depend on its future sample path.
The present invention exploits this difference in causality by solving an appropriate causal filtering problem, i.e. a causal Wiener filtering approach. However in developing the present invention it was also recognized that straightforward causal filtering has severe drawbacks. First, a causal Wiener filtering
approach requires spectral factorization, which turns out to be extremely expensive computationally and is therefore impractical. Second, the residual noise left in the
extracted speech turned out to be perceptually unpleasant.
It was first considered reasonable to believe that it was the power
spectrum of the residual noise which is of concern, rather than the instantaneous value of the residual noise. This suggested solving the following optimization problem:
Find a causal filter that minimizes:
$$\left\| S_{nn}(f) - H(f)\, S_{yy}(f) \right\|^2 \tag{5}$$
This is the same as the previous formulation of the problem in
Equation (3), with the addition of constraints on causality and minimization of the residual power spectrum.
However, this solution also was found to suffer from drawbacks. From
psycho-acoustics it is known that the relative amount of white noise variation required
to be just noticeable is a constant 5%, independent of the sound pressure level. Since the noise excitation is broadband, it is reasonable to assume that the white noise model for just noticeable variation is appropriate. This would mean that a filter that
keeps the noise spectral density relatively constant over time is appropriate.
The solution of Equation 5 fails to satisfy this requirement. The reason
is that a signal y which suddenly has a large SNR at a single frequency results in a filter H that has a large-frequency component only for those frequencies that have a large SNR. In contrast, for those frequencies with low SNR, the filter H will be nearly
zero. As a result, with this filter H the residual noise changes appreciably from time
frame to time frame, which can result in perceptible noise. The present invention resolves these problems by formulating a
weighted least squares problem, with each weight inversely proportional to the energy in the respective frequency bin. This may be expressed mathematically as follows:
$$\min_{H\ \text{causal}} \sum_{f} \left( S_{yy}(f)^{-1} \left| S_{nn}(f) - H(f)\, S_{yy}(f) \right| \right)^2 \tag{6}$$
The above formulation has the following solution:
$$H(f) = \left( \frac{S_{nn}(f)}{S_{yy}(f)} \right)_{+} \tag{7}$$
Here, the symbol "+" denotes taking the causal part. The computation
of the above filter domain is relatively simple and straightforward, requiring only two
Fourier transforms, and for an appropriate data length the Fourier Transforms themselves can be implemented by a Fast Fourier Transform (FFT).
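The two-transform computation can be sketched as follows, assuming Equation (7) takes the causal part of the ratio Snn(f)/Syy(f). The sketch uses a plain O(N^2) discrete Fourier transform for clarity where an FFT would be used in practice. It treats the upper half of the time buffer as negative time and keeps the midpoint sample, which is a convention chosen here. All function names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Plain discrete Fourier transform (forward or inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def causal_part(spectrum):
    """Keep only the non-negative-time part of a sampled spectrum."""
    n = len(spectrum)
    impulse = dft(spectrum, inverse=True)   # first transform: frequency to time
    for t in range(n // 2 + 1, n):          # upper half of the buffer holds negative time
        impulse[t] = 0.0
    return dft(impulse)                     # second transform: time back to frequency

def causal_wiener(S_nn, S_yy):
    """Causal part of the per-bin ratio of noise PSD to noisy-speech PSD."""
    return causal_part([snn / syy for snn, syy in zip(S_nn, S_yy)])
```

A frequency-flat ratio corresponds to a single tap at time zero and therefore passes through unchanged, whereas a pure one-sample advance is removed entirely by the causal projection.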
Variants of Equation (7) can also be used wherein a smoothed weight
is used based on past values of energy in each frequency bin or based on an average
based on neighboring bins. This would obtain increasingly smoother transitions in the
spectral characteristics of the residual noise. However, these variants will increase the required computational time.
It is conventional that the Wiener filter length, in either the frequency
or time domain, is the same as the number of samples. It is a further development of
the present invention to use a shorter filter length. It has been found that such a shorter filter length, most easily implemented in the time domain, results in reduced computations and better noise reduction. The reduced-length filter may be of an a priori fixed length, or the length may be adaptive, for example based on the filter
coefficients. As a further feature, the filter may be normalized, e.g. for unity DC gain.
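A reduced-length, normalized filter of the kind described above might be formed as in the sketch below. It assumes a fixed a priori length and uses the fact that the DC gain of a finite impulse response filter equals the sum of its taps. The function name `shorten_and_normalize` is hypothetical.

```python
def shorten_and_normalize(impulse_response, length):
    """Truncate a time-domain filter to `length` taps and scale it to unity DC gain."""
    taps = list(impulse_response[:length])
    dc_gain = sum(taps)                      # DC gain of an FIR filter is the tap sum
    if dc_gain == 0:
        raise ValueError("cannot normalize a filter with zero DC gain")
    return [t / dc_gain for t in taps]
```

An adaptive length could be obtained by choosing `length` from the filter coefficients themselves, for example by dropping trailing taps below some threshold, though the specification does not give a specific rule.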
A third advantageous feature of the present invention is the use of
temporal and frequency smoothing. In particular, the denominator in Equation 7 for the causal filter is an instantaneous value of the power spectrum of the noisy speech signal, and therefore it tends to have a large variance compared to the numerator,
which is based on an average over a longer period of time. This leads to fast variation
in the filter in addition to the fact that the filter is not smooth. Smoothing in both time
and frequency is used to mitigate this problem.
First, the speech signal is weighted with a $\cos^2$ weighting function in
the time domain. Then the Wiener filter is smoothed temporally, as follows:
$$H_n(f) \leftarrow \theta\, H_n(f) + (1 - \theta)\, H_{n-1}(f) \tag{8}$$
Here the subscript n denotes the filter at time n. Finally, the Wiener filter is smoothed in frequency, as follows:
$$H_n(f) \leftarrow \sum_{s=-m}^{m} w(s)\, H_n(f+s) \tag{9}$$
Here the weights, w, can be frequency dependent.
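The temporal and frequency smoothing of Equations (8) and (9) can be sketched as below. The sketch assumes real-valued filter gains per bin and frequency-independent weights, and it renormalizes the weights at the band edges where neighbours are missing. The edge rule and the function names are assumptions. For weights that sum to one, interior bins follow Equation (9) exactly.

```python
def smooth_in_time(H_now, H_prev, theta):
    """Equation (8): blend the current filter with the previous one, bin by bin."""
    return [theta * a + (1.0 - theta) * b for a, b in zip(H_now, H_prev)]

def smooth_in_frequency(H, w):
    """Equation (9): weighted average over 2m+1 neighbouring bins.

    w holds the weights w(-m)..w(m); at the band edges only the available
    bins are used and the weights are renormalized (assumption).
    """
    m = (len(w) - 1) // 2
    out = []
    for f in range(len(H)):
        acc = 0.0
        norm = 0.0
        for s in range(-m, m + 1):
            if 0 <= f + s < len(H):
                acc += w[s + m] * H[f + s]
                norm += w[s + m]
        out.append(acc / norm)
    return out
```

A smaller theta gives more weight to the previous filter and hence slower variation from frame to frame.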
In addition to the factors discussed above, it has been recognized in
developing the present invention that the estimation of the noise spectrum is critical to
the success of speech extraction. In many conventional speech enhancement
applications, a voice activity detector (VAD) is used to determine when there is no speech. These intervals are then used to update the power spectrum of the noise. This approach may be suitable in situations in which the noise spectrum does not change appreciably with time, and in which noise and speech can be reliably distinguished. However, it has been recognized in developing the present invention that in a movable
cabin environment, the noise characteristics often do change relatively rapidly and the
voice to noise ratio is very low. To operate properly, a VAD would have to track
these variations effectively so that no artifacts are introduced. This is recognized to be difficult to achieve in practice.
It has further been recognized in developing the present invention that a
VAD is not even necessary, since the duration of speech, even when multiple people
are speaking continuously, is far less than the duration when there is only noise.
Therefore, it is appropriate to merely provide a weighted average of the estimated noise spectrum and the spectrum of the noisy speech signal, as follows:
$$S^{k}_{nn}(f) = \delta\, S^{k-1}_{nn}(f) + (1 - \delta) \left( \left( \gamma H(f) + (1 - \gamma) \right) Y(f) \right)^2 \tag{10}$$
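The recursive noise-spectrum update of Equation (10) can be sketched as follows. The sketch interprets the squared term as a squared magnitude so that complex spectra Y(f) yield a real power estimate. That interpretation, and the function name, are assumptions.

```python
def update_noise_psd(S_prev, H, Y, delta, gamma):
    """One step of the weighted-average noise PSD estimate, bin by bin.

    S_prev: previous noise PSD estimate
    H:      current noise-extracting filter
    Y:      current noisy-speech spectrum (may be complex)
    delta:  forgetting factor for the previous estimate
    gamma:  mix between the filtered spectrum (gamma=1) and the raw spectrum (gamma=0)
    """
    return [delta * s + (1.0 - delta) * abs((gamma * h + (1.0 - gamma)) * y) ** 2
            for s, h, y in zip(S_prev, H, Y)]
```

With gamma set to zero the update averages the raw noisy-speech power directly, which relies on the observation above that noise-only intervals dominate speech intervals.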
With all of the above considerations in mind, Fig. 6 illustrates the
structure of an embodiment of the advantageous Wiener SEF 300. In this
embodiment, the noisy speech signal is sampled at a frequency of 5 KHz. A buffer
block length of 32 samples is used, and a 64 sample window is used at each instant to
extract speech. An overlap length of 32 samples is used, with the proviso that the first
32 samples of extracted speech from a current window are averaged with the last 32
samples of the previous window. The block length, sample window and overlap
length may be varied, as is well known in the art and illustrated below without departing from the spirit of the invention.
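The block structure described above can be sketched as follows. The sketch assumes each processed window is exactly twice the block length, so that the first half of a window overlaps the second half of the previous one, and that the overlapping halves are averaged with equal weight. The function names and the streaming convention (the final half-window is held until the next window arrives) are assumptions.

```python
def block_duration_ms(block_length, sample_rate_hz):
    """Time spanned by one buffer block, in milliseconds."""
    return 1000.0 * block_length / sample_rate_hz

def overlap_average(windows, hop=32):
    """Stitch processed windows of length 2*hop into an output stream.

    The first `hop` samples of each window are averaged with the last `hop`
    samples of the previous window; the first window's head is passed through.
    """
    output = []
    previous_tail = None
    for win in windows:
        head, tail = win[:hop], win[hop:]
        if previous_tail is None:
            output.extend(head)
        else:
            output.extend(0.5 * (a + b) for a, b in zip(previous_tail, head))
        previous_tail = tail
    return output
```

At the 5 KHz sample rate given above, a 32-sample block spans 6.4 ms, which contributes to the buffering delay of the CCS.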
In the block diagram of Fig. 6, the noisy speech is first mel-filtered in mel-filter 302. This results in improving the SNR at high frequencies. A typical situation is shown in Fig. 7, where mel-filtering with the SEF 300 primarily improves
the SNR above 1000 Hz. Next, in Fig. 6, the speech must be enhanced at low
frequencies where fixed filtering schemes such as mel-filtering are ineffective. This is achieved by making use of adaptive filtering techniques. The mel-filtered output
passes through the adaptive filter Fn 304 to produce an estimate of the noise update.
This estimate is integrated with the previous noise spectrum using a one-pole filter F1
306 to produce an updated noise spectrum. An optimization tool 308 inputs the
updated noise spectrum and the mel-filtered output from mel-filter 302 and uses an
optimization algorithm to produce a causal filter update. This causal filter update is
applied to update a causal filter 310 receiving the mel-filtered output. The updated
causal filter 310 determines the current noise estimate. This noise estimate is
subtracted from the mel-filtered output to obtain a speech estimate that is amplified appropriately using a filter F0 312.
The effect of the filtering algorithm on a typical noisy speech signal
taken in a mini-van traveling at approximately 65 mph is shown in Figs. 8 and 9. Fig.
8 illustrates the noisy speech signal and Fig. 9 illustrates the corresponding Wiener-filtered speech signal, both for the period of 12 seconds. A comparison of the two
plots demonstrates substantial noise attenuation.
Also tested was a Matlab implementation of the algorithm in which
the Wiener filter sample window has been increased to 128 points while keeping the
buffer block length at 32. This results in an overlap of 96 samples. The resulting noise cancellation performance is better. Moreover, by the use of conventional highly optimized real-to-complex and complex-to-real transforms, the computational
requirements are approximately the same as for the smaller sample window. The corresponding noise power spectral densities are shown in Fig. 7.
These correspond to the periods of time in the 12 second interval above when there
was no speech. The three curves respectively correspond to the power spectral density
of the noisy signal, the mel-smoothed signal and the residual noise left in the de-
noised signal. It is clear from Fig. 7 that mel-smoothing results in substantial noise reduction at high frequencies. Also, it can be seen that the residual noise in the
Wiener filtered signal is of the order of 15 dB below the noise-only part of the noise
plus speech signal uniformly across all frequencies.
In an actual test of the CCS incorporating the advantageous SEF in combination with the advantageous acoustic echo canceller disclosed below, the performance of the system was measured in a mini-van after 15 minutes at 70 mph.
Audio recordings were taken at 5 KHz. The directional microphones, their mounting
and the natural acoustic attenuation of the cabin resulted in between 16 dB and 22 dB
of microphone independence. The reproduced loudspeaker signals had between 24 dB
and 33 dB of peak voice to peak noise SNR. The acoustic echo canceller also performed well, as will be discussed below. Fig. 10 illustrates the results. Therefore
it was determined that the CCS performance met or exceeded all microphone
independence, echo cancellation and noise reduction specifications.
The discussion will now address the design of the advantageous AEC 400 in accordance with the present invention. For purposes of easy understanding, the
following discussion will be directed to a single input-single output system, i.e. one
microphone and one loudspeaker. However, it will be well understood by those of ordinary skill in the art that the analysis can be expanded to a multiple input-multiple output system. As a first point, a robust acoustic echo canceller requires accurate
identification of the acoustic transfer function from loudspeaker to the microphone.
This means that if the transfer function between the loudspeaker and the microphone is h and the coefficients of the AEC 400 are ĥ, then ideally h - ĥ = 0. In such a case, the AEC is
truly measuring h, not something else. If the system h is properly identified in an
initial open loop operation, then ĥ will be initially correct. However, over time, for
example over ½ hour, ĥ will begin to drift. Therefore, it is important to keep ĥ accurate in closed loop operation for a robust system. In the present invention, the
underlying theme in developing robust adaption is to evolve a strategy to ensure
independence of noise and the loudspeaker output. Fig. 11 illustrates a block diagram
of the advantageous AEC 400.
In Fig. 11, the signal from microphone 200 is fed to a summer 210, which also receives a processed output signal, so that its output is an error signal (e).
The error signal is fed to a multiplier 402. The multiplier also receives a parameter μ
(mu), which is the step size of an unnormalized Least Mean Squares (LMS) algorithm
which estimates the acoustic transfer function. Normalization, which would
automatically scale mu, is advantageously not done so as to save computation. If the extra computation could be absorbed in a viable product cost, then normalization
would advantageously be used. The value of mu is set and used as a fixed step size,
and is significant to the present invention, as will be discussed below.
Referring back to Fig. 11, the multiplier 402 also receives the regressor (x) and produces an output that is added to a feedback output in summer 404, with the sum being fed to an accumulator 406 for storing the coefficients (h) of the transfer
function. The output of the accumulator 406 is the feedback output fed to summer 404. This same output is then fed to a combination delay circuit or Finite Impulse
Response (FIR) filter, in which the echo signal is computed. The echo signal is then
fed to summer 210 to be subtracted from the input signal to yield the error signal (e).
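The signal flow of Fig. 11 corresponds to a conventional unnormalized LMS update with a fixed step size. The following is a minimal sketch under stated assumptions: the function name `lms_echo_canceller`, the single-channel arrangement and the use of NumPy are illustrative only, and the comments map each step onto the elements described above.

```python
import numpy as np

def lms_echo_canceller(x, mic, n_taps, mu):
    """Unnormalized LMS. Returns (error signal e, final coefficients h_hat)."""
    x = np.asarray(x, dtype=float)
    mic = np.asarray(mic, dtype=float)
    h_hat = np.zeros(n_taps)              # accumulator 406: transfer function coefficients
    e = np.zeros(len(mic))
    for t in range(n_taps - 1, len(mic)):
        reg = x[t - n_taps + 1:t + 1][::-1]   # regressor [x[t], x[t-1], ...]
        echo = reg @ h_hat                    # FIR filter: estimated echo
        e[t] = mic[t] - echo                  # summer 210: error signal
        h_hat = h_hat + mu * e[t] * reg       # multiplier 402 and summer 404
    return e, h_hat
```

Because the update is not normalized, a suitable `mu` in such a sketch depends on the level of the regressor, which is consistent with the tuning procedure discussed below.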
The value of mu controls how fast the AEC 400 adapts. It is an important feature of the present invention that mu is advantageously set in relation to
the step size of the SEF to make them sufficiently different in adaptation rate that they
do not adapt to each other. Rather, they each adapt to the noise and speech signals
and to the changing acoustics of the CCS.
The present invention also recognizes that the AEC 400 does not need to adapt rapidly. The most dynamic aspect of the cabin acoustics found so far is
temperature, and will be addressed below. Temperature, and other changeable
acoustic parameters such as the number and movement of passengers, change
relatively slowly compared to speech and noise. To keep the adaptation rates of the AEC 400 and the SEF 300 separated as much as possible to minimize their interaction, it is noted that some aspects of the Wiener SEF 300 are fast, so that again
the adaptation rate of the echo canceller should be slow.
Since the LMS algorithm is not normalized, the correct step size is
dependent on the magnitude of the echo cancelled microphone signals. To empirically select a correct value for mu, the transfer functions should be manually
converged, and then the loop is closed and the cabin subjected to changes in
temperature and passenger movement. Any increase in residual echo or bursting
indicates that mu is too small. Thereafter, having tuned any remaining parameters in the system, long duration road tests can be performed. Any steady decrease in voice quality during a long road test indicates that mu may be too large. Similarly, significant changes in the transfer functions before and after a long road trip at constant
temperature can also indicate that mu may be too large.
To manually cause convergence of the transfer functions, the system is
run open loop with a loud dither, see below, and a large mu, e.g. 1.0 for a mini-van.
The filtered error sum is monitored until it no longer decreases, where the filtered error sum is a sufficiently low-pass filtered sum of the squared changes in transfer
function coefficients. Mu is progressively set smaller while there is no change in the filtered error sum until reaching a sufficiently small value. Then the dither is set to its
steady state value.
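The filtered error sum used to judge convergence can be sketched as follows. This is an illustrative sketch only: the one-pole form of the low-pass filter, the smoothing constant `alpha` and the function name `filtered_error_sum` are assumptions, since the text specifies only that the sum of squared coefficient changes is sufficiently low-pass filtered.

```python
import numpy as np

def filtered_error_sum(coeff_history, alpha=0.01):
    """Low-pass filtered sum of squared changes in the transfer function coefficients.

    coeff_history: sequence of coefficient vectors, one per update.
    """
    h = np.asarray(coeff_history, dtype=float)
    # Sum of squared coefficient changes between consecutive updates.
    delta_sq = np.sum(np.diff(h, axis=0) ** 2, axis=1)
    out = np.empty_like(delta_sq)
    acc = delta_sq[0]
    for i, v in enumerate(delta_sq):
        acc += alpha * (v - acc)   # assumed one-pole low-pass filter
        out[i] = acc
    return out
```

In such a sketch, convergence would be declared once the returned sequence stops decreasing, at which point mu can be reduced.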
The actual convergence rate of the LMS filter is made a submultiple of
Fs (5 kHz in this example). The slowest update that does not compromise voice quality is desirable, since that will greatly reduce the total computational
requirements. Decreasing the update rate of the LMS filter will require a larger mu,
which in turn will interfere with voice quality through the interaction of the AEC 400
and the SEF 300.
As a specific advantageous example, the step size mu for the AEC 400 is set to 0.01, based on empirical studies. Corresponding to this mu, the step size β
(beta) for the SEF 300, which again is based on empirical studies, is set to 0.0005.
The variable beta is one of the overall limiting parameters of the CCS, since it controls the rate of adaptation of the long term noise estimate. It has been found that it is important for good CCS performance that beta and mu be related as:
$\beta \ll \frac{\mu}{k} \ll \frac{F_s}{n}$
...(11)
Here k is the value of the variable update-every for the AEC 400 (2 in
this example) and n is the number of samples accumulated before block processing by
the SEF 300 (32 in this example). In other words, the adaptation rate of the long term
noise estimate must be much smaller than the AEC adaptation rate, which must be much smaller than the basic Wiener filter rate. The rate of any new adaptive
algorithms added to the CCS, for example an automatic gain control based on the
Wiener filter noise estimate, should be outside the range of these parameters. For
proper operation, the adaptive algorithms must be separated in rate as much as possible.
Mathematically, in the single input-single output CCS, if y(t) is the
input to the microphone and u(t) is the speaker output, then the two are related by:
y(t) = H * u(t) + s(t) + n(t)
...(12)
Here, n(t) is the noise, s(t) is the speech signal from a passenger, i.e. the spoken voice, received at the microphone, and H is the acoustic transfer function.
There are two problems resulting from closed loop operation, wherein
u is a function of past values of s and n. First, n(t) could be correlated with u(t). Second, s(t) is colored for the time scale of interest, which implies again that u(t) and
s(t) are correlated. Several methods have been considered to overcome these
problems and three are proposed herein: introducing dither, using block recursive
adaptive algorithms and compensating for temperature, voice cancelled echo canceller adaptation and direct adaptation. These will be discussed in turn.
The first step, however, is to cancel the signal from the car stereo system, since the radio signal can be directly measured. The only unknown is the gain, but this can be estimated using any estimator, such as a conventional single tap
LMS. Fig. 12 illustrates the single input-single output CCS with radio cancellation.
In this development, the CCS 500 includes a microphone 200 with the input signal s(t)
+ n(t) + Hu(t), SEF Wiener filter 300 and AEC 400. The CCS 500 also includes an
input 502 from the car audio system feeding a stereo gain estimator 504. The output of the gain estimator 504 is fed to a first summer 506. Another input to first summer
506 is the output of a second summer 508, which sums the output of the SEF 300 and
random noise r(t). The output of the second summer 508 is also the signal u(t) fed to
the loudspeaker. As indicated in Fig. 12, the random noise is input at summer 508 to provide a known source of uncorrelated noise. This random noise r(t) is used as a
direct means of ensuring temporal independence, rather than parameterizing the
input/output equations to account for dependencies and then estimating those
parameters. The parameterization strategy has been found to be riddled with
complexity, and the solution involves solving non-convex optimization problems.
Accordingly, the parameterization approach is currently considered infeasible on
account of the strict constraints and the computational cost.
Advantageously, the random noise r(t) is entered as a dither signal. A
random dither is independent of both noise and speech. Moreover, since it is a known signal, it is removed, or blocked, by the Wiener SEF 300. As a result, identification of the system can now be performed based on the dither signal, since the system looks
like it is running open loop. However, the dither signal must be sufficiently small so that it does not introduce objectionable noise into the acoustic environment, but at the
same time it must be loud enough to provide a sufficiently exciting, persistent signal. Therefore, it is important that the dither signal be scaled with the velocity of the cabin,
since the noise similarly increases. Advantageously, the dither volume is adjusted by the same automatic volume control used to modify the CCS volume control.
In the embodiment discussed above, an LMS algorithm is used to
identify the acoustic transfer function. In addition to LMS, other possible approaches
are a recursive least squares (RLS) algorithm and a weighted RLS. However, in these
other approaches, if the dither signal is used for identification, then the estimate is the
estimate of the closed loop system rather than the desired acoustic transfer function. Extracting the acoustic transfer function from the closed loop system is subject to
additional complications. Alternatively, it might be possible to develop an iterative
algorithm that identifies the coefficients that must be causally related due to the
acoustic delay, and the remaining coefficients could then be identified recursively. Therefore, a different algorithm is advantageously used which is a
variant of the above-mentioned algorithms. To derive this algorithm, it is first noted
that the speaker output u(t) can be written as:
$u[t] = z^{-d}\left(\mathrm{SEF} * (s[t] + n[t])\right) + r[t]$
...(13)
Here SEF is the speech extraction filter 300 and d accounts for time
delays.
Further, the dither signal r(t) is taken to be white, and therefore is uncorrelated with past values. Therefore, the input/output equations can be rearranged
as follows:
$y[t] = \Pi_d H * u[t] + (I - \Pi_d) H * u[t] + s[t] + n[t]$
$\quad = \Pi_d H * r[t] + (I - \Pi_d) H * \left(z^{-d}(\mathrm{SEF} * (s[t] + n[t])) + r[t]\right) + s[t] + n[t]$
...(14)
Here $\Pi_d$ is a truncation operator that extracts the first d impulse response coefficients and sets the others to zero, and d is less than the filter delay plus the
computational delay plus the acoustic delay, i.e.:
$d < t_{\text{SEF Filter}} + t_{\text{Computation}} + t_{\text{Acoustics}}$
...(15)
The last three terms in Equation 14 are uncorrelated from the first term, which is the required feature. It should also be noted that only the first d coefficients
can be identified. This point serves as an insight as to the situations where integration
of identification and control results in complications. As may be seen, this happens
whenever d does not meet the "less than" criterion of Equation 15.
Next, the last three terms are regarded as noise, and either an LMS or
RLS approach is applied to obtain very good estimates of the first d impulse
coefficients of H. The coefficients from d+1 onwards can either be processed in a
block format (d+l:2d-l, 2d:3d-l,...) to improve computational cost and accuracy, or else they can be processed all at once. In either case, the equations are modified in
both LMS and RLS to account for the better estimates of the first d coefficients of H.
In the case of unnormalized LMS, the result is as follows:
$H^{2d}_{t+1} = H^{2d}_{t} + \mu\, u^{d}_{t-d} \left( y[t] - (u^{d}_{t})' H^{d}_{t} - (u^{d}_{t-d})' H^{2d}_{t} \right)$
...(16)
Here $H^{2d}_{t+1}$ denotes the update at time t+1. $H^{2d}_{t+1}$ is a column vector of the acoustic transfer function H containing the coefficients from d to 2d-1. In the case of the input, $u^{d}_{t}$ denotes a column vector [u[t], u[t-1], ..., u[t-d+1]]'. $H^{3d}_{t}$ is estimated in a
similar manner, with the only difference being that the contribution from $H^{2d}_{t+1}$ is also
subtracted from the error. Such algorithms can be guaranteed to have the same
properties as their original counterparts.
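The block-wise unnormalized LMS update described above can be sketched as follows. This is a hedged illustration rather than the tested implementation: the function name `block_lms` is hypothetical, the regressor for each block is assumed to be the input delayed by a multiple of d, and each block's error is assumed to subtract the contributions of the earlier blocks using their estimates from before the current update.

```python
import numpy as np

def block_lms(u, y, d, n_blocks, mu):
    """Estimate n_blocks*d impulse response coefficients in blocks of d.

    Block b holds coefficients b*d .. (b+1)*d-1 and is driven by the
    regressor [u[t-b*d], ..., u[t-b*d-d+1]].
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    H = np.zeros((n_blocks, d))
    for t in range(n_blocks * d - 1, len(y)):
        est = 0.0                        # contribution of the blocks handled so far
        for b in range(n_blocks):
            lo = t - (b + 1) * d + 1
            reg = u[lo:lo + d][::-1]     # delayed regressor for block b
            est += reg @ H[b]            # uses the pre-update estimate of block b
            # Later coefficients are treated as noise in this block's error.
            H[b] += mu * reg * (y[t] - est)
    return H.ravel()
```

With a white input, as assumed for the dither, the regressors of different blocks are uncorrelated, so treating the later coefficients as noise does not bias the earlier blocks in this sketch.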
It has been found that d is advantageously between 10 and 40. These
values take into account the time delay between the speaker speaking and the sound appearing back at the microphone after having been passed through the CCS. As a
result, this keeps the voice signals uncorrelated. In general, d should be as large as
possible provided that it still meets the requirement of Equation 15.
In the case of RLS, it is also possible to develop a computationally
efficient algorithm by adopting block processing. It takes approximately $O(n^2)$ in computational cost to process RLS, where n is the length of the transfer function H. Block processing, on the other hand, only requires $O(nd^2)$. The algorithm is presented
in Fig. 13.
As noted above, temperature is one of the principal components that
contribute towards time variation in the AEC 400. Changes in temperature result in changing the speed of sound, which in turn has the effect of scaling the time axis or
equivalently, in the frequency domain, linearly phase shifting the acoustic transfer
function. Thus, if the temperature inside the cabin and the acoustic transfer function
at a reference temperature are known, it is possible to derive the modified transfer function either in time, by decimating and interpolating, or in the frequency domain, by phase warping. It therefore is advantageous to estimate the temperature. This may be done by generating a tone at an extremely low frequency that falls within the loudspeaker and microphone bandwidths and yet is not audible. The equation for
compensation is then:
$\frac{c_t}{c_{ref}} = \arctan\left[\frac{H_{ref}(\omega)}{H_t(\omega)}\right]$ ...(17)
Here c is the speed of sound.
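The time-domain form of the temperature compensation (scaling the time axis of a reference transfer function by interpolation) can be sketched as follows. This is a sketch under stated assumptions: the dry-air approximation for the speed of sound, linear interpolation via `numpy.interp`, and the function names are illustrative and are not taken from the tested system. Fixed, non-acoustic delays are ignored here, although the text notes that they must be compensated in practice.

```python
import numpy as np

def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 * np.sqrt(1.0 + temp_c / 273.15)

def warp_impulse_response(h_ref, temp_ref_c, temp_c):
    """Rescale the time axis of h_ref for a new cabin temperature."""
    h_ref = np.asarray(h_ref, dtype=float)
    ratio = speed_of_sound(temp_c) / speed_of_sound(temp_ref_c)
    n = np.arange(len(h_ref))
    # A path that took n_ref samples at the reference temperature takes
    # n_ref / ratio samples now, so h_new[n] = h_ref(n * ratio).
    return np.interp(n * ratio, n, h_ref, left=0.0, right=0.0)
```

For example, an echo arriving at sample 100 at a reference temperature of 20 degrees Celsius moves to roughly sample 97 at 40 degrees Celsius, since the speed of sound increases by about 3.4 percent.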
The transfer function at a frequency ω can be estimated using any of
several well known techniques. Sudden temperature changes can occur on turning on
the air conditioning, heater or opening a window or door. The reason it becomes
necessary to use the temperature estimate in addition to on-line identification is because the error between two non-overlapping signals is typically larger than for
overlapping signals, as shown in Fig. 14. Therefore, it usually takes a prohibitively
large time to converge based just upon the on-line identification.
To accurately compute the speed of sound, it is necessary to compensate for any fixed time delays in the measured transfer functions H. For instance, there typically are fixed computational delays as well as delays as a function
of frequency through any analog filter. These delays may be measured by use of
multiple tones or a broadband signal.
As previously indicated, the effect of the CCS incorporating the SEF
300 and the AEC 400 on a typical noisy speech signal taken in a mini-van traveling at
approximately 65 mph is shown in Figs. 8 and 9. Fig. 8 illustrates the noisy speech
signal and Fig. 9 illustrates the corresponding Wiener-filtered speech signal, both for the period of 12 seconds. A comparison of the two plots demonstrates substantial noise attenuation. Also tested was a Matlab implementation of the algorithm in which the
Wiener filter sample window has been increased to 128 points while keeping the
buffer block length at 32. This results in an overlap of 96 samples. The resulting
noise cancellation performance is better. Moreover, by the use of conventional highly
optimized real-to-complex and complex-to-real transforms, the computational
requirements are approximately the same as for the smaller sample window.
As also previously indicated, the corresponding noise power spectral
densities are shown in Fig. 7. These correspond to the periods of time in the 12
second interval above when there was no speech. The three curves respectively
correspond to the power spectral density of the noisy signal, the mel-smoothed signal and the residual noise left in the de-noised signal. It is clear from Fig. 7 that mel-smoothing results in substantial noise reduction at high frequencies. Also, it can be
seen that the residual noise in the Wiener filtered signal is of the order of 15 dB below
the noise-only part of the noise plus speech signal uniformly across all frequencies.
In the actual test of the CCS incorporating the advantageous SEF 300
and AEC 400 as shown in Fig. 10, the AEC 400 achieved more than 20 dB of cancellation. This is further shown in Figs. 15 and 16. Therefore it was determined
that the CCS performance met or exceeded all microphone independence, echo
cancellation and noise reduction specifications. There are other aspects of the present invention that contribute to the
improved functioning of the CCS. One such aspect relates to an improved AGC in accordance with the present invention that is particularly appropriate in a CCS
incorporating the SEF 300 and AEC 400. The present invention provides a novel and
unobvious AGC circuit that controls amplification volume and related functions in the CCS, including the generation of appropriate gain control signals and the prevention
of amplification of undesirable transient signals.
It is well known that it is necessary for customer comfort, convenience and safety to control the volume of amplification of certain audio signals in audio communication systems such as the CCS. Such volume control should have an
automatic component, although a user's manual control component is also desirable.
The prior art recognizes that any microphone in a cabin will detect not only the
ambient noise, but also sounds purposefully introduced into the cabin. Such sounds include, for example, sounds from the entertainment system (radio, CD player or even
movie soundtracks) and passengers' speech. These sounds interfere with the
microphone's receiving just a noise signal for accurate noise estimation.
Prior art AGC systems failed to deal with these additional sounds adequately. In particular, prior art AGC systems would either ignore these sounds or attempt to compensate for the sounds.
In contrast, the present invention provides an advantageous way to
supply a noise signal to be used by the AGC system that has had these additional
noises eliminated therefrom, i.e. by the use of the inventive SEF 300 and/or the inventive AEC 400. Advantageously, both the SEF 300 and the AEC 400 are used in combination with the AGC in accordance with the present invention, although the use
of either inventive system will improve performance, even with an otherwise
conventional AGC system. In addition, it will be recalled from the discussion of the SEF 300 that it is advantageous for the dither volume to be adjusted by the same automatic volume control used to modify the CCS volume control, and the present invention provides such a feature. The advantageous AGC 600 of the present invention is illustrated in
Fig. 17. As shown therein, the AGC 600 receives two input signals: a signal gain-pot
602, which is an input from a user's volume control 920 (discussed below), and a
signal agc-signal 604, which is a signal from the vehicle control system that is
proportional to the vehicle speed. As will be discussed below, the generation of the agc-signal 604 represents a further aspect of the present invention. The AGC 600
further provides two output signals: an overall system gain 606, which is used to
control the volume of the loudspeakers and possibly other components of the audio
communication system generally, and an AGC dither gain control signal, rand-val
608, which is available for use as a gain control signal for the random dither signal r(t) of Fig. 9, or equivalently for the random noise signals rand1 and rand2 of Fig. 3.
Before discussing the inventive structure of AGC 600 itself, a
discussion will be provided of the generation of the inventive agc-signal 604. Fig. 18
is similar to Fig. 1, but shows the use of the SEF 300 and the AEC 400, as well as the addition of a noise estimator 700 that generates the agc-signal 604. As shown in Fig.
18, the agc-signal 604 is generated in noise estimator 700 from a noise output of the SEF 300. As described above in connection with Fig. 6, the primary output signal
output from filter F0 312 is the speech signal from which all noise has been
eliminated. However, the calculation of this speech signal involved the determination
of the current noise estimate, output from the causal filter 310. This current noise estimate is illustrated as noise 702 in Fig. 18.
It is possible to use this noise 702 as the agc-signal 604 itself. This
noise 702 is an improvement for this purpose over noise estimates in prior art systems in that it reflects the superior noise estimation of the SEF 300, with the speech effectively removed. It further reflects the advantageous operation of the AEC 400
that removed the sound introduced into the acoustic environment by the loudspeaker
104. Indeed, it would even be an improvement over the prior art to use the output of
the AEC 400 as the agc-signal 604. However, this output includes speech content,
which might bias the estimate, and therefore is generally not as good for this purpose
as the noise 702.
However, the present invention goes beyond the improved noise estimation that would occur if the noise 702 were used for the agc-signal 604 by
combining the noise 702, which is a feedback signal, with one or more feed forward
signals that directly correspond to the amount of noise in the cabin that is not a
function of the passengers' speech. As shown in Fig. 18, such feed forward signals advantageously include a speed signal 704 from a speed sensor (not illustrated) and/or a window position signal 706 from a window position sensor (not illustrated). As
anyone who has ridden in an automobile will know, the faster the automobile is going,
the greater the engine and other road noise, while the interior noise also increases as
one or more windows are opened. By combining the use of these feed forward signals
with the noise 702, a superior agc-signal 604 can be generated as the output 708 of noise estimator 700. The superior AGC signal may actually decrease the system gain
with increasing noise under certain conditions such as wind noise so loud that
comfortable volume levels are not possible.
Referring back to Fig. 17, the agc-signal 604 is considered to be the desired one of the noise 702 and the output 708. However, because the structure of the AGC 600 is itself novel and unobvious and constitutes an aspect of the present invention, it is possible to alternatively use a more conventional signal, such as the
speed signal 704 itself.
In each case, the agc-signal 604 is then processed, advantageously in
combination with the output of the user's volume control gain-pot 602, to generate the
two output signals 606, 608. In this processing, a number of variables are assigned values to provide the output signals 606, 608. The choices of these assigned values contribute to the effective processing and are generally made based upon the hardware
used and the associated electrical noise, as well as in accordance with theoretical
factors. However, while the advantageous choices for the assigned values for the tested system are set forth below, it will be understood by those of ordinary skill in the art that the particular choices for other systems will similarly depend on the particular
construction and operation of those systems, as well as any other factors that a
designer might wish to incorporate. Therefore, the present invention is not limited to
these choices. The agc-signal 604 is, by its very nature, noisy. Therefore, it is first
limited between 0 and a value AGC-LIMIT in a limiter 610. A suitable value for
AGC-LIMIT is 0.8 on a scale of zero to one. Then the signal is filtered with a one-pole
low-pass digital filter 612 controlled by a value ALPHA-AGC. The response of
this filter should be fast enough to track vehicle speed changes, but slow enough that the variation of the filtered signal does not introduce noise by amplitude modulation.
A suitable value for ALPHA-AGC is 0.0001. The output of the filter 612 is the filt-
agc-signal, and is used both to modify the overall system gain and to provide automatic gain control for the dither signal, as discussed above. Turning first to the overall system gain calculation, the filt-agc-signal
is used to linearly increase this gain. This linear function has a slope of AGC-GAIN,
applied by multiplier 614, and a y-intercept of 1, applied by summer 616. A suitable
value for AGC-GAIN is 0.8. The result is a signal agc, which advantageously
multiplies a component from the user's volume control.
This component is formed by filtering the signal gain-pot 602 from the user's volume control. Like agc-signal 604, gain-pot 602 is very noisy and therefore is filtered in low-pass filter 618 under the control of variable ALPHA-GAIN-POT. A
suitable value for ALPHA-GAIN-POT is 0.0004. The filtered output is stored in the
variable var-gain. The overall front to rear gain is the product of the variable var-gain
and the variable gain-r (not shown). A suitable value for gain-r is 3.0. Similarly, the overall rear to front gain (not shown) is the product of the variable var-gain and a
variable gain-f, also having a suitable value of 3.0 in consideration of power amplifier
balance.
In AGC 600, however, the overall system gain 606 is formed by multiplying, in multiplier 620, the var-gain output from filter 618 by the signal agc output from the summer 616.
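The overall system gain calculation of Fig. 17 can be sketched as follows. This is a minimal sketch under stated assumptions: the constants are the suitable values quoted above, with hyphens replaced by underscores, while the class name `AgcGain`, the recursive form of the one-pole filters and the zero initial states are illustrative choices that are not specified in the text. The input `agc_signal` stands for the speed-related input signal 604.

```python
def one_pole_lowpass(state, x, alpha):
    """Assumed one-pole low-pass form: move the state a fraction alpha toward x."""
    return state + alpha * (x - state)

class AgcGain:
    AGC_LIMIT = 0.8
    ALPHA_AGC = 0.0001
    AGC_GAIN = 0.8
    ALPHA_GAIN_POT = 0.0004

    def __init__(self):
        self.filt_agc_signal = 0.0
        self.var_gain = 0.0

    def step(self, agc_signal, gain_pot):
        """Process one sample of the two inputs and return the overall system gain."""
        limited = min(max(agc_signal, 0.0), self.AGC_LIMIT)          # limiter 610
        self.filt_agc_signal = one_pole_lowpass(
            self.filt_agc_signal, limited, self.ALPHA_AGC)           # filter 612
        agc = 1.0 + self.AGC_GAIN * self.filt_agc_signal             # multiplier 614, summer 616
        self.var_gain = one_pole_lowpass(
            self.var_gain, gain_pot, self.ALPHA_GAIN_POT)            # filter 618
        return self.var_gain * agc                                   # multiplier 620
```

With a steady input at or above the limit and a volume control setting of 0.5, such a sketch settles at a gain of 0.5 x (1 + 0.8 x 0.8) = 0.82.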
The gain control signal rand-val 608 for the dither signal is similarly
processed, in that the filt-agc-signal is used to linearly increase this gain. This linear
function has a slope of rand-val-mult, applied by multiplier 622, and a y-intercept of
1, applied by summer 624. A suitable value for rand-val-mult is 45. The output of summer 624 is multiplied by variable rand-amp, a suitable value of which is 0.0001. The result is the signal rand-val 608. The AGC 600 is tuned by setting appropriate values for AGC-LIMIT and ALPHA-AGC based on the analog AGC hardware and the electrical noise. In the
test system, the appropriate values are 0.5 and 0.0001, respectively.
Then the variable rand-val for the dither signal is further tuned by setting rand-amp and rand-val-mult. To this end, first rand-amp is set to the largest value that is imperceptible in system on/off under open loop, idle, windows and doors
closed conditions. Next, the variable rand-val-mult is set to the largest value that is
imperceptible in system on/off under open loop, cruise speed (e.g. 65 mph), windows
and doors closed conditions. In the test system, this resulted in rand-amp equal to 0.0001 and rand-val-mult equal to 45, as indicated above.
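The dither gain calculation can be sketched in the same manner. The default values are those quoted for the test system; the function names and the uniform distribution of the dither samples are assumptions made only for illustration, since the text requires only that the dither be white.

```python
import numpy as np

def dither_gain(filt_agc_signal, rand_val_mult=45.0, rand_amp=0.0001):
    """rand-val: linear in the filtered AGC signal (slope rand-val-mult, y-intercept 1),
    then scaled by rand-amp."""
    return rand_amp * (1.0 + rand_val_mult * filt_agc_signal)

def dither_samples(n, rand_val, rng):
    """White dither r(t) scaled by rand-val (uniform distribution is an assumption)."""
    return rand_val * rng.uniform(-1.0, 1.0, size=n)
```

At idle (filtered AGC signal of 0) the dither amplitude in this sketch equals rand-amp, i.e. 0.0001, and it rises to 0.0037 when the filtered AGC signal reaches 0.8.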
In the test vehicle, the output 708 of Fig. 18 was identical to the signal
agc 604 output from the summer 616 in Fig. 17. This signal agc was directly
proportional to vehicle speed over a certain range of speeds, i.e. was linearly related over the range of interest. However, since road and wind noise often increase as a nonlinear function of speed, e.g. as a quadratic function, a more sophisticated
generation of the signal agc may be preferred.
Fig. 19 illustrates the generation of the signal agc by a quadratic
function. The filt-agc-signal from low pass filter 612 in Fig. 17 is multiplied in multiplier 628 by AGC-GAIN and added, in summer 630, to one. However, summer 630 also adds to these terms a filt-agc-signal squared term from square multiplier 632
which was multiplied by a constant AGC-SQUARE-GAIN in multiplier 634. This
structure implements a preferred agc signal that is a quadratic function of the filt-agc-signal.
The interior noise of a vehicle cabin is influenced by ambient factors
beyond the contributions to engine, wind and road noise discussed above that depend
only on vehicle speed. For instance, wind noise varies depending on whether the
windows are open or closed and engine noise varies depending on the RPM. The
interior noise further depends on unpredictable factors such as rain and nearby traffic.
Additional information is needed to compensate for these factors.
In addition to the Window Position and Speed Sensor inputs, noise
estimator 700 of Fig. 18 may be modified to accept inputs such as Door Open and
Engine RPM etc. for known factors that influence cabin interior noise levels. These
additional inputs are used to generate the output 708.
In a preferred embodiment, the Door Open signal (e.g. one for each
door) is used to reduce the AGC gain to zero, i.e. to turn the system off while a door is
open. The Window Open signals (e.g. one for each window) are used to increase the
AGC within a small range if, for example, one or more windows are slightly open, or
to turn the system off if the windows are fully open. In many vehicles, the engine noise proportional to RPM is insignificant and AGC for this noise will not be needed. However, this may not be the case for certain vehicles such as Sport Utility Vehicles,
and linear compensation such as depicted in Fig. 17 for the agc-signal may be
appropriate.
Fig. 20 is an illustration of the uses of the input from the SEF 300 to
account for unknown factors that influence cabin interior noise levels. As shown therein, the SEF 300 can operate for each microphone to enhance speech by
estimating and subtracting the ambient noise, so that individual microphone noise
estimates can be provided. The noise estimator accepts the instantaneous noise estimates for each microphone, integrates them in integrators 750a, 750b, ...750i and
weights them with respective individual microphone average levels compensation
weights in multipliers 752a, 752b,...752i. The weights are preferably precomputed to compensate for individual microphone volume and local noise conditions, but the weights could be computed adaptively at the expense of additional computation. The
weighted noise estimates are then added in adder 754 to calculate a cabin ambient
noise estimate. The cabin ambient noise estimate is compared to the noise level
estimated from known factors by subtraction in subtractor 756. If the cabin ambient noise estimate is greater, then after limiting in limiter 758, the difference is used as a correction in that the overall noise estimate is increased accordingly. While it is
possible to use just the cabin ambient noise estimate for automatic gain control, the
overall noise estimate has been found to be more accurate if known factors are used first and unknown factors are added as a correction, as in Fig. 20.
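The correction scheme of Fig. 20 can be sketched as follows. This is an illustrative sketch only: the class name `CabinNoiseEstimator`, the leaky form of the integrators, the smoothing constant `alpha` and the restriction of the correction to the range from zero to `limit` are assumptions, since the text specifies only integration, weighting, summation, subtraction and limiting.

```python
import numpy as np

class CabinNoiseEstimator:
    def __init__(self, weights, limit, alpha=0.01):
        self.weights = np.asarray(weights, dtype=float)  # precomputed per-microphone weights
        self.limit = limit
        self.alpha = alpha
        self.integrated = np.zeros(len(self.weights))

    def step(self, mic_noise, known_noise):
        """mic_noise: instantaneous noise estimate per microphone (from the SEF).
        known_noise: noise level estimated from known factors (speed, windows, etc.)."""
        mic_noise = np.asarray(mic_noise, dtype=float)
        self.integrated += self.alpha * (mic_noise - self.integrated)  # integrators 750a..750i
        cabin = float(self.weights @ self.integrated)   # multipliers 752a..752i, adder 754
        excess = cabin - known_noise                     # subtractor 756
        correction = min(max(excess, 0.0), self.limit)   # limiter 758
        return known_noise + correction                  # overall noise estimate
```

The known factors thus set the baseline, and the microphone-derived estimate can only raise it, by at most `limit`, when the cabin is noisier than the known factors predict.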
Another aspect of the AGC in accordance with the present invention
contributes to the advantageous functioning of the CCS. Thus, it was noted above that
the SEF 300 provides excellent noise removal in part by treating the noise as being of
relatively long duration or continuous in time compared with the speech component.
However, there are some noise elements that are of relatively short duration, comparable to the speech components, for example the sound of the mini-van's tire
hitting a pothole. There is nothing to be gained by amplifying this type of noise along
with the speech component. Indeed, such short noises are frequently significantly louder than any expected speech component and, if amplified, could startle the driver into making a dangerous mistake. Such short noises are called transient noises, and the prior art includes
many devices for specific transient signal suppression, such as lightning or voltage
surge suppressors. Other prior art methods pertain to linear or logarithmic volume
control (fade-in and fade-out) to control level-change transients. There are also
numerous control systems which are designed to control the transient response of some physical plant, i.e. closed loop control systems. All these prior art devices and methods tend to be specific to certain implementations and fields of use.
A transient suppression system for use with the CCS in accordance
with the present invention also has implementation specifics. It must first satisfy the
requirement, discussed above, that all processing between detection by the microphones and output by the speakers must take no more than 20 ms. It must also
operate under open loop conditions.
In accordance with a further aspect of the present invention, there are
provided transient signal detection techniques consisting of parameter estimation and
decision logic that are used to gracefully preclude the amplification or reproduction of undesirable signals in an intercommunication system such as the CCS.
In particular, the parameter estimation and decision logic includes
comparing instantaneous measurements of the microphone or loudspeaker signals, and
further includes comparing various processed time histories of those signals to
thresholds or templates. When an undesirable signal is so detected, the system shuts off adaptation for a suitable length of time corresponding to the duration of the transient and the associated cabin ring-down time and the system outputs (e.g. the
outputs of the loudspeakers) are gracefully and rapidly faded out. After the end of this time, the system resets itself, including especially any adaptive parameters, and gracefully and rapidly restores the system outputs. The graceful, rapid fade-out and
fade-in is accomplished by any suitable smooth transition, e.g. by an exponential or
trigonometric function, of the signal envelope from its current value to zero, or vice
versa. In accordance with the present invention, the parameter estimation
advantageously takes the form of setting thresholds and/or establishing templates.
Thus, one threshold might represent the maximum decibel level for any speech
component that might reasonably be expected in the cabin. This parameter might be used to identify any speech component exceeding this decibel level as an undesirable
transient.
Similarly, a group of parameters might establish a template to identify
a particular sound. For example, the sound of the wheel hitting a pothole might be characterized by a certain duration, a certain band of frequencies and a certain amplitude envelope. If these characteristics can be adequately described by a
reasonable number of parameters to permit the identification of the sound by
comparison with the parameters within the allowable processing time, then the group
of parameters can be used as a template to identify the sound. While thresholds and templates are mentioned as specific examples, it will be apparent to those of ordinary
skill in the art that many other methods could be used instead of, or in addition to, these
methods.
Fig. 21 illustrates the overall operation of the transient processing system 800 in accordance with the present invention. As shown in Fig. 21, signals from the microphones in the cabin are provided to a parameter estimation processor 802. It will be recalled that the outputs of the loudspeakers will reflect the content of the sounds picked up by the microphones to the extent that those sounds are not
eliminated by the processing of the CCS, e.g. by noise removal in the SEF and by
echo cancellation by the AEC 400. Based on these signals, the processor 802
determines parameters for deciding whether or not a particular short-duration signal is a speech signal, to be handled by processing in the SEF 300, or an undesirable
transient noise to be handled by fading-out the loudspeaker outputs. Such parameters may be determined either from a single sampling of the microphone signals at one
time, or may be the result of processing together several samples taken over various
lengths of times. One or more such parameters, for example a parameter based on a
single sample and another parameter based on 5 samples, may be determined to be used separately or together to decide if a particular sound is an undesirable transient or
not. The parameters may be updated continuously, at set time intervals, or in response
to set or variable conditions.
The current parameters from processor 802 are then supplied to
decision logic 804, which applies these parameters to actually decide whether a sound is the undesirable transient or not. For example, if one parameter is a maximum
decibel level for a sound, the decision logic 804 can decide that the sound is an
undesirable transient if the sound exceeds the threshold. Correspondingly, if a plurality of parameters define a template, the decision logic 804 can decide that the
sound is an undesirable transient if the sound matches the template to the extent
required.
If the decision logic 804 determines that a sound is an undesirable transient, then it sends a signal to activate the automatic gain control (AGC) 810, which operates on the loudspeaker output first to achieve a graceful fade-out and then, after a suitable time to allow the transient to end and the
cabin to ring down, to provide a graceful fade-in.
Once again, the decision in decision logic 804 can be based upon a
single sample of the sound, or can be based upon plural samples of the sound taken in
combination to define a time history of the sound. Then the time history of the sound
may be compared to the thresholds or templates established by the parameters. Such
time history comparisons may include differential (spike) techniques, integral (energy) techniques, frequency domain techniques and time-frequency techniques, as well as
any others suitable for this purpose.
As shown in Fig. 21, the identification of a sound as an undesirable
transient may additionally or alternatively be based on the loudspeaker signals. These loudspeaker signals would be provided to a parameter estimation processor 806 for
the determination of parameters, and those parameters and the sound sample or time
history of the sound would be provided to another decision logic 808. The structure
of processor 806 would ordinarily be generally similar to, or identical to, the structure
of processor 802, although different parameter estimations may be appropriate to take into account the specifics of the microphones or loudspeakers, for example. Similarly,
the structure of the decision logic 808 would ordinarily be similar to, or identical to,
that of the decision logic 804, although different values of the parameters might yield
different thresholds and/or templates, or even separate thresholds and/or templates.
It will also be understood that other techniques for parameter estimation, decision logic and signal suppression may be used within the scope of the
present invention. Similarly, the invention is not limited to the use of microphone
signals and/or loudspeaker signals, nor need each decision logic operate on only one kind of such signals. Furthermore, the response to the detection of an undesirable
transient is not limited to fade-out.
The determination of a simple threshold is shown in Fig. 22. For this determination, a recording is made of the loudest voice signals for normal
conversation. Fig. 22 shows the microphone signals for such a recording. This example signal consists of a loud, undesirable noise followed by a loud, acceptable
spoken voice. A threshold is chosen such that the loudest voice falls below the
threshold and the undesirable noise rapidly exceeds the threshold. The threshold level
may be chosen empirically, as in the example at 1.5 times the maximum level of speech, or it may be determined statistically to balance incorrect AGC activation
against missed activation for undesirable noise.
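The empirical choice of the threshold described above can be illustrated with a short sketch. The function names and the sample values are hypothetical; only the factor of 1.5 times the maximum level of speech comes from the example in the text.

```python
def choose_threshold(speech_samples, margin=1.5):
    """Return margin times the peak magnitude of a recording of loud, acceptable speech."""
    peak = max(abs(s) for s in speech_samples)
    return margin * peak


def is_transient(sample, threshold):
    """Decide whether a single microphone sample exceeds the threshold."""
    return abs(sample) > threshold


# Hypothetical recording of the loudest normal speech, in normalized units.
loud_speech = [0.1, -0.4, 0.3]
threshold = choose_threshold(loud_speech)
```

A statistically determined threshold, as mentioned in the text, would replace `choose_threshold` while leaving the comparison unchanged.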
The behavior for the AGC for the signal and threshold of Fig. 22 is
shown in Fig. 23. The undesirable noise rapidly exceeds the threshold and is eliminated by the AGC. A detail of the AGC graceful shutdown from Fig. 23 is shown in Fig. 24, wherein the microphone signal is multiplied by a factor at each
successive sample to cause an exponential decay of the signal output from the AGC.
Another example of a threshold is provided by comparing the absolute
difference between two successive samples of a microphone signal to a fixed number. Since the microphone signal is bandlimited, the maximum that the signal can change
between successive samples is limited. For example, suppose that the sample rate is
10 KHz and the microphone is 4th order Butterworth bandpass limited between 300 Hz and 3 KHz. The maximum the bandpassed signal can change is approximately 43% of the largest acceptable step change input to the bandpass filter. A difference between successive samples that exceeds a threshold of 0.43 should activate the AGC. This threshold may also be determined empirically, since normal voice signals rarely
contain maximum allowable amplitude step changes.
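The successive-sample comparison can be sketched as follows. The sketch assumes that the microphone signal is normalized so that the largest acceptable step change at the filter input is 1.0, so that the figure of 43% becomes a threshold of 0.43; the function name is hypothetical.

```python
def first_slew_violation(samples, max_step=0.43):
    """Return the index of the first sample whose change from the previous
    sample exceeds max_step, or None if no such sample exists."""
    for n in range(1, len(samples)):
        if abs(samples[n] - samples[n - 1]) > max_step:
            return n
    return None
```

Because only one subtraction and one comparison are needed per sample, a test of this kind adds essentially no processing delay.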
The determination of a simple template is shown in Fig. 25. The
loudspeaker signal containing speech exhibits a characteristic power spectrum, as seen
in the lower curve in Fig. 25. The power spectrum is determined from a short time
history of the loudspeaker signal via a Fast Fourier Transform (FFT), by a technique well known in the art. The template in this example is determined as a Lognormal
distribution that exceeds the maximum of the speech power spectrum by
approximately 8 dB. In operation, the power spectrum of short time histories of data
is compared to the template. Any excess causes activation of the AGC. The template in this example causes AGC activation for tonal noise or broadband noise particularly above about 1.8 KHz.
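A template comparison of this kind might be sketched as follows. For self-containment the sketch computes the power spectrum with a direct discrete Fourier transform rather than the FFT named in the text, and it accepts the template as an arbitrary list of per-bin ceilings in dB; the Lognormal shape and the 8 dB margin of Fig. 25 are not reproduced. All names are hypothetical.

```python
import cmath
import math


def power_spectrum_db(frame):
    """Power spectrum in dB of a real frame, bins 0..N/2, by direct DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        power = abs(acc) ** 2 / n
        # A small floor avoids taking the logarithm of zero.
        spectrum.append(10.0 * math.log10(power + 1e-12))
    return spectrum


def exceeds_template(frame, template_db):
    """Return True if any spectral bin of the frame rises above the template."""
    return any(p > t for p, t in zip(power_spectrum_db(frame), template_db))
```

In use, `template_db` would hold one ceiling per bin derived from recorded speech, and a True result would activate the AGC.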
In the testing of the mini-van yielding the results of Fig. 10, a number
of the parameters were assigned values to provide good transient detection and
response. The choices of these assigned values contributed to the effective processing and were generally made based on the hardware used and the associated electrical noise, as well as in accordance with theoretical factors. However, while the
advantageous choices for the assigned values for the tested system are set forth below,
it will be understood by those of ordinary skill in the art that the particular choices for other systems will similarly depend on the particular construction and operation of those systems, as well as any other factors that a designer might wish to incorporate.
Therefore the present invention is not limited to these choices.
Thus, in the test system, a transient is detected when any microphone or loudspeaker voltage reaches init-mic-threshold or init-spkr-threshold, respectively. These thresholds were chosen to preclude saturation of the respective microphone or
loudspeaker, since, if saturation occurs, the echo cancellation operation diverges (i.e.
the relationship between the input and the output, as seen by the LMS algorithm, changes). The thresholds should be set to preclude any sounds above the maximum
desired level of speech to be amplified. An advantageous value for both thresholds is
0.9.
When a transient is detected, the system shuts off adaptation for a
selected number of samples at the sample rate Fs, which in the test system is 5 KHz.
This is so that the SEF 300 and the AEC 400 will not adapt their operations to the
transient. This number of samples is defined by a variable adapt-off-count, and
should be long enough for the cabin to fully ring down. This ring down time is
parameterized as TAPS, which is the length of time it takes for the mini-van to ring down when the sample rate is Fs. For an echo to decay 20 dB, this was found to be approximately 40 ms. TAPS increases linearly with Fs.
It should also be noted that TAPS represents the size of the Least Mean
Squares filters LMS (see Fig. 3) that model the acoustics. These filters should be long
enough that the largest transfer function has decayed to approximately 25 dB down from its maximum. Such long transfer functions have an inherently smaller magnitude due to the natural acoustic attenuation.
In the test system, it was found that a suitable value for TAPS was 200
and that a suitable value for adapt-off-count was 2*TAPS, i.e. 80 ms at Fs = 5 KHz.
The variable adapt-off-count is reset to 2*TAPS if multiple transients occur. At the end of a transient, the SEF 300 is also reset. Finally, when the output is being shut off due to a transient (fade-out),
a parameter OUTPUT-DECAY-RATE is used as a multiplier of the loudspeaker value
each sample period. A suitable value is 0.8, which provides an exponential decay that
avoids a "click" associated with abruptly setting the loudspeaker output to zero. A corresponding ramp-on at the end of the transient may also be provided for fade-in.
Thus, the advantageous AGC provides improved control to aid voice clarity and preclude the amplification of undesirable noises.
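The test-system behavior described above can be illustrated by the following sketch. The values 0.9, 200, 2*TAPS and 0.8 are those given in the text; the class name, the per-sample interface and the particular ramp-on (a mirror image of the fade-out) are assumptions, and the resetting of the SEF 300 and of the adaptive parameters is not modeled.

```python
TAPS = 200                   # ring-down length in samples at Fs = 5 KHz
ADAPT_OFF_COUNT = 2 * TAPS   # 400 samples, i.e. 80 ms at 5 KHz
OUTPUT_DECAY_RATE = 0.8      # per-sample multiplier during fade-out
INIT_MIC_THRESHOLD = 0.9     # init-mic-threshold


class TransientGate:
    """Hypothetical per-sample gate: adaptation hold-off plus output fade."""

    def __init__(self):
        self.hold = 0      # samples of adaptation hold-off remaining
        self.gain = 1.0    # current loudspeaker envelope gain

    def process(self, mic_sample, out_sample):
        """Return (gated output sample, adaptation enabled) for one sample period."""
        if abs(mic_sample) >= INIT_MIC_THRESHOLD:
            # A new transient restarts the hold-off at its full length.
            self.hold = ADAPT_OFF_COUNT
        adapting = self.hold == 0
        if adapting:
            # Assumed ramp-on: the gain approaches 1.0 exponentially.
            self.gain = 1.0 - (1.0 - self.gain) * OUTPUT_DECAY_RATE
        else:
            self.hold -= 1
            # Exponential fade-out avoids the "click" of an abrupt mute.
            self.gain *= OUTPUT_DECAY_RATE
        return out_sample * self.gain, adapting
```

With a multiplier of 0.8 the output falls below 1% of its starting value within about 21 samples, or roughly 4 ms at 5 KHz.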
As mentioned above in connection with Fig. 17, an input from a user's
manual volume control is used in performing the automatic gain control. A further
aspect of the present invention is directed to an improved user interface installed in the cabin for improving the ease and flexibility of the CCS.
In particular, while the CCS is intended to incorporate sufficient
automatic control to operate satisfactorily once the initial settings are made, it is of course desirable to incorporate various manual controls to be operated by the driver and passengers to customize its operation. In this aspect of the present invention, the
user interface enables customized use of the plural microphones and loudspeakers.
While the user interface of the present invention may be used with many different
cabin communication systems, its use is enhanced through the superior processing of
the CCS employing the SEF 300 and the AEC 400, wherein superior microphone independence, echo cancellation and noise elimination are provided.
As shown in Fig. 2, the CCS of the present invention provides plural
microphones including, for example, one directed to pick up speech from the driver's
seat and one each to pick up speech at each passenger seat. Similarly, the CCS may provide a respective loudspeaker for each of the driver's seat and the passengers' seats to provide an output directed to the person in the seat. Accordingly, since the
sound pickup and the sound output can be directed without uncomfortable echoes, it is
possible, for example, for the driver to have a reasonably private conversation with a
passenger in the rear left seat (or any other selected passenger or passengers) by
muting all the microphones and loudspeakers other than the ones at the driver's seat
and the rear left seat. The advantageous user interface of the present invention enables such an operation.
Other useful operations are also enabled by the advantageous user
interface for facilitating communication. For example, the volumes of the various
loudspeakers may be adjusted, or the pickup of a microphone may be reduced to give the occupant of the respective seat more privacy. Similarly, the pickup of one microphone might be supplied for output to only a selected one or more of the
loudspeakers, while the pickup of another microphone might go to other loudspeakers.
In a different type of operation, a recorder may be actuated from the various seats to
record and play back a voice memo so that, for example, one passenger may record a
draft of a memo at one time and the same or another passenger can play it back at another time to recall the contents or revise them. As another example, one or more of the cabin's occupants can participate in a hands-free telephone call without
bothering the other occupants, or even several hands-free telephone calls can take
place without interference.
Fig. 26 illustrates the overall structure of the user interface in accordance with the present invention. As shown therein, each position within the cabin can have its own subsidiary interface, with the subsidiary interfaces being
connected to form the overall interface. Thus, in Fig. 26, the overall interface 900 includes a front interface
910, a rear interface 930 and a middle interface 950. Depending on the size of the
cabin and the number of seats, of course, more middle interfaces may be provided, or each of the front, middle and rear interfaces may be formed as respective left and right
interfaces.
The front interface 910 includes a manual control 912 for recording a
voice memo, a manual control 914 for playing back the voice memo, a manual control
916 for talking from the front of the cabin to the rear of the cabin, a manual control
918 for talking from the rear to the front, a manual control 920 for controlling the volume from the rear to the front, and a manual control 922 for participating in a
hands-free telephone call. Manual controls corresponding to controls 916, 918 and
920 (not shown) for communicating with the middle interface 950 are also provided. The rear interface 930 correspondingly includes a manual control 932 for recording a voice memo, a manual control 934 for playing back the voice memo, a manual control 936 for talking from the rear of the cabin to the front of the cabin, a
manual control 938 for talking from the front to the rear, a manual control 940 for
controlling the volume from the front to the rear, and a manual control 942 for
participating in a hands-free telephone call. Manual controls corresponding to controls 936, 938 and 940 (not shown) for communicating with the middle interface
950 are also provided.
The middle interface 950 has a corresponding construction, as do any
other middle, left or right interfaces.
The incorporation of the user interface 900 in the CCS is illustrated in Fig. 27, wherein the elements of the user interface are contained in box 960 (labeled "K1"), box 962 (labeled "K2") and box 964 (labeled "Voice Memo"). The structure
and connections may advantageously be entirely symmetric for any number of users.
In a two input, two output vehicle system, such as the one in Fig. 3 and the one in Fig.
27, the structure is symmetric from front to back and from back to front. In a
preferred embodiment, this symmetry holds for any number of inputs and outputs. It
is possible, however, to provide any number of user interfaces with different functions available to each.
Since the basic user interface is symmetric, it will be described in terms
of K1 960 and the upper half of Voice Memo 964. The interior structure 1000 of K1
960 and the upper half of Voice Memo 964 is illustrated in Fig. 28, and it will be
understood that the interior structure of K2 962 and the lower half of Voice Memo 964 are symmetrically identical thereto.
As shown in Fig. 27, the output of the Wiener SEF W1 966
(constructed as the SEF 300) is connected to K1 960. More specifically, as shown in
Fig. 28, this output is fed to an amplifier 1002 with a fixed gain K1. The output of amplifier 1002 is connected to a summer 1004 under the control of a user interface three-way switch 1006. This switch 1006 allows or disallows connection of voice
from the front to the rear via front user interface switch control 918. Similarly, rear
user interface switch control 936 allows or disallows connection of voice from front to rear. The most recently operated switch control has precedence in allowing or
disallowing connection.
The output of the summer 1004 is connected to the volume control
920, which is in the form of a variable amplifier for effecting volume control for a user in the rear position. This volume control 920 is limited by a gain limiter 1010 to
preclude inadvertent excessive volume.
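The signal path from the amplifier 1002 through the switch 1006 and the summer 1004 to the limited volume control 920 might be sketched as follows. The class name, the default gain values and the clamping form of the gain limiter 1010 are assumptions made for this sketch.

```python
class FrontToRearPath:
    """Hypothetical model of the front-to-rear voice path of Fig. 28."""

    def __init__(self, k1=1.0, max_volume=2.0):
        self.k1 = k1                  # fixed gain of amplifier 1002 (value assumed)
        self.max_volume = max_volume  # ceiling of gain limiter 1010 (value assumed)
        self.connected = True         # state of three-way switch 1006
        self.volume = 1.0             # setting of volume control 920

    def set_connection(self, allow):
        # Called from either seat's switch control; the most recent call wins.
        self.connected = bool(allow)

    def set_volume(self, requested):
        # The limiter clamps the requested gain to preclude excessive volume.
        self.volume = min(max(requested, 0.0), self.max_volume)

    def process(self, filtered_voice, other_inputs=0.0):
        # Amplifier 1002, switch 1006, summer 1004, then volume control 920.
        routed = self.k1 * filtered_voice if self.connected else 0.0
        return self.volume * (routed + other_inputs)
```

Because both switch controls call the same `set_connection` method, the precedence of the most recently operated control follows directly from the order of the calls.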
The output of the amplifier 1002 may also be sent to a cell phone via
control 922. When activated, an amplified and noise filtered voice from the front microphone is sent to the cell phone for transmission to a remote receiver. Incoming
cell phone signals may be routed to the rear via control 942. In a preferred
embodiment, these are separate switches which, with their symmetric counterparts,
allow any microphone signal to be sent to the cell phone and any incoming cell phone
signal to be routed to any of the loudspeakers. It is possible, however, to make these switches three-way switches, with the most recently operated switch having
precedence in allowing or disallowing connection.
The Voice Memo function consists of user interface controls, control
logic 1012 and a voice storage device 1014. In a preferred embodiment, the voice storage device 1014 is a digital random access memory (RAM). However, any
sequential access or random access device capable of digital or analog storage will
suffice. In particular, Flash Electrically Erasable Programmable Read Only Memory
(EEPROM) or ferro-electric digital memory devices may be used if preservation of the stored voice is desired in the event of a power loss.
The voice storage control logic 1012 operates under user interface
controls to record, using for example control 912, and playback, using for example
control 934, a voice message stored in the voice storage device 1014. In a preferred embodiment, the activation of control 912 stores the current digital voice sample from the front microphone in the voice storage device at an address specified by an address
counter, increments the address counter and checks whether any storage remains unused. The activation of the playback control 934 resets the address counter, reads the voice sample at the counter's address for output via a summer 1016 to the rear
loudspeaker, increments the address counter and checks for more voice samples
remaining. The voice storage logic 1012 allows the storage of logically separate
samples by maintaining separate start and ending addresses for the different messages.
The symmetric controls (not shown) allow any user to record and playback from his own location.
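The record and playback operations of the voice storage logic 1012 can be illustrated by the following sketch, in which a Python list stands in for the voice storage device 1014. The class and method names are hypothetical, and details such as erasing messages are omitted.

```python
class VoiceMemoStore:
    """Hypothetical address-counter model of the voice storage logic."""

    def __init__(self, capacity):
        self.memory = [0.0] * capacity   # stands in for voice storage device 1014
        self.write_addr = 0              # record address counter
        self.messages = []               # (start, end) pairs, end exclusive
        self._start = None

    def start_recording(self):
        self._start = self.write_addr

    def record_sample(self, sample):
        """Store one voice sample; return True while unused storage remains."""
        if self.write_addr >= len(self.memory):
            return False
        self.memory[self.write_addr] = sample
        self.write_addr += 1
        return self.write_addr < len(self.memory)

    def stop_recording(self):
        """Close the current message and return its index."""
        if self._start is None:
            raise RuntimeError("no recording in progress")
        self.messages.append((self._start, self.write_addr))
        self._start = None
        return len(self.messages) - 1

    def play(self, index):
        """Yield the samples of one message from its start to its ending address."""
        start, end = self.messages[index]
        addr = start                     # playback resets the address counter
        while addr < end:
            yield self.memory[addr]
            addr += 1

    def remaining(self):
        return len(self.memory) - self.write_addr
```

Keeping a start and ending address per message is what allows logically separate memos to share one storage device.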
The voice storage logic 1012 may also provide feedback to the user of
the number of stored messages, their duration, the remaining storage capacity while
recording and other information.
Although the invention has been shown and described with respect to
exemplary embodiments thereof, it should be understood by those skilled in the art
that the description is exemplary rather than limiting in nature, and that many changes,
additions and omissions are possible without departing from the scope and spirit of the present invention, which should be determined from the following claims.

Claims

1. A cabin communication system for improving clarity of a voice
spoken within an interior cabin having ambient noise, said cabin communication
system comprising:
a microphone for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into an audio signal, the audio signal having a first component corresponding to the spoken voice and a second
component corresponding to the ambient noise;
a speech enhancement filter for removing the second component from
the audio signal to provide a filtered audio signal, said speech enhancement filter removing the second component by processing the audio signal by a method taking
into account elements of psycho-acoustics of a human ear; and a loudspeaker for outputting a clarified voice in response to the filtered
audio signal.
2. The cabin communication system of claim 1, wherein one of
the elements of psycho-acoustics taken into account is that the human ear perceives
sound at different frequencies on a non-linear mel-scale.
3. The cabin communication system of claim 2, wherein said speech enhancement filter takes the one element into account by smoothing a spectrum of the audio signal over larger windows at higher frequencies.
4. The cabin communication system of claim 1, wherein one of
the elements of psycho-acoustics taken into account is that speech is anti-causal and
noise is causal.
5. The cabin communication system of claim 4, wherein said speech enhancement filter takes the one element into account by filtering the audio signal with a causal filter.
6. The cabin communication system of claim 5, wherein said causal filter is a causal Wiener filter.
7. The cabin communication system of claim 6, wherein said
causal Wiener filter takes a causal part of a weighted least squares Wiener calculation in which each weight is inversely proportional to an energy in a respective frequency bin.
8. The cabin communication system of claim 1, wherein said
speech enhancement filter uses temporal smoothing of a Wiener filter calculation.
9. The cabin communication system of claim 1, wherein said
speech enhancement filter uses frequency smoothing of a Wiener filter calculation.
10. A cabin communication system for improving clarity of a voice
spoken within an interior cabin having ambient noise, said cabin communication
system comprising: a microphone for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into an audio signal, the audio
signal having a first component corresponding to the spoken voice and a second
component corresponding to the ambient noise;
a speech enhancement filter for removing the second component from the audio signal to provide a filtered audio signal; and a loudspeaker for outputting a clarified voice in response to the filtered
audio signal,
wherein said speech enhancement filter comprises:
a first filter element that smooths a spectrum of the audio signal over larger windows at higher frequencies in accordance with a mel-scale to provide a
smoothed audio signal;
a second filter element that filters the smoothed audio signal
with a causal Wiener filter to provide a Wiener filter result; and a third filter element that performs at least one of temporal and
frequency smoothing of the Wiener filter result to provide the filtered audio signal.
11. The cabin communication system of claim 10, wherein said
second filter element provides the Wiener filter result by taking a causal part of a weighted least squares Wiener calculation in which each weight is inversely proportional to an energy in a respective frequency bin.
12. The cabin communication system of claim 11, wherein said
third filter element performs both temporal and frequency smoothing of the Wiener
filter result.
13. A speech enhancement filter for improving clarity of a voice
represented by an audio signal, said speech enhancement filter comprising:
a first filter element that smooths a spectrum of the audio signal over
larger windows at higher frequencies in accordance with a mel-scale to provide a
smoothed audio signal; a second filter element that filters the smoothed audio signal with a
causal Wiener filter to provide a Wiener filter result; and
a third filter element that performs at least one of temporal and frequency smoothing of the Wiener filter result to provide a filtered audio signal corresponding to a clarified version of the spoken voice.
14. The speech enhancement filter of claim 13, wherein said second
filter element provides the Wiener filter result by taking a causal part of a weighted least squares Wiener calculation in which each weight is inversely proportional to an
energy in a respective frequency bin.
15. The speech enhancement filter of claim 14, wherein said third
filter element performs both temporal and frequency smoothing of the Wiener filter result.
16. A movable vehicle cabin having ambient noise, said cabin
comprising: means for causing movement of said cabin, wherein at least a portion of the ambient noise during movement is a result of the movement; and
a cabin communication system for improving clarity of a voice spoken
within an interior of said cabin, wherein said cabin communication system comprises: a microphone for receiving the spoken voice and the ambient noise and
for converting the spoken voice and the ambient noise into an audio signal, the audio
signal having a first component corresponding to the spoken voice and a second
component corresponding to the ambient noise;
a speech enhancement filter for removing the second component from the audio signal to provide a filtered audio signal, said speech enhancement filter removing the second component by processing the audio signal by a method taking
into account elements of psycho-acoustics of a human ear; and
a loudspeaker for outputting a clarified voice in response to the filtered
audio signal.
17. The cabin of claim 16, wherein one of the elements of psycho-acoustics
taken into account is that the human ear perceives sound at different
frequencies on a non-linear mel-scale.
18. The cabin of claim 17, wherein said speech enhancement filter
takes the one element into account by smoothing a spectrum of the audio signal over
larger windows at higher frequencies.
19. The cabin of claim 16, wherein one of the elements of psycho-acoustics
taken into account is that speech is anti-causal and noise is causal.
20. The cabin of claim 19, wherein said speech enhancement filter
takes the one element into account by filtering the audio signal with a causal filter.
21. The cabin of claim 20, wherein said causal filter is a causal Wiener filter.
22. The cabin of claim 21, wherein said causal Wiener filter takes a causal part of a weighted least squares Wiener calculation in which each weight is inversely proportional to an energy in a respective frequency bin.
23. The cabin of claim 16, wherein said speech enhancement filter
uses temporal smoothing of a Wiener filter calculation.
24. The cabin of claim 16, wherein said speech enhancement filter
uses frequency smoothing of a Wiener filter calculation.
25. A movable vehicle cabin having ambient noise, said cabin
comprising: means for causing movement of said cabin, wherein at least a portion
of the ambient noise during movement is a result of the movement; and a cabin communication system for improving clarity of a voice spoken
within an interior of said cabin, wherein said cabin communication system comprises:
a microphone for receiving the spoken voice and the ambient noise and
for converting the spoken voice and the ambient noise into an audio signal, the audio
signal having a first component corresponding to the spoken voice and a second component corresponding to the ambient noise; a speech enhancement filter for removing the second component from
the audio signal to provide a filtered audio signal; and
a loudspeaker for outputting a clarified voice in response to the filtered audio signal,
wherein said speech enhancement filter comprises:
a first filter element that smooths a spectrum of the audio signal
over larger windows at higher frequencies in accordance with a mel-scale to provide a smoothed audio signal; a second filter element that filters the smoothed audio signal
with a causal Wiener filter to provide a Wiener filter result; and
a third filter element that performs at least one of temporal and
frequency smoothing of the Wiener filter result to provide the filtered audio signal.
26. The cabin of claim 25, wherein said second filter element
provides the Wiener filter result by taking a causal part of a weighted least squares
Wiener calculation in which each weight is inversely proportional to an energy in a respective frequency bin.
27. The cabin of claim 25, wherein said third filter element
performs both temporal and frequency smoothing of the Wiener filter result.
28. A cabin communication system for improving clarity of a voice spoken within an interior cabin having ambient noise, said cabin communication system comprising:
a first microphone, positioned at a first location within the cabin, for
receiving the spoken voice and the ambient noise and for converting the spoken voice
into a first audio signal, the first audio signal having a first component corresponding to the ambient noise;
a second microphone, positioned at a second location within the cabin,
for receiving the spoken voice and the ambient noise and for converting the spoken voice into a second audio signal, the second audio signal having a second component corresponding to the ambient noise;
a processor for summing the first and second audio signals to provide a
resultant audio signal that is indicative of a detection location within the cabin relative
to the first and second locations of said first and second microphones; a speech enhancement filter for filtering the resultant audio signal by removing the first and second components to provide a filtered audio signal;
an echo cancellation system receiving the filtered audio signal and
outputting an echo-cancelled audio signal; and a loudspeaker for converting the echo-cancelled audio signal into an
output reproduced voice within the cabin including a third component indicative of the first and second audio signals,
wherein said loudspeaker and said first and second microphones are
acoustically coupled so that the output reproduced voice is fed back from said
loudspeaker to be received by said first and second microphones and converted with
the spoken voice into the first and second audio signals,
wherein said echo cancellation system removes from the filtered audio
signal any portion of the filtered audio signal corresponding to the third component,
and
wherein said speech enhancement filter removes the first and second components by processing the resultant audio signal by a method taking into account
elements of psycho-acoustics of a human ear.
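Illustrative note (not part of the claims): claim 28 recites a processor that sums the two microphone signals so that the result is indicative of a detection location. One common way to do this is delay-and-sum beamforming, sketched below. The integer sample delays and the function name are hypothetical; the claim itself recites only the summing.

```python
def delay_and_sum(mic1, mic2, delay1=0, delay2=0):
    """Average two equally long microphone signals after integer sample delays.

    Delaying the microphone that hears the talker first aligns the two copies
    of the voice, so they add coherently while uncorrelated noise does not.
    """
    n = len(mic1)
    if len(mic2) != n:
        raise ValueError("microphone signals must have the same length")

    def delayed(x, d):
        d = min(max(d, 0), n)  # clamp the delay to the signal length
        return [0.0] * d + list(x[: n - d])

    a, b = delayed(mic1, delay1), delayed(mic2, delay2)
    return [0.5 * (u + v) for u, v in zip(a, b)]
```

Choosing the delays selects the location in the cabin whose sound is reinforced, which is one way the summed signal can be "indicative of a detection location".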
29. The cabin communication system of claim 28, wherein one of the elements of psycho-acoustics taken into account is that the human ear perceives sound at different frequencies on a non-linear mel-scale.
30. The cabin communication system of claim 29, wherein said speech enhancement filter takes the one element into account by smoothing a
spectrum of the resultant audio signal over larger windows at higher frequencies.
31. The cabin communication system of claim 28, wherein one of
the elements of psycho-acoustics taken into account is that speech is anti-causal and noise is causal.
32. The cabin communication system of claim 31, wherein said
speech enhancement filter takes the one element into account by filtering the resultant
audio signal with a causal filter.
33. The cabin communication system of claim 32, wherein said causal filter is a causal Wiener filter.
34. The cabin communication system of claim 33, wherein said
causal Wiener filter takes a causal part of a weighted least squares Wiener calculation
in which each weight is inversely proportional to an energy in a respective frequency
bin.
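Illustrative note (not part of the claims): claims 33 and 34 refer to taking the causal part of a Wiener calculation. The sketch below shows only the "causal part" step in a simplified form: a per-bin gain is transformed to an impulse response, the negative-time taps are zeroed, and the result is transformed back. The subtraction-style gain, the function names and the handling of the Nyquist tap are assumptions. The weighted least squares step of claim 34 is not modelled, and a rigorous causal Wiener design would use spectral factorization instead.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(n^2) discrete Fourier transform, adequate for a short sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Non-causal per-bin gain (noisy - noise) / noisy, clipped at floor."""
    return [max(floor, 1.0 - nz / max(p, 1e-12)) for p, nz in zip(noisy_power, noise_power)]


def causal_part(freq_response):
    """Zero the negative-time taps of the circular impulse response."""
    n = len(freq_response)
    h = dft(freq_response, inverse=True)
    # Taps 0..n//2 are treated as non-negative time; later taps wrap to negative time.
    h_causal = [h[t] if t <= n // 2 else 0.0 for t in range(n)]
    return dft(h_causal)
```

A frequency-flat gain has all its energy in the tap at time zero, so it passes through `causal_part` unchanged, whereas a filter that only looks ahead in time is removed entirely.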
35. The cabin communication system of claim 28, wherein said
speech enhancement filter uses temporal smoothing of a Wiener filter calculation.
36. The cabin communication system of claim 35, wherein said speech enhancement filter uses frequency smoothing of a Wiener filter calculation.
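Illustrative note (not part of the claims): claims 35 and 36 recite temporal and frequency smoothing of the Wiener filter calculation. A minimal sketch follows, assuming the calculation yields one gain per frequency bin per frame. The exponential weight `alpha` and the moving-average `half_width` are hypothetical parameters chosen for illustration.

```python
def smooth_in_time(prev_gain, new_gain, alpha=0.8):
    """Exponentially average the per-bin gain across successive frames."""
    return [alpha * p + (1.0 - alpha) * g for p, g in zip(prev_gain, new_gain)]


def smooth_in_frequency(gain, half_width=1):
    """Moving average of the gain across neighbouring frequency bins."""
    n = len(gain)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(gain[lo:hi]) / (hi - lo))
    return out
```

Applying both in sequence limits frame-to-frame and bin-to-bin jumps in the gain, which are a common source of audible artifacts in spectral noise suppression.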
37. A cabin communication system for improving clarity of a voice
spoken within an interior cabin having ambient noise, said cabin communication
system comprising: a first microphone, positioned at a first location within the cabin, for
receiving the spoken voice and the ambient noise and for converting the spoken voice into a first audio signal, the first audio signal having a first component corresponding
to the ambient noise;
a second microphone, positioned at a second location within the cabin,
for receiving the spoken voice and the ambient noise and for converting the spoken
voice into a second audio signal, the second audio signal having a second component corresponding to the ambient noise;
a processor for summing the first and second audio signals to provide a resultant audio signal that is indicative of a detection location within the cabin relative
to the first and second locations of said first and second microphones;
a speech enhancement filter for filtering the resultant audio signal by removing the first and second components to provide a filtered audio signal;
an echo cancellation system receiving the filtered audio signal and
outputting an echo-cancelled audio signal; and
a loudspeaker for converting the echo-cancelled audio signal into an output reproduced voice within the cabin including a third component indicative of the first and second audio signals,
wherein said loudspeaker and said first and second microphones are
acoustically coupled so that the output reproduced voice is fed back from said
loudspeaker to be received by said first and second microphones and converted with the spoken voice into the first and second audio signals,
wherein said echo cancellation system removes from the filtered audio signal any portion of the filtered audio signal corresponding to the third component,
and
wherein said speech enhancement filter comprises: a first filter element that smooths a spectrum of the resultant
audio signal over larger windows at higher frequencies in accordance with a mel-scale
to provide a smoothed audio signal;
a second filter element that filters the smoothed audio signal with a causal Wiener filter to provide a Wiener filter result; and
a third filter element that performs at least one of temporal and frequency smoothing of the Wiener filter result to provide the filtered audio signal.
38. The cabin communication system of claim 37, wherein said second filter element provides the Wiener filter result by taking a causal part of a weighted least squares Wiener calculation in which each weight is inversely
proportional to an energy in a respective frequency bin.
39. The cabin communication system of claim 38, wherein said third filter element performs both temporal smoothing and frequency smoothing of the
Wiener filter result.
40. A movable vehicle cabin having ambient noise, said cabin
comprising: means for causing movement of said cabin, wherein at least a portion
of the ambient noise during movement is a result of the movement; and
a cabin communication system for improving clarity of a voice spoken within an interior of said cabin, said cabin communication system comprising:
a first microphone, positioned at a first location within the cabin, for
receiving the spoken voice and the ambient noise and for converting the spoken voice
into a first audio signal, the first audio signal having a first component corresponding
to the ambient noise; a second microphone, positioned at a second location within the cabin,
for receiving the spoken voice and the ambient noise and for converting the spoken
voice into a second audio signal, the second audio signal having a second component
corresponding to the ambient noise; a processor for summing the first and second audio signals to provide a resultant audio signal that is indicative of a detection location within the cabin relative
to the first and second locations of said first and second microphones;
a speech enhancement filter for filtering the resultant audio signal by
removing the first and second components to provide a filtered audio signal; an echo cancellation system receiving the filtered audio signal and outputting an echo-cancelled audio signal; and
a loudspeaker for converting the echo-cancelled audio signal into an
output reproduced voice within the cabin including a third component indicative of
the first and second audio signals, wherein said loudspeaker and said first and second microphones are
acoustically coupled so that the output reproduced voice is fed back from said
loudspeaker to be received by said first and second microphones and converted with
the spoken voice into the first and second audio signals, wherein said echo cancellation system removes from the filtered audio
signal any portion of the filtered audio signal corresponding to the third component,
and
wherein said speech enhancement filter removes the first and second
components by processing the resultant audio signal by a method taking into account
elements of psycho-acoustics of a human ear.
41. The cabin of claim 40, wherein one of the elements of psycho-acoustics
taken into account is that the human ear perceives sound at different frequencies on a non-linear mel-scale.
42. The cabin of claim 29, wherein said speech enhancement filter takes the one element into account by smoothing a spectrum of the resultant audio signal over larger windows at higher frequencies.
43. The cabin of claim 40, wherein one of the elements of psycho-acoustics
taken into account is that speech is anti-causal and noise is causal.
44. The cabin of claim 43, wherein said speech enhancement filter
takes the one element into account by filtering the resultant audio signal with a causal
filter.
45. The cabin of claim 44, wherein said causal filter is a causal
Wiener filter.
46. The cabin of claim 45, wherein said causal Wiener filter takes a
causal part of a weighted least squares Wiener calculation in which each weight is inversely proportional to an energy in a respective frequency bin.
47. The cabin of claim 40, wherein said speech enhancement filter
uses temporal smoothing of a Wiener filter calculation.
48. The cabin of claim 40, wherein said speech enhancement filter
uses frequency smoothing of a Wiener filter calculation.
49. A movable vehicle cabin having ambient noise, said cabin
comprising: means for causing movement of said cabin, wherein at least a portion
of the ambient noise during movement is a result of the movement; and
a cabin communication system for improving clarity of a voice spoken within an interior of said cabin, said cabin communication system comprising:
a first microphone, positioned at a first location within the cabin, for
receiving the spoken voice and the ambient noise and for converting the spoken voice
into a first audio signal, the first audio signal having a first component corresponding to the ambient noise; a second microphone, positioned at a second location within the cabin,
for receiving the spoken voice and the ambient noise and for converting the spoken voice into a second audio signal, the second audio signal having a second component corresponding to the ambient noise;
a processor for summing the first and second audio signals to provide a
resultant audio signal that is indicative of a detection location within the cabin relative
to the first and second locations of said first and second microphones;
a speech enhancement filter for filtering the resultant audio signal by removing the first and second components to provide a filtered audio signal;
an echo cancellation system receiving the filtered audio signal and
outputting an echo-cancelled audio signal; and
a loudspeaker for converting the echo-cancelled audio signal into an output reproduced voice within the cabin including a third component indicative of the first and second audio signals,
wherein said loudspeaker and said first and second microphones are
acoustically coupled so that the output reproduced voice is fed back from said
loudspeaker to be received by said first and second microphones and converted with the spoken voice into the first and second audio signals,
wherein said echo cancellation system removes from the filtered audio
signal any portion of the filtered audio signal corresponding to the third component, and
wherein said speech enhancement filter comprises: a first filter element that smooths a spectrum of the resultant audio signal over larger windows at higher frequencies in accordance with a mel-scale to provide a smoothed audio signal; a second filter element that filters the smoothed audio signal
with a causal Wiener filter to provide a Wiener filter result; and
a third filter element that performs at least one of temporal and frequency smoothing of the Wiener filter result to provide the filtered audio signal.
50. The cabin of claim 49, wherein said second filter element
provides the Wiener filter result by taking a causal part of a weighted least squares
Wiener calculation in which each weight is inversely proportional to an energy in a respective frequency bin.
51. The cabin of claim 50, wherein said third filter element
performs both temporal smoothing and frequency smoothing of the Wiener filter result.
52. A cabin communication system for improving clarity of a voice
spoken within an interior cabin having ambient noise, said cabin communication
system comprising: an adaptive speech enhancement filter for receiving an audio signal that
includes a first component indicative of the spoken voice, a second component
indicative of a feedback echo of the spoken voice and a third component indicative of
the ambient noise, said speech enhancement filter filtering the audio signal by removing the third component to provide a filtered audio signal, said speech enhancement filter adapting to the audio signal at a first adaptation rate; and an adaptive acoustic echo cancellation system for receiving the filtered audio signal and removing the second component in the filtered audio signal to
provide an echo-cancelled audio signal, said echo cancellation system adapting to the
filtered audio signal at a second adaptation rate,
wherein said first adaptation rate and said second adaptation rate are different from each other so that said speech enhancement filter does not adapt in
response to operation of said echo-cancellation system and said echo-cancellation
system does not adapt in response to operation of said speech enhancement filter.
53. The cabin communication system of claim 52, wherein said first adaptation rate is greater than said second adaptation rate.
54. The cabin communication system of claim 53, wherein said
first adaptation rate of said speech enhancement filter is controlled by a step size β, wherein said second adaptation rate of said echo cancellation system is controlled by a step size μ, and wherein β is much less than μ.
55. The cabin communication system of claim 54, wherein said audio signal is sampled at a sampling frequency Fs, wherein n is the number of samples of the audio signal accumulated for block processing by said speech
enhancement filter, wherein said echo cancellation system includes a plurality of
filters and a variable 1/k is the fraction of said plurality of filters that are updated each sampling period, and wherein:
$\beta \ll \frac{\mu}{k} \ll \frac{F_s}{n}$.
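Illustrative note (not part of the claims): reading the relation of claim 55 as β ≪ μ/k ≪ Fs/n, which is consistent with claims 54 and 56, a design could check that its parameters keep the three rates well apart. The sketch below is such a check. The function name and the `margin` factor that stands in for "much less than" are assumptions, as are the example values in the usage note.

```python
def rates_are_separated(beta, mu, k, fs, n, margin=10.0):
    """Return True when beta << mu / k << fs / n, using `margin` as the gap.

    beta: step size of the speech enhancement filter's adaptation
    mu:   step size of the echo canceller's adaptation
    k:    1/k of the echo canceller's filters are updated per sampling period
    fs:   sampling frequency; n: samples accumulated per processing block
    """
    echo_rate = mu / k
    block_rate = fs / n
    return beta * margin <= echo_rate and echo_rate * margin <= block_rate
```

For example, `rates_are_separated(1e-4, 0.1, 4, 16000, 256)` holds with the default margin, while raising `beta` to 0.01 violates the first inequality.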
56. The cabin communication system of claim 52, wherein said
first adaptation rate is an adaptation rate of a long term noise estimate by said speech
enhancement filter, said first adaptation rate being much smaller than said second
adaptation rate, and said second adaptation rate being much smaller than a basic filter
rate of said speech enhancement filter.
57. The cabin communication system of claim 52, further
comprising random noise adding means for adding random noise to the filtered audio
signal, said echo cancellation system using the filtered audio signal with the random
noise added thereto to identify the second component.
58. The cabin communication system of claim 57, wherein the
random noise is a dither signal.
59. The cabin communication system of claim 58, wherein the cabin is movable at a variable velocity and the dither signal is scaled to the velocity.
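Illustrative note (not part of the claims): claims 57 to 59 add a dither signal, scaled to the cabin's velocity, to the filtered audio signal so that the echo canceller can identify the echo component. The sketch below assumes uniform random dither whose amplitude grows linearly with velocity. The linear law, the `gain_per_unit_velocity` constant and the function name are hypothetical.

```python
import random


def add_scaled_dither(signal, velocity, gain_per_unit_velocity=1e-4, rng=None):
    """Add uniform random noise whose peak amplitude is proportional to velocity."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    amplitude = gain_per_unit_velocity * max(velocity, 0.0)
    return [s + rng.uniform(-amplitude, amplitude) for s in signal]
```

At zero velocity the signal is returned unchanged; at higher velocity the added excitation is larger.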
60. A cabin communication system for improving clarity of a voice
spoken within an interior cabin having ambient noise, said cabin communication system comprising: an adaptive speech enhancement filter for receiving an audio signal that
includes a first component indicative of the spoken voice, a second component indicative of a feedback echo of the spoken voice and a third component indicative of
the ambient noise, said speech enhancement filter filtering the audio signal by
removing the third component to provide a filtered audio signal; and
an adaptive acoustic echo cancellation system for receiving the filtered audio signal and removing the second component in the filtered audio signal to provide an echo-cancelled audio signal,
wherein said speech enhancement filter and said echo cancellation
system are coupled, and
wherein said cabin communication system performs a coupled on-line identification of noise and echoes in the audio signal to effect closed loop control of the
adaptations of said speech enhancement filter and said echo cancellation system.
61. The cabin communication system of claim 60, wherein said
speech enhancement filter adapts to the audio signal at a first adaptation rate and said echo cancellation system adapts to the filtered audio signal at a second adaptation rate, and wherein said first adaptation rate and said second adaptation rate are different
from each other so that said speech enhancement filter does not adapt in response to
operation of said echo-cancellation system and said echo-cancellation system does not adapt in response to operation of said speech enhancement filter.
62. The cabin communication system of claim 61, wherein said
first adaptation rate is greater than said second adaptation rate.
63. The cabin communication system of claim 62, wherein said
first adaptation rate of said speech enhancement filter is controlled by a step size β,
wherein said second adaptation rate of said echo cancellation system is controlled by a
step size μ, and wherein β is much less than μ.
64. The cabin communication system of claim 63, wherein said
audio signal is sampled at a sampling frequency Fs, wherein n is the number of samples of the audio signal accumulated for block processing by said speech
enhancement filter, wherein said echo cancellation system includes a plurality of
filters and a variable 1/k is the fraction of said plurality of filters that are updated each sampling period, and wherein:
$\beta \ll \frac{\mu}{k} \ll \frac{F_s}{n}$.
65. The cabin communication system of claim 61, wherein said
first adaptation rate is an adaptation rate of a long term noise estimate by said speech enhancement filter, said first adaptation rate being much smaller than said second
adaptation rate, and said second adaptation rate being much smaller than a basic filter rate of said speech enhancement filter.
66. The cabin communication system of claim 60, further comprising random noise adding means for adding random noise to the filtered audio signal, said echo cancellation system using the filtered audio signal with the random
noise added thereto to identify the second component.
67. The cabin communication system of claim 66, wherein the
random noise is a dither signal.
68. The cabin communication system of claim 67, wherein the cabin is movable at a variable velocity and the dither signal is scaled to the velocity.
69. A cabin communication system for improving clarity of a voice
spoken within an interior cabin having ambient noise, said cabin communication
system comprising: a microphone for receiving the spoken voice and the ambient noise and
for converting the spoken voice and the ambient noise into a first audio signal, the
first audio signal having a first component corresponding to the spoken voice and a second component corresponding to the ambient noise; an adaptive speech enhancement filter for filtering the first audio signal by removing the second component to provide a filtered audio signal, said speech
enhancement filter adapting to the first audio signal at a first adaptation rate;
an adaptive acoustic echo cancellation system for receiving the filtered audio signal and providing an echo-cancelled audio signal, said echo cancellation system adapting to the filtered audio signal at a second adaptation rate; and
a loudspeaker for converting the echo-cancelled audio signal into an
output reproduced voice within the cabin including a third component indicative of the first audio signal, wherein said loudspeaker and said microphone are acoustically coupled
so that the output reproduced voice is fed back from said loudspeaker to be received
by said microphone and converted with the spoken voice into the first audio signal,
wherein said echo cancellation system removes from the filtered audio
signal any portion of the filtered audio signal corresponding to the third component, and wherein said first adaptation rate and said second adaptation rate are
different from each other so that said speech enhancement filter does not adapt in
response to operation of said echo-cancellation system and said echo-cancellation
system does not adapt in response to operation of said speech enhancement filter.
70. The cabin communication system of claim 69, wherein said
first adaptation rate is greater than said second adaptation rate.
71. The cabin communication system of claim 70, wherein said first adaptation rate of said speech enhancement filter is controlled by a step size β,
wherein said second adaptation rate of said echo cancellation system is controlled by a
step size μ, and wherein β is much less than μ.
72. The cabin communication system of claim 71, wherein said
first audio signal is sampled at a sampling frequency Fs, wherein n is the number of samples of the first audio signal accumulated for block processing by said speech
enhancement filter, wherein said echo cancellation system includes a plurality of filters and a variable 1/k is the fraction of said plurality of filters that are updated each
sampling period, and wherein:
$\beta \ll \frac{\mu}{k} \ll \frac{F_s}{n}$.
73. The cabin communication system of claim 69, wherein said
first adaptation rate is an adaptation rate of a long term noise estimate by said speech
enhancement filter, said first adaptation rate is much smaller than said second adaptation rate, and said second adaptation rate is much smaller than a basic filter rate
of said speech enhancement filter.
74. The cabin communication system of claim 69, further comprising random noise adding means for adding random noise to the filtered audio signal, said echo cancellation system using the filtered audio signal with the random
noise added thereto to identify the third component.
75. The cabin communication system of claim 74, wherein the random noise is a dither signal.
76. The cabin communication system of claim 75, wherein the
cabin is movable at a variable velocity and the dither signal is scaled to the velocity.
77. A method for improving clarity of a voice spoken within an interior cabin having ambient noise, said method comprising the steps of: adaptively filtering, for speech enhancement, an audio signal that
includes a first component indicative of the spoken voice, a second component
indicative of a feedback echo of the spoken voice and a third component indicative of
the ambient noise, said filtering step removing the third component to provide a filtered audio signal, said filtering step adapting to the audio signal at a first adaptation rate; and
adaptively processing the filtered audio signal to remove the second
component by acoustic echo cancellation to provide an echo-cancelled audio signal,
said processing step adapting to the filtered audio signal at a second adaptation rate,
wherein said first adaptation rate and said second adaptation rate are
different from each other so that said filtering step does not adapt in response to
operation of said processing step and said processing step does not adapt in response to operation of said filtering step.
78. The method of claim 77, wherein said first adaptation rate is
greater than said second adaptation rate.
79. The method of claim 78, wherein said first adaptation rate of said filtering step is controlled by a step size β, wherein said second adaptation rate of
said processing step is controlled by a step size μ, and wherein β is much less than μ.
80. The method of claim 79, wherein the first audio signal is sampled at a sampling frequency Fs, wherein n is the number of samples of the first audio signal accumulated for block processing by said speech enhancement filter, wherein said processing step uses a plurality of filters and a variable 1/k is the fraction of the plurality of filters that are updated each sampling period, and wherein:
$\beta \ll \frac{\mu}{k} \ll \frac{F_s}{n}$.
81. The method of claim 77, wherein said first adaptation rate is an adaptation rate of a long term noise estimate by said filtering step, said first adaptation
rate being much smaller than said second adaptation rate, and said second adaptation rate being much smaller than a basic filter rate of said filtering step.
82. The method of claim 77, further comprising the step of adding random noise to the filtered audio signal, said processing step using the filtered audio signal with the random noise added thereto to identify the second component.
83. The method of claim 82, wherein the random noise is a dither
signal.
84. The method of claim 83, wherein the cabin is movable at a
variable velocity and the dither signal is scaled to the velocity.
85. An adaptive acoustic echo cancellation system for use in a cabin communication system for improving clarity of a voice spoken within an interior cabin, the cabin communication system including a microphone for receiving
the spoken voice and for converting the spoken voice and the ambient noise into a first
audio signal, the first audio signal having a first component corresponding to the spoken voice, the cabin communication system further including a loudspeaker for
outputting a second audio signal within the cabin, wherein the loudspeaker and the
microphone are acoustically coupled so that the second audio signal is fed back from the loudspeaker to be received by the microphone and converted with the spoken
voice into the first audio signal so that the second audio signal includes a second
component indicative of the first component, said echo cancellation system
comprising:
calculation means for adaptively calculating a first plurality of current impulse response coefficients of an acoustic transfer function between an output of the
loudspeaker and an input of the microphone based upon an error signal and a second
plurality of prior impulse response coefficients of the acoustic transfer function;
acoustic echo filter means for applying the first plurality of current impulse response coefficients to the first audio signal to remove from the first audio signal any portion of the first audio signal corresponding to the second component and
to provide an echo-cancelled audio signal, the loudspeaker converting the echo-
cancelled audio signal into the second audio signal; and
error signal calculation means for calculating the error signal by
calculating a difference between the first audio signal and the echo-cancelled audio signal.
86. The system of claim 85, wherein said acoustic echo filter means comprises a Least Mean Squares filter.
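Illustrative note (not part of the claims): claims 85 and 86 describe adaptively calculating current impulse response coefficients from an error signal and the prior coefficients, using a Least Mean Squares filter. The sketch below is the textbook LMS update applied to echo cancellation, not necessarily the exact signal routing recited in claim 85. The tap count, step size `mu`, function name and simulated echo path are assumptions.

```python
import random


def lms_echo_canceller(reference, microphone, num_taps=4, mu=0.05):
    """Estimate the loudspeaker-to-microphone impulse response with LMS.

    reference:  samples sent to the loudspeaker
    microphone: samples picked up by the microphone (voice plus echo)
    Returns (coefficients, echo_cancelled_signal).
    """
    w = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    cleaned = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_estimate  # error signal, also the echo-cancelled output
        # Current coefficients = prior coefficients + step size * error * input.
        w = [wi + mu * e * xi for wi, xi in zip(w, history)]
        cleaned.append(e)
    return w, cleaned


# Usage: recover a hypothetical 4-tap echo path from white-noise excitation.
rng = random.Random(1)
true_path = [0.5, -0.3, 0.2, 0.1]
ref = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
mic = [sum(true_path[j] * ref[i - j] for j in range(4) if i - j >= 0)
       for i in range(len(ref))]
w_est, residual = lms_echo_canceller(ref, mic)
```

In this noise-free simulation the estimated coefficients approach the simulated echo path and the residual echo decays toward zero. With a talker present, the voice would remain in the error signal.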
87. The system of claim 85, wherein a number of the first plurality of current impulse response coefficients is limited by a delay in computing the second
audio signal.
88. The system of claim 85, further comprising a speech
enhancement filter for filtering the first audio signal prior to the first audio signal being supplied as a filtered audio signal to said acoustic echo filter means, and
wherein a number of the first plurality of current impulse response coefficients is
limited by a sum of a delay in computing the second audio signal and a delay by said speech enhancement filter in filtering the first audio signal.
89. The system of claim 85, further comprising a speech
enhancement filter for filtering the first audio signal prior to the first audio signal
being supplied as a filtered audio signal to said acoustic echo filter means, wherein the cabin has an acoustic delay in transferring the second audio signal from the loudspeaker to the microphone, and wherein a number of the first plurality of current
impulse response coefficients is limited by a sum of a delay in computing the second
audio signal, a delay by said speech enhancement filter in filtering the first audio
signal and the acoustic delay.
90. A method for adaptive acoustic echo cancellation for use in a
cabin communication system for improving clarity of a voice spoken within an
interior cabin, the cabin communication system including a microphone for receiving the spoken voice and for converting the spoken voice and the ambient noise into a first audio signal, the first audio signal having a first component corresponding to the
spoken voice, the cabin communication system further including a loudspeaker for
outputting a second audio signal within the cabin, wherein the loudspeaker and the
microphone are acoustically coupled so that the second audio signal is fed back from
the loudspeaker to be received by the microphone and converted with the spoken
voice into the first audio signal so that the second audio signal includes a second
component indicative of the first component, said method comprising the steps of: adaptively calculating a first plurality of current impulse response coefficients of an acoustic transfer function between an output of the loudspeaker and
an input of the microphone based upon an error signal and a second plurality of prior
impulse response coefficients of the acoustic transfer function; applying the first plurality of current impulse response coefficients to
the first audio signal for acoustic echo cancellation to remove from the first audio signal any portion of the first audio signal corresponding to the second component and
to provide an echo-cancelled audio signal, the loudspeaker converting the echo-
cancelled audio signal into the second audio signal; and
calculating the error signal by calculating a difference between the first audio signal and the echo-cancelled audio signal.
91. The method of claim 90, wherein said applying step comprises
a Least Mean Squares filtering step.
92. The method of claim 90, wherein a number of the first plurality
of current impulse response coefficients is limited by a delay in computing the second
audio signal.
93. The method of claim 90, further comprising a speech enhancement filtering step of filtering the first audio signal prior to the first audio
signal being supplied as a filtered audio signal to said applying step, and wherein a
number of the first plurality of current impulse response coefficients is limited by a
sum of a delay in computing the second audio signal and a delay by said speech
enhancement filtering step in filtering the first audio signal.
94. The method of claim 90, further comprising a speech
enhancement filtering step of filtering the first audio signal prior to the first audio
signal being supplied as a filtered audio signal to said applying step, wherein the cabin has an acoustic delay in transferring the second audio signal from the loudspeaker to the microphone, and wherein a number of the first plurality of current impulse
response coefficients is limited by a sum of a delay in computing the second audio
signal, a delay by said speech enhancement filtering step in filtering the first audio
signal and the acoustic delay.
95. A movable vehicle cabin having ambient noise, said cabin comprising: means for causing movement of said cabin, wherein at least a portion of the ambient noise during movement is a result of the movement; and a cabin communication system for improving clarity of a voice spoken
within an interior of said cabin, wherein said cabin communication system comprises:
a microphone for receiving the spoken voice and the ambient noise and
for converting the spoken voice and the ambient noise into a first audio signal, the
first audio signal having a first component corresponding to the spoken voice and a second component corresponding to the ambient noise;
an adaptive speech enhancement filter for filtering the first audio signal by removing the second component to provide a filtered audio signal, said speech
enhancement filter adapting to the first audio signal at a first adaptation rate;
an adaptive acoustic echo cancellation system for receiving the filtered
audio signal and providing an echo-cancelled audio signal, said echo cancellation system adapting to the filtered audio signal at a second adaptation rate; and
a loudspeaker for converting the echo-cancelled audio signal into an
output reproduced voice within the cabin including a third component indicative of
the first audio signal,
wherein said loudspeaker and said microphone are acoustically coupled so that the output reproduced voice is fed back from said loudspeaker to be received by said microphone and converted with the spoken voice into the first audio signal,
wherein said echo cancellation system removes from the filtered audio
signal any portion of the filtered audio signal corresponding to the third component,
and wherein said first adaptation rate and said second adaptation rate are
different from each other so that said speech enhancement filter does not adapt in response to operation of said echo-cancellation system and said echo-cancellation
system does not adapt in response to operation of said speech enhancement filter.
96. The cabin of claim 95, wherein said first adaptation rate is greater than said second adaptation rate.
97. The cabin of claim 96, wherein said first adaptation rate of said
speech enhancement filter is controlled by a step size β, wherein said second adaptation rate of said echo cancellation system is controlled by a step size μ, and wherein β is much less than μ.
98. The cabin of claim 97, wherein said first audio signal is
sampled at a sampling frequency Fs, wherein n is the number of samples of the first audio signal accumulated for block processing by said speech enhancement filter, wherein said echo cancellation system includes a plurality of filters and a variable 1/k
is the fraction of said plurality of filters that are updated each sampling period, and
wherein:
k '
99. The cabin of claim 95, wherein said first adaptation rate is an
adaptation rate of a long term noise estimate by said speech enhancement filter, said first adaptation rate is much smaller than said second adaptation rate, and said second adaptation rate is much smaller than a basic filter rate of said speech enhancement filter.
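The separation of adaptation rates recited in claims 95 to 99 can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update for the echo canceller with step size mu and a simple exponential tracker with step size beta for the long-term noise estimate; neither algorithm is stated in the claims, and the function names and numeric values are hypothetical.

```python
import numpy as np

def nlms_step(w, x_buf, d, mu, eps=1e-8):
    """One normalized-LMS update of the echo-path estimate w; returns (w, error)."""
    e = d - np.dot(w, x_buf)  # residual after subtracting the estimated echo
    w = w + mu * e * x_buf / (np.dot(x_buf, x_buf) + eps)
    return w, e

def noise_floor_step(noise_est, frame_power, beta):
    """Slow exponential tracking of a long-term noise estimate (assumed form)."""
    return (1.0 - beta) * noise_est + beta * frame_power

def identify_echo_path(h, mu=0.5, n_samples=2000, seed=0):
    """Drive a known echo path h with white noise and adapt w toward it."""
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    x = rng.standard_normal(n_samples)
    w = np.zeros(len(h))
    x_buf = np.zeros(len(h))
    for n in range(n_samples):
        x_buf = np.roll(x_buf, 1)  # shift the loudspeaker history by one sample
        x_buf[0] = x[n]
        d = np.dot(h, x_buf)       # microphone signal: echo only, for illustration
        w, _ = nlms_step(w, x_buf, d, mu)
    return w
```

With beta chosen much smaller than mu (for example beta = 0.001 against mu = 0.5), the noise estimate drifts slowly while the echo-path estimate converges quickly, so neither loop chases the other's corrections.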
100. The cabin of claim 95, further comprising random noise adding
means for adding random noise to the filtered audio signal, said echo cancellation
system using the filtered audio signal with the random noise added thereto to identify
the third component.
101. The cabin of claim 100, wherein the random noise is a dither signal.
102. The cabin of claim 101, wherein the cabin is movable at a
variable velocity and the dither signal is scaled to the velocity.
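Claims 100 to 102 can be sketched as follows, assuming the dither is Gaussian white noise whose level rises linearly with vehicle speed. The linear law, the parameter names and the default levels are illustrative assumptions and do not come from the claims.

```python
import numpy as np

def add_scaled_dither(filtered, speed_kmh, base_level=1e-4,
                      gain_per_kmh=1e-5, rng=None):
    """Add random dither to the filtered signal, scaled to vehicle speed.

    Returns (signal_with_dither, dither). The linear scaling law is an
    assumption made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    filtered = np.asarray(filtered, dtype=float)
    level = base_level + gain_per_kmh * max(speed_kmh, 0.0)
    dither = level * rng.standard_normal(filtered.shape)
    return filtered + dither, dither
```

A faster cabin has more ambient noise to mask the dither, so a larger dither level can help the echo canceller identify the acoustic path without the added noise becoming audible.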
103. An automatic gain control for a cabin communication system
for improving clarity of a voice spoken within a movable interior cabin having
ambient noise, the ambient noise intermittently including an undesirable transient noise, said automatic gain control comprising: a microphone for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into a first audio signal, the
first audio signal including a first component corresponding to the spoken voice and a
second component corresponding to the ambient noise; a parameter estimation processor for receiving the first audio signal
and for determining parameters for deciding whether or not the second component corresponds to an undesirable transient noise;
decision logic for deciding, based on the parameters, whether or not the second component corresponds to an undesirable transient signal; a filter for filtering the first audio signal to provide a filtered audio
signal;
a loudspeaker for outputting a reproduced voice in response to the
filtered audio signal with a variable gain at a second location in the cabin; and a control signal generating circuit for generating an automatic gain
control signal in response to said decision logic,
wherein when said decision logic decides that the second component corresponds to an undesirable transient signal, said control signal generating circuit generates the automatic gain control signal so as to gracefully set the gain of said loudspeaker to zero for fade-out.
104. The automatic gain control of claim 103, wherein the parameters include at least one parameter establishing a threshold.
105. The automatic gain control of claim 104, wherein the first audio
signal is a sampled audio signal and wherein said parameter estimation processor
determines the at least one parameter establishing the threshold based upon a single sample of the first audio signal.
106. The automatic gain control of claim 104, wherein the first audio
signal is a sampled audio signal and wherein said parameter estimation processor determines the at least one parameter establishing the threshold based upon a plurality of samples of the first audio signal.
107. The automatic gain control of claim 103, wherein the parameters include at least one parameter establishing a template.
108. The automatic gain control of claim 107, wherein the first audio
signal is a sampled audio signal and wherein said parameter estimation processor determines the at least one parameter establishing the template based upon a single sample of the first audio signal.
109. The automatic gain control of claim 107, wherein the first audio signal is a sampled audio signal and wherein said parameter estimation processor determines the at least one parameter establishing the template based upon a plurality
of samples of the first audio signal.
110. The automatic gain control of claim 103, wherein said
parameter estimation processor updates the parameters a selected one of continuously, at set time intervals, in response to set conditions and in response to variable
conditions.
111. The automatic gain control of claim 103, wherein after the
automatic gain control signal has set the gain to zero, said control signal generating circuit generates the automatic gain control signal after a predetermined time interval
so as to gracefully increase the gain of said loudspeaker from zero for fade-in.
112. The automatic gain control of claim 111, wherein the
predetermined time interval corresponds to a ring-down time of the cabin.
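The fade-out and fade-in behaviour of claims 103, 111 and 112 can be sketched as a per-sample gain envelope. The linear ramp shape, the function name, and the use of sample counts for the fade length and for the ring-down hold are assumptions for illustration; the claims only require that the gain change gracefully.

```python
def transient_gain_envelope(transient_flags, fade_len, hold_len):
    """Return a per-sample loudspeaker gain in [0, 1].

    transient_flags: iterable of booleans from the decision logic.
    fade_len: samples for a full linear fade (assumed ramp shape).
    hold_len: samples to stay muted after the transient, standing in for
              the cabin ring-down time; must be at least 1.
    """
    step = 1.0 / fade_len
    gain, hold, out = 1.0, 0, []
    for flag in transient_flags:
        if flag:
            hold = hold_len               # (re)arm the ring-down timer
        if hold > 0:
            gain = max(0.0, gain - step)  # fade out, then remain at zero
            if not flag and gain == 0.0:
                hold -= 1                 # count down only while fully muted
        else:
            gain = min(1.0, gain + step)  # fade back in
        out.append(gain)
    return out
```

For a single flagged sample with fade_len=4 and hold_len=2, the gain steps down to zero over four samples, stays at zero for two samples, and then steps back up to one.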
113. An automatic gain control for a cabin communication system
for improving clarity of a voice spoken within a movable interior cabin having
ambient noise, the ambient noise intermittently including an undesirable transient
noise, said automatic gain control comprising: a microphone for receiving the spoken voice and the ambient noise and
for converting the spoken voice and the ambient noise into a first audio signal;
a filter for filtering the first audio signal to provide a filtered audio
signal, the filtered audio signal including a first component corresponding to the spoken voice and a second component corresponding to the ambient noise; a parameter estimation processor for receiving the filtered audio signal
and for determining parameters for deciding whether or not the second component
corresponds to an undesirable transient noise;
decision logic for deciding, based on the parameters, whether or not the second component corresponds to an undesirable transient signal; a loudspeaker for outputting a reproduced voice in response to the
filtered audio signal with a variable gain at a second location in the cabin; and
a control signal generating circuit for generating an automatic gain
control signal in response to said decision logic, wherein when said decision logic decides that the second component corresponds to an undesirable transient signal, said control signal generating circuit generates the automatic gain control signal so as to gracefully set the gain of said
loudspeaker to zero for fade-out.
114. The automatic gain control of claim 113, wherein the parameters include at least one parameter establishing a threshold.
115. The automatic gain control of claim 114, wherein the filtered
audio signal is a sampled audio signal and wherein said parameter estimation
processor determines the at least one parameter establishing the threshold based upon a single sample of the filtered audio signal.
116. The automatic gain control of claim 114, wherein the filtered
audio signal is a sampled audio signal and wherein said parameter estimation processor determines the at least one parameter establishing the threshold based upon a plurality of samples of the filtered audio signal.
117. The automatic gain control of claim 113, wherein the
parameters include at least one parameter establishing a template.
118. The automatic gain control of claim 117, wherein the filtered
audio signal is a sampled audio signal and wherein said parameter estimation processor determines the at least one parameter establishing the template based upon a single sample of the filtered audio signal.
119. The automatic gain control of claim 117, wherein the filtered audio signal is a sampled audio signal and wherein said parameter estimation
processor determines the at least one parameter establishing the template based upon a
plurality of samples of the filtered audio signal.
120. The automatic gain control of claim 113, wherein said parameter estimation processor updates the parameters a selected one of continuously, at set time intervals, in response to set conditions and in response to variable
conditions.
121. The automatic gain control of claim 113, wherein after the automatic gain control signal has set the gain to zero, said control signal generating
circuit generates the automatic gain control signal after a predetermined time interval
so as to gracefully increase the gain of said loudspeaker from zero for fade-in.
122. The automatic gain control of claim 121, wherein the
predetermined time interval corresponds to a ring-down time of the cabin.
123. An automatic gain control method for use in a cabin communication system for improving clarity of a voice spoken within a movable interior cabin having ambient noise, the ambient noise intermittently including an
undesirable transient noise, said method comprising the steps of:
receiving the spoken voice and the ambient noise at a first location in the cabin and for converting the spoken voice and the ambient noise into a first audio signal, the first audio signal including a first component corresponding to the spoken
voice and a second component corresponding to the ambient noise;
determining, in response to the first audio signal, parameters for
deciding whether or not the second component corresponds to an undesirable transient noise;
deciding, based on the parameters, whether or not the second
component corresponds to an undesirable transient signal; filtering the first audio signal to provide a filtered audio signal;
outputting a reproduced voice in response to the filtered audio signal with a variable gain at a second location in the cabin; and
generating an automatic gain control signal in response to said deciding
step, wherein when said deciding step decides that the second component corresponds to an undesirable transient signal, said generating step generates the
automatic gain control signal so as to gracefully set the gain of said outputting step to
zero for fade-out.
124. An automatic gain control method for use in a cabin communication system for improving clarity of a voice spoken within a movable
interior cabin having ambient noise, the ambient noise intermittently including an
undesirable transient noise, said method comprising the steps of: receiving the spoken voice and the ambient noise at a first location in the cabin and for converting the spoken voice and the ambient noise into a first audio
signal; filtering the first audio signal to provide a filtered audio signal, the
filtered audio signal including a first component corresponding to the spoken voice
and a second component corresponding to the ambient noise;
determining parameters for deciding whether or not the second
component corresponds to an undesirable transient noise; deciding, based on the parameters, whether or not the second
component corresponds to an undesirable transient signal;
outputting a reproduced voice in response to the filtered audio signal with a variable gain at a second location in the cabin; and
generating an automatic gain control signal in response to said deciding
step, wherein when said deciding step decides that the second component
corresponds to an undesirable transient signal, said generating step generates the automatic gain control signal so as to gracefully set the gain of said outputting step to
zero for fade-out.
125. A user interface for a cabin communication system for
improving clarity of a voice spoken within an interior cabin having at least first,
second and third seat locations, wherein the cabin communication system includes a first microphone for directionally receiving a first spoken voice from the first seat location and a first loudspeaker for outputting a first reproduced voice at the first seat
location, a second microphone for directionally receiving a second spoken voice from the second seat location and a second loudspeaker for outputting a second reproduced voice at the second seat location, and a third microphone for directionally receiving a third spoken voice from the third seat location and a third loudspeaker for outputting a
third reproduced voice at the third seat location, the cabin communication system
further using acoustic echo cancellation to eliminate feedback echoes between the microphones and the loudspeakers, said user interface comprising:
a first interface section including a first plurality of manual controls
accessible from the first seat location, said first plurality of manual controls including
a first control for connecting the first microphone to a selected one of the second and
third loudspeakers so that the first spoken voice is output as the respective second or
third reproduced voice at the respective second or third seat location, and a second
control for connecting the first loudspeaker to a selected one of the second and third
microphones so that the respective second or third spoken voice at the respective
second or third seat location is output as the first reproduced voice; a second interface section including a second plurality of manual
controls accessible from the second seat location, said second plurality of manual
controls including a third control for connecting the second microphone to a selected
one of the first and third loudspeakers so that the second spoken voice is output as the respective first or third reproduced voice at the respective first or third seat location,
and a fourth control for connecting the second loudspeaker to a selected one of the
first and third microphones so that the respective first or third spoken voice at the
respective first or third seat location is output as the second reproduced voice; and a third interface section including a third plurality of manual controls
accessible from the third seat location, said third plurality of manual controls including a fifth control for connecting the third microphone to a selected one of the
first and second loudspeakers so that the third spoken voice is output as the respective first or second reproduced voice at the respective first or second seat location, and a
sixth control for connecting the third loudspeaker to a selected one of the first or
second microphones so that the respective first or second spoken voice at the
respective first or second seat location is output as the third reproduced voice.
126. The user interface of claim 125, wherein said first control
optionally connects the first microphone to both of the second and third loudspeakers
and said second control optionally connects the first loudspeaker to both of the second and third microphones.
127. The user interface of claim 125, wherein said third control
optionally connects the second microphone to both of the first and third loudspeakers
and said fourth control optionally connects the second loudspeaker to both of the first
and third microphones.
128. The user interface of claim 125, wherein said fifth control
optionally connects the third microphone to both of the first and second loudspeakers and said sixth control optionally connects the third loudspeaker to both of the first and
second microphones.
129. The user interface of claim 125, further comprising: a first three-way switch for making connection between the first microphone and the second loudspeaker, said first switch making the connection or breaking the connection in response to a most recent actuation of said first and fourth
controls; a second three-way switch for making connection between the first
microphone and the third loudspeaker, said second switch making the connection or breaking the connection in response to a most recent actuation of said first and sixth
controls; a third three-way switch for making connection between the second
microphone and the first loudspeaker, said third switch making the connection or
breaking the connection in response to a most recent actuation of said second and
third controls; a fourth three-way switch for making connection between the second microphone and the third loudspeaker, said fourth switch making the connection or
breaking the connection in response to a most recent actuation of said third and sixth
controls; a fifth three-way switch for making connection between the third microphone and the first loudspeaker, said fifth switch making the connection or
breaking the connection in response to a most recent actuation of said second and fifth
controls; and a sixth three-way switch for making connection between the third
microphone and the second loudspeaker, said sixth switch making the connection or breaking the connection in response to a most recent actuation of said fourth and fifth
controls.
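One way to read the six three-way switches of claim 129 is as six directed microphone-to-loudspeaker links, each settable from either end, with the most recent actuation deciding the state. The class below is a hypothetical sketch of that reading; the class and method names are not from the claims.

```python
class CabinRouter:
    """Tracks which microphone is connected to which loudspeaker."""

    def __init__(self, seats=(1, 2, 3)):
        # One link per ordered (microphone seat, loudspeaker seat) pair.
        self.links = {(m, s): False for m in seats for s in seats if m != s}

    def talk_control(self, seat, to_seat, connect):
        """Control at `seat` routing its own microphone to another loudspeaker."""
        self.links[(seat, to_seat)] = connect

    def listen_control(self, seat, from_seat, connect):
        """Control at `seat` routing another microphone to its own loudspeaker."""
        self.links[(from_seat, seat)] = connect

    def is_connected(self, mic_seat, speaker_seat):
        return self.links[(mic_seat, speaker_seat)]
```

For example, the first seat can open the link from its microphone to the second loudspeaker, and the second seat can later break the same link from its own listening control.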
130. The user interface of claim 125, further comprising a voice
storage device for storing voice messages and a voice storage logic device for
controlling access to said voice storage device to record voice messages therein at
accessible locations,
said first interface section including a seventh control for controlling
said voice storage logic device to store in said voice storage device a voice message
received at the first microphone, and an eighth control for controlling said voice storage logic device to retrieve from said voice storage device a recorded voice
message to be output by the first loudspeaker,
said second interface section including a ninth control for controlling
said voice storage logic device to store in said voice storage device a voice message received at the second microphone, and a tenth control for controlling said voice storage logic device to retrieve from said voice storage device a recorded voice
message to be output by the second loudspeaker, and
said third interface section including an eleventh control for controlling
said voice storage logic device to store in said voice storage device a voice message received at the third microphone, and a twelfth control for controlling said voice storage logic device to retrieve from said voice storage device a recorded voice
message to be output by the third loudspeaker.
131. The user interface of claim 130, wherein each of said eighth, tenth and twelfth controls can control said voice storage logic device to retrieve any voice message stored in said voice storage device.
132. The user interface of claim 125, further comprising a wireless
telephone for making a call to a remote location and receiving a call from a remote
location, said first interface section including a seventh control for accessing said telephone for making and placing a call,
said second interface section including an eighth control for accessing
said telephone for making and placing a call, and
said third interface section including a ninth control for accessing said
telephone for making and placing a call.
133. The user interface of claim 132, wherein said seventh, eighth
and ninth controls enable simultaneous access to said wireless telephone for joint
participation in a call.
134. An automatic gain control for a cabin communication system
for improving clarity of a voice spoken within a movable interior cabin having
ambient noise, said automatic gain control comprising: a microphone for receiving the spoken voice and the ambient noise and
for converting the spoken voice and the ambient noise into a first audio signal having a first component corresponding to the spoken voice and a second component
corresponding to the ambient noise; a filter for removing the second component from the first audio signal
to provide a filtered audio signal; an acoustic echo canceller for receiving the filtered audio signal in
accordance with a supplied dither signal and providing an echo-cancelled audio
signal; a control signal generating circuit for generating a first automatic gain
control signal in response to a noise signal that corresponds to a current speed of the
cabin, the first automatic gain control signal controlling a first gain of the dither signal supplied to said filter, said control signal generating circuit also for generating a second automatic gain control signal in response to the noise signal; and
a loudspeaker for outputting a reproduced voice in response to the
echo-cancelled audio signal with a second gain controlled by the second automatic
gain control signal.
135. The automatic gain control of claim 134, wherein the noise
signal is the second component.
136. The automatic gain control of claim 134, further comprising a noise estimator for receiving the second component and at least one additional signal
indicative of noise within the cabin, said noise estimator generating the noise signal in
response to the second component and the at least one additional signal.
137. The automatic gain control of claim 136, wherein the at least one additional signal includes a feed forward signal.
138. The automatic gain control of claim 136, wherein the at least
one additional signal includes a window open signal.
139. The automatic gain control of claim 136, wherein the at least
one additional signal includes a door open signal.
140. The automatic gain control of claim 136, wherein the at least
one additional signal includes an engine RPM signal.
141. The automatic gain control of claim 134, wherein said control
signal generating circuit includes a limiter to limit the noise signal to provide a limited
noise signal and a low pass filter to low pass filter the limited noise signal to provide a
filtered noise signal, the first and second automatic gain control signals being generated from the filtered noise signal.
142. The automatic gain control of claim 141, wherein a response of
said low pass filter is fast enough to track speed changes of the cabin and slow enough not to introduce noise by amplitude modulation of the reproduced voice.
143. The automatic gain control of claim 142, wherein said control
signal generating circuit includes a processor for linearly increasing the filtered noise signal to provide an increased signal as the second automatic gain control signal.
144. The automatic gain control of claim 142, further comprising a manual control for inputting an input signal indicative of a desired loudness of the
reproduced spoken voice, wherein said control signal generating circuit includes a
processor for linearly increasing the filtered noise signal to provide an increased signal
and a modifying circuit that modifies the increased signal in accordance with the input signal to provide a modified signal as the second automatic gain signal.
145. The automatic gain control of claim 144, wherein said
modifying circuit is a multiplier that multiplies the increased signal by the input
signal.
146. The automatic gain control of claim 142, wherein said control
signal generating circuit includes a processor for linearly increasing the filtered noise
signal to provide an increased signal as the first automatic gain control signal.
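The signal chain of claims 141 to 146 (limiter, low pass filter, linear increase, optional multiplication by a manual loudness input) can be sketched as below. The one-pole low pass filter, the affine mapping used for "linearly increasing", and all parameter names are assumptions for illustration.

```python
def agc_gains(noise_samples, lo, hi, alpha, slope, offset, user_volume=1.0):
    """Map a noise signal to a sequence of loudspeaker gains.

    lo, hi:      limiter bounds applied to the noise signal.
    alpha:       one-pole low pass coefficient in (0, 1]; smaller is slower.
    slope, offset: affine map from filtered noise to gain (assumed form).
    user_volume: manual loudness input applied as a multiplier.
    """
    filtered = None
    gains = []
    for n in noise_samples:
        limited = min(max(n, lo), hi)                  # limiter
        if filtered is None:
            filtered = limited                         # initialise the filter state
        else:
            filtered += alpha * (limited - filtered)   # one-pole low pass
        gains.append(user_volume * (offset + slope * filtered))
    return gains
```

Choosing alpha trades tracking speed against smoothness: too large a value lets the gain follow every fluctuation and can modulate the reproduced voice, while too small a value lags behind changes in cabin speed.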
147. An automatic gain control for a cabin communication system
for improving clarity of a voice spoken within a movable interior cabin having
ambient noise, said automatic gain control comprising: a microphone for receiving the spoken voice and the ambient noise and
for converting the spoken voice and the ambient noise into a first audio signal having a first component corresponding to the spoken voice and a second component
corresponding to the ambient noise; a filter for removing the second component from the first audio signal
to separately provide a filtered audio signal and the second component; a noise estimator for receiving the second component and at least one
feed forward signal indicative of noise within the cabin, said noise estimator
generating a noise signal that corresponds to a current speed of the cabin in response to the second component and the at least one additional signal;
a control signal generating circuit for generating an automatic gain
control signal in response to the noise signal; and
a loudspeaker for outputting a reproduced voice in response to the filtered audio signal with a gain controlled by the automatic gain control signal.
148. The automatic gain control of claim 147, wherein the at least
one feed forward signal includes a window open signal.
149. The automatic gain control of claim 147, wherein the at least
one feed forward signal includes a door open signal.
150. The automatic gain control of claim 147, wherein the at least
one feed forward signal includes an engine RPM signal.
151. An automatic gain control for a cabin communication system
for improving clarity of a voice spoken within a movable interior cabin having
ambient noise, said automatic gain control comprising: a plurality of microphones for receiving the spoken voice and the ambient noise, each said microphone converting the spoken voice and the ambient noise into a respective first audio signal having a respective first component corresponding to the spoken voice and a respective second component corresponding
to the ambient noise;
a plurality of filters for removing the respective second components
from the respective first audio signals to separately provide respective filtered audio signals and the respective second components;
a noise estimator for receiving the second components and at least one additional signal indicative of noise within the cabin, said noise estimator generating a
noise signal that corresponds to a current speed of the cabin in response to a weighted
combination of the second components and the at least one additional signal; a control signal generating circuit for generating an automatic gain control signal in response to the noise signal; and
a loudspeaker for outputting a reproduced voice in response to one of
the filtered audio signals with a gain controlled by the automatic gain control signal.
152. The automatic gain control of claim 151, wherein the at least one additional signal includes a feed forward signal.
153. The automatic gain control of claim 151, wherein the at least
one additional signal includes a window open signal.
154. The automatic gain control of claim 151, wherein the at least
one additional signal includes a door open signal.
155. The automatic gain control of claim 151, wherein the at least
one additional signal includes an engine RPM signal.
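The noise estimator of claims 151 to 155 combines the noise components from several microphones with additional signals such as window open, door open and engine RPM. A minimal sketch follows, assuming a normalized weighted average of the microphone noise levels plus additive feed forward terms; the additive form and every coefficient are hypothetical.

```python
def estimate_noise(mic_noise, mic_weights, window_open=False, door_open=False,
                   engine_rpm=0.0, window_term=0.5, door_term=1.0,
                   rpm_term_per_1000=0.5):
    """Return a single noise level from microphone noise and feed forward signals."""
    if not mic_noise or len(mic_noise) != len(mic_weights):
        raise ValueError("mic_noise and mic_weights must be non-empty and equal length")
    total_weight = sum(mic_weights)
    # Weighted combination of the per-microphone noise components.
    acoustic = sum(w * x for w, x in zip(mic_weights, mic_noise)) / total_weight
    # Feed forward contributions (assumed additive for this sketch).
    feed_forward = (window_term if window_open else 0.0)
    feed_forward += (door_term if door_open else 0.0)
    feed_forward += rpm_term_per_1000 * engine_rpm / 1000.0
    return acoustic + feed_forward
```

Feed forward inputs let the estimate react to a known noise cause, such as a window opening, before the microphones have accumulated enough data to show it.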
156. An automatic gain control method for use in a cabin
communication system for improving clarity of a voice spoken within a movable
interior cabin having ambient noise, said method comprising the steps of: receiving the spoken voice and the ambient noise at a first location in the cabin and converting the spoken voice and the ambient noise into a first audio
signal having a first component corresponding to the spoken voice and a second
component corresponding to the ambient noise;
removing the second component from the first audio signal to provide a filtered audio signal;
acoustically echo cancelling the filtered audio signal in accordance
with a supplied dither signal and providing an echo-cancelled audio signal;
generating a first automatic gain control signal in response to a noise
signal that corresponds to a current speed of the cabin, the first automatic gain control signal controlling a first gain of the dither signal used in said filtering step, said generating step also generating a second automatic gain control signal in response to
the noise signal; and
outputting a reproduced voice in response to the echo-cancelled audio
signal with a second gain controlled by the second automatic gain control signal.
157. An automatic gain control method for use in a cabin
communication system for improving clarity of a voice spoken within a movable
interior cabin having ambient noise, said method comprising the steps of:
receiving the spoken voice and the ambient noise at a first location in
the cabin and converting the spoken voice and the ambient noise into a first audio signal having a first component corresponding to the spoken voice and a second
component corresponding to the ambient noise;
removing the second component from the first audio signal to
separately provide a filtered audio signal and the second component; generating a noise signal that corresponds to a current speed of the cabin in response to the second component and at least one feed forward signal
indicative of noise within the cabin;
generating an automatic gain control signal in response to the noise
signal; and outputting a reproduced voice in response to the filtered audio signal
with a gain controlled by the automatic gain control signal.
158. A cabin communication system for improving clarity of a voice spoken within an interior cabin, said cabin communication system comprising: a microphone array, said microphone array including a first
microphone, positioned at a first location within the cabin, for receiving the spoken voice primarily in a first direction and for converting the spoken voice into a first
audio signal, and a second microphone, positioned at a second location within the cabin, for receiving the spoken voice primarily in the first direction and for converting
the spoken voice into a second audio signal;
a sound source that inputs sound into the cabin such that the input sound approaches said microphone array primarily in a second direction different from
the first direction;
a processor for combining said first and second audio signals to
provide a resultant audio signal, wherein the combining of said first and second audio
signals defines a beampattern of said microphone array that includes a plurality of lobes and a plurality of nulls such that the spoken voice is primarily received by said microphone array along the first direction at a first one of said plurality of lobes and
such that the input sound is primarily received by said microphone array along the
second direction at a first one of said plurality of nulls, whereby any component in said resultant audio signal indicative of the input sound is substantially minimal; and
a loudspeaker for converting said resultant audio signal into an output reproduced voice within the cabin.
159. The cabin communication system of claim 158, wherein said first and second microphones define a beamformed phase array.
160. The cabin communication system of claim 159, wherein the
first one of said plurality of lobes is a main lobe of said beampattern.
161. The cabin communication system of claim 159, wherein said sound source is said loudspeaker.
162. The cabin communication system of claim 161, further
comprising a second sound source that inputs a second sound into the cabin such that
the input second sound approaches said microphone array primarily in a third
direction different from both the first and second directions, wherein the input second sound is primarily received by said
microphone array along the third direction at a second one of said plurality of nulls, whereby any component in said resultant signal indicative of the input second sound is
substantially minimal.
163. The cabin communication system of claim 162, wherein said
second sound source is a loudspeaker of an entertainment system.
164. The cabin communication system of claim 159, wherein said
sound source is a loudspeaker of an entertainment system.
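The lobe and null placement of claims 158 to 164 can be illustrated for a two-microphone array at a single frequency. The sketch assumes a far-field plane-wave model and solves for complex weights that give unit response toward the talker and zero response toward the interfering sound source; the function names, the microphone spacing and the narrowband simplification are assumptions, not details from the claims.

```python
import numpy as np

def steering_vector(theta_deg, spacing_m, freq_hz, c=343.0):
    """Relative phase at two microphones for a plane wave arriving from theta."""
    delay = spacing_m * np.sin(np.deg2rad(theta_deg)) / c
    return np.array([1.0, np.exp(-2j * np.pi * freq_hz * delay)])

def null_steering_weights(theta_voice, theta_source, spacing_m, freq_hz):
    """Weights w with w^H a(voice) = 1 (lobe) and w^H a(source) = 0 (null)."""
    A = np.column_stack([
        steering_vector(theta_voice, spacing_m, freq_hz),
        steering_vector(theta_source, spacing_m, freq_hz),
    ])
    # Solve A^H w = [1, 0]^T; A must be non-singular (distinct directions).
    return np.linalg.solve(A.conj().T, np.array([1.0, 0.0], dtype=complex))

def array_response(w, theta_deg, spacing_m, freq_hz):
    """Complex array response w^H a(theta)."""
    return np.vdot(w, steering_vector(theta_deg, spacing_m, freq_hz))
```

With two microphones only one null can be placed independently per frequency, so nulling a second sound source as in claim 162 would in practice rely on the geometry of the beampattern or on additional microphones.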
165. A movable vehicle cabin comprising: means for causing movement of said cabin; and
a communication system for improving clarity of a voice spoken within
an interior of said cabin, wherein said cabin communication system comprises: a microphone array, said microphone array including a first microphone, positioned at a first location within said cabin, for receiving the spoken
voice primarily in a first direction and for converting the spoken voice into a first
audio signal, and a second microphone, positioned at a second location within said cabin, for receiving the spoken voice primarily in the first direction and for converting
the spoken voice into a second audio signal; a sound source that inputs sound into said cabin such that the input sound
approaches said microphone array primarily in a second direction different from the
first direction;
a processor for combining said first and second audio signals to provide a
resultant audio signal, wherein the combining of said first and second audio signals
defines a beampattern of said microphone array that includes a plurality of lobes and a plurality of nulls such that the spoken voice is primarily received by said microphone
array along the first direction at a first one of said plurality of lobes and such that the
input sound is primarily received by said microphone array along the second direction at a
first one of said plurality of nulls, whereby any component in said resultant audio signal indicative of the input sound is substantially minimal; and
a loudspeaker for converting said resultant audio signal into an output
reproduced voice within said cabin.
166. The cabin of claim 165, wherein said first and second microphones define a beamformed phase array.
167. The cabin of claim 166, wherein the first one of said plurality of lobes is a main lobe of said beampattern.
168. The cabin of claim 166, wherein said sound source is said loudspeaker.
169. The cabin of claim 168, further comprising a second sound
source that inputs a second sound into said cabin such that the input second sound
approaches said microphone array primarily in a third direction different from both
the first and second directions, wherein the input second sound is primarily received by said microphone array along the third direction at a second one of said plurality of nulls,
whereby any component in said resultant audio signal indicative of the input second sound is
substantially minimal.
170. The cabin of claim 169, wherein said second sound source is a
loudspeaker of an entertainment system.
171. The cabin of claim 165, wherein said sound source is a
loudspeaker of an entertainment system.
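Claims 162 and 169 add a second null along a third direction. One way to picture this, offered only as a sketch under stated assumptions, is a uniform linear array with one microphone per constraint (three here, which goes beyond the two microphones the claims name explicitly). The weights are found by solving a small linear system that demands unit gain toward the voice and zero gain toward each interfering source. The far-field, single-frequency model, the 343 m/s sound speed, and all function names and example values are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering(angle_deg, n_mics, freq_hz, spacing_m):
    """Relative phases at each microphone of a uniform linear array for a
    far-field plane wave arriving at angle_deg from the array axis."""
    lag = (2.0 * math.pi * freq_hz * spacing_m
           * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND)
    return [cmath.exp(-1j * m * lag) for m in range(n_mics)]


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("directions are not separable by this array")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def constrained_weights(voice_deg, null_degs, freq_hz, spacing_m):
    """Weights with unit gain toward the voice and a null toward each
    direction in null_degs; uses one microphone per constraint."""
    directions = [voice_deg] + list(null_degs)
    n_mics = len(directions)
    matrix = [steering(d, n_mics, freq_hz, spacing_m) for d in directions]
    rhs = [1.0 + 0j] + [0j] * len(null_degs)
    return solve_linear(matrix, rhs)


def response(weights, angle_deg, freq_hz, spacing_m):
    """Complex gain of the combined (resultant) signal toward angle_deg."""
    a = steering(angle_deg, len(weights), freq_hz, spacing_m)
    return sum(w * s for w, s in zip(weights, a))


if __name__ == "__main__":
    freq, spacing = 1000.0, 0.1  # hypothetical: 1 kHz, 10 cm spacing
    # Hypothetical directions: voice at 60 degrees, system loudspeaker at
    # 150 degrees, entertainment loudspeaker at 100 degrees.
    w = constrained_weights(60.0, [150.0, 100.0], freq, spacing)
    for angle in (60.0, 150.0, 100.0):
        print(angle, round(abs(response(w, angle, freq, spacing)), 6))
```

Exact constraints like these fix the gain only at the listed directions and at one frequency. Adding microphones beyond the number of constraints would leave spare degrees of freedom, which an adaptive design could spend on reducing diffuse cabin noise.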
PCT/US2001/032455 2000-10-19 2001-10-18 Transient processing for communication system WO2002032356A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002224413A AU2002224413A1 (en) 2000-10-19 2001-10-18 Transient processing for communication system

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US69255600A 2000-10-19 2000-10-19
US09/692,531 US7171003B1 (en) 2000-10-19 2000-10-19 Robust and reliable acoustic echo and noise cancellation system for cabin communication
US09/692725 2000-10-19
US09/692,268 US7039197B1 (en) 2000-10-19 2000-10-19 User interface for communication system
US09/691,869 US6748086B1 (en) 2000-10-19 2000-10-19 Cabin communication system without acoustic echo cancellation
US09/692556 2000-10-19
US09/691,928 US6674865B1 (en) 2000-10-19 2000-10-19 Automatic volume control for communication system
US09/692268 2000-10-19
US09/691928 2000-10-19
US09/692531 2000-10-19
US09/691869 2000-10-19
US09/692,725 US7117145B1 (en) 2000-10-19 2000-10-19 Adaptive filter for speech enhancement in a noisy environment

Publications (2)

Publication Number Publication Date
WO2002032356A1 (en) 2002-04-25
WO2002032356A8 (en) 2002-05-23

Family

ID=27560235

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/032455 WO2002032356A1 (en) 2000-10-19 2001-10-18 Transient processing for communication system

Country Status (2)

Country Link
AU (1) AU2002224413A1 (en)
WO (1) WO2002032356A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5361305A (en) * 1993-11-12 1994-11-01 Delco Electronics Corporation Automated system and method for automotive audio test
US5872852A (en) * 1995-09-21 1999-02-16 Dougherty; A. Michael Noise estimating system for use with audio reproduction equipment
US5802184A (en) * 1996-08-15 1998-09-01 Lord Corporation Active noise and vibration control system
US6040761A (en) * 1997-07-04 2000-03-21 Kiekert Ag Acoustic warning system for motor-vehicle subsystem

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7853024B2 (en) 1997-08-14 2010-12-14 Silentium Ltd. Active noise control system and method
US8630424B2 (en) 1997-08-14 2014-01-14 Silentium Ltd. Active noise control system and method
US7317801B1 (en) 1997-08-14 2008-01-08 Silentium Ltd Active acoustic noise reduction system
EP1414021B1 (en) * 2002-10-21 2008-05-14 Silentium Ltd. Active acoustic noise reduction system
US7467084B2 (en) 2003-02-07 2008-12-16 Volkswagen Ag Device and method for operating a voice-enhancement system
EP1445761A1 (en) * 2003-02-07 2004-08-11 Volkswagen Aktiengesellschaft Apparatus and method for operating voice controlled systems in vehicles
EP1533934A3 (en) * 2003-11-21 2010-06-16 Infineon Technologies AG Method and device for predicting the noise contained in a received signal
US8165310B2 (en) 2005-04-29 2012-04-24 Harman Becker Automotive Systems Gmbh Dereverberation and feedback compensation system
EP1718103A1 (en) 2005-04-29 2006-11-02 Harman Becker Automotive Systems GmbH Compensation of reverberation and feedback
US8855329B2 (en) 2007-01-22 2014-10-07 Silentium Ltd. Quiet fan incorporating active noise control (ANC)
JP2010537586A (en) * 2007-08-22 2010-12-02 ドルビー・ラボラトリーズ・ライセンシング・コーポレーション Automatic sensor signal matching
FR2948484A1 (en) * 2009-07-23 2011-01-28 Parrot METHOD FOR FILTERING NON-STATIONARY SIDE NOISES FOR A MULTI-MICROPHONE AUDIO DEVICE, IN PARTICULAR A "HANDS-FREE" TELEPHONE DEVICE FOR A MOTOR VEHICLE
EP2293594A1 (en) * 2009-07-23 2011-03-09 Parrot Method for filtering lateral non stationary noise for a multi-microphone audio device
US8370140B2 (en) 2009-07-23 2013-02-05 Parrot Method of filtering non-steady lateral noise for a multi-microphone audio device, in particular a “hands-free” telephone device for a motor vehicle
US9928824B2 (en) 2011-05-11 2018-03-27 Silentium Ltd. Apparatus, system and method of controlling noise within a noise-controlled volume
US9431001B2 (en) 2011-05-11 2016-08-30 Silentium Ltd. Device, system and method of noise control
CN104508737A (en) * 2012-06-10 2015-04-08 纽昂斯通讯公司 Noise dependent signal processing for in-car communication systems with multiple acoustic zones
US9549250B2 (en) 2012-06-10 2017-01-17 Nuance Communications, Inc. Wind noise detection for in-car communication systems with multiple acoustic zones
CN104508737B (en) * 2012-06-10 2017-12-05 纽昂斯通讯公司 The signal transacting related for the noise of the Vehicular communication system with multiple acoustical areas
US9502050B2 (en) 2012-06-10 2016-11-22 Nuance Communications, Inc. Noise dependent signal processing for in-car communication systems with multiple acoustic zones
US9805738B2 (en) 2012-09-04 2017-10-31 Nuance Communications, Inc. Formant dependent speech signal enhancement
US9613633B2 (en) 2012-10-30 2017-04-04 Nuance Communications, Inc. Speech enhancement
CN107251134A (en) * 2014-12-28 2017-10-13 静公司 The devices, systems, and methods of noise are controlled in noise controllable volume
US9959882B2 (en) 2016-09-08 2018-05-01 Continental Automotive Systems, Inc. In-car communication howling prevention
CN106920559A (en) * 2017-03-02 2017-07-04 奇酷互联网络科技(深圳)有限公司 The optimization method of conversation voice, device and call terminal
CN106920559B (en) * 2017-03-02 2020-10-30 奇酷互联网络科技(深圳)有限公司 Voice communication optimization method and device and call terminal
US10759429B2 (en) 2017-09-06 2020-09-01 Continental Automotive Systems, Inc. Hydraulic roll-off protection
WO2021052958A1 (en) * 2019-09-20 2021-03-25 Peiker Acustic Gmbh System, method, and computer-readable storage medium for controlling an in-car communication system by means of tap sound detection
EP4120260A1 (en) * 2021-07-14 2023-01-18 Alps Alpine Co., Ltd. In-vehicle communication support system

Also Published As

Publication number Publication date
WO2002032356A8 (en) 2002-05-23
AU2002224413A1 (en) 2002-04-29

Similar Documents

Publication Publication Date Title
US7171003B1 (en) Robust and reliable acoustic echo and noise cancellation system for cabin communication
US7117145B1 (en) Adaptive filter for speech enhancement in a noisy environment
US6674865B1 (en) Automatic volume control for communication system
US7039197B1 (en) User interface for communication system
WO2002032356A1 (en) Transient processing for communication system
US9711131B2 (en) Sound zone arrangement with zonewise speech suppression
EP3563562B1 (en) Acoustic echo canceling
EP1858295B1 (en) Equalization in acoustic signal processing
US7068798B2 (en) Method and system for suppressing echoes and noises in environments under variable acoustic and highly feedback conditions
EP0843934B1 (en) Arrangement for suppressing an interfering component of an input signal
EP2048659B1 (en) Gain and spectral shape adjustment in audio signal processing
US8306234B2 (en) System for improving communication in a room
US5251263A (en) Adaptive noise cancellation and speech enhancement system and apparatus therefor
EP1855457B1 (en) Multi channel echo compensation using a decorrelation stage
US9992572B2 (en) Dereverberation system for use in a signal processing apparatus
US6505057B1 (en) Integrated vehicle voice enhancement system and hands-free cellular telephone system
US5796819A (en) Echo canceller for non-linear circuits
EP1718103B1 (en) Compensation of reverberation and feedback
US8712068B2 (en) Acoustic echo cancellation
US20050265560A1 (en) Indoor communication system for a vehicular cabin
KR20040019362A (en) Sound reinforcement system having an multi microphone echo suppressor as post processor
KR20040019339A (en) Sound reinforcement system having an echo suppressor and loudspeaker beamformer
WO2002093876A2 (en) Final signal from a near-end signal and a far-end signal
Schmidt Applications of acoustic echo control-an overview
JP2001005463A (en) Acoustic system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

AK Designated states

Kind code of ref document: C1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP