US20130054232A1 - Method, System and Computer Program Product for Attenuating Noise in Multiple Time Frames - Google Patents

Method, System and Computer Program Product for Attenuating Noise in Multiple Time Frames

Info

Publication number
US20130054232A1
Authority
US
United States
Prior art keywords
noise
time frame
signal
speech
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/589,237
Other versions
US9666206B2
Inventor
Takahiro Unno
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Texas Instruments Inc
Original Assignee
Texas Instruments Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Texas Instruments Inc filed Critical Texas Instruments Inc
Priority to US13/589,237 (granted as US9666206B2)
Assigned to TEXAS INSTRUMENTS INCORPORATED. Assignors: UNNO, TAKAHIRO
Priority to US13/592,708 (published as US20130054233A1)
Priority to US13/594,401 (granted as US9137611B2)
Publication of US20130054232A1
Application granted; publication of US9666206B2
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165: Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/18: Speech or voice analysis techniques in which the extracted parameters are spectral information of each sub-band


Abstract

At least one signal is received that represents speech and noise. In response to the at least one signal, frequency bands are generated of an output channel that represents the speech while attenuating at least some of the noise from the at least one signal. Within a kth frequency band of the at least one signal: a first ratio is determined of a clean version of the speech for a preceding time frame to the noise for the preceding time frame; and a second ratio is determined of a noisy version of the speech for the time frame n to the noise for the time frame n. In response to the first and second ratios, a gain is determined for the kth frequency band of the output channel for the time frame n.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application claims priority to U.S. Provisional Patent Application Ser. No. 61/526,962, filed Aug. 24, 2011, entitled JOINT A PRIORI SNR AND POSTERIOR SNR ESTIMATION FOR BETTER SNR ESTIMATION AND SNR-ATTENUATION MAPPING IN NON-LINEAR PROCESSING NOISE SUPPRESSOR, naming Takahiro Unno as inventor, which is hereby fully incorporated herein by reference for all purposes.
  • BACKGROUND
  • The disclosures herein relate in general to audio processing, and in particular to a method, system and computer program product for attenuating noise in multiple time frames.
  • In mobile telephone conversations, improving quality of uplink speech is an important and challenging objective. For attenuating noise, a spectral subtraction technique has various shortcomings, because it estimates a posteriori speech-to-noise ratio (“SNR”) instead of a priori SNR. Conversely, a minimum mean-square error (“MMSE”) technique has various shortcomings, because it estimates a priori SNR instead of a posteriori SNR. Those shortcomings are especially significant if a level of the noise is high.
  • SUMMARY
  • At least one signal is received that represents speech and noise. In response to the at least one signal, frequency bands are generated of an output channel that represents the speech while attenuating at least some of the noise from the at least one signal. Within a kth frequency band of the at least one signal: a first ratio is determined of a clean version of the speech for a preceding time frame to the noise for the preceding time frame; and a second ratio is determined of a noisy version of the speech for the time frame n to the noise for the time frame n. In response to the first and second ratios, a gain is determined for the kth frequency band of the output channel for the time frame n.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a perspective view of a mobile smartphone that includes an information handling system of the illustrative embodiments.
  • FIG. 2 is a block diagram of the information handling system of the illustrative embodiments.
  • FIG. 3 is an information flow diagram of an operation of the system of FIG. 2.
  • FIG. 4 is an information flow diagram of a blind source separation operation of FIG. 3.
  • FIG. 5 is an information flow diagram of a post processing operation of FIG. 3.
  • FIG. 6 is a graph of various frequency bands that are applied by a discrete Fourier transform (“DFT”) filter bank operation of FIG. 5.
  • FIG. 7 is a graph of noise suppression gain in response to a signal's a posteriori speech-to-noise ratio (“SNR”) and estimated a priori SNR, in accordance with one example of the illustrative embodiments.
  • FIG. 8 is a graph that shows example levels of a signal and an estimated noise floor, as they vary over time.
  • DETAILED DESCRIPTION
  • FIG. 1 is a perspective view of a mobile smartphone, indicated generally at 100, that includes an information handling system of the illustrative embodiments. In this example, the smartphone 100 includes a primary microphone, a secondary microphone, an ear speaker, and a loud speaker, as shown in FIG. 1. Also, the smartphone 100 includes a touchscreen and various switches for manually controlling an operation of the smartphone 100.
  • FIG. 2 is a block diagram of the information handling system, indicated generally at 200, of the illustrative embodiments. A human user 202 speaks into the primary microphone (FIG. 1), which converts sound waves of the speech (from the user 202) into a primary voltage signal V1. The secondary microphone (FIG. 1) converts sound waves of noise (e.g., from an ambient environment that surrounds the smartphone 100) into a secondary voltage signal V2. Also, the signal V1 contains the noise, and the signal V2 contains leakage of the speech.
  • A control device 204 receives the signal V1 (which represents the speech and the noise) from the primary microphone and the signal V2 (which represents the noise and leakage of the speech) from the secondary microphone. In response to the signals V1 and V2, the control device 204 outputs: (a) a first electrical signal to a speaker 206; and (b) a second electrical signal to an antenna 208. The first electrical signal and the second electrical signal communicate speech from the signals V1 and V2, while suppressing at least some noise from the signals V1 and V2.
  • In response to the first electrical signal, the speaker 206 outputs sound waves, at least some of which are audible to the human user 202. In response to the second electrical signal, the antenna 208 outputs a wireless telecommunication signal (e.g., through a cellular telephone network to other smartphones). In the illustrative embodiments, the control device 204, the speaker 206 and the antenna 208 are components of the smartphone 100, whose various components are housed integrally with one another. Accordingly in a first example, the speaker 206 is the ear speaker of the smartphone 100. In a second example, the speaker 206 is the loud speaker of the smartphone 100.
  • The control device 204 includes various electronic circuitry components for performing the control device 204 operations, such as: (a) a digital signal processor (“DSP”) 210, which is a computational resource for executing and otherwise processing instructions, and for performing additional operations (e.g., communicating information) in response thereto; (b) an amplifier (“AMP”) 212 for outputting the first electrical signal to the speaker 206 in response to information from the DSP 210; (c) an encoder 214 for outputting an encoded bit stream in response to information from the DSP 210; (d) a transmitter 216 for outputting the second electrical signal to the antenna 208 in response to the encoded bit stream; (e) a computer-readable medium 218 (e.g., a nonvolatile memory device) for storing information; and (f) various other electronic circuitry (not shown in FIG. 2) for performing other operations of the control device 204.
  • The DSP 210 receives instructions of computer-readable software programs that are stored on the computer-readable medium 218. In response to such instructions, the DSP 210 executes such programs and performs its operations, so that the first electrical signal and the second electrical signal communicate speech from the signals V1 and V2, while suppressing at least some noise from the signals V1 and V2. For executing such programs, the DSP 210 processes data, which are stored in memory of the DSP 210 and/or in the computer-readable medium 218. Optionally, the DSP 210 also receives the first electrical signal from the amplifier 212, so that the DSP 210 controls the first electrical signal in a feedback loop.
  • In an alternative embodiment, the primary microphone (FIG. 1), the secondary microphone (FIG. 1), the control device 204 and the speaker 206 are components of a hearing aid for insertion within an ear canal of the user 202. In one version of such alternative embodiment, the hearing aid omits the antenna 208, the encoder 214 and the transmitter 216.
  • FIG. 3 is an information flow diagram of an operation of the system 200. In accordance with FIG. 3, the DSP 210 performs an adaptive linear filter operation to separate the speech from the noise. In FIG. 3, s1[n] and s2[n] represent the speech (from the user 202) and the noise (e.g., from an ambient environment that surrounds the smartphone 100), respectively, during a time frame n. Further, x1[n] and x2[n] are digitized versions of the signals V1 and V2, respectively, of FIG. 2.
  • Accordingly: (a) x1[n] contains information that primarily represents the speech, but also the noise; and (b) x2[n] contains information that primarily represents the noise, but also leakage of the speech. The noise includes directional noise (e.g., a different person's background speech) and diffused noise. The DSP 210 performs a dual-microphone blind source separation (“BSS”) operation, which generates y1[n] and y2[n] in response to x1[n] and x2[n], so that: (a) y1[n] is a primary channel of information that represents the speech and the diffused noise while suppressing most of the directional noise from x1[n]; and (b) y2[n] is a secondary channel of information that represents the noise while suppressing most of the speech from x2[n].
  • After the BSS operation, the DSP 210 performs a non-linear post processing operation for suppressing noise, without estimating a phase of y1[n]. In the post processing operation, the DSP 210: (a) in response to y2[n], estimates the diffused noise within y1[n]; and (b) in response to such estimate, generates ŝ1[n], which is an output channel of information that represents the speech while suppressing most of the noise from y1[n]. As discussed hereinabove in connection with FIG. 2, the DSP 210 outputs such ŝ1[n] information to: (a) the AMP 212, which outputs the first electrical signal to the speaker 206 in response to such ŝ1[n] information; and (b) the encoder 214, which outputs the encoded bit stream to the transmitter 216 in response to such ŝ1[n] information. Optionally, the DSP 210 writes such ŝ1[n] information for storage on the computer-readable medium 218.
  • FIG. 4 is an information flow diagram of the BSS operation of FIG. 3. A speech estimation filter H1: (a) receives x1[n], y1[n] and y2[n]; and (b) in response thereto, adaptively outputs an estimate of speech that exists within y1[n]. A noise estimation filter H2: (a) receives x2[n], y1[n] and y2[n]; and (b) in response thereto, adaptively outputs an estimate of directional noise that exists within y2[n].
  • As shown in FIG. 4, y1[n] is a difference between: (a) x1[n]; and (b) such estimated directional noise from the noise estimation filter H2. In that manner, the BSS operation iteratively removes such estimated directional noise from x1[n], so that y1[n] is a primary channel of information that represents the speech and the diffused noise while suppressing most of the directional noise from x1[n]. Further, as shown in FIG. 4, y2[n] is a difference between: (a) x2[n]; and (b) such estimated speech from the speech estimation filter H1. In that manner, the BSS operation iteratively removes such estimated speech from x2[n], so that y2[n] is a secondary channel of information that represents the noise while suppressing most of the speech from x2[n].
  • The filters H1 and H2 are adapted to reduce cross-correlation between y1[n] and y2[n], so that their filter lengths (e.g., 20 filter taps) are sufficient for estimating: (a) a path of the speech from the primary channel to the secondary channel; and (b) a path of the directional noise from the secondary channel to the primary channel. In the BSS operation, the DSP 210 estimates a level of a noise floor (“noise level”) and a level of the speech (“speech level”).
  • The DSP 210 computes the speech level by autoregressive (“AR”) smoothing (e.g., with a time constant of 20 ms). The DSP 210 estimates the speech level as Ps[n] = α·Ps[n−1] + (1−α)·y1[n]², where: (a) α = exp(−1/(Fs·τ)); (b) Ps[n] is a power of the speech during the time frame n; (c) Ps[n−1] is a power of the speech during the immediately preceding time frame n−1; and (d) Fs is a sampling rate. In one example, α=0.95, and τ=0.02.
  • The DSP 210 estimates the noise level (e.g., once per 10 ms) as: (a) if Ps[n]>PN[n−1]·Cu, then PN[n]=PN[n−1]·Cu, where PN[n] is a power of the noise level during the time frame n, PN[n−1] is a power of the noise level during the immediately preceding time frame n−1, and Cu is an upward time constant; or (b) if Ps[n]<PN[n−1]·Cd, then PN[n]=PN[n−1]·Cd, where Cd is a downward time constant; or (c) if neither (a) nor (b) is true, then PN[n]=Ps[n]. In one example, Cu is 3 dB/sec, and Cd is −24 dB/sec.
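  • A short sketch of the two level trackers just described, using the patent's example values (α=0.95, Cu=+3 dB/sec, Cd=−24 dB/sec, one noise-level update per 10 ms). Expressing the per-second rates as per-update multiplicative factors is an assumption about how a DSP implementation would apply them.

    ALPHA = 0.95                          # AR smoothing constant (patent's example value)
    UPDATE_S = 0.01                       # one noise-level update per 10 ms
    C_U = 10 ** (0.1 * 3.0 * UPDATE_S)    # +3 dB/sec upward rate limit, per update
    C_D = 10 ** (0.1 * -24.0 * UPDATE_S)  # -24 dB/sec downward rate limit, per update

    def speech_level(p_s_prev, y1_n):
        # Ps[n] = alpha*Ps[n-1] + (1-alpha)*y1[n]^2 (AR smoothing of speech power)
        return ALPHA * p_s_prev + (1.0 - ALPHA) * y1_n ** 2

    def noise_level(p_n_prev, p_s):
        # Rate-limited noise floor: rises no faster than Cu, falls no faster than Cd
        if p_s > p_n_prev * C_U:
            return p_n_prev * C_U
        if p_s < p_n_prev * C_D:
            return p_n_prev * C_D
        return p_s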
  • FIG. 5 is an information flow diagram of the post processing operation. FIG. 6 is a graph of various frequency bands that are applied by a discrete Fourier transform (“DFT”) filter bank operation of FIG. 5. As shown in FIG. 6, each frequency band partially overlaps its neighboring frequency bands by fifty percent (50%) apiece. For example, in FIG. 6, one frequency band ranges from B Hz to D Hz, and such frequency band partially overlaps: (a) a frequency band that ranges from A Hz to C Hz; and (b) a frequency band that ranges from C Hz to E Hz.
  • A particular band is referenced as the kth band, where: (a) k is an integer that ranges from 1 through N; and (b) N is a total number of such bands. In the illustrative embodiment, N=64. Referring again to FIG. 5, in the DFT filter bank operation, the DSP 210: (a) receives y1[n] and y2[n] from the BSS operation; (b) converts y1[n] from a time domain to a frequency domain, and decomposes the frequency domain version of y1[n] into a primary channel of the N bands, which are y1[n, 1] through y1[n, N]; and (c) converts y2[n] from time domain to frequency domain, and decomposes the frequency domain version of y2[n] into a secondary channel of the N bands, which are y2[n, 1] through y2[n, N].
  • As shown in FIG. 5, for each of the N bands, the DSP 210 performs a noise suppression operation, such as a spectral subtraction operation, minimum mean-square error (“MMSE”) operation, or maximum likelihood (“ML”) operation. For the kth band, such operation is denoted as the Kk noise suppression operation. Accordingly, in the Kk noise suppression operation, the DSP 210: (a) in response to the secondary channel's kth band y2[n, k], estimates the diffused noise within the primary channel's kth band y1[n, k]; (b) in response to such estimate, computes the kth band's respective noise suppression gain G[n, k] for the time frame n; and (c) generates a respective noise-suppressed version ŝ1[n, k] of the primary channel's kth band y1[n, k] by applying G[n, k] thereto (e.g., by multiplying G[n, k] and the primary channel's kth band y1[n, k] for the time frame n). After the DSP 210 generates the respective noise-suppressed versions ŝ1[n, k] of all N bands of the primary channel for the time frame n, the DSP 210 composes ŝ1[n] for the time frame n by performing an inverse of the DFT filter bank operation, in order to convert a sum of those noise-suppressed versions ŝ1[n, k] from a frequency domain to a time domain. In real-time causal implementations of the system 200, a band's G[n, k] is variable per time frame n.
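  • The following sketch, offered under stated assumptions, realizes the DFT filter bank as a 50%-overlapped windowed FFT with overlap-add synthesis (N=64 bands, per the illustrative embodiment). The square-root Hann window and the frame length are assumptions, not details given by the patent.

    import numpy as np

    N_BANDS = 64                          # N = 64 bands, per the illustrative embodiment
    FRAME = 2 * N_BANDS                   # 50% overlap between successive frames
    HOP = FRAME // 2
    WIN = np.sqrt(np.hanning(FRAME))      # analysis/synthesis window (assumed)

    def analyze(frame):
        # Convert one windowed time-domain frame into N complex frequency bands
        return np.fft.rfft(WIN * frame)[:N_BANDS]

    def suppress(y1_bands, gains):
        # s1_hat[n, k] = G[n, k] * y1[n, k], applied to all N bands at once
        return gains * y1_bands

    def synthesize(bands):
        # Inverse of the filter bank for one frame of noise-suppressed bands;
        # successive outputs are overlap-added at hop HOP to compose s1_hat[n]
        spec = np.zeros(FRAME // 2 + 1, dtype=complex)
        spec[:N_BANDS] = bands
        return WIN * np.fft.irfft(spec, FRAME)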
  • FIG. 7 is a graph of noise suppression gain G[n, k] in response to a signal's a posteriori SNR and estimated a priori SNR, in accordance with one example of the illustrative embodiments. Accordingly, in the illustrative embodiments, the DSP 210 computes the kth band's respective noise suppression gain G[n, k] in response to both: (a) a posteriori SNR, which is a logarithmic ratio between a noisy version of the signal's energy (e.g., speech and diffused noise as represented by y1[n, k]) and the noise's energy (e.g., as represented by y2[n, k]); and (b) estimated a priori SNR, which is a logarithmic ratio between a clean version of the signal's energy (e.g., as estimated by the DSP 210) and the noise's energy (e.g., as represented by y2[n, k]). During the time frame n, the kth band's then-current a priori SNR is not yet determined exactly, so the DSP 210 updates its decision-directed estimate of the kth band's then-current a priori SNR in response to G[n−1, k] and y1[n−1, k] for the immediately preceding time frame n−1.
  • For the time frame n, the DSP 210 computes:

  • Py1[n, k] = α·Py1[n−1, k] + (1−α)·(y1R[n, k]² + y1I[n, k]²), and

  • Py2[n, k] = α·Py2[n−1, k] + (1−α)·(y2R[n, k]² + y2I[n, k]²),
  • where:
    (a) Py1[n, k] is AR smoothed power of y1[n, k] in the kth band; (b) Py2[n, k] is AR smoothed power of y2[n, k] in the kth band; (c) y1R[n, k] and y1I[n, k] are real and imaginary parts of y1[n, k]; and (d) y2R[n, k] and y2I[n, k] are real and imaginary parts of y2[n, k]. In one example, α=0.95.
  • The DSP 210 computes its estimate of a priori SNR as:

  • a priori SNR = Ps[n−1, k] / Py2[n−1, k],
  • where:
    (a) Ps[n−1, k] is estimated power of clean speech for the immediately preceding time frame n−1; and (b) Py2[n−1, k] is AR smoothed power of y2[n−1, k] in the kth band for the immediately preceding time frame n−1.
  • However, if Py2[n−1, k] is unavailable (e.g., if the secondary voltage signal V2 is unavailable), then the DSP 210 computes its estimate of a priori SNR as:

  • a priori SNR = Ps[n−1, k] / PN[n−1, k],
  • where:
    (a) PN[n−1, k] is an estimate of noise level within y1[n−1, k]; and (b) the DSP 210 estimates PN[n−1, k] in the same manner as discussed hereinbelow in connection with FIG. 8.
  • The DSP 210 computes Ps[n−1, k] as:

  • Ps[n−1, k] = G[n−1, k]²·Py1[n−1, k],
  • where:
    (a) G[n−1, k] is the kth band's respective noise suppression gain for the immediately preceding time frame n−1; and (b) Py1[n−1, k] is AR smoothed power of y1[n−1, k] in the kth band for the immediately preceding time frame n−1.
  • The DSP 210 computes a posteriori SNR as:

  • a posteriori SNR = Py1[n, k] / Py2[n, k].
  • However, if Py2[n, k] is unavailable (e.g., if the secondary voltage signal V2 is unavailable), then the DSP 210 computes a posteriori SNR as:

  • a posteriori SNR = Py1[n, k] / PN[n, k],
  • where:
    (a) PN[n, k] is an estimate of noise level within y1[n, k]; and (b) the DSP 210 estimates PN[n, k] in the same manner as discussed hereinbelow in connection with FIG. 8.
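  • A compact sketch of the decision-directed SNR estimates defined by the preceding equations. The dB conversion and the eps guard against division by zero are implementation assumptions; p_ref_prev stands for Py2[n−1, k], or for PN[n−1, k] when the secondary channel is unavailable (the patent's fallback).

    import numpy as np

    ALPHA = 0.95  # AR smoothing constant (patent's example value)

    def band_power(p_prev, band):
        # P[n,k] = alpha*P[n-1,k] + (1-alpha)*(Re^2 + Im^2) for one complex band
        return ALPHA * p_prev + (1.0 - ALPHA) * (band.real ** 2 + band.imag ** 2)

    def snr_estimates(p_y1, p_y2, g_prev, p_y1_prev, p_ref_prev, eps=1e-12):
        # Ps[n-1,k] = G[n-1,k]^2 * Py1[n-1,k]  (estimated clean-speech power)
        p_s_prev = g_prev ** 2 * p_y1_prev
        a_priori_db = 10.0 * np.log10((p_s_prev + eps) / (p_ref_prev + eps))
        a_posteriori_db = 10.0 * np.log10((p_y1 + eps) / (p_y2 + eps))
        return a_priori_db, a_posteriori_db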
  • In FIG. 7, various spectral subtraction curves show how G[n, k] (“attenuation”) varies in response to both a posteriori SNR and estimated a priori SNR. One of those curves (“unshifted curve”) is a baseline curve of a relationship between a posteriori SNR and G[n, k]. But the DSP 210 shifts the baseline curve horizontally (either left or right by a variable amount X) in response to estimated a priori SNR, as shown by the remaining curves of FIG. 7. A relationship between curve shift X and estimated a priori SNR was experimentally determined as X=estimated a priori SNR−15 dB.
  • For example, if estimated a priori SNR is relatively high, then X is positive, so that the DSP 210 shifts the baseline curve left (which effectively increases G[n, k]), because the positive X indicates that y1[n, k] likely represents a smaller percentage of noise. Conversely, if estimated a priori SNR is relatively low, then X is negative, so that the DSP 210 shifts the baseline curve right (which effectively reduces G[n, k]), because the negative X indicates that y1[n, k] likely represents a larger percentage of noise. In this manner, the DSP 210 smooths G[n, k] transition and thereby reduces its rate of change, so that the DSP 210 reduces an extent of annoying musical noise artifacts (but without producing excessive smoothing distortion, such as reverberation), while nevertheless updating G[n, k] with sufficient frequency to handle relatively fast changes in the signals V1 and V2. To further achieve those objectives in various embodiments, the DSP 210 shifts the baseline curve horizontally (either left or right by a first variable amount) and/or vertically (either up or down by a second variable amount) in response to estimated a priori SNR, so that the baseline curve shifts in one dimension (e.g., either horizontally or vertically) or multiple dimensions (e.g., both horizontally and vertically).
  • In one example of the illustrative embodiments, the DSP 210 implements the curve shift X by precomputing an attenuation table of G[n, k] values (in response to various combinations of a posteriori SNR and estimated a priori SNR) for storage on the computer-readable medium 218, so that the DSP 210 determines G[n, k] in real-time operation by reading G[n, k] from such attenuation table in response to a posteriori SNR and estimated a priori SNR. In one version of the illustrative embodiments, the DSP 210 implements the curve shift X by computing G[n, k] as:

  • G[n, k] = √(1 − 10^(−0.1·CurveSNR)),
  • where CurveSNR = a posteriori SNR + X.
  • However, the DSP 210 imposes a floor on G[n, k] to ensure that G[n, k] is always greater than or equal to a value of the floor, which is programmable as a runtime parameter. In that manner, the DSP 210 further reduces an extent of annoying musical noise artifacts. In the example of FIG. 7, such floor value is −20 dB.
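  • A sketch of the shifted spectral-subtraction gain, assuming the reconstructed reading of the formula above (CurveSNR = a posteriori SNR + X, with X = estimated a priori SNR − 15 dB) and the −20 dB floor of FIG. 7's example. In real-time operation, a lookup into the precomputed attenuation table described above would replace this arithmetic.

    import numpy as np

    def suppression_gain(a_priori_db, a_posteriori_db, floor_db=-20.0):
        x = a_priori_db - 15.0                 # experimentally determined curve shift
        curve_snr = a_posteriori_db + x        # shift the baseline curve by X
        g = np.sqrt(max(0.0, 1.0 - 10.0 ** (-0.1 * curve_snr)))
        return max(g, 10.0 ** (floor_db / 20.0))  # floor, programmable at runtime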
  • FIG. 8 is a graph that shows example levels of Px1[n] and PN[n], as they vary over time, where: (a) Px1[n] is a power of x1[n]; (b) Px1[n] is denoted as “signal” in FIG. 8; and (c) PN[n] is denoted as “estimated noise floor level” in FIG. 8. In the example of FIG. 8, the DSP 210 estimates PN[n] in response to Px1[n] for the BSS operation of FIGS. 3 and 4. In another example, if Py2[n, k] is unavailable (e.g., if the secondary voltage signal V2 is unavailable), then the DSP 210 estimates PN[n] in response to Py1[n] (instead of Px1[n]) for the post processing operation of FIGS. 3 and 5, as discussed hereinabove in connection with FIG. 7.
  • In response to Px1[n] exceeding PN[n] by more than a specified amount (“GAP”) for more than a specified continuous duration, the DSP 210: (a) determines that such excess is more likely representative of noise level increase instead of speech; and (b) accelerates its adjustment of PN[n]. In the illustrative embodiments, the DSP 210 measures the specified continuous duration as a specified number (“MAX”) of consecutive time frames, which aggregately equate to at least such duration (e.g., 0.8 seconds).
  • In response to Px1[n] exceeding PN[n] by less than GAP and/or for less than MAX consecutive time frames (e.g., between a time T3 and a time T5 in the example of FIG. 8), the DSP 210 determines that such excess is more likely representative of speech instead of additional noise. For example, if Px1[n] ≤ PN[n]·GAP, then Count[n]=0, and the DSP 210 clears an initialization flag. In response to the initialization flag being cleared, the DSP 210 estimates PN[n] according to the time constants Cu and Cd (discussed hereinabove in connection with FIG. 4), so that PN[n] falls more quickly than it rises.
  • Conversely, if Px1[n] > PN[n]·GAP, then Count[n]=Count[n−1]+1. If Count[n]>MAX, then the DSP 210 sets the initialization flag. In response to the initialization flag being set, the DSP 210 estimates PN[n] with a faster time constant (e.g., in the same manner as the DSP 210 estimates Ps[n], discussed hereinabove in connection with FIG. 4), so that PN[n] rises approximately as quickly as it falls. In an alternative embodiment, instead of determining whether Px1[n] ≤ PN[n]·GAP, the DSP 210 determines whether Px1[n] ≤ PN[n]+GAP, so that: (a) if Px1[n] ≤ PN[n]+GAP, then Count[n]=0, and the DSP 210 clears the initialization flag; and (b) if Px1[n] > PN[n]+GAP, then Count[n]=Count[n−1]+1.
  • In the example of FIG. 8: (a) Px1[n] quickly rises at a time T1; (b) shortly after T1, Px1[n] exceeds PN[n] by more than GAP; (c) at a time T2, more than MAX consecutive time frames have elapsed since T1; and (d) in response to Px1[n] exceeding PN[n] by more than GAP for more than MAX consecutive time frames, the DSP 210 sets the initialization flag and estimates PN[n] with the faster time constant. By comparison, if the DSP 210 always estimated PN[n] according to the time constants Cu and Cd, then the DSP 210 would have adjusted PN[n] with less precision and less speed (e.g., as shown by the “slower adjustment” line of FIG. 8). Also, in one embodiment, while initially adjusting PN[n] during its first 0.5 seconds of operation, the DSP 210 sets the initialization flag and estimates PN[n] with the faster time constant.
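  • A sketch of the GAP/MAX logic of FIG. 8, reusing noise_level and ALPHA from the sketches above. The linear-ratio form of GAP and its value of 4.0 are assumptions for illustration, as is MAX = 80 frames (80 updates of 10 ms, which equate to the 0.8 second example duration in the text).

    GAP = 4.0        # assumed linear power ratio; the alternative embodiment adds GAP instead
    MAX_FRAMES = 80  # assumed: 80 updates of 10 ms equate to the 0.8 s example duration

    def update_noise_floor(p_x1, p_n_prev, count_prev):
        count = count_prev + 1 if p_x1 > p_n_prev * GAP else 0
        if count > MAX_FRAMES:
            # Initialization flag set: track with the faster time constant, in the
            # same manner as Ps[n], so the floor rises about as quickly as it falls
            p_n = ALPHA * p_n_prev + (1.0 - ALPHA) * p_x1
        else:
            # Flag cleared: normal asymmetric tracking with the Cu/Cd rate limits
            p_n = noise_level(p_n_prev, p_x1)
        return p_n, count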
  • In the illustrative embodiments, a computer program product is an article of manufacture that has: (a) a computer-readable medium; and (b) a computer-readable program that is stored on such medium. Such program is processable by an instruction execution apparatus (e.g., system or device) for causing the apparatus to perform various operations discussed hereinabove (e.g., discussed in connection with a block diagram). For example, in response to processing (e.g., executing) such program's instructions, the apparatus (e.g., programmable information handling system) performs various operations discussed hereinabove. Accordingly, such operations are computer-implemented.
  • Such program (e.g., software, firmware, and/or microcode) is written in one or more programming languages, such as: an object-oriented programming language (e.g., C++); a procedural programming language (e.g., C); and/or any suitable combination thereof. In a first example, the computer-readable medium is a computer-readable storage medium. In a second example, the computer-readable medium is a computer-readable signal medium.
  • A computer-readable storage medium includes any system, device and/or other non-transitory tangible apparatus (e.g., electronic, magnetic, optical, electromagnetic, infrared, semiconductor, and/or any suitable combination thereof) that is suitable for storing a program, so that such program is processable by an instruction execution apparatus for causing the apparatus to perform various operations discussed hereinabove. Examples of a computer-readable storage medium include, but are not limited to: an electrical connection having one or more wires; a portable computer diskette; a hard disk; a random access memory (“RAM”); a read-only memory (“ROM”); an erasable programmable read-only memory (“EPROM” or flash memory); an optical fiber; a portable compact disc read-only memory (“CD-ROM”); an optical storage device; a magnetic storage device; and/or any suitable combination thereof.
  • A computer-readable signal medium includes any computer-readable medium (other than a computer-readable storage medium) that is suitable for communicating (e.g., propagating or transmitting) a program, so that such program is processable by an instruction execution apparatus for causing the apparatus to perform various operations discussed hereinabove. In one example, a computer-readable signal medium includes a data signal having computer-readable program code embodied therein (e.g., in baseband or as part of a carrier wave), which is communicated (e.g., electronically, electromagnetically, and/or optically) via wireline, wireless, optical fiber cable, and/or any suitable combination thereof.
  • Although illustrative embodiments have been shown and described by way of example, a wide range of alternative embodiments is possible within the scope of the foregoing disclosure.

Claims (30)

1. A method performed by an information handling system for attenuating noise, the method comprising:
receiving at least one signal that represents speech and the noise; and
in response to the at least one signal, generating frequency bands of an output channel that represents the speech while attenuating at least some of the noise from the at least one signal;
wherein the frequency bands include at least N frequency bands, wherein k is an integer number that ranges from 1 through N, and wherein generating a kth frequency band of the output channel for a time frame n includes: within the kth frequency band of the at least one signal, determining a first ratio of a clean version of the speech for a preceding time frame to the noise for the preceding time frame, and determining a second ratio of a noisy version of the speech for the time frame n to the noise for the time frame n; in response to the first and second ratios, determining a gain for the time frame n; and generating the kth frequency band of the output channel for the time frame n in response to multiplying the gain for the time frame n and the kth frequency band of the at least one signal for the time frame n.
2. The method of claim 1, wherein the frequency bands include at least first and second frequency bands that partially overlap one another.
3. The method of claim 1, and comprising: performing a filter bank operation for converting a time domain version of the at least one signal to the frequency bands of the at least one signal.
4. The method of claim 3, and comprising: generating the output channel, wherein generating the output channel includes performing an inverse of the filter bank operation for converting a sum of the frequency bands of the output channel to a time domain.
5. The method of claim 1, wherein the at least one signal includes: a first signal that represents the speech and the noise; and a second signal that represents at least the noise.
6. The method of claim 5, wherein the noise includes directional noise and diffused noise, wherein the second signal represents the noise and leakage of the speech, and comprising:
in response to the first and second signals, generating: a first channel that represents the speech and the diffused noise while attenuating most of the directional noise from the first signal; and a second channel that represents the noise while attenuating most of the speech from the second signal; and
in response to the first and second channels, generating the frequency bands of the output channel that represents the speech while attenuating most of the noise from the first channel.
7. The method of claim 6, wherein generating the kth frequency band of the output channel for a time frame n includes: from the second channel, determining the noise for the preceding time frame and determining the noise for the time frame n.
8. The method of claim 1, wherein generating the kth frequency band of the output channel for a time frame n includes: determining the clean version of the speech for the preceding time frame by multiplying:
a square of a gain for the preceding time frame; and
a noisy version of the speech for the preceding time frame.
9. The method of claim 1, and comprising: imposing a floor on the gain for the time frame n.
10. The method of claim 1, wherein determining the gain for the time frame n includes: in response to the first ratio, shifting a curve of a relationship between the second ratio and the gain for the time frame n.
11. A system for attenuating noise, the system comprising:
at least one device for: receiving at least one signal that represents speech and the noise; and, in response to the at least one signal, generating frequency bands of an output channel that represents the speech while attenuating at least some of the noise from the at least one signal;
wherein the frequency bands include at least N frequency bands, wherein k is an integer number that ranges from 1 through N, and wherein generating a kth frequency band of the output channel for a time frame n includes:
within the kth frequency band of the at least one signal, determining a first ratio of a clean version of the speech for a preceding time frame to the noise for the preceding time frame, and determining a second ratio of a noisy version of the speech for the time frame n to the noise for the time frame n;
in response to the first and second ratios, determining a gain for the time frame n; and
generating the kth frequency band of the output channel for the time frame n in response to multiplying the gain for the time frame n and the kth frequency band of the at least one signal for the time frame n.
12. The system of claim 11, wherein the frequency bands include at least first and second frequency bands that partially overlap one another.
13. The system of claim 11, wherein the at least one device is for: performing a filter bank operation for converting a time domain version of the at least one signal to the frequency bands of the at least one signal.
14. The system of claim 13, wherein the at least one device is for: generating the output channel, wherein generating the output channel includes performing an inverse of the filter bank operation for converting a sum of the frequency bands of the output channel to a time domain.
15. The system of claim 11, wherein the at least one signal includes: a first signal that represents the speech and the noise; and a second signal that represents at least the noise.
16. The system of claim 15, wherein the noise includes directional noise and diffused noise, wherein the second signal represents the noise and leakage of the speech, and wherein the at least one device is for:
in response to the first and second signals, generating: a first channel that represents the speech and the diffused noise while attenuating most of the directional noise from the first signal; and a second channel that represents the noise while attenuating most of the speech from the second signal; and
in response to the first and second channels, generating the frequency bands of the output channel that represents the speech while attenuating most of the noise from the first channel.
17. The system of claim 16, wherein generating the kth frequency band of the output channel for a time frame n includes: from the second channel, determining the noise for the preceding time frame and determining the noise for the time frame n.
18. The system of claim 11, wherein generating the kth frequency band of the output channel for a time frame n includes: determining the clean version of the speech for the preceding time frame by multiplying:
a square of a gain for the preceding time frame; and
a noisy version of the speech for the preceding time frame.
19. The system of claim 11, wherein the at least one device is for: imposing a floor on the gain for the time frame n.
20. The system of claim 11, wherein determining the gain for the time frame n includes: in response to the first ratio, shifting a curve of a relationship between the second ratio and the gain for the time frame n.
21. A computer program product for attenuating noise, the computer program product comprising:
a tangible computer-readable storage medium; and
a computer-readable program stored on the tangible computer-readable storage medium, wherein the computer-readable program is processable by an information handling system for causing the information handling system to perform operations including: receiving at least one signal that represents speech and the noise; and, in response to the at least one signal, generating frequency bands of an output channel that represents the speech while attenuating at least some of the noise from the at least one signal;
wherein the frequency bands include at least N frequency bands, wherein k is an integer number that ranges from 1 through N, and wherein generating a kth frequency band of the output channel for a time frame n includes:
within the kth frequency band of the at least one signal, determining a first ratio of a clean version of the speech for a preceding time frame to the noise for the preceding time frame, and determining a second ratio of a noisy version of the speech for the time frame n to the noise for the time frame n;
in response to the first and second ratios, determining a gain for the time frame n; and
generating the kth frequency band of the output channel for the time frame n in response to multiplying the gain for the time frame n and the kth frequency band of the at least one signal for the time frame n.
22. The computer program product of claim 21, wherein the frequency bands include at least first and second frequency bands that partially overlap one another.
23. The computer program product of claim 21, wherein the operations include: performing a filter bank operation for converting a time domain version of the at least one signal to the frequency bands of the at least one signal.
24. The computer program product of claim 23, wherein the operations include: generating the output channel, wherein generating the output channel includes performing an inverse of the filter bank operation for converting a sum of the frequency bands of the output channel to a time domain.
25. The computer program product of claim 21, wherein the at least one signal includes: a first signal that represents the speech and the noise; and a second signal that represents at least the noise.
26. The computer program product of claim 25, wherein the noise includes directional noise and diffused noise, wherein the second signal represents the noise and leakage of the speech, and wherein the operations include:
in response to the first and second signals, generating: a first channel that represents the speech and the diffused noise while attenuating most of the directional noise from the first signal; and a second channel that represents the noise while attenuating most of the speech from the second signal; and
in response to the first and second channels, generating the frequency bands of the output channel that represents the speech while attenuating most of the noise from the first channel.
27. The computer program product of claim 26, wherein generating the kth frequency band of the output channel for a time frame n includes: from the second channel, determining the noise for the preceding time frame and determining the noise for the time frame n.
28. The computer program product of claim 21, wherein generating the kth frequency band of the output channel for a time frame n includes: determining the clean version of the speech for the preceding time frame by multiplying:
a square of a gain for the preceding time frame; and
a noisy version of the speech for the preceding time frame.
29. The computer program product of claim 21, wherein the operations include: imposing a floor on the gain for the time frame n.
30. The computer program product of claim 21, wherein determining the gain for the time frame n includes: in response to the first ratio, shifting a curve of a relationship between the second ratio and the gain for the time frame n.
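
Illustrative Implementation Sketches

The independent claims recite a per-band, two-ratio gain computation: a first ratio (clean speech to noise, preceding frame) and a second ratio (noisy speech to noise, current frame) together determine the gain applied to each frequency band. Below is a minimal Python sketch of one way to realize claims 1, 8, and 9, assuming a decision-directed, Wiener-style gain rule; the claims do not mandate this particular formula, and the names (band_gain, enhance_frame, GAIN_FLOOR, ALPHA) and parameter values are hypothetical.

import numpy as np

GAIN_FLOOR = 0.1  # assumed lower bound imposed on the gain (claim 9)
ALPHA = 0.98      # assumed smoothing weight; not specified by the claims

def band_gain(noisy_pow, noise_pow, prev_noisy_pow, prev_noise_pow, prev_gain):
    """Gain for one frequency band k at time frame n (claim 1)."""
    eps = 1e-12  # guard against division by zero

    # Claim 8: the clean-speech estimate for the preceding frame is the
    # square of the preceding frame's gain times its noisy-speech power.
    prev_clean_pow = (prev_gain ** 2) * prev_noisy_pow

    # First ratio: clean version of the speech vs. noise, preceding frame.
    snr_prior = prev_clean_pow / (prev_noise_pow + eps)

    # Second ratio: noisy version of the speech vs. noise, time frame n.
    snr_post = noisy_pow / (noise_pow + eps)

    # Combine the two ratios (a decision-directed blend, one common choice).
    snr_dd = ALPHA * snr_prior + (1.0 - ALPHA) * max(snr_post - 1.0, 0.0)

    # Wiener-style gain; letting snr_prior enter the blend effectively
    # shifts the curve relating the second ratio to the gain (claim 10).
    gain = snr_dd / (1.0 + snr_dd)

    # Claim 9: impose a floor on the gain for the time frame n.
    return max(gain, GAIN_FLOOR)

def enhance_frame(bands, noise, prev_bands, prev_noise, prev_gains):
    """Scale all N bands of frame n by their per-band gains (claim 1)."""
    gains = np.array([
        band_gain(abs(bands[k]) ** 2, noise[k],
                  abs(prev_bands[k]) ** 2, prev_noise[k], prev_gains[k])
        for k in range(len(bands))
    ])
    return gains * bands, gains  # output bands, plus gains for frame n+1

In a two-signal configuration (claims 5-7), the noise and prev_noise values would be measured from the second channel (the noise reference), while bands and prev_bands come from the first channel that carries the speech.

For claims 3 and 4, any analysis/synthesis filter bank with a workable inverse suffices; the short sketch below assumes an STFT as the filter bank, using scipy.signal, with the analyze/synthesize names again hypothetical.

from scipy.signal import stft, istft

def analyze(x, fs=8000, nperseg=256):
    # Forward filter bank: time-domain signal -> frequency bands per frame.
    _, _, bands = stft(x, fs=fs, nperseg=nperseg)
    return bands  # shape: (num_bands, num_frames)

def synthesize(bands, fs=8000, nperseg=256):
    # Inverse filter bank: output bands back to the time domain (claim 4);
    # istft overlap-adds the bands, approximating the claimed summation.
    _, x = istft(bands, fs=fs, nperseg=nperseg)
    return x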

Priority Applications (3)

Application Number Priority Date Filing Date Title
US13/589,237 US9666206B2 (en) 2011-08-24 2012-08-20 Method, system and computer program product for attenuating noise in multiple time frames
US13/592,708 US20130054233A1 (en) 2011-08-24 2012-08-23 Method, System and Computer Program Product for Attenuating Noise Using Multiple Channels
US13/594,401 US9137611B2 (en) 2011-08-24 2012-08-24 Method, system and computer program product for estimating a level of noise

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161526962P 2011-08-24 2011-08-24
US13/589,237 US9666206B2 (en) 2011-08-24 2012-08-20 Method, system and computer program product for attenuating noise in multiple time frames

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US13/592,708 Continuation-In-Part US20130054233A1 (en) 2011-08-24 2012-08-23 Method, System and Computer Program Product for Attenuating Noise Using Multiple Channels
US13/594,401 Continuation-In-Part US9137611B2 (en) 2011-08-24 2012-08-24 Method, system and computer program product for estimating a level of noise

Publications (2)

Publication Number Publication Date
US20130054232A1 (en) 2013-02-28
US9666206B2 (en) 2017-05-30

Family

ID=47744884

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/589,237 Active 2034-04-18 US9666206B2 (en) 2011-08-24 2012-08-20 Method, system and computer program product for attenuating noise in multiple time frames

Country Status (1)

Country Link
US (1) US9666206B2 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8340278B2 (en) 2009-11-20 2012-12-25 Texas Instruments Incorporated Method and apparatus for cross-talk resistant adaptive noise canceller

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4811404A (en) * 1987-10-01 1989-03-07 Motorola, Inc. Noise suppression system
US5706395A (en) * 1995-04-19 1998-01-06 Texas Instruments Incorporated Adaptive weiner filtering using a dynamic suppression factor
US20040049383A1 (en) * 2000-12-28 2004-03-11 Masanori Kato Noise removing method and device
US20070055505A1 (en) * 2003-07-11 2007-03-08 Cochlear Limited Method and device for noise reduction
EP2006841A1 (en) * 2006-04-07 2008-12-24 BenQ Corporation Signal processing method and device and training method and device
US20090310796A1 (en) * 2006-10-26 2009-12-17 Parrot method of reducing residual acoustic echo after echo suppression in a "hands-free" device
US20080167866A1 (en) * 2007-01-04 2008-07-10 Harman International Industries, Inc. Spectro-temporal varying approach for speech enhancement
US20090012786A1 (en) * 2007-07-06 2009-01-08 Texas Instruments Incorporated Adaptive Noise Cancellation
US20090106021A1 (en) * 2007-10-18 2009-04-23 Motorola, Inc. Robust two microphone noise suppression system
US20090164212A1 (en) * 2007-12-19 2009-06-25 Qualcomm Incorporated Systems, methods, and apparatus for multi-microphone based speech enhancement
US20090254340A1 (en) * 2008-04-07 2009-10-08 Cambridge Silicon Radio Limited Noise Reduction
US20130246060A1 (en) * 2010-11-25 2013-09-19 Nec Corporation Signal processing device, signal processing method and signal processing program
US20130046535A1 (en) * 2011-08-18 2013-02-21 Texas Instruments Incorporated Method, System and Computer Program Product for Suppressing Noise Using Multiple Signals

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9257132B2 (en) 2013-07-16 2016-02-09 Texas Instruments Incorporated Dominant speech extraction in the presence of diffused and directional noise sources
WO2015139938A3 (en) * 2014-03-17 2015-11-26 Koninklijke Philips N.V. Noise suppression
US10026415B2 (en) 2014-03-17 2018-07-17 Koninklijke Philips N.V. Noise suppression
GB2580057A (en) * 2018-12-20 2020-07-15 Nokia Technologies Oy Apparatus, methods and computer programs for controlling noise reduction

Also Published As

Publication number Publication date
US9666206B2 (en) 2017-05-30

Similar Documents

Publication Publication Date Title
TWI463817B (en) System and method for adaptive intelligent noise suppression
US9361901B2 (en) Integrated speech intelligibility enhancement system and acoustic echo canceller
US9137611B2 (en) Method, system and computer program product for estimating a level of noise
US8180064B1 (en) System and method for providing voice equalization
US20120263317A1 (en) Systems, methods, apparatus, and computer readable media for equalization
US20140025374A1 (en) Speech enhancement to improve speech intelligibility and automatic speech recognition
CN106257584B (en) Improved speech intelligibility
US20130322643A1 (en) Multi-Microphone Robust Noise Suppression
CN111418010A (en) Multi-microphone noise reduction method and device and terminal equipment
US10726857B2 (en) Signal processing for speech dereverberation
US11664040B2 (en) Apparatus and method for reducing noise in an audio signal
US20200286501A1 (en) Apparatus and a method for signal enhancement
US20200154202A1 (en) Method and electronic device for managing loudness of audio signal
US8880394B2 (en) Method, system and computer program product for suppressing noise using multiple signals
US9666206B2 (en) Method, system and computer program product for attenuating noise in multiple time frames
US20130054233A1 (en) Method, System and Computer Program Product for Attenuating Noise Using Multiple Channels
Habets et al. Dual-microphone speech dereverberation in a noisy environment
KR101394504B1 (en) Apparatus and method for adaptive noise processing
US11322168B2 (en) Dual-microphone methods for reverberation mitigation
Vashkevich et al. Speech enhancement in a smartphone-based hearing aid
Dashtbozorg et al. Joint Noise Reduction and Dereverberation of Speech Using Hybrid TF-GSC and Adaptive MMSE Estimator

Legal Events

Date Code Title Description
AS Assignment

Owner name: TEXAS INSTRUMENTS INCORPORATED, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UNNO, TAKAHIRO;REEL/FRAME:028810/0430

Effective date: 20120817

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4