CN100466061C - Broadband wave beam forming method and apparatus - Google Patents

Publication number
CN100466061C
Authority
CN
China
Prior art keywords
signal
centerdot
subband
microphone
omega
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB200510090740XA
Other languages
Chinese (zh)
Other versions
CN1866356A (en)
Inventor
居太亮
邵怀宗
林静然
彭启琮
余水安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
University of Electronic Science and Technology of China
Original Assignee
Huawei Technologies Co Ltd
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd, University of Electronic Science and Technology of China filed Critical Huawei Technologies Co Ltd
Priority to CNB200510090740XA priority Critical patent/CN100466061C/en
Publication of CN1866356A publication Critical patent/CN1866356A/en
Application granted granted Critical
Publication of CN100466061C publication Critical patent/CN100466061C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The disclosed broadband beamforming method comprises: determining the subband signals corresponding to each microphone signal, and the frequency-domain correlation matrix of each subband signal; determining a weight vector for every subband signal from the three-dimensional spatial propagation vector of the signal source and these correlation matrices; and then determining the output signal. The invention processes speech jointly in the frequency domain and the spatial domain and improves the signal-to-noise ratio (SNR), which makes it widely applicable.

Description

A broadband beamforming method and apparatus
Technical field
The present invention relates to audio signal processing, and in particular to a broadband beamforming method and apparatus.
Background art
With the rapid development of modern science, communication and information exchange have become a necessary condition for the existence of human society. Speech, the acoustic form of language, is one of the most natural, effective, and convenient means by which people exchange information.
During voice communication, interference is unavoidable: noise from the surrounding environment, noise introduced by the transmission medium, electrical noise inside the communication equipment, and even other talkers. Because of this interference, the speech the listener receives is no longer the clean original speech but noisy speech contaminated by noise. For example, telephones in cars, on streets, and in airports are often subject to strong background noise, which seriously degrades call quality. Environmental noise also rapidly degrades the performance of many speech processing systems. For example, speech recognition has made substantial progress and is entering practical use, but most current speech recognition systems work in quiet environments; in noisy environments, especially strongly noisy ones, their recognition rate is seriously affected. Low-rate speech coding, particularly parametric coding, runs into a similar problem. Because the speech production model is the basis of low-rate coding, when the extraction of model parameters is seriously disturbed by background noise mixed into the speech, the quality of the reconstructed speech deteriorates sharply, and the speech may even become completely unintelligible.
Speech enhancement can effectively suppress background noise, improve voice communication quality, improve the interference immunity of speech processing systems, and preserve their performance. Research on speech enhancement therefore has significant practical value. Speech enhancement has already found increasingly wide application in speech processing systems, communication systems, multimedia technology, digital home appliances, and other fields.
The main purpose of speech enhancement is to extract the original speech, as clean as possible, from the noisy speech signal. However, because interference is usually random, extracting completely clean speech from noisy speech is almost impossible. In practice, therefore, the goals of speech enhancement are to process the noisy speech so as to remove background noise, improve speech quality, improve the clarity, intelligibility, and comfort of the speech, and improve the performance of the speech processing system. These goals often cannot all be met at once and usually have to be prioritized according to the specific needs of the speech processing system.
Research on speech enhancement began in the mid-1970s. As digital signal processing theory matured, speech enhancement developed into an important branch of speech signal processing. In 1978, Lim and Oppenheim proposed Wiener filtering for speech enhancement. In 1979, Boll proposed spectral subtraction to suppress noise. In 1980, McAulay and Malpass proposed a soft-decision noise suppression method. In 1984, Ephraim and Malah proposed a speech enhancement method based on MMSE short-time spectral amplitude estimation. In 1987, Paliwal introduced Kalman filtering into speech enhancement. Over nearly 30 years of research, a variety of speech enhancement methods have been proposed; they have laid the foundation of speech enhancement theory and brought it gradually to maturity.
In recent years, with the development of VLSI (very-large-scale integration) technology and the emergence of high-speed DSP (digital signal processing) chips, speech enhancement has gradually become practical, and new speech enhancement techniques have continued to emerge.
Speech enhancement and denoising methods can be broadly divided into filtering techniques in the time domain, the frequency domain, and the spatial domain, such as Wiener-filter speech enhancement and frequency-domain spectral subtraction. In recent years, array processing has also been introduced into speech processing, giving rise to beam-based spatial filtering techniques such as delay-and-sum beamforming (DSB).
Narrowband MVDR (Minimum Variance Distortionless Response) beamforming is mainly used in conventional narrowband signal processing.
Suppose M sensors form an antenna array as shown in Fig. 1 and receive a narrowband signal s(t) from direction θ. The main steps of spatially filtering the received signal with MVDR beamforming are as follows:
Step 1: convert the analog signal received by each sensor into a digital signal, and form the input data matrix X(n) from the digital signals:
$X(n) = [x_1(n)\; x_2(n)\; \cdots\; x_M(n)]^T$ (1)
where $[\cdot]^T$ denotes the transpose of a matrix or vector, and $x_i(n)$ denotes the AD-converted digital signal received by the i-th sensor at time n, with i = 1, ..., M.
Step 2: take L snapshots, i.e. the data collected on each sensor at times n, n-1, ..., n-L+1, and compute the frequency-domain correlation matrix R of the input signal according to formula (2):
$R = \frac{1}{L} \sum_{l=1}^{L} X(n-l+1)\, X^H(n-l+1)$ (2)
where $[\cdot]^H$ denotes the conjugate transpose of a matrix or vector, i.e. transposition followed by complex conjugation of every element. For example:
if $A = \begin{bmatrix} 1+2i & 2+4i \\ 4-4i & 5-8i \end{bmatrix}$, then $A^H = \begin{bmatrix} 1-2i & 4+4i \\ 2-4i & 5+8i \end{bmatrix}$.
Step 3: obtain the direction vector a of the signal source from the direction θ of the signal source and the array topology. The array topology is not restricted here: it may be a uniform circular array, a uniform linear array, or another array structure. The method of obtaining the direction θ of the signal source is likewise not restricted.
Suppose M sensors form a uniform linear array with spacing d, with the first sensor as the reference point. The direction vector a of the signal source is then:
$a = [1\; e^{-j\phi}\; \cdots\; e^{-j(M-1)\phi}]^T$ (3)
In formula (3), φ is the spatial phase defined by formula (4) [equation image C200510090740D00094 not reproduced], in which λ is the wavelength of the incident signal, d is the array spacing, and θ is the incident angle of the incident signal.
Step 4: compute the optimal weight vector $W_{opt}$ from the direction vector a of the signal source and the frequency-domain correlation matrix R:
$W_{opt} = \frac{R^{-1} a}{a^H R^{-1} a}$ (5)
Step 5: spatially filter the input signal with the optimal weight vector to obtain the output signal y(n):
$y(n) = W_{opt}^H X(n)$ (6)
Finally, the digital signal y(n) is converted into an analog signal.
The MVDR beamforming technique above is suitable only for narrowband signal sources; when it is applied to a wideband source, its speech enhancement performance drops significantly. It is also suitable only for far-field sources, i.e. when the incident signal is a plane wave; when it is applied to a near-field source, i.e. when the incident signal is a spherical wave, its speech enhancement performance likewise drops significantly.
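The two inputs of the narrowband procedure above, the direction vector of formula (3) and the sample correlation matrix of formula (2), can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation. Formula (4) is not legible in this text, so the sketch assumes the common far-field form φ = 2πd·sin(θ)/λ with θ measured from the array normal; the function names and the example values are hypothetical.

```python
import cmath
import math

def ula_direction_vector(num_sensors, spacing, wavelength, theta):
    """Formula (3): a = [1, e^{-j*phi}, ..., e^{-j*(M-1)*phi}]^T.

    ASSUMPTION: phi = 2*pi*spacing*sin(theta)/wavelength, theta measured
    from the array normal (formula (4) is not legible in the source text).
    """
    phi = 2.0 * math.pi * spacing * math.sin(theta) / wavelength
    return [cmath.exp(-1j * m * phi) for m in range(num_sensors)]

def sample_correlation(snapshots):
    """Formula (2): R = (1/L) * sum over L snapshots of X X^H.

    `snapshots` is a list of L snapshots, each a list of M complex samples.
    """
    num_snapshots = len(snapshots)
    num_sensors = len(snapshots[0])
    R = [[0j] * num_sensors for _ in range(num_sensors)]
    for x in snapshots:
        for p in range(num_sensors):
            for q in range(num_sensors):
                R[p][q] += x[p] * x[q].conjugate() / num_snapshots
    return R

# Hypothetical example: 4 sensors, 5 cm spacing, 20 cm wavelength, 30 degrees.
a = ula_direction_vector(num_sensors=4, spacing=0.05, wavelength=0.2,
                         theta=math.radians(30))
# Three noiseless snapshots of a source from that direction: X_l = s_l * a.
snapshots = [[s * a_m for a_m in a] for s in (1 + 0j, 0.5j, -2 + 1j)]
R = sample_correlation(snapshots)
```

With these example values the spatial phase is π/4, so the third element of `a` is -j. The matrix `R` is Hermitian by construction, which is what the inversion in formula (5) relies on.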
Summary of the invention
The object of the invention is to provide a broadband beamforming method and apparatus that improve speech enhancement performance by processing the speech signal jointly in the frequency domain and the spatial domain.
To achieve this object, the broadband beamforming method provided by the invention comprises:
a. determining the subband signals corresponding to the signal input to each microphone;
b. determining the frequency-domain correlation matrix of each subband signal;
c. determining the weight vector of each subband signal from the three-dimensional spatial propagation vector of the signal source and each frequency-domain correlation matrix;
d. determining the output signal from the weight vector of each subband signal and each subband signal.
Step a specifically comprises:
a1. performing speech detection on the signal input to each microphone, and determining the speech frames;
a2. determining the subband signals corresponding to each speech frame.
Let the signal input to the microphones be: $F(t) = [f_1(t)\; \cdots\; f_i(t)\; \cdots\; f_M(t)]^T$
where $f_i(t)$ denotes the signal received by the i-th microphone, i = 1, ..., M, M is the number of microphones, and $[\cdot]^T$ denotes the transpose of a matrix or vector. Step a1 specifically comprises the following steps:
a11. performing AD conversion on the signal input to each microphone at a predetermined sampling frequency:
$F(n) = [f_1(n)\; \cdots\; f_i(n)\; \cdots\; f_M(n)]^T$, where n is discrete time;
a12. selecting signal frames from the AD-converted signal and performing a short-time Fourier transform:
$F(\omega) = \sum_{m=1}^{N} F(n)\, w(n-m) \exp(-j\omega m) = \begin{bmatrix} \sum_{m=1}^{N} f_1(n)\, w(n-m) \exp(-j\omega m) \\ \vdots \\ \sum_{m=1}^{N} f_M(n)\, w(n-m) \exp(-j\omega m) \end{bmatrix}$, where w(n) is a window function and n, m are discrete times;
a13. performing speech detection on the Fourier-transformed signal frames to determine the speech frames.
Step a13 specifically comprises the following steps:
performing speech detection on the Fourier-transformed signal frame;
when the signal frame is determined not to be a speech frame, storing the signal frame as the current estimated noise spectrum;
when the signal frame is determined to be a speech frame, denoising the speech frame by spectral subtraction using the current estimated noise spectrum, so that the speech frame S(ω) after spectral subtraction is:
$S(\omega) = F(\omega) - N(\omega) = \begin{bmatrix} s_1(1) & \cdots & s_1(NFFT) \\ \vdots & \ddots & \vdots \\ s_M(1) & \cdots & s_M(NFFT) \end{bmatrix}_{M \times NFFT}$;
where $N(\omega) = \begin{bmatrix} n_1(1) & \cdots & n_1(NFFT) \\ \vdots & \ddots & \vdots \\ n_M(1) & \cdots & n_M(NFFT) \end{bmatrix}_{M \times NFFT}$ is the current estimated noise spectrum, NFFT is the number of frequency samples of the short-time Fourier transform, F(ω) is the signal frame after the short-time Fourier transform, and M is the number of microphones.
Step a2 specifically comprises:
dividing the speech frame into K subband signals according to K predetermined frequency bands, and taking K predetermined frequencies $\omega_i$, i = 1, ..., K, as the centre frequencies of the subbands;
determining the signal component $S(\omega_i)$ of the i-th subband as: $S(\omega_i) = \begin{bmatrix} S_1(i) \\ \vdots \\ S_M(i) \end{bmatrix}$;
where M is the number of microphones in the microphone array, i = 1, ..., K, and K is the number of subbands.
Step b specifically comprises:
determining the frequency-domain correlation matrix R(i) of each subband signal:
$R(i) = E\{S(\omega_i)\, S^H(\omega_i)\}$; where $^H$ denotes the conjugate transpose of a matrix or vector, $S(\omega_i)$ is the signal component of the i-th subband, and $S(\omega_i) = \begin{bmatrix} S_1(i) \\ \vdots \\ S_M(i) \end{bmatrix}$.
The three-dimensional spatial propagation vector of the signal source in step c can be obtained as follows:
c1. obtaining the coordinate vector of the source position $(r_0, \theta_0, \phi_0)$: $S = r_0 [\sin\theta_0\cos\phi_0 \;\; \sin\theta_0\sin\phi_0 \;\; \cos\theta_0]$;
c2. obtaining the coordinate vector of each microphone: $P_i = r_i [\sin\theta_i\cos\phi_i \;\; \sin\theta_i\sin\phi_i \;\; \cos\theta_i]$;
c3. determining the relative amplitude attenuation factor $\partial_i$ from the source position $(r_0, \theta_0, \phi_0)$ to the i-th microphone as:
$\partial_i = \frac{\|S\|}{\|P_i - S\|}$; where $\|\cdot\|$ denotes the norm of a vector;
c4. determining the relative delay factor $\tau_i$ from the source position $(r_0, \theta_0, \phi_0)$ to the i-th microphone as:
$\tau_i = \frac{\|S - P_i\| - \|S\|}{c}$;
where c is the speed of sound in air and $\|\cdot\|$ denotes the norm of a vector;
c5. determining the three-dimensional spatial propagation vector $a(r_0, \theta_0, \phi_0)$ of the source position $(r_0, \theta_0, \phi_0)$ as:
$a(r_0, \theta_0, \phi_0) = \begin{bmatrix} \partial_1 e^{-j\omega_i\tau_1} \\ \vdots \\ \partial_m e^{-j\omega_i\tau_m} \\ \vdots \\ \partial_M e^{-j\omega_i\tau_M} \end{bmatrix}$;
where $\omega_i$ is the centre frequency of each subband, $r_0$ is the distance from the signal source to the coordinate origin, $\theta_0$ is the angle between the signal source and the Z axis of the three-dimensional coordinate system, and $\phi_0$ is the angle between the projection of the signal source on the XOY plane and the X axis.
Step c specifically comprises: determining the optimal weight vector $W_{opt}^i$ of the i-th subband as:
$W_{opt}^i = \frac{R(i)^{-1} a}{a^H R(i)^{-1} a}$;
where R(i) is the frequency-domain correlation matrix of the i-th subband signal, and a is the three-dimensional spatial propagation vector of the source position $(r_0, \theta_0, \phi_0)$.
Step d comprises: spatially filtering each subband signal with its optimal weight vector to obtain the frequency-domain output signal $y(\omega_i)$ of the i-th subband:
$y(\omega_i) = (W_{opt}^i)^H S(\omega_i)$; where $^H$ denotes the conjugate transpose of a vector or matrix, $W_{opt}^i$ is the optimal weight vector of the i-th subband, and $S(\omega_i)$ is the signal component of the i-th subband;
combining the frequency-domain output signals of the subbands into Y(ω): $Y(\omega) = [y(\omega_1)\; y(\omega_2)\; \cdots\; y(\omega_K)]^T$;
applying an inverse fast Fourier transform to the combined frequency-domain output signal Y(ω) to obtain the output signal Y(n);
converting Y(n) into an analog signal y(t), low-pass filtering y(t), and taking the filtered signal as the speech signal to be output.
The invention also provides a broadband beamforming apparatus, comprising:
a subband division module, which determines the subband signals corresponding to the signal input to each microphone and transfers each subband signal to the frequency-domain correlation matrix module;
a frequency-domain correlation matrix module, which determines the frequency-domain correlation matrix of each subband signal and transfers it to the weight vector module;
a weight vector module, which determines the weight vector of each subband signal from the three-dimensional spatial propagation vector of the signal source and each frequency-domain correlation matrix, and transfers it to the output module;
an output module, which determines the output signal from the weight vector of each subband signal and each subband signal.
The subband division module comprises:
a sampling submodule, which performs AD conversion on the signal input to each microphone at a predetermined sampling frequency, selects signal frames from the AD-converted signal, and performs a short-time Fourier transform;
a speech detection submodule, which performs speech detection on the Fourier-transformed signal frame; when the signal frame is determined not to be a speech frame, it stores the frame as the current estimated noise spectrum, and when the signal frame is determined to be a speech frame, it transfers the frame to the spectral subtraction submodule;
a spectral subtraction submodule, which denoises the speech frame it receives by spectral subtraction using the current estimated noise spectrum, and transfers the result to the subband division submodule;
a subband division submodule, which divides the speech frame it receives into a plurality of subband signals according to predetermined frequency bands, and transfers each subband signal to the frequency-domain correlation matrix module.
As the above technical solution shows, the invention adopts the three-dimensional spatial propagation vector $a(r_0, \theta_0, \phi_0)$ for the source position vector, which solves the three-dimensional spatial filtering problem, suppresses spatial interference and noise, and improves the signal-to-noise ratio of the output signal. By dividing the speech signal into a plurality of subbands and applying three-dimensional spatial filtering to each subband separately, the invention processes the speech signal jointly in the frequency domain and the spatial domain, so that it is well suited to wideband and near-field signal sources. By using speech detection techniques, such as zero-crossing rate combined with short-time energy, to determine the speech frames, the invention avoids consuming system resources when no speech is input and improves the accuracy and stability of the output speech signal. By using spectral subtraction to remove system noise, it avoids the influence of non-white Gaussian noise on the system and effectively improves the filtering performance for speech signals. In determining the source position vector $a(r_0, \theta_0, \phi_0)$, the invention uses the relative amplitude attenuation factor and the relative delay factor from the source position to each microphone: the attenuation factor is the ratio between the distance from the source to each microphone and the distance from the source to the reference microphone, and the delay factor is the difference between the delay from the source to each microphone and the delay from the source to the reference microphone. This keeps the invention consistent with the model assumptions of subspace theory, reduces model error, and improves three-dimensional spatial filtering performance. Finally, based on the narrowband assumption and the characteristics of speech, the speech signal input to the microphones is divided into several subbands, and the frequency-domain correlation matrix of each subband is determined from the signal component of that subband, which greatly reduces the amount of computation, improves the real-time performance of the system, and saves hardware cost. The technical solution provided by the invention thus improves speech enhancement performance and the practicality of speech systems.
Description of drawings
Fig. 1 is a schematic diagram of a uniform linear microphone array;
Fig. 2 is a flow chart of the broadband beamforming method of the invention;
Fig. 3 is a schematic diagram of the near-field signal model.
Embodiments
The core of the method and apparatus of the invention is: determining the subband signals corresponding to the signal input to each microphone; determining the frequency-domain correlation matrix of each subband signal; determining the weight vector of each subband signal from the three-dimensional spatial propagation vector of the signal source and each frequency-domain correlation matrix; and determining the output signal from the weight vector of each subband signal and each subband signal.
The technical solution provided by the invention is further described below on the basis of this core idea.
The microphones used in the invention are omnidirectional. The pickup distance of the microphones can be determined by the application environment: for example, for a room 5 m long, 10 m wide, and 4 m high in which all sound must be processed, the pickup distance should be at least 10 m. The invention does not restrict the material of the microphone support, but the physical dimensions of the support should be as small as possible, to reduce reflection of sound by the support and hence multipath effects.
The topology of the microphone array in the invention may take any form, such as a ULA (uniform linear array) or a UCA (uniform circular array).
The flow chart of the microphone-array-based beamforming method for wideband signal sources is shown in Fig. 2.
As shown in Fig. 2, the method of the invention mainly comprises three parts: signal acquisition and preprocessing, wideband signal processing, and output signal processing.
The broadband beamforming method based on a microphone array is described in detail below with reference to Fig. 2.
The signal acquisition and preprocessing part mainly comprises the following five steps:
Step 1: M ordinary omnidirectional microphones are arranged in some topology to form a microphone array, which picks up the sound emitted by the signal source together with all other sound within the reception range of the microphones.
The signal picked up by the microphone array can be written as:
$F(t) = [f_1(t)\; \cdots\; f_i(t)\; \cdots\; f_M(t)]^T$ (7)
In formula (7), $f_i(t)$ denotes the sound signal received by the i-th microphone, i = 1, ..., M, M is the number of microphones in the microphone array, and $[\cdot]^T$ denotes the matrix transpose.
Let the coordinate vector of the i-th microphone be:
$P_i = r_i [\sin\theta_i\cos\phi_i \;\; \sin\theta_i\sin\phi_i \;\; \cos\theta_i]$ (8)
In formula (8), $r_i$ is the distance from the i-th microphone to the coordinate origin; the origin may be the centre of the microphone array, the position of any one microphone in the array, or another position. $\theta_i$ is the angle between the coordinate vector of the i-th microphone and the positive Z axis, and $\phi_i$ is the angle between the projection of the coordinate vector of the i-th microphone on the XOY plane and the positive X axis.
The coordinate vectors of the M microphones together form the coordinate matrix of the whole microphone array (formula (9), not reproduced in this text).
Step 2: perform AD conversion on the signal received by each microphone.
For the AD conversion, the sampling frequency and sampling precision can be chosen according to the required sound quality: for example, the sampling frequency may be 16 kHz, 22 kHz, or 44 kHz, and the sampling precision may be 8, 16, or 32 bits. The invention does not restrict the sampling technique or chip.
Sampling yields a multi-channel digital speech signal, namely:
$F(n) = [f_1(n)\; \cdots\; f_i(n)\; \cdots\; f_M(n)]^T$ (10)
In formula (10), i = 1, ..., M, and M is the number of microphones in the microphone array.
Step 3: select a signal frame of length 32 ms from each channel of the sampled signal in formula (10) and perform a short-time Fourier transform; the short-time Fourier transform may use a Hamming window or another window function.
In general, the short-time Fourier transform can be implemented with an FFT (fast Fourier transform) of NFFT = 512 points, that is:
$F(\omega) = \sum_{m=1}^{N} F(n)\, w(n-m) \exp(-j\omega m) = \begin{bmatrix} \sum_{m=1}^{N} f_1(n)\, w(n-m) \exp(-j\omega m) \\ \vdots \\ \sum_{m=1}^{N} f_M(n)\, w(n-m) \exp(-j\omega m) \end{bmatrix}$ (11)
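As a rough illustration of step 3, the sketch below windows one 32 ms frame per microphone with a Hamming window and evaluates its spectrum at the NFFT = 512 frequencies ω_k = 2πk/NFFT. At a 16 kHz sampling frequency a 32 ms frame is exactly 512 samples. The transform is a naive O(N²) DFT written for readability; a practical system would use an FFT routine. The function names and the two-channel test tone are assumptions of this sketch.

```python
import cmath
import math

def hamming(n_points):
    """Symmetric Hamming window of length n_points."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def frame_spectrum(frame, window):
    """Windowed DFT of one frame, evaluated at omega_k = 2*pi*k/N.

    Naive O(N^2) evaluation for clarity; a real system would use an FFT.
    """
    n_points = len(frame)
    windowed = [x * w for x, w in zip(frame, window)]
    return [sum(windowed[m] * cmath.exp(-2j * math.pi * k * m / n_points)
                for m in range(n_points))
            for k in range(n_points)]

def multichannel_spectrum(frames):
    """Return F(omega) as an M x NFFT matrix, one row per microphone."""
    window = hamming(len(frames[0]))
    return [frame_spectrum(frame, window) for frame in frames]

FS = 16000                       # sampling frequency in Hz
FRAME_LEN = 32 * FS // 1000      # 32 ms frame -> 512 samples
TONE_HZ = 1000.0                 # hypothetical test tone
frames = [
    [math.cos(2 * math.pi * TONE_HZ * n / FS) for n in range(FRAME_LEN)],
    [math.cos(2 * math.pi * TONE_HZ * n / FS - 0.3) for n in range(FRAME_LEN)],
]
F = multichannel_spectrum(frames)
# Strongest bin in the non-negative-frequency half of channel 1.
peak_bin = max(range(FRAME_LEN // 2 + 1), key=lambda k: abs(F[0][k]))
```

Each bin is FS/NFFT = 31.25 Hz wide, so the 1 kHz tone falls on bin 1000/31.25 = 32, which is where `peak_bin` lands.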
Step 4: perform speech detection on the signal frame obtained from the short-time Fourier transform in formula (11). The speech detection technique may, for example, combine zero-crossing rate with short-time energy; the invention does not restrict the speech detection technique used.
When the speech detection determines that the signal frame is a non-speech frame, the frame is stored as the current estimated noise spectrum; the initial value of the current estimated noise spectrum may be set to the zero matrix. When the signal frame is determined to be a speech frame, it is processed as in step 5 below.
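One simple way to combine the two features mentioned in step 4 is sketched below. The text does not specify the decision rule, so the rule here (high energy and low zero-crossing rate) and both threshold values are hypothetical. The features are computed on the time-domain samples of the frame, which is also an assumption of this sketch.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_speech(frame, energy_threshold=1e-3, zcr_threshold=0.35):
    """Hypothetical rule: speech-like frames are energetic and not noise-like.

    Both thresholds are illustrative values, not taken from the source.
    """
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)

# Three 512-sample test frames at a 16 kHz sampling frequency.
voiced_like = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(512)]
silence = [0.0] * 512
hiss_like = [0.5 if n % 2 == 0 else -0.5 for n in range(512)]

decisions = [is_speech(f) for f in (voiced_like, silence, hiss_like)]
```

A frame rejected by `is_speech` would be stored as the current estimated noise spectrum, and an accepted frame would be passed on to the spectral subtraction of step 5.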
Step 5: denoise the speech frame by spectral cancellation, i.e. spectral subtraction.
Let the current estimated noise spectrum be: $N(\omega) = \begin{bmatrix} n_1(1) & \cdots & n_1(NFFT) \\ \vdots & \ddots & \vdots \\ n_M(1) & \cdots & n_M(NFFT) \end{bmatrix}_{M \times NFFT}$ (12)
The speech frame after spectral subtraction is then:
$S(\omega) = F(\omega) - N(\omega) = \begin{bmatrix} s_1(1) & \cdots & s_1(NFFT) \\ \vdots & \ddots & \vdots \\ s_M(1) & \cdots & s_M(NFFT) \end{bmatrix}_{M \times NFFT}$ (13)
Steps 1 to 5 above complete the signal acquisition and preprocessing of the invention. Step 6 below implements the wideband signal processing of the invention.
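Steps 4 and 5 together amount to a small state machine: a non-speech frame replaces the stored noise estimate, and a speech frame has that estimate subtracted from it as in formula (13). The sketch below shows this bookkeeping under the assumption that spectra are stored as M x NFFT nested lists; the class and method names are hypothetical.

```python
class SpectralCanceller:
    """Keeps the current estimated noise spectrum N and applies S = F - N."""

    def __init__(self, num_mics, nfft):
        # The initial noise estimate is the zero matrix, as the text allows.
        self.noise = [[0j] * nfft for _ in range(num_mics)]

    def process(self, F, frame_is_speech):
        """Return S = F - N for a speech frame.

        For a non-speech frame, store F as the new noise estimate and
        return None, since there is nothing to pass on to step 6.
        """
        if not frame_is_speech:
            self.noise = [row[:] for row in F]
            return None
        return [[f - n for f, n in zip(f_row, n_row)]
                for f_row, n_row in zip(F, self.noise)]

# Tiny example with M = 2 microphones and NFFT = 3 bins.
noise_frame = [[1 + 1j, 2 + 0j, 0j], [0j, 1j, 1 + 0j]]
clean_speech = [[3 + 0j, 0j, 1 - 1j], [2j, 2 + 2j, 0j]]
noisy_speech = [[s + n for s, n in zip(s_row, n_row)]
                for s_row, n_row in zip(clean_speech, noise_frame)]

canceller = SpectralCanceller(num_mics=2, nfft=3)
first = canceller.process(noise_frame, frame_is_speech=False)
recovered = canceller.process(noisy_speech, frame_is_speech=True)
```

In this idealized example the noise is identical in both frames, so `recovered` equals `clean_speech`. With real noise the subtraction only removes the stored estimate, not the actual noise in the current frame.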
Step 6: divide S(ω) in formula (13) into several subbands according to the frequency characteristics of the signal; then choose one frequency of interest $\omega_i$ from each subband, where i = 1, ..., K and K is the number of subbands. The frequency $\omega_i$ is taken as the centre frequency of its subband.
Let the signal component $S(\omega_i)$ of the i-th subband signal be:
$S(\omega_i) = \begin{bmatrix} S_1(i) \\ \vdots \\ S_M(i) \end{bmatrix}$ (14)
In formula (14), M is the number of microphones in the microphone array.
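The subband component of formula (14) is one column of the M x NFFT matrix S(ω): the values that all M microphones observe at the chosen centre frequency. The sketch below assumes each centre frequency is mapped to its nearest FFT bin; that mapping, the function names, and the example frequencies are assumptions, since the text only says that one frequency of interest is chosen per subband.

```python
def center_bins(center_freqs_hz, fs, nfft):
    """Map each chosen centre frequency to the nearest FFT bin index."""
    return [round(f * nfft / fs) for f in center_freqs_hz]

def subband_vectors(S, bins):
    """For each bin k, return the M-vector [S_1(k), ..., S_M(k)] of formula (14)."""
    return [[row[k] for row in S] for k in bins]

# Hypothetical example: M = 2 microphones, NFFT = 512, fs = 16 kHz, K = 3.
NFFT = 512
S = [[complex(m, k) for k in range(NFFT)] for m in range(2)]  # dummy spectrum
bins = center_bins([500.0, 1000.0, 2000.0], fs=16000, nfft=NFFT)
subbands = subband_vectors(S, bins)
```

Here the three centre frequencies land on bins 16, 32, and 64, and each entry of `subbands` is the two-element vector that the later correlation and weighting steps operate on.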
Each subband in formula (14) is processed in the following four respects:
1. Obtain the frequency-domain correlation matrix R(i) of the speech frame: $R(i) = E\{S(\omega_i)\, S^H(\omega_i)\}$ (15)
2. Obtain the position vector $a(r_0, \theta_0, \phi_0)$ of the signal source:
Let the coordinate vector of the i-th microphone be $P_i$ and the coordinate vector of the source position $(r_0, \theta_0, \phi_0)$ be S. As shown in Fig. 3, $r_0$ is the distance from the signal source to the coordinate origin, $\theta_0$ is the angle between the signal source and the Z axis of the three-dimensional coordinate system, and $\phi_0$ is the angle between the projection of the signal source on the XOY plane and the X axis. That is:
$S = r_0 [\sin\theta_0\cos\phi_0 \;\; \sin\theta_0\sin\phi_0 \;\; \cos\theta_0]$ (16)
$P_i = r_i [\sin\theta_i\cos\phi_i \;\; \sin\theta_i\sin\phi_i \;\; \cos\theta_i]$ (17)
The relative amplitude attenuation factor $\partial_i$ from the source position $(r_0, \theta_0, \phi_0)$ to the i-th microphone is:
$\partial_i = \frac{\|S\|}{\|P_i - S\|}$ (18)
The relative delay factor $\tau_i$ from the source position $(r_0, \theta_0, \phi_0)$ to the i-th microphone is:
$\tau_i = \frac{\|S - P_i\| - \|S\|}{c}$ (19)
In formula (19), c is the speed of sound in air, which can be taken as 340 m/s at room temperature, and $\|\cdot\|$ denotes the norm of a vector: for a vector a = [x y z], $\|a\| = \sqrt{x^2 + y^2 + z^2}$.
The position vector $a(r_0, \theta_0, \phi_0)$ of the source position $(r_0, \theta_0, \phi_0)$ is:
$a(r_0, \theta_0, \phi_0) = \begin{bmatrix} \partial_1 e^{-j\omega_i\tau_1} \\ \vdots \\ \partial_m e^{-j\omega_i\tau_m} \\ \vdots \\ \partial_M e^{-j\omega_i\tau_M} \end{bmatrix}$ (20)
3. Obtain the optimal weight vector $W_{opt}^i$ of the i-th subband:
From the position vector $a(r_0, \theta_0, \phi_0)$ of the signal source given by formula (20) and the frequency-domain correlation matrix R(i) given by formula (15), the optimal weight vector $W_{opt}^i$ of the i-th subband is:
$W_{opt}^i = \frac{R(i)^{-1} a}{a^H R(i)^{-1} a}$ (21)
4. Spatially filter the subband signal with the optimal weight vector to obtain the frequency-domain output signal of the i-th subband:
$y(\omega_i) = (W_{opt}^i)^H S(\omega_i)$ (22)
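Formulas (21) and (22) can be sketched in plain Python as follows. Rather than forming the inverse of R(i) explicitly, the sketch solves the linear system R x = a by Gaussian elimination, which gives the same product. The optional `loading` term added to the diagonal is an assumption of this sketch, not part of the text; it is a common safeguard when the estimated correlation matrix is close to singular. All function names and the three-microphone example are hypothetical.

```python
def solve(A, b):
    """Solve A x = b for a complex M x M matrix A (Gaussian elimination
    with partial pivoting)."""
    size = len(b)
    aug = [list(A[r]) + [b[r]] for r in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def mvdr_weights(R, a, loading=0.0):
    """Formula (21): W = R^{-1} a / (a^H R^{-1} a).

    `loading` (added to the diagonal of R) is an assumption of this sketch.
    """
    size = len(a)
    loaded = [[R[p][q] + (loading if p == q else 0.0) for q in range(size)]
              for p in range(size)]
    r_inv_a = solve(loaded, a)
    denom = inner(a, r_inv_a)
    return [v / denom for v in r_inv_a]

def subband_output(W, S):
    """Formula (22): y = W^H S."""
    return inner(W, S)

# Hypothetical 3-microphone subband: wanted source a, interferer d.
a = [1 + 0j, 1j, -1 + 0j]
d = [1 + 0j, -1j, -1 + 0j]
P_INT, SIGMA2 = 10.0, 0.01          # interferer power, sensor noise power
R = [[P_INT * d[p] * d[q].conjugate() + (SIGMA2 if p == q else 0.0)
      for q in range(3)] for p in range(3)]
W = mvdr_weights(R, a)
y = subband_output(W, [2 * v for v in a])   # source component of amplitude 2
```

The weights pass the wanted source undistorted, so `inner(W, a)` is 1 and `y` is 2, while the response to the interferer, `inner(W, d)`, is strongly attenuated.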
In the formulas above, the positions of the microphones and of the signal source are expressed in three-dimensional spatial coordinates, so the method of the invention can be used with a microphone array of arbitrary topology; the microphone array is not limited to circular arrays, linear arrays, and the like. Because the position information of the microphones and of the signal source is three-dimensional, the filtering technique of the invention is a three-dimensional spatial filtering technique. However, when the microphones form a one-dimensional array such as a ULA (uniform linear array), the three-dimensional filtering property is lost.
The method used in the invention to obtain the direction vector is applicable to microphone arrays of arbitrary topology.
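Because formulas (16) to (20) only use distances between points, the same code handles any array geometry. The sketch below converts the spherical coordinates of the source and of each microphone to Cartesian coordinates and builds the position vector of formula (20) for one subband. The attenuation factor written ∂_i in the text is called `alpha` in the code. The function names and the example geometry are hypothetical, and `omega` is assumed to be an angular frequency in rad/s so that ω·τ is a phase in radians.

```python
import cmath
import math

def spherical_to_cartesian(r, theta, phi):
    """Formulas (16)/(17): r*[sin(theta)cos(phi), sin(theta)sin(phi), cos(theta)]."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def norm(v):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(x * x for x in v))

def propagation_vector(source, mics, omega, c=340.0):
    """Formula (20): element m is alpha_m * exp(-j * omega * tau_m).

    `source` and each entry of `mics` are (r, theta, phi) triples;
    `omega` is assumed to be in rad/s and `c` in m/s.
    """
    S = spherical_to_cartesian(*source)
    norm_S = norm(S)
    a = []
    for mic in mics:
        P = spherical_to_cartesian(*mic)
        dist = norm([s - p for s, p in zip(S, P)])
        alpha = norm_S / dist             # formula (18)
        tau = (dist - norm_S) / c         # formula (19)
        a.append(alpha * cmath.exp(-1j * omega * tau))
    return a

# Hypothetical geometry: source 1 m from the origin on the X axis; one
# microphone at the origin (the reference) and one 0.5 m along the X axis.
source = (1.0, math.pi / 2, 0.0)
mics = [(0.0, 0.0, 0.0), (0.5, math.pi / 2, 0.0)]
a = propagation_vector(source, mics, omega=2 * math.pi * 1000.0)
```

The reference microphone at the origin gets the element 1 (no relative attenuation or delay). The second microphone is half as far from the source, so its element has magnitude 2 and a phase advance, which is the near-field behaviour that a plane-wave direction vector cannot represent.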
Steps 7 and 8 below form the output signal processing part.
Step 7: combine the frequency-domain output signals of the K subbands into the frequency-domain output signal Y(ω):
$Y(\omega) = [y(\omega_1)\; y(\omega_2)\; \cdots\; y(\omega_K)]^T$ (23)
Step 8: apply an inverse FFT to Y(ω) to obtain the output signal Y(n); then convert Y(n) into an analog signal y(t) and low-pass filter y(t) to obtain the speech output signal.
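The text does not say how the K subband outputs are laid out before the inverse FFT. One plausible reading, sketched below purely as an assumption, places each y(ω_i) at the FFT bin of its centre frequency, mirrors it to the conjugate bin so that the time-domain frame is real, and leaves all other bins at zero. The inverse transform is again a naive DFT for readability, and the function names are hypothetical.

```python
import cmath
import math

def assemble_spectrum(outputs, bins, nfft):
    """Place each subband output at its bin and mirror it to the conjugate
    bin (ASSUMED layout; the source does not specify it)."""
    Y = [0j] * nfft
    for y, k in zip(outputs, bins):
        Y[k] = y
        if 0 < k < nfft // 2:
            Y[nfft - k] = y.conjugate()   # keeps the time signal real
    return Y

def inverse_dft(Y):
    """Naive inverse DFT returning the real part of each time sample."""
    n_points = len(Y)
    return [sum(Y[k] * cmath.exp(2j * math.pi * k * n / n_points)
                for k in range(n_points)).real / n_points
            for n in range(n_points)]

# Tiny example: NFFT = 16, one subband at bin 2 with output value 8.
Y = assemble_spectrum([8 + 0j], bins=[2], nfft=16)
frame = inverse_dft(Y)
```

In this example the reconstructed frame is cos(πn/4): a single subband output becomes a sinusoid at the centre frequency of that subband. A complete system would also need to join successive frames, for example by overlap-add, which the text does not describe.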
Broadband beams based on microphone array provided by the invention forms device and mainly comprises: divide subband signal module, frequency domain correlation matrix module, weight vector module and output module.The function of dividing the subband signal module is realized by sampling submodule, speech detection submodule, spectrum counteracting denoising submodule and division subband signal submodule.
The sampling submodule is mainly used in according to predetermined sampling frequency the signal of importing each microphone is carried out the AD conversion, then, chooses signal frame and carry out short time discrete Fourier transform from the signal after the AD conversion.Above-mentioned sample frequency can be 16KHz, 22KHz or 44Khz etc., sampling precision can for: 8bit, 16bit or 32bit etc., short time discrete Fourier transform can realize by 512 the FFT of NFFT and short time discrete Fourier transform can be selected Hamming window or other window function etc. for use.The description of F in specific implementation process such as the method (n) and F (ω).
The speech detection submodule is mainly used in signal frame that the sampling submodule is handled, behind the Fourier transform and carries out speech detection, when definite signal frame is not speech frame, this signal frame is stored as current estimating noise spectrum, when definite signal frame is speech frame, this speech frame is transferred to spectrum offset the denoising submodule.The speech detection technology that the speech detection submodule adopts can combine etc. for zero-crossing rate, short-time energy.
The spectral subtraction denoising submodule applies spectral subtraction denoising to the speech frames received from the speech detection submodule, using the current estimated noise spectrum stored by the speech detection submodule, and passes the denoised speech frames to the subband division submodule. The spectral subtraction process and the resulting denoised signal are as described for $S(\omega)$ in the method above.
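The sketch below shows one common variant of spectral subtraction: the noise magnitude is subtracted from the magnitude of the noisy spectrum, negative results are clamped to zero, and the noisy phase is kept. This is an assumption for illustration; the method above states the operation simply as the difference between the frame spectrum and the estimated noise spectrum. The function name `spectral_subtract` is hypothetical.

```python
import numpy as np

def spectral_subtract(F, noise_mag):
    """Magnitude spectral subtraction, keeping the phase of F.

    F         : complex spectrum of a speech frame, shape (M, NFFT)
    noise_mag : magnitude of the stored noise spectrum, same shape
    """
    mag = np.maximum(np.abs(F) - noise_mag, 0.0)   # clamp negative magnitudes
    return mag * np.exp(1j * np.angle(F))          # reattach the noisy phase

# Tiny hypothetical example: one microphone, two frequency bins
F = np.array([[3 + 4j, 1 + 0j]])
noise_mag = np.array([[1.0, 2.0]])
S = spectral_subtract(F, noise_mag)
```

In the first bin the magnitude drops from 5 to 4 with the phase unchanged; in the second bin the noise estimate exceeds the signal, so the bin is set to zero.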
The subband division submodule divides each speech frame received from the spectral subtraction denoising submodule into a plurality of subband signals according to predetermined frequency bands, and passes each subband signal to the frequency-domain correlation matrix module and the output module. The subband signals of a speech frame are as described for $S(\omega_i)$ in the method above.
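One way to realize the subband division is to represent each subband by the FFT bin nearest its center frequency, which yields one M-element vector per subband. That choice, and the function name `split_subbands`, are assumptions made for this sketch; the text only requires K predetermined bands with center frequencies.

```python
import numpy as np

def split_subbands(S, fs, center_freqs_hz):
    """Pick one FFT bin per subband.

    S               : denoised spectrum, shape (M, NFFT)
    fs              : sampling frequency in Hz
    center_freqs_hz : K subband center frequencies in Hz
    Returns (subbands, bins) with subbands of shape (K, M).
    """
    nfft = S.shape[1]
    bins = np.rint(np.asarray(center_freqs_hz) * nfft / fs).astype(int)
    return S[:, bins].T, bins                  # row i holds S(w_i) for all mics

# Hypothetical spectrum whose entries encode their own position
S = np.arange(4 * 512).reshape(4, 512).astype(complex)
subbands, bins = split_subbands(S, fs=16000, center_freqs_hz=[500, 1000])
```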
The frequency-domain correlation matrix module determines the frequency-domain correlation matrix of each subband signal it receives and passes it to the weight vector module. The frequency-domain correlation matrix $R(i)$ is obtained as described in the method above.
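In practice the expectation in the correlation matrix has to be estimated. The sketch below approximates it by averaging the outer products of the subband vectors over several frames. The averaging, the optional diagonal loading term, and the function name `correlation_matrix` are assumptions, not details given in the text.

```python
import numpy as np

def correlation_matrix(snapshots, loading=0.0):
    """Estimate R = E{S S^H} for one subband.

    snapshots : shape (T, M), row t is the subband vector of frame t
    loading   : optional diagonal loading to keep R invertible (assumption)
    """
    X = np.asarray(snapshots, dtype=complex)
    T, M = X.shape
    R = X.T @ X.conj() / T                     # average of s_t s_t^H over frames
    return R + loading * np.eye(M)

# Single hypothetical snapshot s = [1, j]^T
R = correlation_matrix([[1, 1j]])
```

With a single snapshot the estimate has rank one, so some loading or more frames would be needed before inverting it.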
The weight vector module determines the optimal weight vector $W_{opt}^{i}$ of each subband signal from the three-dimensional spatial transfer vector $a(r_0, \theta_0, \phi_0)$ of the signal source and each frequency-domain correlation matrix $R(i)$ it receives, and passes $W_{opt}^{i}$ to the output module. The three-dimensional spatial transfer vector $a(r_0, \theta_0, \phi_0)$ of the signal source and the optimal weight vector $W_{opt}^{i}$ are obtained as described in the method above.
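The following sketch computes the transfer vector from the relative amplitude attenuation and relative delay of each microphone, then the optimal weight vector as the inverse correlation matrix applied to the transfer vector, normalized by the quadratic form of the transfer vector. The speed of sound (343 m/s), the use of angular frequency in rad/s, the microphone layout, and all function names are assumptions for illustration.

```python
import numpy as np

def sph_to_cart(r, theta, phi):
    """Spherical (r, theta from Z axis, phi from X axis) to Cartesian."""
    return r * np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

def transfer_vector(src, mics, omega, c=343.0):
    """Near-field transfer vector for one subband.

    src   : source position, shape (3,)
    mics  : microphone positions, shape (M, 3)
    omega : subband center frequency in rad/s (assumption)
    """
    dist = np.linalg.norm(mics - src, axis=1)  # ||P_i - S||
    ref = np.linalg.norm(src)                  # ||S||
    alpha = ref / dist                         # relative amplitude attenuation
    tau = (dist - ref) / c                     # relative delay
    return alpha * np.exp(-1j * omega * tau)

def optimal_weights(R, a):
    """W = R^{-1} a / (a^H R^{-1} a), computed without an explicit inverse."""
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)

# Hypothetical geometry: source 1 m away on the X axis, 4 microphones
src = sph_to_cart(1.0, np.pi / 2, 0.0)
mics = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [-0.1, 0, 0]], dtype=float)
a = transfer_vector(src, mics, omega=2 * np.pi * 1000)
R = np.eye(4) + 0.1 * np.ones((4, 4))          # hypothetical Hermitian R(i)
W = optimal_weights(R, a)
```

For a Hermitian correlation matrix these weights pass the source direction with unit gain, which the check of the weights against the transfer vector confirms.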
The output module applies subband spatial filtering to each subband signal received from the subband division submodule, using the weight vector of each subband signal received from the weight vector module, to obtain the frequency-domain output signal of each subband. It combines the frequency-domain output signals of the K subbands into a single frequency-domain output signal, applies an inverse FFT to it, converts the result to an analog signal, and low-pass filters that analog signal; the filtered signal is the speech signal to be output.
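The subband spatial filtering performed by the output module is an inner product between the conjugated weight vector and the subband signal vector. A minimal sketch, assuming the weights and subband signals are stored as (K, M) arrays and using the hypothetical name `subband_spatial_filter`:

```python
import numpy as np

def subband_spatial_filter(weights, subbands):
    """Return y with y[i] = W_i^H S_i for each of the K subbands.

    weights  : shape (K, M), row i is the weight vector of subband i
    subbands : shape (K, M), row i is the signal vector of subband i
    """
    weights = np.asarray(weights, dtype=complex)
    subbands = np.asarray(subbands, dtype=complex)
    return np.sum(weights.conj() * subbands, axis=1)

# One subband, two microphones: conj(1)*2 + conj(1j)*3 = 2 - 3j
y = subband_spatial_filter([[1, 1j]], [[2, 3]])
```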
Although the invention has been described by way of embodiments, those of ordinary skill in the art will appreciate that the invention admits many modifications and variations without departing from its spirit, and the claims of this application are intended to cover such modifications and variations.

Claims (11)

1. A broadband beamforming method, characterized by comprising the steps of:
a. determining the subband signals respectively corresponding to the signal input to each microphone;
b. determining the frequency-domain correlation matrix of each subband signal;
c. determining the weight vector of each subband signal according to the three-dimensional spatial transfer vector of the signal source and each frequency-domain correlation matrix;
d. determining the output signal according to the weight vector of each subband signal and each subband signal.
2. The broadband beamforming method according to claim 1, characterized in that step a specifically comprises the steps of:
a1. performing speech detection on the signal input to each microphone, and determining speech frames;
a2. determining the subband signals corresponding to each speech frame.
3. The broadband beamforming method according to claim 2, characterized in that:
the signal input to the microphones is set as: $F(t) = [\, f_1(t) \;\cdots\; f_i(t) \;\cdots\; f_M(t) \,]^{T}$,
where $f_i(t)$ denotes the signal received by the $i$-th microphone, $i = 1, \ldots, M$, $M$ is the number of microphones, and $[\,\cdot\,]^{T}$ denotes the transpose of a matrix or vector; and step a1 specifically comprises the steps of:
a11. performing A/D conversion on the signal input to each microphone at a predetermined sampling frequency:
$F(n) = [\, f_1(n) \;\cdots\; f_i(n) \;\cdots\; f_M(n) \,]^{T}$, where $n$ is discrete time;
a12. selecting signal frames from the A/D-converted signal and applying a short-time Fourier transform:
$$F(\omega) = \sum_{m=1}^{N} F(n)\, w(n-m)\, \exp(-j\omega m) = \begin{bmatrix} \sum_{m=1}^{N} f_1(n)\, w(n-m)\, \exp(-j\omega m) \\ \vdots \\ \sum_{m=1}^{N} f_M(n)\, w(n-m)\, \exp(-j\omega m) \end{bmatrix},$$
where $w(n)$ is a window function and $n$, $m$ are discrete time;
a13. performing speech detection on the Fourier-transformed signal frames, and determining speech frames.
4. The broadband beamforming method according to claim 3, characterized in that step a13 specifically comprises the steps of:
performing speech detection on the Fourier-transformed signal frames;
when a signal frame is determined not to be a speech frame, storing the signal frame as the current estimated noise spectrum;
when a signal frame is determined to be a speech frame, applying spectral subtraction denoising to the speech frame according to the current estimated noise spectrum, the denoised speech frame $S(\omega)$ being:
$$S(\omega) = F(\omega) - N(\omega) = \begin{bmatrix} s_1(1) & \cdots & s_1(NFFT) \\ \vdots & \ddots & \vdots \\ s_M(1) & \cdots & s_M(NFFT) \end{bmatrix}_{M \times NFFT},$$
where
$$N(\omega) = \begin{bmatrix} n_1(1) & \cdots & n_1(NFFT) \\ \vdots & \ddots & \vdots \\ n_M(1) & \cdots & n_M(NFFT) \end{bmatrix}_{M \times NFFT}$$
is the current estimated noise spectrum, NFFT is the number of frequency sample points of the short-time Fourier transform, $F(\omega)$ is the signal frame after the short-time Fourier transform, and $M$ is the number of microphones.
5. The broadband beamforming method according to claim 2, 3 or 4, characterized in that step a2 specifically comprises the steps of:
dividing the speech frame into K subband signals according to K predetermined frequency bands, and taking K predetermined frequencies $\omega_i$, $i = 1, \ldots, K$, as the center frequencies of the subbands;
determining the signal component $S(\omega_i)$ of the $i$-th subband as:
$$S(\omega_i) = \begin{bmatrix} S_1(i) \\ \vdots \\ S_M(i) \end{bmatrix},$$
where $M$ is the number of microphones in the microphone array, $i = 1, \ldots, K$, and $K$ is the number of subbands.
6. The broadband beamforming method according to claim 5, characterized in that step b specifically comprises:
determining the frequency-domain correlation matrix $R(i)$ of each subband signal:
$$R(i) = E\{\, S(\omega_i)\, S^{H}(\omega_i) \,\},$$
where $^{H}$ denotes the conjugate transpose of a matrix or vector, $S(\omega_i)$ is the signal component of the $i$-th subband, and
$$S(\omega_i) = \begin{bmatrix} S_1(i) \\ \vdots \\ S_M(i) \end{bmatrix}.$$
7. The broadband beamforming method according to claim 5, characterized in that the three-dimensional spatial transfer vector of the signal source in step c is obtained as follows:
c1. obtaining the coordinate vector of the source location $(r_0, \theta_0, \phi_0)$: $S = r_0 \cdot [\, \sin\theta_0 \cos\phi_0 \;\; \sin\theta_0 \sin\phi_0 \;\; \cos\theta_0 \,]$;
c2. obtaining the coordinate vector of each microphone: $P_i = r_i \cdot [\, \sin\theta_i \cos\phi_i \;\; \sin\theta_i \sin\phi_i \;\; \cos\theta_i \,]$;
c3. determining the relative amplitude attenuation factor $\alpha_i$ from the source location $(r_0, \theta_0, \phi_0)$ to the $i$-th microphone as:
$$\alpha_i = \frac{\| S \|}{\| P_i - S \|},$$
where $\| \cdot \|$ denotes the norm of a vector;
c4. determining the relative delay factor $\tau_i$ from the source location $(r_0, \theta_0, \phi_0)$ to the $i$-th microphone as:
$$\tau_i = \frac{\| S - P_i \| - \| S \|}{c},$$
where $c$ is the propagation speed of sound in air and $\| \cdot \|$ denotes the norm of a vector;
c5. determining the three-dimensional spatial transfer vector $a(r_0, \theta_0, \phi_0)$ of the source location $(r_0, \theta_0, \phi_0)$ as:
$$a(r_0, \theta_0, \phi_0) = \begin{bmatrix} \alpha_1 e^{-j\omega_i \tau_1} \\ \vdots \\ \alpha_m e^{-j\omega_i \tau_m} \\ \vdots \\ \alpha_M e^{-j\omega_i \tau_M} \end{bmatrix},$$
where $\omega_i$ is the center frequency of each subband, $r_0$ is the distance from the signal source to the coordinate origin, $\theta_0$ is the angle between the signal source and the Z axis of the three-dimensional coordinate system, and $\phi_0$ is the angle between the projection of the signal source onto the XOY plane and the X axis.
8. The broadband beamforming method according to claim 1, 2, 3 or 4, characterized in that step c specifically comprises:
determining the optimal weight vector $W_{opt}^{i}$ of the $i$-th subband as:
$$W_{opt}^{i} = \frac{R(i)^{-1}\, a}{a^{H}\, R(i)^{-1}\, a},$$
where $R(i)$ is the frequency-domain correlation matrix of the $i$-th subband signal, and $a$ is the three-dimensional spatial transfer vector of the source location $(r_0, \theta_0, \phi_0)$.
9. The broadband beamforming method according to claim 1, 2, 3 or 4, characterized in that step d specifically comprises the steps of:
performing subband spatial filtering on each subband signal according to the optimal weight vector of each subband signal, to obtain the frequency-domain output signal $y(\omega_i)$ of the $i$-th subband:
$$y(\omega_i) = (W_{opt}^{i})^{H}\, S(\omega_i),$$
where $^{H}$ denotes the conjugate transpose of a vector or matrix, $W_{opt}^{i}$ is the optimal weight vector of the $i$-th subband, and $S(\omega_i)$ is the signal component of the $i$-th subband;
combining the frequency-domain output signals of the subbands into $Y(\omega)$: $Y(\omega) = [\, y(\omega_1) \;\; y(\omega_2) \;\; \cdots \;\; y(\omega_K) \,]^{T}$;
applying an inverse fast Fourier transform to the combined frequency-domain output signal $Y(\omega)$ to obtain the output signal $Y(n)$;
converting $Y(n)$ to an analog signal $y(t)$, and low-pass filtering $y(t)$, the filtered signal being the speech signal to be output.
10. A broadband beamforming apparatus, characterized by comprising:
a subband division module, which determines the subband signals respectively corresponding to the signal input to each microphone, and passes each subband signal to the frequency-domain correlation matrix module;
a frequency-domain correlation matrix module, which determines the frequency-domain correlation matrix of each subband signal, and passes it to the weight vector module;
a weight vector module, which determines the weight vector of each subband signal according to the three-dimensional spatial transfer vector of the signal source and each frequency-domain correlation matrix, and passes it to the output module;
an output module, which determines the output signal according to the weight vector of each subband signal and each subband signal.
11. The broadband beamforming apparatus according to claim 10, characterized in that the subband division module comprises:
a sampling submodule, which performs A/D conversion on the signal input to each microphone at a predetermined sampling frequency, selects signal frames from the A/D-converted signal, and applies a short-time Fourier transform;
a speech detection submodule, which performs speech detection on the Fourier-transformed signal frames, stores a signal frame as the current estimated noise spectrum when it is determined not to be a speech frame, and passes a signal frame to the spectral subtraction denoising submodule when it is determined to be a speech frame;
a spectral subtraction denoising submodule, which applies spectral subtraction denoising to the speech frames it receives according to the current estimated noise spectrum, and passes them to the subband division submodule;
a subband division submodule, which divides the speech frames it receives into a plurality of subband signals according to predetermined frequency bands, and passes each subband signal to the frequency-domain correlation matrix module.
CNB200510090740XA 2005-08-15 2005-08-15 Broadband wave beam forming method and apparatus Expired - Fee Related CN100466061C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200510090740XA CN100466061C (en) 2005-08-15 2005-08-15 Broadband wave beam forming method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB200510090740XA CN100466061C (en) 2005-08-15 2005-08-15 Broadband wave beam forming method and apparatus

Publications (2)

Publication Number Publication Date
CN1866356A CN1866356A (en) 2006-11-22
CN100466061C true CN100466061C (en) 2009-03-04

Family

ID=37425362

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200510090740XA Expired - Fee Related CN100466061C (en) 2005-08-15 2005-08-15 Broadband wave beam forming method and apparatus

Country Status (1)

Country Link
CN (1) CN100466061C (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324237B (en) * 2011-05-30 2013-01-02 深圳市华新微声学技术有限公司 Microphone-array speech-beam forming method as well as speech-signal processing device and system
JP2015502524A (en) * 2011-11-04 2015-01-22 ブリュエル アンド ケアー サウンド アンド ヴァイブレーション メジャーメント エー/エス Computationally efficient broadband filter and sum array focusing
JP6162220B2 (en) * 2012-04-27 2017-07-12 ソニーモバイルコミュニケーションズ, エービー Noise suppression based on sound correlation in microphone arrays
CN104768099B (en) * 2014-01-02 2018-02-13 中国科学院声学研究所 Mode Beam-former and frequency domain bandwidth realization method for annular battle array
CN103873977B (en) * 2014-03-19 2018-12-07 惠州Tcl移动通信有限公司 Recording system and its implementation based on multi-microphone array beam forming
CN105590631B (en) * 2014-11-14 2020-04-07 中兴通讯股份有限公司 Signal processing method and device
CN105848062B (en) * 2015-01-12 2018-01-05 芋头科技(杭州)有限公司 The digital microphone of multichannel
CN108447499B (en) * 2018-04-18 2020-08-04 佛山市顺德区中山大学研究院 Double-layer circular-ring microphone array speech enhancement method
CN108717855B (en) * 2018-04-27 2020-07-28 深圳市沃特沃德股份有限公司 Noise processing method and device
CN109166590B (en) * 2018-08-21 2020-06-30 江西理工大学 Two-dimensional time-frequency mask estimation modeling method based on spatial correlation
CN110111807B (en) * 2019-04-27 2022-01-11 南京理工大学 Microphone array-based indoor sound source following and enhancing method
CN110333504B (en) * 2019-07-16 2022-11-18 哈尔滨工程大学 Space-time two-dimensional filtering fast broadband beam forming method
CN111413649B (en) * 2020-04-23 2021-07-06 中国科学技术大学 Large-scale reactor fault detection method and system based on near-field broadband beam forming
CN111650556B (en) * 2020-06-15 2023-09-01 中国人民解放军国防科技大学 Broadband radiation source parameter estimation method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6154552A (en) * 1997-05-15 2000-11-28 Planning Systems Inc. Hybrid adaptive beamformer

Also Published As

Publication number Publication date
CN1866356A (en) 2006-11-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090304

Termination date: 20170815
