CAPTURING SOUND FROM A TARGET REGION
FIELD OF THE INVENTION
The present invention relates to processing sound signals to capture, and to capturing, sound originating from a specific target region. This can be achieved through the use of interference cancellation.
BACKGROUND
The efficient and effective acquisition of sounds while in very noisy environments has been a long sought-after goal. To some extent, many of the problems associated with such acquisition can be overcome by the use of highly directional microphones and microphone arrays. However, while both approaches improve directionality in acquiring desired sounds, they are unable to take into account any distance information, and thus to concentrate on a specific local sound source. Close-talking microphones exist but there are obvious problems with them, not least in terms of the need for their physical proximity to the sound source. How to capture a local sound source while excluding other interference is still an open problem in both acoustic engineering and communications systems.
Interference cancellation has been studied for some time in the area of microphone arrays, and similar issues have been studied for much longer in the antenna and sonar fields. Among the various solutions put forward, adaptive beamforming techniques are regarded as an effective way of achieving much better interference cancellation than is achieved by conventional fixed beamforming (delay-and-sum). Generally speaking, the final aim of adaptive beamforming is to obtain an enhanced signal of interest and to seek an optimal spatial performance under some constraint.
In most existing proposals and systems, any desired sound sources are located in a certain direction or in a certain range of directions (a beam), with no consideration of distance. Since the distance cannot be confined, interference within the same direction or range of directions as the desired signal is not discriminated and thus cannot be cancelled.
Earlier methods sought to achieve the highest signal-to-interference ratio in only one direction. Later, so-called robust beamformers sought to solve the target-signal cancellation problem encountered in the earlier methods. More recently, O. Hoshuyama, A. Sugiyama and A. Hirano proposed "A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters" [IEEE Transactions on Signal Processing, Vol. 47, no. 10, pp. 2677-2684, Oct. 1999]. The purpose of this is to target a desired sound source to within a certain direction range.
Arrays of microphones can be combined, for a unidirectional approach. For example, a dual microphone scheme is described by Elko G. W. and Anh-Tho Nguyen Pong, in "A steerable and variable first-order differential microphone array", ICASSP-97, pp. 223-226, vol. 1, where each array has two elements only. However, this and most other research in this field focuses on improvements in direction selectivity only.
Electronic directional control of microphones to track sound is proposed by M.S. Brandstein, J.E. Adcock and H.F. Silverman, in "A closed-form location estimation for use with room environment microphone arrays", IEEE Trans. on Signal Processing, vol. 5, no. 1, pp. 1583-1590, Jan. 1997.
Interference cancellation is also used in active noise control, as proposed in US Patent Publication No. 5,699,437, issued on 16 December 1997 to Finn, entitled "Active noise control system using phased-array sensors". It proposes using two orthogonal microphone arrays to capture the sound in a target space, the sound from which is to be quieted. The signals from the microphones of the two arrays are input to a beamforming and beam steering logic, the output of which is input to an active noise control logic. The active noise control logic then controls acoustic speakers to generate anti-noise to cancel the noise in the target region.
Such a proposed system uses a simple delay-and-sum beamformer to enhance the desired signal, and appears to assume there is no interference in the environment. Further, the proposed system simply sums up all the output of multiple microphone arrays, without cancelling any interference which falls into the two main side lobes of each array. Thus all the sounds within the two main lobes would be included in the final output.
R. Renomeron, D. Rabinkin, J. French, and J. Flanagan, "Small-Scale Matched Filter Array Processing for Spatially Selective Sound Capture", J. Acous. Soc. Am., Vol. 102, No. 5 Pt. 2, p. 3208, Nov. 1997 (134th Meeting of the Acoustical Society of America, Dec. 1997) proposed a so-called Matched Filter Array (MFA), which was claimed to possess the ability of three-dimensional sound source selectivity. The MFA exploits a known geometric relationship between the sound source and array sensors and the associated set of acoustic transfer functions, in order to perform a time-aligned addition of the source signal components of the captured signals, as described in the delay-and-sum beamforming approach. Thus the method is intended for use in a single sound source scenario. The exact location of the sound source and the microphone array sensors should be known in advance. Therefore there is actually no interference cancellation function. The method is claimed to be "especially effective in highly reverberant environments where the primary source of interference is the reflected energy of the source signal".
In practice it is difficult to determine source localisation as accurately as required. Additionally, it is difficult to estimate the acoustic transfer function and its inverse transfer function precisely, because both the source location and the acoustic transfer function vary all the time in most real applications. Even a small deviation from the real transfer function would degrade the sound quality dramatically rather than improve it, due to the system's sensitivity to the transfer function.
SUMMARY OF THE INVENTION
According to a first aspect of the present invention, there is provided apparatus for processing sound signals to capture sound originating from a target region of specific limited length and width, within a space comprising a plurality of regions. The apparatus comprises first outputting means, first secondary output means and second secondary output means. The first outputting means outputs one or more first signals representing sound originating from the plurality of regions including the target region. The first secondary output means outputs one or more first secondary signals representing sound originating from the plurality of regions, the one or more first secondary signals not representing or having reduced representation of sound originating from a first area extending in a first direction within the plurality of regions and including the target region. The second secondary output means outputs one or more second secondary signals representing sound originating from the plurality of regions, the one or more second secondary signals not representing or having reduced representation of sound originating from a second area extending in a second direction within the plurality of regions and including the target region. The first and second areas overlap and define the target region within the area of overlap.
According to a second aspect of the present invention, there is provided a method of processing sound signals to capture sound originating from a target region of specific limited length and width, within a space comprising a plurality of regions. The method comprises providing one or more first signals, providing one or more first secondary signals and providing one or more second secondary signals. The one or more first signals represent sound originating from the plurality of regions including the target region. The one or more first secondary signals represent sound originating from the plurality of regions, the one or more first secondary signals not representing or having reduced representation of sound originating from a first area extending in a first direction within the plurality of regions and including the target region. The one or more second secondary signals represent sound originating from the plurality of regions, the one or more second secondary signals not representing or having reduced representation of sound originating from a second area extending in a second direction within the plurality of regions and including the target region. The first and second areas overlap and define the target region within the area of overlap.
According to further aspects of the present invention, there are provided an apparatus and a method for capturing sound using the above apparatus and method, respectively, together with microphones or microphone arrays for receiving sounds, as appropriate.
According to another aspect of the present invention, there is provided apparatus for capturing sound originating from a target region of specific limited length and width, within a space comprising a plurality of regions. The apparatus comprises means for obtaining a first signal, means for obtaining one or more secondary signals and means for subtracting. The first signal is representative of sound originating from the plurality of regions including the target region. The one secondary signal is or the more than one secondary signals together are representative of the sound originating from the plurality of regions other than the target region. The means for subtracting subtracts the one or more first secondary signals from the first signal to leave a sound signal at least predominantly representative of the sounds originating from the target region.
According to again another aspect of the present invention, there is provided a method of capturing sound originating from a target region of specific limited length and width, within a space comprising a plurality of regions. The method comprises obtaining a first signal, obtaining one or more first secondary signals and subtracting the one or more first secondary signals from the first signal. The first signal is representative of sound originating from the plurality of regions including the target region. The one first secondary signal is, or the more than one first secondary signals together are, representative of the sound originating from the plurality of regions other than the target region. Subtracting the one or more first secondary signals from the first signal leaves a third signal representative of the sounds originating from the target region.
According to yet another aspect of the present invention, there is provided a computer program product having a computer usable medium having a computer readable program code means embodied therein for processing sound signals to capture sound originating from a target region. The computer program product comprises computer readable program code means which, when downloaded onto a computer, renders the computer into apparatus according to the first aspect above.
According to yet again another aspect of the present invention, there is provided a computer program product having a computer usable medium having a computer readable program code means embodied therein for processing sound signals to capture sound originating from a target region. The computer program product comprises computer readable program code means for operating according to the method of the second aspect above.
INTRODUCTION TO THE DRAWINGS
The invention is now described by way of non-limitative example, with reference to the accompanying drawings, in which:-
Figure 1 is a schematic illustration of a microphone array system according to a first embodiment of the invention;
Figure 2 is a schematic illustration of a microphone array system according to a second embodiment of the invention;
Figure 3 is a block diagram of a GSC adaptive beamforming structure for use with the microphone array system of Figure 2;
Figure 4 is a schematic illustration of a first variation of the microphone array system of Figure 2;
Figure 5 is a schematic illustration of a second variation of the microphone array system of Figure 2;
Figure 6 is a schematic illustration of a third variation of the microphone array system of Figure 2; and
Figure 7 is a schematic illustration of a microphone array system according to a third embodiment of the invention.
DESCRIPTION
Where the same reference numeral or letter appears in more than one of the Figures, it is used to refer to the same or a similar component.
By way of non-limiting summary, in one embodiment of the invention, the outputs of two or three microphones exclude sounds picked up from within certain areas extending in certain directions from the microphones. These outputs are processed in an adaptive filter. In another embodiment of the invention, the outputs of two or three arrays of microphones are processed to exclude sounds picked up from within certain areas extending in certain directions from the arrays. For either of these two embodiments, a combination of the processed outputs provides a signal representing sounds from all regions in a space, except where the two certain areas overlap, which is a target region. Subtracting the combination of the processed outputs from a signal representing sounds from all regions in the space, including the target region, leaves a signal representing sounds from only the target region.
A First Embodiment
Figure 1 schematically illustrates a microphone array system 10, according to a first embodiment of the invention. The microphone array system 10 is a cross notch microphone array system, achieved using two cardioid microphones 12, 14 and one omni-directional microphone 16, as well as a multiple input adaptive filter 18.
The three microphones 12, 14, 16 are arranged side by side, in this embodiment in a straight line, with the omni-directional microphone 16 in the middle. The polar patterns for the three microphones are shown. The first cardioid microphone 12 has a generally cardioid polar pattern 20, with a null in a specific direction, the null direction. The second cardioid microphone 14 similarly has a generally cardioid polar pattern 22 with a null direction. The omni-directional microphone 16 has a generally spherical polar pattern 24.
Null sectors 26, 28 of minimum pick-up for the cardioid microphones 12, 14 extend in cones out from the microphones 12, 14, with sectors of open angle $\alpha_L$ and $\alpha_R$, respectively, around the null direction. The null sectors 26, 28 for the two cardioid microphones 12, 14 in this embodiment have the same angle, although they can be different.
The two cardioid microphones 12, 14 are directed away from each other, and angled such that the null sectors in their polar patterns 20, 22 are directed across and in front of the omni-directional microphone 16. The null directions of the two cardioid microphones are steered at $\beta_L$ and $-\beta_R$ respectively relative to a horizontal axis. The null directions of the two cardioid microphones 12, 14 are steered in such a way that the two null sectors intersect at a common volume to form a closed focal zone 30, which is the target region. Although the zone 30 in Figure 1 appears as a 2-D zone, it represents a closed 3-D volume.
According to the different system requirements, the three microphones can be arranged either in a line or in a triangular shape, either symmetrically or asymmetrically. These changes affect the location and the dimension of the focal zone 30. In this embodiment the arrangement is symmetric.
The intention when using this embodiment is to arrange the microphone array system 10 such that a sound source of interest, producing a target signal, is within the focal zone 30. When the target signal is present in the focal zone, both cardioid microphones 12, 14 have a much lower response compared to that of the omni-directional microphone 16. The cardioid microphones 12, 14 can therefore be viewed as secondary output means, outputting secondary signals not representing or having reduced representation of sound originating in the null sectors and including the target region.
In Figure 1, reference "S" represents the sound signal in the focal zone 30. Reference "B1" represents the sound signal in a volume within the null sector 28 for the second cardioid microphone 14 but to the right side of and outside the null sector 26 for the first cardioid microphone 12 (between the null sector 26 for the first cardioid microphone 12 and the second cardioid microphone 14). Reference "B2" represents the sound signal in a volume within the null sector 28 for the second cardioid microphone 14 but to the left side of and outside the null sector 26 for the first cardioid microphone 12 (on the other side of the null sector 26 for the first cardioid microphone 12 from the second cardioid microphone 14). Reference "C1" represents the sound signal in a volume within the null sector 26 for the first cardioid microphone 12 but to the left side of and outside the null sector 28 for the second cardioid microphone 14 (between the null sector 28 for the second cardioid microphone 14 and the first cardioid microphone 12). Reference "C2" represents the sound signal in a volume within the null sector 26 for the first cardioid microphone 12 but to the right side of and outside the null sector 28 for the second cardioid microphone 14 (on the other side of the null sector 28 for the second cardioid microphone 14 from the first cardioid microphone 12). Reference "D" represents the sound signals from everywhere else.
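The partition of the space into the regions S, B, C and D can be illustrated with a minimal 2-D sketch. Everything specific in it is an assumption made for illustration only: the microphone coordinates, the null directions (45° and 135° from the +x axis), the 20° opening angle and the function names `in_null_sector` and `classify` do not come from the description, and the sketch does not distinguish B1 from B2 or C1 from C2.

```python
import math

def in_null_sector(point, mic_pos, null_dir_deg, open_angle_deg):
    """Return True if `point` lies inside a 2-D null sector (a wedge).

    The wedge has its apex at `mic_pos`, is centred on `null_dir_deg`
    (degrees from the +x axis) and has a full opening angle of
    `open_angle_deg`.
    """
    dx = point[0] - mic_pos[0]
    dy = point[1] - mic_pos[1]
    if dx == 0 and dy == 0:
        return False  # the apex itself is left unclassified
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest signed angular difference between bearing and null direction.
    diff = (bearing - null_dir_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= open_angle_deg / 2.0

def classify(point, mic_first, mic_second, dir_first, dir_second,
             alpha_first, alpha_second):
    """Label a point as 'S', 'B', 'C' or 'D' using the letters of Figure 1."""
    in_26 = in_null_sector(point, mic_first, dir_first, alpha_first)
    in_28 = in_null_sector(point, mic_second, dir_second, alpha_second)
    if in_26 and in_28:
        return "S"  # both null sectors: focal zone
    if in_28:
        return "B"  # only the second microphone's null sector
    if in_26:
        return "C"  # only the first microphone's null sector
    return "D"      # everywhere else

# Hypothetical layout: first microphone on the left, second on the right.
MIC_FIRST, MIC_SECOND = (-1.0, 0.0), (1.0, 0.0)
ARGS = (MIC_FIRST, MIC_SECOND, 45.0, 135.0, 20.0, 20.0)
print(classify((0.0, 1.0), *ARGS))
```

With this assumed layout the two wedges cross in front of the central microphone, so a point near (0, 1) falls in the focal zone while points along only one wedge are labelled B or C.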
Basic Principle
The letters O, L and R are used below as a simple notation for distinguishing between the microphones, their outputs and the processed outputs without importing any suggestion or requirement that the arrangement should have a left-hand and/or a right-hand microphone.
The main components of an output signal $x_L(k)$ for the first cardioid microphone 12 are $a_1 B_1 + a_2 B_2 + a_3 D$ (where $a_1$, $a_2$, $a_3$ are the response coefficients of the first cardioid microphone 12 relating to the different directions out from the microphone corresponding to the sound signals indicated). As the sound signals $S$, $C_1$ and $C_2$ fall within the null sector 26 for the first cardioid microphone 12, they do not contribute to these main components.
The main components of an output signal $x_R(k)$ for the second cardioid microphone 14 are $b_1 C_1 + b_2 C_2 + b_3 D$ (where $b_1$, $b_2$, $b_3$ are the response coefficients of the second cardioid microphone 14 relating to the different directions out from the microphone corresponding to the sound signals indicated). As the sound signals $S$, $B_1$ and $B_2$ fall within the null sector 28 for the second cardioid microphone 14, they do not contribute to these main components.
The main components of an output signal $x_O(k)$ for the omni-directional microphone 16 are $c_1 S + c_2 B_1 + c_3 B_2 + c_4 C_1 + c_5 C_2 + c_6 D$ (where $c_1$, $c_2$, $c_3$, $c_4$, $c_5$, $c_6$ are the response coefficients of the omni-directional microphone 16 relating to the different directions out from the microphone corresponding to the sound signals indicated). As this is an omni-directional microphone, all the sound signals contribute to these main components.
In that the outputs from the two cardioid microphones do not include or only have a reduced representation of sounds from their null sectors 26, 28, relative to the signal output from the omni-directional microphone, these may be termed secondary signals.
The output signals $x_L(k)$, $x_R(k)$ and $x_O(k)$ from the three microphones 12, 14, 16 are fed into the multiple input adaptive filter 18. The filter 18 has a first adaptive filter 32 to which is input the output signal $x_L(k)$ from the first cardioid microphone 12. The response from the first adaptive filter 32 is modified according to an adaptive algorithm. The output from the first adaptive filter 32 is a first adapted signal $y_L(k)$, where
$$y_L(k) = W_L\,(a_1 B_1 + a_2 B_2 + a_3 D) \qquad (1)$$
($W_L$ being the filter response for the first adaptive filter 32)
The filter 18 has a second adaptive filter 34 to which is input the output signal $x_R(k)$ from the second cardioid microphone 14. The response from the second adaptive filter 34 is also modified according to the adaptive algorithm. The output from the second adaptive filter 34 is a second adapted signal $y_R(k)$, where
$$y_R(k) = W_R\,(b_1 C_1 + b_2 C_2 + b_3 D) \qquad (2)$$
($W_R$ being the filter response for the second adaptive filter 34)
The inverses of the two adapted signals $y_L(k)$, $y_R(k)$ are input into an adder 36, together with the output signal $x_O(k)$ from the omni-directional microphone 16, to give rise to a filter output signal $z(k)$. Thus
$$\begin{aligned}
z(k) &= x_O(k) - [\,y_L(k) + y_R(k)\,] \\
     &= c_1 S + c_2 B_1 + c_3 B_2 + c_4 C_1 + c_5 C_2 + c_6 D \\
     &\quad - [\,W_L(a_1 B_1 + a_2 B_2 + a_3 D) + W_R(b_1 C_1 + b_2 C_2 + b_3 D)\,] \qquad (3)
\end{aligned}$$
Within the multiple input adaptive filter 18, the highly correlated components between $x_O(k)$ and $x_L(k) + x_R(k)$ are diminished and finally only $S$ is left, which is the desired signal, i.e. ideally
$$z(k) = c_1 S \qquad (4)$$
In practice a portion of the target signal $S$ is picked up by the two cardioid microphones 12, 14 and thus does contribute something to both their output signals $x_L(k)$, $x_R(k)$. However, it is at a much lower level than the other components of those output signals and a much lower level than the contribution the target sound $S$ makes to the output $x_O(k)$ from the omni-directional microphone 16. Even so, the adaptive algorithm should be controlled delicately to avoid target-signal cancellation.
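Equations (1) to (4) can be checked with a small numeric sketch. The gains below are hypothetical values chosen only so that exact cancellation is possible with scalar filter responses (in general the responses are frequency dependent and adapted over time), and the function names `mic_outputs` and `filter_output` are illustrative.

```python
# Hypothetical response coefficients, not taken from the description.
A = (1.0, 1.0, 0.5)                   # first cardioid: gains for B1, B2, D
B = (1.0, 1.0, 0.5)                   # second cardioid: gains for C1, C2, D
C = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0)    # omni: gains for S, B1, B2, C1, C2, D

def mic_outputs(S, B1, B2, C1, C2, D):
    """Main components of the three microphone outputs (ideal null sectors)."""
    x_L = A[0] * B1 + A[1] * B2 + A[2] * D
    x_R = B[0] * C1 + B[1] * C2 + B[2] * D
    x_O = (C[0] * S + C[1] * B1 + C[2] * B2
           + C[3] * C1 + C[4] * C2 + C[5] * D)
    return x_L, x_R, x_O

def filter_output(x_L, x_R, x_O, W_L, W_R):
    """Equation (3): subtract the two adapted signals from the omni output."""
    return x_O - (W_L * x_L + W_R * x_R)

x_L, x_R, x_O = mic_outputs(S=0.3, B1=1.0, B2=-2.0, C1=0.5, C2=4.0, D=-1.5)
z = filter_output(x_L, x_R, x_O, W_L=1.0, W_R=1.0)
print(z)  # approximately c1 * S, i.e. close to 0.3
```

With these assumed gains, unit filter responses remove every interference term, leaving only the contribution of S as in Equation (4).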
Adaptive Algorithm for the First Embodiment
When there is interference present, the adaptive filter algorithm is used in real time to perform the cancellation function. The output of the central microphone 16 acts as a primary signal and the outputs from the other two microphones 12, 14 act as reference signals.
The output signals $x_L(k)$, $x_R(k)$ and $x_O(k)$ from the three microphones 12, 14, 16 are fed into the multiple input adaptive filter 18. The filter 18 has a first adaptive filter 32 of response $W_L$ to which is input the output signal $x_L(k)$ from the first cardioid microphone 12 and a second adaptive filter 34 of response $W_R$ to which is input the output signal $x_R(k)$ from the second cardioid microphone 14. The inverses of the two adapted signals $y_L(k)$, $y_R(k)$ are input into an adder 36, together with the output signal $x_O(k)$ from the omni-directional microphone 16, to give rise to a filter output signal $z(k)$.
An N-tap adaptive filter with a norm-constrained LMS structure can be used to cancel the interferences outside the cross areas:
$$z(k) = x_O(k) - W_L(k)^T X_L(k) - W_R(k)^T X_R(k) \qquad (5)$$
where
$$X_L(k) = [\,x_L(k)\ \ x_L(k-1)\ \ \cdots\ \ x_L(k-N+1)\,]^T \qquad (8)$$
$$X_R(k) = [\,x_R(k)\ \ x_R(k-1)\ \ \cdots\ \ x_R(k-N+1)\,]^T$$
For instance, $W_L(k)$ can be obtained by using the following steps.
$$\Omega = W_L'^{\,T}\, W_L' \qquad (11)$$
where μ denotes the step size (0 < μ < 2), N denotes the tap length of the two filters 32, 34, $W_L'$ denotes the temporal vector, Ω and K denote the total squared norm of $W_L(k+1)$ and the norm constraint threshold (if Ω exceeds K, $W_L(k+1)$ is restrained by scaling), and $z(k)$ denotes the output signal.
Similarly, $W_R(k)$ can be updated using the same steps as in Equations (10)-(12).
Due to the directivity of the two cardioid microphones 12, 14, the main components of $x_L(t)$ and $x_R(t)$ are the interference, whilst $x_O(t)$ is a mixture of all the sound sources. The interferences have strong responses in the omni-directional microphone 16 and at least one of the cardioid microphones 12, 14. Therefore the interference components between the primary signal and the reference signals are highly correlated. Through the adaptive filter in Equation (5) it is expected that such components will be cancelled out. On the other hand, for the desired signal S which is located in the target zone 30, the responses of the two cardioid microphones 12, 14 are far less than that of the omni-directional microphone 16. The desired-signal components therefore have a much weaker correlation between the primary and reference signals, and remain in the final output. The adaptive processing algorithm, as given above, adjusts the filter responses $W_L$ and $W_R$ continuously in real time so that the power of the output $z(k)$ is minimised, leaving an output that consists mainly of just the desired target signal S.
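The cancellation behaviour described above can be sketched with a two-reference adaptive canceller. The update used here is a generic norm-constrained normalised LMS step, assumed from the symbols the text defines (step size μ, temporal vector, squared norm Ω, threshold K); it is not claimed to reproduce the steps of the description exactly. The signal mix, the leakage factor 0.05 and the names `nc_nlms_step` and `simulate` are likewise illustrative assumptions.

```python
import random

def nc_nlms_step(w, x_buf, z, mu, K, eps=1e-8):
    """One norm-constrained NLMS update for a single reference channel.

    w     -- current tap weights (list of N floats)
    x_buf -- the N most recent reference samples, newest first
    z     -- current canceller output sample (the error signal)
    """
    power = sum(x * x for x in x_buf) + eps
    # Temporal (unconstrained) weight vector.
    w_tmp = [wi + mu * z * xi / power for wi, xi in zip(w, x_buf)]
    omega = sum(wi * wi for wi in w_tmp)   # total squared norm
    if omega > K:                          # restrain by scaling
        scale = (K / omega) ** 0.5
        w_tmp = [scale * wi for wi in w_tmp]
    return w_tmp

def simulate(n_samples=8000, n_taps=4, mu=0.05, K=10.0, seed=0):
    """Return (primary power, output power) averaged over the second half."""
    rng = random.Random(seed)
    w_L, w_R = [0.0] * n_taps, [0.0] * n_taps
    buf_L, buf_R = [0.0] * n_taps, [0.0] * n_taps
    p_in = p_out = 0.0
    tail_start = n_samples // 2
    for k in range(n_samples):
        # Independent white sources: target S and interferers B, C, D.
        S, B, C, D = (rng.gauss(0.0, 1.0) for _ in range(4))
        x_L = B + 0.5 * D + 0.05 * S   # first reference (small S leakage)
        x_R = C + 0.5 * D + 0.05 * S   # second reference (small S leakage)
        x_O = S + B + C + D            # primary signal
        buf_L = [x_L] + buf_L[:-1]
        buf_R = [x_R] + buf_R[:-1]
        y_L = sum(w * x for w, x in zip(w_L, buf_L))
        y_R = sum(w * x for w, x in zip(w_R, buf_R))
        z = x_O - y_L - y_R            # Equation (5)
        w_L = nc_nlms_step(w_L, buf_L, z, mu, K)
        w_R = nc_nlms_step(w_R, buf_R, z, mu, K)
        if k >= tail_start:
            p_in += x_O * x_O
            p_out += z * z
    n_tail = n_samples - tail_start
    return p_in / n_tail, p_out / n_tail

p_in, p_out = simulate()
print(p_in, p_out)
```

After convergence the output power should sit well below the primary power, because the components shared with the reference signals are removed while most of S passes through. The threshold K bounds the weight norm, which limits how far the filters can grow in trying to cancel the small amount of S that leaks into the references.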
With the two outer microphones 12, 14, it is the null sector, that is, the limited zone where there is little pick-up, that is useful, rather than a limited zone of best pick-up. Thus, other unidirectional microphones can be used, although cardioid or bidirectional microphones are preferred. What is useful is at least one relatively narrow null sector, without another null sector too close. It is also possible to use different types of microphones in the same apparatus, for instance one cardioid and one bidirectional microphone.
Bidirectional microphones produce annular null sectors, rather than conical ones. Where two such rings cross also generates a closed three-dimensional space. Further, the null sectors in such rings tend to be narrower than for cardioid microphones (at least at useful distances from the microphones). As such, the focal zone generated using bidirectional microphones can be smaller than that generated using cardioid microphones.
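The difference in null-sector width can be illustrated with the textbook first-order pattern |a + (1 − a) cos θ|, where a = 0.5 gives an ideal cardioid and a = 0 an ideal bidirectional (figure-of-eight) response. This idealised model, the 0.1 (about −20 dB) threshold used to define the edge of a null sector, and the function names are assumptions made for this sketch; real microphones deviate from the ideal pattern.

```python
import math

def first_order_response(theta_deg, a):
    """Magnitude of an ideal first-order pattern a + (1 - a) * cos(theta)."""
    return abs(a + (1.0 - a) * math.cos(math.radians(theta_deg)))

def null_sector_width(a, threshold, step_deg=0.01):
    """Width in degrees of the sector around the null where the response
    stays below `threshold`. Valid for 0 <= a <= 0.5."""
    # Angle at which a + (1 - a) * cos(theta) is exactly zero.
    null_deg = math.degrees(math.acos(-a / (1.0 - a)))
    width = 0.0
    for direction in (+1, -1):
        offset = 0.0
        while (offset < 180.0 and
               first_order_response(null_deg + direction * offset, a) < threshold):
            offset += step_deg
        width += offset
    return width

cardioid_width = null_sector_width(0.5, 0.1)
bidirectional_width = null_sector_width(0.0, 0.1)
print(round(cardioid_width, 1), round(bidirectional_width, 1))
```

Under these assumptions the bidirectional null is several times narrower than the cardioid null, which is consistent with the smaller focal zone noted above.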
A Second Embodiment
Whilst the above first embodiment is simple, the location and the size of the focal zone 30 cannot be controlled easily (without physically moving the microphones). Once the device is built and all the microphones are physically soldered on a PCB, the focal zone is fixed and cannot be moved 'electronically' by changing parameters in the software. Thus the null directions and the widths of the null sectors are physically fixed.
An alternative embodiment which overcomes this drawback, in effect, exchanges each cardioid microphone with two small microphone arrays which, when combined, provide a beam pattern which is similar to that of a single bidirectional microphone. Other approaches which provide, in effect, a response pattern with a relatively narrow null sector, without another null sector too close, may be applicable. By using microphone arrays, the angles $\alpha_L$ and $\alpha_R$ of the relevant null sectors and the angles $\beta_L$ and $\beta_R$ directing the null directions can be modified digitally.
Figure 2 schematically illustrates a microphone array system 40, according to a second embodiment of the invention. The microphone array system 40 implements a cross-beam scheme for local sound capture and interference cancellation, achieved using two microphone arrays 42, 44, where the beams in the cross-beam are null sectors.
The first and second microphone arrays 42, 44 are generally straight or linear in this embodiment, although they can be arranged on a curve, for instance in an arc, in a plane or in a 3-D arrangement. A straight line extending from a first, left end of the first microphone array 42 meets a straight line extending from a second, right end of the second microphone array 44 at a point X and at an angle θ. The distance between the meeting point X and the middle of the nearest microphone of the first microphone array 42 is $d_1$. The distance between the meeting point X and the middle of the nearest microphone of the second microphone array 44 is $d_2$.
The first and second microphone arrays 42, 44 generate polar patterns with null sectors 46, 48. As with the first embodiment, the null sectors are useful for excluding sounds from specific directions which include a target region where the two null sectors cross. For the first microphone array 42, the first null sector 46 has a sector angle $\alpha_L$, and is at an angle $\beta_L$ relative to the line of the first microphone array 42. For the second microphone array 44, the second null sector 48 has a sector angle $\alpha_R$ and is at an angle $\beta_R$ relative to the line of the second microphone array 44. The region in which the two null sectors overlap is a target, focal zone 50.
In Figure 2, reference "S" represents the sound signal in the focal zone 50. Reference "B1" represents the sound signal in a volume within the second null sector 48 and between the first null sector 46 and the second microphone array 44. Reference "B2" represents the sound signal in a volume within the second null sector 48 and on the other side of the first null sector 46 from the second microphone array 44. Reference "C1" represents the sound signal in a volume within the first null sector 46 and between the second null sector 48 and the first microphone array 42. Reference "C2" represents the sound signal in a volume within the first null sector 46 and on the other side of the second null sector 48 from the first microphone array 42. Reference "D" represents the sound signals from everywhere else.
Modified generalised sidelobe cancellation (GSC) is employed for adaptive interference or noise cancellation of the outputs from each of the first and second microphone arrays 42, 44. This is achieved in a single combined GSC system 60, as shown in the block diagram of Figure 3.
The outputs of each microphone array 42, 44 are input to separate first and second sub-arrays 62A, 62B in an array of time delay units 62, whereby the output of each sensor of the two arrays 42, 44 is delayed. For M sensors in the first microphone array 42, the M outputs from the first sub-array of time delay units 62A are $x_{1L}(k), x_{2L}(k), \ldots, x_{ML}(k)$. For P sensors in the second microphone array 44, the P outputs from the second sub-array of time delay units 62B are $x_{1R}(k), x_{2R}(k), \ldots, x_{PR}(k)$. The numbers M and P may be the same or different. The letters L and R are used as a simple notation for distinguishing between the two arrays, their sensors, their outputs and the processed outputs without importing any suggestion or requirement that there should be a left and/or right array.
The M + P outputs from the array of time delay units 62 are input to a shared fixed beam-former (FBF) 64. The combination of the array of time delay units 62 and the fixed beam-former 64 may be termed a delay-and-sum beamformer. The output $b(k)$ from the FBF 64 passes to a second delay unit 66, which delays it by a desired amount to output $b(k-L_1)$. The second delay unit 66 is used to synchronise or time-align the signals from the FBF 64 and relevant other components of the GSC system 60, so that the interference can be cancelled properly.
The M outputs $x_{1L}(k), x_{2L}(k), \ldots, x_{ML}(k)$ from the first sub-array of time delay units 62A, corresponding to the M outputs from the first microphone array 42, are also input to a first blocking matrix (BM) 68. The P outputs $x_{1R}(k), x_{2R}(k), \ldots, x_{PR}(k)$ from the second sub-array of time delay units 62B, corresponding to the P outputs from the second microphone array 44, are also input to a second blocking matrix (BM) 70, separate from the first BM 68. The output $b(k)$ of the FBF 64 from before the second delay unit 66 is applied to both the first and second BMs 68, 70. The two BMs 68, 70 block the respective desired signals of the first and second microphone arrays 42, 44, but allow the rest to pass through. The outputs $b(k-L_1)$, $c_{1L}(k) \ldots c_{ML}(k)$, $c_{1R}(k) \ldots c_{PR}(k)$ of the second delay unit 66 and the first and second BMs 68, 70, respectively, are input to a multiple input canceller (MC) 72. The MC 72 outputs a signal $z(k)$ only containing components of the target signal "S".
Within the MC 72, each input $c_{1L}(k) \ldots c_{ML}(k)$, $c_{1R}(k) \ldots c_{PR}(k)$ from one of the two BMs 68, 70 passes through a separate Norm-Constraint Adaptive Filter (NCAF) which is controlled by way of the output signal $z(k)$ from the MC 72. The outputs from the M + P NCAFs are added together and the sum is subtracted from the delayed output $b(k-L_1)$ from the second delay unit 66, with the result being the output signal $z(k)$.
In the MC unit 72, correlated components between the output of the FBF 64 and the outputs of the first and second BMs 68, 70 are diminished and finally there is only the "S" component left. As with the control of the filter of the first embodiment, control of the GSC system 60 is adaptive to avoid target-signal cancellation.
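The output stage of the multiple input canceller can be sketched as follows. Only the summation and subtraction are shown; the weight adaptation and the blocking matrices are left out, and the function name `canceller_output`, the buffer layout (newest sample first) and the example values are assumptions made for illustration.

```python
def canceller_output(b_delayed, channel_buffers, channel_weights):
    """Return z(k) = b(k - L1) minus the summed outputs of all channel filters.

    channel_buffers -- one list per blocking-matrix output, holding that
                       channel's most recent samples (newest first)
    channel_weights -- one list of tap weights per channel, same lengths
    """
    total = 0.0
    for buf, w in zip(channel_buffers, channel_weights):
        total += sum(wi * ci for wi, ci in zip(w, buf))
    return b_delayed - total

# Three hypothetical channels with two taps each.
buffers = [[1.0, 2.0], [0.5, 0.0], [2.0, 2.0]]
weights = [[0.5, 0.25], [1.0, 1.0], [-0.5, 0.0]]
z = canceller_output(1.0, buffers, weights)
print(z)  # 1.0 - (1.0 + 0.5 - 1.0) = 0.5
```

In a complete system each weight list would be updated from z(k) at every sample, with its norm limited so that residual target signal in the blocking-matrix outputs cannot drive the filters towards cancelling the target.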
The purpose of the two microphone arrays 42, 44 is twofold. On the one hand, each of them is used to simulate the function of one of the two cardioid microphones in Figure 1. On the other hand, the two microphone arrays 42, 44, as a whole, act as the omni-directional microphone of Figure 1, although in this case the combined response is not omni-directional but spatially selective. The FBF 64 is used to enhance the sound signal in the target region to a certain extent, with very limited interference cancellation ability.
Where the microphone arrays 42, 44 are linear, they cannot discriminate sound coming from above or below or from any elevation angles. If the arrays 42, 44 are used in a broadside way (as in Figure 2), the response is a bidirectional pattern rather than a cardioid pattern. However, if they are used in endfire (end on) orientation, a good cardioid pattern can be achieved.
Whether to use broadside or endfire orientations depends on the requirements of the application. For endfire orientation usage, the null sector is cone shaped but is fixed in one direction, just like a physical cardioid microphone. The broadside usage is more flexible and can steer the null sectors to any direction.
Adaptive Algorithm for the Second Embodiment
To simplify the description, it is assumed that the target signal is located at 90° relative to each array, i.e. φ0 = 90° and β0 = 90°. In this case no time delay is required for the FBF 64, i.e. there is no delay in the array of time delay units 62. Otherwise, the delays in the array of time delay units 62 can be controlled to align each microphone array 42, 44 to the target digitally.
The output of the FBF 64, b(k), is obtained by summing up the outputs of the two linear arrays, i.e.,
$$b(k) = \frac{1}{M+P}\left[\sum_{i=1}^{M} x_{iL}(k) + \sum_{j=1}^{P} x_{jR}(k)\right] \qquad (13)$$
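By way of illustration only, Equation (13) can be sketched in Python as below. The function name and the representation of each channel as a list of samples are assumptions made for this sketch; the channels are taken to be already time-aligned by the time delay units 62.

```python
def fixed_beamformer(x_left, x_right):
    """Average the time-aligned channels of both arrays, sample by sample.

    x_left:  list of M channels, each a list of samples x_iL(k)
    x_right: list of P channels, each a list of samples x_jR(k)
    Returns the list b(k) = (sum_i x_iL(k) + sum_j x_jR(k)) / (M + P).
    """
    channels = list(x_left) + list(x_right)
    if not channels:
        raise ValueError("at least one channel is required")
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    # One output sample per time index k, averaged over all M + P channels.
    return [sum(ch[k] for ch in channels) / len(channels) for k in range(n)]


if __name__ == "__main__":
    # Two left channels (M = 2) and one right channel (P = 1).
    print(fixed_beamformer([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]))
```

Because the target component is aligned across channels while interference generally is not, the averaging tends to reinforce the target and only partially attenuate other sounds, which is consistent with the limited interference cancellation ability attributed to the FBF 64.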
The outputs of the two blocking matrices 68, 70, are obtained in a standard manner, for instance as is described in O. Hoshuyama, A. Sugiyama and A. Hirano "A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters" [IEEE Transactions on Signal Processing, Vol. 47, no. 10, pp. 2677 - 2684, Oct. 1999], particularly in Equations (1) to (5).
The blocking matrices 68, 70 can be viewed as secondary output means since they output secondary signals representing sound originating from the plurality of regions, not representing or having reduced representation of sound originating from the null sectors and including the target region.
Subsequently, the outputs of the two blocking matrices 68, 70 are fed into the NCAFs in the MC 72, to be cancelled from b(k-L1), that is:
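$$z(k) = b(k-L_1) - \sum_{i=1}^{M} W_{iL}^{T}(k)\,C_{iL}(k) - \sum_{j=1}^{P} W_{jR}^{T}(k)\,C_{jR}(k) \qquad (14)$$
$$W_{iL}(k) = [\,w_{iL,0}(k) \;\; w_{iL,1}(k) \;\; \ldots \;\; w_{iL,N-1}(k)\,]^{T} \qquad (15)$$
$$W_{jR}(k) = [\,w_{jR,0}(k) \;\; w_{jR,1}(k) \;\; \ldots \;\; w_{jR,N-1}(k)\,]^{T} \qquad (16)$$
$$C_{iL}(k) = [\,c_{iL}(k) \;\; c_{iL}(k-1) \;\; \ldots \;\; c_{iL}(k-N+1)\,]^{T} \qquad (17)$$
$$C_{jR}(k) = [\,c_{jR}(k) \;\; c_{jR}(k-1) \;\; \ldots \;\; c_{jR}(k-N+1)\,]^{T} \qquad (18)$$
for (i = 1, 2, ..., M), (j = 1, 2, ..., P), where
L1 is the number of delay samples for causality,
N is the number of taps in each NCAF,
W_iL(k) and W_jR(k) are the coefficient vectors of the ith and the jth NCAF for the first and second arrays 42, 44, respectively, and
C_iL(k) and C_jR(k) are the corresponding signal vectors of the ith NCAF and the jth NCAF.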
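As a rough sketch only, the output of the MC 72 for a single sample index k can be computed as below. The helper names, the zero-padding of samples before the start of a signal, and the treatment of the M left and P right NCAFs as one combined list are assumptions made for this illustration.

```python
def signal_vector(c, k, n_taps):
    """Return [c(k), c(k-1), ..., c(k-N+1)], using 0.0 before the signal starts."""
    return [c[k - n] if k - n >= 0 else 0.0 for n in range(n_taps)]


def dot(w, c):
    """Inner product W^T C of a coefficient vector and a signal vector."""
    return sum(wi * ci for wi, ci in zip(w, c))


def mc_output(b_delayed, weights, signal_vectors):
    """Compute z(k) = b(k - L1) - sum over all NCAFs of W^T(k) C(k).

    b_delayed:      the delayed FBF output sample b(k - L1)
    weights:        list of M + P coefficient vectors, each of length N
    signal_vectors: list of M + P signal vectors, each of length N
    """
    if len(weights) != len(signal_vectors):
        raise ValueError("one signal vector is required per NCAF")
    return b_delayed - sum(dot(w, c) for w, c in zip(weights, signal_vectors))


if __name__ == "__main__":
    c_1L = [1.0, 2.0, 3.0]                 # one blocking matrix output
    print(signal_vector(c_1L, 1, 3))       # samples at k = 1, 0 and before start
    print(mc_output(1.0, [[0.5, 0.0], [0.25, 0.25]], [[2.0, 1.0], [1.0, 1.0]]))
```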
The NCAF coefficients are updated by using a Normalized Least Mean Square (NLMS) algorithm, for example as described below. For instance, W_iL(k) can be obtained by using the following steps.
... W'_iL, otherwise (21)
for (i = 1, 2, ..., M), where
μ is the step size (0 < μ < 2),
W'_iL is the temporary vector for the constraint,
Ω is the total squared norm of W_iL(k+1), and
K is the norm constraint threshold.
W_jR(k) can be updated by using the same steps as in Equations (19) to (21), for (j = 1, 2, ..., P).
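A minimal sketch of one norm-constrained NLMS step of this kind, for a single NCAF, is given below. The normalisation by the total input power and the rescaling by sqrt(K/Ω) when the threshold is exceeded are assumptions based on common practice for such filters, not a statement of the exact form of Equations (19) to (21). The function name, parameter names and the small constant eps are likewise illustrative only.

```python
import math


def ncaf_update(w, c, z, mu, K, total_power, eps=1e-12):
    """One norm-constrained NLMS step for a single NCAF (illustrative).

    w:           current coefficient vector W(k), length N
    c:           signal vector C(k), length N
    z:           current canceller output z(k), used as the error signal
    mu:          step size, 0 < mu < 2
    K:           norm constraint threshold
    total_power: assumed normalisation term, e.g. the summed squared norm
                 of the signal vectors of all NCAFs
    eps:         small constant guarding against division by zero
    """
    # Temporary vector: an ordinary NLMS step.
    w_tmp = [wi + mu * z * ci / (total_power + eps) for wi, ci in zip(w, c)]
    # Squared norm of the temporary vector.
    omega = sum(wi * wi for wi in w_tmp)
    if omega > K:
        # Scale back so that the squared norm equals the threshold K.
        scale = math.sqrt(K / omega)
        return [scale * wi for wi in w_tmp]
    return w_tmp


if __name__ == "__main__":
    print(ncaf_update([0.0, 0.0], [1.0, 0.0], z=1.0, mu=1.0, K=0.25, total_power=1.0))
```

Limiting the coefficient norm in this way restricts how strongly any residual target component leaking through a blocking matrix can be subtracted from the FBF output, which is the purpose of the constraint in avoiding target-signal cancellation.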
The multiple-input canceller 72 following the blocking matrices adaptively cancels, from the delayed FBF output b(k-L1), those components that correlate with the output of each blocking matrix 68, 70. Therefore any interference "D" outside the current null sector capture region of each microphone array 42, 44 is cancelled out.
Within the GSC system 60, the BMs 68, 70 for the two arrays 42, 44 independently block the capture range for their respective array (within the null sectors of the relevant arrays) and allow the remaining signals to pass through. The combination of what passes through the BMs 68, 70 is the full set of interferences to be cancelled (including what passes through one BM but not the other). What passes through neither BM, being prohibited by both of the blocking branches, is the component which is desired by both arrays.
In the embodiment of Figure 3, there are only two microphone arrays 42, 44 and two BMs 68, 70. A similar approach can be taken for more than two arrays, say F arrays, where there are F BMs, one per array, but still only one delay-and-sum beamformer 62, 64 and one MC 72.
In the embodiment of Figure 2, the location of the focal zone 50 can be controlled. The "spotlight" of each array (i.e. the direction of the first and second null sectors 46, 48) can be steered. This is realised digitally by time delay alignment instead of physical adjustment. Specifically, the time delay is varied in the array of time delay units 62 in the GSC system 60. Given the presence of the two arrays 42, 44, the focal zone 50 can be generated almost anywhere.
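One possible way of deriving such alignment delays is sketched below, purely as an illustration. It assumes that the microphone and target positions are known in metres, that sound travels at a nominal 343 m/s, and that fractional-sample delays are acceptable; the function name and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed nominal value


def alignment_delays(mic_positions, target, sample_rate):
    """Return one delay per microphone, in samples, that aligns the target.

    mic_positions: list of (x, y) or (x, y, z) microphone coordinates in metres
    target:        coordinates of the desired focal point, same dimensionality
    sample_rate:   sampling frequency in Hz

    Sound from the target reaches nearer microphones first, so each channel
    is delayed by the extra travel time to the farthest microphone.
    """
    distances = [math.dist(m, target) for m in mic_positions]
    farthest = max(distances)
    return [(farthest - d) / SPEED_OF_SOUND * sample_rate for d in distances]


if __name__ == "__main__":
    mics = [(0.0, 0.0), (0.343, 0.0)]
    print(alignment_delays(mics, (-1.0, 0.0), sample_rate=1000.0))
```

Recomputing these delays for a new target position moves the focal zone without any physical adjustment of the arrays.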
The focal zone 50 can be controlled in real time to follow a desired sound source, for instance a person speaking as he moves around a room, by varying the time delays in the array of time delay units 62 in real time. Existing techniques can be used to locate such a speaker, for instance as is described in the Branstein paper mentioned earlier ("A closed-form location estimation for use with room environment microphone arrays").
It is known that a minimum of two sensors can detect the direction of a sound source. However, to track the 3D location of a sound source in a room, more than two sensors are required. In theory, with more than three sensors the location in a given plane can be determined geometrically, but this is far from sufficient to estimate the exact location in 3D. The Branstein paper proposes a generalised, closed-form algorithm for 3D location tracking using N distributed sensors. This algorithm is applied to the present embodiment simply by substituting the position of each sensor into the algorithm.
Comparing the embodiment of Figure 1 with that of Figures 2 and 3, there are some similarities. The first and second BMs 68, 70 of Figure 3 perform a similar function to the notches of the two unidirectional microphones 12, 14 of Figure 1, while the delay-and-sum beamformer 62, 64 of Figure 3 plays a similar role to the omni-directional microphone 16 of Figure 1. The adaptive filter 18 of Figure 1 is functionally the same as the Multiple Canceller 72 in Figure 3. Nevertheless, the second embodiment allows the equivalent of the "notch" of Figure 1 to be steered in any direction. Another difference is that the fixed beamformer 64 of the GSC system 60 provides a signal of better quality than that of the single sensor, the omni-directional microphone 16 of Figure 1.
The angle θ between the two arrays 42, 44 and the distances d1, d2 between the meeting point X and the nearest microphone of each array can be adjusted to required values. θ can be set to an obtuse angle or an acute angle to ensure an appropriate common region. The distances d1, d2 can be set from zero upwards.
Figure 4 shows an arrangement of the microphone array system 40 of Figure 2, where the two arrays 42, 44 are arranged in a single line. This is equivalent to the case where θ is set to 180°. However, this does not mean that the overlapping area between the two null sectors 46, 48 cannot be focused and closed. This is achieved by controlling the time delays, as mentioned earlier, which can be thought of as physically turning the arrays, as shown by the arrays 42A, 44A in dotted lines. In this manner, it is possible to use a single straight array of microphones to perform in accordance with the second embodiment, by treating some as belonging to the first array and the rest as belonging to the second array.
Figure 5 shows another variation of the microphone array system 40 of Figure 2. In this case the two arrays 42, 44 share one microphone 52 between them. This is equivalent to the arrangement of Figure 2, with the shared microphone 52 at the meeting point X and the distances d1, d2 between the meeting point X and the nearest microphone of each array both set at zero. The output from the shared microphone 52 is input twice to the FBF 64 and also to both BMs 68, 70 of the GSC system 60. This allows the same sensitivity with one microphone fewer compared with splitting them into two distinct arrays (provided the overall geometry is useful for the specific circumstances).
The widths of the first and second null sectors 46, 48, more particularly the angles α_L, α_R of the first and second null sectors 46, 48, can also be modified independently of each other, for instance as is described in O. Hoshuyama, A. Sugiyama and A. Hirano, "A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters" [IEEE Transactions on Signal Processing, Vol. 47, no. 10, pp. 2677 - 2684, Oct. 1999].
Figure 6 shows an extreme position where the angles α_L and α_R of the null sectors 46, 48 are both equal to zero, and the angle θ between the arrays 42, 44 is 90°. With this arrangement the focal zone 50 is reduced to a point. The control of the target region is flexible and the target region can be as small as a point. This is useful in some acoustic field measurement applications where the sound at a certain point is to be measured and the resolution requirement is high.
The features of the above arrangements can be combined as required, whether it is in sharing a microphone, varying the null sector angle or varying the direction of the null sector for one or both microphone arrays. The combinations of shapes and positions of the focal zone are almost limitless.
In the above embodiments of Figure 2 and its variations in Figures 4, 5 and 6, the microphone arrays are themselves linear. However, linear arrays inherently suffer from spatial ambiguity, i.e. they are unable to differentiate between signals impinging from the same azimuth but different elevations. Thus the focal zone 50 in the embodiment of Figure 2 and its variations is a column. In the case of the arrangement of Figure 6, the focal zone 50 is shaped more like a line in 3-D space.
A Third Embodiment
Figure 7 schematically illustrates a microphone array system 80, according to a third embodiment of the invention, to overcome this problem of non-discrimination of elevations. This embodiment is the same as that of Figure 2, except that it includes an additional, third microphone array 82, orthogonal to the first two arrays 42, 44, to provide a third dimension constraint. The focal zone 84 is now defined in 3-D space, by the first and second null sectors 46, 48, with angles α_L, α_R respectively, and by a third null sector 86, from the third array, with an angle α_Y. The outputs from the microphones of the third microphone array 82 are also input into a GSC system together with those from the first and second microphone arrays 42, 44, each output from the three microphone arrays being input into the same FBF and each being input to a separate BM.
The embodiments of the invention provide adaptive microphone array systems for sound capture and interference cancellation. More specifically, the embodiments of the invention provide high performance spatial filter apparatus using a distributed plurality of sensors, a cross beam strategy and adaptive technology. The microphone array systems capture only sounds originating from a confined region of space, whose size and location are controllable. The microphone array systems have a strong interference attenuation ability: interference from outside the target region is attenuated to a low level. Thus, the embodiments provide super volumetric selectivity of a desired sound source (direction and distance), with high flexibility in defining the sound source region of interest. Adaptive technology (including adaptive beamforming and adaptive filtering) is strategically used on multiple microphone arrays/sensors and the overall apparatus acts as a virtual wireless close-talking microphone with a confined position constrained in both distance and direction. Further, various embodiments are capable of capturing a mobile sound source by incorporating sound source localisation.
In one embodiment mentioned above, three sensors, one omni-directional and the other two unidirectional (preferably cardioid) are used. They are arranged in such a way that the notch sectors of the two unidirectional sensors intersect to form a closed common volume. In another embodiment, more sensors are involved and configured in a form of two or more sub-arrays. The sub-arrays can be arranged in various ways to meet different requirements as may be desired. Existing techniques make it possible to define and adjust the capture region to a desired direction and width. In an extreme situation, the common area can be shrunk into a point. Furthermore, in another embodiment, the invention successfully simulates super directionality in a cone-shape by employing a cross array.
The embodiments of the invention are effective in reducing point interferences and anisotropic diffuse noise and can be readily applied in many speech communication systems, such as hands-free mobile 'phones, desktop voice input systems, the front end of speech recognition systems, teleconferencing, entertainment, intelligent rooms, intelligence gathering, acoustic measurement and other circumstances where there is a requirement to capture sound from a pre-defined zone.
The components of the apparatus for processing the signals from the microphones in the above description may be provided as separate modules or integrated together. A module and, in particular, its functionality can be implemented in either hardware or software. In the software sense, a module is a process, program, or portion thereof, that usually performs a particular function or related functions. In the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Likewise the overall processing apparatus can be embodied in an integral circuit or integral software. Numerous other possibilities exist. Those skilled in the art will appreciate that the system can also be implemented as a combination of hardware and software modules.
Thus the invention would also cover an embodiment where the outputs of microphones are directed into a standard desktop, laptop or other computer, where appropriate software is used to derive the signal from a sound source in a target area and, perhaps, even track it round a room or otherwise as the sound source moves.
The above-described embodiments are directed towards obtaining a sound signal from a specific source or region. The embodiments of the invention are able to do so using several variants in implementation. From the above description of a specific embodiment and alternatives, it will be apparent to those skilled in the art that modifications/changes can be made without departing from the scope and spirit of the invention. In addition, the general principles defined herein may be applied to other embodiments and applications without moving away from the scope and spirit of the invention. Consequently, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.