METHOD OF INCREASING SPEECH INTELLIGIBILITY AND
DEVICE THEREFOR
FIELD OF THE INVENTION
The present invention relates to a pair of spectacles, an audio system, a device for improving the intelligibility of speech, a circuit module and a method of improving the intelligibility of sounds.
BACKGROUND OF THE INVENTION
In the prior art, there are many patents relating to improving the ability to hear and understand speech. Some of these relate to hearing aids for the hearing-impaired, and some of these use modifications of eyeglasses or spectacles to pick up sounds. Hereinafter the term "spectacles" is used, and includes eyeglasses for corrective and cosmetic purposes, sunglasses, goggles, eye protectors, and other eyewear devices.
Such patents can be broadly divided into three classes.
In the first class, a pair of spectacles is combined with a pair of stereo earphones and a microphone (US Patent 4902120, US Patent 5164987, US Patent 5327178, US Patent 5457751 and US Patent 5608808). These patents mainly focus on the connector and structure design.
In the second class, hearing aid functionality is integrated into a spectacle frame. The main idea is to attach a normal hearing aid assembly including microphone amplifier circuitry and control elements to the temple bar of spectacles, adding one or two earphones/earplugs beside the ears (US Patent 4451709, US Patent 5159639 and US Patent 5533130).
In the third class, a noise reduction effect is incorporated to provide hearing aid functionality for the hearing impaired, or to provide hearing protection. The known techniques include:
(a) using a passive method, e.g. covering the earplugs with acoustic dampening material to muffle the external noise (US Patent 6012812 and US Patent 6176576);
(b) using unidirectional microphones to attenuate ambient noise (US Patent 4904078, US Patent 5289544 and US Patent 6101258); and
(c) using more than one microphone to synthesize a super directional microphone (US Patent 4712244, US Patent 4751738, US Patent 4773095, US Patent 5483599, US Patent 5737430, US Patent 5793875 and US Patent 6154552).
In these cases, the microphones are typically installed on a temple bar of a pair of spectacles (US Patent 6154552). In other similar arrangements microphones are worn as a necklace (US Patent 5793875). An array processing technique such as delay-and-sum beam-forming is employed. However, the performance from the viewpoint of noise reduction is very limited.
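For reference, the delay-and-sum technique named above can be sketched as follows. This is a minimal illustration under stated assumptions and is not taken from any of the cited patents: the steering delays are whole numbers of samples, and the function name delay_and_sum is hypothetical.

```python
def delay_and_sum(channels, delays):
    """Average the channels after advancing each by its steering delay.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-microphone arrival delay in whole samples (assumed to be
              integers for simplicity; a practical design would interpolate).
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d  # undo the arrival delay of this microphone
            if 0 <= idx < n:
                acc += ch[idx]
        out.append(acc / len(channels))
    return out


def pulse(k, n=16):
    """Unit pulse at sample k in a frame of n samples."""
    return [1.0 if i == k else 0.0 for i in range(n)]


# The same pulse reaches three microphones at samples 5, 6 and 7.
channels = [pulse(5), pulse(6), pulse(7)]
steered = delay_and_sum(channels, [0, 1, 2])    # aligned with the source
unsteered = delay_and_sum(channels, [0, 0, 0])  # not aligned
```

When the delays match the direction of the source the pulses add coherently; otherwise they are smeared and reduced by averaging, which is the only noise-reduction mechanism this simple scheme offers.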
From the technical point of view, there are several possible ways of overcoming noise problems.
(a) Beam-forming techniques have been well developed in radar and sonar systems for many years to reduce noise by using spatial information. The techniques have not found favor in microphone array applications. This is because the available beam-formers were either over-selective to the desired signal source direction or had unsatisfactory capability for noise reduction. In addition, microphone arrays of the prior art were not well adapted for beam-forming.
(b) Speech enhancement is an accepted solution to improve speech quality. Among the available techniques, a method based upon spectral subtraction uses time-domain and frequency-domain statistical information to remove noise from noisy speech signals.
(c) Active Noise Cancellation (ANC) technology is becoming mature and has been successfully applied in headsets to act as both a hearing protection and a communication device in ultra-noisy environments, especially in military aircraft conditions. However, such technology is normally adapted to reduce the noise around a specific point by use of a reference microphone. It is not possible to capture a user's voice clearly in such high noise environments unless the user has a close-talk microphone, which itself is not practical for normal use as well as being uncomfortable.
Although devices combining earphones, microphones, and spectacles together have been proposed, they have not achieved success at least partly because they do not adequately reduce noise.
The use of spectacles as a basis for an audio interface is attractive for a number of reasons:
(a) The number of people wearing spectacles for corrective reasons is increasing, especially among teenagers. Also, sunglasses are widely worn in places such as southern Europe, southeastern Asia and Africa.
(b) Older people with some degree of hearing impairment form an increasing part of the population. A large percentage of them wear spectacles in their daily life.
(c) Labor Laws require employers to provide protective equipment for employees where the working environment is noisy. Where this happens, eye protection is often needed as well. An easily worn device combining the functions of hearing and eye protection would thus be very attractive in these situations.
Given the motivation to provide audio input to a user via a spectacle-type device, a number of desirable features arise for such a device:
Firstly, the number of mobile communication users is expanding rapidly and the demand for a high-performance and convenient hands-free device is rising accordingly. Embodiments of the invention thus provide a device that allows hands-free and private reception of telephone calls. Embodiments of the invention also allow speech to be picked up from the user and recorded or sent via a telephone, such as a mobile or cordless telephone. Known prior art is unable to take care of both incoming and outgoing signals.
Secondly, people seek personal and private living space and auditory space. Embodiments of the invention allow the user to listen to music or speech from replay devices or broadcast reception devices, with an improvement in the auditory experience while avoiding ambient noise.
Embodiments of the invention also provide a good interface for applications such as intelligent rooms and wearable computers.
SUMMARY OF THE INVENTION
According to a first aspect of the invention there is provided a pair of spectacles comprising a portion for disposition transverse to the brow of a user and at least three microphones secured along said portion and arranged when the spectacles are worn by the user for picking up sounds ahead of the user.
Preferably the microphones are arranged spatially symmetrically about a midline of the spectacles.
Advantageously, there are four microphones.
Conveniently the spectacles comprise an electrical connection device for feeding out electrical signals from said microphones.
Preferably the spectacles comprise a pair of temple bars, and two earphones secured to said temple bars for transducing electrical signals into audible sounds, the earphones being arranged to be selectively disposed in cooperation with a user's ear canal when the spectacles are worn.
Each earphone may be mounted on an arm, and the arms be pivotally secured to said temple bars.
Each of the said earphones may comprise a microphone for picking up ambient sound in the region of a user's ear.
The spectacles may further comprise an electrical connection device for supplying electrical signals to said earphones.
According to a second aspect of the invention there is provided an audio system comprising the combination of a pair of spectacles, at least three microphones, signal conditioning circuitry and at least one earphone, the spectacles having a portion for disposition transverse to a user's brow, the at least three microphones being secured along the said portion, the signal conditioning circuitry having an input for receiving signals from the microphones and being constructed and arranged to provide output signals having reduced amounts of signals representative of sounds from unwanted sources and having an output for said output signals, and the or each earphone being connectable to receive the output signals from the signal conditioning circuitry thereby to transduce the output signals into sound.
The use of an array of microphones secured to the spectacles transverse the user's brow allows for the ready production of a sensing pattern that conforms to the normal habits of the user. The signal conditioning circuitry may provide directionality from the microphones so as to reduce or eliminate sounds that originate in a direction different from the directionality of the microphones. The signal conditioning circuitry may include filter circuitry for eliminating undesired frequencies or emphasizing desired frequencies. The signal conditioning circuitry may also make use of the properties of speech to reduce the amount of noise.
Advantageously the microphones are disposed symmetrically about a midline of the spectacles.
In a preferred embodiment the system comprises four microphones.
Advantageously again the spectacles further comprise a pair of temple bars, and the said portion comprises a frame arranged to support a pair of lenses, wherein the microphones of the array are integrated into the frame.
Preferably there are two earphones, each secured to a respective temple bar.
Use of two earphones allows for stereo sound reproduction when appropriate. Earphones engaging in the ear canal of the user can also reduce ambient sounds. This may be especially important where hearing protection is required.
Preferably the signal conditioning circuitry comprises controllable beam-former circuitry for processing signals produced by the microphones, the beam-former circuitry being operable to influence the spatial characteristic of pick-up of sound by said microphones.
The ability to control the beam-forming effect, taken with the use of a front-facing microphone array, allows for desired sounds to be satisfactorily picked up to the exclusion of sound sources in other directions. The system may also pick up the user's own speech, by suitable control of the beam-former.
Preferably the beam-former circuitry is self-adaptive.
The or each earphone may further comprise a microphone for picking up ambient sound in the region of a user's ear, and the system may further comprise active noise cancellation circuitry for receiving signals from the or each microphone and modifying the output of the signal conditioning circuitry for application to the earphone.
Preferably the signal conditioning circuitry comprises circuitry for reducing noise in said signals from said microphones.
The signal conditioning circuitry may comprise spectral subtraction circuitry.
The use of spectral subtraction techniques allows for noise components, typically estimated during speech pauses, to be subtracted from the combined speech and noise in the microphone signals.
The system may have interface circuitry having an input from the signal conditioning circuitry and at least one other input, said at least one other input having an electrical connector.
Thus input signals from sound reproduction devices may be provided to the earphones, enabling the user to listen to music, radio, a telephone or the like.
The spectacles may be sunglasses.
The spectacles may be eye protectors.
The use of spectacles as goggles or the like to prevent eye damage from foreign bodies, as noted above, often occurs where high ambient noise occurs.
The system may comprise filter circuitry adapted to the hearing characteristics of a user to afford a hearing aid.
Filter circuitry may have additionally or alternatively at least one stored selectable standard filter characteristic to afford a hearing aid.
Controllable means may be provided for enabling the microphone array to pick up speech of the user and speech of a third party. Typically part of this will be in the form of control to a beam-former.
According to a third aspect of the invention there is provided a device for improving the intelligibility of speech from a specific source in a noisy environment, the device comprising at least three microphones, the microphones being secured to a mounting device constructed and arranged to be supported by the head of a user whereby in use the microphones are disposed substantially in front of the forehead of the user and transverse the face of the user, and further comprising signal conditioning circuitry receiving signals from the microphones and a pair of earphones receiving output signals from the signal conditioning circuitry.
Thus, an array of microphones may be incorporated into the rim of a helmet or other device worn on the head, where the use of spectacles is not appropriate.
Preferably the signal conditioning circuitry comprises controllable beam-former circuitry for processing signals produced by the microphones, the beam-former circuitry being operable to influence the spatial characteristic of pick-up of sound by said microphones.
Preferably the signal conditioning circuitry comprises circuitry for reducing noise in said signals from said microphones.
According to a fourth aspect of the invention there is provided an electric circuit having an input port and an output port, said input port for simultaneously receiving at least three analog electrical signals each representing sound from a noisy environment, said sound including speech, the electric circuit comprising converter circuitry for converting said signals to digital representations thereof, processing circuitry for processing said digital representations of said at least three sound signals to provide an output signal, said processing circuitry operable to controllably combine said digital representations according to a beam-forming algorithm, and amplifier circuitry for increasing the power of the output signal, said amplifier circuitry having an output to said output port.
Preferably the processing circuitry further is operable to process said combined digital representations to enhance the intelligibility of said speech.
According to a fifth aspect of the invention there is provided a method of improving the intelligibility of sounds comprising: transducing the sounds into electrical signals using at least three microphones disposed transverse the brow of a user; using a beam-forming algorithm, processing the said electrical signals to provide output signals.
The method may further comprise applying the output signals to transducers to provide audio signals to the user.
The method may further comprise applying the output signals to a telephone.
Embodiments of the invention provide a device which has an inconspicuous appearance and is easily accepted.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described by way of example only with reference to the accompanying drawings in which:
Figure 1 shows a perspective view of an embodiment of an audio system in accordance with the present invention;
Figure 2 shows a block schematic diagram of a part of the signal-processing device of Figure 1;
Figure 3 shows the audio system of Figure 1 being used to pick up sound from the mouth of the user;
Figure 4 shows the audio system of Figure 1 being used to pick up sound from a remote person;
Figure 5 shows the external appearance of a preferred multiple-pin jack and plug; and
Figure 6 shows a block diagram of an ANC earphone.
DESCRIPTION OF THE INVENTION
In the various figures, like reference numerals refer to like parts.
Referring to Figure 1, the embodiment has a mounting device arranged to be supported in use by the head of a user (here a pair of spectacles), a microphone array, a pair of earphones, and a signal processing device.
The spectacles illustrated have a first part (200) consisting of two lens-support frame portions (201) and a nose bridge portion (202) linking the two lens-support frame portions. The first part is disposed before the eyes of a user/wearer, and has an upper edge disposed in use generally transverse and in front of the brow of the user/wearer. The first part may be of any suitable material, for example metal or plastics. In the described embodiment, the first part is of plastics. At each outer extremity of the lens support frame portions is a hinged joint (203) connected to a respective temple bar part (204, 205). The temple bar parts each have a first generally straight web portion (206) extending to an ear-engaging portion (207), which curves around the ear to maintain the spectacles in position. It should be noted that the spectacles may be any type of eyeglasses, such as sunglasses, nearsighted glasses or protective glasses. The term "lens" thus includes transparent members not intended to correct the optical properties of the eye; indeed in some instances one lens may be opaque.
The invention is not restricted to spectacles of any particular form or configuration, and can thus be applied to spectacles having a single brow bar from which lenses depend or indeed rimless spectacles.
It will be clear that the mounting device arranged to be supported in use by the head of a user could be of any type, including a hat, a helmet, a head guard, a brow bar solely for the purpose of mounting the microphone array and earphones, or even a headband.
The component elements will now be described. It should be borne in mind that although described as discrete elements, some or all of them may be formed as an integrated circuit in some embodiments.
Continuing to refer to Figure 1, the arrangement of microphones (1, 2, 3, and 4), earphones (6, 7), signal processing device (10), and multiple-pin jack (8) and plug (9) is shown. A plurality of microphones, for example four microphones (1-4), is secured to the upper edge of the spectacle frame to form a small microphone array (15). In the embodiment shown in Figure 1, the first part (200) of the spectacle frame is of plastics, and the microphones (1-4) are embedded or over-molded into the frame.
Where other types of material are used different securing methods may be appropriate. For example, metal frames may be provided with through holes or recesses for the microphones (1-4). The microphones (1-4) are secured therein by appropriate means, for example by threaded connections, by force fitting or by adhesive.
Modern manufacturing technology can make small microphones of the order of a millimeter in size, while having acceptable characteristics, comparable to normal microphones. Such small microphones are also light, stable, and durable as well as having high sensitivity. For example, microphones using MicroElectroMechanical Systems (MEMS) technology (so-called silicon microphones) can be as small as 1 mm in diameter and less than 1 mm high. Normal condenser microphones produced by companies such as Vansonic Enterprise Co Ltd, Taiwan, can be obtained with dimensions of around 3 mm diameter, 2.5 mm high and as light as 0.15 gram.
In preferred embodiments, the microphones are secured to a spectacle frame without impairing the appearance of the spectacles, by concealing the microphones. A windproof design may be adopted. It is not essential to arrange the microphones strictly equi-spaced, nor need they be disposed in a single straight line, but a symmetrical distribution is preferred from the acoustic viewpoint. Since the width of normal spectacles is about 12 cm to 15 cm, the space between two consecutive microphones of a four-microphone symmetrical array is around 3 cm to 5 cm.
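The spacing figure follows from simple arithmetic, which the short sketch below reproduces. It assumes strictly equi-spaced microphones (which the text notes is not essential), and the function name mic_positions is hypothetical.

```python
def mic_positions(span_cm, count):
    """Positions in cm, measured from the midline, of `count` equi-spaced
    microphones whose outer pair is `span_cm` apart."""
    spacing = span_cm / (count - 1)
    return [-span_cm / 2 + i * spacing for i in range(count)]


# Outer microphones at the very edges of a 12 cm and a 15 cm frame.
narrow = mic_positions(12.0, 4)   # spacing 4 cm
wide = mic_positions(15.0, 4)     # spacing 5 cm
# Outer microphones set in from the edges, spanning only 9 cm.
inset = mic_positions(9.0, 4)     # spacing 3 cm
```

With the outer microphones at the extreme edges the spacing is 4 cm to 5 cm; setting them slightly inward, as near the hinges, gives values toward the 3 cm end of the stated range.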
In one preferred embodiment, first to fourth microphones (1-4) are placed in the positions shown in Figure 1. The outer two microphones, namely the first and fourth (1, 4), are located in the corners near the hinges (203) by the temple bars (204, 205). The inner two, namely the second and third (2, 3), are located in the frame portion (200) close to the corner of the lenses and near to the bridge portion (202).
Wires from the microphones (not shown in the figure) are, in this embodiment, hidden inside the frame. They run inside the temple bars (204, 205) to a connector (8).
Compared to prior art devices and systems using spectacles to support earphones, the microphone array disposed across the brow of a user is superior to microphones mounted on the temple bar of the spectacles, or microphones worn or otherwise affixed to the user's body. Firstly, where at least three microphones are secured along a portion for disposition transverse to the brow of a user, arranged when the spectacles are worn by the user for picking up sounds ahead of the user, the head of the user/wearer does not cast an auditory shadow on the incoming sounds. Secondly, this arrangement provides the user/wearer with flexibility and stability, as well as conforming to normal human listening habits, i.e. looking towards the source of the sound. The user/wearer acts as a collimator and naturally adjusts the sensitive direction (main lobe) of the beam-former to the desired location by turning his or her head.
Furthermore, the disposition enables the microphone array to be dual-purpose, i.e. be able to pick up the sound coming from the near field (the user himself or herself) or the far field (other sound sources).
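The difference between the two pick-up modes can be illustrated by the arrival delays a beam-former would have to compensate in each case. The sketch below is an assumption-laden illustration only: the microphone coordinates, the mouth position, the speed of sound and both function names are hypothetical, and the geometry is reduced to a single plane containing the array and the source.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

# Assumed microphone positions along the brow, in metres from the midline.
MIC_X = [-0.06, -0.02, 0.02, 0.06]


def far_field_delays(mic_x, angle_deg):
    """Relative arrival delays (s) of a plane wave from a distant source.
    0 degrees is straight ahead; positive angles lean toward positive x."""
    s = math.sin(math.radians(angle_deg))
    raw = [-x * s / SPEED_OF_SOUND for x in mic_x]
    base = min(raw)
    return [r - base for r in raw]


def near_field_delays(mic_x, source_xy):
    """Relative arrival delays (s) from a nearby point source at (x, y)."""
    sx, sy = source_xy
    raw = [math.hypot(x - sx, sy) / SPEED_OF_SOUND for x in mic_x]
    base = min(raw)
    return [r - base for r in raw]


ahead = far_field_delays(MIC_X, 0.0)             # remote talker straight ahead
oblique = far_field_delays(MIC_X, 30.0)          # remote talker off to one side
mouth = near_field_delays(MIC_X, (0.0, 0.10))    # assumed mouth distance 10 cm
```

A distant source straight ahead reaches all microphones together, whereas the wearer's mouth produces a curved wavefront that reaches the inner pair first, so the two cases call for different delay sets.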
Two earphones (6, 7) are affixed to the temple bars via arms (13, 14) of highly elastic and flexible material. The arms are secured to the temple bars (204, 205) by pivot connectors (11, 12), at pivot points. If the user/wearer does not want to use the earphones, the arms (13, 14) can be rotated around the pivot points (11, 12) so that the user can move the earphones (6, 7) away from his or her ear to a position where they are somewhat concealed from outside view. The user/wearer may of course move only one earphone (6, 7) away from his or her ear while continuing to use the other. The earphones (6, 7) may be of normal type. In a preferred embodiment, active noise control (ANC) earphones are used. For use, the arms (13, 14) are rotated so that the earphones (6, 7) either engage the ear canal or are disposed close to it, according to the selected design of earphone (6, 7). To this end the arms (13, 14) are shaped suitably to allow such a disposition to occur. Wiring may run to the earphones through the pivot connector, or via external flexible wires from the temple bars (204, 205) to the arms (13, 14).
Referring to Figure 2, the earphones (6, 7) of this embodiment have an ANC function, and each consists of an electrical-to-audio transducer, for example a mini-speaker (103, 104), and an audio-to-electrical transducer, for example a mini-microphone (101, 102). The mini-microphones (101, 102) pick up ambient sounds at the earphone and transduce the sounds into signals supplied to an active noise control circuit (107A, 107B) of the signal processing device. The ANC circuit (107A, 107B) is typically an analog device that receives the input from the microphone and feeds that signal in anti-phase, and after conditioning, to the audio output transducer of the earphone. The effect is to actively cancel ambient sounds at the earphone position. An example of an ANC circuit (107A, 107B) is shown in Figure 6.
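The anti-phase principle can be illustrated numerically. The sketch below is an idealized digital model and not the analog circuit of Figure 6: it ignores the acoustic path, the conditioning stage and any delay, and the gain parameter and function names are hypothetical.

```python
import math


def anc_residual(ambient, gain):
    """Add an anti-phase copy of the sensed ambient sound, scaled by `gain`."""
    return [a + (-gain * a) for a in ambient]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def attenuation_db(ambient, gain):
    """Level of the residual relative to the ambient sound, in dB.
    A gain of exactly 1.0 leaves no residual and is not handled here."""
    return 20.0 * math.log10(rms(anc_residual(ambient, gain)) / rms(ambient))


# 100 Hz ambient tone sampled at 8 kHz; anti-phase path with 10 % gain error.
ambient = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(80)]
level = attenuation_db(ambient, 0.9)
```

Under these assumptions a 10 % gain mismatch in the anti-phase path leaves a residual about 20 dB below the ambient level, which shows why the conditioning of the anti-phase signal matters.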
ANC earphones do not give the high performance of ANC headsets, especially in the ability to attenuate high-frequency noise, which headsets achieve by virtue of their big ear cups. However, ANC headsets are designed for ultra-noisy environments and have high cost, size and weight. Thus, they are unsuitable for normal day-to-day use in normal environments.
Hence, the preferred embodiment uses instead mini ANC earphones, which provide adequate performance in conditions where the noise level is not at the high levels suffered in aircraft or the like. The technology of the ANC headset is thus made available for general use. Recent reports show that mini ANC earphones can now achieve high performance; an example is the Andrea Anti-Noise (R) PC Headsets/Handsets with Active Noise Cancellation Microphone Technology product. This product includes an earphone with a boom-mounted microphone, and active noise control is used to filter out background noise (Andrea Electronics Corporation, 11-40 45th Road, Long Island City, NY 11101, USA). Other suitable products include the C.A.T. (Cranial Audio Transmission) System, Panther Electronics, of Street Cloud, Florida, USA.
Returning again to Figure 1, the signal processing device (10) has a box-type casing having I/O ports that include a multiple pin interface to/from the spectacles (20), a line-in stereo jack (21), a hand-phone jack (22), a recorder jack (23), four control buttons (24) and an LCD display (25). The signal processing device may have more than four buttons if needed to make the operation convenient and user-friendly.
Referring to Figure 2, the signal processing device (10) of this embodiment has a circuit board with integrated analog circuits, discrete components and digital circuits. Other possibilities exist of course, including all-digital embodiments. In the preferred embodiment shown, the digital circuitry is provided by a digital signal processor (hereinafter referred to as "DSP") (100) running suitable software. The signal processing device contains amongst other things the signal conditioning circuitry, which in this embodiment has the functions of providing beam-forming from the input signals, speech enhancement, filtering and amplifying.
The signal processing device (10) has, as shown, a lower functional branch (140) forming an input path and an upper functional branch (141) forming an output path. The lower branch (140) has input ports (121-4) receiving four analog inputs (1-4) from the microphone array (15), and the input ports are connected to an input signal conditioning portion (112), which contains low-pass filters, preamplifiers and ADCs (analog-to-digital converters). The input signal conditioning portion (112) thus provides four digital outputs to the DSP (100), which performs an adaptive beam-former function (105) by operating a beam-forming algorithm on the inputs. The result of the algorithm (105) is an output (125). The output (125) may be subject to speech enhancement (106) within the DSP (100), or routed directly from the beam-former function (105) to a DAC (digital-to-analog converter) (114) having a first output node (126). The speech enhancement function (106) can be enabled or disabled at will by the user, or indeed by the manufacturer.
The upper branch (141) has first to third input lines (131-3) connected to an output branch signal conditioning (111). The output branch signal conditioning (111) contains I/O interface circuits designed appropriately to ensure they are well matched to the input devices having standard characteristics, in the manner well known to those skilled in the art. Thus impedance matching and the like is provided. The signal conditioning (111) also has an ADC (analog-to-digital converter) and provides outputs (128A, 128B) being digital representations derived from the input signals. The outputs (128A, 128B) form the inputs to two stereo paths leading to respective earphone outputs (127) of the signal processing device (10). The first and second input lines (131, 132) are from the line-in stereo jack (21) and the third input line (133) is from a hand phone (H/P) speaker connection (22).
When the line-in function is in use, the outputs (128A, 128B) represent the two stereo channels respectively; when the hand phone jack is in use, the outputs (128A, 128B) are the same because the hand phone signal is mono.
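The behaviour of the two outputs can be summarized in a few lines. This is only a sketch: the function name earphone_feeds is hypothetical, and giving the hand phone priority when both sources are connected is an assumption, since the description does not state how a conflict is resolved.

```python
def earphone_feeds(line_in=None, hand_phone=None):
    """Return the (128A, 128B) sample lists for the two earphone paths.

    line_in:    (left, right) tuple of sample lists from the stereo jack.
    hand_phone: single mono sample list from the hand phone connection.
    """
    if hand_phone is not None:
        # Mono source: the same signal feeds both paths.
        # (Priority over line-in is an assumption of this sketch.)
        return list(hand_phone), list(hand_phone)
    if line_in is not None:
        left, right = line_in
        return list(left), list(right)
    return [], []


stereo = earphone_feeds(line_in=([0.1, 0.2], [0.3, 0.4]))
mono = earphone_feeds(hand_phone=[0.5, 0.6])
```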
The stereo path digital outputs (128A, 128B) of the second signal conditioning portion (111) are applied to the DSP (100), which can either route them without modification to a respective DAC (113A, 113B) feeding the earphone outputs (127A, 127B), or may process the signals. The processing options include one or both of speech enhancement (109A, 109B) and filtering (110A, 110B).
The outputs of the DACs (113A, 113B) of this embodiment are passed to the previously-mentioned ANC circuits (107A, 107B) via respective power amplifiers (115A, 115B). In embodiments where normal earphones are provided instead of ANC earphones, ANC circuits (107A, 107B) are not provided, and the amplified DAC outputs are applied directly to the earphones.
Continuing to refer to Figure 2, the sound captured by the microphone array (15) is a signal stream incoming from the user's viewpoint. Similarly, the sound from the signal processing device (10) is a first signal stream that is outgoing from the user to a third party (via lower branch output (126)). The upper branch (141) outputs (127A, 127B) are signal streams from the third party, or from another signal source such as a CD player, radio or the like. The system also allows the user's own voice captured from the microphone array (15) to be provided to the upper branch output (127A, 127B), as will be later described herein.
The filter process (110) filters signals representative of sounds to compensate for hearing defects of a particular user/wearer. For example, particular frequency components may be boosted in power with respect to a normal output and others suppressed or reduced. The filter characteristics are customized according to the user's requirement, as is known in the hearing aid device industry; for this purpose the characteristic data is read in to the device, or keyed in, and stored in EEPROM or like non-volatile storage.
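One simple way to picture such a filter is as a table of gains applied per frequency band. The sketch below is illustrative only and is not the filter design of the embodiment: the band table format, the function names and the use of a plain discrete Fourier transform on a single frame are all assumptions.

```python
import cmath
import math


def dft(frame):
    """Plain discrete Fourier transform (adequate for short frames)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def apply_band_gains(frame, sample_rate, band_gains_db):
    """Scale each frequency bin by the gain of the band that contains it.

    band_gains_db: list of (low_hz, high_hz, gain_db) entries, a hypothetical
    format standing in for the stored per-user characteristic data.
    """
    n = len(frame)
    spectrum = dft(frame)
    for k in range(n):
        freq = min(k, n - k) * sample_rate / n  # mirror bins share one gain
        for low, high, gain_db in band_gains_db:
            if low <= freq < high:
                spectrum[k] *= 10.0 ** (gain_db / 20.0)
                break
    return idft(spectrum)


# Assumed profile: boost 1.5 kHz to 3 kHz by 20 dB, leave the rest unchanged.
profile = [(1500.0, 3000.0, 20.0)]
n, fs = 64, 8000
tone_2k = [math.sin(2 * math.pi * 16 * t / n) for t in range(n)]   # 2000 Hz
tone_500 = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]   # 500 Hz
boosted = apply_band_gains(tone_2k, fs, profile)
untouched = apply_band_gains(tone_500, fs, profile)
```

In this example the 2 kHz tone comes out ten times larger in amplitude while the 500 Hz tone passes unchanged, mirroring the boosting of particular frequency components described above.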
The input signal conditioning portion (112) includes a low-pass filter, to remove high frequency noise, and a preamplifier. As already mentioned, it also has a synchronized A-to-D converter. The adaptive beam-forming process (105) of the DSP (100) receives digitized signals from the microphones via the input signal conditioning portion (112). It processes these digital signals to synthesize signals from a spatial window, so as to get rid of outside interference sounds. The speech enhancement process (106) reduces unwanted signals (noise) coming from inside the spatial window defined by the beam-forming process and also noise from the desired signal source itself. The D-to-A converter (114) provides an analog output adjusted to standard levels as a normal audio output and then feeds out a monaural signal for possible recording, or for a phone.
A modified version of the robust adaptive beam-former disclosed in US patent 5627799 is applied here by the beam-former process (105), since the beam-former of that patent meets many of the requirements of the present application.
US patent 5627799 proposes an adaptive block matrix, which consists of coefficient-constrained adaptive filters (CCAFs) using the reference signal from a fixed beam-former. The patent further proposes a multiple-input canceler with norm-constrained adaptive filters (NCAFs). The CCAFs adaptively cancel the undesirable influence caused by steering-vector errors, and the NCAFs prevent target-signal cancellation when the adaptation of the CCAFs is incomplete. Since the method of the patent retains a high degree of freedom for interference reduction, it can be implemented with few microphones. A simulated anechoic experiment showed that the method is able to cancel interference by over 30 dB. Simulation with real acoustic data captured in a room with 0.3 s reverberation time showed that the noise was suppressed by 19 dB. Evaluation through a real-time signal processing system demonstrated that noise reduction achieved by the method of US patent 5627799 was over 12 dB even in a reverberant environment.
However, in the US patent, the step size was constant throughout the whole adaptation. Depending on the selected size, this can cause serious distortion when speech is present or, if the step size is small, can result in too low a convergence rate. To treat low and high sound signal levels equally, optimization of the step size was sought by using a control algorithm that considered both the output sound quality and the complexity of the reverberant noise environment. The technique uses a variable step size to promote fast convergence and low speech distortion.
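The description does not disclose the control algorithm itself, so the following is a generic sketch of the idea rather than the method of the embodiment or of US patent 5627799. It assumes a normalized LMS noise canceller whose step size switches between two values according to a per-sample speech flag from a hypothetical external detector; all names and parameter values are illustrative.

```python
import random


def nlms_variable_step(reference, primary, speech_active, taps=2,
                       mu_fast=0.5, mu_slow=0.01, eps=1e-8):
    """Adaptive noise canceller with a two-level step size.

    reference:     noise-only reference samples.
    primary:       samples containing the noise after an unknown path.
    speech_active: per-sample booleans (hypothetical speech detector).
    Returns (error_signal, final_weights).
    """
    w = [0.0] * taps
    buf = [0.0] * taps
    err = []
    for x, d, speech in zip(reference, primary, speech_active):
        buf = [x] + buf[:-1]                        # newest sample first
        y = sum(wi * bi for wi, bi in zip(w, buf))  # noise estimate
        e = d - y                                   # canceller output
        mu = mu_slow if speech else mu_fast         # small steps protect speech
        power = sum(b * b for b in buf) + eps       # normalization term
        w = [wi + mu * e * bi / power for wi, bi in zip(w, buf)]
        err.append(e)
    return err, w


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
path = [0.5, -0.3]  # assumed noise path, for demonstration only
primary = [path[0] * noise[i] + (path[1] * noise[i - 1] if i else 0.0)
           for i in range(len(noise))]
err, w = nlms_variable_step(noise, primary, [False] * len(noise))
```

With no speech present the large step lets the weights settle quickly on the assumed noise path; passing True flags during speech would slow the adaptation so that the speech itself is less likely to be cancelled or distorted.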
It should be kept in mind that, in practice, noise is rarely spatially localized. Thus it is unlikely to be possible to achieve desired speech quality by solely using spatial filtering, i.e. microphone array and adaptive beam- forming. Speech enhancement processes thus are available in the signal processing device. Such processes operate on the output of the adaptive beam-forming process to further improve the quality and intelligibility. However, the improvement achieved by the microphone array and beam- forming process is very important to achieve adequate speech quality.
Speech enhancement makes use of the fact that noise has characteristics that differ from those of speech. Such noise includes computer fan noise, air conditioning noise, and automobile engine and road noise.
Currently there are many speech enhancement methods, such as spectral subtraction, Kalman filters, Wiener filters, wavelet noise reduction and the Hidden Markov Model method. Among the available processes, spectral subtraction has been applied successfully in many practical applications. The main reasons are, firstly, that spectral subtraction is robust and stable for general background noise and, secondly, that by using the Fast Fourier Transform (FFT) the spectral subtraction algorithm may be fast. Also, the delay introduced may be very small. Central to spectral subtraction is the additivity of speech and noise spectra in the Fourier transform domain, which allows for simple linear subtraction of the noise spectrum estimate. This technique is here used in conjunction with adaptive microphone array beam-forming to further enhance the speech signal.
In a preferred embodiment of this invention, spectral subtraction is employed for speech enhancement. The parameters required for spectral subtraction are set up via the control buttons (24).
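The principle of spectral subtraction may be sketched as follows for a single frame. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: a plain discrete Fourier transform is written out so that the sketch is self-contained (a practical device would use an FFT), the noise magnitude estimate is taken from a noise-only frame, and the subtraction factor `alpha` and spectral floor `floor` are hypothetical parameter names standing in for the user-set parameters.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(N^2)); stands in for an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse transform; returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def spectral_subtract(frame, noise_mag, alpha=1.0, floor=0.02):
    """Subtract a noise magnitude estimate from one frame's spectrum.

    The noisy phase is kept. `alpha` scales the amount subtracted and
    `floor` keeps a small fraction of the original magnitude so that no
    bin becomes negative (both are assumed, illustrative parameters).
    """
    cleaned = []
    for s, n_mag in zip(dft(frame), noise_mag):
        mag = abs(s)
        new_mag = max(mag - alpha * n_mag, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(s)))
    return idft(cleaned)

# Demonstration: a wanted tone plus a stationary hum (e.g. fan noise).
N = 64
clean = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
hum = [0.5 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]
noisy = [c + h for c, h in zip(clean, hum)]
noise_mag = [abs(v) for v in dft(hum)]      # estimate from a noise-only frame
cleaned = spectral_subtract(noisy, noise_mag)
```

Raising `alpha` removes more noise at the cost of more speech distortion, which corresponds to the trade-off the user adjusts through the control buttons.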
The previously discussed beam-forming process (105) uses directionality to separate desired speech from interfering sounds, such as cocktail party babble, arriving from directions away from the beam.
The filter process (110) may be used to provide a personalized hearing aid function. After a careful auditory measurement, the hearing ability of the hearing-impaired person is analyzed and reflected in a series of filter coefficients. These data are then input and saved into the signal processing device, e.g. by using the control buttons or via an input port. In the preferred embodiment, some typical filters are also preloaded and stored in memory for untested hearing-impaired users.
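As a minimal sketch of how stored coefficients might be applied, the following assumes a finite impulse response (FIR) structure, which the text does not specify. The profile names and coefficient values are purely illustrative and are not measured or preloaded data from the embodiment.

```python
def apply_fir(samples, coefficients):
    """Filter `samples` with an FIR filter: y[n] = sum_k c[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

# Hypothetical stored profiles (illustrative values only).
PRELOADED_PROFILES = {
    "flat": [1.0],                 # passes the signal unchanged
    "treble_boost": [1.5, -0.5],   # gain 1 at DC, gain 2 at half the sampling rate
}

low_tone = [1.0, 1.0, 1.0, 1.0]      # lowest representable frequency (DC)
high_tone = [1.0, -1.0, 1.0, -1.0]   # highest representable frequency
boosted_low = apply_fir(low_tone, PRELOADED_PROFILES["treble_boost"])
boosted_high = apply_fir(high_tone, PRELOADED_PROFILES["treble_boost"])
```

After the first sample, the low tone passes at unit gain while the high tone is doubled, which is the kind of frequency-dependent compensation a measured coefficient set would provide.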
The output branch signal conditioning unit (111) is provided to adapt the signals from a mobile phone plug-in or other audio output devices, such as CD players or radios, to meet the electrical specification required by the ADC. Mobile phone accessories normally include a mono earphone and a mono microphone, which need different treatment from stereo audio sources that include left and right playback channels. In Figure 2, both the jack connection (21) and the phone connector (22) are stereo. As for the ADC, since the input may be provided by a CD player or a similar high quality music source, the sampling rate here is preferably set to 44 100 Hz to avoid sound quality degradation. Nevertheless, the sampling rate can be adjusted by the user via the control buttons (24). The DACs (113A and 113B) of the upper branch (141) have the same conversion rate.
Sometimes the output of the lower branch (140) will need to be fed into the upper branch (141) so as to be heard by the user. For example, in the case of a hearing aid or for sound boosting, the user may wish to employ the microphone array (15) to pick up remote sound; in hand-phone mode the user may still want to hear his own voice, captured by the microphone array (15), so that he can follow a natural speaking and listening habit. To enable this, it would be possible to connect the output (126) of the lower branch (140) to one of the inputs (21, 22) of the upper branch (141). However, this would lead to an unnecessary D/A conversion (in 114) followed immediately by an A/D conversion (in 111). To avoid this, the DSP (100) is controlled to connect the digital signal at the input to the DAC (114), via an internal connection shown figuratively as (130), to the inputs of the filter function (110A, 110B). It will be clear that there is unlikely to be an advantage in connecting it instead to the speech enhancement process (109A, 109B) of the upper branch (141), as it has already been subject to speech enhancement (106) in the lower branch (140) if required.
The input signal conditioning portion (112) prepares sampling data for the adaptive beam-former process (105). The sampling rate of the synchronous four-channel ADC of the input signal conditioning portion (112) is selected to be at least 8 000 Hz, to meet the requirements of normal communication devices; a sampling rate of 8 000 Hz can represent frequencies up to 4 000 Hz. The preferred frequency band of 150 Hz to 4 000 Hz covers most formants of human speech and provides good speech quality.
The DACs (113A, 113B) of the upper branch (141) and the output DAC (114) of the lower branch (140) transform digital input signals to analog. Since the sampling and conversion rate is in the audio range, various converters or CODECs can be used, such as those of the AD Company.
As previously discussed, active noise control is a technique to suppress unwanted acoustic noise by using an actively driven source, relying on the principle of superposition of acoustic waves. The basic concept is described in US Patent 2,043,416.
Referring to Figure 6, an exemplary ANC earphone circuit (600) has an amplifier (601) connected to receive the output signal from the microphone (101), and the amplified output (601A) is fed to a controller (602). The output of the controller (602) is fed via a first resistor (605) to the inverting input of an output amplifier (606), having a feedback resistor (607) and an output terminal (127A) connected to the mini-speaker (103). An input sound signal (108A) is filtered in a low-pass filter (609) whose output is applied via a second resistor (610) to the summing junction of the output amplifier (606).
The ANC controller (602) is primarily a filter circuit which receives the amplified noise signal (601A) from the amplifier (601) and performs filter functions dedicated to the ANC process. The parameters of the filter can be adjusted to form a desired overall transfer function which produces the required anti-phase signal at the mini-speaker (103). The output of the ANC controller (602) is then applied to the first resistor (605).
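The mixing performed at the output amplifier (606) can be illustrated with the standard relation for an ideal inverting summing amplifier. The sketch below assumes an ideal operational amplifier, and the resistor and voltage values are hypothetical; the specification gives no component values.

```python
def summing_amplifier_output(v_controller, v_audio, r_605, r_610, r_607):
    """Ideal inverting summing amplifier.

    v_out = -R607 * (v_controller / R605 + v_audio / R610)

    v_controller: output of the ANC controller (602), in volts
    v_audio:      low-pass filtered input sound signal, in volts
    r_605, r_610: input resistors; r_607: feedback resistor (ohms)
    """
    return -r_607 * (v_controller / r_605 + v_audio / r_610)

# Hypothetical values: equal 10 kilohm resistors give a unity-gain inverting mix.
v_out = summing_amplifier_output(0.2, 0.5, 10e3, 10e3, 10e3)
```

With equal resistors the output is simply the inverted sum of the anti-noise and audio signals; the ratio of each input resistor to the feedback resistor sets the relative weight of the anti-noise signal and the wanted audio at the mini-speaker.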
The signal processing device (10) runs in programmed mode, and so is enabled to respond to commands from the buttons (24) for working mode selection, parameter pre-set, and internal function switching. Some working modes with corresponding internal settings are listed below as examples:
Hands-free Mobile Phone Mode: Speech enhancement enabled; Adaptive beam-forming (105) enabled; Filter (110) optional: disabled for a user with normal hearing, enabled for a hearing-impaired user; ANC (107) optional: enabled for a noisy environment, disabled for a quiet environment.
Line-in Mode: Microphone array (15) disabled; Speech enhancement (106, 109) disabled; Adaptive beam-forming (105) disabled; Filter (110) optional: disabled for a user with normal hearing, enabled for a hearing-impaired user; ANC (107) optional: enabled for a noisy environment, disabled for a quiet environment.
Record Mode: Microphone array (15) enabled; Adaptive beam-forming (105) enabled; Speech enhancement (106) enabled; Filter (110) optional: disabled for a user with normal hearing, enabled for a hearing-impaired user; ANC (107) optional: enabled for a noisy environment, disabled for a quiet environment.
Hearing Aid Mode: Microphone array (15) enabled; Adaptive beam-forming (105) enabled; Speech enhancement (106, 109) enabled; Filter (110) enabled; ANC (107) optional: enabled for a noisy environment, disabled for a quiet environment.
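The mode settings listed above lend themselves to a simple lookup table. The following is a sketch only: the names `ModeSettings`, `MODES` and `resolve` are hypothetical, `None` is used to mark a setting described as optional, and the microphone array is assumed to be enabled in hands-free mode (the list does not state it, but beam-forming operates on the array outputs).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ModeSettings:
    microphone_array: bool
    beam_forming: bool
    speech_enhancement: bool
    filter_enabled: Optional[bool]   # None: optional, depends on the user
    anc_enabled: Optional[bool]      # None: optional, depends on the environment

MODES = {
    # microphone_array=True here is an assumption, not stated in the list.
    "hands_free": ModeSettings(True, True, True, None, None),
    "line_in": ModeSettings(False, False, False, None, None),
    "record": ModeSettings(True, True, True, None, None),
    "hearing_aid": ModeSettings(True, True, True, True, None),
}

def resolve(mode_name, hearing_impaired, noisy_environment):
    """Turn a mode plus the user's situation into concrete on/off switches."""
    m = MODES[mode_name]
    return {
        "microphone_array": m.microphone_array,
        "beam_forming": m.beam_forming,
        "speech_enhancement": m.speech_enhancement,
        "filter": hearing_impaired if m.filter_enabled is None else m.filter_enabled,
        "anc": noisy_environment if m.anc_enabled is None else m.anc_enabled,
    }

line_in_noisy = resolve("line_in", hearing_impaired=False, noisy_environment=True)
```

Keeping the optional entries unresolved until the user's circumstances are known mirrors the way the listed modes leave the filter and ANC to be switched according to the user and the environment.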
In the preferred embodiment, the user can control parameters including:
(a) speech enhancement level;
(b) sampling rate of the ADCs and DACs (111, 112, 113A, 113B and 114);
(c) width of the capture region, i.e. the acceptance angle of the adaptive beam-former (105);
(d) filter coefficients for the filter (110), obtained via auditory measurement or preloaded; and
(e) volume.
Again in the preferred embodiment, the parameter information can be observed through the LCD display. Since there are several working modes, the function units can be enabled or disabled accordingly. This is achieved by loading the mode setting parameter. The user may change and save their preferred settings to meet their own requirements .
The overall computational load is optimized to be as low as possible. Many general-purpose DSPs, for example the ADSP21065L or TMS320C6021, are available to perform the functions.
As illustrated in Figure 1 and Figure 2, the four mini-sensors (1, 2, 3 and 4) of the small microphone array (15) and the two pairs of mini-speakers (103, 104) and mini-microphones (101, 102) of the ANC earphones are wired within the frame. These sensors and speakers are then connected to the signal processing device (10) via a multiple-pin jack (8) and a multiple-pin plug (9). Connectors can today be manufactured to be small, compact, durable and convenient. A button-shaped connector, as shown in Figure 5, is one preferred option, but there may be more than one alternative in practice.
It has been mentioned that embodiments of the invention create a multiple-purpose audio interface. Two examples, representative of typical applications, are given here to demonstrate how the functions are performed. For clarity, in the following we refer to the person who wears or uses the device as the user. The other party does not wear a similar device.
Example 1:
The user is talking to his friend via mobile phone. Both of them are in a noisy environment. The microphone array (15) is used for near-field sound capture.
First of all, the device enables the user to talk in hands-free mode. To do this, he rotates down the earphones (6, 7), puts them into his ears and starts to talk. Referring to Figure 3, the microphone array (15) automatically picks up his voice from his mouth (301). The adaptive beam-former (105) uses the outputs of the microphone array (15) to simulate a super directional microphone and thus to attenuate noise, echoes and interference (302) from outside the capture region (303). Unlike a normal microphone array and beam-forming system, as applied for example in a teleconferencing system, source location is not required, as the user's mouth (301) is perpendicular to the array (15). The user can adjust the position of his head to prevent interference and noise from falling into the capture region (303).
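The directional effect of summing the array outputs without steering delays can be sketched with a fixed, far-field delay-and-sum model. This is a simplification and not the adaptive, near-field beam-former (105) of the embodiment; the element spacing, test frequency and speed of sound below are assumed values, and `broadside_response` is a hypothetical function name.

```python
import cmath
import math

def broadside_response(angle_deg, frequency_hz=2000.0, elements=4,
                       spacing_m=0.04, speed_of_sound=343.0):
    """Normalized gain of a uniform line array summed with no steering delay.

    angle_deg is measured from the perpendicular (broadside) to the array,
    so 0 degrees corresponds to a source directly in front of the array.
    """
    phase_step = (2 * math.pi * frequency_hz * spacing_m
                  * math.sin(math.radians(angle_deg)) / speed_of_sound)
    total = sum(cmath.exp(-1j * m * phase_step) for m in range(elements))
    return abs(total) / elements

# Gain towards the front of the array and at increasing off-axis angles.
gains = {angle: broadside_response(angle) for angle in (0, 30, 60, 90)}
```

Because a source perpendicular to the array reaches all elements in phase, no source localization or steering is needed to obtain full gain in that direction, while sound arriving from the side is attenuated.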
Secondly, the speech enhancement unit (106) attenuates residual noise arising from ambient noise and from other noise within the capture region. The user decides whether, how and when to use this unit by means of the control buttons (24).
As indicated in Figure 2, the user's voice can be picked up by the microphones and applied to the earphones so that the user can hear his own voice. This avoids shouting, which may occur where a telephone user has earphones in his ears. The user can also check whether his voice, as it is to be transmitted, is sufficiently free from noise. If not, he can turn on the speech enhancement unit (106) using the buttons (24). He may also adjust the degree of noise cancellation to a suitable level so as to balance the opposing factors of speech distortion and speech enhancement level.
At the same time, the ANC earphones (6, 7) fitted in his ears can provide him with a quiet space by attenuating ambient noise. This ensures he can hear both the near end (his own voice) and the far end (his friend's voice) clearly at the same time. The ANC earphones preserve the fidelity of the sound arriving from the input points (108A, 108B) and do not affect the function of any other units. Whether to use this unit or not depends on the user's location and the surrounding noise level. When the far end sound (his friend's voice) comes in, the user can choose whether to enable the speech enhancement unit (109A, 109B) according to the received sound quality.
Since his friend is also in a noisy environment, the incoming sound is expected to be noisy. In this case it would be better for the user to turn on the unit to attenuate the noise in the far end sound. The final result is that the user can hear his friend's voice clearly and his own voice, as transmitted to his friend, is clear as well. However, if his friend does not wear the device, and in particular the ANC earphones, then from the friend's point of view, although the incoming sound is clear, he cannot fully benefit from its clarity because of the ambient noise around him.
This example shows that the embodiment is able to reduce noise in both the incoming and the outgoing signals.
Example 2:
Hearing aid in a noisy environment. The user is a person with a hearing impairment who needs the assistance of a hearing aid. Most of the function blocks in Figure 2 are used in this case. It is reported that in many cases the hearing aid wearer's inability to decipher speech is caused by the poor signal-to-noise ratio of the signal, rather than by inadequate amplification. In this example, both problems are addressed.
Referring to Figure 4, as in the previous example, the ANC unit (107A, 107B) may be employed to attenuate the noise around the user. In this example, the microphone array (15) is used to capture the sound coming from a remote speaker (401). The user may adjust the width of the capture region (403) via the control buttons (24) to define an appropriate opening angle, so as to remove the effects of interference and noise sources (402) spatially separate from the desired speaker (401). Speech enhancement (106) is applied if needed to enhance the speech further and, in addition, the filter (110A, 110B) is used for frequency response compensation. The power amplifier (115A, 115B) provides overall volume adjustment.
It should be emphasized that the microphone array working in this way can handle both interference and echo cancellation. Hence it is useful in applications such as hearing assistance in an auditorium, classroom, cinema or sports center, or in open landscape.
The adaptive beam-former (105) upstream of the speech enhancement unit (106) reduces point noise sources and ambient noise to some extent. This significantly relieves the load on the speech enhancement unit (106). It is then much easier for the latter to detect the speech and to further attenuate noise. The beam-former can also provide more statistical and spatial information on the surrounding noise, and this helps the speech enhancement unit in performing time and frequency domain noise cancellation. The ANC (107A, 107B) can efficiently reduce the noise around the user, independently of the adaptive beam-former (105) and the speech enhancement units (106, 109A and 109B).
For example, when listening to high-fidelity CD music in a noisy environment, the user is disturbed by ambient sounds unless ANC is used. Neither speech enhancement nor beam-forming can deal with this issue. For the same reason, if the far end sound is originally noisy and the user is located in a noisy environment, he would not hear the sound clearly by using the speech enhancement unit alone.
The described embodiment provides a multi-functional audio interface which can serve as a flexible hands-free mobile phone accessory, a super directional microphone, a hearing protector, or a hearing aid assembly. It can also be used as a hearing assistant to help those sitting far from the speaker in a big auditorium or classroom so that the speaker may be heard clearly, reducing the effects caused by noise and reverberation. In this embodiment, the functions can be configured to meet the requirement of users by using a controller in a signal-processing device.
Other embodiments are dedicated to particular scenarios of use. For example, for work in a noisy environment, companies may provide a device having no input for external music or phone connection. A hearing aid may store only a single customized filter characteristic dedicated to the user, and have fewer or no user-adjustable controls.
Embodiments of the present invention have been described with particular reference to the examples illustrated. However, it will be appreciated that variations and modifications may be made to the examples described within the scope of the present invention.