WO2004038697A1 - Controlling an apparatus based on speech - Google Patents
Controlling an apparatus based on speech
- Publication number
- WO2004038697A1 (application PCT/IB2003/004203)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- user
- control unit
- recognition
- audio signals
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02087—Noise filtering the noise being separate speech, e.g. cocktail party
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Definitions
- The invention relates to a speech control unit for controlling an apparatus on the basis of speech, comprising:
- a microphone array comprising multiple microphones for receiving respective audio signals;
- a beam forming module for extracting a speech signal of a user from the audio signals as received by the microphones, by enhancing first components of the audio signals which represent an utterance originating from a first orientation of the user relative to the microphone array; and
- a speech recognition unit for creating an instruction for the apparatus based on recognized speech items of the speech signal.
- The invention further relates to an apparatus comprising:
- such a speech control unit for controlling the apparatus on the basis of speech; and
- processing means for executing the instruction created by the speech control unit.
- The invention further relates to a method of controlling an apparatus on the basis of speech, comprising:
- Natural spoken language is a preferred means for human-to-human communication. Because of recent advances in automatic speech recognition, natural spoken language is emerging as an effective means for human-to-machine communication. The user is being liberated from manipulating a keyboard and mouse, which requires great hand/eye coordination. This hands-free advantage of human-to-machine communication through speech recognition is particularly desired in situations where the user must be free to use his/her eyes and hands, and to move about unencumbered while talking. However, in present systems the user is still encumbered by hand-held, body-worn, or tethered microphone equipment, e.g. a headset microphone, which captures audio signals and provides input to the speech recognition unit. This is because most speech recognition units work best with a close-talking microphone input, e.g.
- A microphone array in combination with a beam forming module appears to be a good approach that can resolve the inconvenience described above.
- The microphone array is a set of microphones which are arranged at different positions.
- The multiple audio signals received by the respective microphones of the array are provided to the beam forming module.
- The beam forming module has to be calibrated, i.e. an orientation or position of a particular sound source relative to the microphone array has to be estimated.
- The particular sound source might be the source in the environment of the microphone array which generates sound having parameters corresponding to predetermined parameters, e.g. comprising predetermined frequencies matching the human voice.
- The calibration is based on the loudest sound, i.e. the particular sound source is the one that generates the loudest sound.
- A beam forming module can be calibrated on the basis of the user who is speaking loudly compared to other users in the same environment.
- A sound source direction or position can be estimated from time differences among signals from different microphones, using a delay-and-sum array method or a method based on the cross-correlation function, as disclosed in: "Knowing Who to Listen to in Speech Recognition: Visually Guided Beamforming", by U. Bub, et al., ICASSP'95, pp. 848-851, 1995.
- A parametric method for estimating the sound source position (or direction) is disclosed in S. V. Pillai: "Array Signal Processing", Springer-Verlag, New York, 1989.
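The time-difference idea can be illustrated with a small sketch. It is not taken from the cited references: it assumes two microphones, a far-field source, integer sample lags, and hypothetical function names (`cross_correlation_lag`, `direction_from_lag`). The lag that maximises the cross-correlation of the two signals is converted into an angle relative to the broadside of the microphone pair.

```python
import math

def cross_correlation_lag(x, y, max_lag):
    """Return the lag in samples by which y is delayed relative to x.

    The lag is chosen as the one maximising the cross-correlation
    sum over n of x[n] * y[n + lag], searched in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def direction_from_lag(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Far-field angle in radians from broadside for a two-microphone pair.

    Assumption (not from the source): sin(angle) = c * tau / d, with the
    ratio clamped to [-1, 1] to tolerate measurement error.
    """
    tau = lag / sample_rate
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_spacing))
    return math.asin(ratio)
```

With more than two microphones, the same lag estimate could be computed per microphone pair and the resulting angles combined; that step is left out here.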
- After being calibrated, i.e. once the current orientation has been estimated, the beam forming module is arranged to enhance sound originating from the direction corresponding to the current orientation and to reduce noise, by synthetic processing of the outputs of the microphones. It is assumed that the output of the beam forming module is a clean signal that is appropriate to be provided to a speech recognition unit, resulting in robust speech recognition. This means that the components of the audio signals are processed such that the speech items of the user can be extracted.
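One common way to enhance sound from an estimated direction is delay-and-sum beamforming. The sketch below is an illustration under stated assumptions, not the module described in the application: delays are integer sample counts that are already known from calibration, and the name `delay_and_sum` is hypothetical.

```python
def delay_and_sum(signals, delays):
    """Align each microphone signal by its delay and average the result.

    signals: list of equal-length sample lists, one per microphone.
    delays:  delays[i] is the number of samples by which signals[i]
             lags the reference microphone for the steered direction.
    Sound from the steered direction adds coherently; sound from other
    directions is misaligned and therefore attenuated by the averaging.
    """
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        for t in range(n):
            src = t + d  # read ahead by the delay to undo it
            if 0 <= src < n:
                out[t] += sig[src]
    return [v / len(signals) for v in out]
```

Steering with the correct delays reproduces the source signal at full amplitude, whereas steering with wrong delays smears it, which is why calibration to the right orientation matters.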
- An embodiment of a system comprising a microphone array, a beam forming module and a speech recognition unit is known from European Patent Application EP 0795851 A2.
- The Application discloses that sound source position or direction estimation and speech recognition can be achieved with the system.
- The disadvantage of this system is that it does not work appropriately in a multi-user situation. Suppose that the system has been calibrated for a first position of the user. Then the user starts moving. The system must first be re-calibrated to be able to recognize speech correctly. The system requires audio signals as input for the calibration, i.e. the user has to say something. However, if another user starts speaking in the meantime, the re-calibration will not provide the right result: the system will get tuned to the other user.
- The speech control unit comprises a keyword recognition system for recognition of a predetermined keyword that is spoken by the user and which is represented by a particular audio signal, and the speech control unit is arranged to control the beam forming module, on the basis of the recognition of the predetermined keyword, in order to enhance second components of the audio signals which represent a subsequent utterance originating from a second orientation of the user relative to the microphone array.
- The keyword recognition system is arranged to discriminate between audio signals related to utterances representing the predetermined keyword and other utterances which do not represent the predetermined keyword.
- The speech control unit is arranged to re-calibrate if it receives sound corresponding to the predetermined keyword from a different orientation.
- This sound has been generated by the user who initiated an attention span (see also Fig. 3) of the apparatus to be controlled. There will be no re-calibration if the predetermined keyword has not been recognized. As a consequence, speech items which are spoken from another orientation and which are not preceded by the predetermined keyword will be discarded.
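The re-calibration rule described above can be sketched as a small state machine. This is a minimal illustration, assuming that utterances arrive as word lists with an already estimated orientation in degrees; the class name `KeywordGatedBeam`, the 15 degree tolerance and the choice to strip the keyword from the returned items are assumptions made for the example.

```python
class KeywordGatedBeam:
    """Re-calibrate only on the keyword; discard off-beam speech otherwise."""

    def __init__(self, keyword, tolerance_deg=15.0):
        self.keyword = keyword.lower()
        self.tolerance_deg = tolerance_deg  # hypothetical threshold
        self.orientation_deg = None         # not calibrated yet

    def handle(self, words, orientation_deg):
        """Return the accepted speech items, or [] if the utterance is discarded."""
        if words and words[0].lower() == self.keyword:
            # Keyword recognized: tune the beam to where it came from.
            self.orientation_deg = orientation_deg
            return list(words[1:])
        if self.orientation_deg is None:
            return []
        # Smallest absolute angular difference, with wrap-around at 360 degrees.
        diff = abs((orientation_deg - self.orientation_deg + 180.0) % 360.0 - 180.0)
        return list(words) if diff <= self.tolerance_deg else []
```

For example, after `handle(["Bello", "channel", "next"], 30)` the beam is tuned to 30 degrees, so a later `handle(["louder"], 120)` is discarded while `handle(["Bello", "mute"], 120)` re-tunes the beam to 120 degrees.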
- The keyword recognition system is arranged to recognize the predetermined keyword when it is spoken by another user, and the speech control unit is arranged to control the beam forming module, on the basis of this recognition, in order to enhance third components of the audio signals which represent another utterance originating from a third orientation of the other user relative to the microphone array.
- This embodiment of the speech control unit is arranged to re-calibrate on the basis of the recognition of the predetermined keyword spoken by another user.
- This embodiment is arranged to calibrate on the basis of sound from multiple users. That means that only authorized users, i.e. those who have authorization to control the apparatus because they have spoken the predetermined keyword, are recognized as such, and hence only speech items from them will be accepted for the creation of instructions for the apparatus.
- A first one of the microphones of the microphone array is arranged to provide the particular audio signal to the keyword recognition system.
- The particular audio signal which is used for keyword recognition corresponds to one of the audio signals as received by the microphones of the microphone array.
- The beam forming module is arranged to determine a first position of the user relative to the microphone array. Besides the orientation, a distance between the user and the microphone array is also determined. The position is calculated on the basis of the orientation and the distance.
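As a worked illustration of calculating a position from orientation and distance, the sketch below converts the pair into planar coordinates. The coordinate convention (array at the origin, angle measured from the x-axis, two dimensions only) and the name `position_from_polar` are assumptions; the source does not specify them.

```python
import math

def position_from_polar(orientation_rad, distance_m):
    """Return (x, y) of the user with the microphone array at the origin.

    Assumption: the orientation is measured counter-clockwise from the
    x-axis, so x = d * cos(angle) and y = d * sin(angle).
    """
    return (distance_m * math.cos(orientation_rad),
            distance_m * math.sin(orientation_rad))
```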
- The apparatus comprises the speech control unit as claimed in claim 1.
- An embodiment of the apparatus according to the invention is arranged to show that the predetermined keyword has been recognized.
- An advantage of this embodiment according to the invention is that the user is informed about the recognition.
- An embodiment of the apparatus according to the invention which is arranged to show that the predetermined keyword has been recognized comprises audio generating means for generating an audio signal.
- With an audio signal, e.g. "Hello", it is clear to the user that the apparatus is ready to receive speech items from the user. This concept is also known as auditory greeting.
- The method is characterized in comprising recognition of a predetermined keyword that is spoken by the user, based on a particular audio signal, and controlling the extraction of the speech signal of the user, on the basis of the recognition, in order to enhance second components of the audio signals which represent a subsequent utterance originating from a second orientation of the user relative to the microphone array.
- Fig. 1 schematically shows an embodiment of the speech control unit according to the invention;
- Fig. 2 schematically shows an embodiment of the apparatus according to the invention; and
- Fig. 3 schematically shows the creation of an instruction on the basis of a number of audio signals.
- Fig. 1 schematically shows an embodiment of the speech control unit 100 according to the invention.
- The speech control unit 100 is arranged to provide instructions to the processing unit 202 of the apparatus 200. These instructions are provided at the output connector 122 of the speech control unit 100, which comprises:
- a microphone array, comprising multiple microphones 102, 104, 106, 108 and 110 for receiving respective audio signals 103, 105, 107, 109 and 111;
- a beam forming module 116 for extracting a clean signal, i.e. a speech signal 117 of a user U1, from the audio signals 103, 105, 107, 109 and 111 as received by the microphones 102, 104, 106, 108 and 110;
- a keyword recognition system 120 for recognition of a predetermined keyword that is spoken by the user and which is represented by a particular audio signal 111, and being arranged to control the beam forming module on the basis of the recognition; and
- The speech control unit 100 is calibrated on the basis of utterances of user U1 at position P1. The result is that the beam forming module 116 of the speech control unit 100 is "tuned" to sound originating from directions which substantially match direction α. Sound from directions which differ from direction α by more than a predetermined threshold is disregarded for speech recognition. E.g. speech of user U2, located at position P2 with a different direction relative to the microphone array, is neglected.
- The speech control unit 100 is sensitive to sound with voice characteristics, i.e.
- Without re-calibration of the speech control unit 100, or more particularly the beam forming module 116, the recognition of speech items would probably fail. However, the speech control unit 100 will get calibrated again when user U1 starts speaking with the predetermined keyword.
- The predetermined keyword as spoken by user U1 is recognized and used for the re-calibration.
- Further words spoken by the first user U1 which succeed the keyword are also applied for the re-calibration. If another user, e.g.
- The speech control unit 100 is arranged to stay "tuned" to user U1 while he/she is moving. Speech signals of this user U1 are extracted from the audio signals 103, 105, 107, 109 and 111 and are the basis for speech recognition. Other sounds are not taken into account for the control of the apparatus.
- The speech control unit 100 is arranged to "follow" one specific user U1.
- This user might be the user who initiated the attention span of the speech control unit.
- The speech control unit 100 is arranged to get subsequently tuned to a number of users.
- In Fig. 1 it is depicted that the microphone 110 is connected to both the keyword recognition system 120 and the beam forming module 116. This is optional, meaning that an additional microphone could have been used instead.
- The keyword recognition system 120 might be comprised by the speech recognition unit 118.
- The components 116-120 of the speech control unit 100 and the processing unit 202 of the apparatus 200 may be implemented using one processor. Normally, both functions are performed under control of a software program product. During execution, the software program product is normally loaded into a memory, like a RAM, and executed from there.
- The program may be loaded from a background memory, like a ROM, hard disk, or magnetic and/or optical storage, or may be loaded via a network like the Internet.
- An application-specific integrated circuit provides the disclosed functionality.
- Fig. 2 schematically shows an embodiment of the apparatus 200 according to the invention.
- The apparatus 200 optionally comprises a generating means 206 for generating an audio signal.
- With an audio signal, e.g. "Hello", it is clear to the user that the apparatus is ready to receive speech items from the user.
- The generating means 206 is arranged to generate multiple sounds: e.g. a first sound to indicate that the apparatus is in a state of calibrating and a second sound to indicate that the apparatus has been calibrated and hence is in an active state of recognizing speech items.
- The generating means 206 comprises a memory device for storage of sampled audio signals, a sound generator and a speaker.
- The apparatus also comprises a display device 204 for displaying a visual representation of the state of the apparatus.
- The speech control unit 100 is preferably used in a multi-function consumer electronics system, like a TV, set top box, VCR, DVD player, game box, or similar device. But it may also be a consumer electronic product for domestic use such as a washing or kitchen machine, any kind of office equipment like a copying machine, a printer, various forms of computer work stations etc., electronic products for use in the medical sector or any other kind of professional use, as well as a more complex electronic information system. Besides that, it may be a product specially designed to be used in vehicles or other means of transport, e.g. a car navigation system.
- Although a multifunction electronic system as used in the context of the invention may comprise a multiplicity of electronic products for domestic or professional use as well as more complex information systems, the number of individual functions to be controlled by the method would normally be limited to a reasonable level, typically in the range from 2 to 100 different functions. For a typical consumer electronic product like a TV or audio system, where only a more limited number of functions need to be controlled, e.g.
- volume control including muting, tone control, channel selection and switching from inactive or stand-by condition to active condition and vice versa, which could be initiated by control commands such as "louder", "softer", "mute", "bass", "treble", "change channel", "on", "off", "stand-by" etcetera.
- The speech control unit 100 is located in the apparatus 200 being controlled. It will be appreciated that this is not required and that the control method according to the invention is also possible where several devices or apparatus are connected via a network (local or wide area), and the speech control unit 100 is located in a different device than the device or apparatus being controlled.
- Fig. 3 schematically shows the creation of an instruction 318 on the basis of a number of audio signals 103, 105, 107, 109 and 111 as received by the microphones 102, 104, 106, 108 and 110. From the audio signals the speech items 304-308 are extracted. The speech items 304-308 are recognized and voice commands 312-316 are assigned to these speech items 304-308. The voice commands 312-316 are "Bello", "Channel" and "Next", respectively. An instruction "Increase_Frequency_Band", which is interpretable by the processing unit 202, is created based on these voice commands 312-316.
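The step from voice commands to a single instruction can be pictured as a table lookup. The sketch below uses the one mapping stated for Fig. 3; the table structure, the case-insensitive matching and the names `INSTRUCTION_TABLE` and `create_instruction` are assumptions for illustration only.

```python
# Sequence of voice commands -> instruction for the processing unit.
# Only this single entry is given in the description of Fig. 3.
INSTRUCTION_TABLE = {
    ("bello", "channel", "next"): "Increase_Frequency_Band",
}

def create_instruction(voice_commands):
    """Combine a sequence of voice commands into one instruction.

    Returns None when the sequence does not map to a known instruction.
    """
    key = tuple(command.lower() for command in voice_commands)
    return INSTRUCTION_TABLE.get(key)
```

Calling `create_instruction(["Bello", "Channel", "Next"])` returns `"Increase_Frequency_Band"`, and any unlisted sequence returns `None`.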
- The speech control unit 100 optionally requires the user to activate it, resulting in a time span, also called an attention span, during which the speech control unit 100 is active. Such an activation may be performed via voice, for instance by the user speaking a keyword, like "TV" or "Device-Wake-up".
- The keyword for initiating the attention span is the same as the predetermined keyword for re-calibrating the speech control unit.
- A barrier for interaction is removed: it is more natural to address the character instead of the product, e.g. by saying "Bello" to a dog-like character.
- A product can make effective use of one object with several appearances, chosen as a result of several state elements. For instance, a basic appearance like a sleeping animal can be used to show that the speech control unit 100 is not yet active. A second group of appearances can be used when the speech control unit 100 is active, e.g. awake appearances of the animal. The progress of the attention span can then, for instance, be expressed by the angle of the ears: fully raised at the beginning of the attention span, fully down at the end.
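The ear-angle feedback can be sketched as a simple interpolation over the attention span. The linear shape, the 90 and 0 degree end points and the name `ear_angle` are assumptions; the text only states that the ears are fully raised at the beginning and fully down at the end.

```python
def ear_angle(elapsed_s, span_s, raised_deg=90.0, down_deg=0.0):
    """Ear angle of the on-screen character during the attention span.

    Interpolates linearly from raised_deg at the start of the span to
    down_deg at its end; times outside the span are clamped.
    """
    progress = min(max(elapsed_s / span_s, 0.0), 1.0)
    return raised_deg + (down_deg - raised_deg) * progress
```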
- The position of the eyes of the character can be used to give feedback to the user about where the system is expecting the user to be.
- The apparatus, i.e. the speech control unit 100, is in a state of accepting further speech items.
- These speech items 304-308 will be recognized and associated with voice commands 312-316.
- A number of voice commands 312-316 together will be combined into one instruction 318 for the apparatus.
- A first speech item is associated with "Bello", resulting in a wake-up of the television.
- A second speech item is associated with the word "channel" and a third speech item is associated with the word "next".
- The result is that the television will switch, i.e. get tuned, to the next broadcasting channel. If another user starts talking during the attention span of the television just initiated by the first user, then his/her utterances will be neglected.
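The attention-span behaviour in this example can be sketched as follows. It is a simplified illustration: the speaker identity is assumed to be known per word (in the described unit it would follow from the orientation of the beam), and the class name `AttentionSpan` and the 10 second duration are hypothetical.

```python
class AttentionSpan:
    """Accept speech only from the user who opened the attention span."""

    def __init__(self, keyword, duration_s=10.0):
        self.keyword = keyword.lower()
        self.duration_s = duration_s        # hypothetical span length
        self.owner = None
        self.expires_at = float("-inf")     # no span active initially

    def hear(self, user, word, now_s):
        """Return True if the word is accepted as a speech item."""
        active = now_s < self.expires_at
        if not active and word.lower() == self.keyword:
            # The keyword opens a new span owned by the speaker.
            self.owner = user
            self.expires_at = now_s + self.duration_s
            return True
        return active and user == self.owner
```

In this sketch a first user saying "Bello" opens the span, that user's following words "channel" and "next" are accepted, and words from any other user are neglected until the span has expired.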
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004546229A JP4837917B2 (en) | 2002-10-23 | 2003-09-22 | Device control based on voice |
US10/532,469 US7885818B2 (en) | 2002-10-23 | 2003-09-22 | Controlling an apparatus based on speech |
EP13151978.7A EP2587481B1 (en) | 2002-10-23 | 2003-09-22 | Controlling an apparatus based on speech |
EP03809389.4A EP1556857B1 (en) | 2002-10-23 | 2003-09-22 | Controlling an apparatus based on speech |
AU2003260926A AU2003260926A1 (en) | 2002-10-23 | 2003-09-22 | Controlling an apparatus based on speech |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02079421.0 | 2002-10-23 | ||
EP02079421 | 2002-10-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004038697A1 (en) | 2004-05-06 |
Family ID=32116290
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2003/004203 WO2004038697A1 (en) | 2002-10-23 | 2003-09-22 | Controlling an apparatus based on speech |
Country Status (7)
Country | Link |
---|---|
US (1) | US7885818B2 (en) |
EP (2) | EP2587481B1 (en) |
JP (1) | JP4837917B2 (en) |
KR (1) | KR101034524B1 (en) |
CN (1) | CN100508029C (en) |
AU (1) | AU2003260926A1 (en) |
WO (1) | WO2004038697A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007138503A1 (en) * | 2006-05-31 | 2007-12-06 | Philips Intellectual Property & Standards Gmbh | Method of driving a speech recognition system |
WO2009140781A1 (en) * | 2008-05-20 | 2009-11-26 | Svox Ag | Method for classification and removal of undesired portions from a comment for speech recognition |
EP2362681A1 (en) * | 2010-02-15 | 2011-08-31 | Dietmar Ruwisch | Method and device for phase-dependent processing of sound signals |
US8214219B2 (en) | 2006-09-15 | 2012-07-03 | Volkswagen Of America, Inc. | Speech communications system for a vehicle and method of operating a speech communications system for a vehicle |
US8290178B2 (en) * | 2005-07-26 | 2012-10-16 | Honda Motor Co., Ltd. | Sound source characteristic determining device |
WO2013106133A1 (en) * | 2012-01-12 | 2013-07-18 | Qualcomm Incorporated | Augmented reality with sound and geometric analysis |
WO2017184149A1 (en) | 2016-04-21 | 2017-10-26 | Hewlett-Packard Development Company, L.P. | Electronic device microphone listening modes |
EP3343559A1 (en) * | 2016-12-29 | 2018-07-04 | Beijing Xiaoniao Tingting Technology Co., Ltd | De-reverberation control method and device of sound producing equipment |
GB2586783B (en) * | 2019-08-29 | 2022-11-16 | Singh Digva Kavalijeet | Vehicle safety apparatus |
Families Citing this family (62)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ATE400474T1 (en) * | 2005-02-23 | 2008-07-15 | Harman Becker Automotive Sys | VOICE RECOGNITION SYSTEM IN A MOTOR VEHICLE |
JP2009508560A (en) * | 2005-09-21 | 2009-03-05 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Ultrasound imaging system with voice activated control using a remotely located microphone |
KR100738414B1 (en) * | 2006-02-06 | 2007-07-11 | 삼성전자주식회사 | Method for improving performance of speech recognition in telematics environment and device for executing the method |
KR100776803B1 (en) * | 2006-09-26 | 2007-11-19 | 한국전자통신연구원 | Apparatus and method for recognizing speaker using fuzzy fusion based multichannel in intelligence robot |
CN101377797A (en) * | 2008-09-28 | 2009-03-04 | 腾讯科技(深圳)有限公司 | Method for controlling game system by voice |
US8243952B2 (en) * | 2008-12-22 | 2012-08-14 | Conexant Systems, Inc. | Microphone array calibration method and apparatus |
JP5493611B2 (en) * | 2009-09-09 | 2014-05-14 | ソニー株式会社 | Information processing apparatus, information processing method, and program |
TW201123823A (en) * | 2009-12-18 | 2011-07-01 | Kinpo Elect Inc | System and method for communication with hands-free profile |
CN102131015A (en) * | 2010-01-18 | 2011-07-20 | 金宝电子工业股份有限公司 | Home hands-free conversation system integrating illumination system and home hands-free conversation method |
US8738377B2 (en) * | 2010-06-07 | 2014-05-27 | Google Inc. | Predicting and learning carrier phrases for speech input |
KR101327112B1 (en) * | 2010-08-23 | 2013-11-07 | 주식회사 팬택 | Terminal for providing various user interface by using surrounding sound information and control method thereof |
US9171551B2 (en) * | 2011-01-14 | 2015-10-27 | GM Global Technology Operations LLC | Unified microphone pre-processing system and method |
CN102595281B (en) * | 2011-01-14 | 2016-04-13 | 通用汽车环球科技运作有限责任公司 | The microphone pretreatment system of unified standard and method |
CN102671381A (en) * | 2011-03-08 | 2012-09-19 | 德信互动科技(北京)有限公司 | Acoustic control-based game implementation device and method |
CN102800312A (en) * | 2011-05-24 | 2012-11-28 | 鸿富锦精密工业(深圳)有限公司 | Voice control system and method |
TWI406266B (en) | 2011-06-03 | 2013-08-21 | Univ Nat Chiao Tung | Speech recognition device and a speech recognition method thereof |
CN103127718A (en) * | 2011-11-30 | 2013-06-05 | 北京德信互动网络技术有限公司 | Game achieving device and method based on voice control |
CN104285452A (en) | 2012-03-14 | 2015-01-14 | 诺基亚公司 | Spatial audio signal filtering |
EP2817801B1 (en) | 2012-03-16 | 2017-02-22 | Nuance Communications, Inc. | User dedicated automatic speech recognition |
US9111542B1 (en) * | 2012-03-26 | 2015-08-18 | Amazon Technologies, Inc. | Audio signal transmission techniques |
EP2660813B1 (en) * | 2012-04-30 | 2014-12-17 | BlackBerry Limited | Dual microphone voice authentication for mobile device |
US9881616B2 (en) * | 2012-06-06 | 2018-01-30 | Qualcomm Incorporated | Method and systems having improved speech recognition |
JP5972372B2 (en) * | 2012-06-25 | 2016-08-17 | 三菱電機株式会社 | Car information system |
KR102091236B1 (en) * | 2012-09-28 | 2020-03-18 | 삼성전자 주식회사 | Electronic apparatus and control method of the same |
WO2014063099A1 (en) * | 2012-10-19 | 2014-04-24 | Audience, Inc. | Microphone placement for noise cancellation in vehicles |
WO2014064324A1 (en) * | 2012-10-26 | 2014-05-01 | Nokia Corporation | Multi-device speech recognition |
US9265458B2 (en) | 2012-12-04 | 2016-02-23 | Sync-Think, Inc. | Application of smooth pursuit cognitive testing paradigms to clinical drug development |
US10102850B1 (en) * | 2013-02-25 | 2018-10-16 | Amazon Technologies, Inc. | Direction based end-pointing for speech recognition |
US9380976B2 (en) | 2013-03-11 | 2016-07-05 | Sync-Think, Inc. | Optical neuroinformatics |
CN104053088A (en) * | 2013-03-11 | 2014-09-17 | 联想(北京)有限公司 | Microphone array adjustment method, microphone array and electronic device |
JP6114915B2 (en) * | 2013-03-25 | 2017-04-19 | パナソニックIpマネジメント株式会社 | Voice input selection device and voice input selection method |
US9269350B2 (en) * | 2013-05-24 | 2016-02-23 | Google Technology Holdings LLC | Voice controlled audio recording or transmission apparatus with keyword filtering |
US9984675B2 (en) * | 2013-05-24 | 2018-05-29 | Google Technology Holdings LLC | Voice controlled audio recording system with adjustable beamforming |
US9747899B2 (en) * | 2013-06-27 | 2017-08-29 | Amazon Technologies, Inc. | Detecting self-generated wake expressions |
CN103529726B (en) * | 2013-09-16 | 2016-06-01 | 四川虹微技术有限公司 | A kind of intelligent switch with speech identifying function |
US9508345B1 (en) | 2013-09-24 | 2016-11-29 | Knowles Electronics, Llc | Continuous voice sensing |
US9245527B2 (en) | 2013-10-11 | 2016-01-26 | Apple Inc. | Speech recognition wake-up of a handheld portable electronic device |
US9953634B1 (en) | 2013-12-17 | 2018-04-24 | Knowles Electronics, Llc | Passive training for automatic speech recognition |
CN103873977B (en) * | 2014-03-19 | 2018-12-07 | 惠州Tcl移动通信有限公司 | Recording system and its implementation based on multi-microphone array beam forming |
US9437188B1 (en) | 2014-03-28 | 2016-09-06 | Knowles Electronics, Llc | Buffered reprocessing for multi-microphone automatic speech recognition assist |
EP3154052A4 (en) * | 2014-06-03 | 2018-01-10 | Sony Corporation | Information processing device, information processing method, and program |
CN105637895B (en) * | 2014-07-10 | 2019-03-26 | 奥林巴斯株式会社 | The control method of recording device and recording device |
US9432769B1 (en) | 2014-07-30 | 2016-08-30 | Amazon Technologies, Inc. | Method and system for beam selection in microphone array beamformers |
WO2016033269A1 (en) * | 2014-08-28 | 2016-03-03 | Analog Devices, Inc. | Audio processing using an intelligent microphone |
US9734822B1 (en) * | 2015-06-01 | 2017-08-15 | Amazon Technologies, Inc. | Feedback based beamformed signal selection |
US9973641B2 (en) * | 2015-10-22 | 2018-05-15 | Kabushiki Kaisha Toshiba | Multi-function printer |
CN105427860B (en) * | 2015-11-11 | 2019-09-03 | 百度在线网络技术(北京)有限公司 | Far field audio recognition method and device |
US10157629B2 (en) * | 2016-02-05 | 2018-12-18 | Brainchip Inc. | Low power neuromorphic voice activation system and method |
DE102016118007A1 (en) | 2016-09-23 | 2018-03-29 | Gira Giersiepen Gmbh & Co. Kg | Method for operating at least one building intercom and a corresponding building intercom system |
US10510362B2 (en) * | 2017-03-31 | 2019-12-17 | Bose Corporation | Directional capture of audio based on voice-activity detection |
KR102304342B1 (en) * | 2017-08-14 | 2021-09-23 | 에스케이텔레콤 주식회사 | Method for recognizing voice and apparatus used therefor |
JP6755843B2 (en) | 2017-09-14 | 2020-09-16 | 株式会社東芝 | Sound processing device, voice recognition device, sound processing method, voice recognition method, sound processing program and voice recognition program |
WO2019093123A1 (en) | 2017-11-07 | 2019-05-16 | ソニー株式会社 | Information processing device and electronic apparatus |
JP6991041B2 (en) * | 2017-11-21 | 2022-01-12 | ヤフー株式会社 | Generator, generation method, and generation program |
JP6853163B2 (en) * | 2017-11-27 | 2021-03-31 | 日本電信電話株式会社 | Speaker orientation estimator, speaker orientation estimation method, and program |
US20190172240A1 (en) * | 2017-12-06 | 2019-06-06 | Sony Interactive Entertainment Inc. | Facial animation for social virtual reality (vr) |
KR101972545B1 (en) * | 2018-02-12 | 2019-04-26 | 주식회사 럭스로보 | A Location Based Voice Recognition System Using A Voice Command |
CN110364166B (en) * | 2018-06-28 | 2022-10-28 | 腾讯科技(深圳)有限公司 | Electronic equipment for realizing speech signal recognition |
JP6998289B2 (en) * | 2018-11-19 | 2022-01-18 | ヤフー株式会社 | Extractor, learning device, extraction method, extraction program, learning method and learning program |
CN110111805B (en) * | 2019-04-29 | 2021-10-29 | 北京声智科技有限公司 | Automatic gain control method and device in far-field voice interaction and readable storage medium |
EP4026118A4 (en) * | 2019-09-02 | 2023-05-24 | Cerence Operating Company | Vehicle avatar devices for interactive virtual assistant |
TWI725668B (en) * | 2019-12-16 | 2021-04-21 | 陳筱涵 | Attention assist system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1289247A2 (en) * | 2001-08-31 | 2003-03-05 | Mitel Knowledge Corporation | System and method of indicating and controlling sound pickup direction and location in a teleconferencing system |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3437617B2 (en) * | 1993-06-03 | 2003-08-18 | 株式会社東芝 | Time-series data recording / reproducing device |
US5749072A (en) * | 1994-06-03 | 1998-05-05 | Motorola Inc. | Communications device responsive to spoken commands and methods of using same |
JP3522954B2 (en) * | 1996-03-15 | 2004-04-26 | 株式会社東芝 | Microphone array input type speech recognition apparatus and method |
JP3332143B2 (en) * | 1997-06-23 | 2002-10-07 | 日本電信電話株式会社 | Sound pickup method and device |
US5906870A (en) * | 1997-12-29 | 1999-05-25 | Lo; Szu Wei | Electric rotary decoration |
DE19943875A1 (en) * | 1999-09-14 | 2001-03-15 | Thomson Brandt Gmbh | Voice control system with a microphone array |
WO2001022404A1 (en) * | 1999-09-23 | 2001-03-29 | Koninklijke Philips Electronics N.V. | Speech recognition apparatus and consumer electronics system |
JP2002034092A (en) * | 2000-07-17 | 2002-01-31 | Sharp Corp | Sound-absorbing device |
EP1189206B1 (en) * | 2000-09-19 | 2006-05-31 | Thomson Licensing | Voice control of electronic devices |
GB2375698A (en) * | 2001-02-07 | 2002-11-20 | Canon Kk | Audio signal processing apparatus |
JP3771812B2 (en) * | 2001-05-28 | 2006-04-26 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Robot and control method thereof |
US7149691B2 (en) * | 2001-07-27 | 2006-12-12 | Siemens Corporate Research, Inc. | System and method for remotely experiencing a virtual environment |
JP3910898B2 (en) * | 2002-09-17 | 2007-04-25 | 株式会社東芝 | Directivity setting device, directivity setting method, and directivity setting program |
NL1021485C2 (en) * | 2002-09-18 | 2004-03-22 | Stichting Tech Wetenschapp | Hearing glasses assembly. |
2003
Date | Office | Application number | Publication | Status |
---|---|---|---|---|
2003-09-22 | EP | EP13151978.7A | EP2587481B1 | Expired - Lifetime |
2003-09-22 | CN | CNB038245434A | CN100508029C | Expired - Lifetime |
2003-09-22 | AU | AU2003260926A | AU2003260926A1 | Abandoned |
2003-09-22 | US | US10/532,469 | US7885818B2 | Active |
2003-09-22 | EP | EP03809389.4A | EP1556857B1 | Expired - Lifetime |
2003-09-22 | JP | JP2004546229A | JP4837917B2 | Expired - Fee Related |
2003-09-22 | WO | PCT/IB2003/004203 | WO2004038697A1 | Application Filing |
2003-09-22 | KR | KR1020057006866A | KR101034524B1 | IP Right Grant |
Non-Patent Citations (3)
Title |
---|
BASU S ET AL: "Wearable phased arrays for sound localization and enhancement", WEARABLE COMPUTERS, THE FOURTH INTERNATIONAL SYMPOSIUM ON ATLANTA, GA, USA 16-17 OCT. 2000, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 16 October 2000 (2000-10-16), pages 103 - 110, XP010525999, ISBN: 0-7695-0795-6 * |
LLEIDA E ET AL: "Robust continuous speech recognition system based on a microphone array", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 1998. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON SEATTLE, WA, USA 12-15 MAY 1998, NEW YORK, NY, USA,IEEE, US, 12 May 1998 (1998-05-12), pages 241 - 244, XP010279154, ISBN: 0-7803-4428-6 * |
STURIM D E ET AL: "Tracking multiple talkers using microphone-array measurements", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1997. ICASSP-97., 1997 IEEE INTERNATIONAL CONFERENCE ON MUNICH, GERMANY 21-24 APRIL 1997, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 21 April 1997 (1997-04-21), pages 371 - 374, XP010226212, ISBN: 0-8186-7919-0 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8290178B2 (en) * | 2005-07-26 | 2012-10-16 | Honda Motor Co., Ltd. | Sound source characteristic determining device |
WO2007138503A1 (en) * | 2006-05-31 | 2007-12-06 | Philips Intellectual Property & Standards Gmbh | Method of driving a speech recognition system |
US8214219B2 (en) | 2006-09-15 | 2012-07-03 | Volkswagen Of America, Inc. | Speech communications system for a vehicle and method of operating a speech communications system for a vehicle |
WO2009140781A1 (en) * | 2008-05-20 | 2009-11-26 | Svox Ag | Method for classification and removal of undesired portions from a comment for speech recognition |
US8477964B2 (en) | 2010-02-15 | 2013-07-02 | Dietmar Ruwisch | Method and device for phase-sensitive processing of sound signals |
US8340321B2 (en) | 2010-02-15 | 2012-12-25 | Dietmar Ruwisch | Method and device for phase-sensitive processing of sound signals |
EP2362681A1 (en) * | 2010-02-15 | 2011-08-31 | Dietmar Ruwisch | Method and device for phase-dependent processing of sound signals |
WO2013106133A1 (en) * | 2012-01-12 | 2013-07-18 | Qualcomm Incorporated | Augmented reality with sound and geometric analysis |
US9563265B2 (en) | 2012-01-12 | 2017-02-07 | Qualcomm Incorporated | Augmented reality with sound and geometric analysis |
WO2017184149A1 (en) | 2016-04-21 | 2017-10-26 | Hewlett-Packard Development Company, L.P. | Electronic device microphone listening modes |
EP3434024A4 (en) * | 2016-04-21 | 2019-12-18 | Hewlett-Packard Development Company, L.P. | Electronic device microphone listening modes |
EP3343559A1 (en) * | 2016-12-29 | 2018-07-04 | Beijing Xiaoniao Tingting Technology Co., Ltd | De-reverberation control method and device of sound producing equipment |
US10410651B2 (en) | 2016-12-29 | 2019-09-10 | Beijing Xiaoniao Tingting Technology Co., LTD. | De-reverberation control method and device of sound producing equipment |
GB2586783B (en) * | 2019-08-29 | 2022-11-16 | Singh Digva Kavalijeet | Vehicle safety apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN1689073A (en) | 2005-10-26 |
KR101034524B1 (en) | 2011-05-12 |
KR20050055776A (en) | 2005-06-13 |
EP2587481B1 (en) | 2020-01-08 |
EP1556857B1 (en) | 2013-07-31 |
CN100508029C (en) | 2009-07-01 |
JP4837917B2 (en) | 2011-12-14 |
US7885818B2 (en) | 2011-02-08 |
JP2006504130A (en) | 2006-02-02 |
EP2587481A3 (en) | 2013-07-03 |
AU2003260926A1 (en) | 2004-05-13 |
EP1556857A1 (en) | 2005-07-27 |
US20060074686A1 (en) | 2006-04-06 |
EP2587481A2 (en) | 2013-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1556857B1 (en) | | Controlling an apparatus based on speech |
US10586534B1 (en) | | Voice-controlled device control using acoustic echo cancellation statistics |
US9734845B1 (en) | | Mitigating effects of electronic audio sources in expression detection |
US20030138118A1 (en) | | Method for control of a unit comprising an acoustic output device |
JP5442703B2 (en) | | Method and apparatus for voice control of devices associated with consumer electronics |
EP0867860A2 (en) | | Method and device for voice-operated remote control with interference compensation of appliances |
US20070015467A1 (en) | | Communication system using short range radio communication headset |
WO2003107327A1 (en) | | Controlling an apparatus based on speech |
JP2005084253A (en) | | Sound processing apparatus, method, program and storage medium |
JP5380777B2 (en) | | Audio conferencing equipment |
US20070198268A1 (en) | | Method for controlling a speech dialog system and speech dialog system |
WO2005003685A1 (en) | | Method and device for controlling a speech dialog system |
US11455980B2 (en) | | Vehicle and controlling method of vehicle |
JP2024001353A (en) | | Headphone, acoustic signal processing method, and program |
CN113314121A (en) | | Silent speech recognition method, silent speech recognition device, silent speech recognition medium, earphone, and electronic apparatus |
EP1316944B1 (en) | | Sound signal recognition system and method, and dialog control system and method using it |
JP2006251061A (en) | | Voice dialog apparatus and voice dialog method |
JP2016206646A (en) | | Voice reproduction method, voice interactive device, and voice interactive program |
EP1091347A2 (en) | | Multi-stage speech recognition |
JP3846500B2 (en) | | Speech recognition dialogue apparatus and speech recognition dialogue processing method |
Marquardt et al. | | A natural acoustic front-end for Interactive TV in the EU-Project DICIT |
Nakatoh et al. | | Speech recognition interface system for digital TV control |
JP2005148764A (en) | | Method and device for speech recognition interaction |
Fujimoto et al. | | Hands-free speech recognition in real environments using microphone array and 2-levels MLLR adaptation as a front-end system for conversational TV |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2003809389; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 1020057006866; Country of ref document: KR |
ENP | Entry into the national phase | Ref document number: 2006074686; Country of ref document: US; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 10532469, Country of ref document: US; Ref document number: 20038245434, Country of ref document: CN; Ref document number: 2004546229, Country of ref document: JP |
WWP | WIPO information: published in national office | Ref document number: 1020057006866; Country of ref document: KR |
WWP | WIPO information: published in national office | Ref document number: 2003809389; Country of ref document: EP |
WWP | WIPO information: published in national office | Ref document number: 10532469; Country of ref document: US |