SIMULTANEOUS SUPPORT OF ISOLATED
AND CONNECTED PHRASE COMMAND
RECOGNITION IN AUTOMATIC SPEECH
RECOGNITION SYSTEMS
FIELD OF THE INVENTION
The invention relates to speech recognition systems, and more specifically to a system and method for speech recognition which simultaneously recognizes isolated and connected phrase commands.
BACKGROUND OF THE INVENTION
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of text words, numbers, or symbols by a computer. These recognized words can then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. For example, speech recognition systems may be used in modern operating rooms to control various medical systems and devices. A surgeon or other user, by giving a simple voice command, may direct the functionality of a device controlled by the speech recognition system. For example, the surgeon may deliver a voice command to adjust a patient table or adjust the pressure of a pump.
To enable speech recognition in an operating room, medical devices and/or other equipment are connected with a component (e.g., a call system) through communication channels (e.g., an Ethernet connection, device bus, etc.). A speech recognition system is also connected, providing the voice-driven user interface and recognition software. When a voice command is issued, the command may be recognized and converted to a text string. If it is successfully identified as a valid command corresponding to one of the connected devices, the system will send an appropriate signal so that the desired control action is taken.
In order to present what commands can be issued by the user, such systems generally adopt a tree-structured command menu. Each level of the command menu contains a collection of acceptable voice commands, each of which, if recognized, leads to a sub-menu of new commands. For example, U.S. Pat. No. 6,591,239 discloses a voice-controlled surgical suite employing a tree-structured command menu. If a surgeon is attempting to adjust an operating table, the surgeon must first issue the command "table", pause to allow the command to be recognized and the table command sub-menu loaded, issue an applicable table command from the sub-menu, and so on.
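The tree-structured menu described above can be sketched as a nested mapping in which each recognized command either loads a sub-menu or names a leaf action. This is a minimal illustration only; the menu contents ("table", "adjust", etc.) are hypothetical and not taken from the cited patent.

```python
# Hypothetical tree-structured command menu: a nested dict in which a
# sub-dict is a sub-menu and None marks a leaf action.
COMMAND_MENU = {
    "table": {
        "adjust": {"up": None, "down": None},
        "lock": None,
    },
    "pump": {
        "pressure": {"up": None, "down": None},
    },
}

def navigate(menu, command):
    """Recognize a single isolated command at the current menu level.

    Returns the sub-menu the command leads to, or None when the command
    is a leaf action. Raises KeyError if the command is not valid at
    this level.
    """
    return menu[command]

# The user must issue one command per level, pausing between utterances:
table_menu = navigate(COMMAND_MENU, "table")    # loads the table sub-menu
adjust_menu = navigate(table_menu, "adjust")    # loads the adjust sub-menu
action = navigate(adjust_menu, "up")            # leaf action reached (None)
```

Three separate utterances, each followed by a recognition pause, are needed to reach a single leaf action, which is the inefficiency the invention addresses.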
Therefore, the known speech recognition systems generally do not save time in the operating room. The surgeon must issue multiple voice commands to effectuate a single action. Further, the known systems force surgeons to adopt an unnatural way of giving voice commands (e.g., isolated speech) that requires considerable practice before the surgeon can use the system efficiently.
It is therefore desired to provide a system and method for implementing speech commands which recognizes multiple speech commands delivered in a single utterance.
It is further desired to provide a system and method for implementing speech commands which recognizes both the traditional isolated speech commands as well as non-traditional commands without reconfiguring.
SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide a speech recognition system which simultaneously supports isolated and connected or continuous speech.
It is a further object of the present invention to provide a speech recognition system which supports multiple modes of speech and further exceeds the speed and accuracy of known speech recognition systems.
These and other objectives are achieved by providing a system for operating one or more devices using speech input including a receiver for receiving a speech input, a controller in communication with said receiver, software executing on said controller for converting the speech input into computer-readable data, software executing on said controller for generating a table of active commands, the table including a portion of all valid commands of the system, software executing on said controller for identifying at least one active command represented by the data, and software executing on said controller for transmitting the active command to at least one device operable by the active command.
Further provided is a method of controlling a device using a speech input, including the steps of determining valid commands associated with each device of a system, generating a table of active commands, wherein the table includes a portion of the valid commands, receiving a speech input, converting the speech input into computer-readable data, identifying at least one active command represented by the data, and transmitting the active command to at least one device to which the active command pertains.
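The claimed method steps can be sketched end to end. All names, devices, and the trivial "converter" below are hypothetical stand-ins; a real implementation would use an actual speech-to-text engine.

```python
def determine_valid_commands(devices):
    """Step 1: collect the valid commands of every device in the system."""
    return {cmd: name for name, cmds in devices.items() for cmd in cmds}

def generate_active_table(valid_commands, active_subset):
    """Step 2: the active table holds only a portion of the valid commands."""
    return {cmd: dev for cmd, dev in valid_commands.items() if cmd in active_subset}

def control(speech_input, converter, active_table, transmit):
    """Steps 3-6: receive the input, convert it to computer-readable data,
    identify an active command, and transmit it to the pertinent device."""
    data = converter(speech_input)      # e.g., a text string
    device = active_table.get(data)
    if device is None:
        return None                     # not an active command: reject
    transmit(device, data)
    return data

# Usage with hypothetical devices and a trivial stand-in converter:
devices = {"light": ["light on", "light off"], "table": ["table up"]}
valid = determine_valid_commands(devices)
active = generate_active_table(valid, {"light on", "light off"})
sent = []
control("light on", str.lower, active, lambda dev, cmd: sent.append((dev, cmd)))
```

Note that "table up" is a valid command of the system but not in the active table, so it would be rejected in this state; this distinction is developed in the detailed description below.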
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of a system according to the present invention.
FIG. 2 is a schematic diagram of a controller of the system shown in FIG. 1.
FIG. 3A is a schematic diagram of a language model of the system shown in FIG. 1.
FIG. 3B is a schematic diagram of an exemplary system command menu of the system shown in FIG. 1.
FIG. 3C is a schematic diagram of an exemplary active command menu of the system shown in FIG. 1.
FIG. 4 is a method of generating a table of active commands employable by the system shown in FIG. 1.
FIG. 5 is a method of processing a speech input employable by the system shown in FIG. 1.
DETAILED DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a system 100 for operating one or more devices using speech input according to the present invention. As will be described, the system 100 provides for the operation of one or more devices using isolated and continuous speech inputs or spoken commands. The system 100 may further accommodate either mode of speech, e.g., in succession or simultaneously, without having to reset or reconfigure. The system 100 may be useful for any number of applications including, for example, control of devices and/or processes in a medical operating room.
The system 100 includes a receiver 104 for receiving a speech input 102. The receiver 104 may be any instrument or device for receiving an incoming sound or sound wave and converting it into a digital waveform and/or an electric current
or electric energy (e.g., audio signal 106). For example, the receiver 104 may be a microphone. The speech input 102 received by the receiver 104 may be any spoken utterance from a user such as a spoken word or phrase, or a collection of words or phrases. The speech input 102 preferably includes words or phrases indicative of one or more commands which a user desires to be communicated or implemented by the system 100.
The system 100 further includes a controller 108. The controller 108 may be any device, system, or part thereof that controls at least one operation or receives and/or executes one or more software programs. The controller 108 may, for example, be one of a digital signal processor, a microcontroller, a microprocessor, or a computer programmable logic device. It should be noted that the functionality associated with the controller 108 may be centralized or distributed, whether locally or remotely. The controller 108 is in communication with the receiver 104 and may receive information from the receiver 104, such as the audio signal 106. As will be described in detail below, the controller 108 may then transmit or otherwise communicate a command 114 to either or both of a device 116 or monitor 118 (e.g., display).
The system 100 may further include a language model 110. The language model 110 may exist in a storage of the system 100, in temporary memory and/or in a storage remote to the system. The language model 110 includes information used for recognizing commands indicated by a speech input 102. For example, the language model 110 may include each valid command or command sequence pertaining to devices operable by the system. The language model 110 may further include a table of active commands, including a portion or subset of the system's valid commands (or valid command sequences), for use in command recognition. As such, the controller 108 includes software for generating the table of active commands.
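One plausible way to generate such a table is to combine always-valid system commands with the commands reachable from the current menu context; the patent does not mandate a particular construction, and the menu contents below are illustrative.

```python
def build_active_table(system_commands, menu, current_node):
    """Sketch: the active commands are the always-valid system-level
    commands plus the commands available under the current menu node.
    The result is only a portion of all valid commands, which keeps
    the recognizer's search space small."""
    return set(system_commands) | set(menu.get(current_node, ()))

# Illustrative menu fragment (command words loosely follow FIG. 3B;
# exact contents are hypothetical):
MENU = {
    "instrument": ["start", "stop", "adjust"],
    "light": ["on", "off"],
}
active = build_active_table({"settings"}, MENU, "instrument")
# active == {"settings", "start", "stop", "adjust"}
```

Regenerating the table as the context changes keeps recognition constrained to commands that are meaningful in the current state.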
The system 100 may also include command equivalents 112. The command equivalents 112 include common variations and/or "out-of-vocabulary" speech inputs known to be indicative of a particular valid command. For example, the command equivalents 112 may include shortened forms of valid commands, descriptive forms, common mispronunciations, and, in some cases, foreign language equivalents of valid commands. The command equivalents 112 may exist in a storage of the system 100, in temporary memory, in a storage remote to the system, and/or in a portable storage (e.g., unique to a particular user of the system). The command equivalents 112 may further operate in conjunction with an error-rejection algorithm or software executed by the controller 108. As one of ordinary skill in the art will understand, the command equivalents 112 may therefore be continuously updated or improved by the system over time.
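The command equivalents 112 can be modeled as a simple mapping consulted before the error-rejection path. The entries below mirror the categories of variation named above but are themselves hypothetical.

```python
# Hypothetical equivalents table mapping out-of-vocabulary inputs to a
# valid command; entries illustrate the variation categories described
# in the text.
COMMAND_EQUIVALENTS = {
    "lamp on": "light on",      # descriptive form
    "light": "light on",        # shortened form
    "licht an": "light on",     # foreign-language equivalent
}

def resolve(text, valid_commands, equivalents):
    """Map a recognized string to a valid command, consulting the
    equivalents table before rejecting the input."""
    if text in valid_commands:
        return text
    return equivalents.get(text)    # None -> error-rejection path

def learn(equivalents, heard, intended):
    """The table may be updated over time as new variants are confirmed
    (e.g., via the error-rejection software)."""
    equivalents[heard] = intended
```

Keeping the table per-user, as the text suggests with portable storage, would let the system adapt to an individual speaker's habitual phrasings.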
FIG. 2 shows an exploded view of the controller 108. The controller 108 may include a converter 202. The converter 202 may be hardware, software, or a combination thereof, for converting the audio signal 106 into the data 206. The data 206 may include computer-readable data or any form of data capable of being interpreted, analyzed and/or read by a machine or computer. For example, the data 206 may include a text representation (e.g., text string) of a literal translation of a user's speech input. The data 206 may be representative of a complete speech input or utterance of a user. Alternatively, the data 206 may be representative of a portion of a speech input recognized by the system (e.g., in real time) to pertain to a particular active command.
The controller 108 further includes a recognizer 208. The recognizer 208 may be hardware, software, or a combination thereof. The recognizer 208 may include software for identifying at least one active command (e.g., from a table of active commands) represented by the data 206. In some embodiments, the recognizer 208 of the system 100 may continuously receive data (e.g., as speech input is spoken) and attempt to identify active commands contained therein in real time. The recognizer 208 may also recognize portions of commands to determine whether to expect additional data. For example, an exemplary system according to the present invention may recognize any number of command sequences beginning with the word "instrument" but only one command or command sequence beginning with "settings" (as shown in FIG. 3B). Therefore, the recognizer 208 may provide a variable inter-command pause as necessary to prevent early cutoff of a command sequence. For example, the recognizer may wait for additional data when "instrument" is recognized but immediately transmit or execute a command when "settings" is recognized. In some embodiments or modes of operation, the recognizer 208 may receive data indicative of a complete speech utterance prior to attempting to identify active commands.
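The variable inter-command pause can be sketched as a prefix-matching decision: count how many active command sequences begin with the words heard so far. This is one plausible realization under assumed names; the command set mirrors the "instrument"/"settings" example above.

```python
def continuations(active_commands, words):
    """Active command sequences that begin with the words heard so far."""
    return [c for c in active_commands if c[:len(words)] == tuple(words)]

def decide(active_commands, words):
    """Return 'execute' when the words completely and uniquely identify a
    command, 'wait' when further data could still extend the sequence,
    and 'reject' when no active command matches."""
    hits = continuations(active_commands, words)
    if not hits:
        return "reject"
    if hits == [tuple(words)]:
        return "execute"
    return "wait"

# Mirroring the example in the text (command set is illustrative):
ACTIVE = {("instrument", "start"), ("instrument", "stop"), ("settings",)}
decide(ACTIVE, ["settings"])      # unique and complete: execute immediately
decide(ACTIVE, ["instrument"])    # several continuations: wait for more data
```

Because "settings" matches exactly one complete sequence, no pause is needed, while "instrument" holds the recognizer open for the words that follow.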
The recognizer 208 of the system 100 may also include software for parsing the data 206 into one or more potential commands. The recognizer 208 may then attempt to match each potential command to an active command. Therefore, also included in the recognizer 208 may be software for querying the table of active commands to identify active commands. As described above, the recognizer 208 may likewise query the command equivalents 112 and/or a table of command equivalents to identify equivalent commands if necessary. The controller 108 further includes software for transmitting a command 114, or an active command, to at least one device operable by the active command.
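The parsing of a connected utterance into potential commands might proceed by greedy longest-match against the active table; the patent does not mandate a particular algorithm, so the following is only one plausible sketch.

```python
def parse_utterance(words, active_commands):
    """Greedy longest-match split of a connected utterance into
    successive active commands (one plausible parsing strategy)."""
    commands, i = [], 0
    while i < len(words):
        # Try the longest remaining span first, shrinking toward one word.
        for j in range(len(words), i, -1):
            candidate = tuple(words[i:j])
            if candidate in active_commands:
                commands.append(candidate)
                i = j
                break
        else:
            return None    # unmatched segment: consult equivalents or reject
    return commands

ACTIVE = {("instrument", "start"), ("light", "on")}
parse_utterance("instrument start light on".split(), ACTIVE)
# -> [("instrument", "start"), ("light", "on")]
```

A single connected utterance thus yields several commands, each of which can then be transmitted to its device, in contrast to the one-command-per-pause menu traversal of the prior art.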
FIG. 3A shows an exploded view of the language model 110 shown in FIGS. 1 and 2. The language model 110 includes a system command menu 300. The system command menu 300 preferably includes each valid command (e.g., including valid command sequences) associated with the system 100 or a substantial portion thereof. Referring to FIG. 3B, the system command menu 300 may be organized or at least displayable (e.g., via the monitor 118) in a hierarchical and/or "tree-structured" menu format. As one of ordinary skill in the art will understand, the system command menu 300 is preferably displayed to provide a user of the system 100 with a visual representation of the commands that are available during use. The display may further highlight the user's previous or current command or menu level selection. In some instances, the system 100 (e.g., via the display) may also request confirmation from a user prior to transmitting and/or executing a recognized command.
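A system command menu along the lines of FIG. 3B can be represented as a nested structure whose root-to-leaf paths are the valid command sequences. The "instrument" branch follows the figure as described; the "light" contents are hypothetical.

```python
SYSTEM_COMMAND_MENU = {
    "settings": {},                      # system-level command
    "instrument": {                      # per FIG. 3B as described
        "start": {}, "stop": {},
        "adjust": {"up": {}, "down": {}},
    },
    "light": {"on": {}, "off": {}},      # hypothetical device commands
}

def command_sequences(menu, prefix=()):
    """Enumerate every valid command sequence in the hierarchical menu,
    i.e., each root-to-leaf path."""
    for word, submenu in menu.items():
        node = prefix + (word,)
        if submenu:
            yield from command_sequences(submenu, node)
        else:
            yield node

sequences = list(command_sequences(SYSTEM_COMMAND_MENU))
# includes ("settings",), ("instrument", "start"),
# ("instrument", "adjust", "up"), ...
```

Enumerating the paths this way yields exactly the valid command sequences the language model 110 would hold, while the nested form remains suitable for hierarchical display on the monitor 118.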
As shown in FIG. 3B, the menu 300 may include a first set of valid commands 302 including system level and/or general commands applicable to the system itself. The menu 300 may further include any number of command sets (e.g., or nodes) associated with devices operable by the system 100. For example, the menu 300 may include a set 304 of valid commands/command sequences pertaining to an instrument (e.g., medical instrument or device). The exemplary instrument is shown to have three first level commands (e.g., start, stop, adjust) and two second level commands (e.g., up, down) pertaining to "adjust." Further included are sets 306 and 308 pertaining to additional devices light and unit, respectively. It should be noted, however, that the system command menu 300 shown in FIG. 3B and the commands contained therein are only a simplified example of a menu 300. As one of ordinary skill in the art will understand, the menu 300 and/or language