US20020128843A1 - Voice controlled computer interface - Google Patents


Info

Publication number
US20020128843A1
Authority
US
United States
Prior art keywords
command
operating system
mouse
menu
language
Legal status
Abandoned
Application number
US09/852,049
Inventor
Thomas Firman
Current Assignee
MULTIMODAL TECHNOLOGIES LLC
Fonix Corp
Original Assignee
Lernout and Hauspie Speech Products NV
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=23461140&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=US20020128843(A1) (“Global patent litigation dataset” by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Lernout and Hauspie Speech Products NV
Priority to US09/852,049
Publication of US20020128843A1
Assigned to LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: FONIX CORPORATION
Assigned to FONIX CORPORATION: MERGER (SEE DOCUMENT FOR DETAILS). Assignors: FONIX/ASI CORPORATION
Assigned to ASI ACQUISITION CORPORATION: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ARTICULATE SYSTEMS, INC.
Assigned to ARTICULATE SYSTEMS, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: FIRMAN, THOMAS R.
Assigned to SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECTIVE DATE L&H: OFFICIAL COMMITTEE OF UNSECURED CREDITORS OF LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.'S PLAN OF LIQUIDATION FOR LERNOUT & HAUSPIE SPEECH PRODUCTS N.V. UNDER CHAPTER 11 OF THE BANKRUPTCY CODE. Assignors: LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.
Assigned to ASI ACQUISITION CORPORATION: MERGER (SEE DOCUMENT FOR DETAILS). Assignors: ARTICULATE SYSTEMS, INC.
Assigned to SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECTIVE DATE L&H: PLAN ADMINISTRATION AGREEMENT. Assignors: LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.
Assigned to FONIX/ASI CORPORATION: CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: ASI ACQUISITION CORPORATION
Assigned to SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECTIVE DATE L&H: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.
Assigned to MULTIMODAL TECHNOLOGIES, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECTIVE DATE L&H
Assigned to MULTIMODAL TECHNOLOGIES, LLC: CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: MULTIMODAL TECHNOLOGIES, INC.
Assigned to ROYAL BANK OF CANADA, AS ADMINISTRATIVE AGENT: SECURITY AGREEMENT. Assignors: MMODAL IP LLC, MULTIMODAL TECHNOLOGIES, LLC, POIESIS INFOMATICS INC.
Assigned to MULTIMODAL TECHNOLOGIES, LLC: RELEASE OF SECURITY INTEREST. Assignors: ROYAL BANK OF CANADA, AS ADMINISTRATIVE AGENT
Assigned to WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT: SECURITY AGREEMENT. Assignors: MMODAL IP LLC
Assigned to CORTLAND CAPITAL MARKET SERVICES LLC: PATENT SECURITY AGREEMENT. Assignors: MULTIMODAL TECHNOLOGIES, LLC
Assigned to MULTIMODAL TECHNOLOGIES, LLC: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: CORTLAND CAPITAL MARKET SERVICES LLC, AS ADMINISTRATIVE AGENT
Assigned to MEDQUIST CM LLC, MEDQUIST OF DELAWARE, INC., MULTIMODAL TECHNOLOGIES, LLC, MMODAL IP LLC, MMODAL MQ INC.: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS. Assignors: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/038 Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • This invention relates to voice controlled computer interfaces.
  • Voice recognition systems can convert human speech into computer information.
  • Voice recognition systems have been used, for example, to control text-type user interfaces such as the interface of the disk operating system (DOS) of the IBM Personal Computer.
  • Voice control has also been applied to graphical user interfaces, such as the one implemented by the Apple Macintosh computer, which includes icons, pop-up windows, and a mouse. These voice control systems use voiced commands to generate keyboard keystrokes.
  • The invention features enabling voiced utterances to be substituted for manipulation of a pointing device. The pointing device is of the kind that is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display; the cursor is moved and the desired actions are aided by an operating system in the computer in response to control signals received from the pointing device. The computer also has an alphanumeric keyboard, and the operating system is separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard. A voice recognizer recognizes the voiced utterance, and an interpreter converts the voiced utterance into control signals that directly create a desired action aided by the operating system, without first being converted into control signals expressed in the predetermined format specific to the keyboard.
  • Voiced utterances are converted to commands, expressed in a predefined command language, to be used by an operating system of a computer. Some voiced utterances are converted into commands corresponding to actions to be taken by the operating system, and other voiced utterances are converted into commands that carry associated text strings to be used as part of the text being processed in an application program running under the operating system.
  • The invention also features generating a table for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, the application program including menus and control buttons. The instruction sequence of the application program is parsed to identify menu entries and control buttons, and an entry is included in the table for each menu entry and control button found in the application program, each entry containing a command corresponding to the menu entry or control button.
  • The invention also features enabling a user to create an instance in a formal language of the kind that has a strictly defined syntax. A graphically displayed list of entries is expressed in a natural language that does not comply with the syntax; the user is permitted to point to an entry on the list, and the instance corresponding to the identified entry is automatically generated in response to the pointing.
  • The invention enables a user to easily control the graphical interface of a computer: any action that the operating system can be commanded to take can be commanded by voiced utterances.
  • The commands may include commands that are normally entered through the keyboard as well as commands normally entered through a mouse or any other input device.
  • The user may switch back and forth between voiced utterances that correspond to commands for actions to be taken and voiced utterances that correspond to text strings to be used in an application program, without giving any indication that the switch has been made.
  • Any application may be made susceptible to a voice interface by automatically parsing the application's instruction sequence for the menus and control buttons that control the application.
  • FIG. 1 is a functional block diagram of a Macintosh computer served by a Voice Navigator voice controlled interface system.
  • FIG. 2A is a functional block diagram of a Language Maker system for creating word lists for use with the Voice Navigator interface of FIG. 1.
  • FIG. 2B depicts the format of the voice files and word lists used with the Voice Navigator interface.
  • FIG. 3 is an organizational block diagram of the Voice Navigator interface system.
  • FIG. 4 is a flow diagram of the Language Maker main event loop.
  • FIG. 5 is a flow diagram of the Run Edit module.
  • FIG. 6 is a flow diagram of the Record Actions submodule.
  • FIG. 7 is a flow diagram of the Run Modal module.
  • FIG. 8 is a flow diagram of the In Button? routine.
  • FIG. 9 is a flow diagram of the Event Handler module.
  • FIG. 10 is a flow diagram of the Do My Menu module.
  • FIGS. 11A through 11I are flow diagrams of the Language Maker menu submodules.
  • FIG. 12 is a flow diagram of the Write Production module.
  • FIG. 13 is a flow diagram of the Write Terminal submodule.
  • FIG. 14 is a flow diagram of the Voice Control main driver loop.
  • FIG. 15 is a flow diagram of the Process Input module.
  • FIG. 16 is a flow diagram of the Recognize submodule.
  • FIG. 17 is a flow diagram of the Process Voice Control Commands routine.
  • FIG. 18 is a flow diagram of the ProcessQ module.
  • FIG. 19 is a flow diagram of the Get Next submodule.
  • FIG. 20 is a chart of the command handlers.
  • FIGS. 21A through 21G are flow diagrams of the command handlers.
  • FIG. 22 is a flow diagram of the Post Mouse routine.
  • FIG. 23 is a flow diagram of the Set Mouse Down routine.
  • FIGS. 24 and 25 illustrate the screen displays of Voice Control.
  • FIGS. 26 through 29 illustrate the screen displays of Language Maker.
  • FIG. 30 is a listing of a language file.
  • A Macintosh operating system 132 provides a graphical interactive user interface by processing events received from a mouse 134 and a keyboard 136 and by providing displays, including icons, windows, and menus, on a display device 138.
  • Operating system 132 provides an environment in which application programs such as MacWrite 139, desktop utilities such as Calculator 137, and a wide variety of other programs can be run.
  • The operating system 132 also receives events from the Voice Navigator voice controlled computer interface 102, enabling the user to control the computer by voiced utterances.
  • The user speaks into a microphone 114 connected via a Voice Navigator box 112 to the SCSI (Small Computer System Interface) port of the computer 100.
  • The Voice Navigator box 112 digitizes and processes analog audio signals received from the microphone 114, and transmits the processed digitized audio signals to the Macintosh SCSI port.
  • The Voice Navigator box includes an analog-to-digital converter (A/D) for digitizing the audio signal, a DSP (Digital Signal Processing) chip for compressing the resulting digital samples, and protocol interface hardware that configures the digital samples to obey the SCSI protocols.
  • Recognizer Software 120 (available from Dragon Systems, Newton, Mass.) runs under the Macintosh operating system and is controlled by internal commands 123 received from Voice Control driver 128, which also operates under the Macintosh operating system.
  • One possible algorithm for implementing Recognizer Software 120 is disclosed by Baker et al. in U.S. Pat. No. 4,783,803, incorporated by reference herein.
  • Recognizer Software 120 processes the incoming compressed, digitized audio and compares each utterance of the user to prestored utterance macros. If the user utterance matches a prestored utterance macro, the utterance is recognized, and a command string 121 corresponding to the recognized utterance is delivered to a text buffer 126.
  • Command strings 121 delivered from the Recognizer Software represent either commands to be issued to the Macintosh operating system (e.g., menu selections to be made or text to be displayed) or internal commands 123 to be issued to the Recognizer Software.
  • The Recognizer Software 120 compares the incoming samples of an utterance with macros in a voice file 122. (The system requires the user to pause briefly between utterances so that the system can recognize when each utterance ends.) The voice file macros are created by a “training” process, described below. If a match is found (as judged by the recognition algorithm of the Recognizer Software 120), a Voice Control command string from a word list 124 (which has been directly associated with voice file 122) is fetched and sent to text buffer 126.
  • Command strings in text buffer 126 are relayed to Voice Control driver 128, which drives a Voice Control interpreter 130 in response to the strings.
  • A command string 121 may indicate an internal command 123, such as a command to the Recognizer Software to “learn” new voice file macros or to adjust the sensitivity of the recognition algorithm.
  • In that case, Voice Control interpreter 130 sends the appropriate internal command 123 to the Recognizer Software 120.
  • Alternatively, the command string may represent an operating system manipulation, such as a mouse movement.
  • Voice Control interpreter 130 then produces the appropriate action by interacting with the Macintosh operating system 132.
  • Each application or desktop accessory is associated with a word list 124 and a corresponding voice file 122; these are loaded by the Recognition Software when the application or desktop accessory is opened.
  • The voice files are generated by the Recognizer Software 120 in its “learn” mode, under the control of internal commands from the Voice Control driver 128.
  • The word lists are generated by the Language Maker desktop accessory 140, which creates “languages” of utterance names and associated Voice Control command strings and converts the languages into word lists.
  • Voice Control command strings are strings such as “ESC”, “TEXT”, and “@MENU(font,2)”, and belong to a Voice Control command set whose syntax is described later and set forth in Appendix A.
  • The Voice Control and Language Maker software comprises about 30,000 lines of code, most of it written in C and the remainder in assembly language. A listing of the Voice Control and Language Maker software is provided in microfiche as Appendix C.
  • The Voice Control software operates on a Macintosh Plus or later model configured with at least 1 Mbyte of RAM (2 Mbyte for HyperCard and other large applications), a hard disk, and Macintosh operating system version 6.01 or later.
  • Macintosh operating system 132 is “event driven”.
  • The operating system maintains an event queue (not shown); input devices such as the mouse 134 or the keyboard 136 “post” events to this queue to cause the operating system to, for example, create the appropriate text entry or trigger a mouse movement.
  • The operating system 132 then passes messages to Macintosh applications (such as MacWrite 139) or to desktop accessories (such as Calculator 137) indicating the events on the queues, if any.
  • Voice Control interpreter 130 likewise controls the operating system (and hence the applications and desktop accessories that are currently running) by posting events to the operating system queues.
  • The events posted by the Voice Control interpreter typically correspond to mouse activity, keyboard keystrokes, or both, depending on the voice commands.
  • The Voice Navigator system 102 provides an additional user interface.
  • The “voice” events may comprise text strings to be displayed or included with text being processed by the application program.
  • The Recognizer Software 120 may be trained to recognize the utterances of a particular user and to associate a corresponding text string with each utterance.
  • During training, the Recognizer Software 120 displays to the user a menu of the utterance names (such as “file” and “page down”) that are to be recognized. These names, and the corresponding Voice Control command strings (indicating the appropriate actions), appear in the current word list 124.
  • The user designates the utterance name of interest and is then prompted to speak the utterance corresponding to that name. For example, if the utterance name is “file”, the user might say “FILE” or “PLEASE FILE”.
  • The digitized samples from the Voice Navigator box 112 corresponding to that utterance are then used by the Recognizer Software 120 to create a “macro” representing the utterance, which is stored in the voice file 122 and subsequently associated with the utterance name in the word list 124.
  • The utterance is repeated more than once in order to create a macro that accommodates variation in a particular speaker's voice.
  • The meaning of the spoken utterance need not correspond to the utterance name, and the text of the utterance name need not correspond to the Voice Control command strings stored in the word list.
  • For example, the user may want the command string that causes the operating system to save a file to have the utterance name “save file”; the associated command string may be “@MENU(file,2)”; and the utterance that the user trains for this utterance name may be the spoken phrase “immortalize”.
  • The Recognizer Software and Voice Control cause that utterance, utterance name, and command string to be properly associated in the voice file 122 and word list 124.
  • The word lists 124 used by the Voice Navigator are created by the Language Maker desk accessory 140 running under the operating system.
  • Each word list 124 is hierarchical; that is, some utterance names in the list link to sub-lists of other utterance names. Only the list of utterance names at a currently active level of the hierarchy can be recognized. (In the current embodiment, the number of utterance names at each level of the hierarchy can be as large as 1000.)
  • For example, some utterances, such as “file”, may summon the file menu on the screen and link to a subsequent list of utterance names at a lower hierarchical level.
  • The file menu may list subsequent commands such as “save”, “open”, or “save as”, each associated with an utterance.
  • Language Maker enables the user to create a hierarchical language of utterance names and associated command strings, rearrange the hierarchy of the language, and add new utterance names. When the language is in the form that the user desires, it is converted to a word list 124. Because the hierarchy of the utterance names and command strings can be adjusted, a user of the Voice Navigator system is not bound by the preset menu hierarchy of an application. For example, the user may want to create a “save” command at the top level of the utterance hierarchy that directly saves a file without first summoning the file menu. The user may also, for example, create a new utterance name “goodbye” that saves a file and exits all at once.
  • Each language created by Language Maker 140 also contains the command strings that represent the actions (e.g., clicking the mouse at a location, typing text on the screen) to be associated with utterances and utterance names.
  • The user does not specify the command strings that describe the actions he wishes to associate with an utterance and utterance name. In fact, the user does not need to know about, and never sees, the command strings stored in the Language Maker language or the resulting word list 124.
  • Instead, in a “record” mode, the user associates a series of actions with an utterance name by simply performing the desired actions (such as typing text at the keyboard or clicking the mouse on a menu). The actions performed are converted into the appropriate command strings, and when the user turns off record mode, the command strings are associated with the selected utterance name.
  • The user can create a language by typing utterance names at the keyboard 142; by using a “create default text” procedure 146, which parses a text file on the clipboard and creates one utterance name for each word in the text file, all starting at the same hierarchical level; or by using a “create default menus” procedure, which parses the executable code 144 for an application and creates a set of utterance names equal to the names of the commands in the application's menus, with an initial hierarchy matching that of the application's menus.
  • If the names are typed at the keyboard or created by parsing a text file, they are initially associated with the keystrokes that, when typed at the keyboard, produce the name. Thus, the name “text” would initially be associated with the keystrokes t-e-x-t. If the names are created by parsing the executable code 144 for an application, they are initially associated with the command strings that execute the corresponding menu commands for the application. These initial command strings can be changed by simply selecting the utterance name to be changed and putting Language Maker into record mode.
  • The output of Language Maker is a language file 148.
  • This file contains the utterance names and the corresponding command strings.
  • The language file 148 is formatted for input to a VOCAL compiler 150 (available from Dragon Systems), which converts the language file into a word list 124 for use with the Recognition Software.
  • The syntax of language files is specified in the Voice Navigator Developer's Reference Manual, provided as Appendix D and incorporated by reference.
  • A macro 147 of each learned utterance is stored in the voice file 122.
  • A corresponding utterance name 149 and command string 151 are associated with one another and with the utterance, and are stored in the word list 124.
  • The word list 124 is created and modified by Language Maker 140.
  • The voice file 122 is created and modified by the Recognition Software 120 in its learn mode, under the control of the Voice Control driver 128.
  • The Voice Navigator hardware box 152 includes an analog-to-digital (A/D) converter 154 for converting the analog signal from the microphone into a digital signal for processing, a DSP section 156 for filtering and compacting the digitized signal, a SCSI manager 158 for communication with the Macintosh, and a microphone control section 160 for controlling the microphone.
  • The Voice Navigator system also includes the Recognition Software voice drivers 120, which include routines for utterance detection 164 and command execution 166.
  • For utterance detection 164, the voice drivers periodically poll 168 the Voice Navigator hardware to determine, based on the amplitude of the signal received by the microphone, whether an utterance is being received by Voice Navigator box 152.
  • The voice drivers create a speech buffer of encoded digital samples (tokens) to be used by the command execution drivers 166.
  • The recognition drivers can learn new utterances by token-to-terminal conversion 174.
  • In this conversion, the token is converted to a macro for the utterance and stored as a terminal in a voice file 122 (FIG. 1).
  • Recognition and pattern matching 172 is also performed on command by the voice drivers.
  • A stored token of incoming digitized samples is compared with the macros for the utterances in the current level of the recognition hierarchy. If a match is found, terminal-to-output conversion 176 is also performed, selecting the command string associated with the recognized utterance from the word list 124 (FIG. 1).
  • State management 178, such as changing the sensitivity controls, is also performed on command by the voice drivers.
  • The Voice Control driver 128 forms an interface 182 to the voice drivers 120 through control commands, an interface 184 to the Macintosh operating system 132 (FIG. 1) through event posting and operating system hooks, and an interface 186 to the user through display menus and prompts.
  • The interface 182 to the drivers gives Voice Control access to the Voice Driver command functions 166.
  • This interface allows Voice Control to monitor 188 the status of the recognizer, for example to check for an utterance token in the utterance queue buffered 170 to the Macintosh. If there is an utterance, and if processor time is available, Voice Control issues command sdi_recognize 190, calling the recognition and pattern match routine 172 in the voice drivers.
  • The interface to the drivers may also issue command sdi_output 192, which controls the terminal-to-output conversion routine 176 in the voice drivers, converting a recognized utterance to a command string for use by Voice Control.
  • The command string may indicate mouse or keystroke events to be posted to the operating system, or commands to Voice Control itself (e.g., enabling or disabling Voice Control).
  • Voice Control is simply a Macintosh driver with internal parameters, such as sensitivity, and internal commands, such as commands to learn new utterances.
  • The processing that the user perceives as Voice Control may actually be performed by Voice Control or by the Voice Drivers, depending on the function. For example, the utterance learning procedures are performed by the Voice Drivers under the control of Voice Control.
  • The interface 184 to the Macintosh operating system allows Voice Control, where appropriate, to manipulate the operating system (e.g., by posting events or modifying event queues).
  • The macro interpreter 194 takes the command strings delivered from the voice drivers via the text buffer and interprets them to decide what actions to take. These commands may indicate text strings to be displayed, or mouse movements or menu selections to be executed.
  • In the interpretive execution of the command strings, Voice Control must manipulate the Macintosh event queues. This task is performed by OS event management 196. As discussed above, voice events may simulate events that are ordinarily associated with the keyboard or with the mouse. Keyboard events are handled by OS event management 196 directly. Mouse events are handled by mouse handler 198; they require an additional level of handling because they can require operating system manipulation outside of the standard event post routines performed by OS event management 196.
  • The main interface into the Macintosh operating system 132 is event based and is used for the majority of the commands that are voice recognized and issued to the Macintosh. However, there are other “hooks” into the operating system state that are used to control parameters such as mouse placement and mouse motion. For example, as will be discussed later, pushing the mouse button down generates an event; however, keeping the mouse button pushed down and dragging the mouse across a menu requires the use of an operating system hook. For reference, the operating system hooks used by the Voice Navigator are listed in Appendix B.
  • The operating system hooks are implemented by the trap filters 200, which Voice Control uses to force the Macintosh operating system to accept the controls implemented by OS event management 196 and mouse handler 198.
  • The Macintosh operating system traps are held in the Macintosh read only memories (ROMs) and implement high-level commands for controlling the system. Examples of these high-level commands are drawing a string onto the screen, zooming windows, moving windows to the front and back of the screen, and polling the status of the mouse button. To interface properly with the Macintosh operating system, the Voice Control driver must control these operating system traps to generate the appropriate events.
  • To generate menu events, for example, Voice Control “seizes” the menu select trap (i.e., takes control of the trap from the operating system). Once Voice Control has seized the trap, application requests for menu selections are forwarded to Voice Control. In this way Voice Control is able to modify, where necessary, the operating system output to the program, thereby controlling the system behavior as desired.
  • The interface 186 to the user provides user control of the Voice Control operations.
  • Prompts 202 display the name of each recognized utterance on the Macintosh screen so that the user may determine if the proper utterance has been recognized.
  • On-line training 204 allows the user to access, at any time while using the Macintosh, the utterance names in the word list 124 currently in use. The user may see which utterance names have been trained and may retrain the utterance names in an on-line manner (these functions require Voice Control to use the Voice Driver interface, as discussed above).
  • User options 206 provide selection of various Voice Control settings, such as the sensitivity and confidence level of the recognizer (i.e., the level of certainty required to decide that an utterance has been recognized). The optimal values for these parameters depend upon the microphone in use and the speaking voice of the user.
  • the interface 186 to the user does not operate via the Macintosh event interface. Rather, it is simply a recursive loop which controls the Recognition Software and the state of the Voice Control driver.
  • Language Maker 140 includes an application analyzer 210 and an event recorder 212.
  • The application analyzer 210 parses the executable code of applications, as discussed above, and produces suitable default utterance names and pre-programmed command strings.
  • The application analyzer 210 includes a menu extraction procedure 214, which searches executable code for text strings corresponding to menus.
  • The application analyzer 210 also includes control identification procedures 216 for creating the command strings corresponding to each menu item in an application.
  • The event recorder 212 is a driver for recording user commands and creating command strings for utterances. This allows the user to easily create and edit command strings, as discussed above.
  • Types of events that may be entered into the event recorder include text entry 218, mouse events 220 (such as clicking at a specified place on the screen), special events 222, which may be necessary to control a particular application, and voice events 224, which may be associated with operations of the Voice Control driver.
  • The Language Maker main event loop 230 is similar in structure to the main event loops used by other desk accessories in the Macintosh operating system. If a desk accessory is selected from the “Apple” menu, an “open” event is transmitted to the accessory. In general, if the application in which the accessory resides quits, or if the user quits the accessory using its menus, a “close” event is transmitted to the accessory. Otherwise, control events are transmitted to the accessory. The message parameter of a control event indicates the kind of event. As seen in FIG. 4, the Language Maker main event loop 230 begins with an analysis 232 of the event type.
  • If the event is an open event, Language Maker tests 234 whether it is already open. If Language Maker is already open 236, the current language (i.e., the list of utterance names from the current word list) is displayed and Language Maker returns 237 to the operating system. If Language Maker is not open 238, it is initialized and then returns 239 to the operating system.
  • If the event is a close event, Language Maker prompts the user 240 to save the current language as a language file. If the user commands Language Maker to save the current language, the current language is converted to a language file by the Write Production module 242, and Language Maker then exits 244. If the current language is not saved, Language Maker exits directly.
  • If the event is a control event, the way in which Language Maker responds depends on the mode that Language Maker is in. Language Maker has a utility for recording events (i.e., the mouse movements and clicks or text entry that the user wishes to assign to an utterance), and while recording it must record events that do not involve the Language Maker window. When not recording, however, Language Maker should respond only to events in its own window. Therefore, Language Maker may respond to an event in one mode but not in another.
  • A control event 246 is forwarded to one of three branches 248, 250, 252. All menu events are forwarded to the accMenu branch 252. (Only menu events occurring in desk accessory menus are forwarded to Language Maker.) All window events for the Language Maker window are forwarded to the accEvent branch 250. All other events received by Language Maker, which correspond to events for desk accessories or applications other than Language Maker, initiate activity in the accRun branch 248 to enable the recording of actions.
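The three-way routing of control events can be sketched as a small dispatch function. The event representation (a `ControlEvent` with a `kind` string and a `window_id`) is a simplification assumed for illustration and does not reproduce the actual desk accessory control-call interface; the enum values simply reuse the reference numerals of the three branches.

```python
from dataclasses import dataclass
from enum import Enum


class Branch(Enum):
    ACC_RUN = 248    # events for other applications: record actions
    ACC_EVENT = 250  # window events for the Language Maker window
    ACC_MENU = 252   # desk accessory menu events


@dataclass
class ControlEvent:
    kind: str        # "menu", "window", or anything else (hypothetical)
    window_id: int   # target window, 0 if none


def route(event: ControlEvent, language_maker_window: int) -> Branch:
    """Pick the branch that handles a control event (sketch)."""
    if event.kind == "menu":
        return Branch.ACC_MENU
    if event.kind == "window" and event.window_id == language_maker_window:
        return Branch.ACC_EVENT
    return Branch.ACC_RUN  # everything else enables action recording
```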
  • Language Maker seizes control of the operating system by setting control flags that cause the operating system to call Language Maker on every tick of the Macintosh (i.e., every 1/60 second).
  • Language Maker can record dialog events (i.e., events that involve a modal dialog, in which the user cannot do anything except respond to the modal dialog box). To accomplish this, the user must be able to perform actions (i.e., mouse clicks and menu selections) in the current application so that the dialog boxes appear on the screen. The user can then initialize recording and respond to the dialog boxes. When modal dialog boxes are to be produced, events received by Language Maker are also forwarded to the operating system; otherwise, events are not forwarded to the operating system. Language Maker's modal dialog recording is performed by the Run Modal module 260.
  • In the accMenu branch 252, the menu indicated by the desk accessory menu event is checked 266. If the event occurred in the Language Maker menu, it is forwarded to the Do My Menu module 268. Other events are ignored 270.
  • The Run Edit module 262 performs a loop 272, 274. Each action is recorded by the Record Actions submodule 272. If there are more actions in the event queue, the loop returns to the Record Actions submodule. If a cancel action appears 276 in the event queue, Run Edit returns 277 without updating the current language in memory. Otherwise, if the events are completed successfully, Run Edit updates the language in memory, turns off recording 278, and returns to the operating system 280.
  • In the Record Actions submodule 272, the actions performed by the user in record mode are recorded. Each non-null event (i.e., each action) is checked by Record Actions. First, the type of action is checked 282. If the action selects a menu 284, the selected menu is recorded. If the action is a mouse click 286, the In Button? routine (see FIG. 8) checks whether the click occurred inside a button (a button is a menu selection area in the front window). If so, the button is recorded 288; if not, the location of the click is recorded 290.
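The first classification step of Record Actions can be sketched as follows. The `Action` record, the tuple format written to `log`, and the `in_button` callback signature are assumptions made for illustration; only the menu-versus-click branching and the button-versus-location decision come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class Action:
    kind: str              # "menu" or "click" in this sketch
    menu: str = ""
    item: str = ""
    where: Point = (0, 0)


def record_action(action: Action,
                  in_button: Callable[[Point], Optional[str]],
                  log: List[tuple]) -> None:
    """Append a recorded form of one user action to log (hypothetical format)."""
    if action.kind == "menu":
        log.append(("menu", action.menu, action.item))
    elif action.kind == "click":
        name = in_button(action.where)
        if name is not None:
            log.append(("button", name))         # recorded as the button
        else:
            log.append(("click", action.where))  # recorded as a location
    # Other kinds (zoom, grow, scroll, ...) would go to special handlers.
```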
  • Other actions are recorded by special handlers. These actions include group actions 292, mouse down actions 294, mouse up actions 296, zoom actions 298, grow actions 300, and next window actions 302.
  • Some actions in menus create pop-up menus with subchoices. These actions are handled by popping up the appropriate pop-up menu so that the user can select the desired subchoice. Move actions 304, pause actions 306, scroll actions 308, text actions 310, and voice actions 312 pop up their respective menus, and Record Actions checks 314 for the menu selection made by the user (with a mouse drag). If no menu selection is made, no action is recorded 316; otherwise, the choice is recorded 318.
  • Other actions may launch applications. For these actions, the selected application is determined. If no application has been selected, no action is recorded 322; otherwise, the selected application is recorded 324.
  • The Run Modal procedure 260 allows recording of the modal dialogs of the Macintosh computer. During a modal dialog, the user cannot do anything except respond to the modal dialog box. To record responses in such dialogs, Run Modal has several phases, each corresponding to a step in the recording process.
  • Run Modal prompts the user with a Language Maker dialog box that gives the user the options “record” and “cancel” (see FIG. 25). The user may then interact with the current application until arriving at the dialog click that is to be recorded.
  • All calls to Run Modal are routed through Select Dialog 326, which produces the initial Language Maker dialog box and then returns 327, ignoring further actions.
  • The In Button? procedure 286 determines whether a mouse click event occurred on a button.
  • In Button? gets the current window control list 342 (a Macintosh global that contains the locations of all the button rectangles in the current window; refer to Appendix B) from the operating system and parses the list with a loop 344-350. Each control is fetched 350, and the rectangle of the control is found 346. Each rectangle is analyzed 348 to determine whether the click occurred in that rectangle. If not, the next control is fetched 350 and the loop repeats. If, 344, the list is exhausted, the click did not occur on a button, and “no” is returned 352.
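The In Button? loop is a linear hit test over the control rectangles of the front window. This sketch assumes QuickDraw-style rectangles (top, left, bottom, right) in which a point on the bottom or right edge lies outside the rectangle; `Rect`, `Control`, and `in_button` are illustrative names, not Toolbox routines. Returning the matching control rather than a bare yes lets a caller record which button was hit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rect:
    top: int
    left: int
    bottom: int
    right: int

    def contains(self, v: int, h: int) -> bool:
        # Bottom and right edges are exclusive, as in QuickDraw.
        return self.top <= v < self.bottom and self.left <= h < self.right


@dataclass
class Control:
    title: str
    rect: Rect


def in_button(controls: List[Control], v: int, h: int) -> Optional[Control]:
    """Return the control whose rectangle holds (v, h), or None ("no")."""
    for control in controls:          # fetch each control in the list
        if control.rect.contains(v, h):
            return control
    return None                       # list exhausted: not on a button
```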
  • The Event Handler module 264 deals with standard Macintosh events in the Language Maker display window.
  • The Language Maker display window lists the utterance names in the current language.
  • Event Handler determines 358 whether the event is a mouse or keyboard event and then performs the proper action on the Language Maker window.
  • Mouse events include dragging the window 360, growing the window 362, scrolling the window 364, clicking in the window 368 (which selects an utterance name), and dragging within the window 370 (which moves an utterance name from one location on the screen to another, potentially changing the utterance's position in the language hierarchy). Double-clicking 366 on an utterance name in the window selects that utterance name for action recording and therefore starts the Run Edit module.
  • Keyboard events include the standard cut 372, copy 374, and paste 376 routines, as well as cursor movements down 380, up 382, right 384, and left 386. Pressing return at the keyboard 378, like a double click of the mouse, selects the current utterance name for action recording by Run Edit. After the appropriate command handler is called, Event Handler returns 388. Modifications to the language hierarchy made by the Event Handler module are reflected in the hierarchical structure of the language file produced by the Write Production module during close and save operations.
  • The Do My Menu module 268 controls all of the menu choices supported by Language Maker. After invoking the appropriate submodule (discussed in detail in FIGS. 11A through 11I), Do My Menu returns 408.
  • The New submodule 390 creates a new language.
  • The New submodule first checks 410 whether Language Maker is open. If so, it prompts the user 412 to save the current language as a language file. If the user saves the current language, New calls the Write Production module 414 to save the language. New then calls Create Global Words 416 and forms a new language 418. Create Global Words 416 automatically enters a few global (i.e., resident in all languages) utterance names and command strings into the new language.
  • These global utterance names and command strings allow the user to issue Voice Control commands, and correspond to utterances such as “show me the active words” and “bring up the voice options” (the utterance macros for the corresponding voice file are trained by the user, or copied from an existing voice file, after the new language is saved).
  • The Open submodule 392 opens an existing language for modification.
  • The Open submodule 392 checks 420 whether Language Maker is open. If so, it prompts the user 422 to save the current language, calling Write Production 424 if the user agrees. Open then prompts the user to open the selected language 426. If the user cancels, Open returns 428; otherwise, the language is loaded 430 and Open returns 432.
  • The Save submodule 394 saves the current language in memory as a language file. Save prompts the user to save the current language 434. If the user cancels, Save returns 436; otherwise, Save calls Write Production 438 to convert the language into a state machine control file suitable for use by VOCAL (FIG. 2). Finally, Save returns 440.
  • The New Action submodule 396 initializes the event recorders to begin recording a new sequence of actions.
  • New Action initializes the event recorder by displaying an action window to the user 442, setting up a tool palette for the user, and initializing the recording of actions. New Action then returns 444. After New Action is started, actions are not delivered directly to the operating system; rather, they are filtered through Language Maker.
  • The Record Dialog submodule 398 records responses to dialog boxes using the Run Modal module. Record Dialog 398 gives the user a way to record actions in a modal dialog; otherwise, the user would be prevented from performing the actions that bring up the dialog boxes. Record Dialog displays 446 the dialog action window (see FIG. 25) and turns recording on. Record Dialog then returns 448.
  • The Create Default Menus submodule 400 extracts default utterance names (and generates the associated command strings) from the executable code of an application.
  • Create Default Menus 400 is ordinarily the first choice selected by a user when creating a language for a particular application.
  • This submodule examines the executable code of an application and creates an utterance name for each menu command in the application, associating the utterance name with a command string that will select that menu command.
  • A first loop 452, 456, 458, 460 locates the current (Xth) menu handle 456, initializes menu parsing, checks whether the current menu is fully parsed 458, and iterates by advancing from the current menu to the next menu.
  • A second loop 458, 462, 464 finds each menu name 462 and checks 464 whether the name is hierarchical (i.e., whether the name points to further menus). If the name is not hierarchical, the loop continues with the next name. Otherwise, the hierarchical menu is fetched 466, and a third loop 470, 472 starts. In the third loop, each item name in the hierarchical menu is fetched 472, and the loop checks whether all hierarchical item names have been fetched 470.
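The three nested loops of Create Default Menus can be sketched as a walk over a menu tree that emits one (utterance name, command string) pair per item. The `Menu` type and the `@MENU(menu,item)` string produced here are assumptions modelled loosely on the Menu command examples given later; the actual command-string syntax is defined in Appendix A. In this sketch a hierarchical item contributes only its submenu's items, not an utterance of its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Menu:
    title: str
    # Each item is (name, submenu); submenu is None for an ordinary item.
    items: List[Tuple[str, Optional["Menu"]]] = field(default_factory=list)


def default_utterances(menu_bar: List[Menu]) -> List[Tuple[str, str]]:
    """Return (utterance name, command string) pairs for every menu item."""
    pairs: List[Tuple[str, str]] = []
    for menu in menu_bar:                        # first loop: each menu
        for name, submenu in menu.items:         # second loop: each item name
            if submenu is None:
                pairs.append((name, f"@MENU({menu.title},{name})"))
                continue
            for sub_name, _ in submenu.items:    # third loop: hierarchical items
                pairs.append((sub_name, f"@MENU({submenu.title},{sub_name})"))
    return pairs
```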
  • The Create Default Text submodule 402 allows the user to convert text on the clipboard into a list of utterance names.
  • Create Default Text 402 creates an utterance name for each unique word on the clipboard 474 and then returns 476.
  • The utterance names are associated with the keyboard entries that will type out each name. For example, a business letter can be copied to the clipboard and converted with Create Default Text. Utterances would then be associated with each of the common business terms in the letter. After ten or twelve business letters have been converted, the majority of business-letter words would be stored as a set of utterances.
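Create Default Text can be approximated by splitting the clipboard text into words and adding an utterance for each word not already in the language. Here the language is modelled as a plain dictionary from utterance name to the text to be typed, and a "word" is assumed to be a run of letters and apostrophes; both choices are illustrative. Because the function skips names already present, running it over successive letters grows the vocabulary only by the new words.

```python
import re
from typing import Dict


def add_default_text(clipboard: str, language: Dict[str, str]) -> int:
    """Add one utterance per new word; return how many were added."""
    added = 0
    for word in re.findall(r"[A-Za-z']+", clipboard):
        name = word.lower()              # utterance name
        if name not in language:
            language[name] = word        # keystrokes that type the word
            added += 1
    return added
```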
  • The Alphabetize Group submodule 404 allows the user to alphabetize the utterance names in a language.
  • The selected group of names (created by dragging the mouse over utterance names in the Language Maker window) is alphabetized 478, and Alphabetize Group then returns 480.
  • The Preferences submodule 406 allows the user to select standard graphical user interface preferences such as font style 482 and font size 484.
  • The Preferences submenu 486 allows the user to specify the coordinate system in which the mouse locations of recorded actions are stored. The coordinates for mouse actions can be relative to the global (screen) coordinates or relative to the application window coordinates. Where application menu selections are performed by mouse clicks, the mouse clicks must always be stored in relative coordinates so that the window can be moved on the screen without affecting the function of the mouse click.
  • The Preferences submenu 486 also determines whether, when a mouse action is recorded, the mouse is left at the location of the click or returned to its original location after the click.
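Relative coordinates survive a window move because of simple arithmetic: a local point is the global point minus the window's origin, so replaying it against the window's new origin lands on the same spot in the window. The sketch assumes (v, h) points and a window origin given in global coordinates; the function names are illustrative. On a real Macintosh, the QuickDraw routines GlobalToLocal and LocalToGlobal perform this conversion for the current port.

```python
from typing import Tuple

Point = Tuple[int, int]  # (v, h), vertical first as in QuickDraw


def to_local(pt: Point, origin: Point) -> Point:
    """Global screen point -> point relative to a window's top-left corner."""
    return (pt[0] - origin[0], pt[1] - origin[1])


def to_global(pt: Point, origin: Point) -> Point:
    """Window-relative point -> global screen point."""
    return (pt[0] + origin[0], pt[1] + origin[1])
```

A click recorded at global (60, 150) in a window whose origin is (40, 100) is stored as (20, 50). If the window is later moved to (200, 300), the replay lands at (220, 350), which is still the same spot inside the window.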
  • The user is prompted whether to update the current preference settings for Language Maker. If so, the file is updated 490 and Preferences returns 492. If not, Preferences returns directly to the operating system 494 without saving.
  • The Write Production module 242 is called when a file is saved.
  • Write Production saves the current language and converts it from an outline processor format, such as that used in the Language Maker application, to a hierarchical text format suitable for use with the state machine based Recognition Software.
  • Language files are associated with applications, and new language files can be created or edited for each additional application to bring the various commands of that application under voice control.
  • The embodiment of the Write Production module depends on the Recognition Software in use. In general, the Write Production module is written to convert the current language to a format suitable for the Recognition Software in use.
  • The particular embodiment of Write Production shown in FIG. 12 applies to the syntax of the VOCAL compiler for the Dragon Systems Recognition Software.
  • Write Production checks 512 for sub-levels in the language. If no sub-levels exist, Write Production returns 514. Otherwise, each sub-level is processed by another call 516 to Write Production on that sub-level of the language. After the sub-level is processed, Write Production writes the string “)” and returns 518.
  • The Write Terminal submodule 496 writes each utterance name and its associated command string to the language file.
  • Write Terminal checks 520 whether it is at a terminal. If not, it returns 530. Otherwise, Write Terminal writes 522 the string corresponding to the utterance name to the language file.
  • Write Terminal then writes the command string (i.e., the “output”) to the language file.
  • Finally, Write Terminal writes 528 the string “;” to the language file and returns 530.
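Write Production and Write Terminal together form a recursive tree printer. The sketch below assumes a `Node` tree whose leaves carry a command string. It emits a hypothetical text format: a quoted name followed either by `(`...`)` around its sub-levels or by its command string and `;`. The real output follows the VOCAL syntax, which is not reproduced here. Only the `)` written after each sub-level and the `;` written after each terminal come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    output: Optional[str] = None  # command string, for terminals only
    children: List["Node"] = field(default_factory=list)


def write_terminal(node: Node, out: List[str]) -> None:
    """Write one utterance name and command string, ending with ';'."""
    if node.children or node.output is None:
        return                                     # not at a terminal
    out.append(f'"{node.name}" {node.output};')


def write_production(node: Node, out: List[str]) -> None:
    """Write a node, recursing into its sub-levels and closing with ')'."""
    write_terminal(node, out)
    if not node.children:
        return                                     # no sub-levels
    out.append(f'"{node.name}" (')                 # hypothetical opener
    for child in node.children:
        write_production(child, out)               # recurse into sub-level
    out.append(")")
```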
  • The Voice Control software serves as a gate between the operating system and the applications running on it. This is accomplished by setting the Macintosh operating system's get_next_event procedure equal to a filter procedure created by Voice Control.
  • The get_next_event procedure runs whenever a next_event request is generated by the operating system or by an application. Ordinarily the get_next_event procedure is null, and next_event requests go directly to the operating system.
  • The filter procedure passes control to Voice Control on every request. This allows Voice Control to perform voice actions by intercepting mouse and keyboard events and to create new events corresponding to spoken commands.
  • The get_next_event filter procedure 540 is called before an event is generated by the operating system.
  • The event is first checked 542 to see whether it is a null event. If so, the Process Input module 544 is called directly.
  • The Process Input routine 544 checks for new speech input and processes any that has been received.
  • The Voice Control driver then proceeds through normal filter processing 546 (i.e., any filter processing caused by other applications) and returns 548. If the next event is not a null event, displays are hidden 550. This allows Voice Control to hide any Voice Control displays (such as current language lists) that could have been generated by a previous non-null action.
  • In particular, if any prompt windows have been produced by Voice Control, they are hidden when a non-null event occurs.
  • Key down events are then checked 552. Because the recognizer is controlled (i.e., turned on and off) by certain special key down events, if the event is a key down event, Voice Control must do further processing; otherwise, the Voice Control driver moves directly to Process Input 544. If a key down event has occurred 554, the software latches that control the recognizer are set where appropriate. This allows activation of the Recognition Software, the selection of recognizer options, or the display of languages. Thereafter, the Voice Control driver moves to Process Input 544.
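The branching of the get_next_event filter can be sketched as follows. Event codes 0, 1, and 3 are the Event Manager's null, mouse down, and key down event codes. The toggle key code, the `VoiceControlState` fields, and the single latch modelled here are hypothetical simplifications of the latches and displays described above.

```python
from dataclasses import dataclass, field
from typing import List

NULL_EVENT = 0     # Event Manager nullEvent
MOUSE_DOWN = 1     # Event Manager mouseDown
KEY_DOWN = 3       # Event Manager keyDown
TOGGLE_KEY = 0x7A  # hypothetical key code that toggles the recognizer


@dataclass
class VoiceControlState:
    prompts_visible: bool = True
    listening: bool = False
    calls: List[str] = field(default_factory=list)

    def process_input(self) -> None:
        # Stand-in for Process Input 544: check for and handle speech input.
        self.calls.append("process_input")


def event_filter(what: int, key_code: int, state: VoiceControlState) -> None:
    """One pass of the filter for an event of type `what` (sketch)."""
    if what != NULL_EVENT:
        state.prompts_visible = False              # hide displays 550
        if what == KEY_DOWN and key_code == TOGGLE_KEY:
            state.listening = not state.listening  # set latch 554
    state.process_input()                          # every path ends here
```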
  • The Process Input routine is the heart of the Voice Control driver. It manages all voice input for the Voice Navigator.
  • The Process Input module is called each time an event is processed by the operating system.
  • First 546, any latches that need to be set are processed, and the Macintosh waits for a number of delay ticks, if necessary. Delay ticks are included, for example, where a menu drag is being performed by Voice Control, to allow the menu to be drawn on the screen before the drag starts. Some applications also require a delay between mouse or keyboard events.
  • If recognition is activated 548, the Process Input routine proceeds to do recognition 562. If recognition is deactivated, Process Input returns 560.
  • The recognition routine 562 prompts the recognition drivers to check for an utterance (i.e., sound that could be speech input). If there is recognized speech input 564, Process Input checks the vertical blanking interrupt (VBL) handler 566 and deactivates it where appropriate.
  • The vertical blanking interrupt cycle is a very low-level cycle in the operating system. Every time the screen is refreshed, while the raster returns from the bottom right to the top left of the screen, the vertical blanking interrupt occurs. During this blanking time, very short, very high priority routines can be executed. The Process Input routine uses this cycle to move the mouse continuously by very slowly incrementing the mouse coordinates where appropriate. To accomplish this, mouse move events are installed in the VBL queue. Therefore, where appropriate, the VBL handler must be deactivated to move the mouse.
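The continuous mouse motion can be modelled as a task on a per-refresh queue that nudges the cursor by a small step each time it runs. This is a toy model: `VBLQueue` and `MouseMover` are hypothetical, the step size is arbitrary, and a real VBL task would be an interrupt-level routine installed through the Vertical Retrace Manager rather than a Python callable. Removing the task from the queue is the counterpart of deactivating the VBL handler.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]  # (v, h)


class VBLQueue:
    """Runs every installed task once per simulated screen refresh."""

    def __init__(self) -> None:
        self.tasks: List[Callable[[], None]] = []

    def refresh(self) -> None:
        for task in list(self.tasks):   # copy: a task may remove itself
            task()


class MouseMover:
    """Moves the cursor by a fixed small step on every refresh."""

    def __init__(self, start: Point, step: Point) -> None:
        self.pos = start
        self.step = step

    def __call__(self) -> None:
        self.pos = (self.pos[0] + self.step[0], self.pos[1] + self.step[1])
```

At roughly 60 refreshes per second, a one-pixel step moves the cursor about 60 pixels per second.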
  • The Recognize submodule 562 checks for encoded utterances queued by the Voice Navigator box and then calls the recognition drivers to attempt to recognize any utterances. Recognize returns the number of commands in (i.e., the length of) the command string returned from the recognizer. If, 572, no utterance is returned from the recognizer, Recognize returns a length of zero 574, indicating that no recognition has occurred. If an utterance is available, Recognize calls sdi_recognize 576, instructing the Recognition Software to attempt recognition on the utterance. If, 578, recognition is successful, the name of the utterance is displayed 582 to the user.
  • Any close call windows (i.e., windows associated with close call choices, prompted by Voice Control in response to the Recognition Software) are cleared from the display. If recognition is unsuccessful, the Macintosh beeps 580 and a length of zero is returned 574.
  • Recognize then searches 584 for an output string associated with the utterance. If there is an output string, Recognize checks whether it is asleep 586. If it is not asleep 590, the output count is set to the length of the output string, and if the command is a control command 592 (such as “go to sleep” or “wake up”), it is handled by the Process Voice Commands routine 594.
  • The Process Voice Commands module deals with commands that control the recognizer.
  • The module may perform actions, or may flag actions to be performed by the Process States block 596 (FIG. 16). If the recognizer is put to sleep 600 or awakened 604, the appropriate flags are set 602, 606, and zero is returned 626, 628 for the length of the command string, indicating to Process States to take no further actions. Otherwise, if the command is scratch_that 608 (ignore last utterance), first_level 612 (go to top of language hierarchy, i.e.
  • The ProcessQ module 570 pulls speech input from the speech queue and processes it. If, 630, the event queue is empty, ProcessQ may proceed; otherwise, ProcessQ aborts 632, because the event queue may overflow if speech events are placed on the queue along with other events. If, 634, the speech queue has any events, ProcessQ checks whether, 636, the delay ticks for menu drawing or other related activities have expired. If no events are on the speech queue, ProcessQ aborts 636. If the delay ticks have expired, ProcessQ calls Get Next 642 and returns 644. Otherwise, if the delay ticks have not expired, ProcessQ aborts 640.
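The three guards in ProcessQ can be written as a single decision function. The queue representations and the returned labels are assumptions made for illustration. What the sketch preserves is the order of the checks: the event-queue test comes first so that speech never competes with pending events.

```python
from typing import Sequence


def process_q(pending_events: int, speech_queue: Sequence[str],
              now_ticks: int, resume_at: int) -> str:
    """Decide what ProcessQ does on this pass (sketch)."""
    if pending_events > 0:
        return "busy"         # 632: avoid overflowing the event queue
    if not speech_queue:
        return "empty"        # nothing on the speech queue
    if now_ticks < resume_at:
        return "wait"         # 640: delay ticks not yet expired
    return "get_next"         # 642: hand characters to Get Next
```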
  • The Get Next submodule 642 gets characters from the speech queue and processes them. If, 646, there are no characters in the speech queue, the procedure simply returns 648. If there are characters in the speech queue, Get Next checks 650 whether they are command characters. If they are, Get Next calls Check Command 660. If not, the characters are text, and Get Next sets the meta bits 652 where appropriate.
  • The meta bits are used as flags for modifier keys such as the control key, the option key, or the command key. These keys condition the character pressed at the keyboard and create control characters. To create the proper operating system events, therefore, the meta bits must be set where necessary.
  • A key down event is then posted 654 to the Macintosh event queue, simulating a key press at the keyboard.
  • A key up event is then posted 656 to the event queue, simulating a key release. If, 658, there is still room in the event queue, further speech characters are obtained and processed 646. If not, the Get Next procedure returns 676.
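The text path of Get Next can be sketched as a loop that turns each queued character into a key down and key up pair carrying its modifier (meta) bits. The loop stops when the event queue has no room for another pair. The `(character, modifiers)` queue format and the event tuples are assumptions; `CMD_KEY` uses the Event Manager's cmdKey modifier mask (0x0100).

```python
from collections import deque
from typing import Deque, List, Tuple

CMD_KEY = 0x0100  # Event Manager cmdKey modifier mask

Event = Tuple[str, str, int]  # (event type, character, modifiers)


def get_next(speech: Deque[Tuple[str, int]], events: List[Event],
             capacity: int) -> None:
    """Post key down/up pairs for queued text until the queue is full."""
    while speech and len(events) + 2 <= capacity:
        char, mods = speech.popleft()
        events.append(("keyDown", char, mods))  # 654: simulated key press
        events.append(("keyUp", char, mods))    # 656: simulated release
```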
  • If the command string input corresponds to a command rather than simple keystrokes, the string is handled by the Check Command procedure 660, as illustrated in FIG. 19.
  • The next four characters from the speech queue (four characters is the length of all command strings; see Appendix A) are fetched 662 and compared 664 to a command table. If, 666, the characters match a voice command, a command is recognized, and processing continues in the Handle Command routine 668. Otherwise, the characters are interpreted as text, and processing returns to the meta bits step 652.
  • Each command is referenced into a table of command procedures by first computing 670 the command handler's offset into the table, then indexing the table and calling the appropriate command handler 672. After calling the command handler, Get Next exits the Process Input module directly 674 (the structure of the software is such that a return from Handle Command would return to the meta bits step 652, which would be incorrect).
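Check Command and Handle Command can be sketched as a fixed-width lookup followed by an indexed dispatch. The four-letter codes and handler bodies below are hypothetical placeholders (the real codes are listed in Appendix A). The list index plays the role of the computed handler offset.

```python
from typing import Callable, List, Optional

# Hypothetical four-character command codes and their handlers.
COMMAND_CODES: List[str] = ["MENU", "CTRL", "ZOOM"]


def handle_menu(args: str) -> str:
    return f"menu{args}"


def handle_control(args: str) -> str:
    return f"control{args}"


def handle_zoom(args: str) -> str:
    return "zoom"


HANDLERS: List[Callable[[str], str]] = [handle_menu, handle_control, handle_zoom]


def check_command(chars: str) -> Optional[str]:
    """Dispatch a command string, or return None to treat it as text."""
    code = chars[:4]
    if code not in COMMAND_CODES:
        return None                       # not a command: plain text
    offset = COMMAND_CODES.index(code)    # handler offset into the table
    return HANDLERS[offset](chars[4:])    # call the command handler
```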
  • The command handlers available to the Handle Command routine are illustrated in FIG. 20. Each command handler is detailed by a flow diagram in FIGS. 21A through 21G. The syntax for the commands is detailed in Appendix A.
  • The Menu command pulls down a menu; for example, @MENU(apple,0) (where apple is the menu number for the Apple menu) pulls down the Apple menu.
  • The Menu command also selects an item from a menu; for example, @MENU(apple,calculator) (where calculator is the item number for the Calculator in the Apple menu) selects the Calculator from the Apple menu.
  • The Menu command begins by running the Find Menu routine 678, which queues the menu id and the item number for the selected menu. (If the item number in the menu is 0, Find Menu simply clicks on the menu bar.) After Find Menu returns, if, 680, no menus are queued for posting, the Menu command simply returns 690.
  • Otherwise, the Menu Select trap is set equal to the My Menu Select routine 692.
  • The cursor coordinates are hidden 684 so that the mouse cannot be seen as it moves on the screen.
  • When the mouse down occurs on the menu bar, the Macintosh operating system generates a menu event for the application.
  • Each application receiving a menu event requests service from the operating system to find out what the menu event is. To do this, the application issues a Menu Select trap.
  • The Menu Select trap then places the location of the mouse on the stack.
  • The Menu command sets 688 the wait ticks to 30, which gives the operating system time to draw the menu, and returns 690.
  • In the My Menu Select routine 692, the menuselect global state is reset 694 to clear any previously selected menus, and the desired menu id and item number are moved to the Macintosh stack 696, thus selecting the desired menu item.
  • The Find Menu routine 700 collects 702 the command parameters for the desired menu. Next, the menuname is compared 704 to the menu name list. If, 706, there is no menu with the name “menuname”, Find Menu exits 708. Otherwise, Find Menu compares 710 the itemname to the names of the items in the menu. If, 712, the located item number is greater than 0, Find Menu queues 718 the menu id and item number for use by the Menu command and returns 720. Otherwise, if the item number is 0, Find Menu simply sets 714 the internal Voice Control flags “mousedown” and “global” to true. This indicates to Voice Control that the mouse location should be globally referenced and that the mouse button should be held down. Find Menu then calls 716 the Post Mouse routine, which references these flags to manipulate the operating system's mouse state accordingly.
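Find Menu has two outcomes: it either queues a (menu id, item) pair or falls back to holding the mouse down on the menu title. Both can be sketched as below. The menu table layout and the `FindMenuResult` record are assumptions. Item numbers are 1-based as in the Menu Manager, so an item name that is not found yields 0 and triggers the menu-bar path.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FindMenuResult:
    queued: Optional[Tuple[int, int]] = None  # (menu id, item number)
    mousedown: bool = False                   # hold the mouse button down
    global_coords: bool = False               # reference the mouse globally


def find_menu(menus: Dict[str, Tuple[int, List[str]]],
              menuname: str, itemname: str) -> Optional[FindMenuResult]:
    entry = menus.get(menuname)
    if entry is None:
        return None                                    # 708: no such menu
    menu_id, items = entry
    # Menu Manager items are numbered from 1; 0 means "no item found".
    item = items.index(itemname) + 1 if itemname in items else 0
    if item > 0:
        return FindMenuResult(queued=(menu_id, item))  # 718: queue for Menu
    return FindMenuResult(mousedown=True, global_coords=True)  # 714: press title
```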
  • The Control command 722 performs a button push within a menu, invoking actions such as the save command in the file menu of an application.
  • The Control command gets the command parameters 724 from the control string, finds the front window 726, gets the window control list 728, and checks 730 whether the control name exists in the control list. If it does, the control rectangle coordinates are calculated 732, the Post Mouse routine 734 clicks the mouse at the proper coordinates, and the Control command returns 736. If the control name is not found, the Control command returns directly.
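Clicking a control by name reduces to a lookup and a point calculation. Clicking the rectangle's center is an assumption of this sketch, since the description says only that the control rectangle coordinates are calculated. The name-to-rectangle dictionary stands in for the window's control list.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, bottom, right)


def control_click_point(controls: Dict[str, Rect],
                        name: str) -> Optional[Tuple[int, int]]:
    """Return the (v, h) point to click for a named control, or None."""
    rect = controls.get(name)
    if rect is None:
        return None                 # control not found: return directly
    top, left, bottom, right = rect
    return ((top + bottom) // 2, (left + right) // 2)
```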
  • The Keypad command 738 simulates numerical entries at the Macintosh keypad. Keypad finds the command parameters for the command string 740, gets the keycode value 742 for the desired key, posts a key down event 744 to the Macintosh event queue, and returns 746.
  • The Zoom command 748 zooms the front window. Zoom obtains the front window pointer 750 in order to reference the mouse to the front window, calculates the location of the zoom box 752, uses Post Mouse to click in the zoom box 754, and returns 756.
  • The Local Mouse command 758 clicks the mouse at a locally referenced location.
  • Local Mouse obtains the command parameters for the desired mouse location 760, uses Post Mouse to click at the desired coordinate 762, and returns 764.
  • The Global Mouse command 766 clicks the mouse at a globally referenced location.
  • Global Mouse obtains the command parameters for the desired mouse location 768, sets the global flag to true 770 (to signal to Post Mouse that the coordinates are global), uses Post Mouse to click at the desired coordinate 772, and returns 774.
  • The Double Click command double-clicks the mouse at a locally referenced location. Double Click obtains the command parameters for the desired mouse location 778, calls Post Mouse twice 780, 782 (to click twice at the desired location), and returns 784.
  • The Mouse Down command 786 sets the mouse button down.
  • Mouse Down sets the mousedown flag to true 788 (to signal to Post Mouse that the mouse button should be held down), uses Post Mouse to set the button down 790, and returns 792.
  • The Mouse Up command 794 sets the mouse button up.
  • Mouse Up sets the mbState global (see Appendix B) to Mouse Button UP 796 (to signal to the operating system that the mouse button should be set up), posts a mouse up event to the Macintosh event queue 798 (to signal to applications that the mouse button has gone up), and returns 800.
  • the Screen Down command 802 scrolls the contents of the current window down.
  • Screen Down first looks 804 for the vertical scroll bar in the front window. If, 806 , the scroll bar is not found, Screen Down simply returns 814 . If the scroll bar is found, Screen Down calculates the coordinates of the down arrow 808 , sets the mousedown flag to true 810 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 812 , and returns 814 .
  • the Screen Up command 816 scrolls the contents of the current window up. Screen Up first looks 818 for the vertical scroll bar in the front window. If, 820 , the scroll bar is not found, Screen Up simply returns 828 . If the scroll bar is found, Screen Up calculates the coordinates of the up arrow 822 , sets the mousedown flag to true 824 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 826 , and returns 828 .
  • the Screen Left command 830 scrolls the contents of the current window left.
  • Screen Left first looks 832 for the horizontal scroll bar in the front window. If, 834 , the scroll bar is not found, Screen Left simply returns 842 . If the scroll bar is found, Screen Left calculates the coordinates of the left arrow 836 , sets the mousedown flag to true 838 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 840 , and returns 842 .
  • the Screen Right command 844 scrolls the contents of the current window right. Screen Right first looks 846 for the horizontal scroll bar in the front window. If, 848 , the scroll bar is not found, Screen Right simply returns 856 . If the scroll bar is found, Screen Right calculates the coordinates of the right arrow 850 , sets the mousedown flag to true 852 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 854 , and returns 856 .
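The four Screen commands share one pattern: find the scroll bar, locate an arrow, and hold the button down on it. The C sketch below shows that pattern. It assumes the scroll bar rectangle is already known (or `NULL` when absent) and that each arrow box is 16 pixels square. The arrow size and all names here are hypothetical, not taken from the patent.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { short top, left, bottom, right; } Rect;
typedef struct { short v, h; } Point;

typedef enum { ARROW_UP, ARROW_DOWN, ARROW_LEFT, ARROW_RIGHT } Arrow;

/* Assumed size of a scroll arrow box, in pixels. */
enum { ARROW_SIZE = 16 };

/* Centre of the requested arrow box within a scroll bar rectangle. */
Point arrow_point(Rect bar, Arrow which)
{
    Point p = { (short)((bar.top + bar.bottom) / 2),
                (short)((bar.left + bar.right) / 2) };
    switch (which) {
    case ARROW_UP:    p.v = (short)(bar.top + ARROW_SIZE / 2);    break;
    case ARROW_DOWN:  p.v = (short)(bar.bottom - ARROW_SIZE / 2); break;
    case ARROW_LEFT:  p.h = (short)(bar.left + ARROW_SIZE / 2);   break;
    case ARROW_RIGHT: p.h = (short)(bar.right - ARROW_SIZE / 2);  break;
    }
    return p;
}

/* Screen Up/Down/Left/Right sketch: bar is NULL when the front window has
   no scroll bar of the needed orientation, in which case nothing happens.
   Otherwise report where to press and ask Post Mouse to hold the button. */
bool screen_scroll(const Rect *bar, Arrow which, Point *where, bool *mousedown)
{
    if (bar == NULL)
        return false;
    *where = arrow_point(*bar, which);
    *mousedown = true;
    return true;
}
```

Because `mousedown` is set, the window keeps scrolling until a later command releases the button.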
  • the Page Down command 858 moves the contents of the current window down a page.
  • Page Down first looks 860 for the vertical scroll bar in the front window. If, 862 , the scroll bar is not found, Page Down simply returns 868 . If the scroll bar is found, Page Down calculates the page down button coordinates 864 , uses Post Mouse to click the mouse button down 866 , and returns 868 .
  • Page Up command 870 moves the contents of the current window up a page. Page Up first looks 872 for the vertical scroll bar in the front window. If, 874 , the scroll bar is not found, Page Up simply returns 880 . If the scroll bar is found, Page Up calculates the page up button coordinates 876 , uses Post Mouse to click the mouse button down 878 , and returns 880 .
  • Page Left command 882 moves the contents of the current window left a page.
  • Page Left first looks 884 for the horizontal scroll bar in the front window. If, 886 , the scroll bar is not found, Page Left simply returns 892 . If the scroll bar is found, Page Left calculates the page left button coordinates 888 , uses Post Mouse to click the mouse button down 890 , and returns 892 .
  • Page Right command 894 moves the contents of the current window right a page.
  • Page Right first looks 896 for the horizontal scroll bar in the front window. If, 898 , the scroll bar is not found, Page Right simply returns 904 . If the scroll bar is found, Page Right calculates the page right button coordinates 900 , uses Post Mouse to click the mouse button down 902 , and returns 904 .
  • the Move command 906 moves the mouse from its current location (y, x) to a new location (y+Δy, x+Δx).
  • Move gets the command parameters 908 then Move sets the mouse speed to tablet 910 (this cancels the mouse acceleration, which otherwise would make mouse movements uncontrollable), adds the offset parameters to the current mouse location 912 , forces a new cursor position and resets the mouse speed 914 , and returns 916 .
  • the Move to Global Coordinate command 918 moves the cursor to the global coordinates given by the Voice Control command string.
  • Move to Global gets the command parameters 920 , then Move to Global checks 922 if there is a position parameter. If there is a position parameter, the screen position coordinates are fetched 924 . In either case, the global coordinates are calculated 926 , the mouse speed is set to tablet 928 , the mouse position is set to the new coordinates 930 , the cursor is forced to the new position 932 , and Move to Global returns 934 .
  • the Move to Local Coordinate command 936 moves the cursor to the local coordinates given by the Voice Control command string.
  • Move to Local gets the command parameters 938 , then Move to Local checks 940 if there is a position parameter. If there is a position parameter, the local position coordinates are fetched 942 . In either case, the global coordinates are calculated 944 , the mouse speed is set to tablet 946 , the mouse position is set to the new coordinates 948 , the cursor is forced to the new position 950 , and Move to Local returns 952 .
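The coordinate arithmetic behind Move and Move to Local fits in a few lines. The C sketch below assumes a hypothetical window record that stores only the global position of the window's local origin. It does not model the tablet speed setting or the forced cursor update.

```c
typedef struct { short v, h; } Point;

/* Hypothetical window record: only the global position of the window's
   local origin is needed for this sketch. */
typedef struct { Point origin; } Window;

/* Move: relative step from (y, x) to (y + dy, x + dx). */
Point move_by(Point cur, short dy, short dx)
{
    Point p = { (short)(cur.v + dy), (short)(cur.h + dx) };
    return p;
}

/* Move to Local: convert a window-relative point to global screen
   coordinates before the cursor is positioned there. */
Point local_to_global(const Window *w, Point local)
{
    Point p = { (short)(local.v + w->origin.v), (short)(local.h + w->origin.h) };
    return p;
}
```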
  • the Move Continuous command 954 moves the mouse continuously from its present location, moving Δy, Δx every refresh of the screen. This is accomplished by inserting 956 the VBL Move routine 960 in the Vertical Blanking Interrupt queue of the Macintosh and returning 958 . Once in the queue, the VBL Move routine 960 will be executed every screen refresh. The VBL Move routine simply adds the Δy and Δx values to the current cursor position 962 , resets the cursor 964 , and returns 966 .
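A per-refresh task can model the Move Continuous behaviour. The C sketch below does not use the real Vertical Retrace Manager. It stands in a hypothetical `VblMove` record whose `vbl_move` function the caller invokes once per simulated refresh. Removing the task, as End Button does, stops the motion.

```c
#include <stdbool.h>

typedef struct { short v, h; } Point;

/* Hypothetical stand-in for a VBL task: while installed, vbl_move is
   expected to be called once per screen refresh. */
typedef struct {
    bool  installed;
    short dy, dx;
} VblMove;

void vbl_install(VblMove *t, short dy, short dx)
{
    t->dy = dy;
    t->dx = dx;
    t->installed = true;
}

void vbl_remove(VblMove *t) { t->installed = false; }

/* Called at each vertical blank: add (dy, dx) to the cursor position. */
void vbl_move(const VblMove *t, Point *cursor)
{
    if (!t->installed)
        return;
    cursor->v = (short)(cursor->v + t->dy);
    cursor->h = (short)(cursor->h + t->dx);
}
```

For example, a 60 Hz display would move the cursor 60·Δy, 60·Δx pixels per second.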
  • the Option Key Down command 968 sets the option key down. This is done by setting the option key bit in the keyboard bit map to TRUE 970 , and returning 972 .
  • the Option Key Up command 974 sets the option key up. This is done by setting the option key bit in the keyboard bit map to FALSE 976 , and returning 978 .
  • the Shift Key Down command 980 sets the shift key down. This is done by setting the shift key bit in the keyboard bit map to TRUE 982 , and returning 984 .
  • the Shift Key Up command 986 sets the shift key up. This is done by setting the shift key bit in the keyboard bit map to FALSE 988 , and returning 990 .
  • the Command Key Down command 992 sets the command key down. This is done by setting the command key bit in the keyboard bit map to TRUE 994 , and returning 996 .
  • the Command Key Up command 998 sets the command key up. This is done by setting the command key bit in the keyboard bit map to FALSE 1000 , and returning 1002 .
  • the Control Key Down command 1004 sets the control key down. This is done by setting the control key bit in the keyboard bit map to TRUE 1006 , and returning 1008 .
  • the Control Key Up command 1010 sets the control key up. This is done by setting the control key bit in the keyboard bit map to FALSE 1012 , and returning 1014 .
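Each modifier-key command sets or clears a single bit in the keyboard bit map. The C sketch below uses a 64-bit map, matching the "[2 longs]" size listed for KeyMap in Appendix B. Two details are assumptions, not taken from the patent: the bit-addressing scheme (byte `code >> 3`, bit `code & 7`) and the key-code values (the classic Macintosh codes).

```c
#include <stdbool.h>

/* Assumed key codes of the modifier keys (classic Macintosh values). */
enum { KEY_COMMAND = 0x37, KEY_SHIFT = 0x38, KEY_OPTION = 0x3A, KEY_CONTROL = 0x3B };

/* Keyboard bit map: 2 longs = 64 bits, one bit per key code (code < 64). */
typedef struct { unsigned char bits[8]; } KeyMap;

/* Key Down (down = true) and Key Up (down = false): set or clear the bit. */
void set_key(KeyMap *km, unsigned code, bool down)
{
    unsigned char mask = (unsigned char)(1u << (code & 7));
    if (down)
        km->bits[code >> 3] |= mask;
    else
        km->bits[code >> 3] &= (unsigned char)~mask;
}

/* Report whether the key's bit is currently set. */
bool key_is_down(const KeyMap *km, unsigned code)
{
    return (km->bits[code >> 3] >> (code & 7)) & 1u;
}
```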
  • the Next Window command 1016 moves the front window to the back. This is done by getting the front window 1018 and sending it to the back 1020 , and returning 1022 .
  • the Erase command 1024 erases numchars characters from the screen.
  • the number of characters typed by the most recent voice command is stored by Voice Control. Therefore, Erase will erase the characters from the most recent voice command. This is done by a loop which posts delete key keydown events 1026 and checks 1028 if the number posted equals numchars. When numchars deletes have been posted, Erase returns 1030 .
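The Erase loop can be sketched against a simple fixed-size event queue. Several things in the C code below are hypothetical: the queue layout, its capacity of 32, and the delete character code 0x08. A full queue stops the loop early, and the return value reports how many deletes were actually posted.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { EV_KEYDOWN, EV_MOUSEDOWN, EV_MOUSEUP } EventKind;
typedef struct { EventKind kind; int key; } Event;

/* Hypothetical queue capacity and delete character code. */
enum { QUEUE_CAPACITY = 32, KEY_DELETE = 0x08 };

typedef struct { Event ev[QUEUE_CAPACITY]; size_t count; } EventQueue;

/* Post one event; refuse once the queue is full. */
bool post_event(EventQueue *q, EventKind kind, int key)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->ev[q->count].kind = kind;
    q->ev[q->count].key = key;
    q->count++;
    return true;
}

/* Erase: post one delete keydown per character the last voice command
   typed, stopping early if the queue fills. Returns the number posted. */
size_t erase_chars(EventQueue *q, size_t numchars)
{
    size_t posted = 0;
    while (posted < numchars && post_event(q, EV_KEYDOWN, KEY_DELETE))
        posted++;
    return posted;
}
```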
  • the Capitalize command 1032 capitalizes the next keystroke. This is done by setting the caps flag to TRUE 1034 , and returning 1036 .
  • the Launch command 1038 launches an application.
  • the application must be on the boot drive no more than one level deep. This is done by getting the name of the application 1040 (“appl_name”), searching for appl_name on the boot volume 1042 , and, if, 1044 , the application is found, setting the volume to the application folder 1048 , launching the application 1050 (no return is necessary because the new application will clear the Macintosh queue). If the application is not found, Launch simply returns 1046 .
  • the Post Mouse routine 1052 posts mouse down events to the Macintosh event queue and can set traps to monitor mouse activity and to keep the mouse down.
  • the actions of Post Mouse are determined by the Voice Control flags global and mousedown, which are set by command handlers before calling Post Mouse. After a Post Mouse, when an application does a get_next_event it will see a mouse down event in the event queue, leading to events such as clicks, mouse downs or double clicks.
  • Post Mouse saves the current mouse location 1054 so that the mouse may be returned to its initial location after the mouse events are produced.
  • the cursor is hidden 1056 to shield the user from seeing the mouse moving around the screen.
  • the mouse speed is set to tablet 1062 (to avoid acceleration problems), and the mouse down is posted to the Macintosh event queue 1064 . If, 1066 , the mousedown flag is TRUE (i.e. if the mouse button should be held down), then the Set Mouse Down routine is called 1072 and Post Mouse returns 1070 . Otherwise, if the mousedown flag is FALSE, a click is created by posting a mouse up event to the Macintosh event queue 1068 and returning 1070 .
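The control flow of Post Mouse is sketched below in C, under stated assumptions. The event queue and mouse-state records are hypothetical models, not Toolbox structures. The `global` and `mousedown` parameters stand in for the Voice Control flags, and "holding the button" is reduced to setting a field instead of installing the My Button trap. The tablet speed setting is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { short v, h; } Point;
typedef enum { EV_MOUSEDOWN, EV_MOUSEUP } EventKind;
typedef struct { EventKind kind; Point where; } Event;

enum { QUEUE_CAPACITY = 32 };   /* hypothetical */
typedef struct { Event ev[QUEUE_CAPACITY]; size_t count; } EventQueue;

/* Hypothetical snapshot of the state Post Mouse touches. */
typedef struct {
    Point mouse;          /* current mouse location, global */
    Point saved;          /* location saved on entry, for restoring later */
    bool  cursor_hidden;
    bool  button_held;    /* set when the Set Mouse Down path is taken */
} MouseState;

static bool post(EventQueue *q, EventKind kind, Point where)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->ev[q->count].kind = kind;
    q->ev[q->count].where = where;
    q->count++;
    return true;
}

/* Post Mouse sketch. `origin` is the front window's global origin, used
   to convert `where` when the global flag is false. */
bool post_mouse(MouseState *s, EventQueue *q, Point where, Point origin,
                bool global, bool mousedown)
{
    s->saved = s->mouse;            /* remember where the mouse was */
    s->cursor_hidden = true;        /* hide the cursor while it moves */
    if (!global) {
        where.v = (short)(where.v + origin.v);
        where.h = (short)(where.h + origin.h);
    }
    s->mouse = where;
    if (!post(q, EV_MOUSEDOWN, where))
        return false;
    if (mousedown) {                /* hold the button: Set Mouse Down */
        s->button_held = true;
        return true;
    }
    return post(q, EV_MOUSEUP, where);  /* click = mouse down + mouse up */
}
```

In this model, a Double Click is simply two consecutive `post_mouse` calls with both flags false.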
  • the Set Mouse Down routine 1072 holds the mouse button down by replacing 1074 the Macintosh button trap with a Voice Control trap named My Button.
  • the My Button trap then recognizes further voice commands and creates mouse drags or clicks as appropriate.
  • Set Mouse Down checks 1076 if the Macintosh is a Macintosh Plus, in which case the Post Event trap must also be reset 1078 to the Voice Control My Post Event trap. (The Macintosh Plus will not simply check the mbState global flag to determine the mouse button state. Rather, the Post Event trap in a Macintosh Plus will poll the actual mouse button to determine its state, and will post mouse up events if the mouse button is up.)
  • the Post Event trap is replaced with a My Post Event trap, which will not poll the status of the mouse button.
  • the mbState flag is set to MouseDown 1080 (indicating that the mouse button is down) and Set Mouse Down returns 1082 .
  • the My Button trap 1084 replaces the Macintosh button trap, thereby seizing control of the button state from the operating system.
  • Each time My Button is called, it checks 1086 the Macintosh mouse button state bit mbState. If mbState has been set to UP, My Button moves to the End Button routine 1106 , which sets mbState to UP 1108 , removes any VBL routine which has been installed 1110 , resets the Button and Post Event traps to the original Macintosh traps 1112 , resets the mouse speed and couples the cursor to the mouse 1114 , shows the cursor 1102 , and returns 1104 .
  • My Button checks for the expiration of wait ticks (which allow the Macintosh time to draw menus on the screen) 1088 , and calls the recognize routine 1090 to recognize further speech commands. After further speech commands are recognized, My Button determines 1092 its next action based on the length of the command string. If the command string length is less than zero, then the next voice command was a Voice Control internal command, and the mouse button is released by calling End Button 1106 . If the command string length is greater than zero, then a command was recognized, the command is queued onto the voice queue 1094 , and the voice queue is checked for further commands 1096 .
  • If nothing was recognized (command string length of zero), then My Button skips directly to checking the voice queue 1096 . If there is nothing in the voice queue, then My Button returns 1104 . However, if there is a command in the voice queue, then My Button checks 1098 if the command is a mouse movement command (which would cause a mouse drag). If it is not a mouse movement, then the mouse button is released by calling End Button 1106 . If the command is a mouse movement, then the command is executed 1100 (which drags the mouse), the cursor is displayed 1102 , and My Button returns.
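The branching in My Button reduces to a small decision function. The C sketch below encodes only that decision; trap patching, recognition and queue storage are not modelled. The caller passes the relevant state: whether mbState is UP, the command string length, the queue length after any enqueue, and whether the head of the queue is a mouse movement. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcome of one pass through My Button. */
typedef enum {
    MB_END_BUTTON,   /* release the button via End Button */
    MB_DRAG,         /* execute the queued mouse-movement command */
    MB_RETURN        /* nothing queued; keep the button held */
} MyButtonAction;

/* mb_up:      mbState has already been set to UP
   cmd_len:    recognized command string length
               (<0 internal command, 0 nothing recognized, >0 command)
   queue_len:  commands in the voice queue, after queuing any new one
   head_moves: the command at the head of the queue is a mouse movement */
MyButtonAction my_button_step(bool mb_up, int cmd_len, int queue_len,
                              bool head_moves)
{
    if (mb_up || cmd_len < 0)
        return MB_END_BUTTON;
    if (queue_len == 0)
        return MB_RETURN;
    return head_moves ? MB_DRAG : MB_END_BUTTON;
}
```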
  • Referring to FIG. 24, a screen display of a record actions session is shown.
  • the user is recording a local mouse click 1106 , and the click is being acknowledged in the action list 1108 and in the action window 1110 .
  • dialog boxes 1112 for recording a manual printer feed are displayed to the user, as well as the Voice Control Run Modal dialog box 1114 prompting the user to record the dialogs.
  • the user is preparing to record a click on the Manual Feed button 1116 .
  • the user has requested the current language, which is displayed by Voice Control in a pop-up display 1120 .
  • In FIG. 30, a listing of the Write Production output file as displayed in FIG. 29 is provided.
  • the graphic user interface controlled by a voice recognition system could be other than that of the Apple Macintosh computer.
  • the recognizer could be other than that marketed by Dragon Systems.
  • Appendix A which sets forth the Voice Control command language syntax
  • Appendix B which lists some of the Macintosh OS globals used by the Voice Navigator system
  • Appendix C which is a fiche of the Voice Navigator executable code
  • Appendix D which is the Developer's Reference Manual for the Voice Navigator system
  • Appendix E which is the Voice Navigator User's Manual, all incorporated by reference herein.
  • Appendix B Macintosh OS Globals
  • MTemp EQU $828 a low-level interrupt mouse location; used to move the mouse during VBL handling while executing a @MOVI command.
  • Mouse EQU $830 the processed mouse coordinate; used to move the mouse for all other @MOVX commands. [long]
  • KeyMap EQU $174 keyboard bit map, with one bit mapped to each key on the keyboard. Set the bit to TRUE to set the Meta keys (option, command, shift, control) down. [2 longs]
  • evtMax EQU $1E maximum number of events in the event queue. When this number is reached, stop posting events.
  • EventQueue EQU $14A Event queue header, the location of the Macintosh event queue. [10 bytes]
  • WindowList EQU $9D6 Z-ordered linked list of windows. This pointer leads to a chain of all existing windows for an application; used to find a window queue for all local commands. [pointer]
  • portRect EQU $10 port's rectangle [rect]; used for the window-relative forms of the @MOVL command.
  • controlList EQU 140 used to find the controls associated with a window.
  • contrlTitle EQU 40 used to compare control Titles for @CTRL commands.
  • contrlRect EQU 8 used to calculate the click locations in a control.
  • nextwindow EQU 144 used to locate the next window for the @NEXT command.

Abstract

Voice utterances are substituted for manipulation of a pointing device, the pointing device being of the kind which is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display, the cursor being moved and the desired actions being aided by an operating system in the computer in response to control signals received from the pointing device, the computer also having an alphanumeric keyboard, the operating system being separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard; in the system, a voice recognizer recognizes the voiced utterance, and an interpreter converts the voiced utterance into control signals which will directly create a desired action aided by the operating system without first being converted into control signals expressed in the predetermined format specific to the keyboard. In another aspect, voiced utterances are converted to commands, expressed in a predefined command language, to be used by an operating system of a computer, by converting some voiced utterances into commands corresponding to actions to be taken by the operating system, and converting other voiced utterances into commands which carry associated text strings to be used as part of text being processed in an application program running under the operating system. 
In another aspect, a table is generated for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, the application program including menus and control buttons; the instruction sequence of the application program is parsed to identify menu entries and control buttons, and an entry is included in the table for each menu entry and control button found in the application program, each entry in the table containing a command corresponding to the menu entry or control button. In another aspect, a user is enabled to create an instance in a formal language of the kind which has a strictly defined syntax; the entries of a graphically displayed list are expressed in a natural language which does not comply with the syntax, the user is permitted to point to an entry on the list, and the instance corresponding to the identified entry in the list is automatically generated in response to the pointing.

Description

    BACKGROUND OF THE INVENTION
  • This invention relates to voice controlled computer interfaces. [0001]
  • Voice recognition systems can convert human speech into computer information. Such voice recognition systems have been used, for example, to control text-type user interfaces, e.g., the text-type interface of the disk operating system (DOS) of the IBM Personal Computer. [0002]
  • Voice control has also been applied to graphical user interfaces, such as the one implemented by the Apple Macintosh computer, which includes icons, pop-up windows, and a mouse. These voice control systems use voiced commands to generate keyboard keystrokes. [0003]
  • SUMMARY OF THE INVENTION
  • In general, in one aspect, the invention features enabling voiced utterances to be substituted for manipulation of a pointing device, the pointing device being of the kind which is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display, the cursor being moved and the desired actions being aided by an operating system in the computer in response to control signals received from the pointing device, the computer also having an alphanumeric keyboard, the operating system being separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard; a voice recognizer recognizes the voiced utterance, and an interpreter converts the voiced utterance into control signals which will directly create a desired action aided by the operating system without first being converted into control signals expressed in the predetermined format specific to the keyboard. [0004]
  • In general, in another aspect of the invention, voiced utterances are converted to commands, expressed in a predefined command language, to be used by an operating system of a computer, converting some voiced utterances into commands corresponding to actions to be taken by said operating system, and converting other voiced utterances into commands which carry associated text strings to be used as part of text being processed in an application program running under the operating system. [0005]
  • In general, in another aspect, the invention features generating a table for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, the application program including menus and control buttons; the instruction sequence of the application program is parsed to identify menu entries and control buttons, and an entry is included in the table for each menu entry and control button found in the application program, each entry in the table containing a command corresponding to the menu entry or control button. [0006]
  • In general, in another aspect, the invention features enabling a user to create an instance in a formal language of the kind which has a strictly defined syntax; the entries of a graphically displayed list are expressed in a natural language and do not comply with the syntax, the user is permitted to point to an entry on the list, and the instance corresponding to the identified entry in the list is automatically generated in response to the pointing. [0007]
  • The invention enables a user to easily control the graphical interface of a computer. Any actions that the operating system can be commanded to take can be commanded by voiced utterances. The commands may include commands that are normally entered through the keyboard as well as commands normally entered through a mouse or any other input device. The user may switch back and forth between voiced utterances that correspond to commands for actions to be taken and voiced utterances that correspond to text strings to be used in an application program without giving any indication that the switch has been made. Any application may be made susceptible to a voice interface by automatically parsing the application instruction sequence for menus and control buttons that control the application. [0008]
  • Other advantages and features will become apparent from the following description of the preferred embodiment and from the claims. [0009]
  • DESCRIPTION OF THE PREFERRED EMBODIMENT
  • We first briefly describe the drawings. [0010]
  • FIG. 1 is a functional block diagram of a Macintosh computer served by a Voice Navigator voice controlled interface system. [0011]
  • FIG. 2A is a functional block diagram of a Language Maker system for creating word lists for use with the Voice Navigator interface of FIG. 1. [0012]
  • FIG. 2B depicts the format of the voice files and word lists used with the Voice Navigator interface. [0013]
  • FIG. 3 is an organizational block diagram of the Voice Navigator interface system. [0014]
  • FIG. 4 is a flow diagram of the Language Maker main event loop. [0015]
  • FIG. 5 is a flow diagram of the Run Edit module. [0016]
  • FIG. 6 is a flow diagram of the Record Actions submodule. [0017]
  • FIG. 7 is a flow diagram of the Run Modal module. [0018]
  • FIG. 8 is a flow diagram of the In Button? routine. [0019]
  • FIG. 9 is a flow diagram of the Event Handler module. [0020]
  • FIG. 10 is a flow diagram of the Do My Menu module. [0021]
  • FIGS. 11A through 11I are flow diagrams of the Language Maker menu submodules. [0022]
  • FIG. 12 is a flow diagram of the Write Production module. [0023]
  • FIG. 13 is a flow diagram of the Write Terminal submodule. [0024]
  • FIG. 14 is a flow diagram of the Voice Control main driver loop. [0025]
  • FIG. 15 is a flow diagram of the Process Input module. [0026]
  • FIG. 16 is a flow diagram of the Recognize submodule. [0027]
  • FIG. 17 is a flow diagram of the Process Voice Control Commands routine. [0028]
  • FIG. 18 is a flow diagram of the ProcessQ module. [0029]
  • FIG. 19 is a flow diagram of the Get Next submodule. [0030]
  • FIG. 20 is a chart of the command handlers. [0031]
  • FIGS. 21A through 21G are flow diagrams of the command handlers. [0032]
  • FIG. 22 is a flow diagram of the Post Mouse routine. [0033]
  • FIG. 23 is a flow diagram of the Set Mouse Down routine. [0034]
  • FIGS. 24 and 25 illustrate the screen displays of Voice Control. [0035]
  • FIGS. 26 through 29 illustrate the screen displays of Language Maker. [0036]
  • FIG. 30 is a listing of a language file. [0037]
  • SYSTEM OVERVIEW
  • Referring to FIG. 1, in an Apple Macintosh computer 100, a Macintosh operating system 132 provides a graphical interactive user interface by processing events received from a mouse 134 and a keyboard 136 and by providing displays including icons, windows, and menus on a display device 138. Operating system 132 provides an environment in which application programs such as MacWrite 139, desktop utilities such as Calculator 137, and a wide variety of other programs can be run. [0038]
  • The operating system 132 also receives events from the Voice Navigator voice controlled computer interface 102 to enable the user to control the computer by voiced utterances. For this purpose, the user speaks into a microphone 114 connected via a Voice Navigator box 112 to the SCSI (Small Computer Systems Interface) port of the computer 100. The Voice Navigator box 112 digitizes and processes analog audio signals received from the microphone 114, and transmits processed digitized audio signals to the Macintosh SCSI port. The Voice Navigator box includes an analog-to-digital converter (A/D) for digitizing the audio signal, a DSP (Digital Signal Processing) chip for compressing the resulting digital samples, and protocol interface hardware which configures the digital samples to obey the SCSI protocols. [0039]
  • Recognizer Software 120 (available from Dragon Systems, Newton, Mass.) runs under the Macintosh operating system, and is controlled by internal commands 123 received from Voice Control driver 128 (which also operates under the Macintosh operating system). One possible algorithm for implementing Recognizer Software 120 is disclosed by Baker et al. in U.S. Pat. No. 4,783,803, incorporated by reference herein. Recognizer Software 120 processes the incoming compressed, digitized audio, and compares each utterance of the user to prestored utterance macros. If the user utterance matches a prestored utterance macro, the utterance is recognized, and a command string 121 corresponding to the recognized utterance is delivered to a text buffer 126. Command strings 121 delivered from the Recognizer Software represent commands to be issued to the Macintosh operating system (e.g., menu selections to be made or text to be displayed), or internal commands 123 to be issued by the Voice Control driver. [0040]
  • During recognition, the Recognizer Software 120 compares the incoming samples of an utterance with macros in a voice file 122. (The system requires the user to space apart his utterances briefly so that the system can recognize when each utterance ends.) The voice file macros are created by a “training” process, described below. If a match is found (as judged by the recognition algorithm of the Recognizer Software 120), a Voice Control command string from a word list 124 (which has been directly associated with voice file 122) is fetched and sent to text buffer 126. [0041]
  • The command strings in text buffer 126 are relayed to Voice Control driver 128, which drives a Voice Control interpreter 130 in response to the strings. [0042]
  • A command string 121 may indicate an internal command 123, such as a command to the Recognizer Software to “learn” new voice file macros, or to adjust the sensitivity of the recognition algorithm. In this case, Voice Control interpreter 130 sends the appropriate internal command 123 to the Recognizer Software 120. In other cases, the command string may represent an operating system manipulation, such as a mouse movement. In this case, Voice Control interpreter 130 produces the appropriate action by interacting with the Macintosh operating system 132. [0043]
  • Each application or desktop accessory is associated with a word list 124 and a corresponding voice file 122; these are loaded by the Recognizer Software when the application or desktop accessory is opened. [0044]
  • The voice files are generated by the Recognizer Software 120 in its “learn” mode, under the control of internal commands from the Voice Control driver 128. [0045]
  • The word lists are generated by the Language Maker desktop accessory 140, which creates “languages” of utterance names and associated Voice Control command strings, and converts the languages into the word lists. Voice Control command strings are strings such as “ESC”, “TEXT”, “@MENU(font,2)”, and belong to a Voice Control command set, the syntax of which will be described later and is set forth in Appendix A. [0046]
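The full command syntax lives in Appendix A, which is not reproduced here. The C parser below is therefore only a sketch under an assumed grammar: `@NAME` or `@NAME(arg,arg,...)`, with any string not starting with `@` treated as literal text. It splits a string such as “@MENU(font,2)” into a command name and its arguments. The size limits are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_ARGS = 4, MAX_TOK = 32 };   /* hypothetical limits */

typedef struct {
    char name[MAX_TOK];             /* e.g. "MENU"; empty for literal text */
    char args[MAX_ARGS][MAX_TOK];   /* e.g. "font", "2" */
    int  nargs;
} VcCommand;

/* Copy s[0..n) into dst (capacity MAX_TOK); fail if it does not fit. */
static bool copy_tok(char *dst, const char *s, size_t n)
{
    if (n >= MAX_TOK)
        return false;
    memcpy(dst, s, n);
    dst[n] = '\0';
    return true;
}

/* Parse "@NAME" or "@NAME(a,b,...)". A string not starting with '@' is
   reported as literal text (empty name); the caller keeps the original. */
bool parse_command(const char *s, VcCommand *out)
{
    memset(out, 0, sizeof *out);
    if (s[0] != '@')
        return true;
    const char *p = s + 1;
    size_t n = strcspn(p, "(");
    if (n == 0 || !copy_tok(out->name, p, n))
        return false;
    p += n;
    if (*p == '\0')
        return true;                    /* no argument list */
    p++;                                /* skip '(' */
    const char *close = strchr(p, ')');
    if (close == NULL || close[1] != '\0')
        return false;                   /* unbalanced or trailing text */
    if (p == close)
        return true;                    /* empty argument list */
    while (p <= close) {
        size_t len = strcspn(p, ",)");
        if (out->nargs == MAX_ARGS || !copy_tok(out->args[out->nargs], p, len))
            return false;
        out->nargs++;
        p += len + 1;                   /* step past ',' or ')' */
    }
    return true;
}
```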
  • The Voice Control and Language Maker software includes about 30,000 lines of code, most of which is written in the C language, the remainder being written in assembly language. A listing of the Voice Control and Language Maker software is provided in microfiche as appendix C. The Voice Control software will operate on a Macintosh Plus or later models, configured with a minimum of 1 Mbyte RAM (2 Mbyte for HyperCard and other large applications), a Hard Disk, and with Macintosh operating system version 6.01 or later. [0047]
  • In order to understand the interaction of the Voice Control interpreter 130 and the operating system, note that Macintosh operating system 132 is “event driven”. The operating system maintains an event queue (not shown); input devices such as the mouse 134 or the keyboard 136 “post” events to this queue to cause the operating system to, for example, create the appropriate text entry, or trigger a mouse movement. The operating system 132 then, for example, passes messages to Macintosh applications (such as MacWrite 139) or to desktop accessories (such as Calculator 137) indicating events on the queues (if any). In one mode of operation, Voice Control interpreter 130 likewise controls the operating system (and hence the applications and desktop accessories which are currently running) by posting events to the operating system queues. The events posted by the Voice Control interpreter typically correspond to mouse activity or to keyboard keystrokes, or both, depending upon the voice commands. Thus, the Voice Navigator system 102 provides an additional user interface. In some cases, the “voice” events may comprise text strings to be displayed or included with text being processed by the application program. [0048]
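The event-driven model can be pictured as a first-in, first-out queue shared by every producer. The mouse, the keyboard and the Voice Control interpreter post events to it, and the running application drains it oldest first. The C ring buffer below is a hypothetical model, not the Macintosh Event Manager. In particular, it simply rejects new events when full; the real queue's overflow policy is not described here.

```c
#include <stdbool.h>

typedef enum { EV_KEYDOWN, EV_MOUSEDOWN, EV_MOUSEUP } EventKind;
typedef struct { EventKind kind; int data; } Event;

enum { QCAP = 8 };   /* hypothetical capacity */

typedef struct { Event ev[QCAP]; int head, count; } EventQueue;

/* Posting side: keyboard, mouse and Voice Control all call this. */
bool post_event(EventQueue *q, EventKind kind, int data)
{
    if (q->count == QCAP)
        return false;                   /* full: this sketch rejects */
    q->ev[(q->head + q->count) % QCAP] = (Event){kind, data};
    q->count++;
    return true;
}

/* Consuming side: the application's get-next-event loop, oldest first. */
bool get_next_event(EventQueue *q, Event *out)
{
    if (q->count == 0)
        return false;
    *out = q->ev[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}
```

Since the application only ever sees the queue, it cannot tell a voice-posted event from one generated by the physical mouse or keyboard.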
  • At any time during the operation of the Voice Navigator system, the Recognizer Software 120 may be trained to recognize an utterance of a particular user and to associate a corresponding text string with each utterance. In this mode, the Recognizer Software 120 displays to the user a menu of the utterance names (such as “file”, “page down”) which are to be recognized. These names, and the corresponding Voice Control command strings (indicating the appropriate actions) appear in a current word list 124. The user designates the utterance name of interest and then is prompted to speak the utterance corresponding to that name. For example, if the utterance name is “file”, the user might utter “FILE” or “PLEASE FILE”. The digitized samples from the Voice Navigator box 112 corresponding to that utterance are then used by the Recognizer Software 120 to create a “macro” representing the utterance, which is stored in the voice file 122 and subsequently associated with the utterance name in the word list 124. Ordinarily, the utterance is repeated more than once, in order to create a macro for the utterance that accommodates variation in a particular speaker's voice. [0049]
  • The meaning of the spoken utterance need not correspond to the utterance name, and the text of the utterance name need not correspond to the Voice Control command strings stored in the word list. For example, the user may wish a command string that causes the operating system to save a file to have the utterance name “save file”; the associated command string may be “@MENU(file,2)”; and the utterance that the user trains for this utterance name may be the spoken phrase “immortalize”. The Recognizer Software and Voice Control cause that utterance, name, and command string to be properly associated in the voice file and word list 124. [0050]
  • Referring to FIG. 2A, the word lists 124 used by the Voice Navigator are created by the Language Maker desk accessory 140 running under the operating system. Each word list 124 is hierarchical, that is, some utterance names in the list link to sub-lists of other utterance names. Only the list of utterance names at a currently active level of the hierarchy can be recognized. (In the current embodiment, the number of utterance names at each level of the hierarchy can be as large as 1000.) In the operation of Voice Control, some utterances, such as “file”, may summon the file menu on the screen, and link to a subsequent list of utterance names at a lower hierarchical level. For example, the file menu may list subsequent commands such as “save”, “open”, or “save as”, each associated with an utterance. [0051]
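The hierarchical word list can be modelled as a tree, with one rule: only the active level is searched. The C sketch below uses a hypothetical node layout. Recognizing a name that links to a sub-list makes that sub-list the active level. In the usage check, every command string other than “@MENU(file,2)” is invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical word-list node: an utterance name, its Voice Control
   command string, and an optional sub-list that becomes active once
   this name is recognized. */
typedef struct WordNode {
    const char *name;
    const char *command;
    const struct WordNode *children;
    size_t nchildren;
} WordNode;

typedef struct {
    const WordNode *level;   /* utterance names currently recognizable */
    size_t count;
} ActiveList;

/* Look `name` up in the active level only. On a match, return its command
   string and, if it links to a sub-list, make that sub-list active. */
const char *recognize_name(ActiveList *a, const char *name)
{
    for (size_t i = 0; i < a->count; i++) {
        const WordNode *n = &a->level[i];
        if (strcmp(n->name, name) == 0) {
            if (n->nchildren > 0) {
                a->level = n->children;
                a->count = n->nchildren;
            }
            return n->command;
        }
    }
    return NULL;   /* not in the active level, so not recognizable */
}
```

For example, with a top level of “file” and “page down”, saying “save” at the top level is not recognized. After “file”, the same utterance yields the save command.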
  • [0052] Language Maker enables the user to create a hierarchical language of utterance names and associated command strings, rearrange the hierarchy of the language, and add new utterance names. Then, when the language is in the form that the user desires, the language is converted to a word list 124. Because the hierarchy of the utterance names and command strings can be adjusted, the user of the Voice Navigator system is not bound by the preset menu hierarchy of an application. For example, the user may want to create a “save” command at the top level of the utterance hierarchy that directly saves a file without first summoning the file menu. The user may also, for example, create a new utterance name “goodbye” that saves a file and exits all at once.
  • [0053] Each language created by Language Maker 140 also contains the command strings which represent the actions (e.g. clicking the mouse at a location, typing text on the screen) to be associated with utterances and utterance names. To make training of the Voice Navigator system more intuitive, the user does not specify the command strings that describe the actions to be associated with an utterance and utterance name. In fact, the user does not need to know about, and never sees, the command strings stored in the Language Maker language or the resulting word list 124.
  • In a “record” mode, to associate a series of actions with an utterance name, the user simply performs the desired actions (such as typing the text at the keyboard, or clicking the mouse at a menu). The actions performed are converted into the appropriate command strings, and when the user turns off the record mode, the command strings are associated with the selected utterance name. [0054]
  • [0055] While using Language Maker, the user can create a language by entering utterance names in three ways: by typing the names at the keyboard 142; by using a “create default text” procedure 146, which parses a text file on the clipboard and creates one utterance name for each word in the text file, with all of the names starting at the same hierarchical level; or by using a “create default menus” procedure, which parses the executable code 144 for an application and creates a set of utterance names equal to the names of the commands in the application's menus, with an initial hierarchy that is the same as the hierarchy of the menus in the application.
  • [0056] If the names are typed at the keyboard or created by parsing a text file, the names are initially associated with the keystrokes which, when typed at the keyboard, produce the name. Therefore, the name “text” would initially be associated with the keystrokes t-e-x-t. If the names are created by parsing the executable code 144 for an application, then the names are initially associated with the command strings which execute the corresponding menu commands for the application. These initial command strings can be changed by simply selecting the utterance name to be changed and putting Language Maker into record mode.
  • [0057] The output of Language Maker is a language file 148. This file contains the utterance names and the corresponding command strings. The language file 148 is formatted for input to a VOCAL compiler 150 (available from Dragon Systems), which converts the language file into a word list 124 for use with the Recognition Software. The syntax of language files is specified in the Voice Navigator Developer's Reference Manual, provided as Appendix D, and incorporated by reference.
  • [0058] Referring to FIG. 2B, a macro 147 of each learned utterance is stored in the voice file 122. A corresponding utterance name 149 and command string 151 are associated with one another and with the utterance and are stored in the word list 124. The word list 124 is created and modified by Language Maker 140, and the voice file 122 is created and modified by the Recognition Software 120 in its learn mode, under the control of the Voice Control driver 128.
  • [0059] Referring to FIG. 3, in the Voice Navigator system 102, the Voice Navigator hardware box 152 includes an analog-to-digital (A/D) converter 154 for converting the analog signal from the microphone into a digital signal for processing, a DSP section 156 for filtering and compacting the digitized signal, a SCSI manager 158 for communication with the Macintosh, and a microphone control section 160 for controlling the microphone.
  • [0060] The Voice Navigator system also includes the Recognition Software voice drivers 120, which include routines for utterance detection 164 and command execution 166. For utterance detection 164, the voice drivers periodically poll 168 the Voice Navigator hardware to determine, based on the amplitude of the signal received by the microphone, whether an utterance is being received by the Voice Navigator box 152. When an utterance is detected 170, the voice drivers create a speech buffer of encoded digital samples (tokens) to be used by the command execution drivers 166. On command 166 from the Voice Control driver 128, the recognition drivers can learn new utterances by token-to-terminal conversion 174: the token is converted to a macro for the utterance and stored as a terminal in a voice file 122 (FIG. 1).
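Amplitude-based utterance detection can be sketched as follows. This is only an illustration under assumed parameters: the `threshold` and `hangover` values, and the rule that trailing quiet samples are dropped, are hypothetical and are not taken from the Voice Navigator hardware or drivers.

```python
def detect_utterances(samples: list[int], threshold: int, hangover: int) -> list[list[int]]:
    """Split a stream of samples into utterances using an amplitude threshold.

    An utterance starts at the first sample whose magnitude reaches `threshold`
    and ends once `hangover` consecutive quieter samples have been seen; the
    trailing quiet samples are dropped. Requires hangover >= 1.
    """
    utterances: list[list[int]] = []
    current: list[int] = []
    quiet = 0
    for sample in samples:
        if abs(sample) >= threshold:
            current.append(sample)  # loud sample: utterance starts or continues
            quiet = 0
        elif current:
            current.append(sample)  # short pauses stay inside the utterance
            quiet += 1
            if quiet >= hangover:
                utterances.append(current[:-quiet])
                current, quiet = [], 0
    if current:
        utterances.append(current[:len(current) - quiet])
    return utterances
```

The hangover keeps a brief pause inside a word from splitting it into two utterances.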
  • [0061] Recognition and pattern matching 172 is also performed on command by the voice drivers. During recognition, a stored token of incoming digitized samples is compared with the macros for the utterances in the current level of the recognition hierarchy. If a match is found, terminal-to-output conversion 176 is also performed, selecting the command string associated with the recognized utterance from the word list 124 (FIG. 1). State management 178, such as changing of sensitivity controls, is also performed on command by the voice drivers.
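The comparison of a token against the macros at the current level can be illustrated with template matching. The sketch below uses dynamic time warping (DTW) over one-dimensional feature sequences as a stand-in technique; the actual Dragon Systems matching algorithm is not described here. It allows several stored macros per name, reflecting repeated training, and uses a hypothetical `max_distance` cutoff to play the role of a confidence level.

```python
import math
from typing import Optional, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic-time-warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(token: Sequence[float],
              macros: dict[str, list[Sequence[float]]],
              active_names: Sequence[str],
              max_distance: float) -> Optional[str]:
    """Return the best-matching active name, or None if no match is close enough."""
    best_name, best = None, math.inf
    for name in active_names:                  # only the current level is searched
        for template in macros.get(name, []):  # one macro per training repetition
            dist = dtw_distance(token, template)
            if dist < best:
                best_name, best = name, dist
    return best_name if best <= max_distance else None
```

Because names outside `active_names` are never scored, a word that is trained but not active cannot be recognized, even if it would match best.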
  • [0062] The Voice Control driver 128 forms an interface 182 to the voice drivers 120 through control commands, an interface 184 to the Macintosh operating system 132 (FIG. 1) through event posting and operating system hooks, and an interface 186 to the user through display menus and prompts.
  • [0063] The interface 182 to the drivers gives Voice Control access to the Voice Driver command functions 166. This interface allows Voice Control to monitor 188 the status of the recognizer, for example to check for an utterance token in the utterance queue buffered 170 to the Macintosh. If there is an utterance, and if processor time is available, Voice Control issues the command sdi_recognize 190, calling the recognition and pattern match routine 172 in the voice drivers. In addition, the interface to the drivers may issue the command sdi_output 192, which controls the terminal-to-output conversion routine 176 in the voice drivers, converting a recognized utterance to a command string for use by Voice Control. The command string may indicate mouse or keystroke events to be posted to the operating system, or may indicate commands to Voice Control itself (e.g. enabling or disabling Voice Control).
  • From the user's perspective, Voice Control is simply a Macintosh driver with internal parameters, such as sensitivity, and internal commands, such as commands to learn new utterances. The actual processing which the user perceives as Voice Control may actually be performed by Voice Control, or by the Voice Drivers, depending upon the function. For example, the utterance learning procedures are performed by the Voice Drivers under the control of Voice Control. [0064]
  • [0065] The interface 184 to the Macintosh operating system allows Voice Control, where appropriate, to manipulate the operating system (e.g., by posting events or modifying event queues). The macro interpreter 194 takes the command strings delivered from the voice drivers via the text buffer and interprets them to decide what actions to take. These commands may indicate text strings to be displayed on the display, or mouse movements or menu selections to be executed.
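Here is a minimal sketch of the first step a macro interpreter has to perform: splitting a command string into actions. The only command syntax shown in this section is “@MENU(file,2)”. The general grammar assumed here, `@NAME(arg,...)` commands mixed with literal text that is typed as keystrokes, is hypothetical; the real syntax is defined in Appendix D, which is not reproduced here.

```python
import re
from typing import NamedTuple


class Action(NamedTuple):
    kind: str              # "TYPE" for literal keystrokes, else the @-command name
    args: tuple[str, ...]


# Hypothetical grammar: @NAME(arg,arg,...) commands mixed with literal text.
_COMMAND = re.compile(r"@([A-Z]+)\(([^)]*)\)")


def interpret(command_string: str) -> list[Action]:
    """Split a command string into typed text and @-commands, in order."""
    actions: list[Action] = []
    pos = 0
    for match in _COMMAND.finditer(command_string):
        if match.start() > pos:  # literal text before the command is typed
            actions.append(Action("TYPE", (command_string[pos:match.start()],)))
        raw_args = match.group(2)
        args = tuple(a.strip() for a in raw_args.split(",")) if raw_args else ()
        actions.append(Action(match.group(1), args))
        pos = match.end()
    if pos < len(command_string):  # trailing literal text
        actions.append(Action("TYPE", (command_string[pos:],)))
    return actions
```

Separating parsing from execution lets keystroke actions go to event posting while menu or mouse actions take the separate path the following paragraphs describe.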
  • [0066] In the interpretive execution of the command strings, Voice Control must manipulate the Macintosh event queues. This task is performed by OS event management 196. As discussed above, voice events may simulate events which are ordinarily associated with the keyboard or with the mouse. Keyboard events are handled by OS event management 196 directly. Mouse events are handled by the mouse handler 198. Mouse events require an additional level of handling because they can require operating system manipulation outside of the standard event post routines which are accomplished by OS event management 196.
  • [0067] The main interface into the Macintosh operating system 132 is event based, and is used in the majority of the commands which are voice recognized and issued to the Macintosh. However, there are other “hooks” to the operating system state which are used to control parameters such as mouse placement and mouse motion. For example, as will be discussed later, pushing the mouse button down generates an event; however, keeping the mouse button pushed down and dragging the mouse across a menu requires the use of an operating system hook. For reference, the operating system hooks used by the Voice Navigator are listed in Appendix B.
  • [0068] The operating system hooks are implemented by the trap filters 200, which are filters used by Voice Control to force the Macintosh operating system to accept the controls implemented by OS event management 196 and mouse handler 198.
  • The Macintosh operating system traps are held in Macintosh read only memories (ROMs), and implement high level commands for controlling the system. Examples of these high level commands are: drawing a string onto the screen, window zooming, moving windows to the front and back of the screen, and polling the status of the mouse button. In order for the Voice Control driver to properly interface with the Macintosh operating system it must control these operating system traps to generate the appropriate events. [0069]
  • To generate menu events, for example, Voice Control “seizes” the menu select trap (i.e. takes control of the trap from the operating system). Once Voice Control has seized the trap, application requests for menu selections are forwarded to Voice Control. In this way Voice Control is able to modify, where necessary, the operating system output to the program, thereby controlling the system behavior as desired. [0070]
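The idea of seizing a trap, where the new handler runs in front of the original and may answer on its behalf, can be modeled with a dispatch table. In this sketch the table, the `(menu, item)` return value, and the queue of pending voice selections are hypothetical stand-ins. The name `MenuSelect` echoes the Toolbox routine, but nothing here reproduces its real interface.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for the operating system's trap dispatch table.
trap_table: Dict[str, Callable[..., Any]] = {}


def seize_trap(name: str, patch: Callable[..., Any]) -> Callable[..., Any]:
    """Route calls to trap `name` through `patch`, which receives the original handler."""
    original = trap_table[name]

    def patched(*args: Any) -> Any:
        return patch(original, *args)

    trap_table[name] = patched
    return original


# Menu selections queued by recognized voice commands, as (menu, item) pairs.
pending_selections: List[Tuple[str, int]] = []


def voice_menu_select(original: Callable[..., Any], where: Tuple[int, int]) -> Any:
    """Answer with a voice-issued selection if one is pending, else defer to the OS."""
    if pending_selections:
        return pending_selections.pop(0)
    return original(where)
```

An application calling the patched entry sees either a normal selection or the one queued by speech, and it cannot tell which source produced it.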
  • [0071] The interface 186 to the user provides user control of the Voice Control operations. Prompts 202 display the name of each recognized utterance on the Macintosh screen so that the user may determine whether the proper utterance has been recognized. On-line training 204 allows the user to access, at any time while using the Macintosh, the utterance names in the word list 124 currently in use. The user may see which utterance names have been trained and may retrain the utterance names in an on-line manner (these functions require Voice Control to use the Voice Driver interface, as discussed above). User options 206 provide selection of various Voice Control settings, such as the sensitivity and confidence level of the recognizer (i.e., the level of certainty required to decide that an utterance has been recognized). The optimal values for these parameters depend upon the microphone in use and the speaking voice of the user.
  • [0072] The interface 186 to the user does not operate via the Macintosh event interface. Rather, it is simply a recursive loop which controls the Recognition Software and the state of the Voice Control driver.
  • [0073] Language Maker 140 includes an application analyzer 210 and an event recorder 212. Application analyzer 210 parses the executable code of applications as discussed above, and produces suitable default utterance names and pre-programmed command strings. The application analyzer 210 includes a menu extraction procedure 214 which searches executable code to find text strings corresponding to menus. The application analyzer 210 also includes control identification procedures 216 for creating the command strings corresponding to each menu item in an application.
  • [0074] The event recorder 212 is a driver for recording user commands and creating command strings for utterances. This allows the user to easily create and edit command strings as discussed above.
  • [0075] Types of events which may be entered into the event recorder include: text entry 218, mouse events 220 (such as clicking at a specified place on the screen), special events 222 which may be necessary to control a particular application, and voice events 224 which may be associated with operations of the Voice Control driver.
  • Language Maker
  • [0076] Referring to FIG. 4, the Language Maker main event loop 230 is similar in structure to the main event loops used by other desk accessories in the Macintosh operating system. If a desk accessory is selected from the “Apple” menu, an “open” event is transmitted to the accessory. In general, if the application in which it resides quits, or if the user quits it using its menus, a “close” event is transmitted to the accessory. Otherwise, the accessory is transmitted control events. The message parameter of a control event indicates the kind of event. As seen in FIG. 4, the Language Maker main event loop 230 begins with an analysis 232 of the event type.
  • [0077] If the event is an open event, Language Maker tests 234 whether it is already open. If Language Maker is already open 236, the current language (i.e. the list of utterance names from the current word list) is displayed and Language Maker returns 237 to the operating system. If Language Maker is not open 238, it is initialized and then returns 239 to the operating system.
  • [0078] If the event is a close event, Language Maker prompts the user 240 to save the current language as a language file. If the user commands Language Maker to save the current language, the current language is converted by the Write Production module 242 to a language file, and then Language Maker exits 244. If the current language is not saved, Language Maker exits directly.
  • [0079] If the event is a control event 246, the way in which Language Maker responds depends upon the mode Language Maker is in. Language Maker has a utility for recording events (i.e. the mouse movements and clicks or text entry that the user wishes to assign to an utterance), and while recording it must capture events which do not involve the Language Maker window. When not recording, however, Language Maker should respond only to events in its own window. Therefore, Language Maker may respond to an event in one mode but not in another.
  • [0080] A control event 246 is forwarded to one of three branches 248, 250, 252. All menu events are forwarded to the accMenu branch 252. (Only menu events occurring in desk accessory menus will be forwarded to Language Maker.) All window events for the Language Maker window are forwarded to the accEvent branch 250. All other events received by Language Maker, which correspond to events for desk accessories or applications other than Language Maker, initiate activity in the accRun branch 248 to enable recording of actions.
  • [0081] In the accRun branch 248, events are recorded and associated with the selected utterance name. Before any events are recorded, Language Maker checks 254 whether it is recording; if not, Language Maker returns 256. If recording is on 258, Language Maker checks the current recording mode.
  • While recording, Language Maker seizes control of the operating system by setting control flags that cause the operating system to call Language Maker on every tick of the Macintosh (i.e. every 1/60 second). [0082]
  • [0083] If the user has set Language Maker in dialog mode, Language Maker can record dialog events (i.e. events which involve modal dialogs, where the user cannot do anything except respond to the actions in modal dialog boxes). To accomplish this, the user must be able to produce actions (i.e. mouse clicks, menu selections) in the current application so that the dialog boxes are brought up on the screen. Then the user can initialize recording and respond to the dialog boxes. When modal dialog boxes should be produced, events received by Language Maker are also forwarded to the operating system; otherwise, events are not forwarded to the operating system. Language Maker's modal dialog recording is performed by the Run Modal module 260.
  • [0084] If modal dialog events are not being recorded, the user records with Language Maker in “action” mode, and Language Maker proceeds to the Run Edit module 262.
  • [0085] In the accEvent branch, all events are forwarded to the Event Handler module 264.
  • [0086] In the accMenu branch, the menu indicated by the desk accessory menu event is checked 266. If the event occurred in the Language Maker menu, it is forwarded to the Do My Menu module 268. Other events are ignored 270.
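The routing performed by the main event loop 230 can be condensed into a single dispatch function. In this sketch, the `Event` and `MakerState` fields and the returned labels are hypothetical simplifications; the labels name the modules described above, and the function only reports where an event would go.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    kind: str                        # "open", "close", or "control"
    is_menu: bool = False            # control event from a desk accessory menu
    menu_owner: Optional[str] = None
    in_own_window: bool = False      # event belongs to the Language Maker window


@dataclass
class MakerState:
    is_open: bool = False
    recording: bool = False
    dialog_mode: bool = False


def dispatch(event: Event, state: MakerState) -> str:
    """Return the name of the module or action the event is routed to."""
    if event.kind == "open":
        if state.is_open:
            return "display current language"
        state.is_open = True
        return "initialize"
    if event.kind == "close":
        return "prompt to save, then exit"
    if event.is_menu:                               # accMenu branch
        return "Do My Menu" if event.menu_owner == "Language Maker" else "ignore"
    if event.in_own_window:                         # accEvent branch
        return "Event Handler"
    if not state.recording:                         # accRun branch
        return "return"
    return "Run Modal" if state.dialog_mode else "Run Edit"
```

Checking menu events before window events mirrors the order in which the branches are described. The accRun branch becomes active only when recording is on.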
  • [0087] Referring to FIG. 5, the Run Edit module 262 performs a loop 272, 274. Each action is recorded by the Record Actions submodule 272. If there are more actions in the event queue, the loop returns to the Record Actions submodule. If a cancel action appears 276 in the event queue, Run Edit returns 277 without updating the current language in memory. Otherwise, if the events are completed successfully, Run Edit updates the language in memory, turns off recording 278, and returns to the operating system 280.
  • [0088] Referring to FIG. 6, in the Record Actions submodule 272, actions performed by the user in record mode are recorded. When the current application makes a request for the next event on the event queue, the event is checked by Record Actions. Each non-null event (i.e. each action) is processed by Record Actions. First, the type of action is checked 282. If the action selects a menu 284, the selected menu is recorded. If the action is a mouse click 286, the In Button? routine (see FIG. 8) checks whether the click occurred inside a button (a button is a menu selection area in the front window). If so, the button is recorded 288. If not, the location of the click is recorded 290.
  • [0089] Other actions are recorded by special handlers. These actions include group actions 292, mouse down actions 294, mouse up actions 296, zoom actions 298, grow actions 300, and next window actions 302.
  • [0090] Some actions in menus can create pop-up menus with subchoices. These actions are handled by popping up the appropriate pop-up menu so that the user may select the desired subchoice. Move actions 304, pause actions 306, scroll actions 308, text actions 310, and voice actions 312 pop up their respective menus, and Record Actions checks 314 for the menu selection made by the user (with a mouse drag). If no menu selection is made, no action is recorded 316. Otherwise, the choice is recorded 318.
  • [0091] Other actions may launch applications. In this case 320, the selected application is determined. If no application has been selected, no action is recorded 322; otherwise, the selected application is recorded 324.
  • [0092] Referring to FIG. 7, the Run Modal procedure 260 allows recording of the modal dialogs of the Macintosh computer. During a modal dialog, the user cannot do anything except respond to the actions in the modal dialog box. In order to record responses to those actions, Run Modal has several phases, each phase corresponding to a step in the recording process.
  • [0093] In the first phase, when the user selects dialog recording, Run Modal prompts the user with a Language Maker dialog box that gives the user the options “record” and “cancel” (see FIG. 25). The user may then interact with the current application until arriving at the dialog click that is to be recorded. During this phase, all calls to Run Modal are routed through Select Dialog 326, which produces the initial Language Maker dialog box and then returns 327, ignoring further actions.
  • [0094] To enter the second (recording) phase, the user clicks on the “record” button in the Language Maker dialog box, indicating that the following dialog responses are to be recorded. In this phase, calls to Run Modal are routed to Record 328, which uses the In Button? routine 330 to check whether a button in the current application's dialog box has been selected. If the click occurred in a button, the button is recorded 332, and Run Modal returns 333. Otherwise, the location of the click is recorded 334 and Run Modal returns 335.
  • [0095] Finally, when all clicks are recorded, the user clicks on the “cancel” button in the Language Maker dialog box, entering the third phase of the recording session. The click in the “cancel” button causes Run Modal to route to Cancel 336, which updates 338 the current language in memory, then returns 340.
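The three phases of Run Modal behave like a small state machine. In this sketch, the class, the `source` strings, and the recorded-action tuples are hypothetical. The handling of “cancel” during the first phase, which ends the session without changing the language, is an assumption, because the text describes “cancel” only as the end of the recording phase.

```python
from typing import List, Optional, Tuple, Union

Recorded = Tuple[str, Union[str, Tuple[int, int]]]


class RunModalRecorder:
    """Three-phase modal-dialog recorder (hypothetical names)."""

    def __init__(self) -> None:
        self.phase = "select"          # phase 1: user works toward the dialog
        self.actions: List[Recorded] = []
        self.language_updated = False

    def handle(self, source: str, button: Optional[str] = None,
               where: Optional[Tuple[int, int]] = None) -> None:
        if source == "language_maker":
            if self.phase == "select" and button == "record":
                self.phase = "record"                  # phase 2 begins
            elif button == "cancel":
                # Phase 3: commit recorded responses (assumption: a cancel
                # during phase 1 just ends the session without changes).
                self.language_updated = self.phase == "record"
                self.phase = "done"
            return
        if self.phase != "record":
            return                                     # application clicks not recorded
        if button is not None:
            self.actions.append(("button", button))    # click landed on a named button
        elif where is not None:
            self.actions.append(("click", where))      # otherwise record the location
```

Clicks made while the user is still working toward the dialog pass through without being captured. Only clicks made between “record” and “cancel” end up in the language.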
  • [0096] Referring to FIG. 8, the In Button? procedure 286 determines whether a mouse click event occurred on a button. In Button? gets the current window control list 342 (a Macintosh global which contains the locations of all of the button rectangles in the current window; refer to Appendix B) from the operating system and parses the list with a loop 344-350. Each control is fetched 350, and then the rectangle of the control is found 346. Each rectangle is analyzed 348 to determine whether the click occurred in the rectangle. If not, the next control is fetched 350, and the loop repeats. If, 344, the list is exhausted, the click did not occur on a button, and no is returned 352. However, if the click did occur in a rectangle, then, if, 351, the rectangle is named, the click occurred on a button, and yes is returned 354; if the rectangle is not named 356, the click did not occur on a button, and no is returned 356.
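The In Button? test is a linear hit test over the window's control list. In this sketch, `Rect` and `Control` are simplified, hypothetical stand-ins for the Macintosh control records. The half-open edge convention is also an assumption. The function returns the button's name instead of a bare yes/no, because a hit on a button is recorded by its name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rect:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        # Half-open convention: right and bottom edges are outside (assumption).
        return self.left <= x < self.right and self.top <= y < self.bottom


@dataclass
class Control:
    rect: Rect
    title: Optional[str] = None  # unnamed rectangles are not buttons


def in_button(controls: List[Control], x: int, y: int) -> Optional[str]:
    """Return the button's name if (x, y) hits a named control, else None."""
    for control in controls:
        if control.rect.contains(x, y):
            return control.title or None  # first hit decides, as in FIG. 8
    return None  # list exhausted: not on a button
```

A `None` result tells the caller to record the click location instead.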
  • [0097] Referring to FIG. 9, the Event Handler module 264 deals with standard Macintosh events in the Language Maker display window. The Language Maker display window lists the utterance names in the current language. As shown in FIG. 9, Event Handler determines 358 whether the event is a mouse or keyboard event and then performs the proper action on the Language Maker window.
  • [0098] Mouse events include: dragging the window 360, growing the window 362, scrolling the window 364, clicking on the window 368 (which selects an utterance name), and dragging on the window 370 (which moves an utterance name from one location on the screen to another, potentially changing the utterance's position in the language hierarchy). Double-clicking 366 on an utterance name in the window selects that utterance name for action recording, and therefore starts the Run Edit module.
  • [0099] Keyboard events include the standard cut 372, copy 374, and paste 376 routines, as well as cursor movements down 380, up 382, right 384, and left 386. Pressing return at the keyboard 378, like a double click with the mouse, selects the current utterance name for action recording by Run Edit. After the appropriate command handler is called, Event Handler returns 388. The modifications to the language hierarchy performed by the Event Handler module are reflected in the hierarchical structure of the language file produced by the Write Production module during close and save operations.
  • [0100] Referring to FIG. 10, the Do My Menu module 268 controls all of the menu choices supported by Language Maker. After summoning the appropriate submodule (discussed in detail in FIGS. 11A through 11I), Do My Menu returns 408.
  • [0101] Referring to FIG. 11A, the New submodule 390 creates a new language. The New submodule first checks 410 whether Language Maker is open. If so, it prompts the user 412 to save the current language as a language file. If the user saves the current language, New calls the Write Production module 414 to save the language. New then calls Create Global Words 416 and forms a new language 418. Create Global Words 416 automatically enters a few global (i.e. resident in all languages) utterance names and command strings into the new language. These utterance names and command strings allow the user to make Voice Control commands, and correspond to utterances such as “show me the active words” and “bring up the voice options” (the utterance macros for the corresponding voice file are trained by the user, or copied from an existing voice file, after the new language is saved).
  • [0102] Referring to FIG. 11B, the Open submodule 392 opens an existing language for modification. The Open submodule 392 checks 420 whether Language Maker is open. If so, it prompts the user 422 to save the current language, calling Write Production 424 if the user answers yes. Open then prompts the user to open the selected language 426. If the user cancels, Open returns 428. Otherwise, the language is loaded 430 and Open returns 432.
  • [0103] Referring to FIG. 11C, the Save submodule 394 saves the current language in memory as a language file. Save prompts the user to save the current language 434. If the user cancels, Save returns 436; otherwise, Save calls Write Production 438 to convert the language into a state machine control file suitable for use by VOCAL (FIG. 2). Finally, Save returns 440.
  • [0104] Referring to FIG. 11D, the New Action submodule 396 initializes the event recorders to begin recording a new sequence of actions. New Action initializes the event recorder by displaying an action window to the user 442, setting up a tool palette for the user, and initializing recording of actions. Then New Action returns 444. After New Action is started, actions are not delivered to the operating system directly; rather, they are filtered through Language Maker.
  • [0105] Referring to FIG. 11E, the Record Dialog submodule 398 records responses to dialog boxes through the use of the Run Modal module. Record Dialog 398 gives the user a way to record actions in modal dialogs; otherwise, the user would be prevented from performing the actions which bring up the dialog boxes. Record Dialog displays 446 the dialog action window (see FIG. 25) and turns recording on. Then Record Dialog returns 448.
  • [0106] Referring to FIG. 11F, the Create Default Menus submodule 400 extracts default utterance names (and generates associated command strings) from the executable code for an application. Create Default Menus 270 is ordinarily the first choice selected by a user when creating a language for a particular application. This submodule looks at the executable code of an application and creates an utterance name for each menu command in the application, associating the utterance name with a command string that will select that menu command. When called, Create Default Menus gets 450 the menu bar from the executable code of the application and initializes the current menu to be the first menu (X=1). Next, each menu is processed recursively. When all menus are processed, Create Default Menus returns 454. A first loop 452, 456, 458, 460 locates the current (Xth) menu handle 456, initializes menu parsing, checks whether the current menu is fully parsed 458, and reiterates by updating the current menu to the next menu. A second loop 458, 462, 464 finds each menu name 462 and checks 464 whether the name is hierarchical (i.e. whether the name points to further menus). If the name is not hierarchical, the loop repeats. Otherwise, the hierarchical menu is fetched 466, and a third loop 470, 472 starts. In the third loop, each item name in the hierarchical menu is fetched 472, and the loop checks whether all hierarchical item names have been fetched 470.
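The menu walk, including descent into hierarchical items, can be sketched recursively. The `Menu` and `Entry` types are hypothetical, and so is the command encoding, which is a tuple of menu titles plus the item's 1-based position. A real implementation would format each command in the command-string syntax of Appendix D, which is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Menu:
    title: str
    items: list[Union[str, Menu]] = field(default_factory=list)  # Menu = hierarchical item


@dataclass
class Entry:
    name: str
    command: tuple = ()  # hypothetical: (menu titles..., 1-based item position)
    children: list[Entry] = field(default_factory=list)


def menu_entries(menu: Menu, path: tuple = ()) -> Entry:
    """Build one utterance entry for a menu and, recursively, for its items."""
    path = path + (menu.title,)
    entry = Entry(menu.title, command=path)  # saying the title summons the menu
    for position, item in enumerate(menu.items, start=1):
        if isinstance(item, Menu):
            entry.children.append(menu_entries(item, path))  # descend into submenu
        else:
            entry.children.append(Entry(item, command=path + (position,)))
    return entry


def create_default_menus(menu_bar: list[Menu]) -> list[Entry]:
    """One top-level entry per menu; hierarchy mirrors the application's menus."""
    return [menu_entries(menu) for menu in menu_bar]
```

The resulting names start out in the same hierarchy as the application's menus. Language Maker then lets the user rearrange them.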
  • [0107] Referring to FIG. 11G, the Create Default Text submodule 402 allows the user to convert a text file on the clipboard into a list of utterance names. Create Default Text 402 creates an utterance name for each unique word in the clipboard 474 and then returns 476. The utterance names are associated with the keyboard entries which will type out the name. For example, a business letter can be copied to the clipboard and converted into default text. Utterances would then be associated with each of the common business terms in the letter. After ten or twelve business letters have been converted, the majority of the business letter words would be stored as a set of utterances.
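The default-text conversion can be sketched as follows: each unique word on the clipboard becomes an utterance name, and its default command is the keystrokes that type it. The word pattern and the case-sensitive treatment are assumptions, because the text does not say how words are delimited or whether “Thank” and “thank” count as the same word.

```python
import re
from typing import Dict, List


def create_default_text(clipboard: str) -> Dict[str, List[str]]:
    """Map each unique word (case-sensitive, first-seen order) to its keystrokes."""
    names: Dict[str, List[str]] = {}
    for word in re.findall(r"[A-Za-z']+", clipboard):  # assumed word pattern
        if word not in names:
            names[word] = list(word)  # e.g. "text" -> t-e-x-t
    return names
```

All the resulting names sit at a single level, matching the flat hierarchy described for this procedure.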
  • [0108] Referring to FIG. 11H, the Alphabetize Group submodule 404 allows the user to alphabetize the utterance names in a language. The selected group of names (created by dragging the mouse over utterance names in the Language Maker window) is alphabetized 478, and then Alphabetize Group returns 480.
  • [0109] Referring to FIG. 11I, the Preferences submodule 406 allows the user to select standard graphical user interface preferences such as font style 482 and font size 484. The Preferences submenu 486 allows the user to state the metric by which the mouse locations of recorded actions are stored. The coordinates for mouse actions can be relative to the global window coordinates or relative to the application window coordinates. In the case where application menu selections are performed by mouse clicks, the mouse clicks must always be in relative coordinates so that the window may be moved on the screen without affecting the function of the mouse click. The Preferences submenu 486 also determines whether, when a mouse action is recorded, the mouse is left at the location of a click or returned to its original location after a click. When the preference selections are done 488, the user is prompted whether to update the current preference settings for Language Maker. If so, the file is updated 490 and Preferences returns 492. If not, Preferences returns directly to the operating system 494 without saving.
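Window-relative storage can be shown with a short coordinate example. The function names and the tuple representation of points are hypothetical. The sketch records a click relative to the window's top-left corner and replays it after the window has moved. It also applies the “leave the mouse at the click or return it” preference.

```python
from typing import Tuple

Point = Tuple[int, int]


def record_click(click: Point, window_origin: Point) -> Point:
    """Store a click relative to the window's top-left corner."""
    return (click[0] - window_origin[0], click[1] - window_origin[1])


def replay_click(stored: Point, window_origin: Point, cursor: Point,
                 leave_at_click: bool) -> Tuple[Point, Point]:
    """Return (global point to click, final cursor position)."""
    target = (stored[0] + window_origin[0], stored[1] + window_origin[1])
    return target, (target if leave_at_click else cursor)
```

Suppose a click is recorded at (150, 120) while the window is at (100, 80). It is stored as (50, 40). If the window later moves to (300, 200), the click is replayed at (350, 240), so it still lands on the same menu area.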
  • [0110] Referring to FIG. 12, the Write Production module 242 is called when a file is saved. Write Production saves the current language and converts it from an outline processor format, such as that used in the Language Maker application, to a hierarchical text format suitable for use with the state-machine-based Recognition Software. Language files are associated with applications, and new language files can be created or edited for each additional application to incorporate the various commands of the application into voice recognition.
  • The embodiment of the Write Production module depends upon the Recognition Software in use. In general, the Write Production module is written to convert the current language to suitable format for the Recognition Software in use. The particular embodiment of Write Production shown in FIG. 12 applies to the syntax of the VOCAL compiler for the Dragon Systems Recognition Software. [0111]
  • [0112] Write Production first tests the language 494 to determine whether there are any sub-levels. If not, the Write Terminal submodule 496 saves the top-level language, and Write Production returns 498. If sub-levels exist in the language, each sub-level is processed by a tail-recursive loop. If a root entry exists in the language 500 (i.e. if only one utterance name exists at the current level), Write Production writes 502 the string “Root=(” to the file and checks for sub-levels 512. Otherwise, if no root exists, Write Terminal is called 504 to save the names in the current level of the language. Next, the string “TERMINAL =” is written 506, and if, 508, the language level is terminal, the string “(” is written. Next, Write Production checks 512 for sub-levels in the language. If no sub-levels exist, Write Production returns 514. Otherwise, the sub-levels are processed by another call 516 to Write Production on the sub-level of the language. After the sub-level is processed, Write Production writes the string “)” and returns 518.
  • [0113] Referring to FIG. 13, the Write Terminal submodule 496 writes each utterance name and the associated command string to the language file. First, Write Terminal checks 520 whether it is at a terminal. If not, it returns 530. Otherwise, Write Terminal writes 522 the string corresponding to the utterance name to the language file. Next, if, 524, there is an associated command string, Write Terminal writes the command string (i.e. “output”) to the language file. Finally, Write Terminal writes 528 the string “;” to the language file and returns 530.
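The recursion shared by Write Production and Write Terminal can be sketched with a serializer that flattens an outline into nested text. In this version, each terminal is written as its name, then its command string if one exists, then “;”, and each sub-level is wrapped in parentheses. The output format is hypothetical and is not VOCAL syntax, which is specified in Appendix D and not reproduced here. The `Node` type is also hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    output: Optional[str] = None          # command string, if any
    children: List[Node] = field(default_factory=list)


def write_production(level: List[Node], out: List[str], depth: int = 0) -> None:
    """Serialize an outline into nested text (hypothetical format, not VOCAL)."""
    pad = "  " * depth
    for node in level:
        entry = f'{pad}"{node.name}"'
        if node.output is not None:
            entry += f" {node.output}"
        if node.children:
            out.append(entry + " (")
            write_production(node.children, out, depth + 1)  # recurse into sub-level
            out.append(pad + ")")
        else:
            out.append(entry + ";")             # terminal: name, output, then ";"
```

Calling `"\n".join(out)` after `write_production` produces the file text. Any rearrangement made in the Language Maker window therefore shows up directly as a change in nesting.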
  • Voice Control
  • The Voice Control software serves as a gate between the operating system and the applications running on it. This is accomplished by setting the Macintosh operating system's get_next_event procedure equal to a filter procedure created by Voice Control. The get_next_event procedure runs whenever a next_event request is generated by the operating system or by an application. Ordinarily the get_next_event procedure is null, and next_event requests go directly to the operating system. The filter procedure passes control to Voice Control on every request. This allows Voice Control to perform voice actions by intercepting mouse and keyboard events and by creating new events that correspond to spoken commands. [0114]
  • The Voice Control filter procedure is shown in FIG. 14. [0115]
  • After installation 538, the get_next_event filter procedure 540 is called before an event is generated by the operating system. The event is first checked 542 to see whether it is a null event. If so, the Process Input module 544 is called directly. The Process Input routine 544 checks for new speech input and processes any input that has been received. After Process Input, the Voice Control driver proceeds through normal filter processing 546 (i.e., any filter processing installed by other applications) and returns 548. If the next event is not a null event, displays are hidden 550. This allows Voice Control to hide any of its displays (such as current language lists) that could have been generated by a previous non-null action. Therefore, if Voice Control has produced any prompt windows, they are hidden when a non-null event occurs. Next, key down events are checked 552. Because the recognizer is controlled (i.e., turned on and off) by certain special key down events, a key down event requires further processing by Voice Control; for any other event, the Voice Control driver procedure moves directly to Process Input 544. If a key down event has occurred 554, the software latches that control the recognizer are set where appropriate. This allows activation of the Recognizer Software, selection of Recognizer options, or display of languages. Thereafter, the Voice Control driver moves to Process Input 544. [0116]
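  • The decision flow of FIG. 14 can be modelled as a small Python class. This is a sketch under stated assumptions: events are reduced to a kind string, the special latch keys (`F13`, `F14`) and the latch names are hypothetical, and chaining to other applications' filters is represented only by a trace entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

NULL_EVENT = "null"
KEY_DOWN = "keyDown"

# Hypothetical special keys and the recognizer latches they set.
LATCH_KEYS = {"F13": "toggle_recognizer", "F14": "show_language"}


@dataclass
class VoiceControlFilter:
    displays_visible: bool = False
    latches: Set[str] = field(default_factory=set)
    trace: List[str] = field(default_factory=list)

    def process_input(self) -> None:
        # Placeholder for Process Input: poll for and handle speech.
        self.trace.append("process_input")

    def __call__(self, kind: str, key: Optional[str] = None) -> None:
        """Runs before each event reaches the application."""
        if kind != NULL_EVENT:
            # Real user activity hides any Voice Control prompt windows.
            self.displays_visible = False
            if kind == KEY_DOWN and key in LATCH_KEYS:
                # Special key downs set latches controlling the recognizer.
                self.latches.add(LATCH_KEYS[key])
        self.process_input()
        # Then fall through to filters installed by other applications.
        self.trace.append("chained_filter")


f = VoiceControlFilter(displays_visible=True)
f(NULL_EVENT)          # idle event: just poll for speech
f(KEY_DOWN, "F13")     # special key: hide displays and set a latch
print(f.displays_visible, sorted(f.latches), f.trace)
```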
  • Referring to FIG. 15, the Process Input routine is the heart of the Voice Control driver. It manages all voice input for the Voice Navigator. The Process Input module is called each time an event is processed by the operating system. First 546, any latches that need to be set are processed, and the Macintosh waits for a number of delay ticks, if necessary. Delay ticks are included, for example, where Voice Control is performing a menu drag, to allow the menu to be drawn on the screen before the drag starts. Some applications also require a delay between mouse or keyboard events. Next, if recognition is activated 548, the Process Input routine proceeds to do recognition 562. If recognition is deactivated, Process Input returns 560. [0117]
  • The recognition routine 562 prompts the recognition drivers to check for an utterance (i.e., sound that could be speech input). If there is recognized speech input 564, Process Input checks the vertical blanking interrupt (VBL) handler 566 and deactivates it where appropriate. [0118]
  • The vertical blanking interrupt cycle is a very low-level cycle in the operating system. Each time the screen is refreshed, while the raster moves from the bottom right back to the top left of the screen, the vertical blanking interrupt occurs. During this blanking time, very short, very high-priority routines can be executed. Process Input uses this cycle to move the mouse continuously by incrementing the mouse coordinates in very small steps where appropriate. To accomplish this, mouse move routines are installed in the VBL queue. Therefore, where appropriate, the VBL handler must be deactivated to move the mouse. [0119]
  • Recognized speech input is placed 568 on a speech queue, which stores speech-related events until they can be handled by the ProcessQ routine. Regardless of whether speech is recognized, however, Process Input always calls ProcessQ 570. Speech events queued for ProcessQ are therefore eventually executed, but not necessarily in the same Process Input cycle. After calling ProcessQ, Process Input returns 571. [0120]
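  • The decoupling between recognition and execution can be seen in a toy model of FIG. 15. The `recognize` callable, the tick counting, and the one-event-per-cycle ProcessQ are simplifying assumptions, not the driver's actual code. The point is that a recognized command is queued in one cycle and may run in a later one.

```python
from collections import deque
from typing import Callable, Deque, List


class ProcessInput:
    """Toy model of the FIG. 15 cycle.

    `recognize` is a hypothetical stand-in for the Recognize submodule:
    it returns a command string, or "" when nothing was recognized.
    """

    def __init__(self, recognize: Callable[[], str]) -> None:
        self.recognize = recognize
        self.active = True             # recognition switched on
        self.delay_ticks = 0           # e.g. time for a menu to draw
        self.speech_queue: Deque[str] = deque()
        self.executed: List[str] = []

    def process_q(self) -> None:
        # Simplified ProcessQ: run one queued event once delays expire.
        if self.speech_queue and self.delay_ticks == 0:
            self.executed.append(self.speech_queue.popleft())

    def __call__(self) -> None:
        if self.delay_ticks > 0:       # count down pending delay ticks
            self.delay_ticks -= 1
        if not self.active:
            return                     # recognition deactivated
        text = self.recognize()
        if text:
            self.speech_queue.append(text)
        self.process_q()               # always called, speech or not


utterances = iter(["@ZOOM", "", "@MSUP"])
pi = ProcessInput(lambda: next(utterances, ""))
pi.delay_ticks = 2
pi()    # "@ZOOM" is queued but must wait for the delay
pi()    # delay expired: "@ZOOM" runs in this later cycle
pi()    # "@MSUP" is queued and run
print(pi.executed)
```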
  • Referring to FIG. 16, the Recognize submodule 562 checks for encoded utterances queued by the Voice Navigator box and then calls the recognition drivers to attempt to recognize any utterances. Recognize returns the number of commands in (i.e., the length of) the command string returned from the recognizer. If, 572, no utterance is returned from the recognizer, Recognize returns a length of zero 574, indicating that no recognition has occurred. If an utterance is available, Recognize calls sdi_recognize 576, instructing the Recognizer Software to attempt recognition on the utterance. If, 578, recognition is successful, the name of the utterance is displayed 582 to the user. At the same time, any close call windows (i.e., windows associated with close call choices, prompted by Voice Control in response to the Recognizer Software) are cleared from the display. If recognition is unsuccessful, the Macintosh beeps 580 and a length of zero is returned 574. [0121]
  • If recognition is successful, Recognize searches 584 for an output string associated with the utterance. If there is an output string, Recognize checks 586 whether the recognizer is asleep. If it is not asleep 590, the output count is set to the length of the output string, and, if the command is a control command 592 (such as “go to sleep” or “wake up”), it is handled by the Process Voice Commands routine 594. [0122]
  • If there is no output string for the recognized utterance, or if the recognizer is asleep, the output of Recognize is zero 588. After the output count is determined, the state of the recognizer is processed 596. At this time, if any of the Recognize subroutines have modified the Voice Control state flags, the appropriate actions are initialized. Finally, Recognize returns 598. [0123]
  • Referring to FIG. 17, the Process Voice Commands module handles commands that control the recognizer. The module may perform actions itself or flag actions to be performed by the Process States block 596 (FIG. 16). If the recognizer is put to sleep 600 or awakened 604, the appropriate flags are set 602, 606, and zero is returned 626, 628 as the length of the command string, indicating that Process States should take no further action. Otherwise, if the command is scratch_that 608 (ignore the last utterance), first_level 612 (go to the top of the language hierarchy, i.e., set the Voice Control state to the root state for the language), word_list 616 (show the current language), or voice options 620, the appropriate flags are set 610, 614, 618, 622, and a string length of −1 is returned 624, 628, indicating that Process States 596 (FIG. 16) should change the recognizer state. [0124]
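  • Recognize and Process Voice Commands share a three-way length convention. A positive length means a command string to execute, zero means no action, and −1 means a recognizer state change. The sketch below illustrates it. The utterance names and the flag names are assumptions. So is the choice to test for control commands before the sleep check, which lets “wake up” be heard while asleep; FIG. 16 may order these tests differently.

```python
from typing import Dict, Optional, Set

# Hypothetical utterance names for the recognizer-control commands.
CONTROL_COMMANDS = {"go to sleep", "wake up", "scratch that",
                    "first level", "word list", "voice options"}


def process_voice_command(name: str, flags: Set[str]) -> int:
    """Sketch of FIG. 17: 0 means no further action, -1 a state change."""
    if name == "go to sleep":
        flags.add("asleep")
        return 0
    if name == "wake up":
        flags.discard("asleep")
        return 0
    # scratch_that, first_level, word_list, voice_options
    flags.add(name.replace(" ", "_"))
    return -1


def recognize(name: Optional[str], outputs: Dict[str, str],
              flags: Set[str]) -> int:
    """Return the command-string length (>0), 0, or -1."""
    if name is None:                     # nothing was recognized
        return 0
    if name in CONTROL_COMMANDS:
        return process_voice_command(name, flags)
    output = outputs.get(name)
    if output is None or "asleep" in flags:
        return 0                         # no output string, or asleep
    return len(output)


flags: Set[str] = set()
outputs = {"zoom": "@ZOOM"}
print(recognize("zoom", outputs, flags))
```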
  • Referring to FIG. 18, the ProcessQ module 570 pulls speech input from the speech queue and processes it. If, 630, the event queue is empty, ProcessQ may proceed; otherwise ProcessQ aborts 632, because the event queue could overflow if speech events were placed on it along with other events. If, 634, there are no events on the speech queue, ProcessQ aborts 636. Otherwise, ProcessQ checks whether, 636, the delay ticks for menu drawing or other related activities have expired. If they have, ProcessQ calls Get Next 642 and returns 644; if they have not, ProcessQ aborts 640. [0125]
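  • The three gates in FIG. 18 reduce to a few early returns. In this hedged sketch the queues are plain Python containers and `get_next` is a caller-supplied callable. The returned labels exist only to make the path taken visible.

```python
from collections import deque
from typing import Callable, Deque, List


def process_q(event_queue: List[str], speech_queue: Deque[str],
              delay_ticks: int,
              get_next: Callable[[Deque[str]], object]) -> str:
    """Toy version of FIG. 18; returns a label for the path taken."""
    if event_queue:
        # Posting speech events on top of pending OS events could
        # overflow the event queue, so wait until it has drained.
        return "abort: event queue busy"
    if not speech_queue:
        return "abort: no speech"
    if delay_ticks > 0:
        # e.g. a menu pulled down by voice is still being drawn
        return "abort: waiting"
    get_next(speech_queue)
    return "processed"


speech = deque(["@ZOOM"])
print(process_q(["mouseDown"], speech, 0, lambda q: q.popleft()))
print(process_q([], speech, 3, lambda q: q.popleft()))
print(process_q([], speech, 0, lambda q: q.popleft()), list(speech))
```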
  • Referring to FIG. 19, the Get Next submodule 642 gets characters from the speech queue and processes them. If, 646, there are no characters in the speech queue, the procedure simply returns 648. If there are characters in the speech queue, Get Next checks 650 whether they are command characters. If they are, Get Next calls Check Command 660. If not, the characters are text, and Get Next sets the meta bits 652 where appropriate. [0126]
  • When the Macintosh posts an event, the meta bits (see Appendix B) are used as flags for conditioning keystrokes such as the condition key, the option key, or the command key. These keys condition the character pressed at the keyboard and create control characters. To create the proper operating system events, therefore, the meta bits must be set where necessary. Once the meta bits are set 652, a key down event is posted 654 to the Macintosh event queue, simulating a key press at the keyboard. A key up event is then posted 656 to the event queue, simulating the key's release. If, 658, there is still room in the event queue, further speech characters are obtained and processed 646. If not, the Get Next procedure returns 676. [0127]
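  • Posting text as keystrokes with meta bits might look like the following Python sketch. The modifier values are those of the classic Mac OS Event Manager's `EventRecord.modifiers` field. The `KeyEvent` record, the queue `capacity` of 20, and the rule that only upper-case letters set a meta bit are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List

# Modifier bits of the classic Mac OS EventRecord.modifiers field.
CMD_KEY = 0x0100
SHIFT_KEY = 0x0200
OPTION_KEY = 0x0800
CONTROL_KEY = 0x1000


@dataclass
class KeyEvent:
    what: str        # "keyDown" or "keyUp"
    char: str
    modifiers: int   # the meta bits


def post_keystrokes(speech: Deque[str], events: List[KeyEvent],
                    capacity: int = 20) -> None:
    """Turn queued text into keyDown/keyUp pairs while there is room.

    `capacity` is a hypothetical event-queue size for illustration.
    """
    while speech and len(events) + 2 <= capacity:
        ch = speech.popleft()
        # Example meta-bit rule: upper-case letters need the shift bit.
        mods = SHIFT_KEY if ch.isupper() else 0
        events.append(KeyEvent("keyDown", ch, mods))   # simulated press
        events.append(KeyEvent("keyUp", ch, mods))     # simulated release


events: List[KeyEvent] = []
post_keystrokes(deque("aB"), events)
print([(e.what, e.char, hex(e.modifiers)) for e in events])
```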
  • If the command string input corresponds to a command rather than simple keystrokes, the string is handled by the Check Command procedure 660, as illustrated in FIG. 19. In the Check Command procedure 660, the next four characters from the speech queue (four characters is the length of all command strings; see Appendix A) are fetched 662 and compared 664 to a command table. If, 666, the characters match a voice command, a command is recognized, and processing continues in the Handle Command routine 668. Otherwise, the characters are interpreted as text and processing returns to the meta bits step 652. [0128]
  • In the Handle Command procedure 668, each command is looked up in a table of command procedures: the command handler's offset into the table is first computed 670, and the table is then referenced to call the appropriate command handler 672. After calling the command handler, Get Next exits the Process Input module directly 674 (the software is structured such that a return from Handle Command would return to the meta bits step 652, which would be incorrect). [0129]
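  • Check Command and Handle Command amount to recognizing a four-character name and then dispatching through a table. The sketch below assumes, from the Appendix A syntax, that every command string begins with “@” and carries optional comma-separated arguments in parentheses. It uses a Python dictionary in place of the computed table offset, and its handlers merely record what they were asked to do.

```python
import re
from typing import Callable, Dict, List, Tuple

Handler = Callable[[List[str]], None]
calls: List[Tuple[str, List[str]]] = []   # records what was dispatched


def make_handler(code: str) -> Handler:
    # Stand-in handler; a real one would post mouse or key events.
    return lambda args: calls.append((code, args))


# Command table keyed by four-character names from Appendix A (subset).
COMMAND_TABLE: Dict[str, Handler] = {
    code: make_handler(code)
    for code in ("MENU", "CTRL", "KYPD", "ZOOM", "LMSE", "GMSE",
                 "DCLK", "MSDN", "MSUP", "MOVE", "MOVI", "MOVL")
}

# "@" + four capital letters, optionally followed by "(arg,arg,...)".
COMMAND_RE = re.compile(r"@([A-Z]{4})(?:\(([^)]*)\))?")


def get_next(queue: str) -> Tuple[str, str]:
    """Return (text typed as keystrokes, text left on the queue)."""
    typed: List[str] = []
    i = 0
    while i < len(queue):
        m = COMMAND_RE.match(queue, i)
        if m and m.group(1) in COMMAND_TABLE:          # Check Command
            args = m.group(2).split(",") if m.group(2) else []
            COMMAND_TABLE[m.group(1)](args)            # Handle Command
            # Stop after one command; the rest stays queued.
            return "".join(typed), queue[m.end():]
        typed.append(queue[i])                         # ordinary text
        i += 1
    return "".join(typed), ""


print(get_next("hi@MENU(apple,0)@ZOOM"))
print(calls)
```

As in the patent's flow, dispatching a command ends the cycle, so any text after it remains queued for a later pass.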
  • The command handlers available to the Handle Command routine are illustrated in FIG. 20. Each command handler is detailed by a flow diagram in FIGS. 21A through 21G. The syntax for the commands is detailed in Appendix A. [0130]
  • Referring to FIG. 21A, the Menu command pulls down a menu; for example, @MENU(apple,0) (where apple is the menu number for the Apple menu) pulls down the Apple menu. The Menu command also selects an item from a menu; for example, @MENU(apple,calculator) (where calculator is the item number for the Calculator in the Apple menu) selects the Calculator from the Apple menu. The Menu command begins by running the Find Menu routine 678, which queues the menu id and the item number for the selected menu. (If the item number in the menu is 0, Find Menu simply clicks on the menu bar.) After Find Menu returns, if, 680, no menus are queued for posting, the Menu command simply returns 690. If menus are queued for posting, however, the Menu command intercepts 682 one of the Macintosh internal traps, called Menu Select, by setting the Menu Select trap equal to the My Menu Select routine 692. Next, the cursor is hidden 684 so that the mouse cannot be seen as it moves on the screen. The Menu command then posts 686 a mouse down (i.e., pushes the mouse button down) on the menu bar. When the mouse down occurs on the menu bar, the Macintosh operating system generates a menu event for the application. Each application receiving a menu event requests service from the operating system to find out what the menu event is; to do this, the application issues a Menu Select trap, which places the location of the mouse on the stack. In this case, however, the Menu Select trap issued by the application is serviced by the My Menu Select routine 692 instead, allowing the Menu command to insert the desired menu coordinates in place of the real coordinates. After posting a mouse down in the appropriate menu bar, the Menu command sets 688 the wait ticks to 30, which gives the operating system time to draw the menu, and returns 690. [0131]
  • In the My Menu Select trap 692, the menuselect global state is reset 694 to clear any previously selected menus, and the desired menu id and item number are moved to the Macintosh stack 696, thus selecting the desired menu item. [0132]
  • The Find Menu routine 700 collects 702 the command parameters for the desired menu. Next, the menuname is compared 704 to the menu name list. If, 706, there is no menu with the name “menuname”, Find Menu exits 708. Otherwise, Find Menu compares 710 the itemname to the names of the items in the menu. If, 712, the located item number is greater than 0, Find Menu queues 718 the menu id and item number for use by the Menu command, and returns 720. Otherwise, if the item number is 0, Find Menu simply sets 714 the internal Voice Control flags “mousedown” and “global” to true. This indicates to Voice Control that the mouse location should be globally referenced and that the mouse button should be held down. Find Menu then calls 716 the Post Mouse routine, which references these flags to manipulate the operating system's mouse state accordingly. [0133]
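  • Find Menu and the patched Menu Select trap can be sketched together. The menu data and the `VoiceFlags` record are hypothetical. The packing of the result follows the classic Menu Manager's `MenuSelect` convention, which is what an application expects to receive from that trap: the menu ID goes in the high-order word and the item number in the low-order word.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Menu:
    menu_id: int
    items: List[str]          # item names in menu order


@dataclass
class VoiceFlags:
    queued: List[Tuple[int, int]] = field(default_factory=list)
    mousedown: bool = False
    global_coords: bool = False


def find_menu(menus: Dict[str, Menu], menuname: str, itemname: str,
              flags: VoiceFlags) -> None:
    menu = menus.get(menuname)
    if menu is None:                       # no menu by that name: exit
        return
    # Menu Manager item numbers start at 1; 0 means "no item".
    item = menu.items.index(itemname) + 1 if itemname in menu.items else 0
    if item > 0:
        flags.queued.append((menu.menu_id, item))
    else:
        # Hold the menu open: Post Mouse would press on the menu title
        # in global coordinates and keep the button down.
        flags.mousedown = True
        flags.global_coords = True


def my_menu_select(flags: VoiceFlags) -> int:
    """Stand-in for the patched MenuSelect: return the queued choice,
    menu ID in the high word and item number in the low word."""
    menu_id, item = flags.queued.pop(0)
    return (menu_id << 16) | item


menus = {"apple": Menu(1, ["About This Macintosh", "calculator"])}
f = VoiceFlags()
find_menu(menus, "apple", "calculator", f)
print(my_menu_select(f))
```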
  • Referring to FIG. 21B, the Control command 722 performs a button push within a menu, invoking actions such as the save command in the file menu of an application. To do this, the Control command gets the command parameters 724 from the control string, finds the front window 726, gets the window control list 728, and checks 730 whether the control name exists in the control list. If it does, the control rectangle coordinates are calculated 732, the Post Mouse routine 734 clicks the mouse at the proper coordinates, and the Control command returns 736. If the control name is not found, the Control command returns directly. [0134]
  • The Keypad command 738 simulates numeric entries at the Macintosh keypad. Keypad finds the command parameters for the command string 740, gets the keycode value 742 for the desired key, posts a key down event 744 to the Macintosh event queue, and returns 746. [0135]
  • The Zoom command 748 zooms the front window. Zoom obtains the front window pointer 750 in order to reference the mouse to the front window, calculates the location of the zoom box 752, uses Post Mouse to click in the zoom box 754, and returns 756. [0136]
  • The Local Mouse command 758 clicks the mouse at a locally referenced location. Local Mouse obtains the command parameters for the desired mouse location 760, uses Post Mouse to click at the desired coordinate 762, and returns 764. [0137]
  • The Global Mouse command 766 clicks the mouse at a globally referenced location. Global Mouse obtains the command parameters for the desired mouse location 768, sets the global flag to true 770 (to signal to Post Mouse that the coordinates are global), uses Post Mouse to click at the desired coordinate 772, and returns 774. [0138]
  • The Double Click command double clicks the mouse at a locally referenced location. Double Click obtains the command parameters for the desired mouse location 778, calls Post Mouse twice 780, 782 (to click twice in the desired location), and returns 784. [0139]
  • The Mouse Down command 786 sets the mouse button down. Mouse Down sets the mousedown flag to true 788 (to signal to Post Mouse that the mouse button should be held down), uses Post Mouse to set the button down 790, and returns 792. [0140]
  • The Mouse Up command 794 sets the mouse button up. Mouse Up sets the mbState global (see Appendix B) to Mouse Button UP 796 (to signal to the operating system that the mouse button is up), posts a mouse up event to the Macintosh event queue 798 (to signal to applications that the mouse button has gone up), and returns 800. [0141]
  • Referring to FIG. 21D, the Screen Down command 802 scrolls the contents of the current window down. Screen Down first looks 804 for the vertical scroll bar in the front window. If, 806, the scroll bar is not found, Screen Down simply returns 814. If the scroll bar is found, Screen Down calculates the coordinates of the down arrow 808, sets the mousedown flag to true 810 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 812, and returns 814. [0142]
  • The Screen Up command 816 scrolls the contents of the current window up. Screen Up first looks 818 for the vertical scroll bar in the front window. If, 820, the scroll bar is not found, Screen Up simply returns 828. If the scroll bar is found, Screen Up calculates the coordinates of the up arrow 822, sets the mousedown flag to true 824 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 826, and returns 828. [0143]
  • The Screen Left command 830 scrolls the contents of the current window left. Screen Left first looks 832 for the horizontal scroll bar in the front window. If, 834, the scroll bar is not found, Screen Left simply returns 842. If the scroll bar is found, Screen Left calculates the coordinates of the left arrow 836, sets the mousedown flag to true 838 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 840, and returns 842. [0144]
  • The Screen Right command 844 scrolls the contents of the current window right. Screen Right first looks 846 for the horizontal scroll bar in the front window. If, 848, the scroll bar is not found, Screen Right simply returns 856. If the scroll bar is found, Screen Right calculates the coordinates of the right arrow 850, sets the mousedown flag to true 852 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 854, and returns 856. [0145]
  • Referring to FIG. 21E, the Page Down command 858 moves the contents of the current window down a page. Page Down first looks 860 for the vertical scroll bar in the front window. If, 862, the scroll bar is not found, Page Down simply returns 868. If the scroll bar is found, Page Down calculates the page down button coordinates 864, uses Post Mouse to click the mouse button 866, and returns 868. [0146]
  • The Page Up command 870 moves the contents of the current window up a page. Page Up first looks 872 for the vertical scroll bar in the front window. If, 874, the scroll bar is not found, Page Up simply returns 880. If the scroll bar is found, Page Up calculates the page up button coordinates 876, uses Post Mouse to click the mouse button 878, and returns 880. [0147]
  • The Page Left command 882 moves the contents of the current window left a page. Page Left first looks 884 for the horizontal scroll bar in the front window. If, 886, the scroll bar is not found, Page Left simply returns 892. If the scroll bar is found, Page Left calculates the page left button coordinates 888, uses Post Mouse to click the mouse button 890, and returns 892. [0148]
  • The Page Right command 894 moves the contents of the current window right a page. Page Right first looks 896 for the horizontal scroll bar in the front window. If, 898, the scroll bar is not found, Page Right simply returns 904. If the scroll bar is found, Page Right calculates the page right button coordinates 900, uses Post Mouse to click the mouse button 902, and returns 904. [0149]
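  • The Screen and Page commands differ only in where they press and whether they hold the button. The sketch below computes those targets for a vertical scroll bar. The 16-pixel arrow size, the explicit thumb position, and the function name are assumptions. The horizontal commands follow by exchanging the roles of v and h.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]               # (v, h), i.e. (y, x)
Rect = Tuple[int, int, int, int]      # (top, left, bottom, right)
ARROW = 16                            # assumed arrow-box height in pixels


def vertical_scroll_targets(bar: Rect, thumb_top: int,
                            thumb_bottom: int) -> Dict[str, Tuple[Point, bool]]:
    """Map each vertical scroll command to (click point, hold button?)."""
    top, left, bottom, right = bar
    h = (left + right) // 2           # horizontal centre of the bar
    return {
        # Arrow commands press and hold, so the window keeps scrolling.
        "SCUP": ((top + ARROW // 2, h), True),
        "SCDN": ((bottom - ARROW // 2, h), True),
        # Page commands click once in the area beside the thumb.
        "PGUP": (((top + ARROW + thumb_top) // 2, h), False),
        "PGDN": (((thumb_bottom + bottom - ARROW) // 2, h), False),
    }


targets = vertical_scroll_targets((0, 484, 300, 500), 100, 116)
for code, (point, hold) in targets.items():
    print(code, point, "hold" if hold else "click")
```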
  • Referring to FIG. 21F, the Move command 906 moves the mouse from its current location (y,x) to a new location (y+δy,x+δx). First, Move gets the command parameters 908; it then sets the mouse speed to tablet 910 (this cancels mouse acceleration, which would otherwise make mouse movements uncontrollable), adds the offset parameters to the current mouse location 912, forces a new cursor position and resets the mouse speed 914, and returns 916. [0150]
  • The Move to Global Coordinate command 918 moves the cursor to the global coordinates given by the Voice Control command string. First, Move to Global gets the command parameters 920, then checks 922 whether there is a position parameter. If there is, the screen position coordinates are fetched 924. In either case, the global coordinates are calculated 926, the mouse speed is set to tablet 928, the mouse position is set to the new coordinates 930, the cursor is forced to the new position 932, and Move to Global returns 934. [0151]
  • The Move to Local Coordinate command 936 moves the cursor to the local coordinates given by the Voice Control command string. First, Move to Local gets the command parameters 938, then checks 940 whether there is a position parameter. If there is, the local position coordinates are fetched 942. In either case, the global coordinates are calculated 944, the mouse speed is set to tablet 946, the mouse position is set to the new coordinates 948, the cursor is forced to the new position 950, and Move to Local returns 952. [0152]
  • The Move Continuous command 954 moves the mouse continuously from its present location, moving δy,δx on every screen refresh. This is accomplished by inserting 956 the VBL Move routine 960 in the Vertical Blanking Interrupt queue of the Macintosh and returning 958. Once in the queue, the VBL Move routine 960 is executed on every screen refresh. The VBL Move routine simply adds the δy and δx values to the current cursor position 962, resets the cursor 964, and returns 966. [0153]
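  • Move Continuous can be imitated with a toy vertical blanking queue that runs each installed task once per simulated refresh. `VBLQueue`, `Cursor`, and `move_continuous` are hypothetical stand-ins, not Vertical Retrace Manager calls. The sketch also omits clamping the cursor to the screen.

```python
from typing import Callable, List


class VBLQueue:
    """Toy vertical-blanking queue: each task runs once per refresh."""

    def __init__(self) -> None:
        self.tasks: List[Callable[[], None]] = []

    def refresh(self) -> None:
        for task in list(self.tasks):   # copy: tasks may be removed
            task()


class Cursor:
    def __init__(self, y: int, x: int) -> None:
        self.y, self.x = y, x


def move_continuous(vbl: VBLQueue, cursor: Cursor,
                    dy: int, dx: int) -> Callable[[], None]:
    """Install a VBL Move task and return it so it can be removed later."""
    def vbl_move() -> None:
        cursor.y += dy                  # add the offsets to the position
        cursor.x += dx
    vbl.tasks.append(vbl_move)
    return vbl_move


vbl = VBLQueue()
c = Cursor(100, 100)
task = move_continuous(vbl, c, 0, 2)
for _ in range(3):                      # three screen refreshes
    vbl.refresh()
vbl.tasks.remove(task)                  # as End Button would
vbl.refresh()                           # no further movement
print(c.y, c.x)
```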
  • Referring to FIG. 21G, the Option Key Down command 968 sets the option key down. This is done by setting the option key bit in the keyboard bit map to TRUE 970 and returning 972. [0154]
  • The Option Key Up command 974 sets the option key up. This is done by setting the option key bit in the keyboard bit map to FALSE 976 and returning 978. [0155]
  • The Shift Key Down command 980 sets the shift key down. This is done by setting the shift key bit in the keyboard bit map to TRUE 982 and returning 984. [0156]
  • The Shift Key Up command 986 sets the shift key up. This is done by setting the shift key bit in the keyboard bit map to FALSE 988 and returning 990. [0157]
  • The Command Key Down command 992 sets the command key down. This is done by setting the command key bit in the keyboard bit map to TRUE 994 and returning 996. [0158]
  • The Command Key Up command 998 sets the command key up. This is done by setting the command key bit in the keyboard bit map to FALSE 1000 and returning 1002. [0159]
  • The Control Key Down command 1004 sets the control key down. This is done by setting the control key bit in the keyboard bit map to TRUE 1006 and returning 1008. [0160]
  • The Control Key Up command 1010 sets the control key up. This is done by setting the control key bit in the keyboard bit map to FALSE 1012 and returning 1014. [0161]
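  • The eight modifier key commands above are the same operation applied to different bits. This sketch uses an integer as the keyboard bit map with hypothetical bit positions (the real map is indexed by key code), and builds all eight handlers from one table.

```python
# Hypothetical bit positions; the real keyboard map is indexed by key code.
KEY_BITS = {"command": 0, "shift": 1, "option": 2, "control": 3}


def set_key(bitmap: int, key: str, down: bool) -> int:
    """Return the bitmap with one modifier key's bit set or cleared."""
    mask = 1 << KEY_BITS[key]
    return bitmap | mask if down else bitmap & ~mask


def is_down(bitmap: int, key: str) -> bool:
    return bool(bitmap & (1 << KEY_BITS[key]))


# One handler per "<key> down" / "<key> up" command, eight in total.
HANDLERS = {
    f"{key} {state}": (lambda bm, k=key, d=(state == "down"): set_key(bm, k, d))
    for key in KEY_BITS
    for state in ("down", "up")
}

print(HANDLERS["option down"](0))
```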
  • The Next Window command 1016 moves the front window to the back. This is done by getting the front window 1018, sending it to the back 1020, and returning 1022. [0162]
  • The Erase command 1024 erases numchars characters from the screen. Voice Control stores the number of characters typed by the most recent voice command, so Erase removes the characters produced by that command. This is done by a loop that posts delete key keydown events 1026 and checks 1028 whether the number posted equals numchars. When numchars deletes have been posted, Erase returns 1030. [0163]
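  • Erase depends on Voice Control remembering how many characters the last voice command typed. The sketch below keeps that count in a hypothetical `VoiceTyping` object and assumes character code 0x08 for the Delete key.

```python
from typing import List, Tuple

DELETE = "\x08"   # assumed character code for the Delete (backspace) key


class VoiceTyping:
    """Remembers how many characters the latest voice command typed."""

    def __init__(self) -> None:
        self.numchars = 0
        self.events: List[Tuple[str, str]] = []

    def type_text(self, text: str) -> None:
        for ch in text:
            self.events.append(("keyDown", ch))
        self.numchars = len(text)       # count for a later Erase

    def erase(self) -> None:
        # Post one Delete key down per character typed by the last command.
        posted = 0
        while posted < self.numchars:
            self.events.append(("keyDown", DELETE))
            posted += 1


vt = VoiceTyping()
vt.type_text("hello")
vt.erase()
print(len(vt.events))
```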
  • The Capitalize command 1032 capitalizes the next keystroke. This is done by setting the caps flag to TRUE 1034 and returning 1036. [0164]
  • The Launch command 1038 launches an application. The application must be on the boot drive, no more than one level deep. This is done by getting the name of the application 1040 (“appl_name”), searching for appl_name on the boot volume 1042, and, if, 1044, the application is found, setting the volume to the application folder 1048 and launching the application 1050 (no return is necessary because the new application will clear the Macintosh queue). If the application is not found, Launch simply returns 1046. [0165]
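  • The “no more than one level deep” search can be written against an ordinary file system. `find_application` is a hypothetical helper that uses Python's `os` module in place of the Macintosh File Manager. It returns the folder to make current before launching, or `None`.

```python
import os
from typing import Optional


def find_application(boot_volume: str, appl_name: str) -> Optional[str]:
    """Return the folder containing appl_name, searching only the volume's
    top level and its immediate sub-folders; None if not found."""
    if os.path.isfile(os.path.join(boot_volume, appl_name)):
        return boot_volume
    with os.scandir(boot_volume) as entries:
        for entry in entries:
            if entry.is_dir() and os.path.isfile(
                    os.path.join(entry.path, appl_name)):
                return entry.path     # folder to make current
    return None                       # deeper copies are not searched
```

If the result is `None`, Launch simply returns. Otherwise the folder is made current and the application is launched.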
  • Referring to FIG. 22, the Post Mouse routine 1052 posts mouse down events to the Macintosh event queue and can set traps to monitor mouse activity and to keep the mouse down. The actions of Post Mouse are determined by the Voice Control flags global and mousedown, which command handlers set before calling Post Mouse. After a Post Mouse, when an application does a get_next_event, it sees a mouse down event in the event queue, leading to events such as clicks, mouse downs, or double clicks. [0166]
  • First, Post Mouse saves the current mouse location 1054 so that the mouse may be returned to its initial location after the mouse events are produced. Next, the cursor is hidden 1056 to shield the user from seeing the mouse move around the screen. The global flag is then checked: if, 1058, the coordinates are local (i.e., global=FALSE), they are converted 1060 to global coordinates. Next, the mouse speed is set to tablet 1062 (to avoid acceleration problems), and the mouse down is posted to the Macintosh event queue 1064. If, 1066, the mousedown flag is TRUE (i.e., if the mouse button should be held down), the Set Mouse Down routine is called 1072 and Post Mouse returns 1070. Otherwise, if the mousedown flag is FALSE, a click is created by posting a mouse up event to the Macintosh event queue 1068 and returning 1070. [0167]
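  • The branches of Post Mouse can be summarized in a short Python sketch. Events are recorded in a list instead of the Macintosh event queue. Local-to-global conversion is a simple offset by an assumed window origin, and the tablet mouse-speed step is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]   # (v, h)


@dataclass
class MouseState:
    location: Point = (0, 0)            # global cursor position
    cursor_visible: bool = True
    button_held: bool = False
    events: List[Tuple[str, Point]] = field(default_factory=list)


def post_mouse(state: MouseState, pt: Point, *, global_coords: bool = False,
               mousedown: bool = False, window_origin: Point = (0, 0)) -> Point:
    """Post a click (or a held press) at pt; return the saved location."""
    saved = state.location              # so the caller can restore it later
    state.cursor_visible = False        # hide the cursor while we work
    if not global_coords:               # local -> global conversion
        pt = (pt[0] + window_origin[0], pt[1] + window_origin[1])
    state.events.append(("mouseDown", pt))
    if mousedown:
        state.button_held = True        # Set Mouse Down would patch traps
    else:
        state.events.append(("mouseUp", pt))   # complete the click
    return saved


s = MouseState(location=(5, 5))
saved = post_mouse(s, (10, 20), window_origin=(40, 100))
print(s.events, saved)
```

Under this model, a Double Click is simply two calls to `post_mouse` at the same point.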
  • Referring to FIG. 23, the Set Mouse Down routine 1072 holds the mouse button down by replacing 1074 the Macintosh button trap with a Voice Control trap named My Button. The My Button trap then recognizes further voice commands and creates mouse drags or clicks as appropriate. After installing My Button, Set Mouse Down checks 1076 whether the Macintosh is a Macintosh Plus, in which case the Post Event trap must also be reset 1078 to the Voice Control My Post Event trap. (The Macintosh Plus does not simply check the mbState global flag to determine the mouse button state. Instead, the Post Event trap in a Macintosh Plus polls the actual mouse button to determine its state and posts mouse up events if the button is up. Therefore, to force the Macintosh Plus to accept the mouse button state dictated by Voice Control during voice actions, the Post Event trap is replaced with a My Post Event trap, which does not poll the status of the mouse button.) Next, the mbState flag is set to MouseDown 1080 (indicating that the mouse button is down), and Set Mouse Down returns 1082. [0168]
  • The My Button trap 1084 replaces the Macintosh button trap, thereby seizing control of the button state from the operating system. Each time My Button is called, it checks 1086 the Macintosh mouse button state bit mbState. If mbState has been set to UP, My Button moves to the End Button routine 1106, which sets mbState to UP 1108, removes any VBL routine that has been installed 1110, resets the Button and Post Event traps to the original Macintosh traps 1112, resets the mouse speed and couples the cursor to the mouse 1114, shows the cursor 1102, and returns 1104. [0169]
  • If the mouse button is to remain down, however, My Button checks 1088 for the expiration of wait ticks (which allow the Macintosh time to draw menus on the screen) and calls the recognize routine 1090 to recognize further speech commands. After further speech commands are recognized, My Button determines 1092 its next action based on the length of the command string. If the command string length is less than zero, the voice command was a Voice Control internal command, and the mouse button is released by calling End Button 1106. If the command string length is greater than zero, a command was recognized; the command is queued onto the voice queue 1094, and the voice queue is checked for further commands 1096. If nothing was recognized (a command string length of zero), My Button skips directly to checking the voice queue 1096. If there is nothing in the voice queue, My Button returns 1104. If there is a command in the voice queue, My Button checks 1098 whether it is a mouse movement command (which would cause a mouse drag). If it is not, the mouse button is released by calling End Button 1106. If it is, the command is executed 1100 (which drags the mouse), the cursor is displayed 1102, and My Button returns. [0170]
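  • The trap replacement in FIG. 23 follows a save, patch, and restore pattern. My Button's decision depends only on the command length and the first queued command. The following sketch models both. `TrapTable`, the `recognize` callable, and the list of mouse-movement prefixes are assumptions, and the Macintosh Plus Post Event patch is not modelled.

```python
from typing import Callable, Dict, List, Tuple

MOVE_PREFIXES = ("@MOVE", "@MOVI", "@MOVL")   # assumed movement commands


class TrapTable:
    """Toy stand-in for the OS trap dispatch table."""

    def __init__(self) -> None:
        self.traps: Dict[str, Callable[[], bool]] = {"Button": lambda: False}
        self.saved: Dict[str, Callable[[], bool]] = {}

    def patch(self, name: str, fn: Callable[[], bool]) -> None:
        self.saved[name] = self.traps[name]
        self.traps[name] = fn

    def restore(self, name: str) -> None:
        self.traps[name] = self.saved.pop(name)


class HeldButton:
    """Sketch of Set Mouse Down, My Button and End Button.

    `recognize` is hypothetical: it returns (length, command string)
    using the >0 / 0 / <0 convention of the Recognize submodule.
    """

    def __init__(self, traps: TrapTable,
                 recognize: Callable[[], Tuple[int, str]]) -> None:
        self.traps, self.recognize = traps, recognize
        self.queue: List[str] = []
        self.dragged: List[str] = []
        self.mb_down = True
        traps.patch("Button", self.my_button)      # Set Mouse Down

    def end_button(self) -> bool:
        self.mb_down = False
        self.traps.restore("Button")               # original trap back
        return False

    def my_button(self) -> bool:
        """Return the button state the application should see."""
        if not self.mb_down:
            return self.end_button()
        length, command = self.recognize()
        if length < 0:                             # internal command
            return self.end_button()
        if length > 0:
            self.queue.append(command)
        if not self.queue:
            return True                            # keep holding
        if self.queue[0].startswith(MOVE_PREFIXES):
            self.dragged.append(self.queue.pop(0))  # drag the mouse
            return True
        return self.end_button()                   # other command releases


traps = TrapTable()
script = iter([(0, ""), (10, "@MOVE(0,5)"), (5, "@ZOOM")])
held = HeldButton(traps, lambda: next(script))
states = [traps.traps["Button"]() for _ in range(4)]
print(states)
```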
  • Screen Displays
  • Referring to FIG. 24, a screen display of a record actions session is shown. The user is recording a local mouse click 1106, and the click is being acknowledged in the action list 1108 and in the action window 1110. [0171]
  • Referring to FIG. 25, a record actions session using dialog boxes is shown. The dialog boxes 1112 for recording a manual printer feed are displayed to the user, along with the Voice Control Run Modal dialog box 1114 prompting the user to record the dialogs. The user is preparing to record a click on the Manual Feed button 1116. [0172]
  • Referring to FIG. 26, the Language Maker menu 1118 is shown. [0173]
  • Referring to FIG. 27, the user has requested the current language, which Voice Control displays in a pop-up display 1120. [0174]
  • Referring to FIG. 28, the user has clicked on the utterance name “apple” 1122, requesting a retraining of the utterance for “apple”. Voice Control has responded with a dialog box 1124 asking the user to say “apple” twice into the microphone. [0175]
  • Referring to FIG. 29, the text format of a Write Production output file 1126 (to be compiled by VOCAL) and the corresponding Language Maker display for the file 1128 are shown. As FIG. 29 illustrates, the Language Maker display is far more intuitive than the raw text format. [0176]
  • Referring to FIG. 30, a listing of the Write Production output file as displayed in FIG. 29 is provided. [0177]
  • Other Embodiments
  • Other embodiments of the invention are within the scope of the claims which follow the appendices. For example, the graphic user interface controlled by a voice recognition system could be other than that of the Apple Macintosh computer. The recognizer could be other than that marketed by Dragon Systems. [0178]
  • Included in the Appendices are Appendix A, which sets forth the Voice Control command language syntax; Appendix B, which lists some of the Macintosh OS globals used by the Voice Navigator system; Appendix C, which is a fiche of the Voice Navigator executable code; Appendix D, which is the Developer's Reference Manual for the Voice Navigator system; and Appendix E, which is the Voice Navigator User's Manual, all incorporated by reference herein. [0179]
  • A portion of the disclosure of this patent document contains material which is subject to copyright protection (for example, the microfiche Appendix, the User's Manual, and the Reference Manual). The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. [0180]
  • Appendix A: Voice Control Command Language Syntax
  • Menu Command—@MENU(menuname,itemnum). [0181]
  • Finds item named itemnum in the menu named menuname and selects it. If itemnum is 0, hold the menu down. [0182]
  • Control Command—@CTRL(ctlname) [0183]
  • Finds the control named ctlname and clicks in its rectangle. [0184]
  • Key Pad Command—@KYPD(n), where n=0-9, −, +, *, /, =, and c for clear [0185]
  • Posts a Keydown for keys on the numeric keypad. [0186]
  • Zoom Command—@ZOOM [0187]
  • Clicks in the zoom box of the front window. [0188]
  • Local Mouse Click Command—@LMSE(y,x) [0189]
  • Clicks at local coordinates (y,x) of the front window. [0190]
  • Global Mouse Click Command—@GMSE(y,x) [0191]
  • Clicks at the global coordinates (y,x) of the current screen. [0192]
  • Double Click Command—@DCLK(y,x) [0193]
  • Double clicks at the global coordinates (y,x) of the current screen. If y=x=0, double clicks at the current mouse location. [0194]
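The three click commands differ only in how the target point is resolved. The sketch below is a minimal illustration of that resolution step. It assumes the front window's local origin corresponds to a known global point, which on the Macintosh is what QuickDraw's LocalToGlobal accounts for. The function name and parameters are hypothetical and are not taken from the Voice Navigator code.

```python
from typing import Tuple

Point = Tuple[int, int]  # (y, x): vertical first, as in Appendix A


def resolve_click(mnemonic: str, y: int, x: int,
                  window_origin: Point, mouse: Point) -> Point:
    """Return the global (y, x) point at which a click command lands.

    window_origin: global coordinates of the front window's local (0, 0)
    mouse: current global mouse location
    """
    if mnemonic == "LMSE":
        # Local coordinates: shift by the front window's origin.
        return (y + window_origin[0], x + window_origin[1])
    if mnemonic == "GMSE":
        # Global coordinates: use as given.
        return (y, x)
    if mnemonic == "DCLK":
        # (0, 0) is a sentinel meaning "double click where the mouse already is".
        return mouse if (y, x) == (0, 0) else (y, x)
    raise ValueError(f"not a click command: {mnemonic}")


# Usage: a local click at (10, 20) in a window whose origin is at (40, 100).
print(resolve_click("LMSE", 10, 20, (40, 100), (0, 0)))  # (50, 120)
```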
  • Mouse Down Command—@MSDN [0195]
  • Set the mouse button state to down and set up traps to keep it down. [0196]
  • Mouse Up Command—@MSUP [0197]
  • Set the mouse button state to up. [0198]
  • Scroll Down Command—@SCDN [0199]
  • Post a mouse down in the down arrow portion of the front window's scroll bar. [0200]
  • Scroll Up Command—@SCUP [0201]
  • Post a mouse down in the up arrow portion of the front window's scroll bar. [0202]
  • Scroll Left Command—@SCLF [0203]
  • Post a mouse down in the left arrow portion of the front window's scroll bar. [0204]
  • Scroll Right Command—@SCRT [0205]
  • Post a mouse down in the right arrow portion of the front window's scroll bar. [0206]
  • Page Down Command—@PGDN [0207]
  • Click in the page down portion of the front window's scroll bar. [0208]
  • Page Up Command—@PGUP [0209]
  • Click in the page up portion of the front window's scroll bar. [0210]
  • Page Left Command—@PGLF [0211]
  • Click in the page left portion of the front window's scroll bar. [0212]
  • Page Right Command—@PGRT [0213]
  • Click in the page right portion of the front window's scroll bar. [0214]
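The eight scroll commands reduce to a choice of scroll bar and a part of that bar. The arrow commands post a mouse down; the page commands click. The table below sketches that mapping using the Control Manager part codes documented in Inside Macintosh. It assumes the usual Macintosh convention that in a horizontal scroll bar the "up" parts are on the left and the "down" parts on the right. The Scroll Left mnemonic is taken to be @SCLF, by analogy with @SCRT and @PGLF; this is an assumption.

```python
from typing import Dict, Tuple

# Control Manager part codes (Inside Macintosh).
IN_UP_BUTTON = 20
IN_DOWN_BUTTON = 21
IN_PAGE_UP = 22
IN_PAGE_DOWN = 23

# mnemonic -> (scroll bar of the front window, part code, how the part is hit)
# "press" = post a mouse down (arrow commands); "click" = a full click (page commands).
# In a horizontal bar the "up" parts sit on the left and the "down" parts on the right.
SCROLL_TARGETS: Dict[str, Tuple[str, int, str]] = {
    "SCUP": ("vertical", IN_UP_BUTTON, "press"),
    "SCDN": ("vertical", IN_DOWN_BUTTON, "press"),
    "SCLF": ("horizontal", IN_UP_BUTTON, "press"),  # assumed mnemonic
    "SCRT": ("horizontal", IN_DOWN_BUTTON, "press"),
    "PGUP": ("vertical", IN_PAGE_UP, "click"),
    "PGDN": ("vertical", IN_PAGE_DOWN, "click"),
    "PGLF": ("horizontal", IN_PAGE_UP, "click"),
    "PGRT": ("horizontal", IN_PAGE_DOWN, "click"),
}

print(SCROLL_TARGETS["PGRT"])  # ('horizontal', 23, 'click')
```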
  • Move Command—@MOVE(δy,δx) [0215]
  • Move the mouse from its current location (y,x) to a new location (y+δy,x+δx), where δy and δx are in pixels and can be either positive or negative. [0216]
  • Move Continuous Command—@MOVI(δy,δx) [0217]
  • Move the mouse continuously from its present location, moving it by (δy,δx) on every screen refresh. [0218]
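@MOVE applies the offset (δy,δx) once; @MOVI applies it on every refresh. The generator below is a minimal sketch that treats each refresh as one step. On the Macintosh that step is a VBL interrupt, per the MTemp entry in Appendix B. The appendix does not say how continuous motion is stopped, so the generator is unbounded and the caller decides when to stop. Screen-edge clamping is deliberately not modeled.

```python
from itertools import islice
from typing import Iterator, Tuple


def move_continuous(start: Tuple[int, int], dy: int, dx: int) -> Iterator[Tuple[int, int]]:
    """Yield the mouse position after each screen refresh while @MOVI(dy,dx) is active."""
    y, x = start
    while True:
        y += dy
        x += dx
        yield (y, x)


# @MOVE(dy,dx) is the single-step case: one offset, applied once.
print(next(move_continuous((100, 100), -5, 10)))  # (95, 110)
# Three refreshes of @MOVI(2,-1):
print(list(islice(move_continuous((100, 100), 2, -1), 3)))  # [(102, 99), (104, 98), (106, 97)]
```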
  • Move to Local Coordinate Command—@MOVL(y,x<,windowname>) or [0219]
  • @MOVL(n<,y,x<,windowname>>), where n=N,S,E,W,NE,SE,SW,NW,C,G [0220]
  • Move the cursor to the local coordinates given by (y,x) or by (n.v+y,n.h+x), where n.v and n.h are the vertical and horizontal coordinates of reference point n. Use the grafPort of the window named “windowname”; if no windowname is given, use the grafPort of the front window. [0221]
  • Move to Global Coordinate Command—@MOVG(n,<y,x>) [0222]
  • where n=N,S,E,W,NE,SE,SW,NW,C,G [0223]
  • Move the cursor to the global coordinates given by (y,x) or by (n.v+y,n.h+x). Use the grafPort of the screen. [0224]
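The reference-point forms of @MOVL and @MOVG add an offset (y,x) to a named point n on a rectangle. That rectangle is the window's port rectangle for @MOVL and the screen for @MOVG. Appendix A does not say where each compass point lies. This sketch therefore assumes that N, S, E and W are edge midpoints, that the diagonals are corners and that C is the center, with a missing offset defaulting to 0. "G" is listed in the appendix but not defined, so it is rejected here. All names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """QuickDraw-style rectangle: top, left, bottom, right."""
    top: int
    left: int
    bottom: int
    right: int


def reference_point(rect: Rect, n: str) -> Tuple[int, int]:
    """Return (v, h) of compass point n on rect, under the assumed geometry."""
    mid_v = (rect.top + rect.bottom) // 2
    mid_h = (rect.left + rect.right) // 2
    points = {
        "N": (rect.top, mid_h), "S": (rect.bottom, mid_h),
        "E": (mid_v, rect.right), "W": (mid_v, rect.left),
        "NE": (rect.top, rect.right), "SE": (rect.bottom, rect.right),
        "SW": (rect.bottom, rect.left), "NW": (rect.top, rect.left),
        "C": (mid_v, mid_h),
    }
    if n not in points:
        # "G" is listed in Appendix A but not defined there, so it is not modeled.
        raise ValueError(f"unsupported reference point: {n}")
    return points[n]


def move_target(rect: Rect, n: str, y: int = 0, x: int = 0) -> Tuple[int, int]:
    """Target of the reference-point form: (n.v + y, n.h + x)."""
    v, h = reference_point(rect, n)
    return (v + y, h + x)


# Usage: 5 pixels up and left of the bottom-right corner of a 200 x 300 rect.
print(move_target(Rect(0, 0, 200, 300), "SE", -5, -5))  # (195, 295)
```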
  • Option Key Down Command—@OPTD [0225]
  • Press (and hold) the option key. [0226]
  • Option Key Up Command—@OPTU [0227]
  • Release the option key. [0228]
  • Shift Key Down Command—@SHFD [0229]
  • Press (and hold) the shift key. [0230]
  • Shift Key Up Command—@SHFU [0231]
  • Release the shift key. [0232]
  • Command Key Down Command—@CMDD [0233]
  • Press (and hold) the command key. [0234]
  • Command Key Up Command—@CMDU [0235]
  • Release the command key. [0236]
  • Control Key Down Command—@CTLD [0237]
  • Press (and hold) the control key. [0238]
  • Control Key Up Command—@CTLU [0239]
  • Release the control key. [0240]
  • Next Window Command—@NEXT [0241]
  • Sends the front window to the back. [0242]
  • Erase Command—@ERAS [0243]
  • Erase the last numChars characters typed. [0244]
  • Capitalize Command—@CAPS [0245]
  • Capitalize the next letter typed. [0246]
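@ERAS and @CAPS act on the stream of text that Voice Control types into the application. The appendix does not define where numChars comes from, so the sketch below treats it as a hypothetical constructor setting. It also assumes that "the next letter" means the next alphabetic character, with spaces and punctuation passing through without consuming the pending capitalization. The class and method names are illustrative only.

```python
class TextSink:
    """Hypothetical model of the text Voice Control types into an application."""

    def __init__(self, num_chars: int = 1) -> None:
        self.text = ""
        self.num_chars = num_chars  # hypothetical stand-in for numChars
        self.caps_pending = False

    def type_text(self, s: str) -> None:
        for ch in s:
            if self.caps_pending and ch.isalpha():
                ch = ch.upper()          # @CAPS applies to the next letter only
                self.caps_pending = False
            self.text += ch

    def caps(self) -> None:             # @CAPS
        self.caps_pending = True

    def eras(self) -> None:             # @ERAS
        if self.num_chars > 0:
            self.text = self.text[:-self.num_chars]


sink = TextSink(num_chars=3)
sink.caps()
sink.type_text(" hello")
print(repr(sink.text))  # ' Hello'
sink.eras()
print(repr(sink.text))  # ' He'
```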
  • Launch Command—@LAUN(application_name) [0247]
  • Launch the application named application_name. The application must reside on the boot drive, no more than one folder level deep. [0248]
  • Wait Command—@WAIT(nnn) [0249]
  • Wait for nnn ticks (a tick is 1/60 second) to elapse before taking any further recognition action. [0250]
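Every command in Appendix A has the same surface form: "@", a four-letter uppercase mnemonic, and an optional parenthesized, comma-separated argument list. The angle brackets in the syntax lines mark optional arguments and are not literal text. The parser below is a minimal sketch under those assumptions. It leaves arguments as raw strings and would mis-split a name that itself contains a comma. The @WAIT helper converts ticks at 60 per second.

```python
import re
from typing import List, NamedTuple


class Command(NamedTuple):
    name: str        # four-letter mnemonic, e.g. "MENU"
    args: List[str]  # raw argument strings, not yet type-converted


_COMMAND = re.compile(r"@([A-Z]{4})(?:\((.*)\))?")
TICKS_PER_SECOND = 60  # one Macintosh tick is about 1/60 second


def parse_command(text: str) -> Command:
    m = _COMMAND.fullmatch(text.strip())
    if not m:
        raise ValueError(f"not a Voice Control command: {text!r}")
    name, arg_text = m.group(1), m.group(2)
    args = [a.strip() for a in arg_text.split(",")] if arg_text else []
    return Command(name, args)


def wait_seconds(cmd: Command) -> float:
    """Duration of an @WAIT(nnn) command in seconds."""
    if cmd.name != "WAIT" or len(cmd.args) != 1:
        raise ValueError("expected @WAIT(nnn)")
    return int(cmd.args[0]) / TICKS_PER_SECOND


print(parse_command("@MENU(File,Open)"))       # Command(name='MENU', args=['File', 'Open'])
print(wait_seconds(parse_command("@WAIT(120)")))  # 2.0
```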
  • Appendix B: Macintosh OS Globals
  • Interfacing to the Macintosh Operating System requires that Voice Control manage certain low-memory globals. The most important of these are described below; further information is available in “Inside Macintosh”, Vols. I-V. [0251]
  • Mouse Globals
  • MickeyBytes EQU $D6A—a pointer to the cursor value; used to control the acceleration of the mouse. Set to point to tablet whenever the mouse is moved more than 10 pixels. [pointer][0252]
  • MTemp EQU $828—a low-level interrupt mouse location; used to move the mouse during VBL handling while executing a @MOVI command. [long][0253]
  • Mouse EQU $830—the processed mouse coordinate; used to move the mouse for all other @MOVX commands. [long][0254]
  • MBState EQU $172—current mouse button state; used to set the mouse button down for @MSDN and for @MENU when itemnum=0. [byte][0255]
  • Keyboard Globals
  • KeyMap EQU $174—keyboard bit map, with one bit mapped to each key on the keyboard. Setting a key's bit to TRUE marks that key as down; this is how the meta keys (option, command, shift, control) are held down. [2 longs][0256]
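The modifier-key commands (@OPTD/@OPTU, @SHFD/@SHFU, @CMDD/@CMDU, @CTLD/@CTLU) amount to setting or clearing one KeyMap bit. The sketch below models the KeyMap as a plain integer in which bit k holds the state of key code k. It does not reproduce the real byte and bit ordering of the two-long global. The key codes used (0x37 command, 0x38 shift, 0x3A option, 0x3B control) are assumed standard Macintosh virtual key codes.

```python
# Assumed Macintosh key codes for the four modifier keys.
MODIFIER_KEYCODES = {"CMD": 0x37, "SHF": 0x38, "OPT": 0x3A, "CTL": 0x3B}


def apply_modifier_command(keymap: int, mnemonic: str) -> int:
    """Return the KeyMap after a modifier-key command such as "OPTD" or "OPTU".

    keymap is modeled as a plain integer in which bit k is the state of key code k.
    """
    if len(mnemonic) != 4 or mnemonic[:3] not in MODIFIER_KEYCODES:
        raise ValueError(f"not a modifier-key command: {mnemonic}")
    bit = 1 << MODIFIER_KEYCODES[mnemonic[:3]]
    if mnemonic[3] == "D":      # press and hold
        return keymap | bit
    if mnemonic[3] == "U":      # release
        return keymap & ~bit
    raise ValueError(f"not a modifier-key command: {mnemonic}")


km = apply_modifier_command(0, "SHFD")
print(km == 1 << 0x38)  # True
```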
  • Filter Globals
  • JGNEFilter EQU $29A—Get Next Event filter proc; set to Voice Control's main loop to intercept calls to Get Next Event. [pointer][0257]
  • Event Queue Globals
  • evtMax EQU $1E—maximum number of events in the event queue. When this number is reached, stop posting events. [0258]
  • EventQueue EQU $14A—event queue header, the location of the Macintosh event queue. [10 bytes][0259]
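The evtMax rule means Voice Control posts synthetic events only while the queue has room. The class below is a minimal sketch of that back-pressure rule using a Python deque. The event representation is a simplified, hypothetical (kind, message) pair, not the real EvQEl record.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Event = Tuple[str, int]  # simplified: (event kind, message)


class EventQueue:
    """Bounded queue that refuses new events once evt_max are pending."""

    def __init__(self, evt_max: int) -> None:
        self.evt_max = evt_max
        self.events: Deque[Event] = deque()

    def post(self, event: Event) -> bool:
        """Append event and return True, or return False if the queue is full."""
        if len(self.events) >= self.evt_max:
            return False  # stop posting; the caller retries after events drain
        self.events.append(event)
        return True

    def get_next_event(self) -> Optional[Event]:
        """Remove and return the oldest pending event, or None if empty."""
        return self.events.popleft() if self.events else None
```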
  • Time Globals
  • Ticks EQU $16A—Tick count, time since boot. Used to measure elapsed time between Voice Control actions. [long][0260]
  • Cursor Globals
  • CrsrCouple EQU $8CF—cursor coupled to mouse? Used to disconnect cursor when doing remote clicks with @LMSE and @GMSE. [byte][0261]
  • CrsrNew EQU $8CE—Cursor changed? Force a new cursor after moving the cursor. [byte][0262]
  • Menu Globals
  • MenuList EQU $A1 Current menuBar list structure. This handle can be de-referenced to find all the menus associated with an application. Use for @MENU commands [handle][0263]
  • Window Globals
  • WindowList EQU $9D6—Z-ordered linked list of windows. This pointer will lead to a chain of all existing windows for an application. Use to find a window queue for all local commands. [pointer][0264]
  • Window Offsets
  • These values are offsets within the window records that describe characteristics of the window. Once a window is located, these offsets are used to calculate: [0265]
  • thePort EQU 0—GrafPtr; local coordinates for @LMSE and @MOVL commands. [0266]
  • portRect EQU $10—port's rectangle [rect]; window-relative forms of the @MOVL command. [0267]
  • controlList EQU 140—used to find the controls associated with a window. [0268]
  • contrlTitle EQU 40—offset within a control record; used to compare control titles for @CTRL commands. [0269]
  • contrlRect EQU 8—offset within a control record; used to calculate the click locations in a control.
  • nextWindow EQU 144—used to locate the next window for the @NEXT command. [0270]
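Once a window is found, its fields are read at fixed offsets. Offsets written with "$" are hexadecimal and the rest are decimal. Rectangles are four 16-bit integers and pointers are 32-bit, all big-endian on the 68000. The sketch below reads portRect, controlList and nextWindow from a byte buffer with Python's struct module. It then walks a window chain the way the WindowList global is followed until a NIL (0) pointer. The "memory" dictionary and the zero-filled records are synthetic stand-ins, not real Macintosh structures.

```python
import struct
from typing import Dict, List, Tuple

# Offsets from Appendix B ("$" values are hex, the rest decimal).
PORT_RECT = 0x10     # portRect: top, left, bottom, right as big-endian 16-bit ints
CONTROL_LIST = 140   # controlList: 32-bit handle to the window's first control
NEXT_WINDOW = 144    # nextWindow: 32-bit pointer to the next window (0 = NIL)
RECORD_SIZE = 148    # synthetic size, just large enough for the fields read here


def read_port_rect(record: bytes) -> Tuple[int, int, int, int]:
    return struct.unpack_from(">4h", record, PORT_RECT)


def read_control_list(record: bytes) -> int:
    return struct.unpack_from(">I", record, CONTROL_LIST)[0]


def read_next_window(record: bytes) -> int:
    return struct.unpack_from(">I", record, NEXT_WINDOW)[0]


def make_window_record(rect: Tuple[int, int, int, int],
                       control_list: int, next_window: int) -> bytes:
    """Build a synthetic, zero-filled window record (not a real WindowRecord)."""
    buf = bytearray(RECORD_SIZE)
    struct.pack_into(">4h", buf, PORT_RECT, *rect)
    struct.pack_into(">I", buf, CONTROL_LIST, control_list)
    struct.pack_into(">I", buf, NEXT_WINDOW, next_window)
    return bytes(buf)


def walk_window_list(memory: Dict[int, bytes], window_list: int) -> List[int]:
    """Follow nextWindow links from the WindowList head until a NIL pointer."""
    windows = []
    ptr = window_list
    while ptr != 0:
        windows.append(ptr)
        ptr = read_next_window(memory[ptr])
    return windows


memory = {
    0x1000: make_window_record((40, 2, 340, 502), 0x5000, 0x2000),
    0x2000: make_window_record((60, 20, 200, 300), 0, 0),
}
print([hex(w) for w in walk_window_list(memory, 0x1000)])  # ['0x1000', '0x2000']
```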

Claims (4)

1. A system for enabling voiced utterances to be substituted for manipulation of a pointing device, the pointing device being of the kind which is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display, the cursor being moved and the desired actions being aided by an operating system in the computer in response to control signals received from the pointing device, the computer also having an alphanumeric keyboard, the operating system being separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard, the system comprising
a voice recognizer for recognizing a voiced utterance, and
an interpreter for converting the voiced utterance into control signals which will directly create a desired action aided by the operating system in the computer without first being converted into control signals expressed in the predetermined format specific to the keyboard.
2. A method for converting voiced utterances to commands, expressed in a predefined command language, to be used by an operating system of a computer, comprising
converting some voiced utterances into commands corresponding to actions to be taken by said operating system, and
converting other voiced utterances into commands which carry associated text strings to be used as part of text being processed in an application program running under said operating system.
3. A method of generating a table for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, said application program including menus and control buttons, said method comprising
parsing the instruction sequence of the application program to identify menu entries and control buttons, and
including in said table an entry for each menu entry and control button found in said application program, each said entry containing a control command corresponding to said menu entry or control button.
4. A method of enabling a user to create an instance in a formal language of the kind which has a strictly defined syntax, comprising
providing a graphically displayed list of entries which are expressed in a natural language and which do not comply with said syntax,
permitting the user to point to an entry on said list, and
automatically generating said instance corresponding to the identified entry in the list in response to said pointing.
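Claims 2 and 3 together describe a table with one entry per menu entry and control button, each holding a control command. Utterances found in the table become commands; others become commands that carry text. The sketch below is a minimal illustration under stated assumptions:

- The menu and control names have already been extracted, so the parsing step of claim 3 is not shown.
- The spoken phrase for an entry is simply its lowercased name.
- Unmatched utterances are passed through as text.

If a menu item and a control share a name, the control entry wins. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_command_table(menus: Dict[str, List[str]], controls: List[str]) -> Dict[str, str]:
    """One entry per menu item and control button, holding an Appendix A command."""
    table: Dict[str, str] = {}
    for menu, items in menus.items():
        for item in items:
            table[item.lower()] = f"@MENU({menu},{item})"
    for ctl in controls:
        table[ctl.lower()] = f"@CTRL({ctl})"  # overwrites a same-named menu item
    return table


def interpret(utterance: str, table: Dict[str, str]) -> Tuple[str, str]:
    """Return ("command", cmd) for a table hit, else ("text", utterance)."""
    key = utterance.lower()
    if key in table:
        return ("command", table[key])
    return ("text", utterance)


table = build_command_table({"File": ["Open", "Save"]}, ["OK"])
print(interpret("Save", table))         # ('command', '@MENU(File,Save)')
print(interpret("hello world", table))  # ('text', 'hello world')
```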
US09/852,049 1989-06-23 2001-05-09 Voice controlled computer interface Abandoned US20020128843A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/852,049 US20020128843A1 (en) 1989-06-23 2001-05-09 Voice controlled computer interface

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US37077989A 1989-06-23 1989-06-23
US97343592A 1992-11-09 1992-11-09
US08/165,014 US5377303A (en) 1989-06-23 1993-12-09 Controlled computer interface
US20088694A 1994-02-23 1994-02-23
US45077695A 1995-05-25 1995-05-25
US67434196A 1996-07-02 1996-07-02
US97690897A 1997-11-24 1997-11-24
US09/852,049 US20020128843A1 (en) 1989-06-23 2001-05-09 Voice controlled computer interface

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US97690897A Continuation 1989-06-23 1997-11-24

Publications (1)

Publication Number Publication Date
US20020128843A1 true US20020128843A1 (en) 2002-09-12

Family

ID=23461140

Family Applications (4)

Application Number Title Priority Date Filing Date
US08/165,014 Expired - Lifetime US5377303A (en) 1989-06-23 1993-12-09 Controlled computer interface
US09/783,725 Abandoned US20020010582A1 (en) 1989-06-23 2001-02-14 Voice controlled computer interface
US09/852,049 Abandoned US20020128843A1 (en) 1989-06-23 2001-05-09 Voice controlled computer interface
US10/102,047 Abandoned US20020178009A1 (en) 1989-06-23 2002-03-20 Voice controlled computer interface

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US08/165,014 Expired - Lifetime US5377303A (en) 1989-06-23 1993-12-09 Controlled computer interface
US09/783,725 Abandoned US20020010582A1 (en) 1989-06-23 2001-02-14 Voice controlled computer interface

Family Applications After (1)

Application Number Title Priority Date Filing Date
US10/102,047 Abandoned US20020178009A1 (en) 1989-06-23 2002-03-20 Voice controlled computer interface

Country Status (2)

Country Link
US (4) US5377303A (en)
JP (1) JPH03163623A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050154588A1 (en) * 2001-12-12 2005-07-14 Janas John J.Iii Speech recognition and control in a process support system
US20110184730A1 (en) * 2010-01-22 2011-07-28 Google Inc. Multi-dimensional disambiguation of voice commands
US20120150546A1 (en) * 2010-12-13 2012-06-14 Hon Hai Precision Industry Co., Ltd. Application starting system and method
US20130054247A1 (en) * 2011-08-31 2013-02-28 International Business Machines Corporation Facilitating tangible interactions in voice applications
US9317605B1 (en) 2012-03-21 2016-04-19 Google Inc. Presenting forked auto-completions
US9646606B2 (en) 2013-07-03 2017-05-09 Google Inc. Speech recognition using domain knowledge
CN115509627A (en) * 2022-11-22 2022-12-23 威海海洋职业学院 Electronic equipment awakening method and system based on artificial intelligence

Families Citing this family (216)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03163623A (en) * 1989-06-23 1991-07-15 Articulate Syst Inc Voice control computor interface
US5850627A (en) * 1992-11-13 1998-12-15 Dragon Systems, Inc. Apparatuses and methods for training and operating speech recognition systems
US6092043A (en) * 1992-11-13 2000-07-18 Dragon Systems, Inc. Apparatuses and method for training and operating speech recognition systems
US5890122A (en) * 1993-02-08 1999-03-30 Microsoft Corporation Voice-controlled computer simulateously displaying application menu and list of available commands
JP3530591B2 (en) * 1994-09-14 2004-05-24 キヤノン株式会社 Speech recognition apparatus, information processing apparatus using the same, and methods thereof
EP0747807B1 (en) * 1995-04-11 2002-03-06 Dragon Systems Inc. Moving an element shown on a computer display
US5761641A (en) * 1995-07-31 1998-06-02 Microsoft Corporation Method and system for creating voice commands for inserting previously entered information
US5903864A (en) * 1995-08-30 1999-05-11 Dragon Systems Speech recognition
US5903870A (en) * 1995-09-18 1999-05-11 Vis Tell, Inc. Voice recognition and display device apparatus and method
US5799279A (en) * 1995-11-13 1998-08-25 Dragon Systems, Inc. Continuous speech recognition of text and commands
US6064959A (en) * 1997-03-28 2000-05-16 Dragon Systems, Inc. Error correction in speech recognition
US5794189A (en) * 1995-11-13 1998-08-11 Dragon Systems, Inc. Continuous speech recognition
US6601027B1 (en) 1995-11-13 2003-07-29 Scansoft, Inc. Position manipulation in speech recognition
US5920841A (en) * 1996-07-01 1999-07-06 International Business Machines Corporation Speech supported navigation of a pointer in a graphical user interface
US5873064A (en) * 1996-11-08 1999-02-16 International Business Machines Corporation Multi-action voice macro method
US5930757A (en) * 1996-11-21 1999-07-27 Freeman; Michael J. Interactive two-way conversational apparatus with voice recognition
US6108515A (en) * 1996-11-21 2000-08-22 Freeman; Michael J. Interactive responsive apparatus with visual indicia, command codes, and comprehensive memory functions
KR100288976B1 (en) * 1997-01-08 2001-05-02 윤종용 Method for constructing and recognizing menu commands of television receiver
US5924068A (en) * 1997-02-04 1999-07-13 Matsushita Electric Industrial Co. Ltd. Electronic news reception apparatus that selectively retains sections and searches by keyword or index for text to speech conversion
US5909667A (en) * 1997-03-05 1999-06-01 International Business Machines Corporation Method and apparatus for fast voice selection of error words in dictated text
US5893063A (en) * 1997-03-10 1999-04-06 International Business Machines Corporation Data processing system and method for dynamically accessing an application using a voice command
US5897618A (en) * 1997-03-10 1999-04-27 International Business Machines Corporation Data processing system and method for switching between programs having a same title using a voice command
US5884265A (en) * 1997-03-27 1999-03-16 International Business Machines Corporation Method and system for selective display of voice activated commands dialog box
US6212498B1 (en) 1997-03-28 2001-04-03 Dragon Systems, Inc. Enrollment in speech recognition
US5966691A (en) * 1997-04-29 1999-10-12 Matsushita Electric Industrial Co., Ltd. Message assembler using pseudo randomly chosen words in finite state slots
US6038534A (en) * 1997-09-11 2000-03-14 Cowboy Software, Inc. Mimicking voice commands as keyboard signals
ATE254327T1 (en) * 1997-12-30 2003-11-15 Koninkl Philips Electronics Nv VOICE RECOGNITION APPARATUS USING A COMMAND LEXICO
US6438523B1 (en) 1998-05-20 2002-08-20 John A. Oberteuffer Processing handwritten and hand-drawn input and speech input
US6195635B1 (en) 1998-08-13 2001-02-27 Dragon Systems, Inc. User-cued speech recognition
US6243076B1 (en) 1998-09-01 2001-06-05 Synthetic Environments, Inc. System and method for controlling host system interface with point-of-interest data
US6514201B1 (en) 1999-01-29 2003-02-04 Acuson Corporation Voice-enhanced diagnostic medical ultrasound system and review station
US6487530B1 (en) * 1999-03-30 2002-11-26 Nortel Networks Limited Method for recognizing non-standard and standard speech by speaker independent and speaker dependent word models
US6330540B1 (en) 1999-05-27 2001-12-11 Louis Dischler Hand-held computer device having mirror with negative curvature and voice recognition
AU7769400A (en) * 1999-08-13 2001-03-13 Genologic Gmbh Device for converting spoken commands and/or spoken texts into keyboard and/or mouse movements and/or texts
US20010043234A1 (en) * 2000-01-03 2001-11-22 Mallik Kotamarti Incorporating non-native user interface mechanisms into a user interface
US8645137B2 (en) 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
US7109970B1 (en) 2000-07-01 2006-09-19 Miller Stephen S Apparatus for remotely controlling computers and other electronic appliances/devices using a combination of voice commands and finger movements
US7035805B1 (en) * 2000-07-14 2006-04-25 Miller Stephen S Switching the modes of operation for voice-recognition applications
US6836759B1 (en) * 2000-08-22 2004-12-28 Microsoft Corporation Method and system of handling the selection of alternates for recognized words
US7120646B2 (en) * 2001-04-09 2006-10-10 Health Language, Inc. Method and system for interfacing with a multi-level data structure
US7466992B1 (en) 2001-10-18 2008-12-16 Iwao Fujisaki Communication device
US7127271B1 (en) 2001-10-18 2006-10-24 Iwao Fujisaki Communication device
US7107081B1 (en) 2001-10-18 2006-09-12 Iwao Fujisaki Communication device
US6889191B2 (en) * 2001-12-03 2005-05-03 Scientific-Atlanta, Inc. Systems and methods for TV navigation with compressed voice-activated commands
US7996232B2 (en) * 2001-12-03 2011-08-09 Rodriguez Arturo A Recognition of voice-activated commands
US20040054538A1 (en) * 2002-01-03 2004-03-18 Peter Kotsinadelis My voice voice agent for use with voice portals and related products
KR20020023294A (en) * 2002-01-12 2002-03-28 (주)코리아리더스 테크놀러지 GUI Context based Command and Control Method with Speech recognition
JP2003241790A (en) * 2002-02-13 2003-08-29 Internatl Business Mach Corp <Ibm> Speech command processing system, computer device, speech command processing method, and program
US7548847B2 (en) * 2002-05-10 2009-06-16 Microsoft Corporation System for automatically annotating training data for a natural language understanding system
US20040107179A1 (en) * 2002-08-22 2004-06-03 Mdt, Inc. Method and system for controlling software execution in an event-driven operating system environment
US8856093B2 (en) * 2002-09-03 2014-10-07 William Gross Methods and systems for search indexing
US7496559B2 (en) * 2002-09-03 2009-02-24 X1 Technologies, Inc. Apparatus and methods for locating data
US8229512B1 (en) 2003-02-08 2012-07-24 Iwao Fujisaki Communication device
US8241128B1 (en) 2003-04-03 2012-08-14 Iwao Fujisaki Communication device
US20050027539A1 (en) * 2003-07-30 2005-02-03 Weber Dean C. Media center controller system and method
US8090402B1 (en) 2003-09-26 2012-01-03 Iwao Fujisaki Communication device
US7389235B2 (en) * 2003-09-30 2008-06-17 Motorola, Inc. Method and system for unified speech and graphic user interfaces
US20050083300A1 (en) * 2003-10-20 2005-04-21 Castle Daniel C. Pointer control system
US8121635B1 (en) 2003-11-22 2012-02-21 Iwao Fujisaki Communication device
US7945914B2 (en) * 2003-12-10 2011-05-17 X1 Technologies, Inc. Methods and systems for performing operations in response to detecting a computer idle condition
US20050204295A1 (en) * 2004-03-09 2005-09-15 Freedom Scientific, Inc. Low Vision Enhancement for Graphic User Interface
US8041348B1 (en) 2004-03-23 2011-10-18 Iwao Fujisaki Communication device
US20060044261A1 (en) * 2004-09-02 2006-03-02 Kao-Cheng Hsieh Pointing input device imitating inputting of hotkeys of a keyboard
US20060123220A1 (en) * 2004-12-02 2006-06-08 International Business Machines Corporation Speech recognition in BIOS
US8788271B2 (en) * 2004-12-22 2014-07-22 Sap Aktiengesellschaft Controlling user interfaces with contextual voice commands
US8208954B1 (en) 2005-04-08 2012-06-26 Iwao Fujisaki Communication device
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US8635073B2 (en) 2005-09-14 2014-01-21 At&T Intellectual Property I, L.P. Wireless multimodal voice browser for wireline-based IPTV services
US8229733B2 (en) * 2006-02-09 2012-07-24 John Harney Method and apparatus for linguistic independent parsing in a natural language systems
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US20080072234A1 (en) * 2006-09-20 2008-03-20 Gerald Myroup Method and apparatus for executing commands from a drawing/graphics editor using task interaction pattern recognition
US8886540B2 (en) * 2007-03-07 2014-11-11 Vlingo Corporation Using speech recognition results based on an unstructured language model in a mobile communication facility application
US8635243B2 (en) * 2007-03-07 2014-01-21 Research In Motion Limited Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search mobile search application
US8949130B2 (en) * 2007-03-07 2015-02-03 Vlingo Corporation Internal and external speech recognition use with a mobile communication facility
US20110054899A1 (en) * 2007-03-07 2011-03-03 Phillips Michael S Command and control utilizing content information in a mobile voice-to-speech application
US20080221884A1 (en) 2007-03-07 2008-09-11 Cerra Joseph P Mobile environment speech processing facility
US8838457B2 (en) * 2007-03-07 2014-09-16 Vlingo Corporation Using results of unstructured language model based speech recognition to control a system-level function of a mobile communications facility
US8886545B2 (en) 2007-03-07 2014-11-11 Vlingo Corporation Dealing with switch latency in speech recognition
US8949266B2 (en) 2007-03-07 2015-02-03 Vlingo Corporation Multiple web-based content category searching in mobile search application
US10056077B2 (en) 2007-03-07 2018-08-21 Nuance Communications, Inc. Using speech recognition results based on an unstructured language model with a music system
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
TWI345218B (en) * 2007-04-20 2011-07-11 Asustek Comp Inc Portable computer with function for identiying speech and processing method thereof
US7890089B1 (en) 2007-05-03 2011-02-15 Iwao Fujisaki Communication device
US8676273B1 (en) 2007-08-24 2014-03-18 Iwao Fujisaki Communication device
US8165886B1 (en) 2007-10-04 2012-04-24 Great Northern Research LLC Speech interface system and method for control and interaction with applications on a computing system
US8595642B1 (en) 2007-10-04 2013-11-26 Great Northern Research, LLC Multiple shell multi faceted graphical user interface
US8639214B1 (en) 2007-10-26 2014-01-28 Iwao Fujisaki Communication device
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US8543157B1 (en) 2008-05-09 2013-09-24 Iwao Fujisaki Communication device which notifies its pin-point location or geographic area in accordance with user selection
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8849672B2 (en) * 2008-05-22 2014-09-30 Core Wireless Licensing S.A.R.L. System and method for excerpt creation by designating a text segment using speech
US8340726B1 (en) 2008-06-30 2012-12-25 Iwao Fujisaki Communication device
US8452307B1 (en) 2008-07-02 2013-05-28 Iwao Fujisaki Communication device
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10540976B2 (en) * 2009-06-05 2020-01-21 Apple Inc. Contextual voice commands
US10255566B2 (en) 2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
JP5463922B2 (en) * 2010-01-12 2014-04-09 株式会社デンソー In-vehicle machine
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US9043206B2 (en) 2010-04-26 2015-05-26 Cyberpulse, L.L.C. System and methods for matching an utterance to a template hierarchy
US8165878B2 (en) 2010-04-26 2012-04-24 Cyberpulse L.L.C. System and methods for matching an utterance to a template hierarchy
US8738377B2 (en) 2010-06-07 2014-05-27 Google Inc. Predicting and learning carrier phrases for speech input
US8660934B2 (en) * 2010-06-30 2014-02-25 Trading Technologies International, Inc. Order entry actions
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
KR101295711B1 (en) * 2011-02-15 2013-08-16 주식회사 팬택 Mobile communication terminal device and method for executing application with voice recognition
US9081550B2 (en) * 2011-02-18 2015-07-14 Nuance Communications, Inc. Adding speech capabilities to existing computer applications with complex graphical user interfaces
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US8994660B2 (en) 2011-08-29 2015-03-31 Apple Inc. Text correction processing
US8954334B2 (en) * 2011-10-15 2015-02-10 Zanavox Voice-activated pulser
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
CN103577072A (en) * 2012-07-26 2014-02-12 中兴通讯股份有限公司 Terminal voice assistant editing method and device
TW201409351A (en) * 2012-08-16 2014-03-01 Hon Hai Prec Ind Co Ltd Electronic device with voice control function and voice control method
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US8862476B2 (en) * 2012-11-16 2014-10-14 Zanavox Voice-activated signal generator
EP2954514B1 (en) 2013-02-07 2021-03-31 Apple Inc. Voice trigger for a digital assistant
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
WO2014144579A1 (en) 2013-03-15 2014-09-18 Apple Inc. System and method for updating an adaptive speech recognition model
AU2014233517B2 (en) 2013-03-15 2017-05-25 Apple Inc. Training an at least partial voice command system
US9659058B2 (en) 2013-03-22 2017-05-23 X1 Discovery, Inc. Methods and systems for federation of results from search indexing
US9880983B2 (en) 2013-06-04 2018-01-30 X1 Discovery, Inc. Methods and systems for uniquely identifying digital content for eDiscovery
WO2014197334A2 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
WO2014197336A1 (en) 2013-06-07 2014-12-11 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
WO2014200728A1 (en) 2013-06-09 2014-12-18 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
AU2014278595B2 (en) 2013-06-13 2017-04-06 Apple Inc. System and method for emergency calls initiated by voice command
KR101749009B1 (en) 2013-08-06 2017-06-19 애플 인크. Auto-activating smart responses based on activities from remote devices
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
CN105138110A (en) * 2014-05-29 2015-12-09 中兴通讯股份有限公司 Voice interaction method and voice interaction device
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10346550B1 (en) 2014-08-28 2019-07-09 X1 Discovery, Inc. Methods and systems for searching and indexing virtual environments
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US20160225369A1 (en) * 2015-01-30 2016-08-04 Google Technology Holdings LLC Dynamic inference of voice command for software operation from user manipulation of electronic device
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US20160328205A1 (en) * 2015-05-05 2016-11-10 Motorola Mobility Llc Method and Apparatus for Voice Operation of Mobile Applications Having Unnamed View Elements
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
DK179309B1 (en) 2016-06-09 2018-04-23 Apple Inc Intelligent automated assistant in a home environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10580405B1 (en) * 2016-12-27 2020-03-03 Amazon Technologies, Inc. Voice control of remote device
CN110546603A (en) * 2017-04-25 2019-12-06 Hewlett-Packard Development Company, L.P. Machine learning command interaction
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4726065A (en) * 1984-01-26 1988-02-16 Horst Froessl Image manipulation by speech signals
US4776016A (en) * 1985-11-21 1988-10-04 Position Orientation Systems, Inc. Voice control system
US4799144A (en) * 1984-10-12 1989-01-17 Alcatel Usa, Corp. Multi-function communication board for expanding the versatility of a computer
US5027406A (en) * 1988-12-06 1991-06-25 Dragon Systems, Inc. Method for interactive speech recognition and training
US5133011A (en) * 1990-12-26 1992-07-21 International Business Machines Corporation Method and apparatus for linear vocal control of cursor position
US5157384A (en) * 1989-04-28 1992-10-20 International Business Machines Corporation Advanced user interface
US5231670A (en) * 1987-06-01 1993-07-27 Kurzweil Applied Intelligence, Inc. Voice controlled system and method for generating text from a voice controlled input
US5377303A (en) * 1989-06-23 1994-12-27 Articulate Systems, Inc. Controlled computer interface
US6684188B1 (en) * 1996-02-02 2004-01-27 Geoffrey C Mitchell Method for production of medical records and other technical documents

Family Cites Families (31)

Publication number Priority date Publication date Assignee Title
US4144582A (en) * 1970-12-28 1979-03-13 Hyatt Gilbert P Voice signal processing system
US3928724A (en) * 1974-10-10 1975-12-23 Andersen Byram Kouma Murphy Lo Voice-actuated telephone directory-assistance system
US4462080A (en) * 1981-11-27 1984-07-24 Kearney & Trecker Corporation Voice actuated machine control
JPS58195957A (en) * 1982-05-11 1983-11-15 Casio Comput Co Ltd Program starting system by voice
US4627001A (en) * 1982-11-03 1986-12-02 Wang Laboratories, Inc. Editing voice data
US4688195A (en) * 1983-01-28 1987-08-18 Texas Instruments Incorporated Natural-language interface generating system
US4704696A (en) * 1984-01-26 1987-11-03 Texas Instruments Incorporated Method and apparatus for voice control of a computer
JPS60158498A (en) * 1984-01-27 1985-08-19 Ricoh Co., Ltd. Pattern collation system
US4811243A (en) * 1984-04-06 1989-03-07 Racine Marsh V Computer aided coordinate digitizing system
US4874177A (en) * 1984-05-30 1989-10-17 Girardin Ronald E Horse racing game
US4914704A (en) * 1984-10-30 1990-04-03 International Business Machines Corporation Text editor for speech input
US4785408A (en) * 1985-03-11 1988-11-15 AT&T Information Systems Inc. American Telephone and Telegraph Company Method and apparatus for generating computer-controlled interactive voice services
JPH0638055B2 (en) * 1985-09-17 1994-05-18 Tokyo Electric Co., Ltd. Multi-range load cell weighing method
US4783803A (en) * 1985-11-12 1988-11-08 Dragon Systems, Inc. Speech recognition apparatus and method
US4829576A (en) * 1986-10-21 1989-05-09 Dragon Systems, Inc. Voice recognition system
US4827520A (en) * 1987-01-16 1989-05-02 Prince Corporation Voice actuated control system for use in a vehicle
GB8702910D0 (en) * 1987-02-10 1987-03-18 British Telecomm Multi-user speech recognition system
JP2815579B2 (en) * 1987-03-10 1998-10-27 Fujitsu Ltd. Word candidate reduction device in speech recognition
JP2558682B2 (en) * 1987-03-13 1996-11-27 Toshiba Corporation Intellectual work station
US5022081A (en) * 1987-10-01 1991-06-04 Sharp Kabushiki Kaisha Information recognition system
US4821211A (en) * 1987-11-19 1989-04-11 International Business Machines Corp. Method of navigating among program menus using a graphical menu tree
US4984177A (en) * 1988-02-05 1991-01-08 Advanced Products And Technologies, Inc. Voice language translator
US5054082A (en) * 1988-06-30 1991-10-01 Motorola, Inc. Method and apparatus for programming devices to recognize voice commands
US5208745A (en) * 1988-07-25 1993-05-04 Electric Power Research Institute Multimedia interface and method for computer system
US4931950A (en) * 1988-07-25 1990-06-05 Electric Power Research Institute Multimedia interface and method for computer system
US4949382A (en) * 1988-10-05 1990-08-14 Griggs Talkwriter Corporation Speech-controlled phonetic typewriter or display device having circuitry for analyzing fast and slow speech
JP2841404B2 (en) * 1989-01-12 1998-12-24 NEC Corporation Continuous speech recognition device
US5036538A (en) * 1989-11-22 1991-07-30 Telephonics Corporation Multi-station voice recognition and processing system
US5386494A (en) * 1991-12-06 1995-01-31 Apple Computer, Inc. Method and apparatus for controlling a speech recognition function using a cursor control device
US5864819A (en) * 1996-11-08 1999-01-26 International Business Machines Corporation Internal window object tree method for representing graphical user interface applications for speech navigation
US6038534A (en) * 1997-09-11 2000-03-14 Cowboy Software, Inc. Mimicking voice commands as keyboard signals

Cited By (10)

Publication number Priority date Publication date Assignee Title
US20050154588A1 (en) * 2001-12-12 2005-07-14 Janas John J.Iii Speech recognition and control in a process support system
US20110184730A1 (en) * 2010-01-22 2011-07-28 Google Inc. Multi-dimensional disambiguation of voice commands
US8626511B2 (en) * 2010-01-22 2014-01-07 Google Inc. Multi-dimensional disambiguation of voice commands
US20120150546A1 (en) * 2010-12-13 2012-06-14 Hon Hai Precision Industry Co., Ltd. Application starting system and method
US20130054247A1 (en) * 2011-08-31 2013-02-28 International Business Machines Corporation Facilitating tangible interactions in voice applications
US8831955B2 (en) * 2011-08-31 2014-09-09 International Business Machines Corporation Facilitating tangible interactions in voice applications
US9317605B1 (en) 2012-03-21 2016-04-19 Google Inc. Presenting forked auto-completions
US10210242B1 (en) 2012-03-21 2019-02-19 Google Llc Presenting forked auto-completions
US9646606B2 (en) 2013-07-03 2017-05-09 Google Inc. Speech recognition using domain knowledge
CN115509627A (en) * 2022-11-22 2022-12-23 Weihai Ocean Vocational College Electronic equipment awakening method and system based on artificial intelligence

Also Published As

Publication number Publication date
JPH03163623A (en) 1991-07-15
US20020010582A1 (en) 2002-01-24
US20020178009A1 (en) 2002-11-28
US5377303A (en) 1994-12-27

Similar Documents

Publication Publication Date Title
US5377303A (en) Controlled computer interface
US5748191A (en) Method and system for creating voice commands using an automatically maintained log interactions performed by a user
US6308157B1 (en) Method and apparatus for providing an event-based “What-Can-I-Say?” window
US6212541B1 (en) System and method for switching between software applications in multi-window operating system
CA2115210C (en) Interactive computer system recognizing spoken commands
US7024363B1 (en) Methods and apparatus for contingent transfer and execution of spoken language interfaces
CN107111516B (en) Headless task completion in a digital personal assistant
US8140971B2 (en) Dynamic and intelligent hover assistance
EP1076288B1 (en) Method and system for multi-client access to a dialog system
US7650284B2 (en) Enabling voice click in a multimodal page
Schmandt et al. Augmenting a window system with speech input
US6377928B1 (en) Voice recognition for animated agent-based navigation
US5786818A (en) Method and system for activating focus
US7188067B2 (en) Method for integrating processes with a multi-faceted human centered interface
US8056070B2 (en) System and method for modifying and updating a speech recognition program
US6085159A (en) Displaying voice commands with multiple variables
US5890122A (en) Voice-controlled computer simultaneously displaying application menu and list of available commands
US6499015B2 (en) Voice interaction method for a computer graphical user interface
US5893063A (en) Data processing system and method for dynamically accessing an application using a voice command
KR101098716B1 (en) Combining use of a stepwise markup language and an object oriented development tool
US7165034B2 (en) Information processing apparatus and method, and program
JPH0580009B2 (en)
JP2001504610A (en) Apparatus and method for indirectly grouping the contents of operation history stacks into groups
US6253177B1 (en) Method and system for automatically determining whether to update a language model based upon user amendments to dictated text
US6745165B2 (en) Method and apparatus for recognizing from here to here voice command structures in a finite grammar speech recognition system

Legal Events

Date Code Title Description
AS Assignment

Owner name: SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECT

Free format text: OFFICIAL COMMITTEE OF UNSECURED CREDITORS OF LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.'S PLAN OF LIQUIDATION FOR LERNOUT & HAUSPIE SPEECH PRODUCTS N.V. UNDER CHAPTER 11 OF THE BANKRUPTCY CODE;ASSIGNOR:LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.;REEL/FRAME:019047/0157

Effective date: 20030311

Owner name: FONIX/ASI CORPORATION, UTAH

Free format text: CHANGE OF NAME;ASSIGNOR:ASI ACQUISITION CORPORATION;REEL/FRAME:019048/0536

Effective date: 19990105

Owner name: ASI ACQUISITION CORPORATION, UTAH

Free format text: MERGER;ASSIGNOR:ARTICULATE SYSTEMS, INC.;REEL/FRAME:019048/0561

Effective date: 19980902

Owner name: SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECT

Free format text: PLAN ADMINISTRATION AGREEMENT;ASSIGNOR:LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.;REEL/FRAME:019047/0224

Effective date: 20030530

Owner name: ASI ACQUISITION CORPORATION, UTAH

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ARTICULATE SYSTEMS, INC.;REEL/FRAME:019056/0364

Effective date: 19980902

Owner name: LERNOUT & HAUSPIE SPEECH PRODUCTS N.V., BELGIUM

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FONIX CORPORATION;REEL/FRAME:019056/0355

Effective date: 19990901

Owner name: ARTICULATE SYSTEMS, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FIRMAN, THOMAS R.;REEL/FRAME:019047/0010

Effective date: 19891009

Owner name: FONIX CORPORATION, UTAH

Free format text: MERGER;ASSIGNOR:FONIX/ASI CORPORATION;REEL/FRAME:019048/0429

Effective date: 19990901

Owner name: SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECT

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LERNOUT & HAUSPIE SPEECH PRODUCTS N.V.;REEL/FRAME:019047/0044

Effective date: 20030530

AS Assignment

Owner name: MULTIMODAL TECHNOLOGIES, INC., PENNSYLVANIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCOTT L. BAENA, PLAN ADMINISTRATOR FOR POST EFFECTIVE DATE L&H;REEL/FRAME:024823/0237

Effective date: 20100708

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: MULTIMODAL TECHNOLOGIES, LLC, PENNSYLVANIA

Free format text: CHANGE OF NAME;ASSIGNOR:MULTIMODAL TECHNOLOGIES, INC.;REEL/FRAME:027061/0492

Effective date: 20110818

AS Assignment

Owner name: ROYAL BANK OF CANADA, AS ADMINISTRATIVE AGENT, ONT

Free format text: SECURITY AGREEMENT;ASSIGNORS:MMODAL IP LLC;MULTIMODAL TECHNOLOGIES, LLC;POIESIS INFOMATICS INC.;REEL/FRAME:028824/0459

Effective date: 20120817

AS Assignment

Owner name: MULTIMODAL TECHNOLOGIES, LLC, PENNSYLVANIA

Free format text: RELEASE OF SECURITY INTEREST;ASSIGNOR:ROYAL BANK OF CANADA, AS ADMINISTRATIVE AGENT;REEL/FRAME:033459/0987

Effective date: 20140731

AS Assignment

Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT, NEW YORK

Free format text: SECURITY AGREEMENT;ASSIGNOR:MMODAL IP LLC;REEL/FRAME:034047/0527

Effective date: 20140731

AS Assignment

Owner name: CORTLAND CAPITAL MARKET SERVICES LLC, ILLINOIS

Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:MULTIMODAL TECHNOLOGIES, LLC;REEL/FRAME:033958/0511

Effective date: 20140731

AS Assignment

Owner name: MULTIMODAL TECHNOLOGIES, LLC, PENNSYLVANIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CORTLAND CAPITAL MARKET SERVICES LLC, AS ADMINISTRATIVE AGENT;REEL/FRAME:048210/0792

Effective date: 20190201

AS Assignment

Owner name: MEDQUIST CM LLC, TENNESSEE

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712

Effective date: 20190201

Owner name: MMODAL IP LLC, TENNESSEE

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712

Effective date: 20190201

Owner name: MULTIMODAL TECHNOLOGIES, LLC, TENNESSEE

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712

Effective date: 20190201

Owner name: MMODAL MQ INC., TENNESSEE

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712

Effective date: 20190201

Owner name: MEDQUIST OF DELAWARE, INC., TENNESSEE

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:WELLS FARGO BANK, NATIONAL ASSOCIATION, AS AGENT;REEL/FRAME:048411/0712

Effective date: 20190201