WO1999048088A1 - Voice controlled web browser - Google Patents

Voice controlled web browser

Info

Publication number
WO1999048088A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
hyperlink
grammar
command
voice
Prior art date
Application number
PCT/US1999/006072
Other languages
French (fr)
Inventor
Jack H. Profit, Jr.
N. Gregg Brown
Original Assignee
Inroad, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inroad, Inc. filed Critical Inroad, Inc.
Priority to AU31045/99A (AU3104599A)
Publication of WO1999048088A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/487: Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493: Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938: Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals, comprising a voice browser which renders and interprets, e.g. VoiceXML
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command
    • G10L2015/226: Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics
    • G10L2015/228: Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics of application context

Abstract

A system and method for implementing a voice-controlled Web browser program executing on a wearable computer is disclosed. A Web document is received (602) at the wearable computer, and processed to dynamically generate a speech grammar (604). The speech grammar is used to recognize voice commands (616) at the wearable computer. Alternatively, a Web document is precompiled at a server computer to generate a speech grammar, and the speech grammar is transmitted with its corresponding Web document to the wearable computer. The wearable computer provides three mechanisms for a user to navigate Web pages by the use of voice. In one mechanism, an index value corresponding to each hyperlink is appended to the hyperlink text and displayed to the user (612). The user may speak the index value to activate the corresponding hyperlink (614). In a second mechanism, the user can speak the text of the hyperlink to activate the hyperlink (616). In a third mechanism, the user invokes a command to display a dialog window having a list of hyperlinks and their corresponding index values. The user can speak an index value or a hyperlink to activate the hyperlink (618).

Description

VOICE CONTROLLED WEB BROWSER
This application claims the benefit of U.S. Provisional Application No. 60/078,937, filed March 20, 1998.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Field of the Invention
The present invention relates to the field of Web browsers and, in particular, to methods and systems for controlling a Web browser by the use of voice commands.
Background of the Invention
In recent years, there has been a tremendous proliferation of computers connected to a global network known as the Internet. "Client" computers download and upload digital information from "server" computers via the Internet. Client application software executing on such client computers typically accepts commands from users and obtains data and services from the server computers by sending requests to server applications running on the server computers. Intranets are local area networks containing one or more Web servers and client computers operating in a manner similar to the Internet as described above. Typically, all of the computers interconnected via an intranet operate within a company or organization.
A number of specialized protocols are used for exchanging commands and data between computers interconnected via the Internet, as are well known in the art.
The protocols include the file transfer protocol (FTP) used for exchanging files and the hypertext transfer protocol (HTTP) used for accessing data on the World Wide Web, often referred to simply as "the Web."
The Web is an information service on the Internet providing documents and hyperlinks between documents. The Web is made up of numerous Web sites around the world that maintain and distribute electronic documents. A Web site may use one or more Web server computers that store and distribute documents in various formats, including the hypertext markup language (HTML).
An HTML document contains text and metadata, that is, commands providing formatting information. HTML documents also include embedded "hyperlinks" that reference other data or documents located on any Web server computer. The referenced documents may represent text, graphics, audio, or video in respective formats.
A Web browser is a client application or operating system utility program that communicates with server computers via one or more Internet protocols, such as FTP and HTTP. Basically, Web browsers receive electronic HTML documents from server computers over the network and present them to users. The HotJava Web browser, available from Sun Microsystems, Palo Alto, California, is an example of a popular Web browser application.
There are many jobs in manufacturing and other industries where workers require access to information available through a computer terminal, but must also move around and work with their hands. The information includes company data residing on a network server, company intranet information, or even information available on the Internet. These workers have been characterized as "locally mobile" workers.
By way of example, a locally mobile production worker might need access to blueprints, reference manuals, and the like, to properly perform a particular job. At present, to retrieve such information, this worker would have to cease working, leave their workspace, obtain the information, and return to their workspace. Some information may not be transportable. Even if the worker could return with the necessary information, the demands of the job may still make it extremely difficult for the worker to view the retrieved information while performing manual tasks at the same time.
By way of further example, many inventory-based jobs require the creation and physical dissemination of tremendous amounts of paperwork. In a distribution center, for example, a worker may fill out invoice documents, and then send copies of the documents to the shipping department, the accounting department, the production department, and others. Such jobs have the potential for errors in the initial creation of such information. Additionally, the created information can be lost in the subsequent dissemination process. Time delay in disseminating such information may pose an additional problem.
Therefore, it would be desirable to provide a system and method for giving a locally mobile worker access to the information needed to perform a job, without the worker having to leave the workspace. There is also a need to provide the locally mobile worker with a way to review the information while simultaneously performing the job. Preferably, the system and method would transmit data between a server computer and a mobile computer carried by the mobile worker.
It would be further desirable to have a system and method wherein a user employs a browser program to view and enter information, and wherein voice commands are used to control the browser program. Preferably, such a system and method would include a mechanism that allows a user to navigate between information pages and also allows a user to manipulate user interface controls by the use of voice commands.
Preferably, such a system and method would include alternate mechanisms for creating a speech-recognition grammar. One desirable mechanism includes the dynamic creation of a speech-recognition grammar after an information page is received by the user. A second desirable mechanism includes precompiling the information page to create a speech-recognition grammar that is transmitted with the information page to the user's computer. The present invention is directed to providing such a system and method with such associated mechanisms.
Summary of the Invention
The present invention includes a voice-activated Web browser program executing on a wearable computer. The browser program provides three mechanisms for allowing a user to employ voice commands to navigate pages. In one mechanism, a "speech hint," or index value, corresponding to each hyperlink in a Web page is determined and displayed on the Web page. Preferably, a unique identifier, or index value, is appended to the end of the hyperlink text. When a voice command is received, a determination is made of whether the voice command corresponds to the index value. If the voice command corresponds to an index value, the hyperlink corresponding to the index value is activated to retrieve additional data.
In the second mechanism, when a voice command is received, a determination is made of whether the voice command corresponds to the text associated with a hyperlink on the current Web page. If the voice command corresponds to the text associated with a hyperlink, the associated hyperlink is activated to retrieve additional data.
In the third mechanism, a voice command causes a list of hyperlinks to be displayed. Each hyperlink is displayed with a corresponding index value. In response to receiving a voice command, a determination is made of whether the voice command corresponds to either hyperlink text or an index value corresponding to a hyperlink. If a match is found, the corresponding hyperlink is activated to retrieve additional data. Preferably, all three mechanisms are presented to a user, providing a user with a choice of using any mechanism to control the browser program.
In an additional aspect of the invention, an external speech grammar referenced by the Web document is dynamically compiled by the Web browser after receiving a Web document. The speech grammar is activated by the Web browser for use in processing subsequent voice commands whenever the Web document in question is displayed. This mechanism allows Web document developers to customize the speech features of a specific Web page.
In another aspect of the invention, a speech grammar corresponding to a Web document is compiled on a server computer and stored at the server computer. When a Web document is transmitted from the server computer to the Web browser, the corresponding compiled speech grammar is transmitted to the Web browser. The speech grammar is received at the browser and used to process voice commands pertaining to the Web page. This mechanism is similar to the one described in the previous paragraph. The main difference is that this mechanism takes advantage of the high performance of the server machine in compiling specific grammars.
The present invention provides a mechanism for controlling a browser program executing on a wearable computer by the use of voice commands. By providing five different mechanisms, the invention provides flexibility to a user. The mechanism also provides flexibility to a Web page author, who may optimally design the Web page to be used according to one or more of the mechanisms. The invention also provides a mechanism for controlling a browser when a Web page author has not designed the Web page to include voice-activated control.
By providing a mechanism for pre-compiling speech grammars and a mechanism for dynamically compiling speech grammars, the invention can be used when a Web page author has built a Web page with a speech grammar or when a Web page author has not built a speech grammar corresponding to the Web page.
Brief Description of the Drawings
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated and better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
FIGURE 1A is a block diagram of a wearable computer system for implementing the present invention;
FIGURE 1B is a pictorial illustration of the wearable computer system of FIGURE 1A;
FIGURE 2 illustrates an exemplary Web page displayed on a wearable computer, in accordance with the present invention;
FIGURE 3 is a block diagram illustrating a system for implementing a voice-controlled Web browser in accordance with the present invention;
FIGURE 4 is a block diagram illustrating an alternative system for implementing a voice-controlled Web browser;
FIGURE 5 is a flow diagram illustrating a process of generating a speech-recognition grammar for use in a voice-controlled Web browser program; and
FIGURE 6 is a flow diagram illustrating the process of displaying a Web document and handling a voice command, in accordance with the present invention.
Detailed Description of the Preferred Embodiment
The present invention is a mechanism and method for implementing a voice-controlled Web browser program executing on a wearable computer that communicates with one or more server computers. The mechanism and method of the invention generate a voice-recognition grammar. Upon receipt of a voice command, the mechanism and method of the invention utilize the voice-recognition grammar to determine which command was received, and the received command is used to control and manipulate the Web browser program. In accordance with the present invention, a Web browser program executes on a wearable computer. FIGURE 1A and the following discussion are intended to provide a brief, general description of a wearable computer upon which the invention may be implemented.
With reference to FIGURE 1A, an exemplary system for implementing the invention includes a wearable computer 102, including a central processing unit (CPU) 104, a system memory 106, and a system bus 108 that couples various system components, including the system memory 106, to the processing unit 104. The system memory 106 may include both volatile and nonvolatile memory (not shown). A second bus, such as a PCI bus 110, communicates with the system bus 108 and transfers data to and from peripheral components. A video controller 112 connected to the PCI bus 110 controls the display of information on a video screen 114. An audio controller 116 connected to the PCI bus 110 controls a speaker device 118. The speaker device 118 may optionally be built into a headset 134 (shown in FIGURE 1B). The audio controller 116 also receives inputs from a microphone 120.
The wearable computer 102 includes various other components, such as a power supply and a system clock, that are not illustrated in FIGURE 1A. A wearable computer system for use with the present invention is described in commonly assigned U.S. Patent Application, Serial No. 09/045,260, pending, the disclosure of which is incorporated herein by reference in its entirety.
FIGURE 1B illustrates an embodiment of a wearable computer 102 that is used to implement the present invention. A CPU 104 and a memory 106 (shown in FIGURE 1A) are contained within a base unit 130 that may be attached to a belt 132. A headset 134 includes a speaker device 118, a display screen 114, and a microphone 120.
The wearable computer 102 communicates with a server computer (not shown). The server computer transmits Web documents, such as HTML documents, to the wearable computer 102, which displays the documents to a user. In one embodiment, the documents are displayed on the display screen 114. The wearable computer may also play audio data via the speaker device 118. In an alternate embodiment, the video screen 114 is not present or is inactive. The video screen 114 may also be employed to selectively present Web documents, while other select Web documents are played only as audio data via the speaker device 118.
In response to the presentation of a Web document, a user may control the presentation of additional data and may transmit data to the server computer using voice commands. FIGURE 2 illustrates an exemplary Web document 150 that is displayed on the video screen 114. The Web document 150 contains hyperlinks, which each include a representative symbol, such as text or a graphic symbol. The symbol may also be an audio signal that is presented to the user. The representative symbol is referred to as an "anchor tag" 152. Each hyperlink also includes an embedded address (not shown) corresponding to additional data. When a user selects a hyperlink, the additional data corresponding to the associated address is retrieved and presented to the user. A Uniform Resource Locator (URL) is one form of addressing that is commonly used in Web documents. An address can be a file system pathname or other value used to indicate the location of additional data.
The present invention includes three mechanisms that allow a user to employ voice commands to navigate Web pages: speakable indices, an index menu, and "speakable hyperlinks." Preferably, all three mechanisms are included, and a user has the option of using one or more of the mechanisms.
The speakable indices mechanism includes a speech-specific parser 206 (shown in FIGURES 3 and 4) that dynamically inserts a visual speech hint 154 next to each HTML anchor tag 152. Preferably, the speech hint 154 is a superscripted index number and is inserted immediately before each corresponding anchor tag 152. The index number is incremented with each successive anchor tag 152, so duplicate index numbers do not occur. As illustrated in FIGURE 2, the speech hint 154 appears before each hyperlink anchor tag 152.
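By way of illustration, the insertion of speech hints may be sketched as a single numbering pass over the page markup. The following Python fragment is a minimal sketch only; the function name and the use of a <sup> element to render the superscripted hint are assumptions made for illustration, since the speech-specific parser 206 operates on the internal representation rather than on raw markup.

import re

def insert_speech_hints(html: str) -> str:
    """Insert a numeric speech hint before each anchor tag.

    A sketch of the speakable-indices pass; a real implementation
    would operate on the parsed internal representation.
    """
    counter = 0

    def add_hint(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        # Render the index as superscripted text immediately before the anchor.
        return "<sup>%d</sup>%s" % (counter, match.group(0))

    # Number opening anchor tags in document order, so indices never repeat.
    return re.sub(r"<a\s[^>]*>", add_hint, html, flags=re.IGNORECASE)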
The speakable indexing feature is preferably enabled and disabled via two speakable commands: "index enable" and "index disable." When a user speaks the words "index enable," the speakable index feature is enabled. When a user speaks the words "index disable," the speakable index feature is disabled. When enabled, this feature allows a user to speak a hyperlink tag's unique index number to follow the hyperlink. When an index number is spoken, a speech-recognition engine 212 (shown in FIGURE 3) generates a corresponding speech event, which is translated into a user command to follow the corresponding hyperlink.
The index menu mechanism provides a second method of following hyperlinks. When a user speaks the phrase "index menu," the mechanism and method of the invention display a dialog box to the user. The dialog box includes a scrollable list of hyperlinks and their associated unique indices. A user may navigate this list using verbal scrolling commands, or may speak the unique index number corresponding to the hyperlink that they wish to follow.
To employ the speakable hyperlinks mechanism, a user speaks the contents of a hyperlink anchor tag 152. In response, the speech-recognition engine 212 (shown in FIGURE 3) generates a corresponding speech event that is translated into a user command to follow the corresponding hyperlink. An HTML rendering engine 208 navigates to linked Web content based on the user selection. Preferably, a Web page author anticipating the use of speakable hyperlinks creates a Web page that does not have two hyperlink anchor tags that may sound similar.
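Because speakable hyperlinks are matched against spoken anchor text, an authoring-time check for colliding link names is helpful. The following is a rough sketch under the assumption that textually identical names are the main source of collisions; a production check would compare pronunciations rather than spellings, and this helper is not prescribed by the invention.

def find_ambiguous_anchors(anchor_texts):
    """Report pairs of anchor texts that normalize to the same spoken form."""
    seen = {}
    collisions = []
    for text in anchor_texts:
        key = " ".join(text.lower().split())  # case- and whitespace-insensitive
        if key in seen:
            collisions.append((seen[key], text))
        else:
            seen[key] = text
    return collisions

# Example: find_ambiguous_anchors(["USA TODAY", "usa today"]) returns
# [("USA TODAY", "usa today")].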
Controls and images also have corresponding speech hints. For example, as depicted in FIGURE 2, selection controls 156 have corresponding speech hints 158. Edit controls 160 have corresponding speech hints 162. The image 164 has a corresponding speech hint 166 positioned at the upper left corner. Activating a control sets the focus of the browser to the control, so that additional voice input is directed to the control. The use of speech grammars to select controls is similar to the use of speech grammars to select hyperlinks. Thus, the present invention provides a mechanism and method for dynamically generating speech grammars upon receipt of Web pages, and a mechanism and method for pre-compiling speech grammars prior to transmitting Web pages to the wearable computer.
FIGURE 3 is a functional block diagram that illustrates components of a wearable computer system 200 that dynamically generates speech grammars upon receipt of Web pages. An HTML parser 204 receives an HTML document from the Internet or an intranet 202 and parses the document content to generate an internal representation 205 of the HTML document. The internal representation 205 is passed to a speech-specific parser 206. The speech-specific parser 206 locates hyperlinks or other interactive controls that may be the target of a voice command and generates speech grammars 209. The speech-specific parser 206 also generates visual speech hints 154 (shown in FIGURE 2). The revised internal representation 207 of the HTML document, with the visual speech hints generated by the speech-specific parser 206, is passed to an HTML rendering engine 208. The HTML rendering engine 208 generates a visual Web page 150 (shown in FIGURE 2) based upon the revised internal representation 207. The visual Web page 150 is displayed on the display screen 114 (shown in FIGURE 1A).
Speech grammars 209 are generated from the HTML text by the speech-specific parser 206 and are passed to a speech grammar compiler 210. The speech grammar compiler 210 translates the speech grammars 209 into a compiled speech grammar 211 that is used by a grammar-based speech-recognition engine 212. Many speech engine providers, including IBM and Lernout & Hauspie, provide grammar compilers with their speech engine products. In addition to the compiled speech grammars, the speech-recognition engine 212 receives static, or precompiled, grammars 214 that are used for controlling the Web browser. The static grammars 214 include browser commands that are not Web page specific, such as "back" and "forward." The speech-recognition engine 212 receives voice audio input from the microphone 120 and uses the compiled speech grammars 211 and static speech grammars 214 to determine the command or text spoken into the microphone 120. ViaVoice, a product available from IBM, is a commercially available speech-recognition engine that can be used as the speech-recognition engine 212 in the present invention.
In response to voice audio input, the speech-recognition engine 212 generates speech events 213. The speech events 213 generated by the speech-recognition engine 212 are handled by corresponding software speech controls 218. The speech controls 218 translate the speech events 213 into user commands 215, which are passed to the HTML rendering engine 208. In response to the receipt of user commands 215, the HTML rendering engine 208 performs an action corresponding to the user commands 215. For example, if a user command 215 designates that a particular hyperlink has been selected by a voice audio input, the HTML rendering engine 208 performs the action of retrieving the Web page corresponding to the hyperlink. Similarly, graphical user interface (GUI) controls 216, such as buttons or menus, can receive input from a mechanical device, such as a mouse, or other control (not shown). The GUI controls 216 generate user commands 217, which are passed to the HTML rendering engine 208 for appropriate handling, as described above. The speech controls 218 may also generate audio prompts 218, which are presented to the user via a headset or other speaker device 118.
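The event path from the speech-recognition engine 212 to the HTML rendering engine 208 may be summarized as a small dispatch step. In the following minimal sketch, the SpeechEvent type and the execute and follow methods are hypothetical stand-ins for the interfaces between the speech controls 218 and the rendering engine 208.

from dataclasses import dataclass

@dataclass
class SpeechEvent:
    kind: str   # "browser_command" or "follow_link"
    value: str  # a command such as "back", or a hyperlink address

def dispatch(event: SpeechEvent, rendering_engine) -> None:
    """Translate a speech event into a user command, in the spirit of
    the speech controls 218."""
    if event.kind == "browser_command":
        rendering_engine.execute(event.value)   # e.g. "back", "forward"
    elif event.kind == "follow_link":
        rendering_engine.follow(event.value)    # retrieve the linked page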
FIGURE 4 is a functional block diagram that illustrates a voice-controlled Web browser system 300 that uses precompiled speech-recognition grammars 302. The system 300 is similar to the system 200 illustrated in FIGURE 3. The following discussion describes the important differences between the system 200, which uses dynamically generated speech-recognition grammars 209, and the system 300, which uses precompiled speech-recognition grammars 302.
In the precompiled system 300, a speech grammar compiler, such as the speech grammar compiler 210 illustrated in FIGURE 3, is used to generate a voice-recognition grammar at a Web server operating over the Internet or an intranet 202. A speech-specific parser 206 on the wearable computer receives an internal representation of the HTML document 205 and one or more previously compiled speech grammars 302. The speech-specific parser 206 passes the received speech grammars 302 to the grammar-based speech-recognition engine 212. The speech-recognition engine 212 receives compiled speech grammars similar to the dynamic grammar system 200 of FIGURE 3. The precompiled grammar system 300 need not include the speech grammar compiler 210 of FIGURE 3.
As with the dynamic grammar system 200 of FIGURE 3, the speech-specific parser 206 passes the revised internal representation of the HTML document 207 to the HTML rendering engine 208. The HTML rendering engine 208, the GUI controls 216, the speech controls 218, and the speech-recognition engine 212 perform operations as described above with respect to the dynamic grammar system 200 of FIGURE 3.
In one embodiment of the invention, commands pertaining to speech grammars are embedded within HTML comment fields. The following HTML code segment shows, by way of example, instructions used to specify the location of a dictionary and grammar files.
<!--InroadSpeechGrammar(%text)-->
<!--InroadSpeechGrammar name=%NameOfElement grammar=%URL dictionary=%URL-->
%NameOfElement specifies the name of an HTML element and corresponds to the name in an HTML hyperlink anchor tag 152. The name field has two valid values. One value is the name of the element to which the grammar is attached. Currently, this value is for form fields only. The other value is the word "document." Use of the word "document" associates the grammar with a document-level context. The name field is optional and, if not specified, the grammar is considered to be a document-level grammar.
A document may contain multiple InroadSpeechGrammar references. Multiple references are loaded and attached to the context for their containing document or attached to the context for the appropriate form field. FIGURE 2 illustrates a portion of an exemplary Web document 150 that is displayed on the display screen 114 (FIGURE 1A) in response to receiving a corresponding HTML document. An HTML code segment for the corresponding HTML document is listed below.
<HTML>
<HEAD>
<TITLE>Bridge HTML (Symantec 1.1)</TITLE>
</HEAD>
<BODY>
<!--InroadSpeechGrammar grammar=http://www.inroad.site/test.std dictionary=http://www.inroad.site/inroad.phd-->
<!--InroadSpeechGrammar grammar=http://www.inroad.site/test2.std dictionary=http://www.inroad.site/inroad.phd-->
<!--InroadSpeechGrammar name=flavor grammar=http://www.inroad.site/dropdn.std dictionary=http://www.inroad.site/inroad.phd-->
<!--InroadSpeechGrammar name=age grammar=http://www.inroad.site/dropage.std dictionary=http://www.inroad.site/inroad.phd-->
<P>This page is a test page to load grammars for the inroad browser.</P>
<a href="http://espnet.sportszone.com">SPORTSZONE</a></p>
<a href="http://www.unitedmedia.com/comics/dilbert">DILBERT ZONE</a></p>
<a href="http://www.usatoday.com">USA TODAY</a>
<SELECT NAME="flavor">
<OPTION VALUE=a>Vanilla
<OPTION VALUE=b>Strawberry
<OPTION VALUE=c>Rum and Raisin
<OPTION VALUE=d>Peach and Orange
</SELECT>
<!--InroadSpeechGrammar name=first grammar=http://www.inroad.site/first.std dictionary=http://www.inroad.site/inroad.phd-->
<!--InroadSpeechGrammar name=last grammar=http://www.inroad.site/last.std dictionary=http://www.inroad.site/inroad.phd-->
<INPUT type=text name=first size=12 maxlength=40>
<INPUT type=text name=last size=12 maxlength=40>
<SELECT NAME="age">
<OPTION VALUE=a>10 to 32
<OPTION VALUE=b>33 to 50
<OPTION VALUE=c>51 to 74
<OPTION VALUE=d>75 to death
</SELECT>
</BODY>
</HTML>
In the exemplary HTML code segment listed above, each "grammar=" reference specifies a URL that designates the location of a grammar file. These grammar files are retrieved by the speech-specific parser 206 in the precompiled grammar system 300 of FIGURE 4.
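The retrieval of grammar references may be sketched as a scan over the comment fields described above. In the following Python fragment, the regular expressions and the returned triple format are assumptions made for illustration.

import re

def extract_grammar_refs(html: str):
    """Collect (name, grammar_url, dictionary_url) triples from
    InroadSpeechGrammar comment fields; unnamed references default
    to the document-level context."""
    refs = []
    for body in re.findall(r"<!--\s*InroadSpeechGrammar(.*?)-->", html, re.DOTALL):
        name = re.search(r"name\s*=\s*(\S+)", body)
        grammar = re.search(r"grammar\s*=\s*(\S+)", body)
        dictionary = re.search(r"dictionary\s*=\s*(\S+)", body)
        if grammar:
            refs.append((name.group(1) if name else "document",
                         grammar.group(1),
                         dictionary.group(1) if dictionary else None))
    return refs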
At the Web server, a grammar compiler can be used by a document author to prepare an HTML document for speech recognition. IBM's ViaVoice toolkit, discussed above, includes a grammar compiler that accepts HTML documents and supporting grammar files as input. The toolkit produces speech-enabled HTML documents, grammar files, and dictionary files.
FIGURE 5 illustrates a process 502 of dynamically generating and compiling speech-recognition grammars in accordance with the present invention. At step 504, a new HTML document is received and loaded into the HTML parser 204 (shown in FIGURE 3). At step 506, the HTML parser 204 parses the HTML instructions within the newly received HTML document and creates an internal representation 205 of the HTML document. The internal representation includes one or more parse tags. The internal representation is then passed to the speech-specific parser 206.
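A simplified way to obtain a stream of parse tags is sketched below, using Python's html.parser module as a stand-in for the HTML parser 204; the internal representation 205 of the invention also preserves text and document structure, which this sketch omits.

from html.parser import HTMLParser

class ParseTagCollector(HTMLParser):
    """Collect a flat list of (tag, attributes) pairs as a simplified
    stand-in for the internal representation 205."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

def internal_representation(html: str):
    collector = ParseTagCollector()
    collector.feed(html)
    return collector.tags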
At step 508, the speech-specific parser 206 retrieves a parse tag from the internal representation of the HTML document. At step 510, a determination is made of whether the retrieved parse tag represents a speakable entity, such as an anchor, form field, text area, button, radio button, or check box. Speakable HTML entities include anchors, image maps, applets, inputs, and select items. If the current parse tag does not represent a speakable entity, processing proceeds to step 516, where a determination is made of whether the current parse tag is the last parse tag of the current HTML document. If the tag is not the last parse tag, processing returns to step 508 to retrieve the next parse tag. If, at step 510, the speech-specific parser 206 determines that the current parse tag represents a speakable entity, at step 512, a new rule for the dynamic grammar is created, or, at step 514, an existing rule is appended. The rule adheres to the form:
<rulename> = "Goto link number <n>". The rule is subsequently used for numerical index navigation. This form is the format used for specifying a grammar rule. The set of rules is then compiled using the grammar compiler, described above.
After creating a rule, processing proceeds to step 516 to determine whether the current parse tag is the last parse tag of the HTML document, as described above. If the tag is not the last parse tag, flow control proceeds back to step 508 to retrieve and process the next parse tag. If, at step 516, the speech-specific parser 206 determines that the current parse tag is the last parse tag, processing proceeds to step 518 where the speech grammar compiler 210 compiles the generated rules into a compiled speech grammar. In one embodiment, the generated rules are in the form of ASCII text, and the compiled speech grammar is a machine representation specific to the speech-recognition engine 212. After compiling the rules to create a compiled speech grammar, the process 502 of dynamically compiling a speech-recognition grammar is complete.
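Steps 508 through 518 reduce to a loop over the parse tags that emits one numbered rule per speakable entity. The following sketch assumes parse tags of the (tag, attributes) form shown in the previous fragment and uses hypothetical rule names of the form link<n>.

SPEAKABLE_TAGS = {"a", "area", "applet", "input", "select"}

def generate_rules(parse_tags):
    """Emit one grammar rule per speakable entity (steps 508-516);
    the returned ASCII rules are what step 518 hands to the compiler."""
    rules = []
    n = 0
    for tag, _attrs in parse_tags:
        if tag.lower() in SPEAKABLE_TAGS:   # step 510: speakable entity?
            n += 1
            rules.append('<link%d> = "Goto link number %d".' % (n, n))  # steps 512/514
    return rules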
FIGURE 6 illustrates a process 601 that is performed on a wearable computer for displaying a Web document and handling a voice command. At step 602, the wearable computer receives a Web document containing one or more hyperlinks. At step 604, an ordered list of hyperlinks within the Web document is determined. At step 606, the Web document is displayed at the wearable computer.
At step 608, a voice command is received from a user. At step 610, a determination is made of whether the voice command is to display an index menu. If the command is an index menu command, at step 612, a list of hyperlinks and their corresponding index numbers is displayed. Preferably, the list is displayed within a dialog window. Alternatively, the list may be presented as speech over the wearable computer speakers. After displaying the list of hyperlinks, at step 614, a voice command is received. At step 616, a determination is made of the hyperlink corresponding to the voice command. If, at step 610, the voice command is not an index menu command, processing proceeds to step 616 to determine a corresponding hyperlink. Step 616 may include determining whether the text of a hyperlink was spoken, or whether the index number corresponding to a hyperlink was spoken. After determining that a hyperlink corresponding to the voice command is present, at step 618, the hyperlink is activated. Activation of the hyperlink may include retrieving a new Web document. Alternatively, activation may comprise displaying a different portion of the same Web document.
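The decision logic of process 601 may be condensed as follows. In this sketch, display_menu and activate are hypothetical callbacks supplied by the browser, and links is the ordered list of (anchor text, address) pairs determined at step 604.

def handle_voice_command(command, links, display_menu, activate):
    """Steps 608 through 618 as a sketch."""
    spoken = command.strip().lower()
    if spoken == "index menu":                            # step 610
        display_menu(list(enumerate(links, start=1)))     # step 612
        return                                            # the next utterance re-enters at step 614
    if spoken.isdigit() and 1 <= int(spoken) <= len(links):
        activate(links[int(spoken) - 1][1])               # spoken index: steps 616-618
        return
    for text, address in links:                           # spoken anchor text: steps 616-618
        if text.strip().lower() == spoken:
            activate(address)
            return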
While the preferred embodiment of the invention has been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.

Claims

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. A system for controlling a web browser using a dynamically generated speech grammar, comprising: a speech specific parser parsing hyperlinks and interactive controls from a script describing a web page for display by the web browser; a visual speech hint generator adding a visual speech hint to the web page script for each such hyperlink and visual control; a speech grammar compiler generating a speech grammar for each such hyperlink and visual control; a voice audio input device; a grammar based speech recognition engine translating the speech grammar into a compiled speech grammar and determining a speech event from the voice audio input device using the compiled speech grammar; and a rendering engine executing the speech event on a visual web page rendered from the modified web page script.
2. A system according to Claim 1, further comprising: a graphical user interface controller translating an input graphical user interface control into a user command, the rendering engine performing an action corresponding to the user command.
3. A system according to Claim 1, further comprising: a speech controller translating the speech event into a user command, the rendering engine performing an action corresponding to the user command.
4. A system according to Claim 3, further comprising: a speaker device, wherein the speech controller generates audio prompts using the compiled speech grammar for playing to the user over the speaker device.
5. A system according to Claim 3, wherein the user command is a web browser command.
6. A system according to Claim 1, wherein the speech event is a text message for input into a field within the web page.
7. A system according to Claim 1, further comprising: a precompiled speech grammar, the grammar based speech recognition engine determining a speech event from the voice audio input using the precompiled speech grammar.
8. A system according to Claim 7, further comprising: a speaker device; a speech controller generating audio prompts using the precompiled speech grammar for playing to the user over the speaker device.
9. A system according to Claim 7, wherein the precompiled speech grammar includes non-web page specific web browser commands.
10. A method for controlling a web browser using a dynamically generated speech grammar, comprising: parsing hyperlinks and interactive controls from a script describing a web page for display by the web browser; adding a visual speech hint to the web page script for each such hyperlink and visual control; generating a speech grammar for each such hyperlink and visual control; translating the speech grammar into a compiled speech grammar; determining a speech event from voice audio input using the compiled speech grammar; and executing the speech event on a visual web page rendered from the modified web page script.
11. A method according to Claim 10, the operation of executing the speech event further comprising: receiving a graphical user interface control; translating the graphical user interface control into a user command; and performing an action corresponding to the user command.
12. A method according to Claim 10, the operation of executing the speech event further comprising: translating the speech event into a user command; and performing an action corresponding to the user command.
13. A method according to Claim 12, further comprising: generating audio prompts for the user using the compiled speech grammar.
14. A method according to Claim 12, wherein the user command is a web browser command.
15. A method according to Claim 10, wherein the speech event is a text message for input into a field within the web page.
16. A method according to Claim 10, further comprising: receiving a precompiled speech grammar; and determining a speech event from the voice audio input using the precompiled speech grammar.
17. A method according to Claim 10, further comprising: receiving a precompiled speech grammar; and generating audio prompts for the user using the precompiled speech grammar.
18. A method according to Claim 17, wherein the precompiled speech grammar includes non-web page specific web browser commands.
19. A computer-readable storage medium containing code for controlling a web browser using a dynamically generated speech grammar, the web browser interfaced with a voice audio input device, comprising: a speech specific parser parsing hyperlinks and interactive controls from a script describing a web page for display by the web browser; a visual speech hint generator adding a visual speech hint to the web page script for each such hyperlink and visual control; a speech grammar compiler generating a speech grammar for each such hyperlink and visual control; a grammar based speech recognition engine translating the speech grammar into a compiled speech grammar and determining a speech event from the voice audio input device using the compiled speech grammar; and a rendering engine executing the speech event on a visual web page rendered from the modified web page script.
20. A storage medium according to Claim 19, further comprising:
a graphical user interface controller translating an input graphical user interface control into a user command, the rendering engine performing an action corresponding to the user command.
21. A storage medium according to Claim 19, further comprising: a speech controller translating the speech event into a user command, the rendering engine performing an action corresponding to the user command.
22. A storage medium according to Claim 21, wherein the web browser is further interfaced with a speaker device, further comprising: the speech controller generating audio prompts using the compiled speech grammar for playing to the user over the speaker device.
23. A storage medium according to Claim 19, further comprising: a precompiled speech grammar, the grammar based speech recognition engine determining a speech event from the voice audio input using the precompiled speech grammar.
24. A storage medium according to Claim 23, wherein the web browser is further interfaced with a speaker device, further comprising: a speech controller generating audio prompts using the precompiled speech grammar for playing to the user over the speaker device.
25. A system for controlling a web browser using a static, precompiled speech grammar, comprising: a speech specific parser parsing hyperlinks and interactive controls from a script describing a web page for display by the web browser and receiving the precompiled speech grammar; a visual speech hint generator adding a visual speech hint to the web page script for each such hyperlink and visual control; a voice audio input device; a grammar based speech recognition engine determining a speech event from voice audio input using the precompiled speech grammar; and a rendering engine executing the speech event on a visual web page rendered from the modified web page script.
26. A system according to Claim 25, further comprising:
a graphical user interface controller translating an input graphical user interface control into a user command, the rendering engine performing an action corresponding to the user command.
27. A system according to Claim 25, further comprising: a speech controller translating the speech event into a user command, the rendering engine performing an action corresponding to the user command.
28. A system according to Claim 27, further comprising: a speaker device, wherein the speech controller generates audio prompts using the compiled speech grammar for playing to the user over the speaker device.
29. A system according to Claim 27, wherein the user command is a web browser command.
30. A system according to Claim 25, wherein the speech event is a text message for input into a field within the web page.
31. A system according to Claim 25, wherein the precompiled speech grammar includes non-web page specific web browser commands.
32. A method for controlling a web browser using a static, precompiled speech grammar, comprising: parsing hyperlinks and interactive controls from a script describing a web page for display by the web browser; receiving the precompiled speech grammar; adding a visual speech hint to the web page script for each such hyperlink and visual control; determining a speech event from voice audio input using the precompiled speech grammar; and executing the speech event on a visual web page rendered from the modified web page script.
33. A method according to Claim 32, the operation of executing the speech event further comprising: receiving a graphical user interface control; translating the graphical user interface control into a user command; and
performing an action corresponding to the user command.
34. A method according to Claim 32, the operation of executing the speech event further comprising: translating the speech event into a user command; and performing an action corresponding to the user command.
35. A method according to Claim 34, further comprising: generating audio prompts for the user using the compiled speech grammar.
36. A method according to Claim 34, wherein the user command is a web browser command.
37. A method according to Claim 32, wherein the speech event is a text message for input into a field within the web page.
38. A method according to Claim 32, wherein the precompiled speech grammar includes non-web page specific web browser commands.
39. A computer-readable storage medium containing code for controlling a web browser using a static, precompiled speech grammar, the web browser interfaced with a voice audio input device, comprising: a speech specific parser parsing hyperlinks and interactive controls from a script describing a web page for display by the web browser and receiving the precompiled speech grammar; a visual speech hint generator adding a visual speech hint to the web page script for each such hyperlink and visual control; a grammar based speech recognition engine determining a speech event from voice audio input using the precompiled speech grammar; and a rendering engine executing the speech event on a visual web page rendered from the modified web page script.
40. A storage medium according to Claim 39, further comprising: a graphical user interface controller translating an input graphical user interface control into a user command, the rendering engine performing an action corresponding to the user command.
41. A storage medium according to Claim 39, further comprising:
a speech controller translating the speech event into a user command, the rendering engine performing an action corresponding to the user command.
42. A storage medium according to Claim 41, wherein the web browser is further interfaced with a speaker device, further comprising: the speech controller generating audio prompts using the compiled speech grammar for playing to the user over the speaker device.
43. A method of presenting electronic data comprising: receiving a document having a plurality of hyperlinks contained therein; determining an ordered list of hyperlinks from the plurality of hyperlinks, each hyperlink in the ordered list having a corresponding index value representative of the corresponding hyperlink's position in the ordered list of hyperlinks; displaying the document, wherein the display includes a plurality of hyperlink symbols and a plurality of index symbols, each hyperlink symbol representative of a corresponding hyperlink, each index symbol representative of a corresponding hyperlink's index value; receiving a voice command to activate a hyperlink; determining an activated hyperlink contained within the document, wherein the activated hyperlink corresponds to the voice command to activate the hyperlink, wherein determining the activated hyperlink includes determining an index symbol matching the voice command and locating a hyperlink based on the index value corresponding to the index symbol; and retrieving additional data based on the activated hyperlink.
44. The method of Claim 43, wherein each hyperlink symbol comprises text and each index symbol comprises a number.
45. The method of Claim 43, wherein each index symbol is displayed in close proximity to its corresponding hyperlink symbol.
46. The method of Claim 43, wherein determining the activated hyperlink includes: determining whether the voice command matches a hyperlink symbol; and if the voice command matches a hyperlink symbol, locating a hyperlink corresponding to the hyperlink symbol and not locating the hyperlink based on the index value corresponding to the index symbol.
47. The method of Claim 43, further comprising: receiving a command to display an index menu; and in response to receiving the command to display an index menu, displaying a list of hyperlink symbols and associated index symbols.
48. A method of presenting electronic data comprising: receiving a document having a plurality of hyperlinks contained therein; displaying the document, wherein the display includes a plurality of hyperlink symbols, each hyperlink symbol representative of a corresponding hyperlink; receiving a voice command to display an index menu; determining an ordered list of hyperlinks from the plurality of hyperlinks, each hyperlink in the ordered list having a corresponding index value representative of the corresponding hyperlink's position in the ordered list of hyperlinks; in response to receiving the command to display an index menu, displaying a list of hyperlink symbols and associated index symbols; receiving a second voice command; determining an activated hyperlink contained within the document, wherein the activated hyperlink corresponds to the second voice command, wherein determining the activated hyperlink includes determining an index symbol matching the second voice command and locating a hyperlink based on the index value corresponding to the index symbol; and retrieving additional data based on the activated hyperlink.
49. The method of Claim 48, wherein each hyperlink symbol comprises text and each index symbol comprises a number.
50. A method for providing a voice-activated browser, the method comprising: generating a voice-recognition grammar at a server computer, the voice-recognition grammar corresponding to a plurality of hyperlink commands corresponding to a Web document; storing the voice-recognition grammar at the server computer; associating the voice-recognition grammar with the Web document; transmitting the Web document and the voice-recognition grammar to a wearable computer;
in response to a voice command at the wearable computer, determining whether the voice command matches a hyperlink command corresponding to the voice-recognition grammar; and if the voice command matches a hyperlink command corresponding to the voice-recognition grammar, invoking the hyperlink command.
51. The method of Claim 50, further comprising: determining an ordered list of hyperlinks from the hyperlink commands, each hyperlink command in the ordered list having a corresponding index value representative of the corresponding hyperlink command's position in the ordered list of hyperlink commands; displaying the document, wherein the display includes a plurality of hyperlink symbols and a plurality of index symbols, each hyperlink symbol representative of a corresponding hyperlink, each index symbol representative of a corresponding hyperlink's index value; and wherein determining whether the voice command matches the hyperlink command includes determining whether the voice command matches an index symbol corresponding to the hyperlink command.
PCT/US1999/006072 1998-03-20 1999-03-19 Voice controlled web browser WO1999048088A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU31045/99A AU3104599A (en) 1998-03-20 1999-03-19 Voice controlled web browser

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US7893798P 1998-03-20 1998-03-20
US60/078,937 1998-03-20

Publications (1)

Publication Number Publication Date
WO1999048088A1 true WO1999048088A1 (en) 1999-09-23

Family

ID=22147129

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/006072 WO1999048088A1 (en) 1998-03-20 1999-03-19 Voice controlled web browser

Country Status (2)

Country Link
AU (1) AU3104599A (en)
WO (1) WO1999048088A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5819220A (en) * 1996-09-30 1998-10-06 Hewlett-Packard Company Web triggered word set boosting for speech interfaces to the world wide web

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"HELPING THE WEB", IEEE SPECTRUM., IEEE INC. NEW YORK., US, vol. 36, no. 03, 1 March 1999 (1999-03-01), US, pages 54 - 59, XP002919110, ISSN: 0018-9235 *
BAYER S: "EMBEDDING SPEECH IN WEB INTERFACES", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGEPROCESSING., XX, XX, vol. 03, 1 October 1996 (1996-10-01), XX, pages 1684 - 1687, XP002919109 *
HEMPHILL C T, THRIFT P R, LINN J C: "SPEECH-AWARE MULTIMEDIA", IEEE MULTIMEDIA., IEEE SERVICE CENTER, NEW YORK, NY., US, no. 01, 1 January 1996 (1996-01-01), US, pages 74 - 78, XP002919107, ISSN: 1070-986X, DOI: 10.1109/93.486706 *
KANEEN E, WYARD P: "A SPOKEN LANGUAGE INTERFACE TO INTERACTIVE MULTIMEDIA SERVICES", IEE COLLOQUIUM ON ADVANCES IN INTERACTIVE VOICE TECHNOLOGIES FORTELECOMMUNICATION SERVICES, IEE, LONDON, GB, 12 June 1997 (1997-06-12), GB, pages 01 - 07, XP002919111 *
ZUE V W: "NAVIGATING THE INFORMATION SUPERHIGHWAY USING SPOKEN LANGUAGE INTERFACES", IEEE EXPERT., IEEE SERVICE CENTER, NEW YORK, NY., US, vol. 10, no. 05, 1 October 1995 (1995-10-01), US, pages 39 - 43, XP002919108, ISSN: 0885-9000, DOI: 10.1109/64.464929 *

Cited By (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6298324B1 (en) 1998-01-05 2001-10-02 Microsoft Corporation Speech recognition system with changing grammars and grammar help command
WO2000029936A1 (en) * 1998-11-12 2000-05-25 Microsoft Corporation Speech recognition system with changing grammars and grammar help command
WO2001028187A1 (en) * 1999-10-08 2001-04-19 Blue Wireless, Inc. Portable browser device with voice recognition and feedback capability
US7203721B1 (en) 1999-10-08 2007-04-10 At Road, Inc. Portable browser device with voice recognition and feedback capability
US7219123B1 (en) * 1999-10-08 2007-05-15 At Road, Inc. Portable browser device with adaptive personalization capability
EP1122636A3 (en) * 2000-02-03 2007-11-14 Siemens Corporate Research, Inc. System and method for analysis, description and voice-driven interactive input to html forms
EP1122636A2 (en) 2000-02-03 2001-08-08 Siemens Corporate Research, Inc. System and method for analysis, description and voice-driven interactive input to html forms
WO2001069592A1 (en) * 2000-03-15 2001-09-20 Bayerische Motoren Werke Aktiengesellschaft Device and method for the speech input of a destination into a destination guiding system by means of a defined input dialogue
US7209884B2 (en) 2000-03-15 2007-04-24 Bayerische Motoren Werke Aktiengesellschaft Speech input into a destination guiding system
GB2362017A (en) * 2000-03-29 2001-11-07 John Pepin Network access
WO2001095087A1 (en) * 2000-06-08 2001-12-13 Interactive Speech Technologies Voice-operated system for controlling a page stored on a server and capable of being downloaded for display on a client device
FR2810125A1 (en) * 2000-06-08 2001-12-14 Interactive Speech Technologie Voice control of server stored page capable of being downloaded for display on clients terminal, server has dictionaries associated with page download, client identifies relevant dictionary(s) and loads it/them into his terminal
KR20020012364A (en) * 2000-08-07 2002-02-16 최중인 Method for electronic commerce using voice web server
US7382770B2 (en) 2000-08-30 2008-06-03 Nokia Corporation Multi-modal content and automatic speech recognition in wireless telecommunication systems
EP1209660A2 (en) * 2000-11-23 2002-05-29 International Business Machines Corporation Voice navigation in web applications
US7146323B2 (en) 2000-11-23 2006-12-05 International Business Machines Corporation Method and system for gathering information by voice input
EP1209660A3 (en) * 2000-11-23 2002-11-20 International Business Machines Corporation Voice navigation in web applications
WO2002044887A2 (en) * 2000-12-01 2002-06-06 The Trustees Of Columbia University In The City Of New York A method and system for voice activating web pages
EP1881685A1 (en) * 2000-12-01 2008-01-23 The Trustees Of Columbia University In The City Of New York A method and system for voice activating web pages
WO2002044887A3 (en) * 2000-12-01 2003-04-24 Univ Columbia A method and system for voice activating web pages
US7640163B2 (en) 2000-12-01 2009-12-29 The Trustees Of Columbia University In The City Of New York Method and system for voice activating web pages
US7228495B2 (en) 2001-02-27 2007-06-05 International Business Machines Corporation Method and system for providing an index to linked sites on a web page for individuals with visual disabilities
WO2002073599A1 (en) * 2001-03-12 2002-09-19 Mediavoice S.R.L. Method for enabling the voice interaction with a web page
EP1246439A1 (en) * 2001-03-26 2002-10-02 Alcatel System and method for voice controlled internet browsing using a permanent D-channel connection
WO2002099786A1 (en) * 2001-06-01 2002-12-12 Nokia Corporation Method and device for multimodal interactive browsing
KR20030027359A (en) * 2001-09-28 2003-04-07 박기철 Method and System for interworking between voice-browser and existing web-browser
US9202467B2 (en) 2003-06-06 2015-12-01 The Trustees Of Columbia University In The City Of New York System and method for voice activating web pages
CN100424630C (en) * 2004-03-26 2008-10-08 宏碁股份有限公司 Operation method of web page speech interface
US8768711B2 (en) 2004-06-17 2014-07-01 Nuance Communications, Inc. Method and apparatus for voice-enabling an application
US9083798B2 (en) 2004-12-22 2015-07-14 Nuance Communications, Inc. Enabling voice selection of user preferences
EP1729284A1 (en) * 2005-05-30 2006-12-06 International Business Machines Corporation Method and systems for a accessing data by spelling discrimination letters of link names
CN100444097C (en) * 2005-06-16 2008-12-17 国际商业机器公司 Displaying available menu choices in a multimodal browser
EP1899952A4 (en) * 2005-07-07 2009-07-22 Enable Inc V System and method for searching for network-based content in a multi-modal system using spoken keywords
EP1899952A2 (en) * 2005-07-07 2008-03-19 V-Enable, Inc. System and method for searching for network-based content in a multi-modal system using spoken keywords
US8781840B2 (en) 2005-09-12 2014-07-15 Nuance Communications, Inc. Retrieval and presentation of network service results for mobile device using a multimodal browser
US9208785B2 (en) 2006-05-10 2015-12-08 Nuance Communications, Inc. Synchronizing distributed speech recognition
US9292183B2 (en) 2006-09-11 2016-03-22 Nuance Communications, Inc. Establishing a preferred mode of interaction between a user and a multimodal application
US9343064B2 (en) 2006-09-11 2016-05-17 Nuance Communications, Inc. Establishing a multimodal personality for a multimodal application in dependence upon attributes of user interaction
US8600755B2 (en) 2006-09-11 2013-12-03 Nuance Communications, Inc. Establishing a multimodal personality for a multimodal application in dependence upon attributes of user interaction
US8862471B2 (en) 2006-09-12 2014-10-14 Nuance Communications, Inc. Establishing a multimodal advertising personality for a sponsor of a multimodal application
US8612230B2 (en) 2007-01-03 2013-12-17 Nuance Communications, Inc. Automatic speech recognition with a selection list
US8938392B2 (en) 2007-02-27 2015-01-20 Nuance Communications, Inc. Configuring a speech engine for a multimodal application based on location
US9208783B2 (en) 2007-02-27 2015-12-08 Nuance Communications, Inc. Altering behavior of a multimodal application based on location
US8843376B2 (en) 2007-03-13 2014-09-23 Nuance Communications, Inc. Speech-enabled web content searching using a multimodal browser
US9123337B2 (en) 2007-03-20 2015-09-01 Nuance Communications, Inc. Indexing digitized speech with words represented in the digitized speech
US8706490B2 (en) 2007-03-20 2014-04-22 Nuance Communications, Inc. Indexing digitized speech with words represented in the digitized speech
US8909532B2 (en) 2007-03-23 2014-12-09 Nuance Communications, Inc. Supporting multi-lingual user interaction with a multimodal application
US8862475B2 (en) 2007-04-12 2014-10-14 Nuance Communications, Inc. Speech-enabled content navigation and control of a distributed multimodal browser
US9396721B2 (en) 2008-04-24 2016-07-19 Nuance Communications, Inc. Testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise
US9349367B2 (en) 2008-04-24 2016-05-24 Nuance Communications, Inc. Records disambiguation in a multimodal application operating on a multimodal device
US9076454B2 (en) 2008-04-24 2015-07-07 Nuance Communications, Inc. Adjusting a speech engine for a mobile computing device based on background noise
EP2182452A1 (en) * 2008-10-29 2010-05-05 LG Electronics Inc. Mobile terminal and control method thereof
US9129011B2 (en) 2008-10-29 2015-09-08 Lg Electronics Inc. Mobile terminal and control method thereof
GB2467451B (en) * 2009-06-30 2011-06-01 Saad Ul Haq Discrete voice command navigator
GB2467451A (en) * 2009-06-30 2010-08-04 Saad Ul Haq Voice activated launching of hyperlinks using discrete characters or letters
US20110010180A1 (en) * 2009-07-09 2011-01-13 International Business Machines Corporation Speech Enabled Media Sharing In A Multimodal Application
US8510117B2 (en) * 2009-07-09 2013-08-13 Nuance Communications, Inc. Speech enabled media sharing in a multimodal application
EP2518722A3 (en) * 2011-04-28 2013-08-28 Samsung Electronics Co., Ltd. Method for providing link list and display apparatus applying the same
CN103136285A (en) * 2011-12-05 2013-06-05 英顺源(上海)科技有限公司 Translation query and operation system used for handheld device and method thereof
EP3401797A1 (en) * 2017-05-12 2018-11-14 Samsung Electronics Co., Ltd. Speech navigation for multilingual web pages
US10802851B2 (en) 2017-05-12 2020-10-13 Samsung Electronics Co., Ltd. Display apparatus and controlling method thereof
US11726806B2 (en) 2017-05-12 2023-08-15 Samsung Electronics Co., Ltd. Display apparatus and controlling method thereof
US11594218B2 (en) * 2020-09-18 2023-02-28 Servicenow, Inc. Enabling speech interactions on web-based user interfaces
CN116340685A (en) * 2023-03-28 2023-06-27 广东保伦电子股份有限公司 Webpage generating method and system based on voice
CN116340685B (en) * 2023-03-28 2024-01-30 广东保伦电子股份有限公司 Webpage generating method and system based on voice

Also Published As

Publication number Publication date
AU3104599A (en) 1999-10-11

Similar Documents

Publication Publication Date Title
WO1999048088A1 (en) Voice controlled web browser
JP3432076B2 (en) Voice interactive video screen display system
US6311177B1 (en) Accessing databases when viewing text on the web
US7406659B2 (en) Smart links
US6829746B1 (en) Electronic document delivery system employing distributed document object model (DOM) based transcoding
US7054952B1 (en) Electronic document delivery system employing distributed document object model (DOM) based transcoding and providing interactive javascript support
US6456974B1 (en) System and method for adding speech recognition capabilities to java
US6725424B1 (en) Electronic document delivery system employing distributed document object model (DOM) based transcoding and providing assistive technology support
US6088675A (en) Auditorially representing pages of SGML data
US7212971B2 (en) Control apparatus for enabling a user to communicate by speech with a processor-controlled apparatus
US8572209B2 (en) Methods and systems for authoring of mixed-initiative multi-modal interactions and related browsing mechanisms
US7240006B1 (en) Explicitly registering markup based on verbal commands and exploiting audio context
US6988240B2 (en) Methods and apparatus for low overhead enhancement of web page and markup language presentations
US9083798B2 (en) Enabling voice selection of user preferences
US7197462B2 (en) System and method for information access
US20020077823A1 (en) Software development systems and methods
US7756849B2 (en) Method of searching for text in browser frames
US7487453B2 (en) Multi-modal content presentation
EP0814414A2 (en) Embedding sound in web pages
EP2085963A1 (en) System and method for bilateral communication between a user and a system
US20020143821A1 (en) Site mining stylesheet generator
US20040025115A1 (en) Method, terminal, browser application, and mark-up language for multimodal interaction between a user and a terminal
JPH10275162A (en) Radio voice actuation controller controlling host system based upon processor
JP2004310748A (en) Presentation of data based on user input
WO2001050257A2 (en) Incorporating non-native user interface mechanisms into a user interface

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase