WO2003058938A1 - Information retrieval system including voice browser and data conversion server - Google Patents

Information retrieval system including voice browser and data conversion server

Info

Publication number
WO2003058938A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
protocol
content
formatted
voice
Prior art date
Application number
PCT/US2002/041383
Other languages
French (fr)
Inventor
Dipanshu Sharma
Sunil Kumar
Chandra Kholia
Original Assignee
V-Enable, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/040,525 external-priority patent/US20030125953A1/en
Application filed by V-Enable, Inc. filed Critical V-Enable, Inc.
Priority to AU2002364014A priority Critical patent/AU2002364014A1/en
Publication of WO2003058938A1 publication Critical patent/WO2003058938A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Definitions

  • the present invention relates to the field of browsers used for accessing data in distributed computing environments and, in particular, to techniques for accessing such data using Web browsers controlled at least in part through voice commands.
  • HTTP Hypertext Transfer Protocol
  • HTML Hypertext Markup Language
  • HTML provides document formatting allowing the developer to specify links to other servers in the network.
  • a Uniform Resource Locator (URL) defines the path to a Web site hosted by a particular Web server.
  • the pages of Web sites are typically accessed using an HTML-compatible browser (e.g., Netscape Navigator or Internet Explorer) executing on a client machine.
  • the browser specifies a link to a Web server and particular Web page using a URL.
  • the client issues a request to a naming service to map a hostname in the URL to a particular network IP address at which the server is located.
  • the naming service returns a list of one or more IP addresses that can respond to the request.
  • the browser establishes a connection to a Web server. If the Web server is available, it returns a document or other object formatted according to HTML.
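  • The lookup sequence described above (URL, then naming service, then a list of candidate IP addresses) can be sketched as follows. This is a minimal illustration rather than anything taken from the patent: the function names `host_from_url` and `resolve` are hypothetical, the default port of 80 is an assumption, and the resolver is injectable so the sketch can be exercised without network access.

```python
import socket
from urllib.parse import urlsplit


def host_from_url(url: str) -> str:
    """Extract the hostname that the naming service will be asked to resolve."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    return host


def resolve(url: str, lookup=socket.getaddrinfo) -> list[str]:
    """Return the distinct IP addresses reported for the URL's host, in order."""
    host = host_from_url(url)
    port = urlsplit(url).port or 80  # assumption: plain HTTP when no port is given
    infos = lookup(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        ip = info[4][0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

  • A browser would then attempt a connection to one of the returned addresses; the sketch stops at the lookup because that is the step the text spells out.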
  • Client devices differ in their display capabilities, e.g., monochrome, color, different color palettes, resolution, sizes. Such devices also vary with regard to the peripheral devices that may be used to provide input signals or commands (e.g., mouse and keyboard, touch sensor, remote control for a TV set-top box). Furthermore, the browsers executing on such client devices can vary in the languages supported (e.g., HTML, dynamic HTML, XML, Java, JavaScript). Because of these differences, the experience of browsing the same Web page may differ dramatically depending on the type of client device employed. The inability to adjust the display of Web pages based upon a client's capabilities and environment causes a number of problems.
  • a Web site may simply be incapable of servicing a particular set of clients, or may make the Web browsing experience confusing or unsatisfactory in some way. Even if the developers of a Web site have made an effort to accommodate a range of client devices, the code for the Web site may need to be duplicated for each client environment. Duplicated code consequently increases the maintenance cost for the Web site. In addition, different URLs are frequently required to be known in order to access the Web pages formatted for specific types of client devices.
  • Such non-visual Web browsers or "voice browsers" present audio output to a user by converting the text of Web pages to speech and by playing pre-recorded Web audio files from the Web.
  • a voice browser also permits a user to navigate between Web pages by following hypertext links, as well as to choose from a number of pre-defined links, or "bookmarks" to selected Web pages.
  • certain voice browsers permit users to pause and resume the audio output by the browser.
  • VoiceXML Voice extensible Markup Language
  • VoiceXML defines an audio interface through which users may interact with Web content, similar to the manner in which the Hypertext Markup Language ("HTML") specifies the visual presentation of such content.
  • VoiceXML includes intrinsic constructs for tasks such as dialogue flow, grammars, call transfers, and embedding audio files.
  • the VoiceXML standard generally contemplates that VoiceXML-compliant voice browsers interact exclusively with Web content of the VoiceXML format. This has limited the utility of existing VoiceXML-compliant voice browsers, since a relatively small percentage of Web sites include content formatted in accordance with
  • WML Wireless Markup Language
  • WAP Wireless Application Protocol
  • HDML Handheld Device Markup Language
  • the present invention relates to a method for retrieving information from remote information sources.
  • the inventive method contemplates transmitting a user request over a communication link to a voice browser operative in accordance with a voice-based protocol.
  • a browsing request identifying a remote information source corresponding to the user request is generated.
  • Content formatted in accordance with a predefined protocol is then retrieved from the remote information source in accordance with the browsing request.
  • the retrieved content is converted into a file of information formatted in compliance with the voice-based protocol.
  • a response is then provided to the user request on the basis of the file of converted information.
  • the present invention also pertains to a system for retrieving information from remote information sources.
  • the system includes a voice browser operating in accordance with a voice-based protocol.
  • the voice browser is disposed to receive a user request transmitted over a communication link and to generate a browsing request in response to the user request.
  • the system further includes a conversion server in communication with the voice browser.
  • the conversion server includes a retrieval module for retrieving content from a remote information source in accordance with the browsing request.
  • the retrieved content is formatted in accordance with a predefined protocol, and is converted by a conversion module of the conversion server into a file of converted information compliant with the voice-based protocol.
  • the file of converted information is then provided to the voice browser through an interface of the conversion server.
  • the present invention is directed to a conversion server responsive to browsing requests issued by a browser unit operative in accordance with a first protocol.
  • the conversion server includes a retrieval module for retrieving web page information from a web site in accordance with a first browsing request issued by the browsing unit.
  • the retrieved web page information is formatted in accordance with a second protocol different from the first protocol.
  • a conversion module serves to convert at least a primary portion of the web page information into a primary file of converted information compliant with the first protocol.
  • the conversion server also includes an interface module for providing said primary file of converted information to said browsing unit.
  • the present invention also relates to a method for facilitating browsing of the Internet.
  • the method includes receiving a browsing request from a browser unit operative in accordance with a first protocol, wherein the browsing request is issued by the browser unit in response to a first user request for web content.
  • Web page information formatted in accordance with a second protocol different from said first protocol, is retrieved from a web site in accordance with the browsing request.
  • the method further includes converting at least a primary portion of the web page information into a primary file of converted information compliant with the first protocol.
  • FIG. 1 provides a schematic diagram of a system for accessing Web content using a voice browser system in accordance with the present invention.
  • FIG. 2 shows a block diagram of a voice browser included within the system of FIG. 1.
  • FIG. 3 is a functional block diagram of a conversion server included within the voice browser system of the present invention.
  • FIG. 4 is a flow chart representative of operation of the system of the present invention in furnishing Web content to a requesting user.
  • FIG. 5 is a flow chart representative of operation of the system of the present invention in providing content from a proprietary database to a requesting user.
  • FIG. 6 is a flow chart representative of operation of the conversion server.
  • FIGS. 7A and 7B are collectively a flowchart illustrating an exemplary process for transcoding a parse tree representation of a WML-based document into an output document comporting with the VoiceXML protocol.
  • FIG. 1 provides a schematic diagram of a system 100 for accessing Web content using a voice browser in accordance with the present invention.
  • the system 100 includes a telephonic subscriber unit 102 in communication with a voice browser 110 through a telecommunications network 120.
  • the voice browser 110 executes dialogues with a user of the subscriber unit 102 on the basis of document files comporting with a known speech mark-up language (e.g., VoiceXML).
  • the voice browser 110 initiates, in response to requests for content submitted through the subscriber unit 102, the retrieval of information forming the basis of certain such document files from remote information sources.
  • Such remote information sources may comprise, for example, Web servers 140 and one or more databases represented by proprietary database 142.
  • the voice browser 110 initiates such retrieval by issuing a browsing request either directly to the applicable remote information source or to a conversion server 150.
  • the request for content pertains to a remote information source operative in accordance with the protocol applicable to the voice browser 110 (e.g., VoiceXML)
  • the voice browser 110 issues a browsing request directly to the remote information source of interest.
  • the request for content pertains to a Web site formatted consistently with the protocol of the voice browser 110
  • a document file containing such content is requested by the voice browser 110 via the Internet 130 directly from the Web server 140 hosting the Web site of interest.
  • the voice browser 110 issues a corresponding browsing request to a conversion server 150.
  • the conversion server 150 retrieves content from the Web server 140 hosting the Web site of interest and converts this content into a document file compliant with the protocol of the voice browser 110.
  • the converted document file is then provided by the conversion server 150 to the voice browser 110, which then uses this file to effect a dialogue conforming to the applicable voice-based protocol with the user of subscriber unit 102.
  • the voice browser 110 issues a corresponding browsing request to the conversion server 150.
  • the conversion server 150 retrieves content from the proprietary database 142 and converts this content into a document file compliant with the protocol of the voice browser 110.
  • the converted document file is then provided to the voice browser 110 and used as the basis for carrying out a dialogue with the user of subscriber unit 102.
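  • The routing decision described above (direct retrieval when the source already uses the voice browser's protocol, otherwise a request to the conversion server 150) might be expressed as below. This is a sketch under stated assumptions: the `BrowsingRequest` type, the `route` function and the placeholder host `conversion-server.example` are all hypothetical, and VoiceXML is assumed to be the voice browser's protocol.

```python
from dataclasses import dataclass

VOICE_PROTOCOL = "VoiceXML"  # assumption: the protocol the voice browser interprets


@dataclass
class BrowsingRequest:
    target: str           # host that receives the browsing request
    source: str           # remote information source holding the content
    via_conversion: bool  # True when the conversion server must transcode


def route(source: str, source_format: str,
          conversion_server: str = "conversion-server.example") -> BrowsingRequest:
    """Send the request directly when formats match, else via the conversion server."""
    if source_format == VOICE_PROTOCOL:
        return BrowsingRequest(target=source, source=source, via_conversion=False)
    return BrowsingRequest(target=conversion_server, source=source, via_conversion=True)
```

  • The same branch covers both Web servers 140 and the proprietary database 142, since in either case only the format of the source decides the route.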
  • the subscriber unit 102 is in communication with the voice browser 110 via the telecommunications network 120.
  • the subscriber unit 102 has a keypad (not shown) and associated circuitry for generating Dual Tone MultiFrequency (DTMF) tones.
  • the subscriber unit 102 transmits DTMF tones to, and receives audio output from, the voice browser 110 via the telecommunications network 120.
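  • For readers unfamiliar with DTMF, each key is signalled by one low-group and one high-group frequency sounded together. The sketch below uses the standard keypad frequency grid; the function names and the 8 kHz sample rate are illustrative assumptions, and nothing here is taken from the patent itself.

```python
import math

LOW_HZ = (697, 770, 852, 941)       # one frequency per keypad row
HIGH_HZ = (1209, 1336, 1477, 1633)  # one frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair in Hz for a single keypad key."""
    if len(key) != 1:
        raise ValueError("expected exactly one key")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return LOW_HZ[row], HIGH_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.05, rate_hz: int = 8000) -> list[float]:
    """Synthesize the tone as the equal-weight sum of its two sinusoids."""
    low, high = dtmf_pair(key)
    count = round(duration_s * rate_hz)
    return [
        0.5 * math.sin(2 * math.pi * low * n / rate_hz)
        + 0.5 * math.sin(2 * math.pi * high * n / rate_hz)
        for n in range(count)
    ]
```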
  • the subscriber unit 102 is exemplified with a mobile station and the telecommunications network 120 is represented as including a mobile communications network and the Public
  • the voice-based information retrieval services offered by the system 100 can be accessed by subscribers through a variety of other types of devices and networks.
  • the voice browser 110 may be accessed through the PSTN from, for example, a stand-alone telephone 104 (either analog or digital), or from a node on a PBX (not shown).
  • a personal computer 106 or other handheld or portable computing device disposed for voice over IP communication may access the voice browser 110 via the Internet 130.
  • FIG. 2 shows a block diagram of the voice browser 110.
  • the voice browser 110 includes certain standard server computer components, including a network connection device 202, a CPU 204 and memory (primary and/or secondary) 206.
  • the voice browser 110 also includes telephony infrastructure 226 for effecting communication with telephony-based subscriber units (e.g., the mobile subscriber unit 102 and landline telephone 104).
  • the memory 206 stores a set of computer programs to implement the processing effected by the voice browser 110.
  • One such program stored by memory 206 comprises a standard communication program 208 for conducting standard network communications via the Internet 130 with the conversion server 150 and any subscriber units operating in a voice over IP mode (e.g., personal computer 106).
  • the memory 206 also stores a voice browser interpreter 200 and an interpreter context module 210.
  • In response to requests from, for example, subscriber unit 102 for Web or proprietary database content formatted inconsistently with the protocol of the voice browser 110, the voice browser interpreter 200 initiates establishment of a communication channel via the Internet 130 with the conversion server 150.
  • the voice browser 110 then issues, over this communication channel and in accordance with conventional Internet protocols (i.e., HTTP and TCP/IP), browsing requests to the conversion server 150 corresponding to the requests for content submitted by the requesting subscriber unit.
  • the conversion server 150 retrieves the requested Web or proprietary database content in response to such browsing requests and converts the retrieved content into document files in a format (e.g., VoiceXML) comporting with the protocol of the voice browser 110.
  • the converted document files are then provided to the voice browser 110 over the established Internet communication channel and utilized by the voice browser interpreter 200 in carrying out a dialogue with a user of the requesting unit.
  • the interpreter context module 210 uses conventional techniques to identify requests for help and the like which may be made by the user of the requesting subscriber unit.
  • the interpreter context module 210 may be disposed to identify predefined "escape" phrases submitted by the user in order to access menus relating to, for example, help functions or various user preferences (e.g., volume, text-to-speech characteristics).
  • audio content is transmitted and received by telephony infrastructure 226 under the direction of a set of audio processing modules 228.
  • the audio processing modules 228 include a text-to-speech ("TTS") converter 230, an audio file player 232, and a speech recognition module 234.
  • the telephony infrastructure 226 is responsible for detecting an incoming call from a telephony-based subscriber unit and for answering the call (e.g., by playing a predefined greeting). After a call from a telephony-based subscriber unit has been answered, the voice browser interpreter 200 assumes control of the dialogue with the telephony-based subscriber unit via the audio processing modules 228.
  • audio requests from telephony-based subscriber units are parsed by the speech recognition module 234 and passed to the voice browser interpreter 200.
  • the voice browser interpreter 200 communicates information to telephony-based subscriber units through the text-to-speech converter 230.
  • the telephony infrastructure 226 also receives audio signals from telephony-based subscriber units via the telecommunications network 120 in the form of DTMF signals.
  • the telephony infrastructure 226 is able to detect and interpret the DTMF tones sent from telephony-based subscriber units. Interpreted DTMF tones are then transferred from the telephony infrastructure to the voice browser interpreter 200.
  • After the voice browser interpreter 200 has retrieved a VoiceXML document from the conversion server 150 in response to a request from a subscriber unit, the retrieved VoiceXML document forms the basis for the dialogue between the voice browser 110 and the requesting subscriber unit.
  • text and audio file elements stored within the retrieved VoiceXML document are converted into audio streams in text-to-speech converter 230 and audio file player 232, respectively.
  • the streams are transferred to the telephony infrastructure 226 for adaptation and transmission via the telecommunications network 120 to such subscriber unit.
  • In the case of requests for content from Internet-based subscriber units (e.g., the personal computer 106), the streams are adapted and transmitted by the network connection device 202.
  • the voice browser interpreter 200 interprets each retrieved VoiceXML document in a manner analogous to the manner in which a standard Web browser interprets a visual markup language, such as HTML or WML.
  • the voice browser interpreter 200 interprets scripts written in a speech markup language such as VoiceXML rather than a visual markup language.
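  • The division of labour described above, in which text goes to the text-to-speech converter 230 and audio file elements go to the audio file player 232, can be illustrated with a small dispatcher. This is only a sketch: `render_plan` is a hypothetical name, the sample document is invented, and a real interpreter handles far more of VoiceXML than the `prompt` and `audio` elements considered here.

```python
import xml.etree.ElementTree as ET


def render_plan(vxml: str) -> list[tuple[str, str]]:
    """List, in document order, what to synthesize ("tts") and what to play ("audio")."""
    root = ET.fromstring(vxml)
    plan: list[tuple[str, str]] = []

    def say(text):
        # Collapse whitespace and skip fragments that contain no words.
        if text and text.strip():
            plan.append(("tts", " ".join(text.split())))

    for prompt in root.iter("prompt"):
        say(prompt.text)
        for child in prompt:
            if child.tag == "audio":
                plan.append(("audio", child.get("src", "")))
            else:
                say("".join(child.itertext()))
            say(child.tail)  # text that follows the child element
    return plan
```

  • Each `("tts", ...)` entry would be handed to the speech synthesizer and each `("audio", ...)` entry to the file player, with the resulting streams then passed to the telephony infrastructure or network connection device.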
  • the voice browser 110 may be realized using, consistent with the teachings herein, a voice browser licensed from, for example, Nuance Communications of Menlo Park, California.
  • In FIG. 3, a functional block diagram is provided of the conversion server 150.
  • the conversion server 150 operates to convert or transcode conventional structured document formats (e.g., HTML) into the format applicable to the voice browser 110 (e.g., VoiceXML).
  • This conversion is generally effected by performing a predefined mapping of the syntactical elements of conventional structured documents harvested from Web servers 140 into corresponding equivalent elements contained within an XML-based file formatted in accordance with the protocol of the voice browser 110.
  • the resultant XML-based file may include all or part of the "target" structured document harvested from the applicable Web server 140, and may also optionally include additional content provided by the conversion server 150.
  • the target document is parsed, and identified tags, styles and content can either be replaced or removed.
  • the conversion server 150 may be physically implemented using a standard configuration of hardware elements including a CPU 314, a memory 316, and a network interface 310 operatively connected to the Internet 130. Similar to the voice browser 110, the memory 316 stores a standard communication program 318 to realize standard network communications via the Internet 130. In addition, the communication program 318 controls communication occurring between the conversion server 150 and the proprietary database 142 by way of database interface 332. As is discussed below, the memory 316 also stores a set of computer programs to implement the content conversion process performed by the conversion server 150.
  • the memory 316 includes a retrieval module 324 for controlling retrieval of content from Web servers 140 and proprietary database 142 in accordance with browsing requests received from the voice browser 110.
  • In the case of requests for content from Web servers 140, such content is retrieved via network interface 310 from Web pages formatted in accordance with protocols particularly suited to portable, handheld or other devices having limited display capability (e.g., WML, Compact HTML, xHTML and HDML).
  • the locations or URLs of such specially formatted sites may be provided by the voice browser or may be stored within a URL database 320 of the conversion server 150.
  • the voice browser 110 may specify the URL for the version of the "CNET" site accessed by WAP-compliant devices (i.e., comprised of WML-formatted pages).
  • the voice browser 110 could simply proffer a generic request for content from the "CNET" site to the conversion server
  • the memory 316 of conversion server 150 also includes a conversion module 330 operative to convert the content collected under the direction of retrieval module 324 from Web servers 140 or the proprietary database 142 into corresponding VoiceXML documents.
  • the retrieved content is parsed by a parser 340 of conversion module 330 in accordance with a document type definition ("DTD") corresponding to the format of such content.
  • the parser 340 would parse the retrieved content using a DTD obtained from the applicable standards body, i.e., the Wireless Application Protocol Forum, Ltd. (www.wapforum.org) into a parsed file.
  • a DTD establishes a set of constraints for an XML-based document; that is, a DTD defines the manner in which an XML-based document is constructed.
  • the resultant parsed file is generally in the form of a Document Object Model ("DOM") representation, which is arranged in a tree-like hierarchical structure composed of a plurality of interconnected nodes (i.e., a "parse tree").
  • the parse tree includes a plurality of "child" nodes descending downward from its root node, each of which is recursively examined and processed in the manner described below.
  • a mapping module 350 within the conversion module 330 then traverses the parse tree and applies predefined conversion rules 363 to the elements and associated attributes at each of its nodes. In this way the mapping module 350 creates a set of corresponding equivalent elements and attributes conforming to the protocol of the voice browser 110.
  • a converted document file (e.g., a VoiceXML document file) is then generated by supplementing these equivalent elements and attributes with grammatical terms to the extent required by the protocol of the voice browser 110. This converted document file is then provided to the voice browser 110 via the network interface 310 in response to the browsing request originally issued by the voice browser 110.
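  • The parse, traverse and map sequence described above can be sketched with a recursive walk over a parse tree. The rule table below is hypothetical and deliberately tiny; it stands in for the conversion rules of the mapping module 350 and is not the patent's actual mapping, and the names `RULES`, `KEPT_ATTRIBUTES`, `convert` and `transcode` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion rules: WML tag -> VoiceXML tag.
RULES = {"wml": "vxml", "card": "form", "p": "block"}
# Hypothetical attribute rules: per source tag, source attribute -> target attribute.
KEPT_ATTRIBUTES = {"card": {"id": "id"}}


def convert(node: ET.Element):
    """Recursively map one parse-tree node; return None when no rule applies."""
    target = RULES.get(node.tag)
    if target is None:
        return None  # unmapped elements (e.g. images) are dropped in this sketch
    out = ET.Element(target)
    for src, dst in KEPT_ATTRIBUTES.get(node.tag, {}).items():
        if src in node.attrib:
            out.set(dst, node.attrib[src])
    if node.text and node.text.strip():
        out.text = node.text.strip()
    for child in node:  # examine each child node in turn
        mapped = convert(child)
        if mapped is not None:
            out.append(mapped)
    return out


def transcode(wml_text: str) -> str:
    """Parse WML text and return the mapped document serialized as a string."""
    root = convert(ET.fromstring(wml_text))
    if root is None:
        raise ValueError("root element has no conversion rule")
    root.set("version", "2.0")
    return ET.tostring(root, encoding="unicode")
```

  • For simplicity the sketch ignores text that trails a child element and performs no DTD validation; a fuller converter would also add the grammar-related elements that the voice browser's protocol requires.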
  • the conversion module 330 is preferably a general purpose converter capable of transforming the above-described structured document content (e.g., WML) into corresponding VoiceXML documents.
  • the resultant VoiceXML content can then be delivered to users via any VoiceXML-compliant platform, thereby introducing a voice capability into existing structured document content.
  • a basic set of rules can be imposed to simplify the conversion of the structured document content into the VoiceXML format.
  • The rules applied by the conversion module 330 may comprise the following:
  • the conversion module 330 will discard the images and generate the necessary information for presenting the image.
  • the conversion module 330 may generate appropriate warning messages or the like.
  • the warning message will typically inform the user that the structured content contains a script or some component not capable of being converted to voice and that meaningful information may not be being conveyed to the user.
  • When the structured document content contains instructions similar or identical to those such as the WML-based SELECT LIST options, the conversion module 330 generates information for presenting the SELECT LIST or similar options into a menu list for audio representation. For example, an audio playback of "Please say news weather mail" could be generated for the SELECT LIST defining the three options of news, weather and mail.
  • Any hyperlinks in the structured document content are converted to reference the conversion module 330, and the actual link location passed to the conversion module as a parameter to the referencing hyperlink. In this way hyperlinks and other commands which transfer control may be voice-activated and converted to an appropriate voice-based format upon request.
  • Input fields within the structured content are converted to an active voice-based dialogue, and the appropriate commands and vocabulary added as necessary to process them.
  • Multiple screens of structured content can be directly converted by the conversion module 330 into forms or menus of sequential dialogs.
  • Each menu is a stand-alone component (e.g., performing a complete task such as receiving input data).
  • the conversion module 330 may also include a feature that permits a user to interrupt the audio output generated by a voice platform (e.g., BeVocal, HeyAnita) prior to issuing a new command or input.
  • voice-activated commands may be employed to straightforwardly effect such actions.
  • the conversion module 330 operates to convert an entire page of structured content at once and to play the entire page in an uninterrupted manner. This enables relatively lengthy structured documents to be presented without the need for user intervention in the form of an audible "More" command or the equivalent.
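  • Of the conversion rules listed above, the SELECT LIST rule is the easiest to make concrete. The sketch below turns a WML `select` element into a VoiceXML `menu` whose prompt reads out the options, reproducing the "Please say news weather mail" example. The function name `select_to_menu`, the sample markup and the use of each option's `onpick` attribute as the choice target are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def select_to_menu(select_xml: str) -> ET.Element:
    """Build a VoiceXML menu whose prompt and choices mirror a WML select list."""
    select = ET.fromstring(select_xml)
    options = list(select.iter("option"))
    # Spoken label for each option: its text content, whitespace-collapsed, lowercased.
    labels = [" ".join("".join(o.itertext()).split()).lower() for o in options]

    menu = ET.Element("menu")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Please say " + " ".join(labels)
    for option, label in zip(options, labels):
        # Assumption: the option's onpick URL becomes the target of the choice.
        choice = ET.SubElement(menu, "choice", {"next": option.get("onpick", "")})
        choice.text = label
    return menu
```

  • In a full converter the `next` targets would themselves be rewritten to point back at the conversion server, in line with the hyperlink rule above.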
  • FIG. 4 is a flow chart representative of an exemplary process 400 executed by the system 100 in providing content from Web servers 140 to a user of a subscriber unit.
  • the user of the subscriber unit places a call to the voice browser 110, which will then typically identify the originating user utilizing known techniques (step 404).
  • the voice browser retrieves a start page associated with such user, and initiates execution of an introductory dialogue with the user such as, for example, the dialogue set forth below (step 408).
  • the designation "C” identifies the phrases generated by the voice browser 110 and conveyed to the user's subscriber unit
  • the designation "U" identifies the words spoken or actions taken by such user.
  • the voice browser 110 may directly retrieve content from the Web server 140 hosting the requested Web site (e.g., "vxml.cnet.com") in a manner consistent with the applicable voice-based protocol (step 416). If the format of the requested Web site (e.g., "cnet.com") is inconsistent with the format of the voice browser 110, then the intelligence of the voice browser 110 influences the course of subsequent processing. Specifically, in the case where the voice browser 110 maintains a database (not shown) of Web sites having formats similar to its own (step 420), then the voice browser 110 forwards the identity of such similarly formatted site (e.g., "wap.cnet.com") to the conversion server 150 via the Internet 130 in the manner described below (step 424). If such a database is not maintained by the voice browser 110, then in a step 428 the identity of the requested Web site itself (e.g., "cnet.com") is similarly forwarded to the conversion server 150 via the Internet 130. In the latter case the conversion server 150 will recognize that the format of the requested Web site (e.g., HTML) is dissimilar from the protocol of the voice browser 110, and will then access the URL database 320 in order to determine whether there exists a version of the requested Web site of a format (e.g., WML) more easily convertible into the protocol of the voice browser 110.
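  • The lookup for a more easily convertible version of a site might look like the following. The database contents, the preference order and the HTML fallback are all assumptions made for illustration; only the host names "cnet.com" and "wap.cnet.com" come from the text.

```python
# Hypothetical contents for a URL database such as the URL database 320.
URL_DATABASE = {
    "cnet.com": {"WML": "wap.cnet.com"},
}

# Assumed preference order: formats treated as easier to convert come first.
PREFERRED_FORMATS = ("WML", "HDML")


def convertible_version(site: str, database=URL_DATABASE,
                        preferred=PREFERRED_FORMATS) -> tuple[str, str]:
    """Return (host, format) for the most convertible known version of a site."""
    versions = database.get(site, {})
    for fmt in preferred:
        if fmt in versions:
            return versions[fmt], fmt
    # No better version is known: fall back to the site as requested (assumed HTML).
    return site, "HTML"
```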
  • the voice browser 110 is disposed to use substantially the same syntactical elements in requesting the conversion server 150 to obtain content from Web sites not formatted in conformance with the applicable voice-based protocol as are used in requesting content from Web sites compliant with the protocol of the voice browser 110.
  • the voice browser 110 may issue requests to Web servers 140 compliant with the VoiceXML protocol using, for example, the syntactical elements goto, choice, link and submit.
  • the voice browser 110 may be configured to request the conversion server 150 to obtain content from inconsistently formatted Web sites using these same syntactical elements.
  • the voice browser 110 could be configured to issue the following type of goto when requesting Web content through the conversion server 150:
  • variable ConSeverAddress within the next attribute of the goto element is set to the IP address of the conversion server 150
  • the variable Filename is set to the name of a conversion script (e.g., conversion.jsp) stored on the conversion server 150
  • the variable ContentAddress is used to specify the destination URL (e.g., "wap.cnet.com") of the Web server 140 of interest
  • the variable Protocol identifies the format (e.g., WAP) of such content server.
  • the conversion script is typically embodied in a file of conventional format
  • Web content is retrieved from the applicable Web server 140 and converted by the conversion script into the VoiceXML format per the conversion process described below.
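  • The goto example itself did not survive in this copy of the text, so the following is a reconstruction in spirit only. It assembles a `next` URL from the four variables named above (ConSeverAddress, Filename, ContentAddress, Protocol). The URL layout and the query parameter names `url` and `protocol` are assumptions, not the patent's syntax, and 192.0.2.5 is a documentation-only placeholder address.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def conversion_url(con_server_address: str, filename: str,
                   content_address: str, protocol: str) -> str:
    """Build a URL that asks the conversion script to fetch and convert a site."""
    # Assumption: the target site and its format travel as query parameters.
    query = urlencode({"url": content_address, "protocol": protocol})
    return f"http://{con_server_address}/{filename}?{query}"


def goto_element(con_server_address: str, filename: str,
                 content_address: str, protocol: str) -> str:
    """Serialize a VoiceXML goto element whose next attribute targets the conversion server."""
    next_url = conversion_url(con_server_address, filename, content_address, protocol)
    return ET.tostring(ET.Element("goto", {"next": next_url}), encoding="unicode")
```

  • The same URL construction would serve for rewriting hyperlinks so that they reference the conversion server with the real link location passed as a parameter.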
  • the voice browser 110 may also request Web content from the conversion server
  • the choice element is utilized to define potential user responses to queries posed within a menu construct.
  • the menu construct provides a mechanism for prompting a user to make a selection, with control over subsequent dialogue with the user being changed on the basis of the user's selection.
  • the voice browser 110 may also request Web content from the conversion server 150 using the link element, which may be defined in a VoiceXML document as a child of the vxml or form constructs.
  • the link element may be defined in a VoiceXML document as a child of the vxml or form constructs.
  • An example of such a request based upon a link element is set forth below:
  • the submit element is similar to the goto element in that its execution results in procurement of a specified VoiceXML document. However, the submit element also enables an associated list of variables to be submitted to the identified Web server 140 by way of an HTTP GET or POST request.
  • An exemplary request for Web content from the conversion server 150 using a submit expression is given below:
  • the method attribute of the submit element specifies whether an HTTP GET or POST method will be invoked, and the namelist attribute identifies a site protocol variable forwarded to the conversion server 150.
  • the site protocol variable is set to the formatting protocol applicable to the Web site specified by the ContentAddress variable.
  • the conversion server 150 operates to retrieve and convert Web content from the Web servers 140 in a unique and efficient manner (step 432). This retrieval process preferably involves not only collecting Web content from a "root" or main page of the Web site of interest but also "prefetching" content from "child" or "branch" pages likely to be accessed from such main page (step 440).
  • content of the retrieved main page is converted into a document file having a format consistent with that of the voice browser 110. This document file is then provided to the voice browser 110 over the Internet by the interface
  • the conversion server 150 also immediately converts the "prefetched" content from each branch page into the format utilized by the voice browser 110 and stores the resultant document files within a prefetch cache 370 (step 450).
  • the voice browser 110 forwards the request in the above-described manner to the conversion server 150.
  • the document file corresponding to the requested branch page is then retrieved from the prefetch cache 370 and provided to the voice browser 110 through the network interface 310.
  • this document file is used in continuing a dialogue with the user of subscriber unit 102 (step 454). It follows that once the user has begun a dialogue with the voice browser 110 based upon the content of the main page of the requested Web site, such dialogue may continue substantially uninterrupted when a transition is made to one of the prefetched branch pages of such site. This approach advantageously minimizes the delay exhibited by the system 100 in responding to subsequent user requests for content once a dialogue has been initiated.
  • FIG. 5 is a flow chart representative of operation of the system 100 in providing content from proprietary database 142 to a user of a subscriber unit.
  • the proprietary database 142 is assumed to comprise a message repository included within a text-based messaging system (e.g., an electronic mail system) compliant with the ARPA standard set forth in Requests for Comments (RFC) 822, which is entitled "RFC822: Standard for ARPA Internet Text Messages" and is available at, for example, www.w3.org/Protocols/rfc822/Overview.html.
  • Referring to FIG. 5, at a step 502 a user of a subscriber unit places a call to the voice browser 110.
  • the originating user is then identified by the voice browser 110 utilizing known techniques (step 504).
  • the voice browser 110 then retrieves a start page associated with such user, and initiates execution of an introductory dialogue with the user such as, for example, the dialogue set forth below (step 508).
  • the voice browser 110 issues a browsing request to the conversion server 150 in order to obtain information applicable to the requesting user from the proprietary database 142 (step 514).
  • Where the voice browser 110 operates in accordance with the VoiceXML protocol, it issues such browsing request using the syntactical elements goto, choice, link and submit in a substantially similar manner as that described above with reference to FIG. 4.
  • the voice browser 110 could be configured to issue the following type of goto when requesting information from the proprietary database 142 through the conversion server 150:
  • email.jsp is a program file stored within memory 316 of the conversion server 150
  • ServerAddress is a variable identifying the address of the proprietary database 142 (e.g., mail.V-Enable.com)
  • Protocol is a variable identifying the format of the database 142 (e.g., POP3).
  • Upon receiving such a browsing request from the voice browser 110, the conversion server 150 initiates execution of the email.jsp program file. Under the direction of email.jsp, the conversion server 150 queries the voice browser 110 for the user name and password of the requesting user (step 516) and stores the returned user information UserInfo within memory 316. The program email.jsp then calls function EmailFromUser, which forms a connection to ServerAddress based upon the Transmission Control Protocol (TCP) via dedicated communication link 334 (step 520). The function EmailFromUser then invokes the method CheckEmail and furnishes the parameters ServerAddress, Protocol, and
  • CheckEmail forwards UserInfo over communication link 334 to the proprietary database 142 in accordance with RFC 822 (step 524).
  • the proprietary database 142 returns status information (e.g., number of new messages) for the requesting user to the conversion server 150 (step 528). This status information is then converted by the conversion server
  • step 532 The resultant initial file of converted information is then provided to the voice browser 110 over the Internet by the network interface 310 of the conversion server 150 (step 538). Dialogue between the voice browser 110 and the user of the subscriber unit may then continue as follows based upon the initial file of converted information (step 542):
  • Upon forwarding the initial file of converted information to the voice browser 110, CheckEmail again forms a connection to the proprietary database 142 over dedicated communication link 334 and retrieves the content of the requesting user's new messages in accordance with RFC 822 (step 544).
  • the retrieved message content is converted by the conversion server 150 into a format consistent with the protocol of the voice browser 110 using techniques described below (step 546).
  • the resultant additional file of converted information is then provided to the voice browser 110 over the Internet by the network interface 310 of the conversion server 150 (step 548).
  • the voice browser 110 then recites the retrieved message content to the requesting user in accordance with the applicable voice-based protocol based upon the additional file of converted information (step 552).
  • FIG. 6 is a flow chart representative of operation of the conversion server 150 in accordance with the present invention.
  • a source code listing of a top-level convert routine forming part of an exemplary software implementation of the conversion operation illustrated by FIG. 6 is contained in Appendix A.
  • Appendix B provides an example of conversion of a WML-based document into VoiceXML-based grammatical structure in accordance with the present invention.
  • the conversion server 150 receives one or more requests for Web content transmitted by the voice browser 110 via the Internet 130 using conventional protocols (i.e., HTTP and TCP/IP).
  • the conversion module 330 determines whether the format of the requested Web site corresponds to one of a number of predefined formats (e.g., WML) readily convertible into the protocol of the voice browser 110 (step 606). If not, then the URL database 320 is accessed in order to determine whether there exists a version of the requested Web site formatted consistently with one of the predefined formats (step 608). If not, an error is returned (step 610) and processing of the request for content is terminated
  • step 612 Web content is retrieved by the retrieval module 310 of the conversion server 150 from the applicable content server 140 hosting the identified Web site (step 614).
  • the parser 340 is invoked to parse the retrieved content using the DTD applicable to the format of the retrieved content (step 616).
  • an error message is returned (step 620) and processing is terminated (step 622).
  • a root node of the DOM representation of the retrieved content generated by the parser 340 (i.e., the parse tree)
  • the root node is then classified into one of a number of predefined classifications (step 624).
  • each node of the parse tree is assigned to one of the following classifications: Attribute, CDATA, Document Fragment, Document Type, Comment, Element, Entity Reference, Notation, Processing Instruction, Text.
  • the content of the root node is then processed in accordance with its assigned classification in the manner described below (step 628).
  • If all nodes within two tree levels of the root node have not been processed (step 630), then the next node of the parse tree generated by the parser 340 is identified (step 634). If not, conversion of the desired portion of the retrieved content is deemed completed and an output file containing such desired converted content is generated.
  • If the node of the parse tree identified in step 634 is within two levels of the root node (step 636), then it is determined whether the identified node includes any child nodes (step 638). If not, the identified node is classified (step 624). If so, the content of a first of the child nodes of the identified node is retrieved (step 642). This child node is assigned to one of the predefined classifications described above (step 644) and is processed accordingly (step 646). Once all child nodes of the identified node have been processed (step 648), the identified node (which corresponds to the root node of the subtree containing the processed child nodes) is itself retrieved (step 650) and assigned to one of the predefined classifications (step 624).
  • Appendix C contains a source code listing for a TraverseNode function which implements various aspects of the node traversal and conversion functionality described with reference to FIG. 6.
  • Appendix D includes a source code listing of a ConvertAtr function, and of a ConverTag function referenced by the TraverseNode function, which collectively operate to convert WML tags and attributes to corresponding
  • FIGS. 7A and 7B are collectively a flowchart illustrating an exemplary process for transcoding a parse tree representation of a WML-based document into an output document comporting with the VoiceXML protocol.
  • Although FIG. 7 describes the inventive transcoding process with specific reference to the WML and VoiceXML protocols, the process is also applicable to conversion between other visual-based and voice-based protocols.
  • In a step 702, a root node of the parse tree for the target WML document to be transcoded is retrieved.
  • the type of the root node is then determined and, based upon this identified type, the root node is processed accordingly.
  • the conversion process determines whether the root node is an attribute node (step 706), a CDATA node (step 708), a document fragment node (step 710), a document type node (step 712), a comment node (step 714), an element node (step 716), an entity reference node (step 718), a notation node (step 720), a processing instruction node (step 722), or a text node (step 724).
  • the node is processed by extracting the relevant CDATA information (step 728).
  • the CDATA information is acquired and directly incorporated into the converted document without modification (step 730).
  • An exemplary WML-based CDATA block and its corresponding representation in VoiceXML is provided below.
  • If it is established that the root node is an element node (step 716), then processing proceeds as depicted in FIG. 7B (step 732). If a Select tag is found to be associated with the root node (step 734), then a new menu item is created based upon the data comprising the identified select tag (step 736). Any grammar necessary to ensure that the new menu item comports with the VoiceXML protocol is then added (step 738).
  • the operations defined by the WML-based Select tag are mapped to corresponding operations presented through the VoiceXML-based Menu tag.
  • the Select tag is typically utilized to specify a visual list of user options and to define corresponding actions to be taken depending upon the option selected.
  • a Menu tag in VoiceXML specifies an introductory message and a set of spoken prompts corresponding to a set of choices.
  • the Menu tag also specifies a corresponding set of possible responses to the prompts, and will typically also specify a URL to which a user is directed upon selecting a particular choice.
  • a grammar for matching the "title" text of the grammatical structure defined by a Menu tag may be activated upon being loaded. When a word or phrase which matches the title text of a Menu tag is spoken by a user, the user is directed to the grammatical structure defined by the Menu tag.
  • the main menu may serve as the top-level menu which is heard first when the user initiates a session using the voice browser 110.
  • the Enumerate tag inside the Menu tag automatically builds a list of words identified by the Choice tags (i.e., "Cnet news", "V-enable", "Yahoo stocks", and "Visit Wireless Knowledge").
  • the Prompt tag then causes it to prompt the user with the following text
  • the user may select any of the choices by speaking a command consistent with the technology used by the voice browser 110.
  • the allowable commands may include various "attention" phrases (e.g., "go to” or "select") followed by the prompt words corresponding to various choices (e.g.,
  • any "child" tags of the Select tag are then processed as was described above with respect to the original "root” node of the parse tree and accordingly converted into VoiceXML-based grammatical structures (step 740).
  • the information associated with the next unprocessed node of the parse tree is retrieved (step 744).
  • step 746 the identified node is processed in the manner described above beginning with step 706.
  • an XML-based tag (including, e.g., a Select tag) may be associated with one or more subsidiary "child" tags.
  • every XML-based tag (except the tag associated with the root node of a parse tree) is also associated with a parent tag. The following XML-based notation exemplifies this parent/child relationship:
  • the parent tag is associated with two child tags (i.e., child1 and child2).
  • tag child1 has a child tag denominated grandchild1.
  • the Select tag is the parent of the Option tag and the Option tag is the child of the Select tag.
  • Prompt and Choice tags are children of the Menu tag
  • Various types of information are typically associated with each parent and child tag. For example, lists of various types of attributes are commonly associated with certain types of tags. Textual information associated with a given tag may also be encapsulated between the "start" and "end" tagname markings defining a tag structure (e.g., "</tagname>"), with the specific semantics of the tag being dependent upon the type of tag.
  • An accepted structure for a WML-based tag is set forth below:
  • <tagname attribute1=value attribute2=value> text information </tagname>
  • if an "A" tag is determined to be associated with the element node (step 750), then a new field element and associated grammar are created (step 750).
  • the WML-based textual representation of "Hello" and "Next" are converted into a VoiceXML-based representation pursuant to which they are audibly presented. If the user utters "Hello" in response, control passes to the same link as was referenced by the WML "A" tag. If instead "Next" is spoken, then VoiceXML processing begins after the "</field>" tag.
  • the template element is processed by converting it to a VoiceXML-based Link element (step 756)
  • the WML tag is converted to VoiceXML (step 760).
  • the next node in the parse tree is obtained and processing is continued at step 744 in the manner described above (step
  • each child node within the subtree of the parse tree formed by considering the element node to be the root node of the subtree is then processed beginning at step 706 in the manner described above (step 766).
  • Enumeration enum = problems.elements(); while (enum.hasMoreElements()) out.write((String) enum.nextElement());
  • the following set of WML tags may be converted to VoiceXML tags of analogous function in accordance with Table B1 below.
  • a VoiceXML-based tag and any required ancillary grammar is directly substituted for the corresponding WML-based tag in accordance with Table A1.
  • additional processing is required to accurately map the information from the WML-based tag into a VoiceXML-based grammatical structure comprised of multiple VoiceXML elements.
  • the following exemplary block of VoiceXML elements may be utilized to emulate the functionality of the WML-based Template tag in the voice domain.
  • wml > Entertainment and Media </choice>
  • wml > Personal Technology </choice>
  • CDATA_SECTION_NODE { buffer.append("<![CDATA["); buffer.append(el.getNodeValue()); buffer.append("]]>"); writeBuffer(buffer); break;
  • DOCUMENT_FRAGMENT_NODE { break
  • ResourceBundle rbd = new WMLTagResourceBundle(); try { return rbd.getString(wapelement); } catch (MissingResourceException e) { return "";
  • ResourceBundle rbd = new WMLAtrResourceBundle();
  • searchTag searchTag
  • Purpose: process a menu node; it converts a select list into an equivalent menu in vxml.
  • menuItem.addElement(el2.getNodeValue());
  • StringBuffer linkString = new StringBuffer();
  • StringBuffer link = new StringBuffer();
  • StringBuffer promptStr = new StringBuffer();
  • StringBuffer linkGrammar = new StringBuffer();
  • Attr attr = (Attr) nm.item(j); if (attr.getNodeName().equals("href")) { nextStr.append("<goto " + ConvertAtr(el.getNodeName(), attr.getNodeName(), attr.getNodeValue()) + "/>\n");
  • a voice browser system including a subscriber unit in communication with a voice browser through a telecommunications network has been described herein.
  • the voice browser obtains the requested content directly from the compliant Web site.
  • the voice browser issues a browsing request for such content to a conversion server using syntax substantially similar to that employed in making direct requests to compliant Web sites. That is, the voice browser is advantageously not required to operate in different modes when presented with requests for Web content of disparate formats.
  • In response to browsing requests issued by the voice browser, the conversion server will attempt to identify a version of the requested Web site formatted in accordance with protocols suitable for serving content to devices having limited display capabilities (e.g., handheld or portable devices). The conversion server then preferably retrieves content from such a suitably formatted version of the requested Web site and converts this content into a document file compliant with the protocol of the voice browser.
  • the converted document file is then provided by the conversion server to the voice browser, which uses this file to effect a dialogue conforming to the applicable protocol with the requesting user.
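As a rough illustration of the conversion step summarized above, the following Java sketch maps a WML select list onto a VoiceXML-style menu, in the spirit of the Select-to-Menu mapping described earlier. The class name SelectToMenuSketch, the sample addresses, and the choice to read each option's onpick attribute as the target of the generated choice element are assumptions made for this example; it is not the convert routine of Appendix A, and it omits grammar generation and XML escaping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: turns WML option elements into VoiceXML-style choice elements. */
public class SelectToMenuSketch {

    /** Parses the WML fragment and emits one choice per option, wrapped in a menu. */
    public static String convert(String wml) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(
                    new ByteArrayInputStream(wml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Mirrors the "return an error if parsing fails" branch of the described flow.
            throw new IllegalArgumentException("WML could not be parsed", e);
        }

        StringBuilder out = new StringBuilder("<menu>\n");
        NodeList options = doc.getElementsByTagName("option");
        for (int i = 0; i < options.getLength(); i++) {
            Element option = (Element) options.item(i);
            String target = option.getAttribute("onpick"); // assumed link target
            String label = option.getTextContent().trim(); // spoken prompt text
            // No XML escaping is performed here; a real converter would need it.
            out.append("  <choice next=\"").append(target).append("\">")
               .append(label).append("</choice>\n");
        }
        return out.append("</menu>").toString();
    }

    public static void main(String[] args) {
        String wml = "<select>"
                + "<option onpick=\"http://wap.cnet.com\">Cnet news</option>"
                + "<option onpick=\"http://example.com/stocks\">Yahoo stocks</option>"
                + "</select>";
        System.out.println(convert(wml));
    }
}
```

Each option becomes one spoken choice, so the visual list and the voice menu expose the same set of destinations.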

Abstract

A method for retrieving content from one or more remote information sources (140a,140b,140c) is disclosed herein. The inventive method contemplates transmitting a user request over a communication link to a voice browser (110) operative in accordance with a voice-based protocol. In response, a browsing request identifying a remote information source (140a,140b,140c) corresponding to the user request is generated. Content formatted in accordance with a predefined protocol is then retrieved from the remote information source (140a,140b,140c) in accordance with the browsing request. The retrieved content is converted into a file of information formatted in compliance with the voice-based protocol. A response is provided to the user request on the basis of the file of converted information.

Description

INFORMATION RETRIEVAL SYSTEM INCLUDING VOICE BROWSER AND DATA CONVERSION SERVER
FIELD OF THE INVENTION
The present invention relates to the field of browsers used for accessing data in distributed computing environments and, in particular, to techniques for accessing such data using Web browsers controlled at least in part through voice commands.
BACKGROUND OF THE INVENTION
As is well known, the World Wide Web, or simply "the Web", is comprised of a large and continuously growing number of accessible Web pages. In the Web environment, clients request Web pages from Web servers using the Hypertext Transfer Protocol ("HTTP"). HTTP is a protocol which provides users access to files including text, graphics, images, and sound using a standard page description language known as the Hypertext Markup Language ("HTML"). HTML provides document formatting allowing the developer to specify links to other servers in the network. A Uniform Resource Locator (URL) defines the path to a Web site hosted by a particular Web server.
The pages of Web sites are typically accessed using an HTML-compatible browser (e.g., Netscape Navigator or Internet Explorer) executing on a client machine. The browser specifies a link to a Web server and particular Web page using a URL. When the user of the browser specifies a link via a URL, the client issues a request to a naming service to map a hostname in the URL to a particular network IP address at which the server is located. The naming service returns a list of one or more IP addresses that can respond to the request. Using one of the IP addresses, the browser establishes a connection to a Web server. If the Web server is available, it returns a document or other object formatted according to HTML.
As Web browsers become the primary interface for access to many network and server services, Web applications in the future will need to interact with many different types of client machines including, for example, conventional personal computers and recently developed "thin" clients. Thin clients can range from 60 inch TV screens to handheld mobile devices. This large range of devices creates a need to customize the display of Web page information based upon the characteristics of the graphical user interface ("GUI") of the client device requesting such information. Using conventional technology would most likely require that different HTML pages or scripts be written in order to handle the GUI and navigation requirements of each client environment.
Client devices differ in their display capabilities, e.g., monochrome, color, different color palettes, resolution, sizes. Such devices also vary with regard to the peripheral devices that may be used to provide input signals or commands (e.g., mouse and keyboard, touch sensor, remote control for a TV set-top box). Furthermore, the browsers executing on such client devices can vary in the languages supported, (e.g., HTML, dynamic HTML, XML, Java, JavaScript). Because of these differences, the experience of browsing the same Web page may differ dramatically depending on the type of client device employed. The inability to adjust the display of Web pages based upon a client's capabilities and environment causes a number of problems. For example, a Web site may simply be incapable of servicing a particular set of clients, or may make the Web browsing experience confusing or unsatisfactory in some way. Even if the developers of a Web site have made an effort to accommodate a range of client devices, the code for the Web site may need to be duplicated for each client environment. Duplicated code consequently increases the maintenance cost for the Web site. In addition, different URLs are frequently required to be known in order to access the Web pages formatted for specific types of client devices.
In addition to being satisfactorily viewable by only certain types of client devices, content from Web pages has generally been inaccessible to those users not having a personal computer or other hardware device similarly capable of displaying Web content.
Even if a user possesses such a personal computer or other device, the user needs to have access to a connection to the Internet. In addition, those users having poor vision or reading skills are likely to experience difficulties in reading text-based Web pages. For these reasons, efforts have been made to develop Web browsers for facilitating non-visual access to Web pages for users that wish to access Web-based information or services through a telephone. Such non-visual Web browsers, or "voice browsers", present audio output to a user by converting the text of Web pages to speech and by playing pre-recorded Web audio files from the Web. A voice browser also permits a user to navigate between Web pages by following hypertext links, as well as to choose from a number of pre-defined links, or "bookmarks" to selected Web pages. In addition, certain voice browsers permit users to pause and resume the audio output by the browser.
A particular protocol applicable to voice browsers appears to be gaining acceptance as an industry standard. Specifically, the Voice extensible Markup Language ("VoiceXML") is a markup language developed specifically for voice applications useable over the Web, and is described at http://www.voicexml.org. VoiceXML defines an audio interface through which users may interact with Web content, similar to the manner in which the Hypertext Markup Language ("HTML") specifies the visual presentation of such content. In this regard VoiceXML includes intrinsic constructs for tasks such as dialogue flow, grammars, call transfers, and embedding audio files.
Unfortunately, the VoiceXML standard generally contemplates that VoiceXML-compliant voice browsers interact exclusively with Web content of the VoiceXML format. This has limited the utility of existing VoiceXML-compliant voice browsers, since a relatively small percentage of Web sites include content formatted in accordance with VoiceXML. In addition to the large number of HTML-based Web sites, Web sites serving content conforming to standards applicable to particular types of user devices are becoming increasingly prevalent. For example, the Wireless Markup Language ("WML") of the Wireless Application Protocol ("WAP") (see, e.g., http://www.wapforum.org/) provides a standard for developing content applicable to wireless devices such as mobile telephones, pagers, and personal digital assistants. Some lesser-known standards for Web content include the Handheld Device Markup Language ("HDML"), and the relatively new Japanese standard Compact HTML.
The existence of myriad formats for Web content complicates efforts by corporations and other organizations to make Web content accessible to substantially all Web users. That is, the ever increasing number of formats for Web content has rendered it time consuming and expensive to provide Web content in each such format. Accordingly, it would be desirable to provide a technique for enabling existing Web content to be accessed by standardized voice browsers, irrespective of the format of such content.
SUMMARY OF THE INVENTION
In summary, the present invention relates to a method for retrieving information from remote information sources. The inventive method contemplates transmitting a user request over a communication link to a voice browser operative in accordance with a voice-based protocol. In response, a browsing request identifying a remote information source corresponding to the user request is generated. Content formatted in accordance with a predefined protocol is then retrieved from the remote information source in accordance with the browsing request. The retrieved content is converted into a file of information formatted in compliance with the voice-based protocol. A response is then provided to the user request on the basis of the file of converted information.
The present invention also pertains to a system for retrieving information from remote information sources. The system includes a voice browser operating in accordance with a voice-based protocol. The voice browser is disposed to receive a user request transmitted over a communication link and to generate a browsing request in response to the user request. The system further includes a conversion server in communication with the voice browser. The conversion server includes a retrieval module for retrieving content from a remote information source in accordance with the browsing request. The retrieved content is formatted in accordance with a predefined protocol, and is converted by a conversion module of the conversion server into a file of converted information compliant with the voice-based protocol. The file of converted information is then provided to the voice browser through an interface of the conversion server.
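The division of labor just described, with a retrieval module feeding a conversion module whose output is handed back through an interface, can be sketched in Java as follows. This is only a minimal illustration under stated assumptions: the class ConversionServerSketch, its method handleBrowsingRequest, and the representation of each module as a Function from String to String are all hypothetical and are not taken from the specification.

```java
import java.util.function.Function;

/** Hypothetical sketch of the retrieval -> conversion -> interface flow. */
public class ConversionServerSketch {

    // Assumed shape: source address in, raw content (e.g., WML text) out.
    private final Function<String, String> retrievalModule;

    // Assumed shape: raw content in, voice-protocol document (e.g., VoiceXML) out.
    private final Function<String, String> conversionModule;

    public ConversionServerSketch(Function<String, String> retrievalModule,
                                  Function<String, String> conversionModule) {
        this.retrievalModule = retrievalModule;
        this.conversionModule = conversionModule;
    }

    /** Plays the role of the interface: returns the file of converted information. */
    public String handleBrowsingRequest(String sourceAddress) {
        String retrieved = retrievalModule.apply(sourceAddress);
        return conversionModule.apply(retrieved);
    }

    public static void main(String[] args) {
        // Stub modules stand in for real network retrieval and transcoding.
        ConversionServerSketch server = new ConversionServerSketch(
                address -> "raw:" + address,
                raw -> "vxml(" + raw + ")");
        System.out.println(server.handleBrowsingRequest("wap.cnet.com"));
    }
}
```

Keeping the two modules behind separate functions reflects the text's separation of retrieval from conversion, so that either could be replaced without touching the other.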
In another aspect, the present invention is directed to a conversion server responsive to browsing requests issued by a browser unit operative in accordance with a first protocol.
The conversion server includes a retrieval module for retrieving web page information from a web site in accordance with a first browsing request issued by the browsing unit. The retrieved web page information is formatted in accordance with a second protocol different from the first protocol. A conversion module serves to convert at least a primary portion of the web page information into a primary file of converted information compliant with the first protocol. The conversion server also includes an interface module for providing said primary file of converted information to said browsing unit.
The present invention also relates to a method for facilitating browsing of the Internet. The method includes receiving a browsing request from a browser unit operative in accordance with a first protocol, wherein the browsing request is issued by the browser unit in response to a first user request for web content. Web page information, formatted in accordance with a second protocol different from said first protocol, is retrieved from a web site in accordance with the browsing request. The method further includes converting at least a primary portion of the web page information into a primary file of converted information compliant with the first protocol.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the nature of the features of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings, in which:
FIG. 1 provides a schematic diagram of a system for accessing Web content using a voice browser system in accordance with the present invention.
FIG. 2 shows a block diagram of a voice browser included within the system of FIG. 1.
FIG. 3 is a functional block diagram of a conversion server included within the voice browser system of the present invention.
FIG. 4 is a flow chart representative of operation of the system of the present invention in furnishing Web content to a requesting user.
FIG. 5 is a flow chart representative of operation of the system of the present invention in providing content from a proprietary database to a requesting user.
FIG. 6 is a flow chart representative of operation of the conversion server.
FIGS. 7A and 7B are collectively a flowchart illustrating an exemplary process for transcoding a parse tree representation of a WML-based document into an output document comporting with the VoiceXML protocol.
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 provides a schematic diagram of a system 100 for accessing Web content using a voice browser in accordance with the present invention. The system 100 includes a telephonic subscriber unit 102 in communication with a voice browser 110 through a telecommunications network 120. In a preferred embodiment the voice browser 110 executes dialogues with a user of the subscriber unit 102 on the basis of document files comporting with a known speech mark-up language (e.g., VoiceXML). The voice browser 110 initiates, in response to requests for content submitted through the subscriber unit 102, the retrieval of information forming the basis of certain such document files from remote information sources. Such remote information sources may comprise, for example, Web servers 140 and one or more databases represented by proprietary database 142. As is described hereinafter, the voice browser 110 initiates such retrieval by issuing a browsing request either directly to the applicable remote information source or to a conversion server 150. In particular, if the request for content pertains to a remote information source operative in accordance with the protocol applicable to the voice browser 110 (e.g., VoiceXML), then the voice browser 110 issues a browsing request directly to the remote information source of interest. For example, when the request for content pertains to a Web site formatted consistently with the protocol of the voice browser 110, a document file containing such content is requested by the voice browser 110 via the Internet 130 directly from the Web server 140 hosting the Web site of interest. On the other hand, when a request for content issued through the subscriber unit 102 identifies a Web site formatted inconsistently with the voice browser 110, the voice browser 110 issues a corresponding browsing request to a conversion server 150.
In response, the conversion server 150 retrieves content from the Web server 140 hosting the Web site of interest and converts this content into a document file compliant with the protocol of the voice browser 110. The converted document file is then provided by the conversion server 150 to the voice browser 110, which then uses this file to effect a dialogue conforming to the applicable voice-based protocol with the user of subscriber unit 102. Similarly, when a request for content identifies a proprietary database 142, the voice browser 110 issues a corresponding browsing request to the conversion server 150. In response, the conversion server 150 retrieves content from the proprietary database 142 and converts this content into a document file compliant with the protocol of the voice browser 110. The converted document file is then provided to the voice browser 110 and used as the basis for carrying out a dialogue with the user of subscriber unit 102.
As shown in FIG. 1, the subscriber unit 102 is in communication with the voice browser 110 via the telecommunications network 120. The subscriber unit 102 has a keypad (not shown) and associated circuitry for generating Dual Tone MultiFrequency (DTMF) tones. The subscriber unit 102 transmits DTMF tones to, and receives audio output from, the voice browser 110 via the telecommunications network 120. In FIG. 1, the subscriber unit 102 is exemplified with a mobile station and the telecommunications network 120 is represented as including a mobile communications network and the Public Switched Telephone Network ("PSTN"). However, the voice-based information retrieval services offered by the system 100 can be accessed by subscribers through a variety of other types of devices and networks. For example, the voice browser 110 may be accessed through the PSTN from, for example, a stand-alone telephone 104 (either analog or digital), or from a node on a PBX (not shown). In addition, a personal computer 106 or other handheld or portable computing device disposed for voice over IP communication may access the voice browser 110 via the Internet 130.
FIG. 2 shows a block diagram of the voice browser 110. The voice browser 110 includes certain standard server computer components, including a network connection device 202, a CPU 204 and memory (primary and/or secondary) 206. The voice browser 110 also includes telephony infrastructure 226 for effecting communication with telephony-based subscriber units (e.g., the mobile subscriber unit 102 and landline telephone 104). As is described below, the memory 206 stores a set of computer programs to implement the processing effected by the voice browser 110. One such program stored by memory 206 comprises a standard communication program 208 for conducting standard network communications via the Internet 130 with the conversion server 150 and any subscriber units operating in a voice over IP mode (e.g., personal computer 106). As shown, the memory 206 also stores a voice browser interpreter 200 and an interpreter context module 210. In response to requests from, for example, subscriber unit 102 for Web or proprietary database content formatted inconsistently with the protocol of the voice browser 110, the voice browser interpreter 200 initiates establishment of a communication channel via the Internet 130 with the conversion server 150.
The voice browser 110 then issues, over this communication channel and in accordance with conventional Internet protocols (i.e., HTTP and TCP/IP), browsing requests to the conversion server 150 corresponding to the requests for content submitted by the requesting subscriber unit. The conversion server 150 retrieves the requested Web or proprietary database content in response to such browsing requests and converts the retrieved content into document files in a format (e.g., VoiceXML) comporting with the protocol of the voice browser 110. The converted document files are then provided to the voice browser 110 over the established Internet communication channel and utilized by the voice browser interpreter 200 in carrying out a dialogue with a user of the requesting unit. During the course of this dialogue the interpreter context module 210 uses conventional techniques to identify requests for help and the like which may be made by the user of the requesting subscriber unit. For example, the interpreter context module 210 may be disposed to identify predefined "escape" phrases submitted by the user in order to access menus relating to, for example, help functions or various user preferences (e.g., volume, text-to-speech characteristics).
Referring to FIG. 2, audio content is transmitted and received by telephony infrastructure 226 under the direction of a set of audio processing modules 228. Included among the audio processing modules 228 are a text-to-speech ("TTS") converter 230, an audio file player 232, and a speech recognition module 234. In operation, the telephony infrastructure 226 is responsible for detecting an incoming call from a telephony-based subscriber unit and for answering the call (e.g., by playing a predefined greeting). After a call from a telephony-based subscriber unit has been answered, the voice browser interpreter 200 assumes control of the dialogue with the telephony-based subscriber unit via the audio processing modules 228. In particular, audio requests from telephony-based subscriber units are parsed by the speech recognition module 234 and passed to the voice browser interpreter 200. Similarly, the voice browser interpreter 200 communicates information to telephony-based subscriber units through the text-to-speech converter 230. The telephony infrastructure 226 also receives audio signals from telephony-based subscriber units via the telecommunications network 120 in the form of DTMF signals. The telephony infrastructure 226 is able to detect and interpret the DTMF tones sent from telephony-based subscriber units. Interpreted DTMF tones are then transferred from the telephony infrastructure to the voice browser interpreter 200. After the voice browser interpreter 200 has retrieved a VoiceXML document from the conversion server 150 in response to a request from a subscriber unit, the retrieved VoiceXML document forms the basis for the dialogue between the voice browser 110 and the requesting subscriber unit. In particular, text and audio file elements stored within the retrieved VoiceXML document are converted into audio streams in text-to-speech converter 230 and audio file player 232, respectively.
When the request for content associated with these audio streams originated with a telephony-based subscriber unit, the streams are transferred to the telephony infrastructure 226 for adaptation and transmission via the telecommunications network 120 to such subscriber unit. In the case of requests for content from Internet-based subscriber units (e.g., the personal computer 106), the streams are adapted and transmitted by the network connection device 202.
The voice browser interpreter 200 interprets each retrieved VoiceXML document in a manner analogous to the manner in which a standard Web browser interprets a visual markup language, such as HTML or WML. The voice browser interpreter 200, however, interprets scripts written in a speech markup language such as VoiceXML rather than a visual markup language. In a preferred embodiment the voice browser 110 may be realized using, consistent with the teachings herein, a voice browser licensed from, for example, Nuance Communications of Menlo Park, California.
Turning now to FIG. 3, a functional block diagram is provided of the conversion server 150. As is described below, the conversion server 150 operates to convert or transcode conventional structured document formats (e.g., HTML) into the format applicable to the voice browser 110 (e.g., VoiceXML). This conversion is generally effected by performing a predefined mapping of the syntactical elements of conventional structured documents harvested from Web servers 140 into corresponding equivalent elements contained within an XML-based file formatted in accordance with the protocol of the voice browser 110. The resultant XML-based file may include all or part of the "target" structured document harvested from the applicable Web server 140, and may also optionally include additional content provided by the conversion server 150. In the exemplary embodiment the target document is parsed, and identified tags, styles and content can either be replaced or removed.
The conversion server 150 may be physically implemented using a standard configuration of hardware elements including a CPU 314, a memory 316, and a network interface 310 operatively connected to the Internet 130. Similar to the voice browser 110, the memory 316 stores a standard communication program 318 to realize standard network communications via the Internet 130. In addition, the communication program 318 controls communication occurring between the conversion server 150 and the proprietary database 142 by way of database interface 332. As is discussed below, the memory 316 also stores a set of computer programs to implement the content conversion process performed by the conversion server 150.
Referring to FIG. 3, the memory 316 includes a retrieval module 324 for controlling retrieval of content from Web servers 140 and proprietary database 142 in accordance with browsing requests received from the voice browser 110. In the case of requests for content from Web servers 140, such content is retrieved via network interface 310 from Web pages formatted in accordance with protocols particularly suited to portable, handheld or other devices having limited display capability (e.g., WML, Compact HTML, xHTML and HDML). As is discussed below, the locations or URLs of such specially formatted sites may be provided by the voice browser or may be stored within a URL database 320 of the conversion server 150. For example, if the voice browser 110 receives a request from a user of a subscriber unit for content from the "CNET" Web site, then the voice browser 110 may specify the URL for the version of the "CNET" site accessed by WAP-compliant devices (i.e., comprised of WML-formatted pages). Alternatively, the voice browser 110 could simply proffer a generic request for content from the "CNET" site to the conversion server 150, which in response would consult the URL database 320 to determine the URL of an appropriately formatted site serving "CNET" content.
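The two ways of locating a specially formatted site described above can be sketched as follows. This is a minimal illustration only: the dictionary standing in for URL database 320 and the function name resolve_content_url are assumptions, not part of the disclosed implementation; the "cnet.com" to "wap.cnet.com" pairing follows the example site names used elsewhere in this description.

```python
from typing import Optional

# Hypothetical stand-in for URL database 320: generic site name -> URL of a
# version formatted for limited-display devices (e.g., WML pages).
URL_DATABASE = {
    "cnet.com": "wap.cnet.com",
}


def resolve_content_url(requested_site: str,
                        explicit_url: Optional[str] = None) -> Optional[str]:
    """Return the URL to retrieve content from, or None if none is known."""
    if explicit_url:
        # The voice browser already specified the specially formatted site.
        return explicit_url
    # Generic request: consult the URL database.
    return URL_DATABASE.get(requested_site.lower())
```

A return value of None corresponds to the case in which no appropriately formatted version of the requested site is known to the conversion server.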
The memory 316 of conversion server 150 also includes a conversion module 330 operative to convert the content collected under the direction of retrieval module 324 from Web servers 140 or the proprietary database 142 into corresponding VoiceXML documents.
As is described below, the retrieved content is parsed by a parser 340 of conversion module 330 in accordance with a document type definition ("DTD") corresponding to the format of such content. For example, if the retrieved Web page content is formatted in WML, the parser 340 would parse the retrieved content into a parsed file using a DTD obtained from the applicable standards body, i.e., the Wireless Application Protocol Forum, Ltd. (www.wapforum.org). A DTD establishes a set of constraints for an XML-based document; that is, a DTD defines the manner in which an XML-based document is constructed. The resultant parsed file is generally in the form of a Document Object Model ("DOM") representation, which is arranged in a tree-like hierarchical structure composed of a plurality of interconnected nodes (i.e., a "parse tree"). In the exemplary embodiment the parse tree includes a plurality of "child" nodes descending downward from its root node, each of which is recursively examined and processed in the manner described below.
A mapping module 350 within the conversion module 330 then traverses the parse tree and applies predefined conversion rules 363 to the elements and associated attributes at each of its nodes. In this way the mapping module 350 creates a set of corresponding equivalent elements and attributes conforming to the protocol of the voice browser 110. A converted document file (e.g., a VoiceXML document file) is then generated by supplementing these equivalent elements and attributes with grammatical terms to the extent required by the protocol of the voice browser 110. This converted document file is then provided to the voice browser 110 via the network interface 310 in response to the browsing request originally issued by the voice browser 110.
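The parse-then-map sequence described above may be illustrated with a short sketch. It assumes a tiny, purely illustrative rule table (wml to vxml, card to form, p to block) that is not the conversion rules 363 of the exemplary embodiment, and it uses a non-validating parser, so the DTD check performed by parser 340 is omitted. The names CONVERSION_RULES, map_node and wml_to_vxml are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Illustrative rule table (an assumption): source WML tag -> VoiceXML tag.
CONVERSION_RULES = {"wml": "vxml", "card": "form", "p": "block"}


def map_node(node: ET.Element) -> Optional[ET.Element]:
    """Recursively map one parse-tree node; unmapped elements are dropped."""
    target = CONVERSION_RULES.get(node.tag)
    if target is None:
        return None
    out = ET.Element(target)
    if "id" in node.attrib:
        out.set("id", node.attrib["id"])  # carry over the identifier
    text = (node.text or "").strip()
    if text:
        out.text = text
    for child in node:
        mapped = map_node(child)
        if mapped is not None:
            out.append(mapped)
    return out


def wml_to_vxml(wml_source: str) -> str:
    """Parse WML text and return the mapped document as a string."""
    converted = map_node(ET.fromstring(wml_source))
    if converted is None:
        raise ValueError("root element has no conversion rule")
    return ET.tostring(converted, encoding="unicode")
```

In this sketch an element with no rule (for example an image) is simply discarded along with its subtree; a fuller rule set would instead emit replacement content or a warning, as the rules enumerated below contemplate.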
The conversion module 330 is preferably a general purpose converter capable of transforming the above-described structured document content (e.g., WML) into corresponding VoiceXML documents. The resultant VoiceXML content can then be delivered to users via any VoiceXML-compliant platform, thereby introducing a voice capability into existing structured document content. In a particular embodiment, a basic set of rules can be imposed to simplify the conversion of the structured document content into the VoiceXML format. An exemplary set of such rules utilized by the conversion module 330 may comprise the following:
1. If the structured document content (e.g., WML pages) comprises images, the conversion module 330 will discard the images and generate the necessary information for presenting the image.
2. If the structured document content comprises scripts, data or some other component not capable of being presented by voice, the conversion module 330 may generate appropriate warning messages or the like. The warning message will typically inform the user that the structured content contains a script or some component not capable of being converted to voice and that meaningful information may not be conveyed to the user.
3. When the structured document content contains instructions similar or identical to those such as the WML-based SELECT LIST options, the conversion module 330 generates information for presenting the SELECT LIST or similar options into a menu list for audio representation. For example, an audio playback of "Please say news weather mail" could be generated for the SELECT LIST defining the three options of news, weather and mail.
4. Any hyperlinks in the structured document content are converted to reference the conversion module 330, and the actual link location passed to the conversion module as a parameter to the referencing hyperlink. In this way hyperlinks and other commands which transfer control may be voice-activated and converted to an appropriate voice-based format upon request.
5. Input fields within the structured content are converted to an active voice-based dialogue, and the appropriate commands and vocabulary added as necessary to process them.
6. Multiple screens of structured content (e.g., card-based WML screens) can be directly converted by the conversion module 330 into forms or menus of sequential dialogs. Each menu is a stand-alone component (e.g., performing a complete task such as receiving input data). The conversion module 330 may also include a feature that permits a user to interrupt the audio output generated by a voice platform (e.g., BeVocal, HeyAnita) prior to issuing a new command or input.
7. For all those events and "do" type actions similar to WML-based "OK", "Back" and "Done" operations, voice-activated commands may be employed to straightforwardly effect such actions.
8. In the exemplary embodiment the conversion module 330 operates to convert an entire page of structured content at once and to play the entire page in an uninterrupted manner. This enables relatively lengthy structured documents to be presented without the need for user intervention in the form of an audible "More" command or the equivalent.
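Rules 3 and 4 above lend themselves to a brief sketch. The endpoint address below is a placeholder, and passing the protocol as a Protocol=WAP key/value pair is an assumption about how the parameter would be encoded; the functions menu_prompt and rewrite_link are hypothetical helpers, not routines of the conversion module 330.

```python
from urllib.parse import urlencode

# Placeholder address of the conversion script (an assumption).
CONVERSION_ENDPOINT = "http://conversion.example/Conversion.jsp"


def menu_prompt(options):
    """Rule 3: turn SELECT LIST options into a spoken menu prompt."""
    return "Please say " + " ".join(options)


def rewrite_link(href, protocol="WAP"):
    """Rule 4: point a hyperlink at the conversion script, passing the
    actual link location as a parameter."""
    query = urlencode({"URL": href, "Protocol": protocol})
    return CONVERSION_ENDPOINT + "?" + query
```

For instance, menu_prompt(["news", "weather", "mail"]) yields the prompt text quoted in rule 3, and rewrite_link percent-encodes the original link so that it survives as a single query parameter.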
FIG. 4 is a flow chart representative of an exemplary process 400 executed by the system 100 in providing content from Web servers 140 to a user of a subscriber unit. At step 402, the user of the subscriber unit places a call to the voice browser 110, which will then typically identify the originating user utilizing known techniques (step 404). The voice browser then retrieves a start page associated with such user, and initiates execution of an introductory dialogue with the user such as, for example, the dialogue set forth below (step 408). In what follows the designation "C" identifies the phrases generated by the voice browser 110 and conveyed to the user's subscriber unit, and the designation "U" identifies the words spoken or actions taken by such user.
C: "Welcome home, please say the name of the Web site which you would like to access"
U: "CNET dot com"
C: "Connecting, please wait ..."
C: "Welcome to CNET, please say one of: sports; weather; business; news; stock quotes"
U: "Sports"
The manner in which the system 100 processes and responds to user input during a dialogue such as the above will vary depending upon the characteristics of the voice browser 110. Referring again to FIG. 4, in a step 412 the voice browser checks to determine whether the requested Web site is of a format consistent with its own format (e.g., VoiceXML). If so, then the voice browser 110 may directly retrieve content from the Web server 140 hosting the requested Web site (e.g., "vxml.cnet.com") in a manner consistent with the applicable voice-based protocol (step 416). If the format of the requested Web site (e.g., "cnet.com") is inconsistent with the format of the voice browser 110, then the intelligence of the voice browser 110 influences the course of subsequent processing. Specifically, in the case where the voice browser 110 maintains a database (not shown) of Web sites having formats similar to its own (step 420), then the voice browser 110 forwards the identity of such similarly formatted site (e.g., "wap.cnet.com") to the conversion server 150 via the Internet 130 in the manner described below (step 424). If such a database is not maintained by the voice browser 110, then in a step 428 the identity of the requested Web site itself (e.g., "cnet.com") is similarly forwarded to the conversion server 150 via the Internet 130. In the latter case the conversion server 150 will recognize that the format of the requested Web site (e.g., HTML) is dissimilar from the protocol of the voice browser 110, and will then access the URL database 320 in order to determine whether there exists a version of the requested Web site of a format (e.g., WML) more easily convertible into the protocol of the voice browser 110. In this regard it has been found that display protocols adapted for the limited visual displays characteristic of handheld or portable devices (e.g., WAP, HDML, iMode, Compact HTML or XML) are most readily converted into generally accepted voice-based protocols (e.g., VoiceXML), and hence the URL database 320 will generally include the URLs of Web sites comporting with such protocols. Once the conversion server 150 has determined or been made aware of the identity of the requested Web site or of a corresponding Web site of a format more readily convertible to that of the voice browser 110, the conversion server 150 retrieves and converts Web content from such requested or similarly formatted site in the manner described below (step 432). In accordance with the invention, the voice browser 110 is disposed to use substantially the same syntactical elements in requesting the conversion server 150 to obtain content from Web sites not formatted in conformance with the applicable voice-based protocol as are used in requesting content from Web sites compliant with the protocol of the voice browser 110.
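The branching of steps 412 through 428 can be summarized in a few lines. The sketch below is an assumption-laden illustration: the function route_request and its arguments are hypothetical, and the behavior when the voice browser maintains a database that lacks an entry for the requested site (here, forwarding the requested site itself) is not specified in this description.

```python
def route_request(site, vxml_sites, similar_sites=None):
    """Decide where the voice browser sends a request for `site`.

    vxml_sites    -- set of sites already in the voice browser's own format
    similar_sites -- optional mapping of sites to similarly formatted
                     counterparts (the voice browser's database, if any)
    Returns a (destination, site_to_request) pair.
    """
    if site in vxml_sites:
        # Steps 412 and 416: consistent format, retrieve directly.
        return ("direct", site)
    if similar_sites is not None and site in similar_sites:
        # Steps 420 and 424: forward the similarly formatted site.
        return ("conversion_server", similar_sites[site])
    # Step 428: forward the requested site itself; the conversion server
    # then consults its URL database.
    return ("conversion_server", site)
```

With the example site names used above, a request for "cnet.com" is routed to the conversion server as "wap.cnet.com" when the voice browser knows of that counterpart, and as "cnet.com" otherwise.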
In the case where the voice browser 110 operates in accordance with the VoiceXML protocol, it may issue requests to Web servers 140 compliant with the VoiceXML protocol using, for example, the syntactical elements goto, choice, link and submit. As is described below, the voice browser 110 may be configured to request the conversion server 150 to obtain content from inconsistently formatted Web sites using these same syntactical elements. For example, the voice browser 110 could be configured to issue the following type of goto when requesting Web content through the conversion server 150:
<goto next="http://ConSeverAddress:port/Filename?URL=ContentAddress&Protocol"/>
where the variable ConSeverAddress within the next attribute of the goto element is set to the IP address of the conversion server 150, the variable Filename is set to the name of a conversion script (e.g., conversion.jsp) stored on the conversion server 150, the variable ContentAddress is used to specify the destination URL (e.g., "wap.cnet.com") of the Web server 140 of interest, and the variable Protocol identifies the format (e.g., WAP) of such content server. The conversion script is typically embodied in a file of conventional format (e.g., files of type ".jsp", ".asp" or ".cgi"). Once this conversion script has been provided with this destination URL, Web content is retrieved from the applicable Web server 140 and converted by the conversion script into the VoiceXML format per the conversion process described below.
The voice browser 110 may also request Web content from the conversion server 150 using the choice element defined by the VoiceXML protocol. Consistent with the VoiceXML protocol, the choice element is utilized to define potential user responses to queries posed within a menu construct. In particular, the menu construct provides a mechanism for prompting a user to make a selection, with control over subsequent dialogue with the user being changed on the basis of the user's selection. The following is an exemplary call for Web content which could be issued by the voice browser 110 to the conversion server 150 using the choice element in a manner consistent with the invention:
<choice next="http://ConSeverAddress:port/Conversion.jsp?URL=ContentAddress&Protocol">
The voice browser 110 may also request Web content from the conversion server 150 using the link element, which may be defined in a VoiceXML document as a child of the vxml or form constructs. An example of such a request based upon a link element is set forth below:
<link next="Conversion.jsp?URL=ContentAddress&Protocol"/>
Finally, the submit element is similar to the goto element in that its execution results in procurement of a specified VoiceXML document. However, the submit element also enables an associated list of variables to be submitted to the identified Web server 140 by way of an HTTP GET or POST request. An exemplary request for Web content from the conversion server 150 using a submit expression is given below:
<submit next="http://ConSeverAddress:port/Conversion.jsp?URL=ContentAddress&Protocol" method="post" namelist="site protocol"/>
where the method attribute of the submit element specifies whether an HTTP GET or POST method will be invoked, and where the namelist attribute identifies a site protocol variable forwarded to the conversion server 150. The site protocol variable is set to the formatting protocol applicable to the Web site specified by the ContentAddress variable.
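The request forms described above share one URL pattern, which the following sketch assembles and wraps in goto and submit elements. It is illustrative only: the helper names are hypothetical, the server address and port in the usage are made up, and writing the protocol as Protocol=WAP is an assumption about how the Protocol variable is carried in the query string.

```python
from xml.sax.saxutils import quoteattr


def conversion_url(server, port, script, content_address, protocol):
    """Build the URL of the conversion script for a given content address."""
    return (f"http://{server}:{port}/{script}"
            f"?URL={content_address}&Protocol={protocol}")


def goto_element(url):
    """Wrap the URL in a VoiceXML goto element (attribute value escaped)."""
    return f"<goto next={quoteattr(url)}/>"


def submit_element(url, method="post", namelist="site protocol"):
    """Wrap the URL in a VoiceXML submit element with method and namelist."""
    return (f"<submit next={quoteattr(url)} "
            f"method={quoteattr(method)} namelist={quoteattr(namelist)}/>")
```

Because quoteattr escapes the ampersand, the generated elements are well-formed XML; for example, a content address of "wap.cnet.com" with protocol "WAP" produces a next attribute ending in "?URL=wap.cnet.com&amp;Protocol=WAP".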
As is described in detail below, the conversion server 150 operates to retrieve and convert Web content from the Web servers 140 in a unique and efficient manner (step 432). This retrieval process preferably involves collecting Web content not only from a "root" or "main" page of the Web site of interest, but also involves "prefetching" content from "child" or "branch" pages likely to be accessed from such main page (step 440). In a preferred implementation the content of the retrieved main page is converted into a document file having a format consistent with that of the voice browser 110. This document file is then provided to the voice browser 110 over the Internet by the interface 310 of the conversion server 150, and forms the basis of the continuing dialogue between the voice browser 110 and the requesting user (step 444). The conversion server 150 also immediately converts the "prefetched" content from each branch page into the format utilized by the voice browser 110 and stores the resultant document files within a prefetch cache 370 (step 450). When a request for content from such a branch page is issued to the voice browser 110 through the subscriber unit of the requesting user, the voice browser 110 forwards the request in the above-described manner to the conversion server 150. The document file corresponding to the requested branch page is then retrieved from the prefetch cache 370 and provided to the voice browser 110 through the network interface 310. Upon being received by the voice browser 110, this document file is used in continuing a dialogue with the user of subscriber unit 102 (step 454). It follows that once the user has begun a dialogue with the voice browser 110 based upon the content of the main page of the requested Web site, such dialogue may continue substantially uninterrupted when a transition is made to one of the prefetched branch pages of such site. This approach advantageously minimizes the delay exhibited by the system 100 in responding to subsequent user requests for content once a dialogue has been initiated.
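The prefetching behavior of steps 440 through 454 can be sketched as a small cache wrapped around caller-supplied fetch and convert functions. The class and method names are hypothetical, the branch pages are converted synchronously here for simplicity (a practical server would more likely do so in the background after returning the main page), and the fallback on a cache miss is an assumption.

```python
class PrefetchCache:
    """Minimal stand-in for prefetch cache 370."""

    def __init__(self, fetch, convert):
        self._fetch = fetch      # callable: url -> source content
        self._convert = convert  # callable: source content -> converted file
        self._store = {}

    def serve_main(self, main_url, branch_urls):
        """Convert the main page and pre-convert its branch pages."""
        main_doc = self._convert(self._fetch(main_url))
        for url in branch_urls:
            self._store[url] = self._convert(self._fetch(url))
        return main_doc

    def serve_branch(self, url):
        """Return a branch page, from the cache when it was prefetched."""
        if url in self._store:
            return self._store[url]
        # Cache miss (assumed fallback): retrieve and convert on demand.
        return self._convert(self._fetch(url))
```

A request for a prefetched branch page is thus answered without a further retrieval from the Web server, which is the source of the reduced delay noted above.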
FIG. 5 is a flow chart representative of operation of the system 100 in providing content from proprietary database 142 to a user of a subscriber unit. In the exemplary process 500 represented by FIG. 5, the proprietary database 142 is assumed to comprise a message repository included within a text-based messaging system (e.g., an electronic mail system) compliant with the ARPA standard set forth in Requests for Comments (RFC) 822, which is entitled "RFC822: Standard for ARPA Internet Text Messages" and is available at, for example, www.w3.org/Protocols/rfc822/Overview.html. Referring to FIG. 5, at a step 502 a user of a subscriber unit places a call to the voice browser 110. The originating user is then identified by the voice browser 110 utilizing known techniques (step 504). The voice browser 110 then retrieves a start page associated with such user, and initiates execution of an introductory dialogue with the user such as, for example, the dialogue set forth below (step 508).
C: "What do you want to do?"
U: "Check Email"
C: "Please wait"
In response to the user's request to "Check Email", the voice browser 110 issues a browsing request to the conversion server 150 in order to obtain information applicable to the requesting user from the proprietary database 142 (step 514). In the case where the voice browser 110 operates in accordance with the VoiceXML protocol, it issues such browsing request using the syntactical elements goto, choice, link and submit in a substantially similar manner as that described above with reference to FIG. 4. For example, the voice browser 110 could be configured to issue the following type of goto when requesting information from the proprietary database 142 through the conversion server 150:
<goto next="http://ConServerAddress:port/email.jsp?=ServerAddress&Protocol"/>
where email.jsp is a program file stored within memory 316 of the conversion server 150, ServerAddress is a variable identifying the address of the proprietary database 142 (e.g., mail.V-Enable.com), and Protocol is a variable identifying the format of the database 142 (e.g., POP3).
Upon receiving such a browsing request from the voice browser 110, the conversion server 150 initiates execution of the email.jsp program file. Under the direction of email.jsp, the conversion server 150 queries the voice browser 110 for the user name and password of the requesting user (step 516) and stores the returned user information UserInfo within memory 316. The program email.jsp then calls function EmailFromUser, which forms a connection to ServerAddress based upon the Transport Control Protocol (TCP) via dedicated communication link 334 (step 520). The function EmailFromUser then invokes the method CheckEmail and furnishes the parameters ServerAddress, Protocol, and UserInfo to such method during the invocation process. Upon being invoked, CheckEmail forwards UserInfo over communication link 334 to the proprietary database 142 in accordance with RFC 822 (step 524). In response, the proprietary database 142 returns status information (e.g., number of new messages) for the requesting user to the conversion server 150 (step 528). This status information is then converted by the conversion server 150 into a format consistent with the protocol of the voice browser 110 using techniques described below (step 532). The resultant initial file of converted information is then provided to the voice browser 110 over the Internet by the network interface 310 of the conversion server 150 (step 538). Dialogue between the voice browser 110 and the user of the subscriber unit may then continue as follows based upon the initial file of converted information (step 542):
C: "You have 3 new messages"
C: "First message"
Upon forwarding the initial file of converted information to the voice browser 110, CheckEmail again forms a connection to the proprietary database 142 over dedicated communication link 334 and retrieves the content of the requesting user's new messages in accordance with RFC 822 (step 544). The retrieved message content is converted by the conversion server 150 into a format consistent with the protocol of the voice browser 110 using techniques described below (step 546). The resultant additional file of converted information is then provided to the voice browser 110 over the Internet by the network interface 310 of the conversion server 150 (step 548). The voice browser 110 then recites the retrieved message content to the requesting user in accordance with the applicable voice-based protocol based upon the additional file of converted information (step 552).
FIG. 6 is a flow chart representative of operation of the conversion server 150 in accordance with the present invention. A source code listing of a top-level convert routine forming part of an exemplary software implementation of the conversion operation illustrated by FIG. 6 is contained in Appendix A. In addition, Appendix B provides an example of conversion of a WML-based document into VoiceXML-based grammatical structure in accordance with the present invention. Referring to step 602 of FIG. 6, the conversion server 150 receives one or more requests for Web content transmitted by the voice browser 110 via the Internet 130 using conventional protocols (i.e., HTTP and TCP/IP). The conversion module 330 then determines whether the format of the requested Web site corresponds to one of a number of predefined formats (e.g., WML) readily convertible into the protocol of the voice browser 110 (step 606). If not, then the URL database 320 is accessed in order to determine whether there exists a version of the requested Web site formatted consistently with one of the predefined formats (step 608). If not, an error is returned (step 610) and processing of the request for content is terminated (step 612).

Once the identity of the requested Web site or of a counterpart Web site of more appropriate format has been determined, Web content is retrieved by the retrieval module 310 of the conversion server 150 from the applicable content server 140 hosting the identified Web site (step 614). Once the identified Web-based or other content has been retrieved by the retrieval module 310, the parser 340 is invoked to parse the retrieved content using the DTD applicable to the format of the retrieved content (step 616). In the event of a parsing error (step 618), an error message is returned (step 620) and processing is terminated (step 622). A root node of the DOM representation of the retrieved content generated by the parser 340, i.e., the parse tree, is then identified (step 623). The root node is then classified into one of a number of predefined classifications (step 624). In the exemplary embodiment each node of the parse tree is assigned to one of the following classifications: Attribute, CDATA, Document Fragment, Document Type, Comment, Element, Entity Reference, Notation, Processing Instruction, Text. The content of the root node is then processed in accordance with its assigned classification in the manner described below (step 628). If not all nodes within two tree levels of the root node have been processed (step 630), then the next node of the parse tree generated by the parser 340 is identified (step 634). Otherwise, conversion of the desired portion of the retrieved content is deemed completed and an output file containing such desired converted content is generated.
If the node of the parse tree identified in step 634 is within two levels of the root node (step 636), then it is determined whether the identified node includes any child nodes (step 638). If not, the identified node is classified (step 624). If so, the content of a first of the child nodes of the identified node is retrieved (step 642). This child node is assigned to one of the predefined classifications described above (step 644) and is processed accordingly (step 646). Once all child nodes of the identified node have been processed (step 648), the identified node (which corresponds to the root node of the subtree containing the processed child nodes) is itself retrieved (step 650) and assigned to one of the predefined classifications (step 624).
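The classification and depth-limited traversal described with reference to FIG. 6 can be sketched in Java using the standard org.w3c.dom API. This is a minimal illustration under stated assumptions and is not the listing of Appendix C: the class name NodeClassifierSketch, its method names, and the maxDepth parameter are hypothetical names introduced here, and the limit of two levels is simply supplied by the caller.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not from the patent): classifies DOM nodes and walks a bounded depth.
final class NodeClassifierSketch {

    /** Maps a DOM node type onto the classification names listed in the text. */
    static String classify(Node node) {
        switch (node.getNodeType()) {
            case Node.ATTRIBUTE_NODE: return "Attribute";
            case Node.CDATA_SECTION_NODE: return "CDATA";
            case Node.DOCUMENT_FRAGMENT_NODE: return "Document Fragment";
            case Node.DOCUMENT_TYPE_NODE: return "Document Type";
            case Node.COMMENT_NODE: return "Comment";
            case Node.ELEMENT_NODE: return "Element";
            case Node.ENTITY_REFERENCE_NODE: return "Entity Reference";
            case Node.NOTATION_NODE: return "Notation";
            case Node.PROCESSING_INSTRUCTION_NODE: return "Processing Instruction";
            case Node.TEXT_NODE: return "Text";
            default: return "Other";
        }
    }

    /** Records "name:classification" for every node at most maxDepth levels below the start node. */
    static void walk(Node node, int depth, int maxDepth, List<String> out) {
        if (node == null || depth > maxDepth) {
            return;
        }
        out.add(node.getNodeName() + ":" + classify(node));
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, maxDepth, out);
        }
    }

    /** Parses the XML text and classifies the root element plus its descendants up to maxDepth. */
    static List<String> classifyWithinDepth(String xml, int maxDepth) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, maxDepth, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }
}
```

Calling classifyWithinDepth with a small WML fragment and a limit of two visits the root element and the two levels of elements beneath it, leaving any deeper text nodes unvisited.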
Appendix C contains a source code listing for a TraverseNode function which implements various aspects of the node traversal and conversion functionality described with reference to FIG. 6. In addition, Appendix D includes a source code listing of a ConvertAtr function, and of a ConvertTag function referenced by the TraverseNode function, which collectively operate to convert WML tags and attributes to corresponding VoiceXML tags and attributes.
FIGS. 7A and 7B are collectively a flowchart illustrating an exemplary process for transcoding a parse tree representation of a WML-based document into an output document comporting with the VoiceXML protocol. Although FIGS. 7A and 7B describe the inventive transcoding process with specific reference to the WML and VoiceXML protocols, the process is also applicable to conversion between other visual-based and voice-based protocols. In step 702, a root node of the parse tree for the target WML document to be transcoded is retrieved. The type of the root node is then determined and, based upon this identified type, the root node is processed accordingly. Specifically, the conversion process determines whether the root node is an attribute node (step 706), a CDATA node (step 708), a document fragment node (step 710), a document type node (step 712), a comment node (step 714), an element node (step 716), an entity reference node (step 718), a notation node (step 720), a processing instruction node (step 722), or a text node (step 724).
In the event the root node is determined to reference information within a CDATA block, the node is processed by extracting the relevant CDATA information (step 728). In particular, the CDATA information is acquired and directly incorporated into the converted document without modification (step 730). An exemplary WML-based CDATA block and its corresponding representation in VoiceXML are provided below.

WML-Based CDATA Block

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <card>
    <p>
      <![CDATA[
      ]]>
    </p>
  </card>
</wml>

VoiceXML Representation of CDATA Block

<?xml version="1.0"?>
<vxml>
  <form>
    <block>
      <![CDATA[
      ]]>
    </block>
  </form>
</vxml>

If it is established that the root node is an element node (step 716), then processing proceeds as depicted in FIG. 7B (step 732). If a Select tag is found to be associated with the root node (step 734), then a new menu item is created based upon the data comprising the identified select tag (step 736). Any grammar necessary to ensure that the new menu item comports with the VoiceXML protocol is then added (step 738).
In accordance with the invention, the operations defined by the WML-based Select tag are mapped to corresponding operations presented through the VoiceXML-based Menu tag. The Select tag is typically utilized to specify a visual list of user options and to define corresponding actions to be taken depending upon the option selected. Similarly, a Menu tag in VoiceXML specifies an introductory message and a set of spoken prompts corresponding to a set of choices. The Menu tag also specifies a corresponding set of possible responses to the prompts, and will typically also specify a URL to which a user is directed upon selecting a particular choice. When the grammatical structure defined by a Menu tag is visited, its introductory text is spoken followed by the prompt text of any contained Choice tags. A grammar for matching the "title" text of the grammatical structure defined by a Menu tag may be activated upon being loaded. When a word or phrase which matches the title text of a Menu tag is spoken by a user, the user is directed to the grammatical structure defined by the Menu tag.
The following exemplary code corresponding to a WML-based Select operation and a corresponding VoiceXML-based Menu operation illustrates this conversion process. Each operation facilitates presentation of a set of four potential options for selection by a user: "Cnet news", "V-enable", "Yahoo stocks", and "Visit Wireless Knowledge".

Select operation

<select ivalue="1" name="action">
  <option title="OK" onpick="http://cnet.news.com">Cnet news</option>
  <option title="OK" onpick="http://www.v-enable.com">V-enable</option>
  <option title="OK" onpick="http://stocks.yahoo.com">Yahoo stocks</option>
  <option title="OK" onpick="http://www.wirelessknowledge.com">Visit Wireless Knowledge</option>
</select>

Menu operation

<menu id="mainMenu">
  <prompt>Please choose from <enumerate/></prompt>
  <choice next="http://server:port/Convert.jsp?url=http://cnet.news.com">Cnet news</choice>
  <choice next="http://server:port/Convert.jsp?url=http://www.v-enable.com">V-enable</choice>
  <choice next="http://server:port/Convert.jsp?url=http://stocks.yahoo.com">Yahoo stocks</choice>
  <choice next="http://server:port/Convert.jsp?url=http://www.wirelessknowledge.com">Visit Wireless Knowledge</choice>
</menu>
The main menu may serve as the top-level menu which is heard first when the user initiates a session using the voice browser 110. The Enumerate tag inside the Menu tag automatically builds a list of words from the text identified by the Choice tags (i.e., "Cnet news", "V-enable", "Yahoo stocks", and "Visit Wireless Knowledge"). When the voice browser 110 visits this menu, the Prompt tag causes it to prompt the user with the following text: "Please choose from Cnet news, V-enable, Yahoo stocks, Visit Wireless Knowledge". Once this menu has been loaded by the voice browser 110, the user may select any of the choices by speaking a command consistent with the technology used by the voice browser 110. For example, the allowable commands may include various "attention" phrases (e.g., "go to" or "select") followed by the prompt words corresponding to various choices (e.g., "select Cnet news"). After the user has voiced a selection, the voice browser 110 will visit the target URL specified by the relevant attribute associated with the selected choice. In the above conversion, the URL address specified in the onpick attribute of the Option tag is passed as an argument to the Convert.jsp process in the next attribute of the Choice tag. The Convert.jsp process then converts the content specified by the URL address into well-formatted VoiceXML. The formats of the URL addresses associated with each of the choices defined by the foregoing exemplary main menu are set forth below:

Cnet news -> http://HybridPlatform:port/Convert.jsp?url=http://cnet.news.com
V-enable -> http://HybridPlatform:port/Convert.jsp?url=http://www.v-enable.com
Yahoo stocks -> http://HybridPlatform:port/Convert.jsp?url=http://stocks.yahoo.com
Visit Wireless Knowledge -> http://HybridPlatform:port/Convert.jsp?url=http://www.wirelessknowledge.com
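The Select-to-Menu mapping and the URL rewriting just described can be sketched in a few lines of Java. The following is only an illustrative sketch and not the processMenu listing of Appendix D: the class name SelectToMenuSketch, the fixed menu id "mainMenu", and the converterBase parameter are assumptions made for the example. Like the exemplary URLs above, it appends the target address to the url parameter unchanged; a practical implementation might need to percent-encode the address and escape the option text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch (not from the patent): turns a WML <select> into a VoiceXML <menu>.
final class SelectToMenuSketch {

    /** Wraps a target URL so that the voice browser requests it through the converter page. */
    static String wrapUrl(String converterBase, String targetUrl) {
        // No percent-encoding is applied, mirroring the URLs shown in the text.
        return converterBase + "?url=" + targetUrl;
    }

    /** Builds a <menu> with one <choice> per <option> found in the given <select> markup. */
    static String convert(String selectXml, String converterBase) {
        try {
            Element select = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(selectXml)))
                    .getDocumentElement();
            StringBuilder menu = new StringBuilder();
            menu.append("<menu id=\"mainMenu\">\n");
            menu.append("  <prompt>Please choose from <enumerate/></prompt>\n");
            NodeList options = select.getElementsByTagName("option");
            for (int i = 0; i < options.getLength(); i++) {
                Element option = (Element) options.item(i);
                // The onpick target becomes the argument of the converter URL in the next attribute.
                menu.append("  <choice next=\"")
                    .append(wrapUrl(converterBase, option.getAttribute("onpick")))
                    .append("\">")
                    .append(option.getTextContent().trim())
                    .append("</choice>\n");
            }
            menu.append("</menu>\n");
            return menu.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not convert select", e);
        }
    }
}
```

Passing the exemplary Select operation together with a base of "http://server:port/Convert.jsp" yields one Choice tag per Option tag, each pointing back through the converter.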
Referring again to FIG. 7B, any "child" tags of the Select tag are then processed as was described above with respect to the original "root" node of the parse tree and accordingly converted into VoiceXML-based grammatical structures (step 740). Upon completion of the processing of each child of the Select tag, the information associated with the next unprocessed node of the parse tree is retrieved (step 744). To the extent an unprocessed node was identified in step 744 (step 746), the identified node is processed in the manner described above beginning with step 706.
Again directing attention to step 740, an XML-based tag (including, e.g., a Select tag) may be associated with one or more subsidiary "child" tags. Similarly, every XML-based tag (except the tag associated with the root node of a parse tree) is also associated with a parent tag. The following XML-based notation exemplifies this parent/child relationship:

<parent>
  <child1>
    <grandchild1> </grandchild1>
  </child1>
  <child2>
  </child2>
</parent>

In the above example the parent tag is associated with two child tags (i.e., child1 and child2). In addition, tag child1 has a child tag denominated grandchild1. In the case of the exemplary WML-based Select operation defined above, the Select tag is the parent of the Option tag and the Option tag is the child of the Select tag. In the corresponding case of the VoiceXML-based Menu operation, the Prompt and Choice tags are children of the Menu tag (and the Menu tag is the parent of both the Prompt and Choice tags).
Various types of information are typically associated with each parent and child tag. For example, lists of various types of attributes are commonly associated with certain types of tags. Textual information associated with a given tag may also be encapsulated between the "start" and "end" tagname markings defining a tag structure (e.g., "<tagname>" and "</tagname>"), with the specific semantics of the tag being dependent upon the type of tag. An accepted structure for a WML-based tag is set forth below:

<tagname attribute1=value attribute2=value> text information </tagname>

Applying this structure to the case of the exemplary WML-based Option tag described above, it is seen to have the attributes of title and onpick. The title attribute defines the title of the Option tag, while the onpick attribute specifies the action to be taken if the Option tag is selected. This Option tag also incorporates descriptive text information presented to a user in order to facilitate selection of the Option.
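As a brief illustration of this tag structure, the sketch below reads the attributes and the enclosed text of a single tag through the standard DOM API. It is offered only as an example; the class name TagPartsSketch and its methods are hypothetical, and a sorted map is used solely so that the printed attribute order is predictable.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch (not from the patent): extracts a tag's name, attributes and text.
final class TagPartsSketch {

    /** Copies the attributes of a tag into a map sorted by attribute name. */
    static Map<String, String> attributesOf(Element tag) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = tag.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    /** Returns "tagname {attributes} text=..." for the single tag contained in tagXml. */
    static String describe(String tagXml) {
        try {
            Element tag = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tagXml)))
                    .getDocumentElement();
            return tag.getTagName() + " " + attributesOf(tag)
                    + " text=" + tag.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse tag", e);
        }
    }
}
```

Applied to the first Option tag of the exemplary Select operation, describe reports the tag name option, the onpick and title attributes, and the text "Cnet news".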
Referring again to FIG. 7B, if an "A" tag is determined to be associated with the element node (step 750), then a new field element and associated grammar are created (step 752) in order to process the tag based upon its attributes. Upon completion of creation of this new field element and associated grammar, the next node in the parse tree is obtained and processing is continued at step 744 in the manner described above. An exemplary conversion of a WML-based A tag into a VoiceXML-based Field tag and associated grammar is set forth below:

WML File with "A" tag

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <card id="test" title="Test">
    <p>This is a test</p>
    <p>
      <A title="Go" href="test.wml"> Hello </A>
    </p>
  </card>
</wml>

Here the "A" tag has:
1. title = "Go"
2. href = "test.wml"
3. Display on screen: Hello [the content between <A ...> and </A> is displayed on screen]

Converted VXML with Field Element

<?xml version="1.0"?>
<vxml>
  <form id="test">
    <block>This is a test</block>
    <block>
      <field name="act">
        <prompt> Please say Hello or Next </prompt>
        <grammar>
          [ Hello Next ]
        </grammar>
        <filled>
          <if cond="act == 'Hello'">
            <goto next="test.wml" />
          </if>
        </filled>
      </field>
    </block>
  </form>
</vxml>
In the above example, the WML-based textual representation of "Hello" and "Next" are converted into a VoiceXML-based representation pursuant to which they are audibly presented. If the user utters "Hello" in response, control passes to the same link as was referenced by the WML "A" tag. If instead "Next" is spoken, then VoiceXML processing begins after the "</field>" tag.
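The A-to-Field conversion illustrated above can be approximated by the short Java sketch below. It is a simplified illustration under stated assumptions rather than the processA listing of Appendix D: the class name AnchorToFieldSketch and the fieldName parameter are hypothetical, the href is copied as-is in the manner of the example, and the DTMF alternatives of Appendix D are omitted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch (not from the patent): turns a WML <A> link into a VoiceXML <field>.
final class AnchorToFieldSketch {

    /** Builds a field that prompts for the link text or "Next" and follows href when the text is spoken. */
    static String convert(String anchorXml, String fieldName) {
        try {
            Element anchor = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(anchorXml)))
                    .getDocumentElement();
            String label = anchor.getTextContent().trim(); // text shown on screen in WML
            String href = anchor.getAttribute("href");     // link target, copied unchanged
            StringBuilder field = new StringBuilder();
            field.append("<field name=\"").append(fieldName).append("\">\n");
            field.append("  <prompt>Please say ").append(label).append(" or Next</prompt>\n");
            field.append("  <grammar>[ ").append(label).append(" Next ]</grammar>\n");
            field.append("  <filled>\n");
            field.append("    <if cond=\"").append(fieldName).append(" == '").append(label).append("'\">\n");
            field.append("      <goto next=\"").append(href).append("\"/>\n");
            field.append("    </if>\n");
            field.append("  </filled>\n");
            field.append("</field>\n");
            return field.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not convert anchor", e);
        }
    }
}
```

When no branch of the if element matches, as when the user says "Next", nothing is executed inside the filled element and interpretation simply continues past the field, which corresponds to the behavior described in the preceding paragraph.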
If a Template tag is found to be associated with the element node (step 756), the template element is processed by converting it to a VoiceXML-based Link element (step 758). The next node in the parse tree is then obtained and processing is continued at step 744 in the manner described above. An exemplary conversion of the information associated with a WML-based Template tag into a VoiceXML-based Link element is set forth below.

Template Tag

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wap/wml_1.1.xml">
<wml>
  <template>
    <do type="options" label="Main">
      <go href="next.wml"/>
    </do>
  </template>
  <card>
    <p> hello </p>
  </card>
</wml>

Link Element

<?xml version="1.0"?>
<vxml>
  <link caching="safe" next="next.wml">
    <grammar>
      [(Main)]
    </grammar>
  </link>
  <form>
    <block> hello </block>
  </form>
</vxml>
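A comparable sketch for the Template-to-Link conversion is given below. It is an illustration only, with the class name TemplateToLinkSketch assumed for the example; it handles just the Do and Go tags shown in the exemplary template and leaves out the caching attribute for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch (not from the patent): turns a WML <template> into VoiceXML <link> elements.
final class TemplateToLinkSketch {

    /** Emits one <link> per <do> tag that contains a <go> tag; the label becomes the grammar phrase. */
    static String convert(String templateXml) {
        try {
            Element template = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(templateXml)))
                    .getDocumentElement();
            StringBuilder links = new StringBuilder();
            NodeList doTags = template.getElementsByTagName("do");
            for (int i = 0; i < doTags.getLength(); i++) {
                Element doTag = (Element) doTags.item(i);
                NodeList goTags = doTag.getElementsByTagName("go");
                if (goTags.getLength() == 0) {
                    continue; // nothing to link to
                }
                String href = ((Element) goTags.item(0)).getAttribute("href");
                String label = doTag.getAttribute("label").trim();
                links.append("<link next=\"").append(href).append("\">\n");
                links.append("  <grammar>[ (").append(label).append(") ]</grammar>\n");
                links.append("</link>\n");
            }
            return links.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not convert template", e);
        }
    }
}
```

Because a Link element sits at document level in VoiceXML, the phrase taken from the label stays active while the forms and menus of the document are visited, much as a WML template applies to every card of its deck.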
In the event that a WML tag is determined to be associated with the element node, then the WML tag is converted to VoiceXML (step 760).
If the element node does not include any child nodes, then the next node in the parse tree is obtained and processing is continued at step 744 in the manner described above (step 762). If the element node does include child nodes, each child node within the subtree of the parse tree formed by considering the element node to be the root node of the subtree is then processed beginning at step 706 in the manner described above (step 766).
APPENDIX A

/*
 * Function : convert
 *
 * Input    : filename, document base
 *
 * Return   : None
 *
 * Purpose  : parses the input wml file and converts it into vxml file.
 *
 */
public void convert(String fileName, String base) {
    try {
        Document doc;
        Vector problems = new Vector();
        documentBase = base;
        try {
            VXMLErrorHandler errorhandler = new VXMLErrorHandler(problems);
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            doc = docBuilder.parse(new File(fileName));
            TraverseNode(doc);
            if (problems.size() > 0) {
                // Named "errors" because "enum" is a reserved word in current Java.
                Enumeration errors = problems.elements();
                while (errors.hasMoreElements())
                    out.write((String) errors.nextElement());
            }
        } catch (SAXParseException err) {
            out.write("** Parsing error" + ", line " + err.getLineNumber() + ", uri " + err.getSystemId());
            out.write(" " + err.getMessage());
        } catch (SAXException e) {
            Exception x = e.getException();
            ((x == null) ? e : x).printStackTrace();
        } catch (Throwable t) {
            t.printStackTrace();
        }
    } catch (Exception err) {
        err.printStackTrace();
    }
}
APPENDIX B
EXEMPLARY WML TO VOICEXML CONVERSION

WML to VoiceXML Mapping Table

The following set of WML tags may be converted to VoiceXML tags of analogous function in accordance with Table B1 below.

TABLE B1

[Table B1 appears as an image (imgf000031_0001) in the published document.]

Mapping of Individual WML Elements to Blocks of VoiceXML Elements

In an exemplary embodiment a VoiceXML-based tag and any required ancillary grammar is directly substituted for the corresponding WML-based tag in accordance with Table B1. In cases where direct mapping from a WML-based tag to a VoiceXML tag would introduce inaccuracies into the conversion process, additional processing is required to accurately map the information from the WML-based tag into a VoiceXML-based grammatical structure comprised of multiple VoiceXML elements. For example, the following exemplary block of VoiceXML elements may be utilized to emulate the functionality of the WML-based Template tag in the voice domain.
WML-Based Template Element

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <template>
    <do type="options" label="DONE">
      <go href="test.wml"/>
    </do>
  </template>
  <card>
    <p align="left">Test</p>
    <select name="newsitem">
      <option onpick="test1.wml">Test1</option>
      <option onpick="test2.wml">Test2</option>
    </select>
  </card>
</wml>

Corresponding Block of VoiceXML Elements

<?xml version="1.0"?>
<vxml version="1.0">
  <link next="test.vxml">
    <grammar>
      [ (DONE) ]
    </grammar>
  </link>
  <menu>
    <prompt> Please say test1 or test2 </prompt>
    <choice next="test1.vxml"> test1 </choice>
    <choice next="test2.vxml"> test2 </choice>
  </menu>
</vxml>
Example of Conversion of Actual WML Code to VoiceXML Code

Exemplary WML Code

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">
<!-- Deck Source: "http://wap.cnet.com" -->
<!-- DISCLAIMER: This source was generated from parsed binary WML content. -->
<!-- This representation of the deck contents does not necessarily preserve -->
<!-- original whitespace or accurately decode any CDATA Section contents, -->
<!-- but otherwise is an accurate representation of the original deck contents -->
<!-- as determined from its WBXML encoding. If a precise representation is required, -->
<!-- then use the "Element Tree" or, if available, the "Original Source" view. -->
<wml>
  <head>
    <meta http-equiv="Cache-Control" content="must-revalidate"/>
    <meta http-equiv="Expires" content="Tue, 01 Jan 1980 1:00:00 GMT"/>
    <meta http-equiv="Cache-Control" content="max-age=0"/>
  </head>
  <card title="Top Tech News">
    <p align="left">
      CNET News.com
    </p>
    <p mode="nowrap">
      <select name="categoryId" ivalue="1">
        <option onpick="/wap/news/briefs/0,10870,0-1002-903-1-0,00.wml">Latest News Briefs</option>
        <option onpick="/wap/news/0,10716,0-1002-901,00.wml">Latest News Headlines</option>
        <option onpick="/wap/news/0,10716,0-1007-901,00.wml">E-Business</option>
        <option onpick="/wap/news/0,10716,0-1004-901,00.wml">Communications</option>
        <option onpick="/wap/news/0,10716,0-1005-901,00.wml">Entertainment and Media</option>
        <option onpick="/wap/news/0,10716,0-1006-901,00.wml">Personal Technology</option>
        <option onpick="/wap/news/0,10716,0-1003-901,00.wml">Enterprise Computing</option>
      </select>
    </p>
  </card>
</wml>

Corresponding VoiceXML code

<?xml version="1.0"?>
<vxml version="1.0">
  <head>
    <meta/>
    <meta/>
    <meta/>
  </head>
  <form>
    <block>
      <prompt>CNET News.com</prompt>
    </block>
    <block>
      <grammar>
        [ ( latest news briefs ) ( latest news headlines ) ( e-business ) ( communications ) ( entertainment and media ) ( personal technology ) ( enterprise computing ) ]
      </grammar>
      <goto next="#categoryId" />
    </block>
  </form>
  <menu id="categoryId">
    <property name="inputmodes" value="dtmf" />
    <prompt>Please Say <enumerate/> </prompt>
    <choice dtmf="0" next="http://server:port/Convert.jsp?url=http://wap.cnet.com/wap/news/briefs/0,10870,0-1002-903-1-0,00.wml"> Latest News Briefs </choice>
    <choice dtmf="1" next="http://server:port/Convert.jsp?url=http://wap.cnet.com/wap/news/0,10716,0-1002-901,00.wml"> Latest News Headlines </choice>
    <choice dtmf="2" next="http://server:port/Convert.jsp?url=http://wap.cnet.com/wap/news/0,10716,0-1007-901,00.wml"> E-Business </choice>
    <choice dtmf="3" next="http://server:port/Convert.jsp?url=http://wap.cnet.com/wap/news/0,10716,0-1004-901,00.wml"> Communications </choice>
    <choice dtmf="4" next="http://server:port/Convert.jsp?url=http://wap.cnet.com/wap/news/0,10716,0-1005-901,00.wml"> Entertainment and Media </choice>
    <choice dtmf="5" next="http://server:port/Convert.jsp?url=http://wap.cnet.com/wap/news/0,10716,0-1006-901,00.wml"> Personal Technology </choice>
    <choice dtmf="6" next="http://server:port/Convert.jsp?url=http://wap.cnet.com/wap/news/0,10716,0-1003-901,00.wml"> Enterprise Computing </choice>
    <default>
      <reprompt/>
    </default>
  </menu>
</vxml>
<!-- END OF CONVERSION -->

APPENDIX C
/*
 * Function : TraverseNode
 *
 * Input    : Node
 *
 * Return   : None
 *
 * Purpose  : Traverses the DOM tree node by node and converts the
 *            tags and attributes into equivalent vxml tags and attributes.
 */
void TraverseNode(Node el) {
    StringBuffer buffer = new StringBuffer();
    if (el == null)
        return;
    int type = el.getNodeType();
    switch (type) {
    case Node.ATTRIBUTE_NODE: {
        break;
    }
    case Node.CDATA_SECTION_NODE: {
        // CDATA content is copied through unchanged.
        buffer.append("<![CDATA[");
        buffer.append(el.getNodeValue());
        buffer.append("]]>");
        writeBuffer(buffer);
        break;
    }
    case Node.DOCUMENT_FRAGMENT_NODE: {
        break;
    }
    case Node.DOCUMENT_NODE: {
        TraverseNode(((Document) el).getDocumentElement());
        break;
    }
    case Node.DOCUMENT_TYPE_NODE: {
        break;
    }
    case Node.COMMENT_NODE: {
        break;
    }
    case Node.ELEMENT_NODE: {
        if (el.getNodeName().equals("select")) {
            processMenu(el);
        } else if (el.getNodeName().equals("a")) {
            processA(el);
        } else {
            buffer.append("<");
            buffer.append(ConvertTag(el.getNodeName()));
            NamedNodeMap nm = el.getAttributes();
            if (first) {
                buffer.append(" version=\"1.0\" ");
                first = false;
            }
            int len = (nm != null) ? nm.getLength() : 0;
            for (int j = 0; j < len; j++) {
                Attr attr = (Attr) nm.item(j);
                buffer.append(ConvertAtr(el.getNodeName(), attr.getNodeName(), attr.getNodeValue()));
            }
            NodeList nl = el.getChildNodes();
            if ((nl == null) || ((len = nl.getLength()) < 1)) {
                buffer.append("/>");
                writeBuffer(buffer);
            } else {
                buffer.append(">");
                writeBuffer(buffer);
                for (int j = 0; j < len; j++)
                    TraverseNode(nl.item(j));
                buffer.append("</");
                buffer.append(ConvertTag(el.getNodeName()));
                buffer.append(">");
                writeBuffer(buffer);
            }
        }
        break;
    }
    case Node.ENTITY_REFERENCE_NODE: {
        NodeList nl = el.getChildNodes();
        if (nl != null) {
            int len = nl.getLength();
            for (int j = 0; j < len; j++)
                TraverseNode(nl.item(j));
        }
        break;
    }
    case Node.NOTATION_NODE: {
        break;
    }
    case Node.PROCESSING_INSTRUCTION_NODE: {
        buffer.append("<?");
        buffer.append(ConvertTag(el.getNodeName()));
        String data = el.getNodeValue();
        if (data != null && data.length() > 0) {
            buffer.append(" ");
            buffer.append(data);
        }
        buffer.append(" ?>");
        writeBuffer(buffer);
        break;
    }
    case Node.TEXT_NODE: {
        // Non-empty text becomes a spoken prompt.
        if (!el.getNodeValue().trim().equals("")) {
            try {
                out.write("<prompt>" + el.getNodeValue().trim() + "</prompt>\n");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        break;
    }
    }
}

APPENDIX D
/*
 * Function : ConvertTag
 *
 * Input    : wap tag
 *
 * Return   : equivalent vxml tag
 *
 * Purpose  : converts a wml tag to vxml tag using the WMLTagResourceBundle.
 *
 */
String ConvertTag(String wapelement) {
    ResourceBundle rbd = new WMLTagResourceBundle();
    try {
        return rbd.getString(wapelement);
    } catch (MissingResourceException e) {
        return "";
    }
}

/*
 * Function : ConvertAtr
 *
 * Input    : wap tag, wap attribute, attribute value
 *
 * Return   : equivalent vxml attribute with its value.
 *
 * Purpose  : converts the combination of tag+attribute of wml to a vxml
 *            attribute using WMLAtrResourceBundle.
 *
 */
String ConvertAtr(String wapelement, String wapattrib, String val) {
    ResourceBundle rbd = new WMLAtrResourceBundle();
    String tempStr = "";
    String searchTag;
    // The bundle is keyed by "tag-attribute".
    searchTag = wapelement.trim() + "-" + wapattrib.trim();
    try {
        tempStr += " ";
        String convTag = rbd.getString(searchTag);
        tempStr += convTag;
        if (convTag.equalsIgnoreCase("next"))
            tempStr += "=\"" + server + "?url=" + documentBase;
        else
            tempStr += "=\"";
        tempStr += val;
        tempStr += "\"";
        return tempStr;
    } catch (MissingResourceException e) {
        return "";
    }
}
/*
 * Function : processMenu
 *
 * Input    : Node
 *
 * Return   : None
 *
 * Purpose  : process a menu node. It converts a select list into an
 *            equivalent menu in vxml.
 */
private void processMenu(Node el) {
    try {
        StringBuffer mnuString = new StringBuffer();
        StringBuffer mnu = new StringBuffer();
        String menuName = "NONAME";
        int dtmfId = 0;
        StringBuffer mnuGrammar = new StringBuffer();
        Vector menuItem = new Vector();
        mnu.append("<" + ConvertTag(el.getNodeName()));
        NamedNodeMap nm = el.getAttributes();
        int len = (nm != null) ? nm.getLength() : 0;
        for (int j = 0; j < len; j++) {
            Attr attr = (Attr) nm.item(j);
            if (attr.getNodeName().equals("name")) {
                menuName = attr.getNodeValue();
            }
            mnu.append(" " + ConvertAtr(el.getNodeName(), attr.getNodeName(), attr.getNodeValue()));
        }
        mnu.append(">\n");
        mnu.append(" <property name=\"inputmodes\" value=\"dtmf\" />\n");
        NodeList nl = el.getChildNodes();
        len = nl.getLength();
        for (int j = 0; j < len; j++) {
            Node el1 = nl.item(j);
            int type = el1.getNodeType();
            switch (type) {
            case Node.ELEMENT_NODE: {
                // Each option becomes a choice with the next DTMF digit.
                mnuString.append("<" + ConvertTag(el1.getNodeName()) + " dtmf=\"" + dtmfId++ + "\" ");
                NamedNodeMap nm1 = el1.getAttributes();
                int len2 = (nm1 != null) ? nm1.getLength() : 0;
                for (int l = 0; l < len2; l++) {
                    Attr attr1 = (Attr) nm1.item(l);
                    mnuString.append(" " + ConvertAtr(el1.getNodeName(), attr1.getNodeName(), attr1.getNodeValue()));
                }
                mnuString.append(">\n");
                NodeList nl1 = el1.getChildNodes();
                int len1 = nl1.getLength();
                for (int k = 0; k < len1; k++) {
                    Node el2 = nl1.item(k);
                    switch (el2.getNodeType()) {
                    case Node.TEXT_NODE: {
                        if (!el2.getNodeValue().trim().equals("")) {
                            mnuString.append(el2.getNodeValue() + "\n");
                            menuItem.addElement(el2.getNodeValue());
                        }
                    }
                        break;
                    }
                }
                mnuString.append("</" + ConvertTag(el1.getNodeName()) + ">\n");
                break;
            }
            }
        }
        mnuString.append("<default>\n<reprompt/>\n</default>\n");
        mnuString.append("</" + ConvertTag(el.getNodeName()) + ">\n");
        mnu.append("<prompt>Please Say <enumerate/>");
        mnu.append("\n</prompt>");
        mnu.append("\n" + mnuString.toString());
        mnuGrammar.append("<grammar>\n[ ");
        for (int i = 0; i < menuItem.size(); i++) {
            mnuGrammar.append(" ( " + menuItem.elementAt(i) + " ) ");
        }
        mnuGrammar.append("]\n</grammar>\n");
        out.write(mnuGrammar.toString().toLowerCase());
        out.write("\n<goto next=\"#" + menuName + "\" />\n</block>\n</form>\n");
        out.write(mnu.toString());
        out.write("<form>\n<block>\n");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
/*
 * Function : processA
 *
 * Input    : link Node
 *
 * Return   : None
 *
 * Purpose  : converts an <A> i.e. link element into an equivalent for
 *            vxml.
 */
private void processA(Node el) {
    try {
        StringBuffer linkString = new StringBuffer();
        StringBuffer link = new StringBuffer();
        StringBuffer nextStr = new StringBuffer();
        StringBuffer promptStr = new StringBuffer();
        String fieldName = "NONAME" + field_id++;
        int dtmfId = 0;
        StringBuffer linkGrammar = new StringBuffer();
        NamedNodeMap nm = el.getAttributes();
        int len = (nm != null) ? nm.getLength() : 0;
        linkGrammar.append("<grammar> [(next) (dtmf-1) (dtmf-2) ");
        for (int j = 0; j < len; j++) {
            Attr attr = (Attr) nm.item(j);
            if (attr.getNodeName().equals("href")) {
                nextStr.append("<goto " + ConvertAtr(el.getNodeName(), attr.getNodeName(), attr.getNodeValue()) + "/>\n");
            }
        }
        linkString.append("<field name=\"" + fieldName + "\">\n");
        NodeList nl = el.getChildNodes();
        len = nl.getLength();
        link.append("<filled>\n");
        for (int j = 0; j < len; j++) {
            Node el1 = nl.item(j);
            int type = el1.getNodeType();
            switch (type) {
            case Node.TEXT_NODE: {
                if (!el1.getNodeValue().trim().equals("")) {
                    promptStr.append("<prompt> Please Say Next or " + el1.getNodeValue() + "</prompt>");
                    linkGrammar.append("(" + el1.getNodeValue().toLowerCase() + ")");
                    link.append("<if cond=\"" + fieldName + " == '" + el1.getNodeValue() + "' || " + fieldName + " == 'dtmf-1'\">\n");
                    link.append(nextStr);
                    link.append("<else/>\n");
                    link.append("<prompt>Next Article</prompt>\n");
                    link.append("</if>\n");
                }
            }
                break;
            }
        }
        linkGrammar.append("]</grammar>\n");
        link.append("</filled>\n");
        linkString.append(linkGrammar);
        linkString.append(promptStr);
        linkString.append(link);
        linkString.append("</field>\n");
        out.write("</block>\n");
        out.write(linkString.toString());
        out.write("<block>\n");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
/*
 * Function : writeBuffer
 *
 * Input    : buffer String
 *
 * Return   : None
 *
 * Purpose  : print the buffer to PrintWriter.
 *
 */
void writeBuffer(StringBuffer buffer) {
    try {
        if (!buffer.toString().trim().equals("")) {
            out.write(buffer.toString());
            out.write("\n");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    // Empty the buffer so that it can be reused by the caller.
    buffer.delete(0, buffer.length());
}
Accordingly, a voice browser system including a subscriber unit in communication with a voice browser through a telecommunications network has been described herein. In response to requests for content from Web sites formatted in compliance with the protocol applicable to the voice browser, the voice browser obtains the requested content directly from the compliant Web site. When it is desired to obtain Web content formatted inconsistently with the voice browser, the voice browser issues a browsing request for such content to a conversion server using syntax substantially similar to that employed in making direct requests to compliant Web sites. That is, the voice browser is advantageously not required to operate in different modes when presented with requests for Web content of disparate formats. In response to browsing requests issued by the voice browser, the conversion server will attempt to identify a version of the requested Web site formatted in accordance with protocols suitable for serving content to devices having limited display capabilities (e.g., handheld or portable devices). The conversion server then preferably retrieves content from such a suitably formatted version of the requested Web site and converts this content into a document file compliant with the protocol of the voice browser.
The converted document file is then provided by the conversion server to the voice browser, which uses this file to effect a dialogue conforming to the applicable protocol with the requesting user.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. In other instances, well-known circuits and devices are shown in block diagram form in order to avoid unnecessary distraction from the underlying invention. Thus, the foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following Claims and their equivalents define the scope of the invention.

Claims

What is claimed is:
1. A method for browsing the Internet comprising: transmitting a first user request over a communication link to a voice browser, said voice browser operating in accordance with a voice-based protocol; generating a browsing request in response to said first user request, said browsing request identifying a web server corresponding to said first user request; retrieving web page information from said web server in accordance with said browsing request, said web page information being formatted in accordance with a predefined protocol; converting at least a first portion of said web page information into a file of converted information formatted in compliance with said voice-based protocol; and responding to said first user request on the basis of said file of converted information.
2. The method of claim 1 wherein said browsing request specifies an address of a conversion server, said conversion server establishing a communication channel with said voice browser upon receipt of said browsing request.
3. The method of claim 1 wherein said retrieving includes issuing a query to said web server in accordance with said browsing request, said query being formatted in accordance with a standard Internet protocol.
4. The method of claim 1 wherein said retrieving includes performing a branch traversal process by retrieving branched content from at least one first level branched page linked to a root page wherein content from said root page is included within said first portion of said web page information.
5. The method of claim 4 wherein said branch traversal process includes retrieving additional branched content from at least one second level branched page linked to said at least one first level branched page, said additional branched content being included within a second portion of said web page information.
6. The method of claim 4 further including converting said second portion of said web page information into an additional file of converted information formatted in compliance with said voice-based protocol; receiving at said voice browser a second user request corresponding to said branched content and responding to said second user request on the basis of information relating to said branched content included within said additional file of converted information.
7. The method of claim 6 wherein said first and second user requests are comprised of audio information.
8. The method of claim 1 wherein said first user request identifies a first web site formatted inconsistently with said predefined protocol, said generating a browsing request including selecting a second web site comprising a version of said first web site formatted consistently with said predefined protocol.
9. A system for browsing the Internet comprising: a voice browser operating in accordance with a voice-based protocol, said voice browser receiving a first user request transmitted over a communication link and generating a browsing request in response to said first user request; and a conversion server in communication with said voice browser, said conversion server including a retrieval module for retrieving web page information from a destination web site in accordance with said browsing request, said web page information being formatted in accordance with a predefined protocol; a conversion module for converting at least a first portion of said web page information into a file of converted information compliant with said voice-based protocol; and an interface for providing said file of converted information to said voice browser.
10. The system of claim 9 wherein said browsing request specifies an address of said conversion server, said conversion server establishing a communication channel with said voice browser upon receipt of said browsing request.
11. The system of claim 9 wherein said web page information includes branched content from at least one first level branched page linked to a root page, said retrieval module performing a branch traversal process by retrieving said branched content and content from said root page.
12. The system of claim 11 wherein said branch traversal process includes retrieving additional branched content from at least one second level branched page linked to said at least one first level branched page, said additional branched content being included within said web page information.
13. The system of claim 12 wherein a second portion of said web page information is converted into an additional file of converted information formatted in compliance with said voice-based protocol, said voice browser receiving a second user request corresponding to said branched content and responding to said second user request on the basis of information relating to said branched content included within said additional file of converted information.
14. The system of claim 9 wherein said conversion server further includes a database of web sites formatted in accordance with said predefined protocol and wherein said browsing request identifies a first web site formatted inconsistently with said predefined protocol, said retrieval module selecting said destination web site from said database wherein said destination web site comprises a version of said first web site formatted consistently with said predefined protocol.
15. A method for facilitating the retrieval of information through a voice browser operative in accordance with a voice-based protocol, said method comprising: receiving a browsing request from said voice browser, said browsing request being issued by said voice browser in response to a first user request for content; retrieving information from a remote information source in accordance with said browsing request, said information being formatted in accordance with a predefined protocol; and converting said information into a file of converted information compliant with said voice-based protocol.
16. The method of claim 15 wherein said first user request identifies a first web site formatted inconsistently with said predefined protocol, said generating a browsing request including selecting said remote information source from a predefined set of protocol compliant web sites wherein said remote information source comprises a version of said first web site formatted consistently with said predefined protocol.
17. The method of claim 15 further including providing said file of converted information to said voice browser using standard Internet protocols.
18. The method of claim 15 wherein said browsing request identifies a conversion script, said conversion script executing upon receipt of said browsing request.
19. The method of claim 15 further including maintaining a database of web sites formatted in accordance with said predefined protocol wherein said browsing request identifies a first web site formatted inconsistently with said predefined protocol, said method further including selecting said remote information source from said database wherein said remote information source comprises a version of said first web site formatted consistently with said predefined protocol.
20. A method for retrieving content using a voice-based communication system comprising: transmitting a first user request over a communication link to a voice browser, said voice browser operating in accordance with a voice-based protocol; generating a browsing request in response to said first user request, said browsing request identifying a first remote information source corresponding to said first user request; retrieving content from said first remote information source in accordance with said browsing request, said content being formatted in accordance with a predefined protocol; converting said content into a file of converted information formatted in compliance with said voice-based protocol; and responding to said first user request on the basis of said file of converted information.
21. The method of claim 20 wherein said browsing request specifies an address of a conversion server, said conversion server establishing a communication channel with said voice browser upon receipt of said browsing request.
22. The method of claim 20 wherein said first user request identifies a web site formatted inconsistently with said predefined protocol, said generating a browsing request including selecting a second web site as said first remote information source wherein said second web site is formatted consistently with said predefined protocol.
23. The method of claim 22 further including: receiving at said voice browser a second user request corresponding to a second remote information source comprising a database formatted inconsistently with said voice-based protocol, retrieving information from said database, and converting said information into an additional file of converted information formatted in compliance with said voice-based protocol.
24. A method for facilitating browsing of the Internet comprising: receiving a browsing request from a browser unit operative in accordance with a first protocol, said browsing request being issued by said browser unit in response to a first user request for web content; retrieving web page information from a web site in accordance with said browsing request, said web page information being formatted in accordance with a second protocol different from said first protocol; and converting at least a primary portion of said web page information into a primary file of converted information compliant with said first protocol.
25. The method of claim 24 wherein said web page information includes primary content from a primary page of said web site and secondary content from a secondary page referenced by said primary page, said primary portion of said web page information including said primary content.
26. The method of claim 25 further including: converting said secondary content into a secondary file of converted information compliant with said first protocol; receiving an additional browsing request from said browser unit, said additional browsing request being issued by said browser unit in response to a second user request for web content; and providing said secondary file in response to said additional browsing request.
27. The method of claim 24 wherein said retrieving includes obtaining said web page information using standard Internet protocols.
28. The method of claim 24 wherein said browsing request identifies a conversion script, said conversion script executing upon receipt of said browsing request.
29. The method of claim 24 wherein said first user request identifies a first web site formatted inconsistently with said second protocol, said generating a browsing request including selecting a second web site comprising a version of said first web site formatted consistently with said second protocol.
30. A conversion server responsive to browsing requests issued by a browser unit operative in accordance with a first protocol, said conversion server comprising: a retrieval module for retrieving web page information from a web site in accordance with a first browsing request issued by said browsing unit, said web page information being formatted in accordance with a second protocol different from said first protocol; a conversion module for converting at least a primary portion of said web page information into a primary file of converted information compliant with said first protocol; and an interface module for providing said primary file of converted information to said browsing unit.
31. The conversion server of claim 30 wherein said web page information includes primary content from a primary page of said web site and secondary content from a secondary page referenced by said primary page, said primary portion of said web page information including said primary content.
32. The conversion server of claim 31 wherein said conversion module converts said secondary content into a secondary file of converted information compliant with said first protocol, said interface module providing said secondary file of converted information to said browser unit in response to a second browsing request issued by said browser unit.
33. The conversion server of claim 31 wherein said retrieval module performs a branch traversal process in retrieving said web page information, said branch traversal process including retrieving tertiary content from at least one tertiary page referenced by said secondary page.
34. A method for facilitating information retrieval from remote information sources comprising: receiving a browsing request from a browser unit operative in accordance with a first protocol, said browsing request being issued by said browser unit in response to a first user request; retrieving content from a remote information source in accordance with said browsing request, said content being formatted in accordance with a second protocol different from said first protocol; and converting said content into a file of converted information compliant with said first protocol.
35. The method of claim 34 wherein said first user request identifies a first web site formatted inconsistently with said second protocol, said generating a browsing request including selecting a second web site as said remote information source wherein said second web site comprises a version of said first web site formatted consistently with said second protocol.
36. The method of claim 35 further including: receiving at said browsing unit a second user request corresponding to a database formatted inconsistently with said first protocol, retrieving information from said database, and converting said information into an additional file of converted information formatted in compliance with said first protocol.
37. A method for facilitating information retrieval from remote information sources comprising: receiving a browsing request from a browser unit, said browsing request being issued by said browser unit in response to a first user request; retrieving content from a remote information source in accordance with said browsing request; parsing said content in accordance with a predefined document type definition and storing a resultant document object model representation, said document object model representation including a plurality of nodes; determining a first classification associated with a first of said nodes; and converting information at said first of said nodes into converted information based upon said first classification.
38. The method of claim 37 further comprising determining a second classification of a second of said nodes and converting information associated with said second of said nodes into converted information based upon said second classification.
39. The method of claim 37 further including identifying a first child node related to said first of said nodes; classifying said first child node; and converting information at said first child node into converted information based upon said classifying.
40. The method of claim 39 further including identifying a second child node related to said first of said nodes; classifying said second child node; and converting information at said second child node into converted information.
41. A method for facilitating information retrieval from remote information sources comprising: receiving a URL from a browser unit, said URL being issued by said browser unit in response to a first user request; retrieving content from a remote information source identified by said URL; parsing said information and storing a resultant document object model representation, said document object model representation including a plurality of nodes organized in a hierarchical structure; classifying each of said plurality of nodes into one of a set of predefined classifications during traversal of said hierarchical structure, said traversal originating at a root node of said hierarchical structure; and converting information at each of said plurality of nodes into converted information based upon the one of said predefined classifications associated with each of said nodes.
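For illustration only, the parsing, node classification, and per-node conversion recited in claims 37 to 41 may be sketched in Java as follows. The class name NodeClassificationSketch, the three classifications, and the output element names are assumptions chosen for this sketch; the sketch also omits validation against a document type definition and is not the claimed implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative sketch only; classifications and output tags are hypothetical. */
public class NodeClassificationSketch {

    /** Hypothetical set of predefined classifications. */
    public enum NodeClass { TEXT, LINK, CONTAINER }

    /** Assigns one classification to a node of the document object model. */
    public static NodeClass classify(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return NodeClass.TEXT;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE && "a".equals(node.getNodeName())) {
            return NodeClass.LINK;
        }
        return NodeClass.CONTAINER;
    }

    /** Escapes characters that are special in markup. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Converts a node according to its classification, then visits children. */
    public static void convert(Node node, StringBuilder out) {
        switch (classify(node)) {
            case TEXT: {
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.append("<block>").append(escape(text)).append("</block>\n");
                }
                break;
            }
            case LINK: {
                Element link = (Element) node;
                out.append("<link next=\"").append(escape(link.getAttribute("href")))
                   .append("\">").append(escape(link.getTextContent().trim()))
                   .append("</link>\n");
                break;
            }
            default: {
                // Containers produce no output themselves; traverse their children.
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    convert(children.item(i), out);
                }
            }
        }
    }

    /** Parses markup into a DOM and converts it, starting at the root node. */
    public static String convert(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(markup)));
            StringBuilder out = new StringBuilder();
            convert(dom.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }
}
```

The traversal originates at the root node and classifies each node as it is visited, so the conversion applied at a node depends only on that node's classification.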
PCT/US2002/041383 2001-12-28 2002-12-23 Information retrieval system including voice browser and data conversion server WO2003058938A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002364014A AU2002364014A1 (en) 2001-12-28 2002-12-23 Information retrieval system including voice browser and data conversion server

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10/040,525 2001-12-28
US10/040,525 US20030125953A1 (en) 2001-12-28 2001-12-28 Information retrieval system including voice browser and data conversion server
US34857902P 2002-01-14 2002-01-14
US60/348,579 2002-01-14

Publications (1)

Publication Number Publication Date
WO2003058938A1 (en) 2003-07-17

Family

ID=26717149

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/041383 WO2003058938A1 (en) 2001-12-28 2002-12-23 Information retrieval system including voice browser and data conversion server

Country Status (2)

Country Link
AU (1) AU2002364014A1 (en)
WO (1) WO2003058938A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5915001A (en) * 1996-11-14 1999-06-22 Vois Corporation System and method for providing and using universally accessible voice and speech data files
US5953392A (en) * 1996-03-01 1999-09-14 Netphonic Communications, Inc. Method and apparatus for telephonically accessing and navigating the internet
US6269336B1 (en) * 1998-07-24 2001-07-31 Motorola, Inc. Voice browser for interactive services and methods thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105162692A (en) * 2015-09-16 2015-12-16 北京暴风科技股份有限公司 Efficient data serialization interaction method
CN105162692B (en) * 2015-09-16 2018-08-10 暴风集团股份有限公司 A kind of efficient Data Serialization exchange method

Also Published As

Publication number Publication date
AU2002364014A1 (en) 2003-07-24

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP