WO2001033553A2 - System and method of increasing the recognition rate of speech-input instructions in remote communication terminals - Google Patents

Publication number
WO2001033553A2
Authority
WO
WIPO (PCT)
Prior art keywords
character sequence
module
character
signal
speech
Prior art date
Application number
PCT/EP2000/010742
Other languages
French (fr)
Other versions
WO2001033553A3 (en
Inventor
Alberto Diego JIMENEZ FELTSTRÖM
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to JP2001535162A (patent JP2003513341A)
Priority to EP00975973A (patent EP1226576A2)
Priority to AU13905/01A (patent AU1390501A)
Publication of WO2001033553A2
Publication of WO2001033553A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/26 Devices for calling a subscriber
    • H04M1/27 Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones

Definitions

  • the present invention relates to speech-input recognition in communication devices and more particularly to systems and methods for enhancing the accuracy of speech dialing systems in remote communication terminals.
  • Remote communication terminals such as, for example, mobile telephones are ubiquitous in many modern industrialized countries. Most remote communication terminals utilize a keypad as an input device. However, keypads suffer from certain drawbacks. Foremost, the use of keypads may require a user to direct his or her attention to the communication device, if only for a brief moment. In certain circumstances, such as when driving, this is considered undesirable. Further, market forces continuously drive manufacturers to produce smaller remote telephone terminal devices, also referred to as handsets. Reducing the size of the terminal device renders keypad errors more likely, thereby reducing the accuracy of the keypad as an input device.
  • U.S. Patent No. 4,959,850 to Kuniyoshi discloses a radio telephone apparatus that includes speech recognition capabilities for speech-based dialing of the phone.
  • U.S. Patents No. 5,042,063 to Sakanishi and No. 4,870,686 to Gerson et al. disclose a telephone apparatus that utilizes speech recognition capabilities to allow speech-based dialing. Speech recognition functions are also disclosed in the following references: U.S. Patents No. 5,917,891 to Will; No. 5,884,257 to Maekawa et al.; No. 5,651,056 to Eting et al.; No. 5,638,425 to Meador; No. 5,509,049 to Peterson; No. 5,495,553 to Jakatdar; and No. 5,303,299 to Hunt et al.
  • speech recognition is a difficult task, particularly when the speech signal is combined with ambient noise from the surrounding environment, such as automobile noise or street noise. Inadequate enunciation and/or interference from ambient noise may render a user's speech unrecognizable to the device. In speech-based dialing applications, this may result in the telephone device dialing an incorrect number. Alternatively, the telephone device may prompt the user to repeat the unrecognized digit(s), or the entire digit sequence. Depending upon the accuracy of the speech recognition system, the user may be required to repeat numbers a significant percentage of the time, rendering the speech-based dialing feature less convenient for the user.
  • a remote terminal is adapted to use information stored in a memory to enhance the accuracy of the speech-recognition routine.
  • the information includes a-priori information about phone numbers previously dialed from the remote terminal, which can be matched with phone numbers input by a speech-based dialing method to enhance the accuracy of the speech recognition system.
  • the invention provides a system for facilitating speech-based dialing of a communication device.
  • the system comprises a conversion module for receiving speech input representative of an input character sequence and generating a signal representative of each character in the input character sequence, a determining module for determining whether the input character sequence includes unrecognized characters, a memory module including a plurality of character sequences corresponding to network addresses, and a search module for searching the memory module for a character sequence having characters that correspond to recognized characters in the input character sequence.
  • the search module can search the memory module for one or more character sequences in the memory module having characters that match the recognized characters of the input character sequence.
  • the invention provides a method of facilitating speech-based calling in a communication device.
  • the method comprises the steps of receiving a speech input representative of a desired character sequence, generating a signal representative of each character in the character sequence, determining whether the character sequence includes unrecognized characters, and if so, then searching a memory module for a matching character sequence having characters that correspond to recognized characters in the input character sequence, and generating a signal representative of a matching character sequence.
  • Fig. 1 is a block diagram of an exemplary GSM communication system suitable for implementing the present invention
  • Fig. 2 is a schematic depiction of a remote communication terminal according to an embodiment of the invention.
  • Fig. 3 is a flow chart illustrating a method of facilitating speech-based calling in a communication device according to an embodiment of the invention.
  • TDMA Time Division Multiple Access
  • GSM Global System for Mobile communications
  • D-AMPS Digital-Advanced Mobile Phone System
  • PDC Personal Digital Cellular
  • FDMA frequency division multiple access
  • CDMA code division multiple access
  • a communication system 10 in which the present invention can be implemented is depicted.
  • the system 10 is a hierarchical network with multiple levels for managing calls.
  • remote communication terminals 12 operating within the system 10 participate in calls using time slots allocated to them on these frequencies.
  • a group of Mobile Switching Centers (MSCs) 14 route calls from originators to destinations. In particular, these entities are responsible for setup, control and termination of calls.
  • One of the MSCs 14, commonly referred to as a gateway MSC, handles communication with a Public Switched Telephone Network (PSTN) 18, or other public and private networks.
  • PSTN Public Switched Telephone Network
  • Each of the MSCs 14 is connected to one or more base station controllers (BSCs) 16.
  • BSCs base station controllers
  • the BSC 16 communicates with an MSC 14 over a standard interface known as the A-interface, which is based on the Mobile Application Part of CCITT Signaling System No. 7.
  • Each of the BSCs 16 controls one or more base transceiver stations (BTSs) 20.
  • Each BTS 20 includes one or more transceivers (TRXs) (not shown) that use the uplink and downlink radio frequencies (RF channels) to serve a particular geographical area, such as one or more communication cells 21.
  • TRXs transceivers
  • the BTSs 20 primarily provide the RF links for the transmission and reception of data bursts to and from the remote stations 12 within their respective cells.
  • a number of BTSs 20 are incorporated into a radio base station (RBS) 22.
  • the RBS 22 may be, for example, configured according to a family of RBS-2000 products, which products are offered by Telefonaktiebolaget LM Ericsson, the assignee of the present invention.
  • Accordingly, only those aspects of remote terminal 200 that are pertinent to the present invention are described in detail.
  • the interested reader is referred to U.S. Patent No. 5,745,523 to Dent et al., the disclosure of which is incorporated here by reference.
  • remote terminal 200 includes, in relevant part, a microphone 210 for receiving speech input from a user of the phone. Microphone 210 is connected to conversion module 220. Conversion module 220 may comprise an analog to digital (A/D) converter 224 for converting analog speech input to a digital signal. Conversion module 220 may also include an automatic speech recognition (ASR) module 228 for recognizing the speech of the user. Remote terminal 200 further includes a determining module 230 for determining whether a character spoken by the user was recognized by ASR module 228 with a desired degree of accuracy. Remote terminal 200 further includes a memory module 250 for storing character sequences that represent valid phone numbers, and a search module 240 for searching memory module 250. Remote terminal 200 also includes a connection module 260 for establishing a communication connection with a communication network such as, for example, a GSM network as depicted in Fig. 1.
  • ASR automatic speech recognition
  • Remote terminal 200 further includes a suitable display 270 (e.g., an LED or LCD display) for displaying information to a user.
  • One terminal with a suitable speech recognition module is the T28 commercially available from Ericsson.
  • modules 220-260 may be embodied in a suitable application specific integrated circuit (ASIC) or a programmed digital signal processor (DSP), or by a chip set comprising a plurality of ASICs. Electrical connections are formed between the respective modules 220-260 and other components of the remote terminal. For example, determining module 230 and search module 240 are electrically connected to display 270, to speaker 280, and to connection module 260.
  • ASIC application specific integrated circuit
  • DSP programmed digital signal processor
  • an electrical connection between memory module 250 and connection module 260 allows memory module 250 to store telephone numbers associated with connections established by remote terminal 200.
  • memory module 250 maintains a list of previously-dialed telephone numbers that can be used as a-priori information to enhance the accuracy of speech-based dialing, as described below.
  • Fig. 3 illustrates a method for speech-based dialing according to an embodiment of the invention.
  • the method includes receiving a spoken character from a user, converting the character to a digital signal, and determining whether the character sequence is complete. If the character sequence is not complete, the system iteratively receives additional characters and converts the characters to a digital signal. After a complete character sequence has been received, the system determines whether the character sequence includes one or more unrecognized characters. If the character sequence does not include unrecognized characters, then the character sequence may be transmitted to a module (e.g., a connection module) that enables the phone to dial the number corresponding to the recognized character sequence.
  • a search module is invoked.
  • the search module compares the recognized digits in the character sequence with corresponding digits in character sequences in an associated memory to determine whether a character sequence in memory is a likely match with the character sequence input by the user.
  • the character sequence may be transmitted to a module that enables the phone to dial the number corresponding to the recognized character sequence.
  • the character sequence may be displayed or audibly presented to the user of the phone, who can indicate whether the character sequence does, in fact, match the desired character sequence. This process will be explained in greater detail below.
  • the process set forth in Fig. 3 may be implemented in a remote communication terminal, e.g., a mobile phone, having a speech-based dialing feature.
  • the speech-based dialing feature is activated and the remote terminal receives speech input representative of a first character in a character sequence.
  • the character preferably represents one digit of the well-known ten-digit dialing format (e.g., xxx-xxx-xxxx).
  • the character sequence could be in a format adapted for a dialing system of a different geographic region, or, in a data application, could represent a network address in a data network (e.g., a URL or an IP address).
  • the character sequence may represent commands addressed to the remote terminal, or a memory location that includes a number for speed dialing.
  • the received character is converted to a digital signal representative of the character spoken by the user. Conversion may be accomplished using an analog-to-digital (A/D) converter in combination with a suitable ASR module. Many ASR modules implement statistical procedures for reporting reliability metrics of the determination made for a particular character.
  • A/D analog-to-digital
  • Desired reliability rates may be programmed into the ASR module's logic, or may be selectable by the user and input to the system as a parameter.
  • ASR modules are known in the art, and particular details of the ASR module are not critical to the invention.
  • a test is performed to determine whether the character sequence input is complete. For example, in the United States telephone system, which uses a ten-character format, the character sequence may be considered complete at the entry of the tenth character. In an alternate embodiment, the determination step may use a time-out procedure, such that the character sequence is assumed to be complete if a predetermined time elapses after the entry of a particular character.
  • a user may actively indicate that the character sequence is complete, either by pressing a designated key or by speaking a designated code.
  • One of ordinary skill in the art will recognize numerous other ways to detect the end of an input character sequence. If the character sequence is not complete, then steps 310 through 330 may be repeated until the character sequence is complete, or the user indicates a desire to cancel the speech input process.
  • a test is conducted to determine whether the character sequence includes one or more unrecognized characters.
  • the term "unrecognized character" shall refer to a character in the character sequence that is not validated by the ASR module.
  • the system may test to determine whether a reliability metric associated with one or more characters in the character sequence is less than a predetermined threshold (e.g., 95%, or 90%), and, if so, then the character sequence may be characterized as having unrecognized characters. Additional tests may also be applied.
  • If the character sequence does not include unrecognized characters, then at step 380, the character sequence is dialed and remote terminal 200 attempts to establish a connection with the network.
  • a memory module associated with the remote terminal is searched to determine whether a character sequence in the memory module matches the recognized characters in the character sequence input by the user. If, at step 360, a match is found, then the character sequence is retrieved from memory and optionally may be presented to the user, at step 370. In one embodiment, the character sequence is visually presented to the user, such as by display on an LCD or other suitable display. In another embodiment, a speech synthesizer presents the character sequence to the user audibly. Upon receiving an indication of approval from the user, the character sequence is dialed at step 380. It will be recognized that some or all of steps 310 through 380 may be performed by a suitable ASIC, DSP, or chip set, or by logic instructions operating on a general purpose processor.

Abstract

A method for enhancing the accuracy of speech-based dialing of remote communication terminals, and terminals incorporating the method, are disclosed. Analog speech input representative of a desired phone number is converted to a digital signal. An automatic speech recognition module identifies the digits and produces an output signal representative of the digits. A determining module applies a test to determine whether one or more digits in the phone number were not recognized by the conversion module. If the phone number includes unrecognized digits, a search module searches an associated memory module for phone numbers having digits that match the recognized digits of the phone number input by the user. Phone numbers from the memory that match may be presented to the user, either visually or audibly. If desired, the remote terminal may establish a connection with the phone number selected from the memory module.

Description

SYSTEM AND METHOD OF INCREASING THE RECOGNITION
RATE OF SPEECH-INPUT INSTRUCTIONS
IN REMOTE COMMUNICATION TERMINALS
BACKGROUND
The present invention relates to speech-input recognition in communication devices and more particularly to systems and methods for enhancing the accuracy of speech dialing systems in remote communication terminals.
Remote communication terminals such as, for example, mobile telephones are ubiquitous in many modern industrialized countries. Most remote communication terminals utilize a keypad as an input device. However, keypads suffer from certain drawbacks. Foremost, the use of keypads may require a user to direct his or her attention to the communication device, if only for a brief moment. In certain circumstances, such as when driving, this is considered undesirable. Further, market forces continuously drive manufacturers to produce smaller remote telephone terminal devices, also referred to as handsets. Reducing the size of the terminal device renders keypad errors more likely, thereby reducing the accuracy of the keypad as an input device.
Manufacturers have implemented speech-based input devices adapted to receive a speech input, to recognize the input, and to perform an action based on the input. By way of example, U.S. Patent No. 4,959,850 to Kuniyoshi discloses a radio telephone apparatus that includes speech recognition capabilities for speech-based dialing of the phone. Similarly, U.S. Patents No. 5,042,063 to Sakanishi and No. 4,870,686 to Gerson et al. disclose a telephone apparatus that utilizes speech recognition capabilities to allow speech-based dialing. Speech recognition functions are also disclosed in the following references: U.S. Patents No. 5,917,891 to Will; No. 5,884,257 to Maekawa et al.; No. 5,651,056 to Eting et al.; No. 5,638,425 to Meador; No. 5,509,049 to Peterson; No. 5,495,553 to Jakatdar; and No. 5,303,299 to Hunt et al.
However, speech recognition is a difficult task, particularly when the speech signal is combined with ambient noise from the surrounding environment, such as automobile noise or street noise. Inadequate enunciation and/or interference from ambient noise may render a user's speech unrecognizable to the device. In speech-based dialing applications, this may result in the telephone device dialing an incorrect number. Alternatively, the telephone device may prompt the user to repeat the unrecognized digit(s), or the entire digit sequence. Depending upon the accuracy of the speech recognition system, the user may be required to repeat numbers a significant percentage of the time, rendering the speech-based dialing feature less convenient for the user.
Accordingly, there is a need in the art for improved speech-based dialing systems and methods.
SUMMARY
The present invention addresses these and other problems by providing an apparatus and method for facilitating speech-based dialing of remote communication terminals, including mobile phones. According to the invention, a remote terminal is adapted to use information stored in a memory to enhance the accuracy of the speech-recognition routine. Preferably, the information includes a-priori information about phone numbers previously dialed from the remote terminal, which can be matched with phone numbers input by a speech-based dialing method to enhance the accuracy of the speech recognition system.
In one aspect, the invention provides a system for facilitating speech-based dialing of a communication device. The system comprises a conversion module for receiving speech input representative of an input character sequence and generating a signal representative of each character in the input character sequence, a determining module for determining whether the input character sequence includes unrecognized characters, a memory module including a plurality of character sequences corresponding to network addresses, and a search module for searching the memory module for a character sequence having characters that correspond to recognized characters in the input character sequence. In use, if the conversion module is unable to convert one or more characters in the input character sequence, then the search module can search the memory module for one or more character sequences in the memory module having characters that match the recognized characters of the input character sequence.
In another aspect, the invention provides a method of facilitating speech-based calling in a communication device. The method comprises the steps of receiving a speech input representative of a desired character sequence, generating a signal representative of each character in the character sequence, determining whether the character sequence includes unrecognized characters, and if so, then searching a memory module for a matching character sequence having characters that correspond to recognized characters in the input character sequence, and generating a signal representative of a matching character sequence.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other objects, features and advantages of the present invention will become more apparent upon reading this description, taken in conjunction with the accompanying drawings, wherein:
Fig. 1 is a block diagram of an exemplary GSM communication system suitable for implementing the present invention;
Fig. 2 is a schematic depiction of a remote communication terminal according to an embodiment of the invention; and
Fig. 3 is a flow chart illustrating a method of facilitating speech-based calling in a communication device according to an embodiment of the invention.
DETAILED DESCRIPTION
Many digital wireless systems in use today utilize a time slotted access system. User information (e.g., speech) is segmented, compressed, packetized and transmitted in a pre-allocated time slot. Time slots can be allocated to different users, a scheme commonly referred to as Time Division Multiple Access (TDMA). TDMA communication systems, such as the Global System for Mobile communications (GSM) in Europe, the Digital-Advanced Mobile Phone System (D-AMPS) in North America, or the Personal Digital Cellular (PDC) system in Japan, allow a single radio frequency channel to be shared between multiple remote terminals, thereby increasing the capacity of the communication system.
The following exemplary embodiments are provided in the context of time division multiple access (TDMA) radiocommunication systems. However, those skilled in the art will appreciate that a TDMA methodology is described solely for purposes of illustration, and that the present invention is readily applicable to all types of access methodologies including frequency division multiple access (FDMA), TDMA, code division multiple access (CDMA) and/or hybrids thereof. Operation of a cellular communication system in accordance with the GSM standard is described in European Telecommunications Standards Institute (ETSI) documents ETS 300 573, ETS 300 574, and ETS 300 578, which are hereby incorporated by reference. Therefore, the operation of an exemplary GSM system is only briefly described herein. Although the present invention is described in terms of exemplary embodiments in a GSM system, those skilled in the art will appreciate that the present invention could be used in other communication systems.
Referring to Fig. 1, a communication system 10 in which the present invention can be implemented is depicted. The system 10 is a hierarchical network with multiple levels for managing calls. Using a set of uplink and downlink radio frequencies, remote communication terminals 12 operating within the system 10 participate in calls using time slots allocated to them on these frequencies. At an upper hierarchical level, a group of Mobile Switching Centers (MSCs) 14 route calls from originators to destinations. In particular, these entities are responsible for setup, control and termination of calls. One of the MSCs 14, commonly referred to as a gateway MSC, handles communication with a Public Switched Telephone Network (PSTN) 18, or other public and private networks.
Each of the MSCs 14 is connected to one or more base station controllers (BSCs) 16. Under the GSM standard, the BSC 16 communicates with an MSC 14 over a standard interface known as the A-interface, which is based on the Mobile Application Part of CCITT Signaling System No. 7.
Each of the BSCs 16 controls one or more base transceiver stations (BTSs) 20. Each BTS 20 includes one or more transceivers (TRXs) (not shown) that use the uplink and downlink radio frequencies (RF channels) to serve a particular geographical area, such as one or more communication cells 21. The BTSs 20 primarily provide the RF links for the transmission and reception of data bursts to and from the remote stations 12 within their respective cells. In an exemplary embodiment, a number of BTSs 20 are incorporated into a radio base station (RBS) 22. The RBS 22 may be, for example, configured according to a family of RBS-2000 products, which products are offered by Telefonaktiebolaget LM Ericsson, the assignee of the present invention. For more details regarding exemplary remote station 12 and RBS 22 implementations, the interested reader is referred to U.S. Patent No. 5,909,469 to Frodigh et al., the disclosure of which is expressly incorporated here by reference.
Fig. 2 presents a schematic depiction of a remote terminal 200 adapted for use in accordance with the present invention. Remote terminal 200 is preferably a mobile phone for use in a digital TDMA cellular communication system, such as, for example, a GSM system, a PDC system, or a D-AMPS system. However, as noted above, the present invention is applicable to all types of access systems, and can easily be applied in TDMA or CDMA systems, or hybrids thereof. Remote terminals are widely known and readily commercially available. Accordingly, only those aspects of remote terminal 200 that are pertinent to the present invention are described in detail. For additional information relating to remote terminals, the interested reader is referred to U.S. Patent No. 5,745,523 to Dent et al., the disclosure of which is incorporated here by reference.
Referring to Fig. 2, remote terminal 200 includes, in relevant part, a microphone 210 for receiving speech input from a user of the phone. Microphone 210 is connected to conversion module 220. Conversion module 220 may comprise an analog to digital (A/D) converter 224 for converting analog speech input to a digital signal. Conversion module 220 may also include an automatic speech recognition (ASR) module 228 for recognizing the speech of the user. Remote terminal 200 further includes a determining module 230 for determining whether a character spoken by the user was recognized by ASR module 228 with a desired degree of accuracy. Remote terminal 200 further includes a memory module 250 for storing character sequences that represent valid phone numbers, and a search module 240 for searching memory module 250. Remote terminal 200 also includes a connection module 260 for establishing a communication connection with a communication network such as, for example, a GSM network as depicted in Fig. 1.
Remote terminal 200 further includes a suitable display 270 (e.g., an LED or LCD display) for displaying information to a user. One terminal with a suitable speech recognition module is the T28 commercially available from Ericsson.
It will be appreciated that some or all of modules 220-260 may be embodied in a suitable application specific integrated circuit (ASIC) or a programmed digital signal processor (DSP), or by a chip set comprising a plurality of ASICs. Electrical connections are formed between the respective modules 220-260 and other components of the remote terminal. For example, determining module 230 and search module 240 are electrically connected to display 270, to speaker 280, and to connection module 260.
Additionally, in a preferred embodiment, an electrical connection between memory module 250 and connection module 260 allows memory module 250 to store telephone numbers associated with connections established by remote terminal 200. For example, each time a user enters a phone number in remote terminal 200, the number may be stored in memory module 250. In this manner, memory module 250 maintains a list of previously-dialed telephone numbers that can be used as a-priori information to enhance the accuracy of speech-based dialing, as described below.
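The role of memory module 250 as a store of previously dialed numbers can be illustrated with a short Python sketch. This is only one possible arrangement, not the disclosed implementation: the class name `DialedNumberStore`, the `capacity` limit, and the oldest-first eviction policy are all assumptions introduced for illustration.

```python
from collections import OrderedDict


class DialedNumberStore:
    """Hypothetical stand-in for memory module 250: a bounded list of
    previously dialed numbers, ordered from least to most recently dialed."""

    def __init__(self, capacity=100):
        self.capacity = capacity        # assumed limit, not from the source
        self._numbers = OrderedDict()   # number -> how many times it was dialed

    def record(self, number):
        """Store a number each time a connection to it is established."""
        count = self._numbers.pop(number, 0) + 1
        self._numbers[number] = count   # re-insert as the most recent entry
        if len(self._numbers) > self.capacity:
            self._numbers.popitem(last=False)  # evict the least recent entry

    def numbers(self):
        """Return stored numbers, least recently dialed first."""
        return list(self._numbers)


store = DialedNumberStore(capacity=2)
for dialed in ["5551234567", "5559876543", "5551234567", "5550001111"]:
    store.record(dialed)
print(store.numbers())
```

With a capacity of two, the number dialed only once and least recently is dropped, so the store keeps the numbers most likely to be dialed again.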
Fig. 3 illustrates a method for speech-based dialing according to an embodiment of the invention. In brief overview, referring to Fig. 3, the method includes receiving a spoken character from a user, converting the character to a digital signal, and determining whether the character sequence is complete. If the character sequence is not complete, the system iteratively receives additional characters and converts the characters to a digital signal. After a complete character sequence has been received, the system determines whether the character sequence includes one or more unrecognized characters. If the character sequence does not include unrecognized characters, then the character sequence may be transmitted to a module (e.g., a connection module) that enables the phone to dial the number corresponding to the recognized character sequence. If the character sequence includes one or more unrecognized characters, then a search module is invoked. The search module compares the recognized digits in the character sequence with corresponding digits in character sequences in an associated memory to determine whether a character sequence in memory is a likely match with the character sequence input by the user. When a likely match is detected, the character sequence may be transmitted to a module that enables the phone to dial the number corresponding to the recognized character sequence. Alternatively, the character sequence may be displayed or audibly presented to the user of the phone, who can indicate whether the character sequence does, in fact, match the desired character sequence. This process will be explained in greater detail below.
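The comparison performed by the search module can be sketched as follows. The sketch assumes that an unrecognized character is represented by `None` and that a stored number is a candidate only if it has the same length as the input and agrees with every recognized character position by position; the source speaks more generally of a "likely match", so this strict positional rule, like the function name `find_matches`, is an illustrative assumption.

```python
UNRECOGNIZED = None  # assumed placeholder for a character the ASR did not validate


def find_matches(input_seq, stored_numbers):
    """Return the stored numbers that agree with every recognized
    character of input_seq at the same position."""
    matches = []
    for number in stored_numbers:
        if len(number) != len(input_seq):
            continue  # assumption: only equal-length sequences can match
        if all(ch is UNRECOGNIZED or ch == stored
               for ch, stored in zip(input_seq, number)):
            matches.append(number)
    return matches


memory = ["5551234567", "5551284567", "5559876543"]

# One unrecognized digit: a single stored number fits.
spoken = list("555123") + [UNRECOGNIZED] + list("567")
print(find_matches(spoken, memory))

# Two unrecognized digits: two stored numbers fit.
ambiguous = list("5551") + [UNRECOGNIZED, UNRECOGNIZED] + list("4567")
print(find_matches(ambiguous, memory))
```

When exactly one candidate is returned, the terminal could dial it directly or present it for confirmation; when several are returned, presenting them to the user for selection is the more cautious choice.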
In an exemplary embodiment, the process set forth in Fig. 3 may be implemented in a remote communication terminal, e.g., a mobile phone, having a speech-based dialing feature. Referring to Fig. 3, at step 310 the speech-based dialing feature is activated and the remote terminal receives speech input representative of a first character in a character sequence. In the United States, the character preferably represents one digit of the well-known ten-digit dialing format (e.g., xxx-xxx-xxxx). However, it will be appreciated that the character sequence could be in a format adapted for a dialing system of a different geographic region, or, in a data application, could represent a network address in a data network (e.g., a URL or an IP address). Alternatively, the character sequence may represent commands addressed to the remote terminal, or a memory location that includes a number for speed dialing.
At step 320, the received character is converted to a digital signal representative of the character spoken by the user. Conversion may be accomplished using an analog-to-digital (A/D) converter in combination with a suitable ASR module. Many ASR modules implement statistical procedures for reporting reliability metrics of the determination made for a particular character. Desired reliability rates may be programmed into the ASR module's logic, or may be selectable by the user and input to the system as a parameter. ASR modules are known in the art, and particular details of the ASR module are not critical to the invention.
At step 330, a test is performed to determine whether the character sequence input is complete. For example, in the United States telephone system, which uses a ten-character format, the character sequence may be considered complete at the entry of the tenth character. In an alternate embodiment, the determination step may use a time-out procedure, such that the character sequence is assumed to be complete if a predetermined time elapses after the entry of a particular character. In another alternate embodiment, a user may actively indicate that the character sequence is complete, either by pressing a designated key or by speaking a designated code. One of ordinary skill in the art will recognize numerous other ways to detect the end of an input character sequence. If the character sequence is not complete, then steps 310 through 330 may be repeated until the character sequence is complete, or the user indicates a desire to cancel the speech input process.
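The three completion tests described for step 330 (fixed length, time-out, and an explicit user indication) could be combined as in the following Python sketch. The function name, the 3-second time-out, and the use of "#" as the designated code are hypothetical values chosen for the example, not values taken from the description.

```python
def sequence_complete(chars, seconds_since_last, expected_length=10,
                      timeout_s=3.0, terminator="#"):
    """Return True when any of the three end-of-input conditions holds.

    chars: characters received so far, in order.
    seconds_since_last: time elapsed since the last character was entered.
    """
    if chars and chars[-1] == terminator:   # user signalled completion
        return True
    if len(chars) >= expected_length:       # fixed-length format reached
        return True
    # Time-out: some input exists and the user has been silent long enough.
    return bool(chars) and seconds_since_last >= timeout_s
```

For example, `sequence_complete(list("919"), 4.0)` returns `True` under the assumed 3-second time-out, while the same three characters with only half a second of silence would cause steps 310 through 330 to repeat.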
After it is determined that the character sequence is complete, at step 340, a test is conducted to determine whether the character sequence includes one or more unrecognized characters. As used herein, the term "unrecognized character" shall refer to a character in the character sequence that is not validated by the ASR module. In one embodiment, the system may test to determine whether a reliability metric associated with one or more characters in the character sequence is less than a predetermined threshold (e.g., 95%, or 90%), and, if so, then the character sequence may be characterized as having unrecognized characters. Additional tests may also be applied. For example, if the reliability metric associated with two characters is less than a predetermined threshold, then the character sequence may be characterized as having unrecognized characters. If the character sequence does not include unrecognized characters, then at step 380, the character sequence is dialed and remote terminal 200 attempts to establish a connection with the network.
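As a minimal sketch of the test at step 340, assume the ASR module reports each character together with a reliability metric between 0 and 1. The function names, the `(character, confidence)` pair format, and the `min_low` parameter are assumptions introduced for illustration; the 0.95 default mirrors the 95% example threshold given above.

```python
def mark_unrecognized(results, threshold=0.95):
    """Replace each low-confidence character with None.

    results: list of (character, confidence) pairs, confidence in [0, 1].
    A character is treated as unrecognized when its confidence is less
    than the threshold.
    """
    return [ch if conf >= threshold else None for ch, conf in results]


def has_unrecognized(results, threshold=0.95, min_low=1):
    """Return True if at least `min_low` characters fall below the threshold."""
    marked = mark_unrecognized(results, threshold)
    return sum(ch is None for ch in marked) >= min_low
```

Setting `min_low=2` corresponds to the additional test mentioned above, in which the sequence is characterized as having unrecognized characters only when two characters fall below the threshold.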
If the character sequence includes unrecognized characters, then at step 350, a memory module associated with the remote terminal is searched to determine whether a character sequence in the memory module matches the recognized characters in the character sequence input by the user. If, at step 360, a match is found, then the character sequence is retrieved from memory and optionally may be presented to the user, at step 370. In one embodiment, the character sequence is visually presented to the user, such as by display on an LCD or other suitable display. In another embodiment, a speech synthesizer presents the character sequence to the user audibly. Upon receiving an indication of approval from the user, the character sequence is dialed at step 380. It will be recognized that some or all of steps 310 through 380 may be performed by a suitable ASIC, DSP, or chip set, or by logic instructions operating on a general-purpose processor.
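One possible form of the search at steps 350 and 360 is sketched below in Python. It assumes that unrecognized characters are represented by `None` and that a stored sequence is a match only if it has the same length as the input and agrees at every recognized position. This matching rule and the name `find_matches` are assumptions for the example; the description leaves the exact matching criterion open.

```python
def find_matches(partial, stored_numbers):
    """Return stored numbers that agree with every recognized position.

    partial: list in which each item is a recognized character, or None
             for an unrecognized one.
    stored_numbers: iterable of digit strings held in the memory module.
    """
    matches = []
    for number in stored_numbers:
        if len(number) != len(partial):
            continue  # assumed rule: only same-length sequences can match
        # None acts as a wildcard; recognized characters must be equal.
        if all(p is None or p == c for p, c in zip(partial, number)):
            matches.append(number)
    return matches
```

When more than one stored sequence matches, presenting the candidates to the user for approval, as at step 370, is one way to resolve the ambiguity before dialing.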
Although the invention has been described in detail with reference to a few exemplary embodiments, those skilled in the art will appreciate that various modifications can be made without departing from the invention. Accordingly, the invention is defined only by the following claims which are intended to embrace all equivalents thereof.

Claims

What is claimed is:
1. A system for facilitating speech-dialing of a communication device, comprising: a conversion module for receiving speech input representative of an input character sequence and generating a signal representative of each character in the input character sequence; a determining module for determining whether the input character sequence includes unrecognized characters; a memory module including a plurality of character sequences corresponding to network addresses; and a search module for searching the memory module for a character sequence having characters that correspond to recognized characters in the input character sequence; such that, if the conversion module is unable to convert one or more characters in the input character sequence, then the search module can search the memory module for one or more character sequences in the memory module having characters that match the recognized characters of the input character sequence.
2. A system according to claim 1, wherein the conversion module comprises: an A/D converter for digitizing the received speech input signal.
3. A system according to claim 1, wherein the conversion module comprises: a speech recognition module for analyzing the digital signal and generating a signal indicative of a character sequence represented by the digital signal.
4. A system according to claim 1, wherein: the conversion module generates a signal representative of a confidence level associated with the accuracy of the conversion; and the determining module generates a signal indicative of whether the confidence level is greater than a predetermined threshold.
5. A system according to claim 1, wherein: the conversion module and the determining module are embodied within a digital signal processor.
6. A system according to claim 1, further comprising: an output module for generating a signal representative of a character sequence in the memory.
7. A system according to claim 6, further comprising: a display module for displaying the character sequence represented by the signal generated by the output module.
8. A system according to claim 6, further comprising: a module for audibly announcing the character sequence represented by the signal generated by the output module.
9. A system according to claim 1, further comprising: a connection module for establishing a connection with the character sequence represented by the signal generated by the output module.
10. A method of facilitating speech-based calling in a communication device, comprising the steps of: receiving a speech input representative of a desired character sequence; generating a signal representative of each character in the character sequence; determining whether the character sequence includes unrecognized characters, and if so, then searching a memory module for a matching character sequence having characters that correspond to recognized characters in the input character sequence; and generating a signal representative of a matching character sequence.
11. A method according to claim 10, wherein the step of generating a signal representative of each character in the character sequence includes digitizing the received speech input signal.
12. A method according to claim 11, wherein the step of generating a signal representative of each character in the character sequence includes analyzing the digital signal and generating a signal indicative of a character sequence represented by the digital signal.
13. A method according to claim 10, wherein the step of generating a signal representative of each character in the character sequence includes generating a first signal representative of a confidence level associated with the accuracy of the conversion.
14. A method according to claim 13, wherein the step of determining whether the character sequence includes unrecognized characters includes comparing the confidence level to a predetermined threshold and generating a second signal indicative of whether the confidence level is greater than a predetermined threshold.
15. A method according to claim 10, further comprising displaying the character sequence represented by the signal generated by the output module.
16. A method according to claim 10, further comprising audibly announcing the character sequence represented by the signal generated by the output module.
PCT/EP2000/010742 1999-11-04 2000-10-31 System and method of increasing the recognition rate of speech-input instructions in remote communication terminals WO2001033553A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2001535162A JP2003513341A (en) 1999-11-04 2000-10-31 System and method for increasing recognition rate of voice input command in telecommunications terminal
EP00975973A EP1226576A2 (en) 1999-11-04 2000-10-31 System and method of increasing the recognition rate of speech-input instructions in remote communication terminals
AU13905/01A AU1390501A (en) 1999-11-04 2000-10-31 System and method of increasing the recognition rate of speech-input instructions in remote communication terminals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US43414199A 1999-11-04 1999-11-04
US09/434,141 1999-11-04

Publications (2)

Publication Number Publication Date
WO2001033553A2 (en) 2001-05-10
WO2001033553A3 (en) 2001-11-29

Family

ID=23722981

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2000/010742 WO2001033553A2 (en) 1999-11-04 2000-10-31 System and method of increasing the recognition rate of speech-input instructions in remote communication terminals

Country Status (5)

Country Link
EP (1) EP1226576A2 (en)
JP (1) JP2003513341A (en)
CN (1) CN1191566C (en)
AU (1) AU1390501A (en)
WO (1) WO2001033553A2 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10635723B2 (en) 2004-02-15 2020-04-28 Google Llc Search engines and systems with handheld document data capture devices
US9116890B2 (en) 2004-04-01 2015-08-25 Google Inc. Triggering actions in response to optically or acoustically capturing keywords from a rendered document
US20070300142A1 (en) 2005-04-01 2007-12-27 King Martin T Contextual dynamic advertising based upon captured rendered text
US20080313172A1 (en) 2004-12-03 2008-12-18 King Martin T Determining actions involving captured information and electronic content associated with rendered documents
US9143638B2 (en) 2004-04-01 2015-09-22 Google Inc. Data capture from rendered documents using handheld device
US8874504B2 (en) 2004-12-03 2014-10-28 Google Inc. Processing techniques for visual capture data from a rendered document
US8620083B2 (en) 2004-12-03 2013-12-31 Google Inc. Method and system for character recognition
US8346620B2 (en) 2004-07-19 2013-01-01 Google Inc. Automatic modification of web pages
EP2406767A4 (en) 2009-03-12 2016-03-16 Google Inc Automatically providing content associated with captured information, such as information captured in real-time
US9323784B2 (en) 2009-12-09 2016-04-26 Google Inc. Image search using text-based elements within the contents of images
DE102014200570A1 (en) * 2014-01-15 2015-07-16 Bayerische Motoren Werke Aktiengesellschaft Method and system for generating a control command

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0425291A2 (en) * 1989-10-25 1991-05-02 Xerox Corporation Word recognition process and apparatus
DE19532114A1 (en) * 1995-08-31 1997-03-06 Deutsche Telekom Ag Telephone speech dialogue method for automated output of dialling numbers and locations
EP0844583A2 (en) * 1996-11-20 1998-05-27 Matsushita Electric Industrial Co., Ltd. Method and apparatus for character recognition
WO1999035806A1 (en) * 1998-01-09 1999-07-15 Alcatel Usa, Inc. Method and system for totally voice activated dialing

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7162424B2 (en) 2001-04-26 2007-01-09 Siemens Aktiengesellschaft Method and system for defining a sequence of sound modules for synthesis of a speech signal in a tonal language
KR100412474B1 (en) * 2001-06-28 2003-12-31 유승혁 a Phone-book System and Management Method Of Telephone and Mobile-Phone used to Voice Recognition and Remote Phone-book Server
KR100869878B1 (en) * 2001-12-31 2008-11-24 주식회사 케이티 System for generating pronunciation dictionary in intelligent network services using voice recognition and method for using the same system
US8442331B2 (en) 2004-02-15 2013-05-14 Google Inc. Capturing text from rendered documents using supplemental information
US9030699B2 (en) 2004-04-19 2015-05-12 Google Inc. Association of a portable scanner with input/output and storage devices
WO2006023806A2 (en) * 2004-08-23 2006-03-02 Exbiblio B.V. A method and system for character recognition
WO2006023806A3 (en) * 2004-08-23 2010-09-10 Exbiblio B.V. A method and system for character recognition
US10769431B2 (en) 2004-09-27 2020-09-08 Google Llc Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device
US9075779B2 (en) 2009-03-12 2015-07-07 Google Inc. Performing actions based on capturing information from rendered documents, such as documents under copyright
US9081799B2 (en) 2009-12-04 2015-07-14 Google Inc. Using gestalt information to identify locations in printed information

Also Published As

Publication number Publication date
WO2001033553A3 (en) 2001-11-29
CN1387663A (en) 2002-12-25
EP1226576A2 (en) 2002-07-31
AU1390501A (en) 2001-05-14
JP2003513341A (en) 2003-04-08
CN1191566C (en) 2005-03-02

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ CZ DE DE DK DK DM DZ EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ CZ DE DE DK DK DM DZ EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase

Ref country code: JP

Ref document number: 2001 535162

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 2000975973

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 008153701

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 2000975973

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 2000975973

Country of ref document: EP