US6728677B1 - Method and system for dynamically improving performance of speech recognition or other speech processing systems - Google Patents

Method and system for dynamically improving performance of speech recognition or other speech processing systems

Info

Publication number
US6728677B1
US6728677B1
Authority
US
United States
Prior art keywords
dynamically
speech processing
performance
recited
processing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US09/773,996
Inventor
Ashvin Kannan
Hy Murveit
Christopher Leggetter
Michael Schuster
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
Original Assignee
Nuance Communications Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nuance Communications Inc filed Critical Nuance Communications Inc
Priority to US09/773,996 priority Critical patent/US6728677B1/en
Assigned to NUANCE COMMUNICATIONS reassignment NUANCE COMMUNICATIONS ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SCHUSTER, MICHAEL, KANNAN, ASHVIN, LEGGETTER, CHRISTOPHER, MURVEIT, HY
Application granted granted Critical
Publication of US6728677B1 publication Critical patent/US6728677B1/en
Assigned to USB AG, STAMFORD BRANCH reassignment USB AG, STAMFORD BRANCH SECURITY AGREEMENT Assignors: NUANCE COMMUNICATIONS, INC.
Assigned to USB AG. STAMFORD BRANCH reassignment USB AG. STAMFORD BRANCH SECURITY AGREEMENT Assignors: NUANCE COMMUNICATIONS, INC.
Assigned to ART ADVANCED RECOGNITION TECHNOLOGIES, INC., A DELAWARE CORPORATION, AS GRANTOR, NUANCE COMMUNICATIONS, INC., AS GRANTOR, SCANSOFT, INC., A DELAWARE CORPORATION, AS GRANTOR, SPEECHWORKS INTERNATIONAL, INC., A DELAWARE CORPORATION, AS GRANTOR, DICTAPHONE CORPORATION, A DELAWARE CORPORATION, AS GRANTOR, TELELOGUE, INC., A DELAWARE CORPORATION, AS GRANTOR, DSP, INC., D/B/A DIAMOND EQUIPMENT, A MAINE CORPORATON, AS GRANTOR reassignment ART ADVANCED RECOGNITION TECHNOLOGIES, INC., A DELAWARE CORPORATION, AS GRANTOR PATENT RELEASE (REEL:017435/FRAME:0199) Assignors: MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT
Assigned to MITSUBISH DENKI KABUSHIKI KAISHA, AS GRANTOR, NORTHROP GRUMMAN CORPORATION, A DELAWARE CORPORATION, AS GRANTOR, STRYKER LEIBINGER GMBH & CO., KG, AS GRANTOR, ART ADVANCED RECOGNITION TECHNOLOGIES, INC., A DELAWARE CORPORATION, AS GRANTOR, NUANCE COMMUNICATIONS, INC., AS GRANTOR, SCANSOFT, INC., A DELAWARE CORPORATION, AS GRANTOR, SPEECHWORKS INTERNATIONAL, INC., A DELAWARE CORPORATION, AS GRANTOR, DICTAPHONE CORPORATION, A DELAWARE CORPORATION, AS GRANTOR, HUMAN CAPITAL RESOURCES, INC., A DELAWARE CORPORATION, AS GRANTOR, TELELOGUE, INC., A DELAWARE CORPORATION, AS GRANTOR, DSP, INC., D/B/A DIAMOND EQUIPMENT, A MAINE CORPORATON, AS GRANTOR, NOKIA CORPORATION, AS GRANTOR, INSTITIT KATALIZA IMENI G.K. BORESKOVA SIBIRSKOGO OTDELENIA ROSSIISKOI AKADEMII NAUK, AS GRANTOR reassignment MITSUBISH DENKI KABUSHIKI KAISHA, AS GRANTOR PATENT RELEASE (REEL:018160/FRAME:0909) Assignors: MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/285: Memory allocation or algorithm optimisation to reduce hardware requirements

Abstract

The present invention introduces a system and method for dynamically improving speech recognition in a speech recognition or other speech processing system. The method comprises dynamically adjusting the system, which comprises estimating the utilization of resources in the system; and improving the performance of the system according to the availability of resources.

Description

FIELD OF THE INVENTION
The present invention relates to the field of speech recognition or other speech processing fields such as speaker verification or text-to-speech processing. In particular the present invention discloses a system and method for dynamically improving the performance of speech recognition or other speech processing systems.
BACKGROUND OF THE INVENTION
Speech recognition systems are currently in use for responding to various forms of commerce via a telephone network. One example of such a system is utilized in conjunction with a stock brokerage. According to this system, a caller can provide their account number, obtain a quotation for the price of a particular stock issue, or purchase or sell a particular number of shares at market price or a predetermined target price, among other types of transactions. Natural language systems can also be used to respond to such things as requests for telephone directory assistance.
These types of speech recognition systems are typically deployed to handle a maximum call capacity for peak periods. This means that the hardware supporting the system provides enough memory, processing power and bandwidth to handle calls with a predetermined level of accuracy. For example, a company may deploy a speech recognition system that may handle 10,000 callers because at noon the company has that many calls. However, at all other times the system is not used to its full potential because only 5,000 callers are in the system.
Speech recognition systems typically are configurable, within limits, as to the amount of processing power, memory, network bandwidth, and other system resources that they may consume. Often, memory, speed, and accuracy can be traded off against each other. For instance, a particular configuration of one system may use less CPU resources than another, typically at the cost of lower average speech recognition accuracy. System configuration is often done ahead of time, resulting in a particular resource/performance tradeoff for the particular deployment.
SUMMARY OF THE INVENTION
The present invention introduces a system and method for dynamically improving speech recognition or other speech processing systems by estimating the utilization of resources in the system; and improving the performance of the system according to the availability of resources.
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.
BRIEF DESCRIPTION OF THE DRAWINGS
The objects, features and advantages of the present invention will be apparent to one skilled in the art in view of the following detailed description in which:
FIG. 1 is a high level block diagram of an exemplary speech recognition system according to one embodiment of the present invention;
FIG. 2 is an exemplary block diagram of a computer architecture used to implement embodiments of the present invention; and
FIG. 3 shows an example of the processing flow of a speech recognition system according to one embodiment of the present invention.
DETAILED DESCRIPTION
A system and method for dynamically improving speech recognition in a speech recognition or other speech processing system such as a speaker verification system or a text-to-speech processing system is described. The method comprises dynamically adjusting the system, which comprises estimating the utilization of resources in the system; and improving the performance of the system according to the availability of resources.
The techniques described herein may be implemented using one or more general purpose computers selectively activated or configured by computer software stored in the computer or elsewhere. Such computer software may be stored in a computer readable storage medium, such as any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently constrained to any particular type of computer or other system. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized system to perform the required method steps. The required structure for a variety of these systems will be apparent from the description below. In addition, any of a variety of programming languages may be used to implement the teachings of the techniques described herein.
Note that in this description, references to “one embodiment” or “an embodiment” mean that the feature being referred to is included in at least one embodiment of the present invention. Further, separate references to “one embodiment” in this description do not necessarily refer to the same embodiment, however, neither are such embodiments mutually exclusive, unless so stated and except as will be readily apparent to those skilled in the art.
The techniques and elements described herein may be included within a speech recognition system 100 such as that illustrated in FIG. 1. According to the embodiment depicted in FIG. 1, one or more servers 110 communicate to a plurality of clients 150. The clients 150 may transmit and receive data from servers 110 over a variety of communication media including (but not limited to) a local area network and/or a larger network 199 (e.g., the Internet). Other types of communication channels such as wireless communication via satellite broadcast (not shown) may additionally (or alternatively) be used.
Clients 150 service callers 151-155. Callers 151-155 may be analog or digital telephones, cellular phones, or other similar devices capable of transmitting and receiving voice. Servers 110 may include a database 140 for storing various types of data. This data may include, for example, specific caller data (e.g., caller account information and caller preferences) and/or more general data. The database 140 may also store information regarding the level of service a caller is entitled to receive. For instance, callers may be “Platinum” callers and have a higher level of service than “Gold” or “Silver” level callers. Database 140 may also include voice prints that are used to verify a caller's 151-155 identity. The database on servers 110 in one embodiment runs an instance of a Relational Database Management System (RDBMS), such as Microsoft™ SQL-Server, Oracle™ or the like.
A user/client may interact with and receive feedback from servers 110 using various different communication devices and/or protocols. According to one embodiment, a user connects to servers 110 via client software. The client software may include a browser application such as Netscape Navigator™ or Microsoft Internet Explorer™ on the user's personal computer which communicates to servers 110 via the Hypertext Transfer Protocol (hereinafter “HTTP”). In other embodiments, clients may communicate with servers 110 via pagers (e.g., in which the necessary transaction software is embedded in a microchip) or handheld computing devices.
System 100 also includes a dynamic performance adjuster 130. The adjuster 130 considers what resources are available in system 100 and increases the performance of system 100 until the resources are substantially exhausted. Resources may include CPU usage, memory usage, and bandwidth usage of servers 110, for example. If the system 100 has no available resources, then adjuster 130 may decrease the performance of system 100 to free up resources and avoid system overload. In another embodiment, adjuster 130 does nothing when no resources are available.
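As an illustration only, the adjuster's basic policy might look like the following Python sketch, which raises a quality setting while headroom remains and lowers it when resources are nearly exhausted. The PerformanceSettings class and the quality levels are assumptions introduced here, and the threshold defaults simply echo the example percentages given later in connection with FIG. 3.

```python
# Illustrative sketch of a dynamic performance adjuster policy (not the patented
# implementation). Quality levels and threshold values are assumptions.

from dataclasses import dataclass

@dataclass
class PerformanceSettings:
    quality_level: int = 1      # 1 = baseline; higher = more accurate, more CPU
    max_quality: int = 5

def adjust(settings: PerformanceSettings, utilization: float,
           low_water: float = 0.75, high_water: float = 0.90) -> PerformanceSettings:
    """Raise quality while resources are underused; lower it near exhaustion."""
    if utilization < low_water and settings.quality_level < settings.max_quality:
        settings.quality_level += 1     # spend spare resources on accuracy
    elif utilization > high_water and settings.quality_level > 1:
        settings.quality_level -= 1     # free up resources to avoid overload
    return settings                     # otherwise leave the configuration unchanged

settings = adjust(PerformanceSettings(), utilization=0.60)
print(settings.quality_level)           # 2: spare capacity spent on higher accuracy
```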
Resource manager 120 is also connected to the network 199 and is included in system 100 to balance the load carried by each server 110. Thus, if callers 151-155 call into system 100 through client 150, resource manager 120 will distribute callers 151-155 such that the resources (CPU and memory) of servers 110 are equally balanced. The dynamic performance adjuster 130 requires information about resource utilization. Although the individual servers 110 may be aware of their own resource utilization, the resource manager 120 is useful because it is aware of resource utilization across all servers 110, and may also be tracking historical usage patterns. It may be able to provide the dynamic performance adjuster 130 with good estimates of current and future resource utilization. In one embodiment, the resource manager 120 may be of the type described in U.S. Pat. No. 6,119,087 to Kuhn, et al. entitled “System Architecture for and Method of Voice Processing,” assigned to Nuance Communications of Menlo Park, Calif. and herein incorporated by reference.
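As a rough illustration of how a resource manager might combine current per-server measurements with recent history to estimate utilization, the sketch below uses a simple exponentially weighted moving average. This is an assumption offered for illustration; it is not the method of the incorporated Kuhn et al. patent.

```python
# Hypothetical utilization estimator blending current readings across servers with
# an exponentially weighted history. Not taken from U.S. Pat. No. 6,119,087.

class UtilizationEstimator:
    def __init__(self, smoothing: float = 0.3):
        self.smoothing = smoothing
        self.history = None     # smoothed cluster-wide utilization, 0.0 to 1.0

    def update(self, per_server_utilization: list) -> float:
        """Fold the latest per-server readings into the smoothed estimate."""
        current = sum(per_server_utilization) / len(per_server_utilization)
        if self.history is None:
            self.history = current
        else:
            self.history = (self.smoothing * current
                            + (1.0 - self.smoothing) * self.history)
        return self.history

# Example: three servers currently at 40%, 55%, and 60% utilization.
estimator = UtilizationEstimator()
print(round(estimator.update([0.40, 0.55, 0.60]), 2))   # 0.52
```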
Resource manager 120 is not included as part of system 100 in alternate embodiments of the invention.
An Exemplary Architecture
Having briefly described an exemplary network architecture, which employs various elements of the present invention, a computer system 200 representing exemplary clients 150, servers (e.g., commerce servers 110), dynamic performance adjusters 130, and/or resource manager 120, in which elements of the present invention may be implemented, will now be described with reference to FIG. 2.
One embodiment of computer system 200 comprises a system bus 220 for communicating information, and a processor 210 coupled to bus 220 for processing information. Computer system 200 further comprises a random access memory (RAM) or other dynamic storage device 225 (referred to herein as main memory), coupled to bus 220 for storing information and instructions to be executed by processor 210. Main memory 225 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 210. Computer system 200 also may include a read only memory (ROM) and/or other static storage device 226 coupled to bus 220 for storing static information and instructions used by processor 210.
A data storage device 227 such as a magnetic disk or optical disc and its corresponding drive may also be coupled to computer system 200 for storing information and instructions. Computer system 200 can also be coupled to a second I/O bus 250 via an I/O interface 230. A plurality of I/O devices may be coupled to I/O bus 250, including a display device 243, an input device (e.g., an alphanumeric input device 242 and/or a cursor control device 241). For example, video news clips and related information may be presented to the user on the display device 243.
The communication device 240 is for accessing other computers (servers or clients) via a network 199. The communication device 240 may comprise a modem, a network interface card, or other well known interface device, such as those used for coupling to Ethernet, token ring, or other types of networks.
It should be noted, however, that the described techniques are not limited to use in speech recognition systems, nor are they limited in application to speech signals or to any particular type of signal. In a speech recognition system such as the one shown in FIG. 1, multiple functions and tasks may be performed. System 100 may be used to verify a caller's identity and allow access to the system 100. Another application may include generating speech to be played out to callers 151-155, from some stored text representation (i.e., text-to-speech processing) or from other computer representations of what should be spoken to the caller. System 100 may be used for recognizing a caller 151-155 and verifying that a caller 151-155 is who they claim to be. Specific examples include using voice responses to listen to credit and banking information, voice-activated systems used by airlines and other transportation agencies, automated operator systems, etc.
System 100 may perform these tasks as follows. Caller 151 attempts to access system 100 via a telephone in order to obtain the caller's 151 checking account balance. Caller 151 may be prompted to submit an utterance for identification and verification purposes. For example, the caller 151 may be asked to say their name. That utterance will be recognized by system 100, and a database of permitted callers will be searched to determine if the caller's 151 name is valid. If caller 151's name is valid, the name may then be analyzed by a verifier. The verifier will determine whether caller 151 is truly caller 151 or an imposter. If the caller 151 has been recognized and verified, then caller 151 may access their checking account information.
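A minimal sketch of this identify-then-verify sequence is shown below. The recognizer, permitted-caller database, and verifier are stand-in stubs introduced only for illustration; they are not components defined by the patent.

```python
# Hypothetical outline of the identify-then-verify flow for caller 151. The
# recognizer, caller database, and verifier below are stubs, not patent components.

def recognize(audio):                   # stub recognizer: pretend the spoken name is "alice"
    return "alice"

PERMITTED_CALLERS = {"alice": {"voice_print": b"...", "balance": "$1,234.56"}}

def verify(audio, voice_print):         # stub verifier: accepts every caller in this sketch
    return True

def handle_balance_request(audio):
    name = recognize(audio)                       # recognize the spoken name
    record = PERMITTED_CALLERS.get(name)          # search the database of permitted callers
    if record is None:
        return "name not recognized"
    if not verify(audio, record["voice_print"]):  # verifier checks for an imposter
        return "verification failed"
    return record["balance"]                      # recognized and verified: grant access

print(handle_balance_request(b"raw audio bytes"))  # -> $1,234.56
```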
In order for system 100 to be effective and useful, an adequate number of servers 110 should be available to handle the callers 151-155 at any given time. This is often referred to as quality of service. For example, if at a peak calling time 5,000 callers attempt to access system 100, system 100 must have an adequate number of servers 110 (resources) available to handle the call volume. Furthermore, more resources may be desired if system 100 is required to perform with near perfect accuracy at maximum speed. However, there are often periods of low call volume where the resources of system 100 are not all used.
Dynamic performance adjuster 130 continuously monitors system 100 for periods of low or very high system 100 resource usage. In low periods, adjuster 130 dynamically improves the performance of system 100 to maximize the use of available resources. For example, during peak call volume periods, system 100 may be configured to deliver a 5% error rate in accurately identifying a caller's utterance. However, during a low-call period adjuster 130 can utilize all system 100 resources to lower the error rate below 5%. In another embodiment, a 5% error rate may apply to the aggregate caller population, but a group within that population may have a 15% error rate. For example, a 15% error rate may be attributable to non-native speakers of the English language. Adjuster 130 may use available resources to lower that specific group's error rate below 15%, targeting the extra resources available to the utterances that can benefit most from those extra resources.
In yet another embodiment, once a caller has been identified, a level of service value stored in profile database 140 may indicate a level of service associated with the particular caller. Therefore, if the caller has a high level of service, more resources will be dedicated to that caller. This may be done whether or not there are extra resources available. The caller's level of service may be determined by many factors, including, for instance, if the caller has paid an extra subscription cost, if the caller is a frequent customer, or if the caller is more likely than average to purchase goods.
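For illustration, a per-caller resource boost keyed to a stored service level might be as simple as the lookup below; the tier names follow the “Platinum”/“Gold”/“Silver” example given earlier, while the numeric multipliers are assumptions.

```python
# Hypothetical mapping from a caller's stored service level to a resource boost.
# The multipliers are illustrative and not specified by the patent.

SERVICE_BOOST = {"Platinum": 2.0, "Gold": 1.5, "Silver": 1.0}

def resources_for_caller(profile: dict, baseline_units: float = 1.0) -> float:
    tier = profile.get("service_level", "Silver")
    return baseline_units * SERVICE_BOOST.get(tier, 1.0)

print(resources_for_caller({"service_level": "Platinum"}))   # 2.0 resource units
```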
In another embodiment, adjuster 130 may improve recognition of utterances, verification of callers 151-155, as well as system latency. Latency involves how quickly system 100 responds to a caller 151-155 after an utterance is made. Thus system 100 may decrease the accuracy of the utterance recognition and verification, but in return provide faster system response.
There are many ways in which speech processing system 100 may adjust its parameters to consume more system resources in order to improve processing quality. Likewise, system 100 may consume fewer system resources and, therefore, potentially reduce speech processing quality. In one embodiment, system 100's pruning beam width may be altered to increase or decrease the system's CPU usage. Beam-width pruning is a speech recognition technique well known to one of ordinary skill in the art. Speech recognition system 100 converts speech to a sequence of words. This sequence of words is one of many possible sequences permitted by the grammar used by the speech recognition system. Speech recognition system 100 may save CPU resources by removing from consideration some of the possible word sequences. Possible word sequences are removed early in the processing of a given sentence if processing at that point indicates that these sequences are less likely than others to be the final recognition result. When speech recognition system 100 discards a sequence, it may make a speech recognition error (here referred to as a pruning error) if the sequence discarded was the word sequence actually spoken by the caller. It is also a pruning error if that sequence would have ultimately been chosen by the system had it not been discarded. The number of sequences considered or discarded is affected by the pruning beam width. Increasing the width results in more sequences considered, which results in more CPU usage and, accordingly, fewer potential errors. Conversely, reducing the beam width reduces the CPU usage, but may reduce the speech recognition accuracy of the system as well.
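The tradeoff can be made concrete with a generic beam-pruning sketch: partial hypotheses whose score falls more than the beam width below the best current score are discarded, so a wider beam keeps more hypotheses (more CPU, fewer pruning errors) and a narrower beam keeps fewer. The hypothesis scores below are placeholders, and the sketch is not the patent's recognizer.

```python
# Generic beam-pruning illustration. Hypotheses are (partial word sequence, log score)
# pairs; the scores are placeholders, not output of the patent's recognizer.

def prune(hypotheses, beam_width):
    """Keep only hypotheses whose log score is within beam_width of the best."""
    best = max(score for _, score in hypotheses)
    return [(seq, score) for seq, score in hypotheses if score >= best - beam_width]

hypotheses = [("call my broker", -10.2),
              ("call my banker", -11.0),
              ("fall my broker", -15.7)]

print(len(prune(hypotheses, beam_width=2.0)))   # 2 survive: less CPU used downstream
print(len(prune(hypotheses, beam_width=8.0)))   # 3 survive: more CPU, fewer pruning errors
```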
In another embodiment, the dynamic adjuster 130 may use different processing strategies which differ in computational cost. Examples of this would be to change the acoustic models used for speech recognition to include models with more parameters, to include models adapted specifically to the current caller (caller 151-155), or to use different recognition algorithms.
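As one assumed realization (not recited by the patent), the choice among such strategies could be driven by a simple budget check; the model names and relative cost figures below are illustrative.

```python
# Hypothetical selection among processing strategies of differing computational cost.
# Model names and relative per-utterance costs are illustrative assumptions.

MODEL_CONFIGS = [
    {"name": "small_generic",  "relative_cost": 1.0},
    {"name": "large_generic",  "relative_cost": 2.5},   # more parameters
    {"name": "caller_adapted", "relative_cost": 3.0},   # adapted to the current caller
]

def choose_model(cpu_headroom: float) -> str:
    """Pick the most expensive configuration that fits the available headroom."""
    affordable = [m for m in MODEL_CONFIGS if m["relative_cost"] <= cpu_headroom]
    if not affordable:
        return MODEL_CONFIGS[0]["name"]    # fall back to the cheapest strategy
    return max(affordable, key=lambda m: m["relative_cost"])["name"]

print(choose_model(cpu_headroom=2.8))      # -> large_generic
```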
Dynamic performance adjuster 130 may include a resource utilization estimator (RUE). The RUE analyzes servers 110 to determine if they have any available resources. Available resources may include available CPU capacity, available memory, and available bandwidth. In addition, the RUE can analyze the bandwidth availability or constraints on network 199. The RUE can analyze the network 199 on a per-node basis.
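One possible way to implement a per-server RUE, offered here as an assumption rather than the patent's design, is to sample CPU and memory headroom with the psutil library; estimating network bandwidth availability or per-node constraints would need additional instrumentation not shown.

```python
# One possible per-server resource check using the psutil library. This is an
# assumed realization of an RUE, not the patent's design.

import psutil

def available_resources() -> dict:
    cpu_used = psutil.cpu_percent(interval=0.5)    # percent of CPU busy over 0.5 s
    mem_used = psutil.virtual_memory().percent     # percent of physical memory in use
    return {
        "cpu_available_percent": 100.0 - cpu_used,
        "memory_available_percent": 100.0 - mem_used,
        # Bandwidth availability would require link-capacity and per-node
        # measurements that psutil alone does not provide.
    }

print(available_resources())
```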
In another embodiment, the RUE may be part of server 110, client 150, or resource manager 120. In one embodiment, resource manager 120 may be part of the RUE. The functions of performance adjuster 130 may be isolated as shown in FIG. 1, or distributed throughout the nodes of network 199.
FIG. 3 shows an example of the processing flow of a speech recognition system according to one embodiment of the present invention. The process commences at block 300. At processing block 310, dynamic performance adjuster 130 receives available resource information from servers 110 and network 199.
At decision block 320, adjuster 130 determines if there are too many unused resources. A percentage of the resources used from the available pool may be the criterion used (e.g., if fewer than 75% of the resources are used). If there are too many unused resources, then the flow continues to processing block 380 where system parameters are adjusted to use more resources, aimed at improving speech recognition accuracy, speaker verification accuracy, or text-to-speech quality. After the parameters are adjusted, the flow continues to processing block 390 where the server 110 serving a caller 151-155 processes speech (e.g., recognizes one utterance) and then flow passes back to start block 300. However, if the test at decision block 320 does not find too many unused resources, flow passes on to decision block 330.
At decision block 330, adjuster 130 determines if the caller 151-155 might be a preferred caller who is eligible for improved service. If the caller is a preferred caller, then flow passes on to processing block 350 where system parameters are temporarily adjusted for the current caller to use more resources, aimed at improving speech recognition accuracy, speaker verification accuracy, or text-to-speech quality for that caller only. After the parameters are adjusted, the flow continues to processing block 390 where the server 110 serving a caller 151-155 processes speech for that caller. Flow then passes back to start block 300. However, if the caller 151-155 is not a preferred caller, flow passes on to decision block 340.
At decision block 340, adjuster 130 determines if there are too few unused resources. A percentage of the resources used from the available pool may be the criterion used (e.g., if more than 90% of the resources are used). If there are too few unused resources, then the flow continues to processing block 370 where system parameters are adjusted to use fewer resources, aimed at preventing excessive latency if the system's resources are overused. This can come at a cost of reduced speech recognition accuracy, reduced speaker verification accuracy, or reduced text-to-speech quality. After the parameters are adjusted, the flow continues to processing block 390 where the server 110 serving a caller 151-155 processes speech (e.g., recognizes one utterance) and then flow passes back to start block 300. However, if an adequate amount of resources is still available, given the test at decision block 340, the flow passes back to start block 300. The percentages described above are provided as examples only.
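Read together, decision blocks 320, 330, and 340 amount to the control loop sketched below in Python. The block numbers are noted in comments; the helper callables and the 75%/90% threshold defaults are placeholders taken from the example figures above, not claimed method steps.

```python
# Sketch of the FIG. 3 control flow; block numbers appear in comments. The helper
# callables and threshold defaults are placeholders, not claimed method steps.

def one_pass(get_utilization, caller_is_preferred, adjust_up, adjust_up_for_caller,
             adjust_down, process_speech, low_water=0.75, high_water=0.90):
    """One trip through the flow; in operation this repeats from start block 300."""
    used = get_utilization()                # block 310: receive resource information
    if used < low_water:                    # block 320: too many unused resources?
        adjust_up()                         # block 380: use more resources system-wide
    elif caller_is_preferred():             # block 330: caller eligible for better service?
        adjust_up_for_caller()              # block 350: more resources for this caller only
    elif used > high_water:                 # block 340: too few unused resources?
        adjust_down()                       # block 370: use fewer resources, protect latency
    else:
        return                              # adequate resources: back to start block 300
    process_speech()                        # block 390: e.g., recognize one utterance

# Example with trivial stand-ins: 60% utilization triggers the block 380 path.
one_pass(lambda: 0.60, lambda: False,
         lambda: print("raise quality"), lambda: print("raise quality for caller"),
         lambda: print("lower quality"), lambda: print("process one utterance"))
```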
In alternate embodiments, the combination of improvements may include fewer improvements than those described above. In another embodiment, the order of the improvements may be rearranged as well.
In another embodiment, system 100 may cooperate with an automatic speech recognition and verification software package, such as, Nuance 7 manufactured by Nuance Communications of Menlo Park, Calif.
The foregoing has described a system and method for dynamically improving the performance of a speech processing system. It is contemplated that changes and modifications may be made by one of ordinary skill in the art, to the materials and arrangements of elements of the present invention without departing from the scope of the invention.

Claims (60)

We claim:
1. A method comprising:
monitoring utilization of computing resources in a speech processing system; and
based on said monitoring, dynamically improving performance of speech processing operations in the speech processing system by increasing utilization of the computing resources in the speech processing system in the absence of a need for greater utilization of the computing resources.
2. A method as recited in claim 1, wherein said dynamically improving performance of speech processing operations comprises dynamically increasing accuracy of speech processing operations.
3. A method as recited in claim 1, wherein said dynamically improving performance of speech processing operations comprises dynamically reducing latency of speech processing operations.
4. A method as recited in claim 1, wherein said dynamically improving performance of speech processing operations comprises dynamically improving performance of automatic speech recognition.
5. A method as recited in claim 4, wherein said dynamically improving performance of speech processing operations comprises dynamically increasing accuracy of speech recognition.
6. A method as recited in claim 4, wherein said dynamically improving performance of speech processing operations comprises dynamically reducing latency of speech recognition.
7. A method as recited in claim 1, wherein said dynamically improving performance of speech processing operations comprises dynamically improving performance of speaker authentication.
8. A method as recited in claim 7, wherein said dynamically improving performance of speech processing operations comprises dynamically increasing accuracy of speaker authentication.
9. A method as recited in claim 7, wherein said dynamically improving performance of speech processing operations comprises dynamically reducing latency of speaker authentication.
10. A method as recited in claim 1, wherein said dynamically improving performance of speech processing operations comprises dynamically improving performance of speaker identification.
11. A method as recited in claim 10, wherein said dynamically improving performance of speech processing operations comprises dynamically increasing accuracy of speaker identification.
12. A method as recited in claim 10, wherein said dynamically improving performance of speech processing operations comprises dynamically reducing latency of speaker identification.
13. A method comprising:
monitoring utilization of computing resources in a speech processing system; and
based on said monitoring, dynamically adjusting performance of speech processing operations in the speech processing system from a first adequate level of performance to a second adequate level of performance different from the first level, by dynamically adjusting utilization of the computing resources in the speech processing system.
14. A method as recited in claim 13, wherein said dynamically adjusting performance of speech processing operations in the speech processing system comprises dynamically degrading performance of speech processing operations in the speech processing system by dynamically reducing utilization of computing resources in the speech processing system.
15. A method as recited in claim 13, wherein said dynamically adjusting performance of speech processing operations in the speech processing system comprises dynamically improving performance of speech processing operations in the speech processing system by dynamically increasing utilization of computing resources in the speech processing system.
16. A method as recited in claim 13, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting accuracy of speech processing operations.
17. A method as recited in claim 13, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting latency of speech processing operations.
18. A method as recited in claim 13, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting performance of automatic speech recognition.
19. A method as recited in claim 18, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting accuracy of speech recognition.
20. A method as recited in claim 18, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting latency of speech recognition.
21. A method as recited in claim 13, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting performance of speaker authentication.
22. A method as recited in claim 21, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting accuracy of speaker authentication.
23. A method as recited in claim 21, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting latency of speaker authentication.
24. A method as recited in claim 13, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting performance of speaker identification.
25. A method as recited in claim 24, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting accuracy of speaker identification.
26. A method as recited in claim 24, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting latency of speaker identification.
27. A method comprising:
receiving at a speech processing system a signal representing speech from an end user device;
using a speech recognition system in the speech processing system to automatically recognize the speech;
monitoring utilization of computing resources in the speech processing system; and
based on said monitoring, dynamically improving performance of speech recognition by the speech recognition system from a first level of performance to a second level of performance, by dynamically adjusting utilization of the computing resources in the speech processing system.
28. A method as recited in claim 27, wherein the first level of performance is an adequate level of performance, and wherein said dynamically improving performance of the speech recognition system is done in the absence of a need for greater utilization of computing resources for speech recognition.
29. A method as recited in claim 27, wherein said dynamically improving performance of speech recognition comprises dynamically increasing accuracy of speech recognition.
30. A method as recited in claim 27, wherein said dynamically improving performance of speech recognition comprises dynamically reducing latency of speech recognition.
31. A method comprising:
receiving at a speech processing system a signal representing speech of a speaker;
using a speaker authentication system in the speech processing system to automatically authenticate the speaker;
monitoring utilization of computing resources in the speech processing system; and
based on said monitoring, dynamically improving performance of speaker authentication by the speaker authentication system from a first level of performance to a second level of performance, by dynamically adjusting utilization of the computing resources in the speech processing system.
32. A method as recited in claim 31, wherein the first level of performance is an adequate level of performance, and wherein said dynamically improving performance of speaker authentication is done in the absence of a need for greater utilization of computing resources for speaker authentication.
33. A method as recited in claim 31, wherein said dynamically improving performance of speaker authentication comprises dynamically increasing accuracy of speaker authentication.
34. A method as recited in claim 31, wherein said dynamically improving performance of speaker authentication comprises dynamically reducing latency of speaker authentication.
35. A processing system comprising:
means for monitoring utilization of computing resources in a speech processing system; and
means for dynamically improving performance of speech processing operations in the speech processing system based on said monitoring, by increasing utilization of the computing resources in the speech processing system in the absence of a need for greater utilization of the computing resources.
36. A processing system as recited in claim 35, wherein said means for dynamically improving performance of speech processing operations comprises means for dynamically increasing accuracy of speech processing operations.
37. A processing system as recited in claim 35, wherein said means for dynamically improving performance of speech processing operations comprises means for dynamically reducing latency of speech processing operations.
38. A processing system as recited in claim 35, wherein said means for dynamically improving performance of speech processing operations comprises means for dynamically improving performance of automatic speech recognition.
39. A processing system as recited in claim 38, wherein said dynamically improving performance of speech processing operations comprises dynamically increasing accuracy of speech recognition.
40. A processing system as recited in claim 38, wherein said means for dynamically improving performance of speech processing operations comprises means for dynamically reducing latency of speech recognition.
41. A processing system as recited in claim 35, wherein said means for dynamically improving performance of speech processing operations comprises means for dynamically improving performance of speaker authentication.
42. A processing system as recited in claim 41, wherein said dynamically improving performance of speaker authentication comprises dynamically increasing accuracy of speaker authentication.
43. A processing system as recited in claim 41, wherein said dynamically improving performance of speaker authentication comprises dynamically reducing latency of speaker authentication.
44. A processing system as recited in claim 35, wherein said means for dynamically improving performance of speech processing operations comprises means for dynamically improving performance of speaker identification.
45. A processing system as recited in claim 44, wherein said dynamically improving performance of speaker identification comprises dynamically increasing accuracy of speaker identification.
46. A processing system as recited in claim 44, wherein said dynamically improving performance of speaker identification comprises dynamically reducing latency of speaker identification.
47. A processing system comprising:
a processor; and
a memory storing instructions which, when executed by the processor, cause the processing system to perform a process that comprises
monitoring utilization of computing resources in the processing system; and
based on said monitoring, dynamically adjusting performance of speech processing operations in the processing system from a first adequate level of performance to a second adequate level of performance different from the first level, by dynamically adjusting utilization of the computing resources in the speech processing system.
48. A processing system as recited in claim 47, wherein said dynamically adjusting performance of speech processing operations in the speech processing system comprises dynamically degrading performance of speech processing operations in the speech processing system by dynamically reducing utilization of computing resources in the speech processing system.
49. A processing system as recited in claim 47, wherein said dynamically adjusting performance of speech processing operations in the speech processing system comprises dynamically improving performance of speech processing operations in the speech processing system by dynamically increasing utilization of computing resources in the speech processing system.
50. A processing system as recited in claim 47, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting accuracy of speech processing operations.
51. A processing system as recited in claim 47, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting latency of speech processing operations.
52. A processing system as recited in claim 47, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting performance of automatic speech recognition.
53. A processing system as recited in claim 52, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting accuracy of speech recognition.
54. A processing system as recited in claim 52, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting latency of speech recognition.
55. A processing system as recited in claim 47, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting performance of speaker authentication.
56. A processing system as recited in claim 55, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting accuracy of speaker authentication.
57. A processing system as recited in claim 55, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting latency of speaker authentication.
58. A processing system as recited in claim 47, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting performance of speaker identification.
59. A processing system as recited in claim 58, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting accuracy of speaker identification.
60. A processing system as recited in claim 58, wherein said dynamically adjusting performance of speech processing operations comprises dynamically adjusting latency of speaker identification.
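
The adjustment strategy recited in claims 47 through 49 (monitor utilization of computing resources, then shift speech processing between two adequate levels of performance by changing how much of those resources it consumes) can be illustrated with a short sketch. The following is a minimal, hypothetical Python example rather than anything taken from the patent: the class name ResourceAwareRecognizer, the beam_width knob, and the load-average thresholds are illustrative assumptions.

# Hypothetical sketch of the resource-aware adjustment in claims 47-49:
# monitor utilization of computing resources and move the recognizer between
# two "adequate" operating points by widening or narrowing its search beam.
import os

class ResourceAwareRecognizer:
    """Toy recognizer whose accuracy/latency trade-off is a single beam width."""

    def __init__(self, min_beam=200, max_beam=2000):
        self.min_beam = min_beam   # narrower beam: lower accuracy, lower CPU cost
        self.max_beam = max_beam   # wider beam: higher accuracy, higher CPU cost
        self.beam_width = min_beam

    def _utilization(self):
        """Approximate CPU utilization as 1-minute load average per core (Unix only)."""
        load_1min, _, _ = os.getloadavg()
        return load_1min / (os.cpu_count() or 1)

    def adjust(self, low_water=0.4, high_water=0.8, step=200):
        """Dynamically adjust performance based on monitored utilization.

        Below low_water there is spare capacity, so increase utilization to
        improve accuracy (as in claim 49). Above high_water resources are
        scarce, so reduce utilization and accept a lower but still adequate
        level of performance (as in claim 48).
        """
        utilization = self._utilization()
        if utilization < low_water and self.beam_width < self.max_beam:
            self.beam_width = min(self.max_beam, self.beam_width + step)
        elif utilization > high_water and self.beam_width > self.min_beam:
            self.beam_width = max(self.min_beam, self.beam_width - step)
        return self.beam_width

if __name__ == "__main__":
    recognizer = ResourceAwareRecognizer()
    # In a real deployment this would run periodically alongside recognition;
    # here we just print one adjustment decision.
    print("beam width after one adjustment:", recognizer.adjust())

Here the beam width stands in for whatever accuracy or latency parameter a given recognizer exposes; the claims cover the general strategy of adjusting such a parameter in response to monitored resource utilization, not this particular heuristic.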
US09/773,996 2001-01-31 2001-01-31 Method and system for dynamically improving performance of speech recognition or other speech processing systems Expired - Lifetime US6728677B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/773,996 US6728677B1 (en) 2001-01-31 2001-01-31 Method and system for dynamically improving performance of speech recognition or other speech processing systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/773,996 US6728677B1 (en) 2001-01-31 2001-01-31 Method and system for dynamically improving performance of speech recognition or other speech processing systems

Publications (1)

Publication Number Publication Date
US6728677B1 (en) 2004-04-27

Family

ID=32108455

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/773,996 Expired - Lifetime US6728677B1 (en) 2001-01-31 2001-01-31 Method and system for dynamically improving performance of speech recognition or other speech processing systems

Country Status (1)

Country Link
US (1) US6728677B1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6119087A (en) * 1998-03-13 2000-09-12 Nuance Communications System architecture for and method of voice processing
US6345279B1 (en) * 1999-04-23 2002-02-05 International Business Machines Corporation Methods and apparatus for adapting multimedia content for client devices
US6542600B1 (en) * 1999-06-22 2003-04-01 At&T Corp. Method for improved resource management in a telecommunication application platform
US6505161B1 (en) * 2000-05-01 2003-01-07 Sprint Communications Company L.P. Speech recognition that adjusts automatically to input devices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Douglas A. Reynolds, "Automatic Speaker Recognition Using Gaussian Mixture Speaker Models," The Lincoln Laboratory Journal, vol. 8, no. 2, 1995, pp. 173-192.

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050038659A1 (en) * 2001-11-29 2005-02-17 Marc Helbing Method of operating a barge-in dialogue system
US20050096906A1 (en) * 2002-11-06 2005-05-05 Ziv Barzilay Method and system for verifying and enabling user access based on voice parameters
US7054811B2 (en) 2002-11-06 2006-05-30 Cellmax Systems Ltd. Method and system for verifying and enabling user access based on voice parameters
US7720683B1 (en) * 2003-06-13 2010-05-18 Sensory, Inc. Method and apparatus of specifying and performing speech recognition operations
US9020129B2 (en) * 2005-07-28 2015-04-28 At&T Intellectual Property I, L.P. Methods, systems, and computer program products for providing human-assisted natural language call routing
US20140146962A1 (en) * 2005-07-28 2014-05-29 At&T Intellectual Property I, L.P. Methods, systems, and computer program products for providing human-assisted natural language call routing
US20080125748A1 (en) * 2006-09-25 2008-05-29 Medtronic Vascular, Inc. High Torque, Low Profile Catheters and Methods for Transluminal Interventions
US20090309698A1 (en) * 2008-06-11 2009-12-17 Paul Headley Single-Channel Multi-Factor Authentication
US8536976B2 (en) 2008-06-11 2013-09-17 Veritrix, Inc. Single-channel multi-factor authentication
US20100005296A1 (en) * 2008-07-02 2010-01-07 Paul Headley Systems and Methods for Controlling Access to Encrypted Data Stored on a Mobile Device
US8166297B2 (en) 2008-07-02 2012-04-24 Veritrix, Inc. Systems and methods for controlling access to encrypted data stored on a mobile device
US8555066B2 (en) 2008-07-02 2013-10-08 Veritrix, Inc. Systems and methods for controlling access to encrypted data stored on a mobile device
US9502033B2 (en) * 2008-08-29 2016-11-22 Mmodal Ip Llc Distributed speech recognition using one way communication
US20150170647A1 (en) * 2008-08-29 2015-06-18 Mmodal Ip Llc Distributed Speech Recognition Using One Way Communication
US8185646B2 (en) 2008-11-03 2012-05-22 Veritrix, Inc. User authentication for social networks
US20100115114A1 (en) * 2008-11-03 2010-05-06 Paul Headley User Authentication for Social Networks
US11620988B2 (en) 2009-06-09 2023-04-04 Nuance Communications, Inc. System and method for speech personalization by need
US10504505B2 (en) 2009-06-09 2019-12-10 Nuance Communications, Inc. System and method for speech personalization by need
US20100312556A1 (en) * 2009-06-09 2010-12-09 AT & T Intellectual Property I , L.P. System and method for speech personalization by need
US20180090129A1 (en) * 2009-06-09 2018-03-29 Nuance Communications, Inc. System and method for speech personalization by need
US9002713B2 (en) * 2009-06-09 2015-04-07 At&T Intellectual Property I, L.P. System and method for speech personalization by need
US9837071B2 (en) 2009-06-09 2017-12-05 Nuance Communications, Inc. System and method for speech personalization by need
WO2011022854A1 (en) * 2009-08-26 2011-03-03 Me2Me Ag Voice interactive service system and method for providing different speech-based services
US9350860B2 (en) 2009-08-26 2016-05-24 Swisscom Ag Voice interactive service system and method for providing different speech-based services
US8886542B2 (en) 2009-08-26 2014-11-11 Roger Lagadec Voice interactive service system and method for providing different speech-based services
US20110054905A1 (en) * 2009-08-26 2011-03-03 Me2Me Ag Voice interactive service system and method for providing different speech-based services
EP2293290A1 (en) 2009-08-26 2011-03-09 Swisscom AG Voice interactive service system and method for providing different speech-based services
US9514747B1 (en) * 2013-08-28 2016-12-06 Amazon Technologies, Inc. Reducing speech recognition latency
US11516039B2 (en) 2018-03-08 2022-11-29 Samsung Electronics Co., Ltd. Performance mode control method and electronic device supporting same
US11017778B1 (en) 2018-12-04 2021-05-25 Sorenson Ip Holdings, Llc Switching between speech recognition systems
US10971153B2 (en) 2018-12-04 2021-04-06 Sorenson Ip Holdings, Llc Transcription generation from multiple speech recognition systems
US10672383B1 (en) 2018-12-04 2020-06-02 Sorenson Ip Holdings, Llc Training speech recognition systems using word sequences
US20210233530A1 (en) * 2018-12-04 2021-07-29 Sorenson Ip Holdings, Llc Transcription generation from multiple speech recognition systems
US11145312B2 (en) 2018-12-04 2021-10-12 Sorenson Ip Holdings, Llc Switching between speech recognition systems
US11170761B2 (en) 2018-12-04 2021-11-09 Sorenson Ip Holdings, Llc Training of speech recognition systems
US10573312B1 (en) 2018-12-04 2020-02-25 Sorenson Ip Holdings, Llc Transcription generation from multiple speech recognition systems
US11594221B2 (en) * 2018-12-04 2023-02-28 Sorenson Ip Holdings, Llc Transcription generation from multiple speech recognition systems
US10388272B1 (en) 2018-12-04 2019-08-20 Sorenson Ip Holdings, Llc Training speech recognition systems using word sequences
US11935540B2 (en) 2018-12-04 2024-03-19 Sorenson Ip Holdings, Llc Switching between speech recognition systems
US11488604B2 (en) 2020-08-19 2022-11-01 Sorenson Ip Holdings, Llc Transcription of audio
US20220068280A1 (en) * 2020-09-03 2022-03-03 International Business Machines Corporation Speech-to-text auto-scaling for live use cases
US11521617B2 (en) * 2020-09-03 2022-12-06 International Business Machines Corporation Speech-to-text auto-scaling for live use cases

Similar Documents

Publication Publication Date Title
US6728677B1 (en) Method and system for dynamically improving performance of speech recognition or other speech processing systems
US6804647B1 (en) Method and system for on-line unsupervised adaptation in speaker verification
US8442187B2 (en) Secure voice transaction method and system
US20180166070A1 (en) System and Method for Mobile Automatic Speech Recognition
US7398212B2 (en) System and method for quality of service management with a call handling system
US8700518B2 (en) System and method for trading financial instruments using speech
US7395212B2 (en) Online reactivation of an account or service
US8396715B2 (en) Confidence threshold tuning
US20040042592A1 (en) Method, system and apparatus for providing an adaptive persona in speech-based interactive voice response systems
US6629075B1 (en) Load-adjusted speech recogintion
WO2007056139A2 (en) Method and apparatus for speech processing
US11714601B1 (en) Conversational virtual assistant
US8954317B1 (en) Method and apparatus of processing user text input information
US6341264B1 (en) Adaptation system and method for E-commerce and V-commerce applications
US8311822B2 (en) Method and system of enabling intelligent and lightweight speech to text transcription through distributed environment
US7206387B2 (en) Resource allocation for voice processing applications
WO2019210556A1 (en) Call reservation method, agent leaving processing method and apparatus, device, and medium
US7191130B1 (en) Method and system for automatically optimizing recognition configuration parameters for speech recognition systems
US20100217603A1 (en) Method, System, and Apparatus for Enabling Adaptive Natural Language Processing
WO2020233318A1 (en) Data adjustment method based on data analysis and related devices
US8600757B2 (en) System and method of dynamically modifying a spoken dialog system to reduce hardware requirements
US20030014254A1 (en) Load-shared distribution of a speech system
AU2011349110B2 (en) Voice authentication system and methods
US8050930B2 (en) Telephone voice command enabled computer administration method and system
US20050069122A1 (en) System and method for operator assisted automated call handling

Legal Events

Date Code Title Description
AS Assignment

Owner name: NUANCE COMMUNICATIONS, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KANNAN, ASHVIN;MURVEIT, HY;LEGGETTER, CHRISTOPHER;AND OTHERS;REEL/FRAME:011795/0256;SIGNING DATES FROM 20010507 TO 20010509

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: USB AG, STAMFORD BRANCH, CONNECTICUT

Free format text: SECURITY AGREEMENT;ASSIGNOR:NUANCE COMMUNICATIONS, INC.;REEL/FRAME:017435/0199

Effective date: 20060331

Owner name: USB AG, STAMFORD BRANCH, CONNECTICUT

Free format text: SECURITY AGREEMENT;ASSIGNOR:NUANCE COMMUNICATIONS, INC.;REEL/FRAME:017435/0199

Effective date: 20060331

AS Assignment

Owner name: USB AG. STAMFORD BRANCH, CONNECTICUT

Free format text: SECURITY AGREEMENT;ASSIGNOR:NUANCE COMMUNICATIONS, INC.;REEL/FRAME:018160/0909

Effective date: 20060331

Owner name: USB AG. STAMFORD BRANCH, CONNECTICUT

Free format text: SECURITY AGREEMENT;ASSIGNOR:NUANCE COMMUNICATIONS, INC.;REEL/FRAME:018160/0909

Effective date: 20060331

FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: ART ADVANCED RECOGNITION TECHNOLOGIES, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824

Effective date: 20160520

Owner name: SCANSOFT, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824

Effective date: 20160520

Owner name: NUANCE COMMUNICATIONS, INC., AS GRANTOR, MASSACHUSETTS

Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824

Effective date: 20160520

Owner name: HUMAN CAPITAL RESOURCES, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: NOKIA CORPORATION, AS GRANTOR, FINLAND

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: TELELOGUE, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: TELELOGUE, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824

Effective date: 20160520

Owner name: STRYKER LEIBINGER GMBH & CO., KG, AS GRANTOR, GERMANY

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: DSP, INC., D/B/A DIAMOND EQUIPMENT, A MAINE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: SPEECHWORKS INTERNATIONAL, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824

Effective date: 20160520

Owner name: NUANCE COMMUNICATIONS, INC., AS GRANTOR, MASSACHUSETTS

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: INSTITUT KATALIZA IMENI G.K. BORESKOVA SIBIRSKOGO OTDELENIA ROSSIISKOI AKADEMII NAUK, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: SCANSOFT, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: DICTAPHONE CORPORATION, A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: DSP, INC., D/B/A DIAMOND EQUIPMENT, A MAINE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824

Effective date: 20160520

Owner name: SPEECHWORKS INTERNATIONAL, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: MITSUBISHI DENKI KABUSHIKI KAISHA, AS GRANTOR, JAPAN

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: DICTAPHONE CORPORATION, A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824

Effective date: 20160520

Owner name: NORTHROP GRUMMAN CORPORATION, A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520

Owner name: ART ADVANCED RECOGNITION TECHNOLOGIES, INC., A DELAWARE CORPORATION, AS GRANTOR

Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869

Effective date: 20160520