US20020013707A1 - System for developing word-pronunciation pairs - Google Patents
- Publication number
- US20020013707A1 (application Ser. No. 09/216,111)
- Authority
- US
- United States
- Prior art keywords
- transcription
- phonetic
- spelled
- pronunciation
- word input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0638—Interactive procedures
Definitions
- the present invention relates generally to speech recognition and speech synthesis systems. More particularly, the invention relates to developing word-pronunciation pairs.
- Phonetic pronunciation dictionaries are available for most of the major languages, although these dictionaries typically have a limited word coverage and do not adequately handle proper names, unusual and compound nouns, or foreign words. Publicly available dictionaries likewise fall short when used to obtain pronunciations for a dialect different from the one for which the system was trained or intended.
- the present invention provides a system and method for developing word-pronunciation pairs for use in a pronunciation dictionary.
- the invention provides a tool, which builds upon a window environment to provide a user-friendly methodology for defining, manipulating and storing the phonetic representation of word-pronunciation pairs in a pronunciation dictionary.
- the invention requires no specific linguistic or phonetic knowledge to produce the pronunciation lexicon. It utilizes various techniques to quickly provide the best phonetic representation of a given word along with different means for “fine tuning” this phonetic representation to achieve the desired pronunciation. Immediate feedback to validate word-pronunciation pairs is also provided by incorporating a text-to-speech synthesizer.
- Applications will quickly become apparent as developments expand in areas where exceptions to the rules of pronunciation are common, such as streets, cities, proper names and other specialized terminology.
- FIG. 1 is a block diagram illustrating the system and method of the present invention
- FIG. 2 illustrates an editing tool useful in implementing a system in accordance with the present invention
- FIG. 3 is a block diagram illustrating the presently preferred phoneticizer using decision trees
- FIG. 4 is a tree diagram illustrating a letter-only tree used in relation to the phoneticizer
- FIG. 5 is a tree diagram illustrating a mixed tree in accordance with the present invention.
- FIG. 6 is a block diagram illustrating a system for generating decision trees in accordance with the present invention.
- FIG. 7 is a flowchart showing a method for generating training data through an alignment process in accordance with the present invention.
- a word-pronunciation editor 10 for developing word-pronunciation pairs is depicted in FIG. 1.
- the editor 10 uses spelled word input 12 to develop word-pronunciation pairs that are in turn entered into a lexicon 14 .
- the lexicon 14 of the present invention is a word-pronunciation dictionary comprised of ordered pairs of words and one or more associated phonetic transcriptions. As will be more fully explained, the lexicon 14 can be updated by adding word-pronunciation pairs or by revising pronunciations of existing word-pronunciation pairs.
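The lexicon described above is, in essence, a mapping from each spelled word to an ordered list of one or more phonetic transcriptions, supporting both kinds of update. A minimal sketch of that structure follows; the class and method names are illustrative, not taken from the patent:

```python
# A minimal sketch of the lexicon: an ordered mapping from spelled words to
# one or more phonetic transcriptions. Names here are illustrative only.

class Lexicon:
    def __init__(self):
        self._entries = {}  # word -> list of phonetic transcriptions

    def add(self, word, transcription):
        """Add a word-pronunciation pair, appending alternates for known words."""
        self._entries.setdefault(word, [])
        if transcription not in self._entries[word]:
            self._entries[word].append(transcription)

    def revise(self, word, old, new):
        """Replace an existing pronunciation with a corrected one."""
        prons = self._entries[word]
        prons[prons.index(old)] = new

    def lookup(self, word):
        return self._entries.get(word, [])

lex = Lexicon()
lex.add("bible", "b-ay-b-ah-l")
lex.add("bible", "b-ih-b-l")                    # a second, alternate pronunciation
lex.revise("bible", "b-ih-b-l", "b-ay-b-el")    # revising an existing pair
```

The transcription strings are hyphen-separated phoneme sequences, mirroring the hyphenated display convention used later in the phonemes field.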
- a transcription generator 20 receives as input the spelled word 12 .
- spelled words 12 are entered via a keyboard, although spelled words may be input through any convenient means, including by voice entry or data file.
- the transcription generator 20 may be configured in a variety of different ways depending on the system requirements. In a first preferred embodiment of the present invention, transcription generator 20 accesses a baseline dictionary 22 or conventional letter-to-sound rules to produce a suggested phonetic transcription 23 .
- the former represent words in terms of the phonemes of human speech when the word is spoken, whereas the latter represent words in terms of the atomic units (called morphs) from which larger words are made.
- a compound word such as “catwalk” may be treated morphemically as comprising the atomic units “cat” and “walk”.
- the transcription generator 20 may also include a morphemic component.
- an initial phonetic transcription of the spelled word 12 is derived through a lookup in the baseline dictionary 22 .
- conventional letter-to-sound rules are used to generate an initial phonetic transcription.
- a phoneticizer 24 may provide additional suggested pronunciations for the spelled word 12 .
- the phoneticizer 24 generates a list of suggested phonetic transcriptions 26 based on the spelled word input using a set of decision trees. Details of a suitable phoneticizer are provided below.
- Each transcription in the suggested list 26 has a numeric value by which it can be compared with other transcriptions in the suggested list 26 .
- these numeric scores are the byproduct of the transcription generation mechanism.
- each phonetic transcription has associated with it a confidence level score.
- This confidence level score represents the cumulative score of the individual probabilities associated with each phoneme.
- the leaf nodes of each decision tree in the phoneticizer 24 are populated with phonemes and their associated probabilities. These probabilities are numerically represented and can be used to generate a confidence level score.
- although these confidence level scores are generally not displayed to the user, they are used to order the displayed list of n-best suggested transcriptions 26 as provided by the phoneticizer 24.
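The cumulative confidence score can be sketched as the product of the per-phoneme leaf probabilities; summing logarithms is an implementation detail assumed here to avoid floating-point underflow on long words. The candidate transcriptions and probabilities below are invented for illustration:

```python
import math

# Assumed arithmetic: a transcription's confidence is the cumulative product of
# its per-phoneme leaf probabilities, accumulated in the log domain.

def confidence(phoneme_probs):
    return sum(math.log(p) for p in phoneme_probs)

candidates = [
    ("b-ih-b-l",    [0.4, 0.5, 0.9, 0.7]),       # per-phoneme probabilities
    ("b-ay-b-ah-l", [0.4, 0.3, 0.9, 0.6, 0.7]),
]
# order the displayed n-best list by descending confidence
ranked = sorted(candidates, key=lambda c: confidence(c[1]), reverse=True)
```

Ordering by log-sum is equivalent to ordering by the raw product, since the logarithm is monotonic.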
- a user selection mechanism 28 allows the user to select a pronunciation from the list of suggested transcriptions 26 that matches the desired pronunciation.
- An automatic speech recognizer 30 is incorporated into the editor 10 for aiding the user in quickly selecting the desired pronunciation from the list of suggested transcriptions 26 .
- the speech recognizer 30 may be used to reorder the list of suggested transcriptions 26 .
- the speech recognizer 30 extracts phonetic information from a speech input signal 32 , which corresponds to the spelled word input 12 .
- Suitable sources of speech include: live human speech, audio recordings, speech databases, and speech synthesizers.
- the speech recognizer 30 uses the speech signal 32 to reorder the list of suggested transcriptions 26 , such that the transcription which most closely corresponds to the speech input signal 32 is placed at the top of the list of suggested transcriptions 26 .
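The patent does not specify how "closeness" between a suggested transcription and the recognized speech is measured; phoneme-level edit distance is one plausible proximity measure, assumed in the sketch below to float the best match to the top of a hypothetical suggested list:

```python
# Assumed reordering strategy: rank each suggested transcription by its
# phoneme-level edit distance to the phonemes extracted from the speech signal.

def edit_distance(a, b):
    # classic dynamic-programming (Levenshtein) distance over phoneme lists
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete a phoneme
                           cur[j - 1] + 1,           # insert a phoneme
                           prev[j - 1] + (x != y)))  # substitute / match
        prev = cur
    return prev[-1]

suggested = ["b-ih-b-l", "b-ay-b-ah-l"]     # hypothetical list 26
heard = ["b", "ay", "b", "ah", "l"]         # phonemes from the speech signal 32
reordered = sorted(suggested, key=lambda t: edit_distance(t.split("-"), heard))
```

With this measure, the transcription matching the spoken input exactly (distance zero) is placed first.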
- a graphical user interface 40 is the tool by which a user selects and manipulates the phonetic transcriptions provided by the transcription generator 20 and the phoneticizer 24 .
- the spelled word input 12 is placed into a spelling field 42 .
- if a phonetic transcription of the word 12 is provided by the baseline dictionary 22, then its corresponding phonetic representation defaults into the phonemes field 48; otherwise, conventional letter-to-sound rules are used to populate the phonemes field 48.
- the phonemic transcription displayed in the phonemes field 48 is hyphenated to demarcate the syllables which make up the word. In this way, a user can directly edit the individual syllables of the phoneme transcription in the phonemes field 48.
- the spelled word input 12 may be selected from a word list 44 as provided by a word source file (e.g., a dictionary source). Highlighting any word in the word list 44 places that word in the spelling field 42 and its corresponding phonetic transcription in the phonemes field 48 .
- a list of n-best suggested phonetic transcriptions 26 is generated by phoneticizer 24 based upon the spelled word input 12 . If the pronunciation in the phonemes field 48 is unsatisfactory, then the user preferably selects one of these phonetic transcriptions (that closely matches the desired pronunciation) to populate the phonemes field 48 .
- desired word input may be spoken by the user. This speech input is converted into a spelled word by the speech recognizer 30 which is in turn translated into a phonetic transcription as described above.
- the user can specify in the language selection box 46 an operative language for the word-pronunciation editor 10 .
- the editor 10 automatically functions in a mode that corresponds to the selected language.
- the transcription generator 20 will access a dictionary that corresponds to the selected language, thereby displaying a phonetic transcription for the word input 12 in the selected language.
- the phoneticizer 24 , the speech recognizer 30 and the text-to-speech synthesizer 36 may also need to access input files and/or training data that correspond to the selected language.
- the user language selection may also alter the appearance of the user interface. In this way, the editor 10 facilitates the development of word-pronunciation pairs in the user's native language.
- the word-pronunciation editor 10 provides various means for manipulating syllabic portions of the phonetic transcription displayed in the phonemes field 48 .
- a phonemic editor 34 (as shown in FIG. 1) provides the user a number of options for modifying an individual syllable of the phonetic transcription. For instance, stress (or emphasis) buttons 50 line up underneath the syllables in phonemes field 48 . In this way, the user can select these buttons 50 to alter the stress applied to the syllable, thereby modifying the pronunciation of the word. Most often mispronunciation is a factor of the wrong vowel being used in a syllable. The user can also use the vowel step through button 52 and/or the vowel table list 54 to select different vowels to substitute for those appearing in the selected syllable of the phonemes field 48 .
- using the phonemic editor 34, the user speaks an individual syllable into a microphone (not shown), and the original text spelling that corresponds to its pronunciation is provided in the sounds like field 56.
- a corresponding phonemic representation of the speech input also replaces this selected syllable in the phonetic transcription. It should be noted that the speech input corresponding to an individual syllable is first translated into the corresponding text spelling by the speech recognizer 30 . The phonemic editor 34 then converts this text spelling into the corresponding phonemic representation.
- one or more selected syllabic portions of the pronunciation may be replaced with a word known to sound similar to the desired pronunciation.
- the phonemic editor 34 presents the user with a menu of words based on the spoken vowel sounds, and the user selects the word that corresponds to the desired vowel pronunciation of the syllable. If during the editing process the user becomes dissatisfied with the pronunciation displayed in the phonemes field 48, then the phonetic transcription can be reset to its original state by selecting the reset button 56.
- by clicking on a speaker icon 58, the user may also test the current pronunciation displayed in the phonemes field 48.
- a text-to-speech synthesizer 36 generates audible speech data 37 from the current pronunciation found in the phonemes field 48 .
- Generating audible speech data from a phonetic transcription is well known to one skilled in the art.
- a storage mechanism 38 can be initiated (via the save button 60 ) to update the desired word-pronunciation pair in lexicon 14 .
- an exemplary embodiment of phoneticizer 24 is shown in FIG. 3 to illustrate the principles of generating multiple pronunciations based on the spelled form of a word.
- most attempts at spelled word-to-pronunciation transcription have relied solely upon the letters themselves.
- for some languages, letter-only pronunciation generators yield satisfactory results; for others (particularly English), the results may be unsatisfactory.
- a letter-only pronunciation generator would have great difficulty properly pronouncing the word bible.
- the letter-only system would likely pronounce the word “BIB-L”, much as a grade school child learning to read might do.
- the fault in conventional systems lies in the inherent ambiguity imposed by the pronunciation rules of many languages.
- the English language, for example, has hundreds of different pronunciation rules, making it difficult and computationally expensive to approach the problem on a word-by-word basis.
- the presently preferred phoneticizer 24 is a pronunciation generator employing two stages, the first stage employing a set of letter-only decision trees 72 and the second, optional stage, employing a set of mixed-decision trees 74 .
- An input sequence 76, such as the sequence of letters B-I-B-L-E, is fed to a dynamic programming phoneme sequence generator 78.
- the sequence generator 78 uses the letter-only trees 72 to generate a list of pronunciations 80 , representing possible pronunciation candidates of the spelled word input sequence.
- the sequence generator 78 sequentially examines each letter in the sequence, applying the decision tree associated with that letter to select a phoneme pronunciation for that letter based on probability data contained in the letter-only tree.
- the set of letter-only decision trees includes a decision tree for each letter of the alphabet.
- FIG. 4 shows an example of a letter-only decision tree for the letter E.
- the decision tree comprises a plurality of internal nodes (illustrated as ovals in the Figure), and a plurality of leaf nodes (illustrated as rectangles in the Figure).
- Each internal node is populated with a yes-no question. Yes-no questions are questions that can be answered either yes or no. In the letter-only tree these questions are directed to the given letter (in this case the letter E), and its neighboring letters in the input sequence. Note in FIG. 4 that each internal node branches either left or right, depending on whether the answer to the associated question is yes or no.
- the leaf nodes are populated with probability data that associate possible phoneme pronunciations with numeric values representing the probability that the particular phoneme represents the correct pronunciation of the given letter. For example, the notation “iy 0.51” means “the probability of phoneme ‘iy’ in this leaf is 0.51.”
- the null phoneme, i.e., silence, is represented by the symbol '-'.
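The traversal of one letter-only tree can be sketched as follows: internal nodes ask yes-no questions about the current letter's context, and branching continues until a leaf of phoneme probabilities is reached. The toy tree below for the letter E is invented and far smaller than the tree of FIG. 4; '~' stands in for the null phoneme:

```python
# Toy letter-only tree for 'E'. Internal nodes hold a yes-no question about the
# letter's context; leaves hold phoneme probabilities ('~' = null phoneme).

def is_last(word, i):
    """Yes-no question: is this letter word-final?"""
    return i == len(word) - 1

tree_for_e = {
    "question": is_last,
    "yes": {"leaf": {"~": 0.7, "iy": 0.3}},   # word-final 'e' is usually silent
    "no":  {"leaf": {"eh": 0.6, "iy": 0.4}},
}

def apply_tree(tree, word, i):
    node = tree
    while "leaf" not in node:                 # branch until a leaf is reached
        node = node["yes"] if node["question"](word, i) else node["no"]
    return node["leaf"]

probs = apply_tree(tree_for_e, "bible", 4)    # the final 'e' of B-I-B-L-E
```

Here the traversal reaches the "yes" leaf, whose highest-probability entry is the null phoneme, reflecting the silent final E.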
- the sequence generator 78 uses the letter-only decision trees 72 to construct one or more pronunciation hypotheses that are stored in list 80 .
- each pronunciation has associated with it a numerical score arrived at by combining the probability scores of the individual phonemes selected using the decision tree 72 .
- Word pronunciations may be scored by constructing a matrix of possible combinations and then using dynamic programming to select the n-best candidates.
- the n-best candidates may be selected using a substitution technique that first identifies the most probable transcription candidate and then generates additional candidates through iterative substitution as follows:
- the pronunciation with the highest probability score is selected first by multiplying the respective scores of the highest-scoring phonemes (identified by examining the leaf nodes), and this selection is used as the most probable, or first-best, word candidate. Additional (n-best) candidates are then selected by examining the phoneme data in the leaf nodes again to identify the phoneme, not previously selected, that differs least from an initially selected phoneme. This minimally-different phoneme is then substituted for the initially selected one to generate the second-best word candidate. The process may be repeated iteratively until the desired number of n-best candidates has been selected. List 80 may be sorted in descending score order so that the pronunciation judged best by the letter-only analysis appears first in the list.
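The substitution technique described above can be sketched directly: take the top-probability phoneme at every letter position as the first-best candidate, then repeatedly swap in the not-yet-used phoneme whose probability is closest to the one initially selected at its position. The leaf probabilities below are invented:

```python
# Sketch of n-best candidate generation by iterative minimal substitution.

def n_best(leaf_probs, n):
    best = [max(d, key=d.get) for d in leaf_probs]      # first-best candidate
    candidates = [list(best)]
    used = {(i, ph) for i, ph in enumerate(best)}
    while len(candidates) < n:
        # probability gap between each unused phoneme and the initial selection
        swaps = [(d[best[i]] - p, i, ph)
                 for i, d in enumerate(leaf_probs)
                 for ph, p in d.items() if (i, ph) not in used]
        if not swaps:
            break
        _, i, ph = min(swaps)                           # minimally-different phoneme
        nxt = list(candidates[-1])
        nxt[i] = ph                                     # substitute it in
        candidates.append(nxt)
        used.add((i, ph))
    return candidates

leaves = [{"b": 0.9, "p": 0.1},       # per-letter leaf probabilities (invented)
          {"ay": 0.51, "ih": 0.49},
          {"b": 0.9, "p": 0.1},
          {"l": 0.8, "el": 0.2}]
cands = n_best(leaves, 2)
```

The second-best candidate differs from the first only in the "ay"/"ih" position, where the probability gap (0.51 versus 0.49) is smallest.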
- a letter-only analysis will frequently produce poor results. This is because the letter-only analysis has no way of determining at each letter what phoneme will be generated by subsequent letters. Thus, a letter-only analysis can generate a high-scoring pronunciation that actually would not occur in natural speech. For example, the proper name Achilles would likely result in a pronunciation that phoneticizes both "ll's": ah-k-ih-l-l-iy-z. In natural speech, the second "l" is actually silent: ah-k-ih-l-iy-z.
- the sequence generator using letter-only trees has no mechanism to screen out word pronunciations that would never occur in natural speech.
- a mixed-tree score estimator 82 uses the set of mixed-decision trees 74 to assess the viability of each pronunciation in list 80 .
- the score estimator works by sequentially examining each letter in the input sequence along with the phonemes assigned to each letter by sequence generator 78 .
- the set of mixed trees has a mixed tree for each letter of the alphabet.
- An exemplary mixed tree is shown in FIG. 5.
- the mixed tree has internal nodes and leaf nodes. The internal nodes are illustrated as ovals and the leaf nodes as rectangles in FIG. 5. The internal nodes are each populated with a yes-no question and the leaf nodes are each populated with probability data.
- the internal nodes of the mixed tree can contain two different classes of questions.
- An internal node can contain a question about a given letter and its neighboring letters in the sequence, or it can contain a question about the phoneme associated with that letter and neighboring phonemes corresponding to that sequence.
- the decision tree is thus mixed, in that it contains mixed classes of questions.
- the abbreviations used in FIG. 5 are similar to those used in FIG. 4, with some additional abbreviations.
- the symbol L represents a question about a letter and its neighboring letters.
- the symbol P represents a question about a phoneme and its neighboring phonemes.
- the abbreviations CONS and SYL are phoneme classes, namely consonant and syllabic.
- the numbers in the leaf nodes give phoneme probabilities as they did in the letter-only trees.
- the mixed-tree score estimator rescores each of the pronunciations in list 80 based on the mixed-tree questions and using the probability data in the leaf nodes of the mixed trees. If desired, the list of pronunciations may be stored in association with the respective scores, as in list 84. If desired, list 84 can be sorted in descending order so that the first listed pronunciation is the one with the highest score.
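A hypothetical sketch of the second-stage rescoring, using the Achilles example: a mixed tree may question either the letter context (an L question) or the phoneme context (a P question). The toy tree below asks a single P question about the letter "l"; its structure and probabilities are invented, but it shows how phoneme-context questions let the estimator penalize the doubly-voiced "l":

```python
# Toy mixed tree for the letter 'l'. Its root asks a P question: "was the
# previous letter assigned the phoneme 'l'?" ('~' = null phoneme; invented values)

def prev_phoneme_is(ph):
    return lambda word, phonemes, i: i > 0 and phonemes[i - 1] == ph

mixed_tree_l = {
    "question": prev_phoneme_is("l"),
    "yes": {"leaf": {"~": 0.9, "l": 0.1}},   # second 'l' of a double-l: usually silent
    "no":  {"leaf": {"l": 0.95, "~": 0.05}},
}

def leaf_probs(tree, word, phonemes, i):
    node = tree
    while "leaf" not in node:
        node = node["yes"] if node["question"](word, phonemes, i) else node["no"]
    return node["leaf"]

# Score the second 'l' of "achilles" under two candidate phoneme assignments:
# the letter-only output voices both l's; natural speech silences the second.
word = "achilles"
voiced = ["ah", "k", "~", "ih", "l", "l", "iy", "z"]   # ah-k-ih-l-l-iy-z
silent = ["ah", "k", "~", "ih", "l", "~", "iy", "z"]   # ah-k-ih-l-iy-z
p_voiced = leaf_probs(mixed_tree_l, word, voiced, 5)["l"]   # low probability
p_silent = leaf_probs(mixed_tree_l, word, silent, 5)["~"]   # high probability
```

Because the candidate with the silent second "l" receives the far higher leaf probability, it would rise to the top of list 84 after rescoring.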
- in some instances, the pronunciation occupying the highest score position in list 80 will be different from the pronunciation occupying the highest score position in list 84. This occurs because the mixed-tree score estimator, using the mixed trees 74, screens out those pronunciations that do not contain self-consistent phoneme sequences or otherwise represent pronunciations that would not occur in natural speech.
- the system for generating the letter-only trees and the mixed trees is illustrated in FIG. 6.
- the tree generator 120 employs a tree-growing algorithm that operates upon a predetermined set of training data 122 supplied by the developer of the system.
- the training data 122 comprise aligned letter, phoneme pairs that correspond to known proper pronunciations of words.
- the training data 122 may be generated through the alignment process illustrated in FIG. 7.
- FIG. 7 illustrates an alignment process being performed on an exemplary word BIBLE.
- the spelled word 124 and its pronunciation 126 are fed to a dynamic programming alignment module 128 which aligns the letters of the spelled word with the phonemes of the corresponding pronunciation. Note in the illustrated example the final E is silent.
- the letter phoneme pairs are then stored as data 122 .
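The FIG. 7 alignment step can be sketched as a small dynamic program that pairs each letter of the spelled word with a phoneme of the pronunciation, or with silence ('~'). The match set and the 0.5 silence penalty below are invented scoring assumptions, not the patent's actual costs:

```python
# Sketch of dynamic-programming letter-phoneme alignment (assumed costs).

def align(letters, phonemes, match):
    INF = float("inf")
    # dp[i][j]: minimum cost of aligning the first i letters with the first j phonemes
    dp = [[INF] * (len(phonemes) + 1) for _ in range(len(letters) + 1)]
    back = {}
    dp[0][0] = 0.0
    for i in range(len(letters) + 1):
        for j in range(len(phonemes) + 1):
            if dp[i][j] == INF:
                continue
            if i < len(letters) and j < len(phonemes):   # letter -> phoneme
                c = dp[i][j] + (0.0 if (letters[i], phonemes[j]) in match else 1.0)
                if c < dp[i + 1][j + 1]:
                    dp[i + 1][j + 1] = c
                    back[(i + 1, j + 1)] = (i, j, phonemes[j])
            if i < len(letters):                         # letter -> silence
                c = dp[i][j] + 0.5
                if c < dp[i + 1][j]:
                    dp[i + 1][j] = c
                    back[(i + 1, j)] = (i, j, "~")
    pairs, i, j = [], len(letters), len(phonemes)
    while (i, j) in back:                                # recover the pairing
        pi, pj, ph = back[(i, j)]
        pairs.append((letters[i - 1], ph))
        i, j = pi, pj
    return list(reversed(pairs))

match = {("b", "b"), ("i", "ay"), ("l", "l")}            # plausible matches (assumed)
pairs = align(list("bible"), ["b", "ay", "b", "l"], match)
```

For B-I-B-L-E against the four phonemes b-ay-b-l, the cheapest path aligns the final E with silence, matching the silent-E observation in the text.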
- the tree generator 120 works in conjunction with three additional components: a set of possible yes-no questions 130, a set of rules 132 for selecting the best question for each node or for deciding if the node should be a leaf node, and a pruning method 133 to prevent over-training.
- the set of possible yes-no questions may include letter questions 134 and phoneme questions 136 , depending on whether a letter-only tree or a mixed tree is being grown. When growing a letter-only tree, only letter questions 134 are used; when growing a mixed tree both letter questions 134 and phoneme questions 136 are used.
- the rules for selecting the best question to populate at each node in the presently preferred embodiment are designed to follow the Gini criterion.
- Other splitting criteria can be used instead.
- the Gini criterion is used to select a question from the set of possible yes-no questions 130 and to employ a stopping rule that decides when a node is a leaf node.
- the Gini criterion employs a concept called “impurity.” Impurity is always a non-negative number.
- Gini impurity may be defined as follows. If C is the set of classes to which data items can belong, and T is the current tree node, let f(j|T) denote the relative frequency of class j among the data items at node T. The Gini impurity of node T is then
  i(T) = Σ_{j,k ∈ C, j ≠ k} f(j|T)·f(k|T) = 1 − Σ_{j ∈ C} f(j|T)²
- as an illustration, suppose a node contains ten data items and two candidate questions, Q 1 and Q 2. The items that answer “yes” to Q 1 include four examples of “iy” and one example of “-” (the other five items answer “no” to Q 1 ).
- the items that answer “yes” to Q 2 include three examples of “iy” and three examples of “eh” (the other four items answer “no” to Q 2 ).
- FIG. 6 diagrammatically compares these two cases.
- the Gini criterion answers which question the system should choose for this node, Q 1 or Q 2 .
- the Gini criterion for choosing the correct question is: find the question in which the drop in impurity in going from parent nodes to children nodes is maximized.
- the rule set 132 declares a best question for a node to be that question which brings about the greatest drop in impurity in going from the parent node to its children.
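The Q 1 versus Q 2 choice can be worked through numerically. The parent node's full class makeup is not stated in the text, so a ten-item composition consistent with the stated branch counts is assumed here: five "iy", three "eh", and two silence ("~") examples.

```python
from collections import Counter

# Worked sketch of the Gini criterion for the Q1/Q2 example (parent makeup assumed).

def gini(items):
    n = len(items)
    return 1.0 - sum((c / n) ** 2 for c in Counter(items).values())

def impurity_drop(parent, yes_items, no_items):
    # weighted child impurity, subtracted from the parent's impurity
    weighted = (len(yes_items) * gini(yes_items)
                + len(no_items) * gini(no_items)) / len(parent)
    return gini(parent) - weighted

parent = ["iy"] * 5 + ["eh"] * 3 + ["~"] * 2     # assumed ten-item node

q1_yes = ["iy"] * 4 + ["~"]                      # four "iy", one "~" answer yes to Q1
q1_no  = ["iy"] + ["eh"] * 3 + ["~"]             # the remaining five answer no
q2_yes = ["iy"] * 3 + ["eh"] * 3                 # three "iy", three "eh" answer yes to Q2
q2_no  = ["iy"] * 2 + ["~"] * 2                  # the remaining four answer no

drop_q1 = impurity_drop(parent, q1_yes, q1_no)   # ≈ 0.18
drop_q2 = impurity_drop(parent, q2_yes, q2_no)   # ≈ 0.12
```

Under this assumed composition, Q 1 yields the larger drop in impurity, so the rule set would pick Q 1 for the node, which matches the intuition that Q 1's "yes" branch is the purer one.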
- the tree generator applies the rules 132 to grow a decision tree of yes-no questions selected from set 130 .
- the generator will continue to grow the tree until the optimal-sized tree has been grown.
- Rules 132 include a set of stopping rules that will terminate tree growth when the tree is grown to a pre-determined size. In the preferred embodiment the tree is grown to a size larger than ultimately desired.
- pruning methods 133 are used to cut back the tree to its desired size.
- the pruning method may implement the Breiman technique as described in the reference cited above.
- the tree generator thus generates sets of letter-only trees, shown generally at 140 or mixed trees, shown generally at 150 , depending on whether the set of possible yes-no questions 130 includes letter-only questions alone or in combination with phoneme questions.
- the corpus of training data 122 comprises letter, phoneme pairs, as discussed above. In growing letter-only trees, only the letter portions of these pairs are used in populating the internal nodes. Conversely, when growing mixed trees, both the letter and phoneme components of the training data pairs may be used to populate internal nodes. In both instances the phoneme portions of the pairs are used to populate the leaf nodes. Probability data associated with the phoneme data in the leaf nodes are generated by counting the number of times a given phoneme is aligned with a given letter over the training data corpus.
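The counting step just described can be sketched in a few lines: tally how often each phoneme is aligned with a given letter across the corpus, then normalize to relative frequencies. The tiny aligned corpus below ('~' marking silent letters) is invented for illustration:

```python
from collections import Counter, defaultdict

# Sketch of leaf-probability estimation from aligned letter-phoneme pairs.

corpus = [  # invented aligned training words
    [("b", "b"), ("i", "ay"), ("b", "b"), ("l", "l"), ("e", "~")],
    [("m", "m"), ("e", "eh"), ("t", "t")],
    [("t", "t"), ("i", "ay"), ("m", "m"), ("e", "~")],
]

counts = defaultdict(Counter)
for word in corpus:
    for letter, phoneme in word:
        counts[letter][phoneme] += 1

# relative frequency of each phoneme given the letter
probs = {letter: {ph: c / sum(cnt.values()) for ph, c in cnt.items()}
         for letter, cnt in counts.items()}
```

In this toy corpus the letter "e" is silent in two of its three occurrences, so its null-phoneme probability comes out to 2/3, the kind of value that would populate a leaf node.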
- the editor 10 is adaptive or self-learning.
- One or more spelled word-pronunciation pairs are used to update lexicon 14 as well as to supply new training data upon which the phoneticizer 24 may be retrained or updated. This can be accomplished by using the word-pronunciation pairs as new training data 122 for generating revised decision trees in accordance with the above-described method. Therefore, the self-learning embodiment improves its phonetic transcription generation over time, resulting in even higher quality transcriptions.
Abstract
An editing tool is provided for developing word-pronunciation pairs based on a spelled word input. The editing tool includes a transcription generator that receives the spelled word input from the user and generates a list of suggested phonetic transcriptions. The editor displays the list of suggested phonetic transcriptions to the user and provides a mechanism for selecting the desired pronunciation from the list of suggested phonetic transcriptions. The editing tool further includes a speech recognizer to aid the user in selecting the desired pronunciation from the list of suggested phonetic transcriptions based on speech data input that corresponds to the spelled word input, and a syllable editor that enables the user to manipulate a syllabic part of a selected pronunciation. Lastly, the desired pronunciation can be tested at any point through the use of a text-to-speech synthesizer that generates audible speech data for the selected phonetic transcription.
Description
- The present invention relates generally to speech recognition and speech synthesis systems. More particularly, the invention relates to developing word-pronunciation pairs.
- Computer-implemented and automated speech technology today involves a confluence of many areas of expertise, ranging from linguistics and psychoacoustics to digital signal processing and computer science. The traditionally separate problems of text-to-speech (TTS) synthesis and automatic speech recognition (ASR) actually present many opportunities to share technology. Traditionally, however, speech recognition and speech synthesis have been addressed as entirely separate disciplines, relying very little on the benefits that cross-pollination could have on both disciplines.
- We have discovered techniques, described in this document, for combining speech recognition and speech synthesis technologies to the mutual advantage of both disciplines in generating pronunciation dictionaries. Having a good pronunciation dictionary is key to both text-to-speech and automatic speech recognition applications. In the case of text-to-speech, the dictionary serves as the source of pronunciation for words entered by graphemic or spelled input. In automatic speech recognition applications, the dictionary serves as the lexicon of words that are known by the system. When training the speech recognition system, this lexicon identifies how each word is phonetically spelled, so that the speech models may be properly trained for each of the words.
- In both speech synthesis and speech recognition applications, the quality and performance of the application may be highly dependent on the accuracy of the pronunciation dictionary. Typically, it is expensive and time consuming to develop a good pronunciation dictionary, because the only way to obtain accurate data has heretofore been through use of professional linguists, preferably a single one to guarantee consistency. The linguist painstakingly steps through each word and provides its phonetic transcription.
- Phonetic pronunciation dictionaries are available for most of the major languages, although these dictionaries typically have a limited word coverage and do not adequately handle proper names, unusual and compound nouns, or foreign words. Publicly available dictionaries likewise fall short when used to obtain pronunciations for a dialect different from the one for which the system was trained or intended.
- Currently available dictionaries also rarely match all of the requirements of a given system. Some systems (such as text-to-speech systems) need high accuracy; whereas other systems (such as some automatic speech recognition systems) can tolerate lower accuracy, but may require multiple valid pronunciations for each word. In general, the diversity in system requirements compounds the problem. Because there is no “one size fits all” pronunciation dictionary, the construction of good, application-specific dictionaries remains expensive.
- The present invention provides a system and method for developing word-pronunciation pairs for use in a pronunciation dictionary. The invention provides a tool, which builds upon a window environment to provide a user-friendly methodology for defining, manipulating and storing the phonetic representation of word-pronunciation pairs in a pronunciation dictionary. Unlike other phonetic transcription tools, the invention requires no specific linguistic or phonetic knowledge to produce the pronunciation lexicon. It utilizes various techniques to quickly provide the best phonetic representation of a given word along with different means for “fine tuning” this phonetic representation to achieve the desired pronunciation. Immediate feedback to validate word-pronunciation pairs is also provided by incorporating a text-to-speech synthesizer. Applications will quickly become apparent as developments expand in areas where exceptions to the rules of pronunciation are common, such as streets, cities, proper names and other specialized terminology.
- For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.
- FIG. 1 is a block diagram illustrating the system and method of the present invention;
- FIG. 2 illustrates an editing tool useful in implementing a system in accordance with the present invention;
- FIG. 3 is a block diagram illustrating the presently preferred phoneticizer using decision trees;
- FIG. 4 is a tree diagram illustrating a letter-only tree used in relation to the phoneticizer;
- FIG. 5 is a tree diagram illustrating a mixed tree in accordance with the present invention;
- FIG. 6 is a block diagram illustrating a system for generating decision trees in accordance with the present invention; and
- FIG. 7 is a flowchart showing a method for generating training data through an alignment process in accordance with the present invention.
- A word-pronunciation editor10 for developing word-pronunciation pairs is depicted in FIG. 1. The editor 10 uses spelled word input 12 to develop word-pronunciation pairs that are in turn entered into a
lexicon 14. Thelexicon 14 of the present invention is a word-pronunciation dictionary comprised of ordered pairs of words and one or more associated phonetic transcriptions. As will be more fully explained, thelexicon 14 can be updated by adding word-pronunciation pairs or by revising pronunciations of existing word-pronunciation pairs. - A
transcription generator 20 receives as input the spelled word 12. For illustration purposes it will be assumed that spelled words 12 are entered via a keyboard, although spelled words may be input through any convenient means, including by voice entry or data file. The transcription generator 20 may be configured in a variety of different ways depending on the system requirements. In a first preferred embodiment of the present invention, transcription generator 20 accesses a baseline dictionary 22 or conventional letter-to-sound rules to produce a suggested phonetic transcription 23. - In the description presented here, a distinction is made between phonetic transcriptions and morpheme transcriptions. The former represent a word in terms of the phonemes of human speech when the word is spoken, whereas the latter represent the atomic units (called morphs) from which larger words are made. For instance, a compound word such as “catwalk” may be treated morphemically as comprising the atomic units “cat” and “walk”. In an alternative embodiment, the
transcription generator 20 may also include a morphemic component. - In operation, an initial phonetic transcription of the spelled word 12 is derived through a lookup in the
baseline dictionary 22. In the event no pronunciation is found for the spelled word, conventional letter-to-sound rules are used to generate an initial phonetic transcription. If the resulting pronunciation is unsatisfactory to the user, a phoneticizer 24 may provide additional suggested pronunciations for the spelled word 12. The phoneticizer 24 generates a list of suggested phonetic transcriptions 26 based on the spelled word input using a set of decision trees. Details of a suitable phoneticizer are provided below. - Each transcription in the suggested
list 26 has a numeric value by which it can be compared with other transcriptions in the suggested list 26. Typically, these numeric scores are a byproduct of the transcription generation mechanism. For example, when the decision tree-based phoneticizer 24 is used, each phonetic transcription has associated with it a confidence level score. This confidence level score represents the cumulative score of the individual probabilities associated with each phoneme. As the reader will see from the description below, the leaf nodes of each decision tree in the phoneticizer 24 are populated with phonemes and their associated probabilities. These probabilities are numerically represented and can be used to generate a confidence level score. Although these confidence level scores are generally not displayed to the user, they are used to order the displayed list of n-best suggested transcriptions 26 as provided by the phoneticizer 24. - A
user selection mechanism 28 allows the user to select a pronunciation from the list of suggested transcriptions 26 that matches the desired pronunciation. - An
automatic speech recognizer 30 is incorporated into the editor 10 for aiding the user in quickly selecting the desired pronunciation from the list of suggested transcriptions 26. By using the confidence level score associated with each of the suggested transcriptions, the speech recognizer 30 may be used to reorder the list of suggested transcriptions 26. The speech recognizer 30 extracts phonetic information from a speech input signal 32, which corresponds to the spelled word input 12. Suitable sources of speech include: live human speech, audio recordings, speech databases, and speech synthesizers. The speech recognizer 30 then uses the speech signal 32 to reorder the list of suggested transcriptions 26, such that the transcription which most closely corresponds to the speech input signal 32 is placed at the top of the list of suggested transcriptions 26. - As shown in FIG. 2, a
graphical user interface 40 is the tool by which a user selects and manipulates the phonetic transcriptions provided by the transcription generator 20 and the phoneticizer 24. Initially, the spelled word input 12 is placed into a spelling field 42. If a phonetic transcription of the word 12 is provided by the baseline dictionary 22, then its corresponding phonetic representation defaults into the phonemes field 48; otherwise, conventional letter-to-sound rules are used to populate the phonemes field 48. The phonemic transcription displayed in the phonemes field 48 is hyphenated to demarcate the syllables which make up the word. In this way, a user can directly edit the individual syllables of the phoneme transcription in the phonemes field 48. - Alternatively, the spelled word input 12 may be selected from a
word list 44 as provided by a word source file (e.g., a dictionary source). Highlighting any word in the word list 44 places that word in the spelling field 42 and its corresponding phonetic transcription in the phonemes field 48. As previously discussed, a list of n-best suggested phonetic transcriptions 26 is generated by the phoneticizer 24 based upon the spelled word input 12. If the pronunciation in the phonemes field 48 is unsatisfactory, then the user preferably selects one of these phonetic transcriptions (one that closely matches the desired pronunciation) to populate the phonemes field 48. Lastly, it is also envisioned that the desired word input may be spoken by the user. This speech input is converted into a spelled word by the speech recognizer 30, which is in turn translated into a phonetic transcription as described above. - At any time, the user can specify in the
language selection box 46 an operative language for the word-pronunciation editor 10. In response, the editor 10 automatically functions in a mode that corresponds to the selected language. For instance, the transcription generator 20 will access a dictionary that corresponds to the selected language, thereby displaying a phonetic transcription for the word input 12 in the selected language. To function properly, the phoneticizer 24, the speech recognizer 30 and the text-to-speech synthesizer 36 may also need to access input files and/or training data that correspond to the selected language. It is also envisioned that the language selection may alter the appearance of the user interface. In this way, the editor 10 facilitates the development of word-pronunciation pairs in the user's native language. - Regardless of the language selection, the word-pronunciation editor 10 provides various means for manipulating syllabic portions of the phonetic transcription displayed in the phonemes field 48. A phonemic editor 34 (as shown in FIG. 1) provides the user a number of options for modifying an individual syllable of the phonetic transcription. For instance, stress (or emphasis)
buttons 50 line up underneath the syllables in the phonemes field 48. In this way, the user can select these buttons 50 to alter the stress applied to a syllable, thereby modifying the pronunciation of the word. Most often, mispronunciation results from the wrong vowel being used in a syllable. The user can also use the vowel step-through button 52 and/or the vowel table list 54 to select different vowels to substitute for those appearing in the selected syllable of the phonemes field 48. - In one embodiment of the
phonemic editor 34, the user speaks an individual syllable into a microphone (not shown) and the original text spelling that corresponds to its pronunciation is provided in the sounds-like field 56. When the user has selected a particular syllable of the phonetic transcription in the phonemes field 48, then a corresponding phonemic representation of the speech input also replaces this selected syllable in the phonetic transcription. It should be noted that the speech input corresponding to an individual syllable is first translated into the corresponding text spelling by the speech recognizer 30. The phonemic editor 34 then converts this text spelling into the corresponding phonemic representation. In this way, one or more selected syllabic portions of the pronunciation may be replaced with a word known to sound similar to the desired pronunciation. Alternatively, the phonemic editor 34 presents the user with a menu of words based on the spoken vowel sounds and the user selects the word that corresponds to the desired vowel pronunciation of the syllable. If during the editing process the user becomes dissatisfied with the pronunciation displayed in the phonemes field 48, then the phonetic transcription can be reset to its original state by selecting the reset button 56. - By clicking on a
speaker icon 58, the user may also test the current pronunciation displayed in the phonemes field 48. Returning to FIG. 1, a text-to-speech synthesizer 36 generates audible speech data 37 from the current pronunciation found in the phonemes field 48. Generating audible speech data from a phonetic transcription is well known to one skilled in the art. Once the user has completed editing the phonetic transcription, a storage mechanism 38 can be initiated (via the save button 60) to update the desired word-pronunciation pair in lexicon 14. - Phoneticizer
- An exemplary embodiment of
phoneticizer 24 is shown in FIG. 3 to illustrate the principles of generating multiple pronunciations based on the spelled form of a word. Heretofore, most spelled word-to-pronunciation transcription techniques have relied solely upon the letters themselves. For some languages, letter-only pronunciation generators yield satisfactory results; for others (particularly English), the results may be unsatisfactory. For example, a letter-only pronunciation generator would have great difficulty properly pronouncing the word bible. Based on the sequence of letters only, the letter-only system would likely pronounce the word “BIB-L”, much as a grade school child learning to read might do. The fault in conventional systems lies in the inherent ambiguity imposed by the pronunciation rules of many languages. The English language, for example, has hundreds of different pronunciation rules, making it difficult and computationally expensive to approach the problem on a word-by-word basis. - Therefore, the presently preferred
phoneticizer 24 is a pronunciation generator employing two stages, the first stage employing a set of letter-only decision trees 72 and the second, optional stage employing a set of mixed-decision trees 74. Depending on the language and the application, we may implement only the first stage (taking as output the pronunciations shown at 80), or implement both stages and take the pronunciations output at 84. An input sequence 76, such as the sequence of letters B-I-B-L-E, is fed to a dynamic programming phoneme sequence generator 78. The sequence generator 78 uses the letter-only trees 72 to generate a list of pronunciations 80, representing possible pronunciation candidates of the spelled word input sequence. - The
sequence generator 78 sequentially examines each letter in the sequence, applying the decision tree associated with that letter to select a phoneme pronunciation for that letter based on probability data contained in the letter-only tree. Preferably, the set of letter-only decision trees includes a decision tree for each letter in the alphabet. FIG. 4 shows an example of a letter-only decision tree for the letter E. The decision tree comprises a plurality of internal nodes (illustrated as ovals in the Figure), and a plurality of leaf nodes (illustrated as rectangles in the Figure). Each internal node is populated with a yes-no question. Yes-no questions are questions that can be answered either yes or no. In the letter-only tree these questions are directed to the given letter (in this case the letter E) and its neighboring letters in the input sequence. Note in FIG. 4 that each internal node branches either left or right, depending on whether the answer to the associated question is yes or no. - Abbreviations are used in FIG. 4 as follows: numbers in questions, such as “+1” or “−1”, refer to positions in the spelling relative to the current letter. For example, “+1L==‘R’?” means “Is the letter after the current letter (which, in this case, is the letter E) an R?” The abbreviations CONS and VOW represent classes of letters, namely consonants and vowels. The absence of a neighboring letter, or null letter, is represented by the symbol −, which is used as a filler or placeholder when aligning certain letters with corresponding phoneme pronunciations. The symbol # denotes a word boundary.
- The leaf nodes are populated with probability data that associate possible phoneme pronunciations with numeric values representing the probability that the particular phoneme represents the correct pronunciation of the given letter. For example, the notation “iy: 0.51” means “the probability of phoneme ‘iy’ in this leaf is 0.51.” The null phoneme, i.e., silence, is represented by the symbol ‘−’.
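To make the traversal concrete, the following sketch walks a letter-only decision tree for the letter E: internal nodes ask yes-no questions about the letter context, and the reached leaf yields a phoneme probability table. The tree shape, probabilities, and helper names here are invented for illustration, loosely echoing FIG. 4; they are not taken from the patent.

```python
# A toy letter-only decision tree for the letter E (invented for illustration).

def neighbor(letters, pos, offset):
    """Letter at a relative offset; '#' marks a word boundary."""
    i = pos + offset
    return letters[i] if 0 <= i < len(letters) else "#"

# A node is either ("ask", question, yes_subtree, no_subtree)
# or ("leaf", {phoneme: probability}).
E_TREE = ("ask", lambda ls, p: neighbor(ls, p, +1) == "#",  # is this E word-final?
          ("leaf", {"-": 0.85, "iy": 0.15}),                # final E: usually silent
          ("leaf", {"eh": 0.60, "iy": 0.40}))

def walk(tree, letters, pos):
    """Answer the internal yes-no questions until a leaf is reached."""
    while tree[0] == "ask":
        _, question, yes, no = tree
        tree = yes if question(letters, pos) else no
    return tree[1]  # the leaf's phoneme-probability table

probs = walk(E_TREE, list("bible"), pos=4)  # the trailing E of B-I-B-L-E
best_phoneme = max(probs, key=probs.get)    # '-' (the null phoneme, silence)
```

In the real system each letter's tree is much deeper and its questions and probabilities are learned from training data, but the traversal follows this pattern.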
- The sequence generator 78 (FIG. 3) thus uses the letter-
only decision trees 72 to construct one or more pronunciation hypotheses that are stored in list 80. Preferably, each pronunciation has associated with it a numerical score arrived at by combining the probability scores of the individual phonemes selected using the decision trees 72. Word pronunciations may be scored by constructing a matrix of possible combinations and then using dynamic programming to select the n-best candidates. Alternatively, the n-best candidates may be selected using a substitution technique that first identifies the most probable transcription candidate and then generates additional candidates through iterative substitution as follows: - The pronunciation with the highest probability score is selected first by multiplying the respective scores of the highest-scoring phonemes (identified by examining the leaf nodes), and then using this selection as the most probable candidate, or first-best word candidate. Additional (n-best) candidates are then selected by examining the phoneme data in the leaf nodes again to identify the phoneme, not previously selected, that has the smallest difference from an initially selected phoneme. This minimally-different phoneme is then substituted for the initially selected one to thereby generate the second-best word candidate. The above process may be repeated iteratively until the desired number of n-best candidates have been selected.
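The iterative substitution technique described above can be sketched as follows. This is a minimal illustration with invented names; the per-position `{phoneme: probability}` tables stand in for the leaf-node data, and the example values are made up.

```python
# Sketch of n-best selection by iterative substitution over per-letter
# phoneme probability tables (illustrative data only).

def first_best(tables):
    """Pick the highest-probability phoneme at every position."""
    return [max(t, key=t.get) for t in tables]

def next_best(tables, current):
    """Swap in the single not-yet-chosen phoneme whose probability is
    closest to the currently chosen one at its position."""
    best_swap, smallest_drop = None, float("inf")
    for i, table in enumerate(tables):
        for phoneme, prob in table.items():
            drop = table[current[i]] - prob
            if phoneme != current[i] and drop < smallest_drop:
                best_swap, smallest_drop = (i, phoneme), drop
    if best_swap is None:
        return None
    i, phoneme = best_swap
    return current[:i] + [phoneme] + current[i + 1:]

tables = [{"b": 0.9, "p": 0.1}, {"ay": 0.6, "ih": 0.5}]
best = first_best(tables)         # ['b', 'ay']
second = next_best(tables, best)  # ['b', 'ih'] -- the smallest probability drop
```

Here the second-best candidate swaps the vowel (a drop of 0.1) rather than the initial consonant (a drop of 0.8), matching the "minimally-different phoneme" rule. Repeating `next_best` on each new candidate yields further n-best entries.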
List 80 may be sorted in descending score order so that the pronunciation judged best by the letter-only analysis appears first in the list. - As noted above, a letter-only analysis will frequently produce poor results. This is because the letter-only analysis has no way of determining, at each letter, what phoneme will be generated by subsequent letters. Thus, a letter-only analysis can generate a high-scoring pronunciation that actually would not occur in natural speech. For example, the proper name Achilles would likely result in a pronunciation that phoneticizes both “l”s: ah-k-ih-l-l-iy-z. In natural speech, the second “l” is actually silent: ah-k-ih-l-iy-z. The sequence generator using letter-only trees has no mechanism to screen out word pronunciations that would never occur in natural speech.
- The second stage of the
phoneticizer 24 addresses the above problem. A mixed-tree score estimator 82 uses the set of mixed-decision trees 74 to assess the viability of each pronunciation in list 80. The score estimator works by sequentially examining each letter in the input sequence along with the phonemes assigned to each letter by the sequence generator 78. Like the set of letter-only trees, the set of mixed trees has a mixed tree for each letter of the alphabet. An exemplary mixed tree is shown in FIG. 5. Like the letter-only tree, the mixed tree has internal nodes and leaf nodes. The internal nodes are illustrated as ovals and the leaf nodes as rectangles in FIG. 5. The internal nodes are each populated with a yes-no question and the leaf nodes are each populated with probability data. Although the tree structure of the mixed tree resembles that of the letter-only tree, there is one important difference. The internal nodes of the mixed tree can contain two different classes of questions. An internal node can contain a question about a given letter and its neighboring letters in the sequence, or it can contain a question about the phoneme associated with that letter and the neighboring phonemes corresponding to that sequence. The decision tree is thus mixed, in that it contains mixed classes of questions. - The abbreviations used in FIG. 5 are similar to those used in FIG. 4, with some additional abbreviations. The symbol L represents a question about a letter and its neighboring letters. The symbol P represents a question about a phoneme and its neighboring phonemes. For example, the question “+1L==‘D’?” means “Is the letter in the +1 position a ‘D’?” The abbreviations CONS and SYL are phoneme classes, namely consonant and syllabic. For example, the question “+1P==CONS?” means “Is the phoneme in the +1 position a consonant?” The numbers in the leaf nodes give phoneme probabilities as they did in the letter-only trees.
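A sketch of how the mixed trees can score a first-stage candidate: for each letter position, consult that letter's mixed tree (whose questions may inspect letters or the candidate's phonemes) and accumulate the leaf probability of the phoneme the first stage assigned. Function names, the log-space accumulation, and the probability floor are our assumptions, not the patent's; in the real system each tree is grown from training data.

```python
import math

def rescore(letters, phonemes, mixed_trees):
    """mixed_trees[letter] is a callable (letters, phonemes, pos) -> {phoneme: prob};
    returns a log-probability score for the candidate phoneme sequence."""
    log_score = 0.0
    for pos, (letter, phoneme) in enumerate(zip(letters, phonemes)):
        leaf_probs = mixed_trees[letter](letters, phonemes, pos)
        log_score += math.log(leaf_probs.get(phoneme, 1e-6))  # floor for unseen phonemes
    return log_score

def rerank(letters, candidates, mixed_trees):
    """Sort first-stage candidates (list 80) best-first, as in list 84."""
    return sorted(candidates,
                  key=lambda ph: rescore(letters, ph, mixed_trees),
                  reverse=True)
```

Because the mixed-tree leaves condition on the surrounding phonemes, a candidate with an implausible phoneme sequence picks up low leaf probabilities and sinks in the reranked list.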
- The mixed-tree score estimator rescores each of the pronunciations in
list 80 based on the mixed-tree questions and using the probability data in the leaf nodes of the mixed trees. If desired, the list of pronunciations may be stored in association with the respective scores, as in list 84. If desired, list 84 can be sorted in descending order so that the first listed pronunciation is the one with the highest score. - In many instances, the pronunciation occupying the highest score position in
list 80 will be different from the pronunciation occupying the highest score position in list 84. This occurs because the mixed-tree score estimator, using the mixed trees 74, screens out those pronunciations that do not contain self-consistent phoneme sequences or otherwise represent pronunciations that would not occur in natural speech. - The system for generating the letter-only trees and the mixed trees is illustrated in FIG. 6. At the heart of the decision tree generation system is
tree generator 120. The tree generator 120 employs a tree-growing algorithm that operates upon a predetermined set of training data 122 supplied by the developer of the system. Typically, the training data 122 comprise aligned letter-phoneme pairs that correspond to known proper pronunciations of words. The training data 122 may be generated through the alignment process illustrated in FIG. 7. FIG. 7 illustrates an alignment process being performed on the exemplary word BIBLE. The spelled word 124 and its pronunciation 126 are fed to a dynamic programming alignment module 128, which aligns the letters of the spelled word with the phonemes of the corresponding pronunciation. Note in the illustrated example that the final E is silent. The letter-phoneme pairs are then stored as data 122. - Returning to FIG. 6, the
tree generator 120 works in conjunction with three additional components: a set of possible yes-no questions 130; a set of rules 132 for selecting the best question for each node or for deciding whether the node should be a leaf node; and a pruning method 133 to prevent over-training. - The set of possible yes-no questions may include letter questions 134 and
phoneme questions 136, depending on whether a letter-only tree or a mixed tree is being grown. When growing a letter-only tree, only letter questions 134 are used; when growing a mixed tree, both letter questions 134 and phoneme questions 136 are used. - The rules for selecting the best question to populate each node in the presently preferred embodiment are designed to follow the Gini criterion. Other splitting criteria can be used instead. For more information regarding splitting criteria, see Breiman, Friedman et al., “Classification and Regression Trees.” Essentially, the Gini criterion is used to select a question from the set of possible yes-no questions 130 and to employ a stopping rule that decides when a node is a leaf node. The Gini criterion employs a concept called “impurity.” Impurity is always a non-negative number. It is applied to a node such that a node containing equal proportions of all possible categories has maximum impurity and a node containing only one of the possible categories has zero impurity (the minimum possible value). There are several functions that satisfy the above conditions. These depend upon the counts of each category within a node. Gini impurity may be defined as follows. If C is the set of classes to which data items can belong, and T is the current tree node, let f(1|T) be the proportion of training data items in node T that belong to
class 1, f(2|T) the proportion of items belonging to class 2, etc. The Gini impurity of node T is then i(T) = 1 − Σc f(c|T)², summed over the classes c in C. - To illustrate by example, assume the system is growing a tree for the letter “E.” In a given node T of that tree, the system may, for example, have 10 examples of how “E” is pronounced in words. In 5 of these examples, “E” is pronounced “iy” (the sound of “ee” in “cheese”); in 3 of the examples, “E” is pronounced “eh” (the sound of “e” in “bed”); and in the remaining 2 examples, “E” is “−” (i.e., silent, as in the “e” in “maple”).
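Under this definition, the impurity of the 10-example node can be computed directly. A minimal sketch (the function name is ours; the counts are the example's iy/eh/silent tallies):

```python
# Gini impurity of a node from its per-class counts:
# i(T) = 1 - sum of squared class proportions.

def gini_impurity(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# The node described above: 5 x "iy", 3 x "eh", 2 x silent.
i_T = gini_impurity([5, 3, 2])  # 1 - 0.25 - 0.09 - 0.04 = 0.62
```

A pure node such as `gini_impurity([10, 0, 0])` gives 0, the minimum, as the definition requires.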
- Assume the system is considering two possible yes-no questions, Q1 and Q2 that can be applied to the 10 examples. The items that answer “yes” to Q1 include four examples of “iy” and one example of “−” (the other five items answer “no” to Q1.) The items that answer “yes” to Q2 include three examples of “iy” and three examples of “eh” (the other four items answer “no” to Q2). FIG. 6 diagrammatically compares these two cases.
- The Gini criterion answers which question the system should choose for this node, Q1 or Q2. The Gini criterion for choosing the correct question is: find the question for which the drop in impurity in going from the parent node to the children nodes is maximized. This impurity drop Δi is defined as Δi = i(T) − pyes·i(yes) − pno·i(no), where pyes is the proportion of items going to the “yes” child and pno is the proportion of items going to the “no” child.
- Δi for Q1 is computed from the parent impurity i(T) = 1 − 0.5² − 0.3² − 0.2² = 0.62 and the child impurities:
- i(yes, Q1) = 1 − 0.8² − 0.2² = 0.32
- i(no, Q1) = 1 − 0.2² − 0.6² − 0.2² = 0.56
- So Δi(Q1) = 0.62 − 0.5*0.32 − 0.5*0.56 = 0.18.
- For Q2, we have i(yes, Q2) = 1 − 0.5² − 0.5² = 0.5, and likewise i(no, Q2) = 0.5. So Δi(Q2) = 0.62 − (0.6)*(0.5) − (0.4)*(0.5) = 0.12. In this case, Q1 gave the greater drop in impurity. It will therefore be chosen instead of Q2.
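The worked example can be checked numerically. In this sketch the function names are ours, and each count list is ordered iy/eh/silent:

```python
def impurity(counts):
    # Gini impurity: 1 minus the sum of squared class proportions.
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def impurity_drop(parent, yes, no):
    # Drop in impurity from the parent node to its yes/no children.
    p_yes = sum(yes) / sum(parent)
    return impurity(parent) - p_yes * impurity(yes) - (1 - p_yes) * impurity(no)

parent = [5, 3, 2]                                  # 5 x "iy", 3 x "eh", 2 x silent
q1 = impurity_drop(parent, [4, 0, 1], [1, 3, 1])    # ~0.18
q2 = impurity_drop(parent, [3, 3, 0], [2, 0, 2])    # ~0.12
# q1 > q2, so Q1 is chosen for this node.
```

The computed drops reproduce the 0.18 and 0.12 figures above, confirming the choice of Q1.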
- The rule set 132 declares the best question for a node to be that question which brings about the greatest drop in impurity in going from the parent node to its children.
- The tree generator applies the
rules 132 to grow a decision tree of yes-no questions selected from set 130. The generator will continue to grow the tree until the optimal-sized tree has been grown. Rules 132 include a set of stopping rules that will terminate tree growth when the tree is grown to a predetermined size. In the preferred embodiment the tree is grown to a size larger than ultimately desired. Then pruning methods 133 are used to cut the tree back to its desired size. The pruning method may implement the Breiman technique as described in the reference cited above. - The tree generator thus generates sets of letter-only trees, shown generally at 140, or mixed trees, shown generally at 150, depending on whether the set of possible yes-no questions 130 includes letter-only questions alone or in combination with phoneme questions. The corpus of
training data 122 comprises letter-phoneme pairs, as discussed above. In growing letter-only trees, only the letter portions of these pairs are used in populating the internal nodes. Conversely, when growing mixed trees, both the letter and phoneme components of the training data pairs may be used to populate internal nodes. In both instances the phoneme portions of the pairs are used to populate the leaf nodes. Probability data associated with the phoneme data in the leaf nodes are generated by counting the number of times a given phoneme is aligned with a given letter over the training data corpus. - In one embodiment of the present invention, the editor 10 is adaptive or self-learning. One or more spelled word-pronunciation pairs are used to update
lexicon 14 as well as to supply new training data upon which the phoneticizer 24 may be retrained or updated. This can be accomplished by using the word-pronunciation pairs as new training data 122 for generating revised decision trees in accordance with the above-described method. Therefore, the self-learning embodiment improves its phonetic transcription generation over time, resulting in even higher quality transcriptions. - The foregoing discloses and describes merely exemplary embodiments of the present invention. One skilled in the art will readily recognize from such discussion, and from the accompanying drawings and claims, that various changes, modifications, and variations can be made therein without departing from the spirit and scope of the present invention.
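The leaf-probability estimation described earlier — counting how often each phoneme aligns with each letter over the training corpus — can be sketched as follows (names are illustrative). In the self-learning embodiment, newly saved word-pronunciation pairs simply extend these counts before the trees are regrown:

```python
from collections import Counter, defaultdict

def leaf_probabilities(aligned_pairs):
    """aligned_pairs: (letter, phoneme) tuples from the FIG. 7 alignment step;
    returns per-letter phoneme probability tables for populating leaf nodes."""
    counts = defaultdict(Counter)
    for letter, phoneme in aligned_pairs:
        counts[letter][phoneme] += 1
    return {letter: {ph: n / sum(ctr.values()) for ph, n in ctr.items()}
            for letter, ctr in counts.items()}

pairs = [("e", "iy"), ("e", "-"), ("e", "iy"), ("b", "b")]
probs = leaf_probabilities(pairs)
# probs["e"] -> {"iy": 2/3, "-": 1/3}; probs["b"] -> {"b": 1.0}
```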
Claims (30)
1. A system for developing word-pronunciation pairs based on a spelled word input, comprising:
a transcription generator receptive of the spelled word input for generating a phonetic transcription that corresponds to the spelled word input, said phonetic transcription being segmented into syllabic portions;
a phonemic editor connected to said transcription generator for displaying and editing syllabic portions of said phonetic transcription; and
a storage mechanism for updating a lexicon with the spelled word input and said phonetic transcription, thereby developing the desired word-pronunciation pair.
2. The system of claim 1 wherein said transcription generator accesses a dictionary to generate said phonetic transcription, the dictionary storing phonetic transcription data corresponding to a plurality of spelled words.
3. The system of claim 1 wherein said transcription generator using letter-to-sound rules to produce said phonetic transcription.
4. The system of claim 1 wherein said phonetic transcription further includes accentuation data for the spelled word input.
5. The system of claim 4 wherein said dictionary storing accentuation data corresponding to each of the plurality of spelled words and said phonemic editor being operative to display and edit the accentuation data associated with said phonetic transcription.
6. The system of claim 1 wherein said phonemic editor provides a language selection mechanism and said transcription generator being connected to a plurality of dictionaries each of which stores phonetic transcription data in a different language, whereby said transcription generator invokes one of said plurality of dictionaries to produce a phonetic transcription that corresponds to the language from said language selection mechanism.
7. The system of claim 1 further includes a phoneticizer receptive of the spelled word input for producing a plurality of scored phonetic transcriptions, said phoneticizer employing decision trees to produce said plurality of scored phonetic transcriptions.
8. The system of claim 7 wherein at least one of said decision trees based on the sequence of letters and phonemes in the spelled word input.
9. The system of claim 7 further includes a pronunciation selection mechanism connected to said phonemic editor for selecting at least one of said plurality of scored phonetic transcriptions, said phonemic editor displaying each of said plurality of scored phonetic transcriptions.
10. The system of claim 1 wherein the spelled word input and said phonetic transcription stored in said lexicon being used to retrain said transcription generator.
11. The system of claim 9 wherein said pronunciation selection mechanism provides at least one of said plurality of scored phonetic transcriptions for updating said decision trees.
12. The system of claim 7 further includes a speech recognizer connected to said phonemic editor and receptive of speech data corresponding to the spelled word input for rescoring said plurality of scored phonetic transcriptions based on said speech data.
13. The system of claim 1 further includes a speech recognizer receptive of speech data corresponding to the spelled word input and being operative to produce the spelled word input, whereby said transcription generator receptive of the spelled word input from said speech recognizer.
14. The system of claim 1 further includes a speech recognizer receptive of speech data for producing a sounds-like word corresponding to the speech data, such that said phonemic editor being operative to provide a sounds-like phonetic transcription that corresponds to the sounds-like word and replace at least one syllabic portion of said phonetic transcription with said sounds-like phonetic transcription.
15. The system of claim 1 further includes a text-to-speech synthesizer connected to said phonemic editor and receptive of said phonetic transcription for generating speech data.
16. A system for developing word-pronunciation pairs based on a spelled word input, comprising:
a dictionary for storing phonetic transcription data corresponding to a plurality of spelled words;
a transcription generator connected to said dictionary and receptive of the spelled word input for producing a phonetic transcription that corresponds to the spelled word input, said phonetic transcription being segmented into syllabic portions; and
a phonemic editor connected to said transcription generator for displaying and editing syllabic portions of said phonetic transcription, thereby developing the desired word-pronunciation pair.
17. The system of claim 16 wherein said transcription generator being operative to produce said phonetic transcription using letter-to-sound rules.
18. The system of claim 16 further includes a storage mechanism for updating a lexicon with the spelled word and said phonetic transcription.
19. The system of claim 16 wherein said phonetic transcription further includes accentuation data for the spelled word input.
20. The system of claim 19 wherein said dictionary storing accentuation data corresponding to each of the plurality of spelled words and said phonemic editor being operative to display and edit the accentuation data associated with said phonetic transcription.
21. The system of claim 16 wherein the spelled word and said phonetic transcription being used to retrain said transcription generator.
22. The system of claim 16 wherein said phonemic editor provides a language selection mechanism and said transcription generator being connected to a plurality of dictionaries each of which stores phonetic transcription data in a different language, whereby said transcription generator invokes one of said plurality of dictionaries to produce a phonetic transcription that corresponds to the language from said language selection mechanism.
23. The system of claim 16 further includes a phoneticizer receptive of the spelled word input for producing a plurality of scored phonetic transcriptions, said phoneticizer employing decision trees to produce said plurality of scored phonetic transcriptions.
24. The system of claim 23 wherein at least one of said decision trees based on the sequence of letters and phonemes in the spelled word input.
25. The system of claim 23 further includes a pronunciation selection mechanism connected to said phonemic editor for selecting at least one of said plurality of scored phonetic transcriptions, said phonemic editor displaying each of said plurality of scored phonetic transcriptions.
26. The system of claim 25 wherein said pronunciation selection mechanism provides at least one of said plurality of scored phonetic transcriptions for updating said decision trees.
27. The system of claim 23 further includes a speech recognizer connected to said phonemic editor and receptive of speech data corresponding to the spelled word input for rescoring said plurality of scored phonetic transcriptions based on said speech data.
28. The system of claim 16 further includes a speech recognizer receptive of speech data corresponding to the spelled word input and operative to produce the spelled word input, whereby said transcription generator is receptive of the spelled word input from said speech recognizer.
29. The system of claim 16 further includes a speech recognizer receptive of speech data for producing a sounds-like word corresponding to the speech data, such that said phonemic editor is operative to provide a sounds-like phonetic transcription that corresponds to the sounds-like word and replace at least one syllabic portion of said phonetic transcription with said sounds-like phonetic transcription.
30. The system of claim 16 further includes a text-to-speech synthesizer connected to said phonemic editor and receptive of said phonetic transcription for generating speech data.
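The phoneticizer of claims 23-26 produces a plurality of scored phonetic transcriptions for a spelled word, which the editor can then display for user selection. The sketch below is illustrative only: it substitutes a toy flat letter-to-phoneme probability table (a hypothetical `LETTER_TO_PHONEMES` of my own construction) for the claimed decision trees, which would in practice condition each phoneme choice on the surrounding sequence of letters and phonemes.

```python
from itertools import product

# Toy letter-to-phoneme table: each letter maps to candidate phonemes with
# probabilities. A real phoneticizer per claims 23-24 would consult decision
# trees conditioned on neighboring letters and phonemes; this flat table is
# an assumed simplification for illustration.
LETTER_TO_PHONEMES = {
    "b": [("b", 1.0)],
    "e": [("eh", 0.6), ("iy", 0.4)],
    "a": [("ae", 0.7), ("ah", 0.3)],
    "d": [("d", 1.0)],
}

def scored_transcriptions(word, top_n=3):
    """Enumerate candidate transcriptions for a spelled word and score each
    as the product of its per-letter phoneme probabilities, best first."""
    per_letter = [LETTER_TO_PHONEMES[ch] for ch in word.lower()]
    candidates = []
    for combo in product(*per_letter):
        phones = " ".join(p for p, _ in combo)
        score = 1.0
        for _, prob in combo:
            score *= prob
        candidates.append((phones, score))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:top_n]

if __name__ == "__main__":
    for phones, score in scored_transcriptions("bead"):
        print(f"{phones}\t{score:.2f}")
```

The ranked list corresponds to what the pronunciation selection mechanism of claim 25 would present; a speech recognizer, as in claim 27, could rescore these candidates against actual speech data before the user chooses.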
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/216,111 US6363342B2 (en) | 1998-12-18 | 1998-12-18 | System for developing word-pronunciation pairs |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/216,111 US6363342B2 (en) | 1998-12-18 | 1998-12-18 | System for developing word-pronunciation pairs |
Publications (2)
Publication Number | Publication Date |
---|---|
US20020013707A1 true US20020013707A1 (en) | 2002-01-31 |
US6363342B2 US6363342B2 (en) | 2002-03-26 |
Family
ID=22805742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/216,111 Expired - Fee Related US6363342B2 (en) | 1998-12-18 | 1998-12-18 | System for developing word-pronunciation pairs |
Country Status (1)
Country | Link |
---|---|
US (1) | US6363342B2 (en) |
Cited By (138)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020143544A1 (en) * | 2001-03-29 | 2002-10-03 | Koninklijke Philips Electronic N.V. | Synchronise an audio cursor and a text cursor during editing |
US20020143525A1 (en) * | 2001-03-27 | 2002-10-03 | International Business Machines Corporation | Method of decoding telegraphic speech |
US20030177005A1 (en) * | 2002-03-18 | 2003-09-18 | Kabushiki Kaisha Toshiba | Method and device for producing acoustic models for recognition and synthesis simultaneously |
US20030229494A1 (en) * | 2002-04-17 | 2003-12-11 | Peter Rutten | Method and apparatus for sculpting synthesized speech |
US20040034524A1 (en) * | 2002-08-14 | 2004-02-19 | Nitendra Rajput | Hybrid baseform generation |
US20050055197A1 (en) * | 2003-08-14 | 2005-03-10 | Sviatoslav Karavansky | Linguographic method of compiling word dictionaries and lexicons for the memories of electronic speech-recognition devices |
US20050064374A1 (en) * | 1998-02-18 | 2005-03-24 | Donald Spector | System and method for training users with audible answers to spoken questions |
US20050273337A1 (en) * | 2004-06-02 | 2005-12-08 | Adoram Erell | Apparatus and method for synthesized audible response to an utterance in speaker-independent voice recognition |
US20060135215A1 (en) * | 2004-12-16 | 2006-06-22 | General Motors Corporation | Management of multilingual nametags for embedded speech recognition |
US20060149457A1 (en) * | 2004-12-16 | 2006-07-06 | Ross Steven J | Method and system for phonebook transfer |
DE102005021526A1 (en) * | 2005-05-10 | 2006-11-23 | Siemens Ag | Method and device for entering characters in a data processing system |
US20070016420A1 (en) * | 2005-07-07 | 2007-01-18 | International Business Machines Corporation | Dictionary lookup for mobile devices using spelling recognition |
US20070073541A1 (en) * | 2001-11-12 | 2007-03-29 | Nokia Corporation | Method for compressing dictionary data |
FR2892555A1 (en) * | 2005-10-24 | 2007-04-27 | France Telecom | SYSTEM AND METHOD FOR VOICE SYNTHESIS BY CONCATENATION OF ACOUSTIC UNITS |
US20070150291A1 (en) * | 2005-12-26 | 2007-06-28 | Canon Kabushiki Kaisha | Information processing apparatus and information processing method |
EP1835488A1 (en) | 2006-03-17 | 2007-09-19 | Svox AG | Text to speech synthesis |
US20070233490A1 (en) * | 2006-04-03 | 2007-10-04 | Texas Instruments, Incorporated | System and method for text-to-phoneme mapping with prior knowledge |
US7836412B1 (en) | 2004-12-03 | 2010-11-16 | Escription, Inc. | Transcription editing |
US20120203553A1 (en) * | 2010-01-22 | 2012-08-09 | Yuzo Maruta | Recognition dictionary creating device, voice recognition device, and voice synthesizer |
US20130090921A1 (en) * | 2011-10-07 | 2013-04-11 | Microsoft Corporation | Pronunciation learning from user correction |
US8438029B1 (en) * | 2012-08-22 | 2013-05-07 | Google Inc. | Confidence tying for unsupervised synthetic speech adaptation |
US8478186B2 (en) | 2010-05-10 | 2013-07-02 | King Fahd University Of Petroleum And Minerals | Educational system and method for testing memorization |
US8504369B1 (en) * | 2004-06-02 | 2013-08-06 | Nuance Communications, Inc. | Multi-cursor transcription editing |
WO2013130878A3 (en) * | 2012-03-02 | 2013-11-07 | Apple Inc. | Systems and methods for name pronunciation |
US20140074470A1 (en) * | 2012-09-11 | 2014-03-13 | Google Inc. | Phonetic pronunciation |
US20150012261A1 (en) * | 2012-02-16 | 2015-01-08 | Continental Automotive GmbH | Method for phonetizing a data list and voice-controlled user interface |
US20150348542A1 (en) * | 2012-12-28 | 2015-12-03 | Iflytek Co., Ltd. | Speech recognition method and system based on user personalized information |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US20160133246A1 (en) * | 2014-11-10 | 2016-05-12 | Yamaha Corporation | Voice synthesis device, voice synthesis method, and recording medium having a voice synthesis program recorded thereon |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9747891B1 (en) * | 2016-05-18 | 2017-08-29 | International Business Machines Corporation | Name pronunciation recommendation |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US20190043382A1 (en) * | 2014-11-04 | 2019-02-07 | Knotbird LLC | System and methods for transforming language into interactive elements |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
WO2019208859A1 (en) * | 2018-04-27 | 2019-10-31 | 주식회사 시스트란인터내셔널 | Method for generating pronunciation dictionary and apparatus therefor |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
CN110797049A (en) * | 2019-10-17 | 2020-02-14 | 科大讯飞股份有限公司 | Voice evaluation method and related device |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
CN112466335A (en) * | 2020-11-04 | 2021-03-09 | 吉林体育学院 | English pronunciation quality evaluation method based on accent prominence |
EP3790001A1 (en) * | 2019-09-09 | 2021-03-10 | Beijing Xiaomi Mobile Software Co., Ltd. | Speech information processing method, device and storage medium |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
EP3848927A1 (en) * | 2015-05-13 | 2021-07-14 | Google LLC | Speech recognition for keywords |
US11211052B2 (en) * | 2017-11-02 | 2021-12-28 | Huawei Technologies Co., Ltd. | Filtering model training method and speech recognition method |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
Families Citing this family (67)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6694055B2 (en) | 1998-07-15 | 2004-02-17 | Microsoft Corporation | Proper name identification in Chinese |
US7080005B1 (en) * | 1999-07-19 | 2006-07-18 | Texas Instruments Incorporated | Compact text-to-phone pronunciation dictionary |
JP3476007B2 (en) * | 1999-09-10 | 2003-12-10 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Recognition word registration method, speech recognition method, speech recognition device, storage medium storing software product for registration of recognition word, storage medium storing software product for speech recognition |
CN1300018A (en) * | 1999-10-05 | 2001-06-20 | 株式会社东芝 | Book reading electronic machine, edition system, storage medium and information providing system |
US6678409B1 (en) * | 2000-01-14 | 2004-01-13 | Microsoft Corporation | Parameterized word segmentation of unsegmented text |
US6757362B1 (en) * | 2000-03-06 | 2004-06-29 | Avaya Technology Corp. | Personal virtual assistant |
DE10042943C2 (en) * | 2000-08-31 | 2003-03-06 | Siemens Ag | Assigning phonemes to the graphemes generating them |
JP4116233B2 (en) * | 2000-09-05 | 2008-07-09 | パイオニア株式会社 | Speech recognition apparatus and method |
US6973427B2 (en) * | 2000-12-26 | 2005-12-06 | Microsoft Corporation | Method for adding phonetic descriptions to a speech recognition lexicon |
US7107215B2 (en) * | 2001-04-16 | 2006-09-12 | Sakhr Software Company | Determining a compact model to transcribe the arabic language acoustically in a well defined basic phonetic study |
GB0113581D0 (en) * | 2001-06-04 | 2001-07-25 | Hewlett Packard Co | Speech synthesis apparatus |
US7043431B2 (en) * | 2001-08-31 | 2006-05-09 | Nokia Corporation | Multilingual speech recognition system using text derived recognition models |
US7099828B2 (en) * | 2001-11-07 | 2006-08-29 | International Business Machines Corporation | Method and apparatus for word pronunciation composition |
US7315820B1 (en) * | 2001-11-30 | 2008-01-01 | Total Synch, Llc | Text-derived speech animation tool |
US6990445B2 (en) * | 2001-12-17 | 2006-01-24 | Xl8 Systems, Inc. | System and method for speech recognition and transcription |
US20030115169A1 (en) * | 2001-12-17 | 2003-06-19 | Hongzhuan Ye | System and method for management of transcribed documents |
KR100467590B1 (en) * | 2002-06-28 | 2005-01-24 | 삼성전자주식회사 | Apparatus and method for updating a lexicon |
US6999918B2 (en) * | 2002-09-20 | 2006-02-14 | Motorola, Inc. | Method and apparatus to facilitate correlating symbols to sounds |
US7827034B1 (en) | 2002-11-27 | 2010-11-02 | Totalsynch, Llc | Text-derived speech animation tool |
DE10304229A1 (en) * | 2003-01-28 | 2004-08-05 | Deutsche Telekom Ag | Communication system, communication terminal and device for recognizing faulty text messages |
JP2004303148A (en) * | 2003-04-01 | 2004-10-28 | Canon Inc | Information processor |
US7720683B1 (en) * | 2003-06-13 | 2010-05-18 | Sensory, Inc. | Method and apparatus of specifying and performing speech recognition operations |
US8577681B2 (en) * | 2003-09-11 | 2013-11-05 | Nuance Communications, Inc. | Pronunciation discovery for spoken words |
US20050114131A1 (en) * | 2003-11-24 | 2005-05-26 | Kirill Stoimenov | Apparatus and method for voice-tagging lexicon |
US20050177369A1 (en) * | 2004-02-11 | 2005-08-11 | Kirill Stoimenov | Method and system for intuitive text-to-speech synthesis customization |
US8200475B2 (en) | 2004-02-13 | 2012-06-12 | Microsoft Corporation | Phonetic-based text input method |
WO2005109399A1 (en) * | 2004-05-11 | 2005-11-17 | Matsushita Electric Industrial Co., Ltd. | Speech synthesis device and method |
JP4652737B2 (en) * | 2004-07-14 | 2011-03-16 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Word boundary probability estimation device and method, probabilistic language model construction device and method, kana-kanji conversion device and method, and unknown word model construction method, |
US7869999B2 (en) * | 2004-08-11 | 2011-01-11 | Nuance Communications, Inc. | Systems and methods for selecting from multiple phonetic transcriptions for text-to-speech synthesis |
US7340390B2 (en) * | 2004-10-27 | 2008-03-04 | Nokia Corporation | Mobile communication terminal and method therefore |
US7778821B2 (en) * | 2004-11-24 | 2010-08-17 | Microsoft Corporation | Controlled manipulation of characters |
US20080140398A1 (en) * | 2004-12-29 | 2008-06-12 | Avraham Shpigel | System and a Method For Representing Unrecognized Words in Speech to Text Conversions as Syllables |
US20060198562A1 (en) * | 2005-03-04 | 2006-09-07 | California Innovations Inc. | Foldable insulated bag with trailing member |
US8099281B2 (en) * | 2005-06-06 | 2012-01-17 | Nuance Communications, Inc. | System and method for word-sense disambiguation by recursive partitioning |
GB2428853A (en) * | 2005-07-22 | 2007-02-07 | Novauris Technologies Ltd | Speech recognition application specific dictionary |
US20070016421A1 (en) * | 2005-07-12 | 2007-01-18 | Nokia Corporation | Correcting a pronunciation of a synthetically generated speech object |
JP2007024960A (en) * | 2005-07-12 | 2007-02-01 | Internatl Business Mach Corp <Ibm> | System, program and control method |
US7693716B1 (en) * | 2005-09-27 | 2010-04-06 | At&T Intellectual Property Ii, L.P. | System and method of developing a TTS voice |
US7742919B1 (en) * | 2005-09-27 | 2010-06-22 | At&T Intellectual Property Ii, L.P. | System and method for repairing a TTS voice database |
US7711562B1 (en) * | 2005-09-27 | 2010-05-04 | At&T Intellectual Property Ii, L.P. | System and method for testing a TTS voice |
US7742921B1 (en) | 2005-09-27 | 2010-06-22 | At&T Intellectual Property Ii, L.P. | System and method for correcting errors when generating a TTS voice |
US7630898B1 (en) | 2005-09-27 | 2009-12-08 | At&T Intellectual Property Ii, L.P. | System and method for preparing a pronunciation dictionary for a text-to-speech voice |
TWI340330B (en) * | 2005-11-14 | 2011-04-11 | Ind Tech Res Inst | Method for text-to-pronunciation conversion |
US7801722B2 (en) * | 2006-05-23 | 2010-09-21 | Microsoft Corporation | Techniques for customization of phonetic schemes |
JP4427530B2 (en) * | 2006-09-21 | 2010-03-10 | 株式会社東芝 | Speech recognition apparatus, program, and speech recognition method |
US8078451B2 (en) * | 2006-10-27 | 2011-12-13 | Microsoft Corporation | Interface and methods for collecting aligned editorial corrections into a database |
JP4446313B2 (en) * | 2006-12-15 | 2010-04-07 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Technology for searching for new words to be registered in a dictionary for speech processing |
US8027834B2 (en) * | 2007-06-25 | 2011-09-27 | Nuance Communications, Inc. | Technique for training a phonetic decision tree with limited phonetic exceptional terms |
US20090083035A1 (en) * | 2007-09-25 | 2009-03-26 | Ritchie Winson Huang | Text pre-processing for text-to-speech generation |
US7472061B1 (en) | 2008-03-31 | 2008-12-30 | International Business Machines Corporation | Systems and methods for building a native language phoneme lexicon having native pronunciations of non-native words derived from non-native pronunciations |
US8165881B2 (en) * | 2008-08-29 | 2012-04-24 | Honda Motor Co., Ltd. | System and method for variable text-to-speech with minimized distraction to operator of an automotive vehicle |
US20100057465A1 (en) * | 2008-09-03 | 2010-03-04 | David Michael Kirsch | Variable text-to-speech for automotive application |
US7952114B2 (en) * | 2008-09-23 | 2011-05-31 | Tyco Electronics Corporation | LED interconnect assembly |
US20110161829A1 (en) * | 2009-12-24 | 2011-06-30 | Nokia Corporation | Method and Apparatus for Dictionary Selection |
US20110184723A1 (en) * | 2010-01-25 | 2011-07-28 | Microsoft Corporation | Phonetic suggestion engine |
US20120303368A1 (en) * | 2011-05-27 | 2012-11-29 | Ting Ma | Number-assistant voice input system, number-assistant voice input method for voice input system and number-assistant voice correcting method for voice input system |
US8930189B2 (en) * | 2011-10-28 | 2015-01-06 | Microsoft Corporation | Distributed user input to text generated by a speech to text transcription service |
US9348479B2 (en) | 2011-12-08 | 2016-05-24 | Microsoft Technology Licensing, Llc | Sentiment aware user interface customization |
US9378290B2 (en) | 2011-12-20 | 2016-06-28 | Microsoft Technology Licensing, Llc | Scenario-adaptive input method editor |
CN104428734A (en) | 2012-06-25 | 2015-03-18 | 微软公司 | Input method editor application platform |
US8959109B2 (en) | 2012-08-06 | 2015-02-17 | Microsoft Corporation | Business intelligent in-document suggestions |
JP6122499B2 (en) | 2012-08-30 | 2017-04-26 | マイクロソフト テクノロジー ライセンシング,エルエルシー | Feature-based candidate selection |
CN105580004A (en) | 2013-08-09 | 2016-05-11 | 微软技术许可有限责任公司 | Input method editor providing language assistance |
KR20150027465A (en) * | 2013-09-04 | 2015-03-12 | 한국전자통신연구원 | Method and apparatus for generating multiple phoneme string for foreign proper noun |
RU2632137C2 (en) | 2015-06-30 | 2017-10-02 | Общество С Ограниченной Ответственностью "Яндекс" | Method and server of transcription of lexical unit from first alphabet in second alphabet |
US10468015B2 (en) | 2017-01-12 | 2019-11-05 | Vocollect, Inc. | Automated TTS self correction system |
CN110600038B (en) * | 2019-08-23 | 2022-04-05 | 北京工业大学 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4831654A (en) | 1985-09-09 | 1989-05-16 | Wang Laboratories, Inc. | Apparatus for making and editing dictionary entries in a text to speech conversion system |
US5278943A (en) * | 1990-03-23 | 1994-01-11 | Bright Star Technology, Inc. | Speech animation and inflection system |
US5212730A (en) * | 1991-07-01 | 1993-05-18 | Texas Instruments Incorporated | Voice recognition of proper names using text-derived recognition models |
EP0562138A1 (en) * | 1992-03-25 | 1993-09-29 | International Business Machines Corporation | Method and apparatus for the automatic generation of Markov models of new words to be added to a speech recognition vocabulary |
US5384893A (en) * | 1992-09-23 | 1995-01-24 | Emerson & Stern Associates, Inc. | Method and apparatus for speech synthesis based on prosodic analysis |
CA2119397C (en) * | 1993-03-19 | 2007-10-02 | Kim E.A. Silverman | Improved automated voice synthesis employing enhanced prosodic treatment of text, spelling of text and rate of annunciation |
US5432948A (en) * | 1993-04-26 | 1995-07-11 | Taligent, Inc. | Object-oriented rule-based text input transliteration system |
US5794197A (en) * | 1994-01-21 | 1998-08-11 | Microsoft Corporation | Senone tree representation and evaluation |
GB2290684A (en) * | 1994-06-22 | 1996-01-03 | Ibm | Speech synthesis using hidden Markov model to determine speech unit durations |
US5832434A (en) * | 1995-05-26 | 1998-11-03 | Apple Computer, Inc. | Method and apparatus for automatic assignment of duration values for synthetic speech |
JP2927706B2 (en) * | 1995-06-12 | 1999-07-28 | 松下電器産業株式会社 | Similar character string expansion method, search method and their devices |
US6092044A (en) * | 1997-03-28 | 2000-07-18 | Dragon Systems, Inc. | Pronunciation generation in speech recognition |
US5933804A (en) * | 1997-04-10 | 1999-08-03 | Microsoft Corporation | Extensible speech recognition system that provides a user with audio feedback |
US6163769A (en) * | 1997-10-02 | 2000-12-19 | Microsoft Corporation | Text-to-speech using clustered context-dependent phoneme-based units |
US6230131B1 (en) * | 1998-04-29 | 2001-05-08 | Matsushita Electric Industrial Co., Ltd. | Method for generating spelling-to-pronunciation decision tree |
US6029132A (en) * | 1998-04-30 | 2000-02-22 | Matsushita Electric Industrial Co. | Method for letter-to-sound in text-to-speech synthesis |
US6016471A (en) * | 1998-04-29 | 2000-01-18 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus using decision trees to generate and score multiple pronunciations for a spelled word |
US6078885A (en) * | 1998-05-08 | 2000-06-20 | At&T Corp | Verbal, fully automatic dictionary updates by end-users of speech synthesis and recognition systems |
US6233553B1 (en) * | 1998-09-04 | 2001-05-15 | Matsushita Electric Industrial Co., Ltd. | Method and system for automatically determining phonetic transcriptions associated with spelled words |
- 1998-12-18 US US09/216,111 patent/US6363342B2/en not_active Expired - Fee Related
Cited By (202)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8202094B2 (en) * | 1998-02-18 | 2012-06-19 | Radmila Solutions, L.L.C. | System and method for training users with audible answers to spoken questions |
US20050064374A1 (en) * | 1998-02-18 | 2005-03-24 | Donald Spector | System and method for training users with audible answers to spoken questions |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US20020143525A1 (en) * | 2001-03-27 | 2002-10-03 | International Business Machines Corporation | Method of decoding telegraphic speech |
US6772116B2 (en) * | 2001-03-27 | 2004-08-03 | International Business Machines Corporation | Method of decoding telegraphic speech |
US8706495B2 (en) | 2001-03-29 | 2014-04-22 | Nuance Communications, Inc. | Synchronise an audio cursor and a text cursor during editing |
US20020143544A1 (en) * | 2001-03-29 | 2002-10-03 | Koninklijke Philips Electronic N.V. | Synchronise an audio cursor and a text cursor during editing |
US8380509B2 (en) | 2001-03-29 | 2013-02-19 | Nuance Communications Austria Gmbh | Synchronise an audio cursor and a text cursor during editing |
US8117034B2 (en) | 2001-03-29 | 2012-02-14 | Nuance Communications Austria Gmbh | Synchronise an audio cursor and a text cursor during editing |
US20070073541A1 (en) * | 2001-11-12 | 2007-03-29 | Nokia Corporation | Method for compressing dictionary data |
US20030177005A1 (en) * | 2002-03-18 | 2003-09-18 | Kabushiki Kaisha Toshiba | Method and device for producing acoustic models for recognition and synthesis simultaneously |
GB2391143A (en) * | 2002-04-17 | 2004-01-28 | Rhetorical Systems Ltd | Method and apparatus for sculpting synthesized speech |
US20030229494A1 (en) * | 2002-04-17 | 2003-12-11 | Peter Rutten | Method and apparatus for sculpting synthesized speech |
US7206738B2 (en) * | 2002-08-14 | 2007-04-17 | International Business Machines Corporation | Hybrid baseform generation |
US20040034524A1 (en) * | 2002-08-14 | 2004-02-19 | Nitendra Rajput | Hybrid baseform generation |
US20050055197A1 (en) * | 2003-08-14 | 2005-03-10 | Sviatoslav Karavansky | Linguographic method of compiling word dictionaries and lexicons for the memories of electronic speech-recognition devices |
US20050273337A1 (en) * | 2004-06-02 | 2005-12-08 | Adoram Erell | Apparatus and method for synthesized audible response to an utterance in speaker-independent voice recognition |
US8504369B1 (en) * | 2004-06-02 | 2013-08-06 | Nuance Communications, Inc. | Multi-cursor transcription editing |
WO2005122140A1 (en) * | 2004-06-02 | 2005-12-22 | Intel Corporation | Synthesizing audible response to an utterance in speaker-independent voice recognition |
US7836412B1 (en) | 2004-12-03 | 2010-11-16 | Escription, Inc. | Transcription editing |
US9632992B2 (en) | 2004-12-03 | 2017-04-25 | Nuance Communications, Inc. | Transcription editing |
US8028248B1 (en) | 2004-12-03 | 2011-09-27 | Escription, Inc. | Transcription editing |
CN1790483B (en) * | 2004-12-16 | 2010-12-08 | 通用汽车有限责任公司 | Management method and system of multilingual nametags for embedded speech recognition |
US20060149457A1 (en) * | 2004-12-16 | 2006-07-06 | Ross Steven J | Method and system for phonebook transfer |
US7596370B2 (en) * | 2004-12-16 | 2009-09-29 | General Motors Corporation | Management of nametags in a vehicle communications system |
US7711358B2 (en) * | 2004-12-16 | 2010-05-04 | General Motors Llc | Method and system for modifying nametag files for transfer between vehicles |
US20060135215A1 (en) * | 2004-12-16 | 2006-06-22 | General Motors Corporation | Management of multilingual nametags for embedded speech recognition |
DE102005021526A1 (en) * | 2005-05-10 | 2006-11-23 | Siemens Ag | Method and device for entering characters in a data processing system |
US20070016420A1 (en) * | 2005-07-07 | 2007-01-18 | International Business Machines Corporation | Dictionary lookup for mobile devices using spelling recognition |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
WO2007048891A1 (en) * | 2005-10-24 | 2007-05-03 | France Telecom | System and method for synthesizing speech by concatenating acoustic units |
FR2892555A1 (en) * | 2005-10-24 | 2007-04-27 | France Telecom | SYSTEM AND METHOD FOR VOICE SYNTHESIS BY CONCATENATION OF ACOUSTIC UNITS |
US8032382B2 (en) | 2005-12-26 | 2011-10-04 | Canon Kabushiki Kaisha | Information processing apparatus and information processing method |
EP1811369A3 (en) * | 2005-12-26 | 2009-07-08 | Canon Kabushiki Kaisha | Speech information processing apparatus and speech information processing method |
EP1811369A2 (en) * | 2005-12-26 | 2007-07-25 | Canon Kabushiki Kaisha | Speech information processing apparatus and speech information processing method |
US20070150291A1 (en) * | 2005-12-26 | 2007-06-28 | Canon Kabushiki Kaisha | Information processing apparatus and information processing method |
US20090076819A1 (en) * | 2006-03-17 | 2009-03-19 | Johan Wouters | Text to speech synthesis |
EP1835488A1 (en) | 2006-03-17 | 2007-09-19 | Svox AG | Text to speech synthesis |
US7979280B2 (en) | 2006-03-17 | 2011-07-12 | Svox Ag | Text to speech synthesis |
US20070233490A1 (en) * | 2006-04-03 | 2007-10-04 | Texas Instruments, Incorporated | System and method for text-to-phoneme mapping with prior knowledge |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US20120203553A1 (en) * | 2010-01-22 | 2012-08-09 | Yuzo Maruta | Recognition dictionary creating device, voice recognition device, and voice synthesizer |
US9177545B2 (en) * | 2010-01-22 | 2015-11-03 | Mitsubishi Electric Corporation | Recognition dictionary creating device, voice recognition device, and voice synthesizer |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US8478186B2 (en) | 2010-05-10 | 2013-07-02 | King Fahd University Of Petroleum And Minerals | Educational system and method for testing memorization |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US20130090921A1 (en) * | 2011-10-07 | 2013-04-11 | Microsoft Corporation | Pronunciation learning from user correction |
US9640175B2 (en) * | 2011-10-07 | 2017-05-02 | Microsoft Technology Licensing, Llc | Pronunciation learning from user correction |
US9405742B2 (en) * | 2012-02-16 | 2016-08-02 | Continental Automotive Gmbh | Method for phonetizing a data list and voice-controlled user interface |
US20150012261A1 (en) * | 2012-02-16 | 2015-01-08 | Continental Automotive GmbH | Method for phonetizing a data list and voice-controlled user interface |
KR101670150B1 (en) * | 2012-03-02 | 2016-10-27 | 애플 인크. | Systems and methods for name pronunciation |
JP2015512062A (en) * | 2012-03-02 | 2015-04-23 | アップル インコーポレイテッド | Name pronunciation system and method |
CN104380373A (en) * | 2012-03-02 | 2015-02-25 | 苹果公司 | Systems and methods for name pronunciation |
US11069336B2 (en) | 2012-03-02 | 2021-07-20 | Apple Inc. | Systems and methods for name pronunciation |
KR20140136969A (en) * | 2012-03-02 | 2014-12-01 | 애플 인크. | Systems and methods for name pronunciation |
EP3147897A1 (en) * | 2012-03-02 | 2017-03-29 | Apple Inc. | Systems and methods for name pronunciation |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
WO2013130878A3 (en) * | 2012-03-02 | 2013-11-07 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US8438029B1 (en) * | 2012-08-22 | 2013-05-07 | Google Inc. | Confidence tying for unsupervised synthetic speech adaptation |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US20140074470A1 (en) * | 2012-09-11 | 2014-03-13 | Google Inc. | Phonetic pronunciation |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9564127B2 (en) * | 2012-12-28 | 2017-02-07 | Iflytek Co., Ltd. | Speech recognition method and system based on user personalized information |
US20150348542A1 (en) * | 2012-12-28 | 2015-12-03 | Iflytek Co., Ltd. | Speech recognition method and system based on user personalized information |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US20190043382A1 (en) * | 2014-11-04 | 2019-02-07 | Knotbird LLC | System and methods for transforming language into interactive elements |
US10896624B2 (en) * | 2014-11-04 | 2021-01-19 | Knotbird LLC | System and methods for transforming language into interactive elements |
US20160133246A1 (en) * | 2014-11-10 | 2016-05-12 | Yamaha Corporation | Voice synthesis device, voice synthesis method, and recording medium having a voice synthesis program recorded thereon |
US9711123B2 (en) * | 2014-11-10 | 2017-07-18 | Yamaha Corporation | Voice synthesis device, voice synthesis method, and recording medium having a voice synthesis program recorded thereon |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
EP3848927A1 (en) * | 2015-05-13 | 2021-07-14 | Google LLC | Speech recognition for keywords |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9747891B1 (en) * | 2016-05-18 | 2017-08-29 | International Business Machines Corporation | Name pronunciation recommendation |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11211052B2 (en) * | 2017-11-02 | 2021-12-28 | Huawei Technologies Co., Ltd. | Filtering model training method and speech recognition method |
WO2019208859A1 (en) * | 2018-04-27 | 2019-10-31 | 주식회사 시스트란인터내셔널 | Method for generating pronunciation dictionary and apparatus therefor |
US11270693B2 (en) | 2019-09-09 | 2022-03-08 | Beijing Xiaomi Mobile Software Co., Ltd. | Speech information processing method, device and storage medium |
EP3790001A1 (en) * | 2019-09-09 | 2021-03-10 | Beijing Xiaomi Mobile Software Co., Ltd. | Speech information processing method, device and storage medium |
CN110797049A (en) * | 2019-10-17 | 2020-02-14 | 科大讯飞股份有限公司 | Voice evaluation method and related device |
CN112466335A (en) * | 2020-11-04 | 2021-03-09 | 吉林体育学院 | English pronunciation quality evaluation method based on accent prominence |
Also Published As
Publication number | Publication date |
---|---|
US6363342B2 (en) | 2002-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6363342B2 (en) | System for developing word-pronunciation pairs | |
US6233553B1 (en) | Method and system for automatically determining phonetic transcriptions associated with spelled words | |
US6029132A (en) | Method for letter-to-sound in text-to-speech synthesis | |
US6016471A (en) | Method and apparatus using decision trees to generate and score multiple pronunciations for a spelled word | |
US8069045B2 (en) | Hierarchical approach for the statistical vowelization of Arabic text | |
JP3481497B2 (en) | Method and apparatus using a decision tree to generate and evaluate multiple pronunciations for spelled words | |
US6792407B2 (en) | Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems | |
US6910012B2 (en) | Method and system for speech recognition using phonetically similar word alternatives | |
Watts | Unsupervised learning for text-to-speech synthesis | |
EP0917129A2 (en) | Method and apparatus for adapting a speech recognizer to the pronunciation of a non-native speaker | |
Stöber et al. | Speech synthesis using multilevel selection and concatenation of units from large speech corpora | |
Ngugi et al. | Swahili text-to-speech system | |
Kominek | TTS from zero: Building synthetic voices for new languages | |
Rahate et al. | An experimental technique on text normalization and its role in speech synthesis | |
Kaur et al. | BUILDING A TEXT-TO-SPEECH SYSTEM FOR PUNJABI LANGUAGE | |
JPH03245192A (en) | Method for determining pronunciation of foreign language word | |
IMRAN | ADMAS UNIVERSITY SCHOOL OF POST GRADUATE STUDIES DEPARTMENT OF COMPUTER SCIENCE | |
Kato et al. | Multilingualization of Speech Processing | |
El Ouahabi et al. | Toward an automatic speech recognition system for Amazigh-Tarifit | |
Lenzo et al. | Rapid-deployment text-to-speech in the DIPLOMAT system. | |
FalDessai | Development of a Text to Speech System for Devanagari Konkani | |
Togawa et al. | Voice-activated word processor with automatic learning for dynamic optimization of syllable-templates | |
Stergar et al. | Labeling of symbolic prosody breaks for the slovenian language | |
Rozinaj | Towards More Intelligent Speech Interface | |
Boves | The ESPRIT project polyglot |
Legal Events
Date | Code | Title | Description
---|---|---|---
| AS | Assignment | Owner name: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD., JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHAW, RHONDA;KUHN, ROLAND;PEARSON, STEVE;REEL/FRAME:009682/0179. Effective date: 19981215 |
| FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| REMI | Maintenance fee reminder mailed | |
| LAPS | Lapse for failure to pay maintenance fees | |
| STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| FP | Lapsed due to failure to pay maintenance fee | Effective date: 20060326 |