CA1256562A - Speech recognition method - Google Patents

Speech recognition method

Info

Publication number
CA1256562A
CA1256562A (application CA000528993A)
Authority
CA
Canada
Prior art keywords
frequence
adaptation
obtaining
frequences
probabilistic model
Prior art date
Legal status
Expired
Application number
CA000528993A
Other languages
French (fr)
Inventor
Akihiro Kuroda
Masafumi Nishimura
Kazuhide Sugawara
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date
Filing date
Publication date
Application filed by International Business Machines Corp
Application granted
Publication of CA1256562A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs

Abstract

Speaker adaptation is provided which easily enables a person to use a Hidden Markov model type recognizer previously trained by another particular person or persons. During the training process, the parameters of the Markov models are calculated iteratively, for example using the Forward-Backward algorithm.
The adaptation comprises storing and utilizing the intermediate results, or probabilistic frequences, of the last iteration. During the adaptation process, parameters are calculated by interpolation, as a weighted sum of the stored probabilistic frequences and the ones obtained using new training data.

Description


SPEECH RECOGNITION METHOD

Detailed Description of the Invention

Field of the Invention

The present invention relates to a speech recognition method using Markov models, and more particularly to a speech recognition method wherein speaker adaptation can be easily performed.

Prior Art

In speech recognition using Markov models, speech is recognized from a probabilistic viewpoint. In one method, for example, a Markov model is established for each word.
Generally, for each Markov model, a plurality of states and transitions between the states are defined, and for the transitions, occurrence probabilities and output probabilities of labels or symbols are assigned. An unknown speech is converted into a label string, and thereafter the probability of each word Markov model outputting that label string is determined based on the transition occurrence probabilities and the label output probabilities, which are hereafter referred to as parameters. The word Markov model having the highest probability of producing the label string is then obtained, and the recognition is performed according to this result. In speech recognition using Markov models, the parameters can be estimated statistically so that the recognition score is improved.

The details of the above recognition technique are described in the following articles.

(1) "A Maximum Likelihood Approach to Continuous Speech Recognition" (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-5, No. 2, pp. 179-190, 1983, Lalit R. Bahl, Frederick Jelinek and Robert L. Mercer)
(2) "Continuous Speech Recognition by Statistical Methods" (Proceedings of the IEEE, Vol. 64, pp. 532-556, 1976, Frederick Jelinek)
(3) "An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition" (The Bell System Technical Journal, Vol. 62, No. 4, April 1983, S. E. Levinson, L. R. Rabiner and M. M. Sondhi).
Speech recognition using Markov models, however, needs a tremendous amount of speech data, and the training requires much time. Furthermore, a system trained for a certain speaker often does not achieve sufficient recognition scores for other speakers. Even for the same speaker, when a long time has passed between the training and the recognition, that is, when there is a difference between the two circumstances, only poor recognition can be achieved.

Problems to be Solved by the Invention

As a consequence of the foregoing difficulties in the prior art, it is an object of the present invention to provide a speech recognition method wherein a trained system can be adapted for a different circumstance, and wherein the adaptation can be done easily.

Brief Description of the Drawings

Fig. 1 is a block diagram illustrating one embodiment of the invention,
Fig. 2 is a diagram for describing the invention,
Fig. 3 is a flow chart describing the operation of the labelling block 5 of the example shown in Fig. 1,
Fig. 4 is a flow chart describing the operation of the training block 8 of the example shown in Fig. 1,
Fig. 5, Fig. 6 and Fig. 7 are diagrams for describing the flow of the operation shown in Fig. 4,
Fig. 8 is a diagram for describing the operation of the adapting block of the example shown in Fig. 1,
Fig. 9 is a flow chart describing a modified example of the embodiment shown in Fig. 1.

8 .... training block
9 .... adapting block

Summary of the Invention

In order to accomplish the above object, according to the present invention, frequences of events which have been used for estimating the parameters of Markov models during initial training are stored. Frequences of events in adaptation data are next determined, referring to the parameters of the Markov models. Then new parameters are estimated utilizing the frequences of the two types of events.

Fig. 2 shows one example of a trellis. In Fig. 2, the horizontal axis indicates passing time and the vertical axis indicates the states of a Markov model. An inputted label string is shown as w1, w2, ..., wl along the time axis. While time passes, the state of the Markov model changes from the initial state I to the final state F along a path. The broken lines show the whole set of paths. In this case the frequence of passing from the i-th state to the j-th state while outputting a label k, C*(i,j,k), that is, the frequence of passing through the paths indicated by the arrows in Fig. 2 and outputting the label k, is determined from the parameters P(i,j,k). Here P(i,j,k) is defined as the probability of passing from state i to state j and outputting the label k. The frequence of the Markov model being at the state i, S*(i), as shown by the arc, is obtained by summing up C*(i,j,k) over each j and each k. From the definitions of the frequences C*(i,j,k) and S*(i), the new parameters P'(i,j,k) can be obtained according to the following estimation equation.

P'(i,j,k) = C*(i,j,k) / S*(i)

Iterating the above estimation can result in the parameters P0(i,j,k) accurately reflecting the training data. Here the suffix zero indicates that the value is the one after training, and likewise S0*(i) and C0*(i,j,k) are the values after training.

According to the present invention, for adaptation, the frequences C1*(i,j,k) and S1*(i) for adaptation speech data are obtained using the parameters P0(i,j,k). Then the new parameters after adaptation, P1(i,j,k), are determined as follows.

P1(i,j,k) = [λ·C0*(i,j,k) + (1-λ)·C1*(i,j,k)] / [λ·S0*(i) + (1-λ)·S1*(i)], wherein 0 ≤ λ ≤ 1.

This means that the frequences needed for the estimation are determined by interpolation. This interpolation can make the parameters P0(i,j,k) obtained by the initial training adapted for a different recognition circumstance.
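As a sketch only, the interpolated estimation described above might be implemented as follows. The function name, the weight name lam, and the representation of the frequence tables as dictionaries keyed by (i, j, k) and by state are illustrative assumptions, not part of the patent.

```python
def adapt_parameters(C0, S0, C1, S1, lam):
    """Interpolate stored training frequences (C0, S0) with adaptation
    frequences (C1, S1), then re-estimate each parameter as the ratio
    of the interpolated frequences.  lam weights the initial-training
    side; 0 <= lam <= 1."""
    P1 = {}
    for (i, j, k), c0 in C0.items():
        c1 = C1.get((i, j, k), 0.0)
        num = lam * c0 + (1.0 - lam) * c1
        den = lam * S0[i] + (1.0 - lam) * S1.get(i, 0.0)
        P1[(i, j, k)] = num / den if den > 0.0 else 0.0
    return P1
```

With lam = 1 the result reduces to the purely trained parameters C0/S0; with lam = 0 it uses only the adaptation data.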

Furthermore, according to the present invention, based on the equation C0*(i,j,k) = P0(i,j,k) × S0*(i), the following estimation can be used.

P1(i,j,k) = [λ·P0(i,j,k)·S0*(i) + (1-λ)·C1*(i,j,k)] / [λ·S0*(i) + (1-λ)·S1*(i)]

In this case the frequence C0*(i,j,k) does not need to be stored.

When the initial training data is quite different from the adaptation data, it is desirable to make use of the following instead of P0(i,j,k).

P0'(i,j,k) = (1-λ)·P0(i,j,k) + λ·e, wherein 0 ≤ λ ≤ 1

Here e is a certain small constant number; 1/((the number of the labels) × (the number of the branches)) is actually preferable.

In the preferred embodiment to be hereinafter described, the probabilities of passing from one state to another state while outputting a label are used as the probabilistic parameters of the Markov models, though transition occurrence probabilities and label output probabilities may instead be defined separately and used as parameters.

Embodiments of the Invention

Now, referring to the drawings, the present invention will be explained below with respect to an embodiment thereof which is applied to a word recognition system.

In Fig. 1, illustrating the embodiment as a whole, inputted speech data is supplied to an analog/digital (A/D) converter 3 through a microphone 1 and an amplifier 2 to be converted into digital data, which is then supplied to a feature extracting block 4. The feature extracting block 4 can be an array processor of Floating Point Systems Inc. In the feature extracting block 4, the speech data is first discrete-Fourier-transformed every ten milliseconds using a twenty-millisecond-wide window, then outputted at each channel of a 20-channel band-pass filter, and subsequently provided to the next stage, the labelling block 5. This block 5 performs labelling by referring to a label prototype dictionary 6, whose prototypes have been produced by clustering. The number of them is 128.

The labelling is for example performed as shown in Fig. 3, in which X is the inputted feature, Yi is the feature of the i-th prototype, N is the number of all the prototypes (=128), dist(X, Yi) is the Euclidean distance between X and Yi, and m is the minimum value among the previous dist(X, Yi)'s. m is initialized to a very large number. As shown in the figure, inputted features X are in turn compared with each feature prototype, and for each inputted feature the most similar prototype, that is, the prototype having the shortest distance, is outputted as an observed label or label number P. As described above, the labelling block 5 outputs a label string with an interval of ten milliseconds between consecutive labels.
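The nearest-prototype search of Fig. 3 can be sketched as follows. The names are illustrative; in the real system the features are 20-channel spectral vectors and there are 128 prototypes.

```python
import math

def label(features, prototypes):
    """Assign each inputted feature vector the number of the nearest
    prototype under the Euclidean distance dist(X, Yi), as in the
    labelling block of Fig. 3."""
    out = []
    for x in features:
        m, p = float("inf"), -1     # m initialized to a very large number
        for i, y in enumerate(prototypes):
            d = math.dist(x, y)     # Euclidean distance between X and Yi
            if d < m:
                m, p = d, i         # remember the most similar prototype
        out.append(p)               # observed label number P
    return out
```

Applied to a ten-millisecond feature sequence this produces the label string handed to the later blocks.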

The label string from the labelling block 5 is provided to one of a training block 8, an adapting block 9 and a recognition block 10 through a switching block 7. A detailed description of the operation of the training block 8 and the adapting block 9 will be given later, referring to Fig. 4 and the following figures. During initial training the switching block 7 is switched to the training block 8 to provide the label string thereto. The training block 8 determines the parameter values of a parameter table 11 by training Markov models using the label string. During adaptation the switching block 7 is switched to the adapting block 9, which adapts the parameter values of the parameter table 11 based on the label string.
During recognition the switching block 7 is switched to the recognition block 10, which recognizes an inputted speech based on the label string and the parameter table. The recognition block 10 can be designed according to the Forward calculation or the Viterbi algorithm, for which the above article (2) should be referred to in detail. The output of the recognition block 10 is provided to a workstation 12 and is, for example, displayed on its monitor screen.

In Fig. 1 the blocks surrounded by the broken line are in fact implemented in software on a host computer. An IBM 3083 processor is used as the host computer, and CMS and PL/1 are used as the operating system and the language respectively. The above blocks can alternatively be implemented in hardware.

The operation of the training block 8 will be next described in detail.

In Fig. 4, showing the procedure of the initial training, each word Markov model is first defined, step 13. In this embodiment the number of words is 200. A word Markov model is such as shown in Fig. 5. In this figure small solid circles indicate states, and arrows show transitions. The number of states, including the initial state I and the final state F, is 8.
There are three types of transitions, that is, transitions to the next state tN, transitions skipping one state tS, and transitions looping back to the same state tL. The number of labels assigned to one word is about 40 to 50, and the label string of the word is matched against the model from the initial state to the final state, sometimes looping and sometimes skipping. To define the Markov models means to establish the parameter table of Fig. 1 tentatively. In particular, for each word a table of the format shown in Fig. 6 is assigned and the parameters P(i,j,k) are initialized. The parameter P(i,j,k) means the probability that a transition from the state i to the state j occurs in a Markov model, outputting a label k.
Furthermore, in this initialization, the parameters are set so that a transition to the next state, a looping transition and a skipping transition occur at probabilities of 0.9, 0.05 and 0.05 respectively, and so that on each transition all labels are produced at equal probability, that is, 1/128.
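The tentative initialization described above can be sketched as follows; the function name and the dictionary layout of the parameter table are assumptions made for illustration.

```python
def init_parameter_table(n_states=8, n_labels=128):
    """Tentative parameter table P(i,j,k): looping, next-state and
    skipping transitions get probabilities 0.05, 0.9 and 0.05, and on
    each transition every label is equally likely (1/128).  States
    near the final state simply lack the transitions that would leave
    the model (a simplifying assumption of this sketch)."""
    branch_prob = {0: 0.05, 1: 0.9, 2: 0.05}   # loop tL, next tN, skip tS
    P = {}
    for i in range(n_states):
        for d, tp in branch_prob.items():
            j = i + d
            if j >= n_states:
                continue                        # no transition past state F
            for k in range(n_labels):
                P[(i, j, k)] = tp / n_labels
    return P
```

For an interior state the entries over all j and k then sum to one, as a probability distribution should.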

After defining the word Markov models, initial training data is inputted, step 14, which data has been obtained by speaking the 200 words to be recognized five times each. The five utterances of a word have been put together, and each utterance has been prepared so as to show which word it corresponds to, and in which order it was spoken. Here let U = (u1, u2, ..., u5) indicate a set of utterances of one specific word, and let un = wn1, wn2, ..., wnln indicate each utterance un. wn1, ... here indicate observed labels.

After completing the input of the initial training data, Forward calculation and Backward calculation are performed, step 15. Though the following procedure is performed for each word, for convenience of description, consideration is given only to a set of utterances of one word. In the Forward calculation and the Backward calculation the following forward value f(i,x,n) and backward value b(i,x,n) are calculated.

f(i,x,n): the frequence that the model reaches the state i at time x after starting from the initial state at time 0, for the label string un.
b(i,x,n): the frequence that the model reaches back to the state i at the time x after starting from the final state at time ln, for the label string un.

Forward calculation and Backward calculation are easily performed sequentially using the following equations.

Forward Calculation

For x = 0:
f(i,0,n) = 1 if i = 1, 0 otherwise

For 1 ≤ x ≤ ln:
f(i,x,n) = Σ (k = 0 to 2) f(i-k, x-1, n) · Pt-1(i-k, i, wnx)

wherein Pt-1 is the parameter stored in the parameter table at that time, and k is determined depending on the Markov model; in this embodiment k = 0, 1 or 2.

Backward Calculation

For x = ln:
b(i,ln,n) = 1 if i = E (= 8 in this case), 0 otherwise

For 0 ≤ x < ln:
b(i,x,n) = Σ (k = 0 to 2) b(i+k, x+1, n) · Pt-1(i, i+k, wn(x+1))

wherein E is the number of the states of the Markov model.
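The two recursions can be sketched as follows, using 0-based state and time indices in place of the 1-based indices of the text (so the initial state is index 0 and the final state E is index n_states - 1); the names and the dictionary form of the parameter table are illustrative assumptions.

```python
def forward(labels, P, n_states=8):
    """Forward values f[x][i]: frequence of reaching state i at time x
    from the initial state, summing over the k = 0, 1, 2 predecessors
    (loop, next, skip), as in the recursion in the text."""
    ln = len(labels)
    f = [[0.0] * n_states for _ in range(ln + 1)]
    f[0][0] = 1.0                          # start in the initial state
    for x in range(1, ln + 1):
        w = labels[x - 1]                  # observed label wn_x
        for i in range(n_states):
            f[x][i] = sum(f[x - 1][i - k] * P.get((i - k, i, w), 0.0)
                          for k in (0, 1, 2) if i - k >= 0)
    return f

def backward(labels, P, n_states=8):
    """Backward values b[x][i]: frequence of reaching back to state i
    at time x from the final state at time ln."""
    ln = len(labels)
    b = [[0.0] * n_states for _ in range(ln + 1)]
    b[ln][n_states - 1] = 1.0              # end in the final state E
    for x in range(ln - 1, -1, -1):
        w = labels[x]                      # label wn_{x+1}
        for i in range(n_states):
            b[x][i] = sum(b[x + 1][i + k] * P.get((i, i + k, w), 0.0)
                          for k in (0, 1, 2) if i + k < n_states)
    return b
```

For any model, f[ln][E] and b[0][0] both equal the total probability Tn of the label string, which gives a useful consistency check.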

After completing the Forward and Backward calculations, the frequence count(i,j,k,n) that the model passes from the state i to the state j outputting the label k, for the label string un, is then determined based on the forward value f(i,x,n) and the backward value b(j,x,n), step 16. count(i,j,k,n) is determined according to the following equation.

count(i,j,k,n) = Σ (x = 1 to ln) δ(wnx,k) · f(i,x-1,n) · b(j,x,n) · Pt-1(i,j,wnx)

wherein δ(wnx,k) = 1 if wnx = k, 0 otherwise.

The above expression can be easily understood by referring to Fig. 7, which shows the trellis for matching the Markov model of this embodiment against the label string un (= wn1 wn2 ... wnln). un (wnx) is also shown along the time axis. When wnx = k, that is, δ(wnx,k) = 1, the wnx is circled. Consider the path accompanied by an arrow and extending from the state i (state 3 in Fig. 7) to the state j (state 4 in Fig. 7) at the observing time x, at which the label wnx occurs. Both ends of the path are indicated by small solid circles in Fig. 7. In this case the probability that the Markov model outputs k = wnx is Pt-1(i,j,wnx).
The frequence that the Markov model extends from the initial state I to the solid lattice point of the state i at the time x-1, as shown by the broken line f, is represented by the forward value f(i,x-1,n); on the other hand, the frequence that it reaches back from the final state F to the solid lattice point of the state j at the time x, as shown by the broken line b, is represented by the backward value b(j,x,n).
The frequence that k = wnx is outputted on the path p is therefore as follows.

f(i,x-1,n) · b(j,x,n) · Pt-1(i,j,wnx)

count(i,j,k,n) is obtained by summing up the frequences of the circled labels, which corresponds to the operation of δ(wnx,k), and it is obvious that it is expressed by the above expression. Namely,

count(i,j,k,n) = Σ (x = 1 to ln) δ(wnx,k) · f(i,x-1,n) · b(j,x,n) · Pt-1(i,j,wnx)

After obtaining count(i,j,k,n) for each label string un (n = 1 to 5), the frequence Ct(i,j,k) over the set of label strings U, that is, over the whole training data for a certain word, is obtained, step 17. It should be noted that the label strings un are different from each other, and that the frequences of the label strings un, or the total probabilities Tn of the label strings, are therefore different from each other. count(i,j,k,n) should therefore be normalized by the total probability Tn. Here Tn = f(E, ln, n), and E = 8 in this case.

The frequence over the whole training data of the word to be recognized, Ct(i,j,k) is determined as follows.

Ct(i,j,k) = Σ (n = 1 to 5) count(i,j,k,n) / Tn

Next, the frequence St(i) that the Markov model is at the state i over the training data for the word to be recognized is determined likewise, based on count(i,j,k,n), step 18.

St(i) = Σ (n = 1 to 5) (1/Tn) Σj Σk count(i,j,k,n)

Based on the frequences Ct(i,j,k) and St(i), the next parameter Pt(i,j,k) is estimated as follows, step 19.

Pt(i,j,k) = Ct(i,j,k) / St(i)

The above estimation process, that is, the procedure of steps 14 through 19, is repeated a predetermined number of times, for example five times, to complete the training for the target word, step 20. For the other words the same training is performed.
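One round of steps 16 through 19 might look like the following sketch. The forward and backward tables f and b are assumed precomputed as in step 15 and indexed as f[x][i] with 0-based states and times; all names are illustrative, not from the patent.

```python
def counts_for_string(labels, f, b, P, n_states=8):
    """count(i,j,k,n) for one label string, step 16:
    sum over x of delta(w_x, k) * f(i, x-1) * b(j, x) * P(i, j, w_x).
    Because we accumulate under the key (i, j, w_x), the delta
    selection of circled labels is implicit."""
    count = {}
    for x in range(1, len(labels) + 1):
        w = labels[x - 1]
        for i in range(n_states):
            for d in (0, 1, 2):                 # loop, next, skip
                j = i + d
                if j < n_states and (i, j, w) in P:
                    c = f[x - 1][i] * b[x][j] * P[(i, j, w)]
                    count[(i, j, w)] = count.get((i, j, w), 0.0) + c
    return count

def reestimate(counts_per_utt, totals, n_states=8):
    """Steps 17-19: Ct(i,j,k) = sum_n count/Tn, St(i) = sum_jk Ct,
    and the next parameter Pt = Ct / St."""
    Ct, St = {}, [0.0] * n_states
    for count, Tn in zip(counts_per_utt, totals):
        for (i, j, k), c in count.items():
            Ct[(i, j, k)] = Ct.get((i, j, k), 0.0) + c / Tn
            St[i] += c / Tn
    return {key: c / St[key[0]] for key, c in Ct.items() if St[key[0]] > 0.0}
```

Iterating this re-estimation the prescribed number of times yields the final parameters P0(i,j,k).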

After completing the training, the final parameters P0(i,j,k) are determined in the parameter table, Fig. 1, to be used for the speech recognition that follows. The frequence which has been used for the last round of the estimation, S0(i), is also stored. This frequence S0(i) is to be used for the adaptation, which will be hereinafter described.

The operation of the adapting block 9 will be next described, referring to Fig. 8. In Fig. 8, parts having counterparts in Fig. 4 are given the corresponding reference numbers, and a detailed description thereof will not be given.

In Fig. 8, adaptation data is inputted first, step 14A, which data has been obtained from the speaker who is going to input his speech, uttering each word to be recognized once.
After this, the operations shown in steps 15A through 18A are performed in the same way as in the case of the above-mentioned training. Then the two frequences which are to be used for the estimation are obtained respectively by interpolation.
In the end the new parameter P1(i,j,k) is then obtained as follows, step 21.

P1(i,j,k) = [λ·C0(i,j,k) + (1-λ)·C1(i,j,k)] / [λ·S0(i) + (1-λ)·S1(i)], wherein 0 ≤ λ ≤ 1.

In this example the adaptation process is performed only once, though the process may be performed two or more times.
Actually C0(i,j,k) is equal to P0(i,j,k)·S0(i), so the following expression is used for the estimation of P1(i,j,k).

P1(i,j,k) = [λ·P0(i,j,k)·S0(i) + (1-λ)·C1(i,j,k)] / [λ·S0(i) + (1-λ)·S1(i)]

The "a" of count(i,j,k,a) in Fig. 8 indicates that this frequence is the one for the label string of the adaptation data.
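A sketch of this storage-saving form, under the same illustrative assumptions as before (frequence tables as dictionaries keyed by (i, j, k) and by state; names not from the patent):

```python
def adapt_without_stored_C0(P0, S0, C1, S1, lam):
    """Adaptation that recovers C0(i,j,k) on the fly as
    P0(i,j,k) * S0(i), so that only the per-state frequences S0
    need be kept from the initial training."""
    P1 = {}
    for (i, j, k), p0 in P0.items():
        num = lam * p0 * S0[i] + (1.0 - lam) * C1.get((i, j, k), 0.0)
        den = lam * S0[i] + (1.0 - lam) * S1.get(i, 0.0)
        P1[(i, j, k)] = num / den if den > 0.0 else 0.0
    return P1
```

Since P0·S0 equals C0 by construction, this yields the same result as interpolating the stored C0 directly.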

After performing the steps mentioned above, the adaptation is completed. From then on, the speech of the speaker for whom the adaptation has been done is recognized with a high score.

According to this embodiment, the system can be adapted to a different circumstance with a small amount of data and a short training time.


Additionally, optimization of the system can be achieved by adjusting the interior division ratio according to the quality, such as the reliability, of the adaptation data.

Assuming that the numbers of Markov models, branches and labels are respectively X, Y and Z, the number of data items added by storing S(i) is X. On the other hand, the number of data items held as P0(i,j,k) is XYZ. Therefore the relative amount of data added by this adaptation is very small, as follows.

X/XYZ = 1/YZ

This embodiment also has the advantage that part of the software or hardware can be made common to the adaptation process and the initial training process, because both processes have many identical steps.

Furthermore, the adaptation can be performed again, for example for a word wrongly recognized, because the adaptation can be performed for each word. Needless to say, the adaptation for a word need not be performed until the word is wrongly recognized.

A modification of the above-mentioned embodiment will be described next. Through this modification, adaptation can be performed well even when the quality of the adaptation data is quite different from that of the initial training data.

Fig. 9 shows the adaptation process of this modified example. In this figure, parts having counterparts in Fig. 8 are given the corresponding reference numbers, and a detailed description thereof will not be given.

In the modified example shown in Fig. 9, the interpolation using P0(i,j,k) is performed as follows, step 22, before the new frequences C1(i,j,k) and S1(i) are obtained from the adaptation data.

P0'(i,j,k) = (1-λ)·P0(i,j,k) + λ·e

This means that the value obtained by interpolating the parameter P0(i,j,k) and a small value e with the interior division ratio λ is utilized as the new parameter. In the training performed during the adaptation process, how well the parameters converge to the actual values also depends heavily on the initial values. Some paths which occurred rarely in the initial training data may occur frequently in the adaptation data. In this case, adding a small number e to the parameter P0(i,j,k) provides better convergence.
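Step 22 can be sketched as follows; the value of e follows the suggestion given earlier of 1/((number of labels) × (number of branches)), and the names are assumptions made for illustration.

```python
def smooth_parameters(P0, lam, n_labels=128, n_branches=3):
    """Interpolate every trained parameter toward a small constant e
    before running the adaptation, as in step 22 of the modified
    example.  With 128 labels and 3 branch types, e = 1/384."""
    e = 1.0 / (n_labels * n_branches)
    return {key: (1.0 - lam) * p + lam * e for key, p in P0.items()}
```

With lam = 0 the trained parameters pass through unchanged; a small positive lam lifts rarely seen paths off zero so they can grow during the adaptation re-estimation.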

Effect of the Invention

As described hereinbefore, according to the present invention, the adaptation of a speech recognition system can be done with little data and in a short time. The required storage capacity and the increases in the number of program steps and hardware components are all very small. Additionally, optimization of the system can be achieved by adjusting the interior division ratio according to the quality of the adaptation data.

Claims (3)

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. Speech recognition method comprising: defining, for each recognition unit, a probabilistic model including a plurality of states, at least one transition each extending from a state to a state, and the probabilities of outputting each label in each of said transitions;

generating a first label string for each of said recognition units from initial training data thereof;
iteration, for each of said recognition units, of updating the probabilities of the corresponding probabilistic model, by:

(a) obtaining a first frequence of each of said labels being outputted at each of said transitions over the time in which the corresponding first label string is inputted into the corresponding probabilistic model;

(b) obtaining a second frequence of each of said states occurring over the time in which the corresponding first label string is inputted into the corresponding probabilistic model;

(c) obtaining each of the new probabilities of said corresponding probabilistic model by dividing the corresponding first frequence by the corresponding second frequence;

storing the first and second frequences obtained in the last step of said iteration;

generating, for each of the recognition units requiring adaptation, a second label string from adaptation data thereof;

obtaining, for each of said recognition units requiring adaptation, a third frequence of each of said labels being outputted at each of said transitions over the time in which the corresponding second label string is inputted into the corresponding probabilistic model;

obtaining, for each of said recognition units requiring adaptation, a fourth frequence of each of said states occurring over the time in which the corresponding second label string is inputted into the corresponding probabilistic model;

obtaining fifth frequences by interpolation of the corresponding first and third frequences;

obtaining sixth frequences by interpolation of the corresponding second and fourth frequences; and, obtaining each of the adapted probabilities for said adaptation data by dividing the corresponding fifth frequence by the corresponding sixth frequence.
2. The method in accordance with Claim 1 wherein each of said first frequences is stored indirectly as a product of the corresponding probability and the corresponding second frequence.
3. The method in accordance with Claim 1 or 2 wherein each of the probabilities of said probabilistic model into which adaptation data is to be inputted has been subjected to a smoothing operation.
CA000528993A 1986-03-25 1987-02-04 Speech recognition method Expired CA1256562A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP65030/86 1986-03-25
JP61065030A JPS62231993A (en) 1986-03-25 1986-03-25 Voice recognition

Publications (1)

Publication Number Publication Date
CA1256562A true CA1256562A (en) 1989-06-27

Family

ID=13275169

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000528993A Expired CA1256562A (en) 1986-03-25 1987-02-04 Speech recognition method

Country Status (5)

Country Link
US (1) US4829577A (en)
EP (1) EP0243009B1 (en)
JP (1) JPS62231993A (en)
CA (1) CA1256562A (en)
DE (1) DE3773039D1 (en)

Families Citing this family (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01102599A (en) * 1987-10-12 1989-04-20 Internatl Business Mach Corp <Ibm> Voice recognition
DE3876379T2 (en) * 1987-10-30 1993-06-09 Ibm AUTOMATIC DETERMINATION OF LABELS AND MARKOV WORD MODELS IN A VOICE RECOGNITION SYSTEM.
US5072452A (en) * 1987-10-30 1991-12-10 International Business Machines Corporation Automatic determination of labels and Markov word models in a speech recognition system
JP2545914B2 (en) * 1988-02-09 1996-10-23 日本電気株式会社 Speech recognition method
JP2733955B2 (en) * 1988-05-18 1998-03-30 日本電気株式会社 Adaptive speech recognition device
JPH0293597A (en) * 1988-09-30 1990-04-04 Nippon I B M Kk Speech recognition device
US5027406A (en) * 1988-12-06 1991-06-25 Dragon Systems, Inc. Method for interactive speech recognition and training
JPH067348B2 (en) * 1989-04-13 1994-01-26 株式会社東芝 Pattern recognition device
US5509104A (en) * 1989-05-17 1996-04-16 At&T Corp. Speech recognition employing key word modeling and non-key word modeling
CA2015410C (en) * 1989-05-17 1996-04-02 Chin H. Lee Speech recognition employing key word modeling and non-key word modeling
US5220639A (en) * 1989-12-01 1993-06-15 National Science Council Mandarin speech input method for Chinese computers and a mandarin speech recognition machine
US5129001A (en) * 1990-04-25 1992-07-07 International Business Machines Corporation Method and apparatus for modeling words with multi-arc markov models
DE4024890A1 (en) * 1990-08-06 1992-02-13 Standard Elektrik Lorenz Ag ADAPTATION OF REFERENCE LANGUAGE PATTERNS TO ENVIRONMENTAL PRONOUNCEMENT VERSIONS
US5182773A (en) * 1991-03-22 1993-01-26 International Business Machines Corporation Speaker-independent label coding apparatus
US5278942A (en) * 1991-12-05 1994-01-11 International Business Machines Corporation Speech coding apparatus having speaker dependent prototypes generated from nonuser reference data
US5544257A (en) * 1992-01-08 1996-08-06 International Business Machines Corporation Continuous parameter hidden Markov model approach to automatic handwriting recognition
JP2795058B2 (en) * 1992-06-03 1998-09-10 松下電器産業株式会社 Time series signal processing device
US5502774A (en) * 1992-06-09 1996-03-26 International Business Machines Corporation Automatic recognition of a consistent message using multiple complimentary sources of information
WO1994015330A1 (en) * 1992-12-18 1994-07-07 Sri International Method and apparatus for automatic evaluation of pronunciation
JPH0776880B2 (en) * 1993-01-13 1995-08-16 日本電気株式会社 Pattern recognition method and apparatus
US5627939A (en) 1993-09-03 1997-05-06 Microsoft Corporation Speech recognition system and method employing data compression
US5602963A (en) * 1993-10-12 1997-02-11 Voice Powered Technology International, Inc. Voice activated personal organizer
US5794197A (en) * 1994-01-21 1998-08-11 Micrsoft Corporation Senone tree representation and evaluation
US6061652A (en) * 1994-06-13 2000-05-09 Matsushita Electric Industrial Co., Ltd. Speech recognition apparatus
US5805771A (en) * 1994-06-22 1998-09-08 Texas Instruments Incorporated Automatic language identification method and system
US5805772A (en) * 1994-12-30 1998-09-08 Lucent Technologies Inc. Systems, methods and articles of manufacture for performing high resolution N-best string hypothesization
US5864810A (en) * 1995-01-20 1999-01-26 Sri International Method and apparatus for speech recognition adapted to an individual speaker
US5710866A (en) * 1995-05-26 1998-01-20 Microsoft Corporation System and method for speech recognition using dynamically adjusted confidence measure
US5913193A (en) * 1996-04-30 1999-06-15 Microsoft Corporation Method and system of runtime acoustic unit selection for speech synthesis
US5937384A (en) * 1996-05-01 1999-08-10 Microsoft Corporation Method and system for speech recognition using continuous density hidden Markov models
US5806030A (en) * 1996-05-06 1998-09-08 Matsushita Electric Ind Co Ltd Low complexity, high accuracy clustering method for speech recognizer
US5835890A (en) * 1996-08-02 1998-11-10 Nippon Telegraph And Telephone Corporation Method for speaker adaptation of speech models recognition scheme using the method and recording medium having the speech recognition method recorded thereon
US6151575A (en) * 1996-10-28 2000-11-21 Dragon Systems, Inc. Rapid adaptation of speech models
US6349281B1 (en) * 1997-01-30 2002-02-19 Seiko Epson Corporation Voice model learning data creation method and its apparatus
US6212498B1 (en) 1997-03-28 2001-04-03 Dragon Systems, Inc. Enrollment in speech recognition
US6223156B1 (en) * 1998-04-07 2001-04-24 At&T Corp. Speech recognition of caller identifiers using location information
US6343267B1 (en) 1998-04-30 2002-01-29 Matsushita Electric Industrial Co., Ltd. Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques
US6263309B1 (en) 1998-04-30 2001-07-17 Matsushita Electric Industrial Co., Ltd. Maximum likelihood method for finding an adapted speaker model in eigenvoice space
US6163768A (en) 1998-06-15 2000-12-19 Dragon Systems, Inc. Non-interactive enrollment in speech recognition
US6233557B1 (en) 1999-02-23 2001-05-15 Motorola, Inc. Method of selectively assigning a penalty to a probability associated with a voice recognition system
US6463413B1 (en) * 1999-04-20 2002-10-08 Matsushita Electric Industrial Co., Ltd. Speech recognition training for small hardware devices
US6526379B1 (en) 1999-11-29 2003-02-25 Matsushita Electric Industrial Co., Ltd. Discriminative clustering methods for automatic speech recognition
US6571208B1 (en) 1999-11-29 2003-05-27 Matsushita Electric Industrial Co., Ltd. Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training
US7216077B1 (en) * 2000-09-26 2007-05-08 International Business Machines Corporation Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation
CA2397466A1 (en) 2001-08-15 2003-02-15 At&T Corp. Systems and methods for aggregating related inputs using finite-state devices and extracting meaning from multimodal inputs using aggregation
US7257575B1 (en) 2002-10-24 2007-08-14 At&T Corp. Systems and methods for generating markup-language based expressions from multi-modal and unimodal inputs
US7362892B2 (en) * 2003-07-02 2008-04-22 Lockheed Martin Corporation Self-optimizing classifier
JP4366652B2 (en) * 2004-04-23 2009-11-18 横河電機株式会社 Transmitter and duplexing method thereof
WO2009078256A1 (en) * 2007-12-18 2009-06-25 Nec Corporation Pronunciation fluctuation rule extraction device, pronunciation fluctuation rule extraction method and pronunciation fluctuation rule extraction program
US9020816B2 (en) * 2008-08-14 2015-04-28 21Ct, Inc. Hidden Markov model for speech processing with training method
US8473293B1 (en) * 2012-04-17 2013-06-25 Google Inc. Dictionary filtering using market data
EP2713367B1 (en) * 2012-09-28 2016-11-09 Agnitio, S.L. Speaker recognition
US10141009B2 (en) 2016-06-28 2018-11-27 Pindrop Security, Inc. System and method for cluster-based audio event detection
US9824692B1 (en) 2016-09-12 2017-11-21 Pindrop Security, Inc. End-to-end speaker recognition using deep neural network
US10325601B2 (en) 2016-09-19 2019-06-18 Pindrop Security, Inc. Speaker recognition in the call center
US10553218B2 (en) * 2016-09-19 2020-02-04 Pindrop Security, Inc. Dimensionality reduction of Baum-Welch statistics for speaker recognition
CA3036561C (en) 2016-09-19 2021-06-29 Pindrop Security, Inc. Channel-compensated low-level features for speaker recognition
US10397398B2 (en) 2017-01-17 2019-08-27 Pindrop Security, Inc. Authentication using DTMF tones
US11355103B2 (en) 2019-01-28 2022-06-07 Pindrop Security, Inc. Unsupervised keyword spotting and word discovery for fraud analytics
WO2020163624A1 (en) 2019-02-06 2020-08-13 Pindrop Security, Inc. Systems and methods of gateway detection in a telephone network
US11646018B2 (en) 2019-03-25 2023-05-09 Pindrop Security, Inc. Detection of calls from voice assistants

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4587670A (en) * 1982-10-15 1986-05-06 At&T Bell Laboratories Hidden Markov model speech recognition arrangement
US4593367A (en) * 1984-01-16 1986-06-03 Itt Corporation Probabilistic learning element
US4718094A (en) * 1984-11-19 1988-01-05 International Business Machines Corp. Speech recognition system
US4741036A (en) * 1985-01-31 1988-04-26 International Business Machines Corporation Determination of phone weights for markov models in a speech recognition system
US4748670A (en) * 1985-05-29 1988-05-31 International Business Machines Corporation Apparatus and method for determining a likely word sequence from labels generated by an acoustic processor
US4759068A (en) * 1985-05-29 1988-07-19 International Business Machines Corporation Constructing Markov models of words from multiple utterances
US4783803A (en) * 1985-11-12 1988-11-08 Dragon Systems, Inc. Speech recognition apparatus and method

Also Published As

Publication number Publication date
JPS62231993A (en) 1987-10-12
EP0243009A1 (en) 1987-10-28
US4829577A (en) 1989-05-09
DE3773039D1 (en) 1991-10-24
EP0243009B1 (en) 1991-09-18
JPH0355837B2 (en) 1991-08-26

Similar Documents

Publication Publication Date Title
CA1256562A (en) Speech recognition method
US5050215A (en) Speech recognition method
EP0303022B1 (en) Rapidly training a speech recognizer to a subsequent speaker given training data of a reference speaker
US5031217A (en) Speech recognition system using Markov models having independent label output sets
US5787396A (en) Speech recognition method
US4827521A (en) Training of markov models used in a speech recognition system
US5825978A (en) Method and apparatus for speech recognition using optimized partial mixture tying of HMM state functions
EP1557823B1 (en) Method of setting posterior probability parameters for a switching state space model
US20050119887A1 (en) Method of speech recognition using variational inference with switching state space models
Gong Stochastic trajectory modeling and sentence searching for continuous speech recognition
US5825977A (en) Word hypothesizer based on reliably detected phoneme similarity regions
JPH0962291A (en) Pattern adaptive method using describing length minimum reference
US6173076B1 (en) Speech recognition pattern adaptation system using tree scheme
Bacchiani et al. Design of a speech recognition system based on acoustically derived segmental units
JP3589044B2 (en) Speaker adaptation device
JP2982689B2 (en) Standard pattern creation method using information criterion
JPH08123469A (en) Phrase border probability calculating device and continuous speech recognition device utilizing phrase border probability
JP3571821B2 (en) Speech recognition device, dictionary of word components, and learning method of hidden Markov model
Aibar et al. Geometric pattern recognition techniques for acoustic-phonetic decoding of Spanish continuous speech
JPH07230295A (en) Speaker adaptive system
JP3105708B2 (en) Voice recognition device
Gupta et al. Noise robust acoustic signal processing using a Hybrid approach for speech recognition
JPH0764588A (en) Speech recognition device
Yu et al. A Multi-State NN/HMM Hybrid Method for High Performance Speech Recognition
Redol A discriminative training algorithm for hidden Markov models

Legal Events

Date Code Title Description
MKEX Expiry