CN102592595A - Voice recognition method and system - Google Patents


Info

Publication number
CN102592595A
Authority
CN
China
Prior art keywords
path
hot
speech
historical path
historical
Legal status
Granted
Application number
CN2012100734129A
Other languages
Chinese (zh)
Other versions
CN102592595B (en)
Inventor
潘青华
鹿晓亮
何婷婷
王智国
胡国平
胡郁
刘庆峰
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN2012100734129A
Publication of CN102592595A
Application granted
Publication of CN102592595B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to the technical field of speech recognition and discloses a speech recognition method and system. The method comprises: building a decoding recognition network; decoding each frame of a received speech signal according to the decoding recognition network and, during decoding, boosting the historical paths of active nodes according to hot words so as to increase the accumulated probability of the historical paths on which the hot words lie; after the last speech frame has been decoded, selecting the active node with the maximum accumulated probability as the optimal node; and backtracking through the decoding states from the optimal node to obtain the optimal path and the corresponding word sequence. The invention avoids re-estimating system parameters and can recognize hot words and user-personalized words quickly and accurately, thereby improving hot-word recognition.

Description

Speech recognition method and system
Technical field
The present invention relates to the field of speech recognition technology, and in particular to a speech recognition method and system.
Background
Achieving user-friendly, intelligent and effective human-machine interaction and building an efficient, natural human-machine communication environment have become pressing needs in the application and development of information technology. In recent years, with the rapid development of speech recognition technology, online speech recognition applications such as voice input and voice search have attracted increasing attention. Systems trained on massive data can meet common voice input and dictation needs, and their recognition accuracy is usually high, especially when the input content matches the probability distribution of the original language model. In practice, however, the rapid growth of the mobile Internet and social networks keeps producing new trending topics and the corresponding hot vocabulary, and different users also need their own personalized vocabulary recognized, such as contact names. Because such hot or personalized words are time-sensitive or user-specific, they occur rarely in the originally collected corpus; the original language model therefore covers them poorly, and the recognition system cannot recognize these hot words accurately.
To address this, the prior art usually re-estimates system parameters: newly collected hot-word text is added to the original training corpus, and a new language model is retrained to improve recognition accuracy for the new hot words. In practice, however, hot words are updated frequently, so the system cannot collect enough text in time for parameter re-estimation, which degrades hot-word recognition. Moreover, retraining the language model and rebuilding the recognition resources (such as a decoding recognition network built on WFST, Weighted Finite-State Transducers) is time-consuming and costly, so the system cannot respond quickly to new hot words.
Summary of the invention
Embodiments of the present invention provide a speech recognition method and system that solve the technical problem that the prior art cannot recognize hot vocabulary and user-personalized vocabulary quickly and accurately.
To this end, embodiments of the present invention provide the following technical solutions:
A speech recognition method, comprising:
building a decoding recognition network;
decoding each frame of a received speech signal according to the decoding recognition network and, during decoding, boosting the historical paths of active nodes according to hot words, so as to increase the accumulated path probability of the paths on which hot words lie;
after the last speech frame has been decoded, selecting the active node with the maximum accumulated probability as the optimal node;
backtracking through the decoding states from the optimal node to obtain the optimal path and its corresponding word sequence.
A speech recognition system, comprising:
a network building unit, configured to build a decoding recognition network;
a decoding unit, configured to decode each frame of a received speech signal according to the decoding recognition network;
an excitation unit, configured to boost, during decoding by the decoding unit, the historical paths of active nodes according to hot words, so as to increase the accumulated path probability of the paths on which hot words lie;
an optimal node determination unit, configured to select, after the decoding unit has decoded the last speech frame, the active node with the maximum accumulated probability as the optimal node;
a backtracking unit, configured to backtrack through the decoding states from the optimal node to obtain the optimal path and its corresponding word sequence.
The speech recognition method and system of the embodiments boost the historical paths of active nodes based on hot-word matching, which increases the accumulated path probability of paths containing hot words, effectively encourages their recognition and improves hot-word recognition accuracy. No re-estimation of system parameters is needed, so hot vocabulary and user-personalized vocabulary can be recognized quickly and accurately. This gives the system a feasible way to support customized personalized dictionaries or personalized language models: users obtain recognition support for personalized words simply by updating the hot-word entries supported by the system.
Brief description of the drawings
To describe the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the speech recognition method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a WFST-based decoding recognition network in an embodiment of the present invention;
Fig. 3 is a schematic diagram of a tree-structured hot-word dictionary in an embodiment of the present invention;
Fig. 4 is a flowchart of boosting the historical paths obtained by decoding according to a hot-word set in an embodiment of the present invention;
Fig. 5 is a flowchart of boosting the historical paths obtained by decoding according to a hot-word dictionary in an embodiment of the present invention;
Fig. 6 is a flowchart of one implementation of updating the accumulated path probability of a historical path according to the extension result of that path in an embodiment of the present invention;
Fig. 7 is a flowchart of another implementation of updating the accumulated path probability of a historical path according to the extension result of that path in an embodiment of the present invention;
Fig. 8 is a schematic diagram of a specific hot-word dictionary in an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of the speech recognition system according to an embodiment of the present invention;
Fig. 10 is a schematic diagram of one specific application structure of the speech recognition system according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of another specific application structure of the speech recognition system according to an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of an excitation subunit in an embodiment of the present invention;
Fig. 13 is another schematic structural diagram of an excitation subunit in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To make the solution easier to understand and to show more clearly how it differs from existing speech recognition schemes, the basic speech recognition method of the prior art is first briefly described.
In the prior art, the semantic network of the language model is usually expanded, by means of the acoustic model, the pronunciation dictionary and so on, into a search network at the model-state level; that is, a decoding recognition network is built. When an input speech signal is decoded, new valid extension paths are obtained for each input speech frame by computing the accumulated path probability of each acoustic model and language model along the current valid extension paths. After the last speech frame has been searched, the path with the maximum accumulated path probability is taken as the optimal decoding path, and the corresponding word sequence is obtained by backtracking the states from the optimal node.
To address the technical problem that the prior art improves recognition accuracy for new hot words by re-estimating system parameters and therefore cannot recognize hot vocabulary and user-personalized vocabulary quickly and accurately, the speech recognition method and system of the embodiments boost the current historical paths according to hot words. This increases the accumulated path probability of the paths on which hot words lie and improves hot-word recognition, so that hot vocabulary and user-personalized vocabulary can be recognized quickly and accurately without re-estimating system parameters.
As shown in Fig. 1, the speech recognition method of an embodiment of the present invention comprises the following steps:
Step 101: build the decoding recognition network.
In embodiments of the present invention, the decoding recognition network may be built online by the system, or built offline and loaded directly at system start-up, which reduces the system's computation and memory requirements and further improves decoding efficiency.
Step 102: decode each frame of the received speech signal according to the decoding recognition network and, during decoding, boost the historical paths of active nodes according to hot words, so as to increase the accumulated path probability of the paths on which hot words lie.
Decoding the user's input speech with the decoding recognition network is the process of searching the network for the optimal path, that is, of converting speech into text.
Specifically, the received continuous speech signal may first be sampled into a series of discrete energy values and stored in a data buffer.
To further improve the robustness of the system, the received continuous speech signal may first be denoised. Short-time energy and short-time zero-crossing rate analysis first divides the continuous speech signal into independent speech segments and non-speech segments; speech enhancement is then applied to the resulting speech segments, for example by Wiener filtering, to further remove ambient noise from the signal and improve its subsequent processing by the system.
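The segmentation by short-time energy and zero-crossing rate can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes 16 kHz audio, 25 ms frames with a 10 ms shift, and a hypothetical decision rule (a frame counts as speech when its energy is at least 10% of the loudest frame's and its zero-crossing rate is at most 0.5, since broadband noise crosses zero often). The function names are illustrative, and the Wiener-filter enhancement that would follow is not shown.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, dropping any ragged tail."""
    n_frames = 0 if len(x) < frame_len else 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    return np.sum(frames.astype(float) ** 2, axis=1)

def zero_crossing_rate(frames):
    signs = np.sign(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def speech_mask(x, frame_len=400, hop=160, energy_ratio=0.1, zcr_max=0.5):
    """Hypothetical rule: a frame is speech if it is loud enough and not noise-like."""
    frames = frame_signal(x, frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    return (energy >= energy_ratio * energy.max()) & (zcr <= zcr_max)

def speech_segments(mask):
    """Group consecutive speech frames into [start, end) frame-index pairs."""
    segments, start = [], None
    for i, is_speech in enumerate(mask):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments

# Demo: 0.5 s of faint noise, 0.5 s of a 200 Hz tone, 0.5 s of faint noise.
sr = 16000
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 200 * np.arange(sr // 2) / sr)
x = np.concatenate([0.01 * rng.standard_normal(sr // 2), tone,
                    0.01 * rng.standard_normal(sr // 2)])
segments = speech_segments(speech_mask(x))
```

With these settings the tone occupies frames 48 to 99, so `segments` holds a single speech segment `(48, 100)`; real speech would need tuned thresholds.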
Because the denoised speech signal may still contain a large amount of redundant information irrelevant to recognition, recognizing it directly would increase computation and reduce accuracy. Therefore, speech features that are effective for recognition can be extracted from the denoised speech signal and stored in a feature buffer. Specifically, MFCC (Mel Frequency Cepstrum Coefficient) features may be extracted: each frame of speech data, taken with a 25 ms window and a 10 ms frame shift, undergoes short-time analysis to obtain the MFCC parameters and their first- and second-order differences, 39 dimensions in total. That is, each speech frame is quantized into a 39-dimensional feature vector.
Each speech frame is then decoded according to the decoding recognition network to obtain the optimal path, which completes decoding and recognition.
In the prior art, the optimal path is searched as follows: proceeding in time order from left to right, the accumulated path probability with which each speech frame reaches each active node in the decoding recognition network is computed.
Specifically, for each speech frame under consideration, the historical paths and accumulated path probabilities of all active nodes in the current decoding recognition network with respect to that frame are computed first.
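The 39-dimensional layout can be illustrated with the widely used regression formula for difference (delta) features. This sketch assumes 13 static MFCCs per frame (13 × 3 = 39), a ±2-frame regression window and edge padding by repetition; these are conventional choices, not values stated in the patent, and random stand-in values replace the MFCC computation itself.

```python
import numpy as np

def num_frames(n_samples, sr=16000, win_ms=25, hop_ms=10):
    """Frames produced by a 25 ms window with a 10 ms shift (ragged tail dropped)."""
    win, hop = sr * win_ms // 1000, sr * hop_ms // 1000
    return 0 if n_samples < win else 1 + (n_samples - win) // hop

def deltas(feats, N=2):
    """d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), edges padded by repetition."""
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def stack_39(static):
    """Static coefficients followed by first- and second-order differences."""
    d1 = deltas(static)
    return np.hstack([static, d1, deltas(d1)])

T = num_frames(16000)                                        # 1 s of 16 kHz audio
static = np.random.default_rng(0).standard_normal((T, 13))   # stand-in for real MFCCs
features = stack_39(static)                                  # one 39-dim vector per frame
```

One second of 16 kHz audio yields 98 frames here, so `features` has shape `(98, 39)`.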
For example, for the current speech, let the corresponding feature sequence be $\{O_1, O_2, \ldots, O_t\}$. The path probability $\delta_t(j)$ with which the feature $O_t$ at time $t$ enters active node $j$, that is, the maximum over all possible historical paths from active nodes $i$ to node $j$, is computed as:
$$\delta_t(j) = \max_i \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(O_t)$$
where $i$ ranges over all active nodes connected to active node $j$ in the decoding recognition network; $\delta_{t-1}(i)$ is the historical path probability that the feature $O_{t-1}$ at time $t-1$ lands on active node $i$; $a_{ij}$ is the transition probability from node $i$ to node $j$; and $b_j(O_t)$ is the likelihood of the feature $O_t$ at time $t$ with respect to node $j$.
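One frame of the Viterbi recursion δ_t(j) = max_i[δ_{t-1}(i) a_ij] · b_j(O_t) can be sketched in the log domain, where products become sums and numerical underflow is avoided (a standard choice assumed here, not taken from the patent). The function name and array layout are illustrative; a missing arc from i to j is encoded as a log transition of -inf.

```python
import numpy as np

def viterbi_step(prev_log_delta, log_a, log_b_t):
    """One frame of delta_t(j) = max_i[delta_{t-1}(i) * a_ij] * b_j(O_t), in logs.

    prev_log_delta: (S,) log delta_{t-1}(i)
    log_a:          (S, S) log a_ij, -inf where node i does not connect to node j
    log_b_t:        (S,) log b_j(O_t)
    Returns log delta_t(j) and, for each j, its best predecessor i.
    """
    scores = prev_log_delta[:, None] + log_a      # scores[i, j] = log delta_{t-1}(i) + log a_ij
    best_prev = np.argmax(scores, axis=0)         # predecessor that defines j's historical path
    log_delta = scores[best_prev, np.arange(len(log_b_t))] + log_b_t
    return log_delta, best_prev

log_delta, back = viterbi_step(np.log([0.6, 0.4]),
                               np.log([[0.7, 0.3], [0.4, 0.6]]),
                               np.log([0.5, 0.1]))
```

For node 0 the best predecessor is node 0 (0.6 × 0.7 × 0.5 = 0.21); for node 1 it is node 1 (0.4 × 0.6 × 0.1 = 0.024). Keeping `best_prev` for every frame is what later makes backtracking possible.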
The accumulated path probability of active node $j$ is the score of the path with the maximum accumulated path probability among all paths leading into node $j$. In other words, computing the accumulated path probability of node $j$ also identifies its predecessor node, and hence the historical path of node $j$.
The next speech frame is then obtained, and decoding continues by extending the historical paths that satisfy preset system conditions. After the last speech frame has been decoded, the active node with the maximum accumulated path probability is the optimal node; the historical path obtained by backtracking through the decoding states from this node is the optimal path, and the word sequence on it is the decoding result.
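Selecting the optimal node and backtracking can be sketched as below. It assumes that decoding stored, for every frame t ≥ 1, an array giving each node's best predecessor (the information that defines each active node's historical path), and that some nodes carry a word label; the labelling scheme, the labels and the function names are hypothetical.

```python
import numpy as np

def backtrack(final_log_delta, backpointers):
    """Pick the optimal node after the last frame and follow predecessors back to t = 0.

    backpointers[t - 1][j] is the best predecessor of node j at frame t (t >= 1).
    """
    node = int(np.argmax(final_log_delta))     # optimal node: maximum accumulated score
    path = [node]
    for bp in reversed(backpointers):
        node = int(bp[node])
        path.append(node)
    return path[::-1]

def words_on_path(path, word_label):
    """Emit a node's word label whenever the path enters that node (hypothetical labels)."""
    words = []
    for t, node in enumerate(path):
        label = word_label.get(node)
        if label is not None and (t == 0 or path[t - 1] != node):
            words.append(label)
    return words

final_log_delta = np.log([0.1, 0.5, 0.2])
backpointers = [np.array([0, 0, 1]), np.array([0, 1, 1])]
best_path = backtrack(final_log_delta, backpointers)      # node per frame
words = words_on_path(best_path, {0: "call", 1: "mom"})   # word sequence on the path
```

Node 1 has the highest final score; following its predecessors gives the node sequence `[0, 1, 1]` and the word sequence `["call", "mom"]`.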
Because a language model trained on massive data reflects the word distribution of the original training corpus well, it recognizes most common vocabulary well. Hot vocabulary and user-personalized vocabulary, however, are specialized and have low probability in the original language model, so their decoding paths tend to score low and they cannot be recognized correctly.
For this reason, in the present invention the historical paths of active nodes are boosted according to hot words, which prolongs the survival of hot words during search-path expansion so that paths in the decoding recognition network that match hot words are favored. This raises the success rate of hot-word matching and correspondingly improves hot-word recognition accuracy.
Specifically, different excitation modes may be applied to paths in the decoding recognition network that may match hot words; these are described in detail later.
Step 103: after the last speech frame has been decoded, select the active node with the maximum accumulated probability as the optimal node.
Step 104: backtrack through the decoding states from the optimal node to obtain the optimal path and its corresponding word sequence.
The historical path obtained by backtracking through the decoding states from the optimal node is the optimal path.
Thus, the speech recognition method of the embodiments applies hot-word-matching-based excitation to historical paths, optimizing the accumulated path probability of the paths on which hot words lie and improving hot-word recognition. With this method, hot vocabulary and user-personalized vocabulary can be recognized quickly and accurately without re-estimating system parameters. This gives the system a feasible way to support customized personalized dictionaries or personalized language models: users obtain recognition support for personalized words simply by updating the hot-word dictionary supported by the system.
As mentioned above, in embodiments of the present invention the decoding recognition network may be built online by the system, or built offline and loaded directly at system start-up, which reduces the system's computation and memory requirements and further improves decoding efficiency.
Specifically, the decoding recognition network can be built from a preset acoustic model, language model and similar resources.
The acoustic model models the acoustic characteristics of pronunciation; it may be the HMM (Hidden Markov Model), based on transition and emission probabilities, that is commonly used in speech recognition. In large-vocabulary continuous speech recognition the vocabulary is so large that building one HMM per character would produce too many models for efficient storage and computation. In practice, therefore, HMMs may be built only for basic pronunciation units such as syllables or phonemes. The acoustic model may of course use other techniques, such as neural networks; the embodiments do not limit this.
The language model characterizes grammatical and semantic knowledge more effectively, compensating for the shortcomings of the acoustic model and improving recognition accuracy. It may be the statistical language model commonly used in speech recognition, which describes the relationships between words by statistical probabilities: the probability that a word $w_k$ appears is assumed to depend only on its preceding $n-1$ words, i.e.
$$P(w_k \mid w_1, \ldots, w_{k-1}) \approx P(w_k \mid w_{k-n+1}, \ldots, w_{k-1})$$
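The n-gram assumption can be illustrated with maximum-likelihood counts. This sketch uses no smoothing or back-off, which practical language models need for unseen n-grams; the `<s>`/`</s>` padding symbols and the toy corpus are illustrative assumptions.

```python
from collections import Counter

def ngram_model(sentences, n):
    """Maximum-likelihood estimate of P(w_k | w_{k-n+1}, ..., w_{k-1}); no smoothing."""
    ngrams, contexts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
        for k in range(n - 1, len(padded)):
            ctx = tuple(padded[k - n + 1:k])
            ngrams[ctx + (padded[k],)] += 1
            contexts[ctx] += 1

    def prob(word, history):
        # Only the last n-1 words of the history matter (the n-gram assumption).
        full = ["<s>"] * (n - 1) + list(history)
        ctx = tuple(full[len(full) - (n - 1):])
        return ngrams[ctx + (word,)] / contexts[ctx] if contexts[ctx] else 0.0

    return prob

corpus = [["I", "like", "tea"], ["I", "like", "coffee"], ["you", "like", "tea"]]
prob = ngram_model(corpus, n=2)
```

With a bigram model, `prob("tea", ["you", "like"])` depends only on the last word "like": "like tea" occurs twice among three occurrences of "like", giving 2/3.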
The language model may of course use other techniques, such as character-based or word-based models; the embodiments do not limit this.
The decoding recognition network can be built with existing construction methods, using the acoustic model to expand the language model into a search network at the model level. Fig. 2 shows a schematic diagram of a WFST-based decoding recognition network. Other kinds of decoding recognition networks may of course be used, such as a dynamic decoding network based on lexical-tree copies conditioned on word history.
In embodiments of the present invention, users may define hot words at any time, which extends the system's recognition range and keeps up with constantly changing vocabulary. Specifically, after a user enters a new hot word, it can be saved to a hot-word set; that is, the hot-word set can be a dynamically updated repository of hot-word entries.
In addition, when the hot-word dictionary is updated, a word segmentation algorithm may be applied: using the system's existing preset lexicon, the text of each newly added user-defined hot word is segmented so that every hot-word entry is represented as a sequence of the system's existing basic word units. The segmentation results are saved to the hot-word dictionary and can be managed in the tree structure shown in Fig. 3, where each branch corresponds to one hot-word entry, as follows:
Hot word A: Wordi1 Wordi2 Wordi3;
Hot word B: Wordi1 Wordi4;
Hot word C: Wordi5;
...;
Hot word N: Wordij ... Wordik.
This hot-word dictionary can be a dynamically updated hot-word segmentation dictionary.
Based on the hot-word set or hot-word dictionary described above, different excitation modes may be applied to paths in the decoding recognition network that may match hot words, as detailed below.
1. Boosting the historical paths obtained by decoding according to the hot-word set
Fig. 4 is a flowchart of boosting the historical paths obtained by decoding according to the hot-word set in an embodiment of the present invention.
In this flow, when the word sequence on an active node's historical path forms a hot-word entry, an appropriate score excitation is applied to raise the priority of that historical path.
The flow comprises the following steps:
Step 401: obtain the historical paths and accumulated path probabilities of all active nodes produced by decoding the current speech frame.
Step 402: determine, according to the hot-word set, whether adjacent words on the historical path form a hot word; if so, go to step 403; otherwise, go to step 404.
Step 403: increase the accumulated path probability of the historical path.
For example, a constant bonus may be added to raise the priority of the path on which the hot word lies, so that the path is more easily retained. The value of the constant bonus can be preset as needed; in general, the higher the bonus, the higher the hot-word matching rate and the correspondingly higher the hot-word recognition accuracy.
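The dictionary update can be sketched as follows. The patent only says that a word segmentation algorithm splits each new hot word using the system's existing lexicon; forward maximum matching is used here as one simple, assumed choice, and the lexicon, the `END` marker and the function names are hypothetical. Each hot word becomes one branch of a prefix tree, so entries sharing a first segment (hot words A and B in the list above) share a node.

```python
END = "<end>"  # hypothetical key: a complete hot word ends at this node

def segment(text, lexicon, max_len):
    """Forward maximum matching; characters not covered by the lexicon become single units."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in lexicon:
                words.append(text[i:i + size])
                i += size
                break
    return words

def add_hot_word(trie, segments):
    """Insert one hot word, given as a segment sequence, into a dict-of-dicts tree."""
    node = trie
    for seg in segments:
        node = node.setdefault(seg, {})
    node[END] = tuple(segments)

lexicon = {"科大", "讯飞", "输入", "输入法"}   # stand-in for the system lexicon
hot_dict = {}
add_hot_word(hot_dict, segment("科大讯飞输入法", lexicon, max_len=3))
add_hot_word(hot_dict, ["Wordi1", "Wordi2", "Wordi3"])  # hot word A
add_hot_word(hot_dict, ["Wordi1", "Wordi4"])            # hot word B
```

Here "科大讯飞输入法" is split into "科大", "讯飞", "输入法", and hot words A and B hang off the same "Wordi1" node.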
Balancing recognition accuracy for hot words against that for non-hot words, the constant bonus may be given a reference value of 300; other values may of course be used, and the embodiments do not limit this.
Step 404: keep the accumulated path probability of the historical path unchanged.
Note that steps 402 to 404 are performed for every historical path obtained in step 401: among the historical paths of all active nodes obtained in step 401, those that contain no hot word keep their path probability, and those that contain a hot word are boosted.
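A minimal sketch of steps 401 to 404 follows. It assumes scores are kept in a log-like domain where the reference bonus of 300 is simply added, that hot words are stored as strings and path words are joined without separators (as for Chinese text), and, so that the same hot word is not boosted again in every frame, that the check runs once each time a new word is appended to a path and only looks at runs of words ending at that newest word. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class HistoryPath:
    score: float                                  # accumulated path score (log domain)
    words: list = field(default_factory=list)     # word sequence on the historical path

def ends_with_hot_word(words, hot_words, max_words=4):
    """Step 402: do adjacent words ending at the newest word join into a hot word?"""
    for n in range(1, min(max_words, len(words)) + 1):
        if "".join(words[-n:]) in hot_words:
            return True
    return False

def excite_paths(paths, hot_words, bonus=300.0):
    """Steps 402-404 applied to every historical path obtained in step 401."""
    for path in paths:
        if ends_with_hot_word(path.words, hot_words):
            path.score += bonus        # step 403: constant bonus
        # otherwise step 404: the score is kept as is
    return paths

hot_words = {"科大讯飞"}
paths = excite_paths([HistoryPath(-1000.0, ["我", "爱", "科大", "讯飞"]),
                      HistoryPath(-990.0, ["我", "爱", "科学"])], hot_words)
```

The first path ends with the adjacent words "科大" "讯飞", which form a hot word, so its score rises to -700 and it now outranks the second path, whose score stays at -990.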
Note also that, in practice, the historical paths of active nodes may be pruned before the next speech frame is decoded, deleting unlikely paths to improve subsequent search efficiency. Specifically, probability-based threshold pruning may be used: first find the maximum accumulated path probability among the current active nodes, then compute the difference between each active node's accumulated path probability and this maximum; every active node whose difference exceeds a preset pruning threshold is marked inactive and removed from the subsequent search paths, which ends any further search starting from that node.
In embodiments of the present invention, this pruning is performed after the historical paths of all active nodes obtained by decoding the current speech frame have been boosted according to the flow of Fig. 4. New active nodes are then determined from the accumulated path probabilities of all current active nodes (both boosted and unboosted), and the subsequent paths are extended.
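Threshold pruning, and why excitation is applied before it, can be sketched as below. Node names, scores and the beam width are made-up illustrative values in a log-score domain; a node survives when its score is no more than `beam` below the best score.

```python
def prune(scores, beam):
    """Deactivate nodes whose score is more than `beam` below the current best (log domain)."""
    best = max(scores.values())
    return {node: s for node, s in scores.items() if best - s <= beam}

raw = {"hot": -1200.0, "plain": -1000.0, "weak": -1300.0}
without_excitation = prune(raw, beam=150.0)
boosted = dict(raw, hot=raw["hot"] + 300.0)   # Fig. 4 bonus applied first
with_excitation = prune(boosted, beam=150.0)
```

Pruned without excitation, the hot-word path is 200 below the best and is removed; boosted first, it becomes the best path and survives together with "plain".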
In the flow of Fig. 4, a constant bonus raises the priority of paths on which hot words lie, thereby optimizing those paths.
In practice, hot words often consist of two or more word segments. Because the above method applies excitation only once a hot-word entry is completely matched, the path containing the hot-word entry may be pruned too early, before the excitation can take effect, which lowers hot-word recognition accuracy. For this reason, embodiments of the present invention may instead use a prediction-path-based method that progressively raises the priority of paths containing hot words through step-by-step excitation.
2. Boosting the historical paths obtained by decoding according to the hot-word dictionary
Fig. 5 is a flowchart of boosting the historical paths obtained by decoding according to the hot-word dictionary in an embodiment of the present invention; the flow comprises the following steps:
Step 501: obtain the historical paths and accumulated path probabilities of all active nodes produced by decoding the current speech frame.
Step 502: determine whether a new word appears on the historical path of the active node; if so, go to step 503; otherwise, go to step 504.
Step 503: update the accumulated path probability of the historical path according to how the new word extends the corresponding hot-word segments of the hot-word dictionary on that historical path.
Step 504: keep the accumulated path probability of the historical path unchanged.
Note that steps 502 to 504 are performed for the historical paths of all active nodes in every speech frame. Before the next speech frame is decoded, the historical paths of active nodes may also be pruned to delete unlikely paths and improve subsequent search efficiency, using the same probability-based threshold pruning described for Fig. 4: every active node whose accumulated path probability falls below the current maximum by more than a preset pruning threshold is marked inactive and removed from the subsequent search paths, ending any further search from that node.
Unlike the approach above, which boosts a historical path only when a hot word is fully matched, step 503 boosts historical paths during decoding as soon as a hot word is partially matched; that is, it updates the accumulated path probability of a historical path according to the path's extension result. Raising the accumulated path probability early better ensures that paths on which hot words lie survive.
Step 503 can be implemented in several different ways, for example:
Fig. 6 is a flowchart of one implementation of updating the accumulated path probability of a historical path according to its extension result in an embodiment of the present invention; the flow comprises the following steps:
Step 601: determine whether the new word is the follow-up segment of the corresponding hot-word segment on the historical path; if so, go to step 605; otherwise, go to step 602.
Step 602: determine whether the new word is an initial segment in the hot-word dictionary; if so, go to step 603; otherwise, go to step 604.
Step 603: add the bonus corresponding to the initial segment to the accumulated path probability of the historical path.
Step 604: keep the accumulated path probability of the historical path unchanged.
Step 605: add the bonus corresponding to the follow-up segment to the accumulated path probability of the historical path.
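The Fig. 6 logic can be sketched as a small state update per new word. The sketch assumes the hot-word dictionary is a dict-of-dicts prefix tree keyed by segments, that each path remembers the tree node reached by its current partial match, and that a hypothetical `bonus` function supplies the per-segment bonus; resetting the match in step 604 is also an assumption, since the flowchart only says the score is kept. Bonuses earned by a hot word that is never completed remain on the path, which is the weakness the Fig. 7 variant addresses.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PathState:
    score: float
    match: Optional[dict] = None   # tree node reached by the current partial hot-word match

def on_new_word_fig6(path: PathState, word: str, trie: dict,
                     bonus: Callable[[str], float]) -> PathState:
    if path.match is not None and word in path.match:   # step 601: follow-up segment?
        path.score += bonus(word)                         # step 605
        path.match = path.match[word]
    elif word in trie:                                    # step 602: initial segment?
        path.score += bonus(word)                         # step 603
        path.match = trie[word]
    else:                                                 # step 604: score kept
        path.match = None
    return path

def flat_bonus(word: str) -> float:
    return 10.0   # hypothetical constant per-segment bonus

trie = {"Word1": {"Word2": {"Word3": {}}}, "Word5": {}}
path = PathState(score=0.0)
for w in ["Word1", "Word2", "Word9"]:
    on_new_word_fig6(path, w, trie, flat_bonus)
```

After "Word1" and "Word2" the path has earned 20 points; "Word9" neither continues nor starts a hot word, so the score stays at 20 even though hot word Word1 Word2 Word3 was never completed.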
Fig. 7 is a flowchart of another implementation of updating the accumulated path probability of a historical path according to its extension result in an embodiment of the present invention; the flow comprises the following steps:
Step 701: determine whether the words preceding the new word on the word sequence of its historical path form a complete hot word in the hot-word dictionary; if so, go to step 705; otherwise, go to step 702.
Step 702: determine whether the new word is the follow-up segment of the corresponding hot-word segment on the historical path; if so, go to step 703; otherwise, go to step 704.
Step 703: add the bonus corresponding to the follow-up segment to the accumulated path probability of the historical path.
Step 704: cancel the bonuses previously added to the accumulated path probability of the historical path.
Step 705: determine whether the new word is an initial segment in the hot-word dictionary; if so, go to step 706; otherwise, go to step 707.
Step 706: add the bonus corresponding to the initial segment to the accumulated path probability of the historical path.
Step 707: keep the accumulated path probability of the historical path unchanged.
Utilize flow process shown in Figure 7, can further avoid by error non-hot speech being belonged to the excitation in historical path.
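The Figure 7 flow differs from Figure 6 mainly in step 704, which needs to know how much bonus the current partial match has contributed. The following Python sketch therefore records that amount in a hypothetical `pending` field. Further assumptions: a path with no open partial match is treated like one whose preceding words end in a complete hot word (so step 705 applies), the hot-word dictionary maps each hot word (a tuple of segments) to preset per-segment bonuses, and the names `PathState` and `update_fig7` are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

HotWordDict = Dict[Tuple[str, ...], List[float]]  # hot word -> per-segment bonuses (assumed)


@dataclass
class PathState:
    score: float                                         # accumulated path probability (log domain)
    match: Optional[Tuple[Tuple[str, ...], int]] = None  # open match: (hot word, last matched index)
    pending: float = 0.0                                 # bonus earned by the open match, still cancellable


def update_fig7(state: PathState, new_word: str, hot_dict: HotWordDict) -> PathState:
    # Step 701: do the words before the new word end in a complete hot word?
    # Assumption: "no open match" is handled the same way.
    complete = state.match is None or state.match[1] == len(state.match[0]) - 1
    if complete:
        # Step 705: is the new word an initial segment?
        for hot, bonuses in hot_dict.items():
            if hot[0] == new_word:
                # Step 706: add the initial-segment bonus and open a new match.
                return PathState(state.score + bonuses[0], (hot, 0), bonuses[0])
        # Step 707: keep the score; bonuses of a completed hot word stay.
        return PathState(state.score)
    hot, idx = state.match
    # Step 702: is the new word the next segment of the open match?
    if hot[idx + 1] == new_word:
        bonus = hot_dict[hot][idx + 1]
        # Step 703: add the subsequent-segment bonus.
        return PathState(state.score + bonus, (hot, idx + 1), state.pending + bonus)
    # Step 704: the match broke off, so cancel the bonuses it added earlier.
    return PathState(state.score - state.pending)
```

Because the bonus of a completed hot word is never subtracted, only paths that actually contain a whole hot word keep their boost.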
Moreover, in practice, if the history path of each active node keeps only one hot-word segment match record, decoding cannot guarantee that every path containing a hot word will be found. For example, suppose the word sequence of hot word A is Word1 Word2 Word3 and that of hot word B is Word2 Word4. When the input word sequence is Word1 Word2 Word4, decoding "Word2" preferentially matches it as the Word2 segment of hot word A and ignores the Word2 segment of hot word B, so the word sequence Word1 Word2 Word4 ultimately fails to match hot word B (Word2 Word4). To address this, embodiments of the present invention may keep multiple hot-word segment match histories for the history path of a single active node, which makes hot-word matching more reliable. That is, when "Word2" is decoded, both the partial match "Word1 Word2" of hot word A and the partial match "Word2" of hot word B are kept, and the multiple boost values are stored with the same history path. Later, when subsequent decoding determines that a particular partial match cannot be extended further, the bonus that this partial match added to the accumulated path probability of the history path is cancelled.
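The multi-history variant can be sketched by letting each history path carry a list of open partial matches instead of a single one. In this hypothetical Python sketch every open match records the bonus it has contributed; a match that cannot be extended has that bonus cancelled, and a match that completes keeps it. The data layout (hot word as a tuple of segments with per-segment bonuses) and the names `PartialMatch`, `PathState` and `extend` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

HotWordDict = Dict[Tuple[str, ...], List[float]]  # hot word -> per-segment bonuses (assumed)


@dataclass
class PartialMatch:
    hot: Tuple[str, ...]  # hot word being matched
    idx: int              # index of its last matched segment
    bonus: float          # bonus this match has added so far (cancellable)


@dataclass
class PathState:
    score: float = 0.0
    partials: List[PartialMatch] = field(default_factory=list)
    completed: List[Tuple[str, ...]] = field(default_factory=list)


def extend(state: PathState, new_word: str, hot_dict: HotWordDict) -> PathState:
    score = state.score
    partials: List[PartialMatch] = []
    completed = list(state.completed)
    # Try to extend every open partial match kept for this history path.
    for pm in state.partials:
        nxt = pm.idx + 1
        if pm.hot[nxt] == new_word:
            b = hot_dict[pm.hot][nxt]
            score += b
            if nxt == len(pm.hot) - 1:
                completed.append(pm.hot)  # complete: its bonus becomes permanent
            else:
                partials.append(PartialMatch(pm.hot, nxt, pm.bonus + b))
        else:
            score -= pm.bonus  # cannot be extended: cancel the bonus it produced
    # Independently, open a new partial match for every hot word starting here.
    for hot, bonuses in hot_dict.items():
        if hot[0] == new_word:
            score += bonuses[0]
            if len(hot) == 1:
                completed.append(hot)
            else:
                partials.append(PartialMatch(hot, 0, bonuses[0]))
    return PathState(score, partials, completed)
```

On the Word1 Word2 Word4 example, "Word2" both extends hot word A and opens hot word B; when "Word4" arrives, A's two bonuses are cancelled and B completes, so the path ends with B's bonuses only.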
In practice, the path probability can be boosted according to how closely the decoded word matches a hot-word segment. For example, when decoding yields a new word, the system's preset weight for the corresponding hot-word segment can be retrieved and used to boost the path. In particular, the boosting process can be simplified by setting weights only for the first and last characters of the segments in the hot-word dictionary and setting all other segment weights to 0. For example, suppose the current hot-word dictionary, shown in Figure 8, consists of 中国 (China), 中华 (China, the Chinese nation), 人民 (the people) and 人们 (people). During decoding, if 中 or 人 appears on the history path of an active node, that is, a character matching an initial character in the hot-word dictionary, the history path receives the score boost corresponding to the hot-word segment 中 or 人; in other words, the bonus corresponding to that initial character is added to the accumulated path probability of the history path. Later, when the path containing the hot-word segment is extended, if 国 or 华 follows 中, or 民 or 们 follows 人, so that the extended path forms the hot word 中国, 中华, 人民 or 人们 or part of a hot word, the path continues to receive the score boost corresponding to 国, 华, 民 or 们. Conversely, if the path containing the hot-word segment is extended to some other, non-hot word, the path is not boosted, or the boosts added earlier are removed.
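The character-level example of Figure 8 can be sketched as follows, using the variant that removes earlier boosts when a path leaves the hot word. The head and tail weights (1.0 and 2.0) are hypothetical preset values, `boost_path` is an illustrative name, and the sketch scores a whole character sequence at once rather than incrementally inside a decoder. It also assumes no hot word is a proper prefix of another, which holds for the Figure 8 dictionary.

```python
from typing import List, Tuple

HOT_WORDS = ["中国", "中华", "人民", "人们"]  # the Figure 8 hot-word dictionary
HEAD_WEIGHT = 1.0  # hypothetical preset weight of a hot word's first character
TAIL_WEIGHT = 2.0  # hypothetical preset weight of its last character


def char_weight(hot: str, i: int) -> float:
    # Only the first and last characters carry a weight; all others are 0.
    if i == 0:
        return HEAD_WEIGHT
    if i == len(hot) - 1:
        return TAIL_WEIGHT
    return 0.0


def boost_path(chars: str, hot_words: List[str]) -> Tuple[float, List[str]]:
    total = 0.0    # total boost applied to the path
    prefix = ""    # characters of the hot word currently being matched
    pending = 0.0  # boost earned by that prefix, removed if the match breaks
    found: List[str] = []
    for ch in chars:
        cand = prefix + ch
        matches = [h for h in hot_words if h.startswith(cand)]
        if not matches:
            # Path extended to a non-hot word: remove the boosts added for the prefix.
            total -= pending
            prefix, pending = "", 0.0
            cand = ch
            matches = [h for h in hot_words if h.startswith(cand)]
            if not matches:
                continue  # this character earns no boost
        hot = cand if cand in hot_words else matches[0]
        w = char_weight(hot, len(cand) - 1)
        total += w
        pending += w
        prefix = cand
        if cand in hot_words:
            # A complete hot word keeps its boosts.
            found.append(cand)
            prefix, pending = "", 0.0
    return total, found
```

For instance, 中国人们 earns 1.0 + 2.0 for 中国 and 1.0 + 2.0 for 人们, while 中心 earns nothing because the boost given to 中 is removed when 心 follows.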
It can thus be seen that the speech recognition method of this embodiment uses a boosting method based on predicted paths: stepwise boosting continually raises the priority of paths containing hot words, which optimizes those paths and improves hot-word recognition.
Correspondingly, an embodiment of the invention further provides a speech recognition system; Figure 9 is a schematic structural diagram of this system.
In this embodiment, the system comprises:
a network construction unit 901, configured to build a decoding recognition network;
a decoding unit 902, configured to decode each speech signal frame of a received speech signal according to the decoding recognition network;
a boosting unit 903, configured to boost the history paths of active nodes according to hot words during decoding by the decoding unit 902, so as to increase the accumulated path probability of paths containing hot words;
an optimal node determination unit 904, configured to select, after the decoding unit 902 has decoded the last speech signal frame, the active node with the maximum accumulated probability as the optimal node;
a backtracking unit 905, configured to backtrack from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.
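The work of units 904 and 905 can be sketched with back-pointers: if each active node's best history path is stored as a chain of tokens, choosing the optimal node is a maximum over the final active nodes, and backtracking follows the chain back to the start. The `Token` structure, the integer node ids and the log-domain scores are assumptions; the patent does not specify how decoding states are stored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    score: float             # accumulated path log-probability at this node
    word: Optional[str]      # word emitted on entering this node (None if none)
    prev: Optional["Token"]  # back-pointer to the predecessor token


def best_word_sequence(active: Dict[int, Token]) -> List[str]:
    # Unit 904: the active node with the highest accumulated score is the optimal node.
    tok: Optional[Token] = max(active.values(), key=lambda t: t.score)
    # Unit 905: follow back-pointers to recover the word sequence (in reverse order).
    words: List[str] = []
    while tok is not None:
        if tok.word is not None:
            words.append(tok.word)
        tok = tok.prev
    return list(reversed(words))
```

Because every token points only to its predecessor, paths that share a prefix also share its tokens, so backtracking costs time proportional to the length of the optimal path.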
In embodiments of the present invention, the network construction unit 901 may build the decoding recognition network online or offline. With offline construction, the pre-built decoding recognition network is loaded directly at system startup, which reduces the system's computation and memory requirements and further improves decoding efficiency. The network construction unit 901 may build the network from a preset acoustic model, a preset language model and the like; this is not described in detail here.
In embodiments of the present invention, decoding the user's input speech signal with the decoding recognition network, as performed by the decoding unit 902, is the process of computing, frame by frame, the accumulated path probability of reaching each active node in the decoding recognition network. After the decoding unit 902 decodes each speech signal frame, the history paths and accumulated path probabilities of all current active nodes are available. The boosting unit 903 may then boost the history paths of active nodes according to hot words in a variety of ways, raising the accumulated path probability of paths containing hot words, as described in detail below.
It should be noted that, in practice, before the decoding unit 902 decodes the next speech signal frame, the history paths of the active nodes may also be optimized by deleting unlikely paths, to improve the efficiency of the subsequent search. Specifically, a probability-based pruning threshold may be used: first find the maximum accumulated path probability among the current active nodes, then compute the difference between each active node's accumulated path probability and this maximum. Any active node whose difference exceeds the preset pruning threshold is marked inactive and pruned from the subsequent search paths, so that no further search starts from that node.
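The probability-based pruning described above can be sketched as a beam relative to the best active node. In this Python sketch the scores are assumed to be log probabilities keyed by hypothetical node ids, and a node is kept when its difference from the best score does not exceed the threshold.

```python
from typing import Dict


def prune(active: Dict[int, float], threshold: float) -> Dict[int, float]:
    """Keep only active nodes whose score is within `threshold` of the best one."""
    if not active:
        return {}
    best = max(active.values())
    # Nodes whose difference from the best score exceeds the threshold become inactive.
    return {node: s for node, s in active.items() if best - s <= threshold}
```

This also illustrates why boosting helps hot words survive: with a threshold of 10, a hot-word path at -21 against a best score of -10 is pruned, whereas a bonus of 2 that lifts it to -19 keeps it in the search.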
It can thus be seen that the speech recognition system of the embodiments of the invention boosts history paths based on hot-word matching, optimizing the accumulated path probability of paths containing hot words and improving hot-word recognition. With this system, popular vocabulary and user-specific vocabulary can be recognized quickly and accurately without re-estimating system parameters. This offers a practical way for the system to support customized personalized dictionaries or personalized language models: the user obtains recognition support for personalized vocabulary simply by updating the hot-word dictionary supported by the system.
In embodiments of the present invention, users may define their own hot words at any time, which widens the system's recognition coverage and keeps pace with constantly changing vocabulary.
Figure 10 is a schematic diagram of a specific application structure of the speech recognition system of an embodiment of the invention.
In this embodiment, the system further comprises a hot-word set updating unit 911, configured to obtain hot words input by the user and save them to a hot-word set. The hot-word set may be a dynamically updated library of hot-word entries.
Correspondingly, the boosting unit 913 boosts the history paths of active nodes according to the hot-word set during decoding.
In this embodiment, the boosting unit 913 specifically comprises an acquisition subunit 9131, a judgment subunit 9132 and a boosting subunit 9133, where:
the acquisition subunit 9131 is configured to obtain the history paths and accumulated path probabilities of all active nodes obtained by the decoding unit from decoding the current speech signal frame;
the judgment subunit 9132 is configured to judge, according to the hot-word set, whether adjacent words on the history path form a hot word;
the boosting subunit 9133 is configured to increase the accumulated path probability of the history path after the judgment subunit 9132 determines that adjacent words on the history path form a hot word, and to keep the accumulated path probability of the history path unchanged after the judgment subunit 9132 determines that they do not. The boost may be a constant bonus that raises the priority of paths containing hot words so that such paths are more likely to be retained. The specific value of the constant bonus may be preset as needed; in general, the higher the bonus, the more strongly hot words are matched and the higher the hot-word recognition accuracy.
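The full-match check of the judgment subunit 9132 and the constant bonus of the boosting subunit 9133 can be sketched as follows. Assumptions: words on the history path are joined without separators (as for Chinese text) before being compared with the hot-word set; only word runs ending at the newest word are checked, so that a hot word is rewarded once, when it is completed; and the bonus value 5.0 and the example entry 科大讯飞 decoded as 科大 + 讯飞 are hypothetical.

```python
from typing import List, Set

CONSTANT_BONUS = 5.0  # hypothetical preset constant bonus (log domain)


def boost_on_full_match(words: List[str], score: float, hot_set: Set[str],
                        bonus: float = CONSTANT_BONUS) -> float:
    """Return the path score, raised by `bonus` if the newest words complete a hot word."""
    # Check every run of adjacent words that ends at the newest word.
    for n in range(1, len(words) + 1):
        if "".join(words[-n:]) in hot_set:
            return score + bonus
    # No hot word completed: keep the accumulated probability unchanged.
    return score
```

Because the path is rewarded only once the whole entry has been decoded, a partially matched hot word gets no help in the meantime, which is the weakness the predicted-path method of the next embodiment addresses.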
The speech recognition system of this embodiment gives an appropriate score boost when the word sequence on the history path of an active node forms a hot-word entry, thereby optimizing paths containing hot words and improving hot-word recognition.
In practice, however, a hot word often consists of two or more segments, so boosting only once a hot-word entry is fully matched may come too late: the path containing the hot-word entry may already have been pruned, which lowers hot-word recognition accuracy. Therefore, in another embodiment of the speech recognition system of the invention, the boosting unit may instead use the boosting method based on predicted paths, continually raising the priority of paths containing hot words through stepwise boosting.
Figure 11 is a schematic diagram of another specific application structure of the speech recognition system of an embodiment of the invention.
In this embodiment, the system further comprises a hot-word acquisition unit 921 and a segmentation unit 922. The hot-word acquisition unit 921 is configured to obtain hot words input by the user; the segmentation unit 922 is configured to perform text word segmentation on the hot words obtained by the hot-word acquisition unit and save the resulting segments to a hot-word dictionary.
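The patent does not say which segmentation algorithm the segmentation unit 922 uses. As one possible choice, the sketch below substitutes forward maximum matching against the recognizer's vocabulary, falling back to single characters, and stores each hot word as a tuple of segments with a uniform bonus per segment. The vocabulary, the maximum segment length, the bonus value and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def fmm_segment(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest vocabulary word at each position."""
    segments: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max(max_len, 1), len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:  # single characters are the fallback
                segments.append(piece)
                i += n
                break
    return segments


def build_hot_dict(hot_words: List[str], vocab: Set[str],
                   bonus: float = 1.0) -> Dict[Tuple[str, ...], List[float]]:
    """Segment each user hot word and attach the same bonus to every segment (assumed)."""
    hot_dict: Dict[Tuple[str, ...], List[float]] = {}
    for word in hot_words:
        segs = tuple(fmm_segment(word, vocab))
        hot_dict[segs] = [bonus] * len(segs)
    return hot_dict
```

Segmenting against the decoder's own vocabulary keeps hot-word segments aligned with the units that actually appear as new words on history paths.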
Correspondingly, the boosting unit 923 boosts the history paths of active nodes according to the hot-word dictionary during decoding.
In this embodiment, the boosting unit 923 specifically comprises an acquisition subunit 9231, a new-word judgment subunit 9232 and a boosting subunit 9233, where:
the acquisition subunit 9231 is configured to obtain the history paths and accumulated path probabilities of all active nodes obtained by decoding the current speech signal frame;
the new-word judgment subunit 9232 is configured to judge whether a new word appears on the history path of an active node;
the boosting subunit 9233 is configured, after the new-word judgment subunit 9232 determines that a new word appears on the history path of the active node, to update the accumulated path probability of the history path according to the extension result of the new word relative to the corresponding hot-word dictionary segment on the history path, and otherwise to keep the accumulated path probability of the history path unchanged.
In this embodiment, unlike the boosting subunit 9133 of Figure 10, which boosts a history path only when a hot word is fully matched, the boosting subunit 9233 boosts the history path as soon as a hot word is partially matched during decoding; that is, it updates the accumulated path probability of the history path according to the extension result of the history path. Raising the path's accumulated probability early better ensures that paths containing hot words survive. The boosting subunit 9233 can be implemented in several ways; examples are described in detail below.
Figure 12 is a schematic diagram of a specific structure of the boosting subunit in an embodiment of the invention.
In this embodiment, the boosting subunit comprises:
a first judgment subunit 121, configured to judge whether the new word is the next segment of the hot-word segment matched on the history path;
a first processing subunit 122, configured to add, after the first judgment subunit 121 determines that the new word is the next segment of the hot-word segment matched on the history path, the bonus corresponding to that subsequent segment to the accumulated path probability of the history path;
a second judgment subunit 123, configured to judge, after the first judgment subunit 121 determines that the new word is not the next segment of the hot-word segment matched on the history path, whether the new word is an initial segment in the hot-word dictionary;
a second processing subunit 124, configured to add, after the second judgment subunit 123 determines that the new word is an initial segment in the hot-word dictionary, the bonus corresponding to that initial segment to the accumulated path probability of the history path, and otherwise to keep the accumulated path probability of the history path unchanged.
Figure 13 is a schematic diagram of another specific structure of the boosting subunit in an embodiment of the invention.
In this embodiment, the boosting subunit comprises:
a hot-word judgment subunit 131, configured to judge whether the word sequence preceding the new word on its history path is a complete hot word in the hot-word dictionary;
a third judgment subunit 132, configured to judge, when the hot-word judgment subunit 131 determines that the word sequence preceding the new word on its history path is a complete hot word in the hot-word dictionary, whether the new word is an initial segment in the hot-word dictionary;
a third processing subunit 133, configured to add, after the third judgment subunit 132 determines that the new word is an initial segment in the hot-word dictionary, the bonus corresponding to that initial segment to the accumulated path probability of the history path, and otherwise to keep the accumulated path probability of the history path unchanged;
a fourth judgment subunit 134, configured to judge, when the hot-word judgment subunit 131 determines that the word sequence preceding the new word on its history path is not a complete hot word in the hot-word dictionary, whether the new word is the next segment of the hot-word segment matched on the history path;
a fourth processing subunit 135, configured to add, after the fourth judgment subunit 134 determines that the new word is the next segment of the hot-word segment matched on the history path, the bonus corresponding to that subsequent segment to the accumulated path probability of the history path, and otherwise to cancel the bonuses previously added to the accumulated path probability of the history path.
Of course, in practice the boosting subunit may be implemented in other ways as well, and this embodiment of the invention does not limit it. For example, multiple hot-word segment match histories may be kept for the history path of a single active node; when subsequent decoding determines that a particular partial match cannot be extended further, the bonus that this partial match added to the accumulated path probability of the history path is cancelled, which further improves the reliability of hot-word matching.
The speech recognition system of this embodiment uses the boosting method based on predicted paths: stepwise boosting continually raises the priority of paths containing hot words, which optimizes those paths and improves hot-word recognition.
The embodiments in this specification are described progressively: identical or similar parts can be found by cross-reference between embodiments, and each embodiment focuses on its differences from the others. In particular, the system embodiments are basically similar to the method embodiments and are therefore described briefly; for relevant details, refer to the description of the method embodiments. The system embodiments described above are merely illustrative: units and modules described as separate components may or may not be physically separate. Moreover, some or all of the units and modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
The above discloses only preferred embodiments of the present invention, but the present invention is not limited thereto. Any non-inventive variation that a person skilled in the art can conceive, and any improvements and modifications made without departing from the principle of the invention, shall fall within the protection scope of the present invention.

Claims (14)

1. A speech recognition method, characterized in that it comprises:
building a decoding recognition network;
decoding each speech signal frame of a received speech signal according to the decoding recognition network and, during decoding, boosting the history paths of active nodes according to hot words, so as to increase the accumulated path probability of paths containing hot words;
after the last speech signal frame has been decoded, selecting the active node with the maximum accumulated probability as the optimal node;
backtracking from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.
2. The method according to claim 1, characterized in that the method further comprises: obtaining hot words input by a user, and saving the hot words to a hot-word set;
wherein boosting the history paths of active nodes according to hot words during decoding comprises: boosting the history paths of active nodes according to the hot-word set during decoding.
3. The method according to claim 2, characterized in that boosting the history paths of active nodes according to the hot-word set during decoding comprises:
obtaining the history paths and accumulated path probabilities of all active nodes obtained by decoding the current speech signal frame;
judging, according to the hot-word set, whether adjacent words on the history path form a hot word;
if so, increasing the accumulated path probability of the history path;
if not, keeping the accumulated path probability of the history path unchanged.
4. The method according to claim 1, characterized in that the method further comprises: obtaining hot words input by a user, performing text word segmentation on the hot words, and saving the resulting segments to a hot-word dictionary;
wherein boosting the history paths of active nodes according to hot words during decoding comprises: boosting the history paths of active nodes according to the hot-word dictionary during decoding.
5. The method according to claim 4, characterized in that boosting the history paths of active nodes according to the hot-word dictionary during decoding comprises:
obtaining the history paths and accumulated path probabilities of all active nodes obtained by decoding the current speech signal frame;
judging whether a new word appears on the history path of the active node;
if so, updating the accumulated path probability of the history path according to the extension result of the new word relative to the corresponding hot-word dictionary segment on the history path;
if not, keeping the accumulated path probability of the history path unchanged.
6. The method according to claim 5, characterized in that updating the accumulated path probability of the history path according to the extension result of the new word relative to the corresponding hot-word dictionary segment on the history path comprises:
judging whether the new word is the next segment of the hot-word segment matched on the history path;
if so, adding the bonus corresponding to the subsequent segment to the accumulated path probability of the history path;
if not, judging whether the new word is an initial segment in the hot-word dictionary;
if so, adding the bonus corresponding to the initial segment to the accumulated path probability of the history path;
if not, keeping the accumulated path probability of the history path unchanged.
7. The method according to claim 5, characterized in that updating the accumulated path probability of the history path according to the extension result of the new word relative to the corresponding hot-word dictionary segment on the history path comprises:
if the word sequence preceding the new word on its history path is a complete hot word in the hot-word dictionary, judging whether the new word is an initial segment in the hot-word dictionary;
if so, adding the bonus corresponding to the initial segment to the accumulated path probability of the history path;
if not, keeping the accumulated path probability of the history path unchanged;
if the word sequence preceding the new word on its history path is not a complete hot word in the hot-word dictionary, judging whether the new word is the next segment of the hot-word segment matched on the history path;
if so, adding the bonus corresponding to the subsequent segment to the accumulated path probability of the history path;
if not, cancelling the bonuses previously added to the accumulated path probability of the history path.
8. A speech recognition system, characterized in that it comprises:
a network construction unit, configured to build a decoding recognition network;
a decoding unit, configured to decode each speech signal frame of a received speech signal according to the decoding recognition network;
a boosting unit, configured to boost the history paths of active nodes according to hot words during decoding by the decoding unit, so as to increase the accumulated path probability of paths containing hot words;
an optimal node determination unit, configured to select, after the decoding unit has decoded the last speech signal frame, the active node with the maximum accumulated probability as the optimal node;
a backtracking unit, configured to backtrack from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.
9. The system according to claim 8, characterized in that the system further comprises:
a hot-word set updating unit, configured to obtain hot words input by a user and save the hot words to a hot-word set;
wherein the boosting unit boosts the history paths of active nodes according to the hot-word set during decoding by the decoding unit.
10. The system according to claim 9, characterized in that the boosting unit comprises:
an acquisition subunit, configured to obtain the history paths and accumulated path probabilities of all active nodes obtained by decoding the current speech signal frame;
a judgment subunit, configured to judge, according to the hot-word set, whether adjacent words on the history path form a hot word;
a boosting subunit, configured to increase the accumulated path probability of the history path after the judgment subunit determines that adjacent words on the history path form a hot word, and to keep the accumulated path probability of the history path unchanged after the judgment subunit determines that adjacent words on the history path do not form a hot word.
11. The system according to claim 8, characterized in that the system further comprises:
a hot-word acquisition unit, configured to obtain hot words input by a user;
a segmentation unit, configured to perform text word segmentation on the hot words obtained by the hot-word acquisition unit and save the resulting segments to a hot-word dictionary;
wherein the boosting unit boosts the history paths of active nodes according to the hot-word dictionary during decoding by the decoding unit.
12. The system according to claim 11, characterized in that the boosting unit comprises:
an acquisition subunit, configured to obtain the history paths and accumulated path probabilities of all active nodes obtained by decoding the current speech signal frame;
a new-word judgment subunit, configured to judge whether a new word appears on the history path of the active node;
a boosting subunit, configured, after the new-word judgment subunit determines that a new word appears on the history path of the active node, to update the accumulated path probability of the history path according to the extension result of the new word relative to the corresponding hot-word dictionary segment on the history path, and otherwise to keep the accumulated path probability of the history path unchanged.
13. The system according to claim 12, characterized in that the boosting subunit comprises:
a first judgment subunit, configured to judge whether the new word is the next segment of the hot-word segment matched on the history path;
a first processing subunit, configured to add, after the first judgment subunit determines that the new word is the next segment of the hot-word segment matched on the history path, the bonus corresponding to the subsequent segment to the accumulated path probability of the history path;
a second judgment subunit, configured to judge, after the first judgment subunit determines that the new word is not the next segment of the hot-word segment matched on the history path, whether the new word is an initial segment in the hot-word dictionary;
a second processing subunit, configured to add, after the second judgment subunit determines that the new word is an initial segment in the hot-word dictionary, the bonus corresponding to the initial segment to the accumulated path probability of the history path, and otherwise to keep the accumulated path probability of the history path unchanged.
14. The system according to claim 12, characterized in that the boosting subunit comprises:
a hot-word judgment subunit, configured to judge whether the word sequence preceding the new word on its history path is a complete hot word in the hot-word dictionary;
a third judgment subunit, configured to judge, when the hot-word judgment subunit determines that the word sequence preceding the new word on its history path is a complete hot word in the hot-word dictionary, whether the new word is an initial segment in the hot-word dictionary;
a third processing subunit, configured to add, after the third judgment subunit determines that the new word is an initial segment in the hot-word dictionary, the bonus corresponding to the initial segment to the accumulated path probability of the history path, and otherwise to keep the accumulated path probability of the history path unchanged;
a fourth judgment subunit, configured to judge, when the hot-word judgment subunit determines that the word sequence preceding the new word on its history path is not a complete hot word in the hot-word dictionary, whether the new word is the next segment of the hot-word segment matched on the history path;
a fourth processing subunit, configured to add, after the fourth judgment subunit determines that the new word is the next segment of the hot-word segment matched on the history path, the bonus corresponding to the subsequent segment to the accumulated path probability of the history path, and otherwise to cancel the bonuses previously added to the accumulated path probability of the history path.
CN2012100734129A 2012-03-19 2012-03-19 Voice recognition method and system Active CN102592595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100734129A CN102592595B (en) 2012-03-19 2012-03-19 Voice recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012100734129A CN102592595B (en) 2012-03-19 2012-03-19 Voice recognition method and system

Publications (2)

Publication Number Publication Date
CN102592595A true CN102592595A (en) 2012-07-18
CN102592595B CN102592595B (en) 2013-05-29

Family

ID=46481136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012100734129A Active CN102592595B (en) 2012-03-19 2012-03-19 Voice recognition method and system

Country Status (1)

Country Link
CN (1) CN102592595B (en)

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065630A (en) * 2012-12-28 2013-04-24 安徽科大讯飞信息科技股份有限公司 User personalized information voice recognition method and user personalized information voice recognition system
CN103903619A (en) * 2012-12-28 2014-07-02 安徽科大讯飞信息科技股份有限公司 Method and system for improving accuracy of speech recognition
CN103971686A (en) * 2013-01-30 2014-08-06 腾讯科技(深圳)有限公司 Method and system for automatically recognizing voice
CN106847265A (en) * 2012-10-18 2017-06-13 谷歌公司 For the method and system that the speech recognition using search inquiry information is processed
CN107146610A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 A kind of determination method and device of user view
CN108074562A (en) * 2016-11-11 2018-05-25 株式会社东芝 Speech recognition equipment, audio recognition method and storage medium
CN108682415A (en) * 2018-05-23 2018-10-19 广州视源电子科技股份有限公司 voice search method, device and system
CN108735201A (en) * 2018-06-29 2018-11-02 广州视源电子科技股份有限公司 Continuous speech recognition method, apparatus, equipment and storage medium
CN109213777A (en) * 2017-06-29 2019-01-15 杭州九阳小家电有限公司 A kind of voice-based recipe processing method and system
CN109389970A (en) * 2018-12-28 2019-02-26 合肥凯捷技术有限公司 A kind of speech analysis recognition methods
CN109524017A (en) * 2018-11-27 2019-03-26 北京分音塔科技有限公司 A kind of the speech recognition Enhancement Method and device of user's custom words
CN109902306A (en) * 2019-03-12 2019-06-18 珠海格力电器股份有限公司 A kind of audio recognition method, device, storage medium and speech ciphering equipment
CN110110294A (en) * 2019-03-26 2019-08-09 北京捷通华声科技股份有限公司 A kind of method, apparatus and readable storage medium storing program for executing of dynamic inversely decoding
CN110164416A (en) * 2018-12-07 2019-08-23 腾讯科技(深圳)有限公司 A kind of audio recognition method and its device, equipment and storage medium
CN110349569A (en) * 2019-07-02 2019-10-18 苏州思必驰信息科技有限公司 The training and recognition methods of customized product language model and device
CN111028830A (en) * 2019-12-26 2020-04-17 大众问问(北京)信息科技有限公司 Local hot word bank updating method, device and equipment
CN111063353A (en) * 2019-12-31 2020-04-24 苏州思必驰信息科技有限公司 Client processing method allowing user-defined voice interactive content and user terminal
CN111081226A (en) * 2018-10-18 2020-04-28 北京搜狗科技发展有限公司 Speech recognition decoding optimization method and device
CN111354347A (en) * 2018-12-21 2020-06-30 中国科学院声学研究所 Voice recognition method and system based on self-adaptive hot word weight
CN111354348A (en) * 2018-12-21 2020-06-30 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN111402895A (en) * 2020-06-08 2020-07-10 腾讯科技(深圳)有限公司 Voice processing method, voice evaluating method, voice processing device, voice evaluating device, computer equipment and storage medium
CN111508478A (en) * 2020-04-08 2020-08-07 北京字节跳动网络技术有限公司 Speech recognition method and device
CN111583909A (en) * 2020-05-18 2020-08-25 科大讯飞股份有限公司 Voice recognition method, device, equipment and storage medium
WO2021051514A1 (en) * 2019-09-20 2021-03-25 平安科技(深圳)有限公司 Speech identification method and apparatus, computer device and non-volatile storage medium
CN112634904A (en) * 2020-12-22 2021-04-09 北京有竹居网络技术有限公司 Hot word recognition method, device, medium and electronic equipment
CN113096648A (en) * 2021-03-20 2021-07-09 杭州知存智能科技有限公司 Real-time decoding method and device for speech recognition
CN113223504A (en) * 2021-04-30 2021-08-06 平安科技(深圳)有限公司 Acoustic model training method, device, equipment and storage medium
CN113436614A (en) * 2021-07-02 2021-09-24 科大讯飞股份有限公司 Speech recognition method, apparatus, device, system and storage medium
CN113450803A (en) * 2021-06-09 2021-09-28 上海明略人工智能(集团)有限公司 Conference recording transfer method, system, computer equipment and readable storage medium
WO2021189624A1 (en) * 2020-03-27 2021-09-30 京东数字科技控股有限公司 Method and apparatus for decoding voice data, computer device and storage medium
CN113516967A (en) * 2021-08-04 2021-10-19 青岛信芯微电子科技股份有限公司 Voice recognition method and device
KR20220035222A (en) * 2019-11-25 2022-03-21 아이플라이텍 캄파니 리미티드 Speech recognition error correction method, related devices, and readable storage medium
CN117351944A (en) * 2023-12-06 2024-01-05 科大讯飞股份有限公司 Speech recognition method, device, equipment and readable storage medium
CN117437909A (en) * 2023-12-20 2024-01-23 慧言科技(天津)有限公司 Speech recognition model construction method based on hotword feature vector self-attention mechanism
CN111354348B (en) * 2018-12-21 2024-04-26 北京搜狗科技发展有限公司 Data processing method and device for data processing

Citations (5)

Publication number Priority date Publication date Assignee Title
EP0535929A2 (en) * 1991-09-30 1993-04-07 Kurzweil Applied Intelligence, Inc. Speech recognition system
US5345537A (en) * 1990-12-19 1994-09-06 Fujitsu Limited Network reformer and creator
US20030004721A1 (en) * 2001-06-27 2003-01-02 Guojun Zhou Integrating keyword spotting with graph decoder to improve the robustness of speech recognition
US20080154600A1 (en) * 2006-12-21 2008-06-26 Nokia Corporation System, Method, Apparatus and Computer Program Product for Providing Dynamic Vocabulary Prediction for Speech Recognition
CN101739437A (en) * 2009-11-26 2010-06-16 杭州鑫方软件有限公司 Implementation method for network sound-searching unit and specific device thereof

Non-Patent Citations (1)

Title
Wang Xiaorui et al.: "A language model adaptation framework for broadcast speech recognition", Journal of Chinese Information Processing *

Cited By (59)

Publication number Priority date Publication date Assignee Title
CN106847265A (en) * 2012-10-18 2017-06-13 谷歌公司 For the method and system that the speech recognition using search inquiry information is processed
US9564127B2 (en) 2012-12-28 2017-02-07 Iflytek Co., Ltd. Speech recognition method and system based on user personalized information
CN103903619B (en) * 2012-12-28 2016-12-28 科大讯飞股份有限公司 A kind of method and system improving speech recognition accuracy
CN103903619A (en) * 2012-12-28 2014-07-02 安徽科大讯飞信息科技股份有限公司 Method and system for improving accuracy of speech recognition
CN103065630A (en) * 2012-12-28 2013-04-24 安徽科大讯飞信息科技股份有限公司 User personalized information voice recognition method and user personalized information voice recognition system
CN103065630B (en) * 2012-12-28 2015-01-07 科大讯飞股份有限公司 User personalized information voice recognition method and user personalized information voice recognition system
WO2014117577A1 (en) * 2013-01-30 2014-08-07 Tencent Technology (Shenzhen) Company Limited Method and system for automatic speech recognition
US9472190B2 (en) 2013-01-30 2016-10-18 Tencent Technology (Shenzhen) Company Limited Method and system for automatic speech recognition
CN103971686A (en) * 2013-01-30 2014-08-06 腾讯科技(深圳)有限公司 Method and system for automatically recognizing voice
CN103971686B (en) * 2013-01-30 2015-06-10 腾讯科技(深圳)有限公司 Method and system for automatically recognizing voice
CN108074562B (en) * 2016-11-11 2021-12-03 株式会社东芝 Speech recognition apparatus, speech recognition method, and storage medium
CN108074562A (en) * 2016-11-11 2018-05-25 株式会社东芝 Speech recognition equipment, audio recognition method and storage medium
CN107146610B (en) * 2017-04-10 2021-06-15 易视星空科技无锡有限公司 Method and device for determining user intention
CN107146610A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 A kind of determination method and device of user view
CN109213777A (en) * 2017-06-29 2019-01-15 杭州九阳小家电有限公司 A kind of voice-based recipe processing method and system
CN108682415A (en) * 2018-05-23 2018-10-19 广州视源电子科技股份有限公司 voice search method, device and system
CN108735201A (en) * 2018-06-29 2018-11-02 广州视源电子科技股份有限公司 Continuous speech recognition method, apparatus, equipment and storage medium
CN111081226B (en) * 2018-10-18 2024-02-13 北京搜狗科技发展有限公司 Speech recognition decoding optimization method and device
CN111081226A (en) * 2018-10-18 2020-04-28 北京搜狗科技发展有限公司 Speech recognition decoding optimization method and device
CN109524017A (en) * 2018-11-27 2019-03-26 北京分音塔科技有限公司 A kind of the speech recognition Enhancement Method and device of user's custom words
CN110164416B (en) * 2018-12-07 2023-05-09 腾讯科技(深圳)有限公司 Voice recognition method and device, equipment and storage medium thereof
CN110164416A (en) * 2018-12-07 2019-08-23 腾讯科技(深圳)有限公司 A kind of audio recognition method and its device, equipment and storage medium
CN111354348B (en) * 2018-12-21 2024-04-26 北京搜狗科技发展有限公司 Data processing method and device for data processing
CN111354347A (en) * 2018-12-21 2020-06-30 中国科学院声学研究所 Voice recognition method and system based on self-adaptive hot word weight
CN111354348A (en) * 2018-12-21 2020-06-30 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN111354347B (en) * 2018-12-21 2023-08-15 中国科学院声学研究所 Speech recognition method and system based on self-adaptive hotword weight
CN109389970A (en) * 2018-12-28 2019-02-26 合肥凯捷技术有限公司 A kind of speech analysis recognition methods
CN109902306B (en) * 2019-03-12 2021-02-02 珠海格力电器股份有限公司 Voice recognition method, device, storage medium and voice equipment
CN109902306A (en) * 2019-03-12 2019-06-18 珠海格力电器股份有限公司 A kind of audio recognition method, device, storage medium and speech ciphering equipment
CN110110294B (en) * 2019-03-26 2021-02-02 北京捷通华声科技股份有限公司 Dynamic reverse decoding method, device and readable storage medium
CN110110294A (en) * 2019-03-26 2019-08-09 北京捷通华声科技股份有限公司 A kind of method, apparatus and readable storage medium storing program for executing of dynamic inversely decoding
CN110349569A (en) * 2019-07-02 2019-10-18 苏州思必驰信息科技有限公司 The training and recognition methods of customized product language model and device
WO2021051514A1 (en) * 2019-09-20 2021-03-25 平安科技(深圳)有限公司 Speech identification method and apparatus, computer device and non-volatile storage medium
KR20220035222A (en) * 2019-11-25 2022-03-21 아이플라이텍 캄파니 리미티드 Speech recognition error correction method, related devices, and readable storage medium
KR102648306B1 (en) * 2019-11-25 2024-03-15 아이플라이텍 캄파니 리미티드 Speech recognition error correction method, related devices, and readable storage medium
CN111028830B (en) * 2019-12-26 2022-07-15 大众问问(北京)信息科技有限公司 Local hot word bank updating method, device and equipment
CN111028830A (en) * 2019-12-26 2020-04-17 大众问问(北京)信息科技有限公司 Local hot word bank updating method, device and equipment
CN111063353A (en) * 2019-12-31 2020-04-24 苏州思必驰信息科技有限公司 Client processing method allowing user-defined voice interactive content and user terminal
WO2021189624A1 (en) * 2020-03-27 2021-09-30 京东数字科技控股有限公司 Method and apparatus for decoding voice data, computer device and storage medium
CN111508478A (en) * 2020-04-08 2020-08-07 北京字节跳动网络技术有限公司 Speech recognition method and device
CN111583909B (en) * 2020-05-18 2024-04-12 科大讯飞股份有限公司 Voice recognition method, device, equipment and storage medium
WO2021232746A1 (en) * 2020-05-18 2021-11-25 科大讯飞股份有限公司 Speech recognition method, apparatus and device, and storage medium
JP7407968B2 (en) 2020-05-18 2024-01-04 アイフライテック カンパニー,リミテッド Speech recognition method, device, equipment and storage medium
CN111583909A (en) * 2020-05-18 2020-08-25 科大讯飞股份有限公司 Voice recognition method, device, equipment and storage medium
CN111402895A (en) * 2020-06-08 2020-07-10 腾讯科技(深圳)有限公司 Voice processing method, voice evaluating method, voice processing device, voice evaluating device, computer equipment and storage medium
WO2022134984A1 (en) * 2020-12-22 2022-06-30 北京有竹居网络技术有限公司 Hotword recognition method and apparatus, medium and electronic device
CN112634904A (en) * 2020-12-22 2021-04-09 北京有竹居网络技术有限公司 Hot word recognition method, device, medium and electronic equipment
CN113096648A (en) * 2021-03-20 2021-07-09 杭州知存智能科技有限公司 Real-time decoding method and device for speech recognition
CN113223504B (en) * 2021-04-30 2023-12-26 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of acoustic model
CN113223504A (en) * 2021-04-30 2021-08-06 平安科技(深圳)有限公司 Acoustic model training method, device, equipment and storage medium
CN113450803A (en) * 2021-06-09 2021-09-28 上海明略人工智能(集团)有限公司 Conference recording transfer method, system, computer equipment and readable storage medium
CN113450803B (en) * 2021-06-09 2024-03-19 上海明略人工智能(集团)有限公司 Conference recording transfer method, system, computer device and readable storage medium
CN113436614B (en) * 2021-07-02 2024-02-13 中国科学技术大学 Speech recognition method, device, equipment, system and storage medium
CN113436614A (en) * 2021-07-02 2021-09-24 科大讯飞股份有限公司 Speech recognition method, apparatus, device, system and storage medium
CN113516967A (en) * 2021-08-04 2021-10-19 青岛信芯微电子科技股份有限公司 Voice recognition method and device
CN117351944B (en) * 2023-12-06 2024-04-12 科大讯飞股份有限公司 Speech recognition method, device, equipment and readable storage medium
CN117351944A (en) * 2023-12-06 2024-01-05 科大讯飞股份有限公司 Speech recognition method, device, equipment and readable storage medium
CN117437909B (en) * 2023-12-20 2024-03-05 慧言科技(天津)有限公司 Speech recognition model construction method based on hotword feature vector self-attention mechanism
CN117437909A (en) * 2023-12-20 2024-01-23 慧言科技(天津)有限公司 Speech recognition model construction method based on hotword feature vector self-attention mechanism

Also Published As

Publication number Publication date
CN102592595B (en) 2013-05-29

Similar Documents

Publication Publication Date Title
CN102592595B (en) Voice recognition method and system
CN102376305B (en) Speech recognition method and system
WO2017076222A1 (en) Speech recognition method and apparatus
JP6550068B2 (en) Pronunciation prediction in speech recognition
CN103903619B (en) A kind of method and system improving speech recognition accuracy
CN106157953B (en) Continuous speech recognition method and system
US9070367B1 (en) Local speech recognition of frequent utterances
KR101237799B1 (en) Improving the robustness to environmental changes of a context dependent speech recognizer
CN108899013B (en) Voice search method and device and voice recognition system
CN110223673B (en) Voice processing method and device, storage medium and electronic equipment
CN102280106A (en) VWS method and apparatus used for mobile communication terminal
US11132509B1 (en) Utilization of natural language understanding (NLU) models
CN111862942B (en) Method and system for training mixed speech recognition model of Mandarin and Sichuan
CN103065630A (en) User personalized information voice recognition method and user personalized information voice recognition system
KR20210150497A (en) Context biasing for speech recognition
CN105304080A (en) Speech synthesis device and speech synthesis method
CN104157285A (en) Voice recognition method and device, and electronic equipment
CN104036774A (en) Method and system for recognizing Tibetan dialects
CN103035243A (en) Real-time feedback method and system of long voice continuous recognition and recognition result
CN108735201A (en) Continuous speech recognition method, apparatus, equipment and storage medium
CN107403619A (en) A kind of sound control method and system applied to bicycle environment
CN102436816A (en) Method and device for decoding voice data
CN106875936A (en) Audio recognition method and device
CN112509560B (en) Voice recognition self-adaption method and system based on cache language model
CN108389575A (en) Audio data recognition methods and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: IFLYTEK CO., LTD.

Free format text: FORMER NAME: ANHUI USTC IFLYTEK CO., LTD.

CP03 Change of name, title or address

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province 230088

Patentee after: Iflytek Co., Ltd.

Address before: No. 616 Huangshan Road, High-tech Development Zone, Hefei, Anhui Province 230088

Patentee before: Anhui USTC iFLYTEK Co., Ltd.