CN104123934A - Speech composition recognition method and system - Google Patents


Info

Publication number
CN104123934A
CN104123934A (application CN201410353819.6A)
Authority
CN
China
Prior art keywords
signal
voice
characteristic parameters
short
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410353819.6A
Other languages
Chinese (zh)
Inventor
黄昭鸣
周林灿
李宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tai Ge Electronics (shanghai) Co Ltd
Original Assignee
Tai Ge Electronics (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tai Ge Electronics (shanghai) Co Ltd filed Critical Tai Ge Electronics (shanghai) Co Ltd
Priority to CN201410353819.6A priority Critical patent/CN104123934A/en
Publication of CN104123934A publication Critical patent/CN104123934A/en
Pending legal-status Critical Current

Abstract

The invention discloses a speech composition (articulation) recognition method. The method includes: obtaining a sample signal, filtering and denoising it, quantizing it into a binary sample signal through an A/D (analog-to-digital) converter, and extracting from the binary sample signal a speech signal containing speech; extracting acoustic characteristic parameters from the speech signal; selecting and training an acoustic model, and estimating the parameters of the acoustic model from each acoustic characteristic parameter to obtain the optimal model parameters corresponding to the maximum likelihood values; and performing speech composition recognition by collecting a signal to be recognized and calculating, from the optimal model parameters, a probability value for each acoustic characteristic parameter of that signal to obtain a recognition result. The method can accurately recognize not only the content of speech but also the specific syllable-and-tone combination of each monosyllable. The invention further discloses a speech composition recognition system.

Description

Articulation ("speech composition") recognition method and system
Technical field
The present invention relates to speech recognition, and in particular to an articulation recognition method and system.
Background art
Articulation is the basis of speech production and is produced by the coordinated movement of the articulators (e.g., lower jaw, lips, tongue, soft palate). The smallest speech unit produced by articulatory movement is the phoneme; phonetics divides phonemes into two classes, vowels and consonants. The articulation recognition result for standard Mandarin comprises two parts: the syllable synthesized from a set of phonemes, and its tone. However, current articulation recognition technology cannot accurately distinguish words with identical syllables but different tones, and does not recognize at the phoneme level, so the recognition results are unsuitable for speech-language education.
To overcome these defects of the prior art — the inability to accurately recognize the content of speech, the inability to distinguish identical syllables carrying different tones, and recognition that is not phoneme-based, which make the results unsuitable for speech-language education — an articulation recognition method and system are proposed.
Summary of the invention
The present invention proposes an articulation recognition method comprising the following steps: acquiring a sample signal, filtering and denoising it, quantizing it into a binary sample signal by A/D conversion, and extracting from the binary sample signal a speech signal containing speech; extracting acoustic characteristic parameters from the speech signal, the parameters being used to identify syllables and tones; selecting and training an acoustic model, computing for each acoustic characteristic parameter its maximum likelihood under a hidden Markov model, and obtaining the optimal model parameters corresponding to the maximum likelihood values; and performing articulation recognition by collecting a signal to be recognized and computing, from the optimal model parameters, the probability value of each acoustic characteristic parameter of the signal to be recognized to obtain a recognition result.
In the articulation recognition method proposed by the present invention, the step of extracting the speech signal containing speech comprises: cutting the binary sample signal into a plurality of frames; computing the mean of the short-time autocorrelation maxima over at least one frame; using this mean to set the threshold of the short-time threshold-crossing rate for judging the current frame; judging from the short-time threshold-crossing rate whether the current frame is unvoiced or voiced; and judging all frames one by one until a start frame and an end frame are obtained, yielding the speech signal.
In the articulation recognition method proposed by the present invention, the short-time autocorrelation function is:

R̂_n(k) = Σ_{m=0}^{N−1} x_n(m) · x′_n(m+k);

where k is the lag index (up to the maximum lag count), R̂_n(k) is the short-time autocorrelation function, x_n(m) is the m-th sample of the speech signal, x′_n is the three-level quantized speech signal, and N is the number of speech-signal samples.
In the articulation recognition method proposed by the present invention, the short-time threshold-crossing rate is:

Z_n = Σ_{m=n−N+1}^{n} { |sgn[x_n(m) − T] − sgn[x_n(m−1) − T]| + |sgn[x_n(m) + T] − sgn[x_n(m−1) + T]| };

where sgn(x) = 1 for x ≥ 0 and −1 for x < 0;

and Z_n is the short-time threshold-crossing rate, T is the set threshold (a positive number), x_n(m) is the m-th sample of the speech signal, N is the number of speech-signal samples, and n is the frame index.
In the articulation recognition method proposed by the present invention, the method further comprises, after extracting the speech signal: emphasizing the high-frequency components of the speech signal; and windowing the speech signal with a window function.
In the articulation recognition method proposed by the present invention, the acoustic characteristic parameters comprise the first 12 Mel cepstral coefficients together with their first-order and second-order difference results, computed as follows: compute the power spectrum of the speech signal by fast Fourier transform; pass the power spectrum through a Mel filter bank to obtain the Mel spectrum; apply a discrete cosine transform to the Mel spectrum to obtain the Mel cepstral coefficients; and successively difference the Mel cepstral coefficients with respect to time to obtain the first-order and second-order difference results.
In the articulation recognition method proposed by the present invention, the acoustic characteristic parameters comprise the short-time log energy, expressed as:

E = log Σ_{n=1}^{N} s_n²;

where s_n is the discrete speech sequence, N is the total number of samples, and n is the sample index.
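As a rough illustration (not part of the patent text), the short-time log energy above can be sketched in Python; the small floor value guarding against log(0) on silent frames is an added assumption:

```python
import math

def short_time_log_energy(frame, floor=1e-10):
    """Short-time log energy E = log(sum of s_n^2) of one speech frame.

    `frame` is a list of samples; `floor` (an assumption, not in the
    patent) avoids log(0) on an all-zero frame.
    """
    energy = sum(s * s for s in frame)
    return math.log(max(energy, floor))
```

A frame with samples [1, 2, 2] has energy 9 and log energy log 9 ≈ 2.197.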
In the articulation recognition method proposed by the present invention, the step of obtaining the optimal model parameters comprises: computing the mean and covariance of the acoustic characteristic parameters; replacing the initial mean and covariance of the acoustic model with the mean and covariance of the acoustic characteristic parameters; estimating the model parameters of the acoustic model to obtain parameter estimates; substituting the parameter estimates into the acoustic model; and computing for each acoustic characteristic parameter its maximum likelihood under the hidden Markov model to obtain the optimal model parameters corresponding to the maximum likelihood values.
In the articulation recognition method proposed by the present invention, the parameter estimates are obtained by estimation according to the Baum-Welch algorithm.
In the articulation recognition method proposed by the present invention, the computation of the recognition result comprises: segmenting the signal to be recognized to obtain a word sequence of a plurality of words; extracting a plurality of acoustic characteristic parameters of the current word; computing, with the hidden Markov model under the optimal model parameters, the probability value of each acoustic characteristic parameter, and taking the parameter with the largest probability value as the recognition result of that word; and computing the recognition result of each word in turn to obtain the recognition result of the signal to be recognized.
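The decision rule in this step — score the features against every candidate model and keep the most probable one — can be sketched as follows; the `score` callback and model names are hypothetical placeholders, not part of the patent:

```python
def recognize(features, models, score):
    """Return the name of the model (e.g. a demisyllable HMM) under which
    the feature sequence is most probable.

    `models` maps a name to a model object; `score(model, features)` is
    assumed to return a (log-)likelihood, e.g. via the forward algorithm.
    """
    return max(models, key=lambda name: score(models[name], features))
```

With toy models whose "score" is just a stored number, the highest-scoring candidate wins.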
In the articulation recognition method proposed by the present invention, the method further comprises, after obtaining the recognition result: comparing the recognition result with a preset target sound to identify the initials, finals and tones in the signal to be recognized that exhibit articulation disorders.
The invention also proposes an articulation recognition system comprising: a voice acquisition device for collecting the sample signal and the signal to be recognized; a voice processing device for performing data conversion and pre-processing on the sample signal and the signal to be recognized and extracting their respective acoustic characteristic parameters; and an articulation recognition device for training the acoustic model on the acoustic characteristic parameters of the sample signal to obtain the optimal model parameters, and for computing the recognition result from the acoustic characteristic parameters of the signal to be recognized under the optimal model parameters.
In the articulation recognition system proposed by the present invention, the articulation recognition device is further used to judge from the recognition result which initials, finals and tones in the signal to be recognized exhibit articulation disorders.
The articulation recognition method of the present invention can not only accurately recognize the content of speech but also identify the specific syllable combination and tone of each monosyllable, and can be used in fields such as assessment and rehabilitation training for speech articulation disorders, speech recognition and encryption, and communication aids.
The present invention can further assess articulation disorders: by assessing the articulation clarity of a patient's speech, it can judge which specific initials, finals and tones the patient can articulate normally, and give the specific type of articulation disorder.
Brief description of the drawings
Fig. 1 is a flowchart of the articulation recognition method of the present invention.
Fig. 2 is a flowchart of Mel cepstral coefficient extraction.
Fig. 3 is a schematic diagram of computing the first-order and second-order difference results of the Mel cepstral coefficients.
Fig. 4 is a schematic diagram of articulation recognition based on a hidden Markov model.
Fig. 5 is a schematic flowchart of judging articulation disorders with the articulation recognition method in an embodiment.
Fig. 6 is a schematic structural diagram of the articulation recognition system of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below in conjunction with specific embodiments and the accompanying drawings. Except for the contents specially mentioned below, the processes, conditions and experimental methods for implementing the present invention are general and common knowledge in the art, and the present invention is not particularly limited in this respect.
Fig. 1 shows the articulation recognition method of the present invention, which comprises the following steps:
Acquire a sample signal; after filtering and denoising it, quantize it into a binary sample signal by A/D conversion, and extract from the binary sample signal a speech signal containing speech;
Extract the acoustic characteristic parameters from the speech signal; the parameters used to identify syllables and tones comprise the first 12 Mel cepstral coefficients and the short-time log energy, together with their first-order and second-order difference results, 39 parameters in total.
Select and train an acoustic model: estimate the parameter estimates of the acoustic model from each acoustic characteristic parameter, substitute the estimates into the acoustic model, compute the maximum likelihood of each acoustic characteristic parameter under the hidden Markov model, and obtain the optimal model parameters corresponding to the maximum likelihood values;
Perform articulation recognition: collect the signal to be recognized and compute, from the optimal model parameters, the probability value of each of its acoustic characteristic parameters to obtain the recognition result.
Each step of the articulation recognition method of the present invention is described in detail below.
(Acquiring the sample signal)
When acquiring the sample signal or the signal to be recognized, the quality of the input speech signal has a significant impact on the accuracy of system recognition, so higher requirements are placed on the recording quality and noise immunity of the sample signal and the signal to be recognized. After the sample signal is acquired, it is first filtered. The purpose of filtering is to suppress components of the sample signal whose frequency exceeds half the sampling frequency, f_s/2, to prevent aliasing interference, and at the same time to suppress interference at the 50 Hz mains frequency. The filtering can be implemented with a band-pass filter.
(bandpass filtering and A/D conversion)
The filtered sample signal is digitally sampled at a sampling frequency of f_s = 44100 Hz, producing a discrete-time sequence of the sample signal. This sequence is still not in a form the computer can recognize; it must be quantized into a binary signal by the A/D conversion operation. Uniform 12-bit quantization may be adopted, converting each sampling pulse of the sample signal into a 12-bit binary number for computer processing and recognition.
(Extracting the speech signal)
To recognize the articulation-related part of the sample signal, the effective start and end positions of the speech must be determined within the sample signal, and the effective speech data segment between the determined start point and end point is cut out. The present invention adopts an improved endpoint detection of effective speech that combines the maximum of the short-time autocorrelation of the speech signal with the short-time threshold-crossing rate. The short-time autocorrelation function and the short-time threshold-crossing rate introduced in the present invention are described further below.
(short-time autocorrelation function)
In this example, let x(m) be the time-domain expression of the windowed sample signal, with the n-th frame denoted x_n(m) and frame length N. The short-time autocorrelation function of this frame of the speech signal is:

R_n(k) = Σ_{m=0}^{N−1−k} x_n(m) · x_n(m+k), 0 ≤ k ≤ K;

where K is the maximum lag count.
The short-time autocorrelation function distinguishes voiced speech, unvoiced speech and noise very clearly: the short-time autocorrelation waveform of voiced speech has an obvious quasi-periodicity, and the short-time autocorrelation of unvoiced speech also differs considerably from that of noise, the latter's waveform more closely resembling a pulse. Because the time-domain waveform of speech changes rapidly, the length N of the window function should be chosen as small as possible; at the same time, the short-time periodicity of the speech signal can only be exhibited if the window function is of sufficient length (a frame of speech should contain at least two periods of the waveform). To resolve these two conflicting demands, the present invention adopts a modified short-time autocorrelation function using two window functions of different lengths: the speech signal is windowed to obtain x_n(m) and x′_n(m+k) respectively, and their product is taken, the lengths of the two windows differing by the maximum lag count K. Its expression is:

R̂_n(k) = Σ_{m=0}^{N−1} x_n(m) · x′_n(m+k), 0 ≤ k ≤ K;

where k is the lag index, R̂_n(k) is the short-time autocorrelation function, x_n(m) is the m-th sample of the speech signal, x′_n is the three-level quantized speech signal, and N is the number of speech-signal samples. Because the autocorrelation value at each lag is computed from N samples, the autocorrelation function is prevented from decaying as k increases.
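The modified autocorrelation above can be sketched as follows (an illustration under stated assumptions, not the patent's implementation): `xw` is the frame weighted by the length-N window and `xw2` the same frame weighted by a window K samples longer, so the lagged index m+k never runs off the end:

```python
def modified_autocorr(xw, xw2, N, K):
    """Modified short-time autocorrelation
    R_n(k) = sum_{m=0}^{N-1} x_n(m) * x'_n(m+k), for 0 <= k <= K.

    xw  : frame windowed with the length-N window
    xw2 : frame windowed with a window at least N+K samples long
    Returns the list [R(0), R(1), ..., R(K)]; every lag is a sum of
    exactly N products, so R does not decay as k grows.
    """
    return [sum(xw[m] * xw2[m + k] for m in range(N))
            for k in range(K + 1)]
```

For example, with xw = [1, 2, 3] and xw2 = [1, 2, 3, 4, 5] (N = 3, K = 2), R(0) = 14, R(1) = 20, R(2) = 26.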
(Short-time threshold-crossing rate)
The short-time zero-crossing rate is the number of times the speech waveform within a frame passes through the time axis (signal amplitude equal to zero); a "zero crossing" occurs wherever two adjacent samples in the signal have amplitude values of opposite sign. The short-time zero-crossing rate of the n-th frame x_n(m) is:

Z_n = (1/2) · Σ_{m=0}^{N−1} |sgn[x_n(m)] − sgn[x_n(m−1)]|, where sgn(x) = 1 for x ≥ 0 and −1 for x < 0.
Research shows that the short-time energy and short-time zero-crossing rate of voiced and unvoiced speech differ markedly. In practical application, however, noise may produce spurious zero crossings in the signal. This method therefore revises the short-time zero-crossing rate by setting a threshold range ±T around the zero level, obtaining the short-time threshold-crossing rate:

Z_n = Σ_{m=n−N+1}^{n} { |sgn[x_n(m) − T] − sgn[x_n(m−1) − T]| + |sgn[x_n(m) + T] − sgn[x_n(m−1) + T]| };

where Z_n is the short-time threshold-crossing rate, T is the set threshold (a positive number), x_n(m) is the m-th sample of the speech signal, N is the number of speech-signal samples, and n is the frame index. This index reflects the number of times the signal crosses the positive and negative thresholds. If noise is present in the signal, then as long as the noise values do not exceed [−T, T], spurious crossing counts are largely avoided, improving the noise immunity of the whole system.
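A minimal sketch of the threshold-crossing count above (illustrative only; frame boundary handling and scaling conventions are assumptions):

```python
def sgn(x):
    """Sign function: 1 for x >= 0, -1 for x < 0."""
    return 1 if x >= 0 else -1

def short_time_threshold_rate(frame, T):
    """Short-time threshold-crossing rate Z_n: counts crossings of the
    +T and -T levels within one frame. Noise that stays inside [-T, T]
    produces no spurious crossings, unlike a plain zero-crossing count.
    """
    Z = 0
    for m in range(1, len(frame)):
        Z += abs(sgn(frame[m] - T) - sgn(frame[m - 1] - T))
        Z += abs(sgn(frame[m] + T) - sgn(frame[m - 1] + T))
    return Z
```

Low-amplitude noise inside ±T yields Z = 0, while a waveform swinging well past both thresholds yields a large Z.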
(Embodiment of speech-signal extraction)
Suppose the first 5 frames of the sample signal input to the system are all noise. The mean of the short-time autocorrelation maxima of these noise frames is computed and taken as the threshold T of the short-time threshold-crossing rate. The current frame is then judged to be unvoiced or voiced according to its short-time threshold-crossing rate. With the positive and negative thresholds ±T set around the zero level, the short-time threshold-crossing rate Z_n is expressed as:

Z_n = Σ_{m=−∞}^{∞} { |sgn[x(m) − T] − sgn[x(m−1) − T]| + |sgn[x(m) + T] − sgn[x(m−1) + T]| } · ω(n−m);

where Z_n is the short-time threshold-crossing rate, T is the set threshold (a positive number), x(m) is the m-th sample of the speech signal, ω is the window function, and n is the frame index. Because voiced energy is concentrated below 3 kHz, while unvoiced speech resembles white noise with most of its energy at higher frequencies, the short-time threshold-crossing rates of voiced and unvoiced speech differ greatly — that of unvoiced speech is far higher than that of voiced speech — and this can serve as the basis for classifying each frame of speech as voiced or unvoiced.
(Pre-processing: pre-emphasis)
The present invention further pre-processes the speech signal, adding a "pre-emphasis" step to the pre-processing. Its purpose is to increase the energy of the high-frequency components so that the spectrum of the whole speech signal becomes flat, effectively improving the signal-to-noise ratio; a uniform signal-to-noise ratio can then be used in spectral analysis and vocal-tract parameter computation, reducing computational difficulty. Pre-emphasis is applied after digital sampling of the speech signal and before characteristic parameter extraction, using a first-order digital filter that boosts the high-frequency components of the signal at 6 dB/octave. The response equation of this digital filter is H(z) = 1 − μz⁻¹, where μ is a value close to but not greater than 1; in this method, μ = 0.97.
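The filter H(z) = 1 − μz⁻¹ above is, in the time domain, y[n] = x[n] − μ·x[n−1]; a minimal sketch (the handling of the very first sample is an assumption):

```python
def pre_emphasize(samples, mu=0.97):
    """First-order pre-emphasis filter H(z) = 1 - mu*z^(-1):
    y[n] = x[n] - mu*x[n-1], boosting high frequencies ~6 dB/octave.
    The first sample is passed through unchanged (an assumption).
    """
    return [samples[0]] + [samples[n] - mu * samples[n - 1]
                           for n in range(1, len(samples))]
```

A constant (DC, i.e. lowest-frequency) input is attenuated to 1 − μ = 0.03 of its amplitude, while rapid sample-to-sample changes pass almost unchanged, which is exactly the high-frequency boost described.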
(Pre-processing: windowing and framing)
The average power spectrum of the speech signal is affected by the glottal excitation and by mouth and nose radiation, and the high band falls off at roughly 6 dB/octave above 800 Hz. When characteristic parameters are extracted from the speech signal, its spectrum must be computed; the higher the frequency, the smaller the corresponding components, so the spectrum of the high-frequency part is harder to obtain than that of the low-frequency part. This is why pre-emphasis is performed: it lifts the high-frequency part so that the spectrum of the signal becomes flat over the whole band from low to high frequency and can be computed with the same signal-to-noise ratio, facilitating characteristic parameter extraction. In addition, because speech is quasi-periodic in the short term, the speech signal must be windowed in order to exploit this property in processing and to better reflect the characteristics of the signal; this is the essential basis of short-time speech processing.
On time scales of the order of milliseconds, certain physical characteristics of the speech signal remain essentially unchanged, so short-time analysis of the speech signal greatly simplifies the computation compared with analysing the signal as a whole, and also makes it convenient to relate the analysis of the speech signal to the physiological process of speech production. A segment of speech with short-time stationarity must be cut out of the speech signal as the object of analysis; such a segment is one "frame" of speech, its duration is the frame length, and the frame length is generally 10-30 ms. Each frame of speech has certain fixed characteristics, so the framing process turns the analysis of a whole continuous utterance into frame-by-frame analysis. The window function, denoted ω(n), is the tool for extracting a "speech frame" from continuous speech: it sets the speech outside the region to be processed entirely to zero, thereby extracting the speech frame; this process is the "framing" of the speech signal. Framing of speech multiplies the speech-signal expression s(n) by the window function ω(n), so the windowed speech signal is s_ω(n) = s(n)·ω(n). The Hamming window is the most commonly used, for two reasons: in the time domain, the waveform amplitude of Hamming-windowed speech decreases smoothly to zero, reducing the truncation effect and the spectral leakage caused by sharp drops at the two ends of the window; in the frequency domain, the frequency response of Hamming-windowed speech has a smoother low-pass characteristic, better reflecting the short-time frequency characteristics of the speech signal. The N-point Hamming window is expressed as:

w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1;

where w(n) is the window function, N is the length of the window function, and n is the sample index.
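The framing and windowing above can be sketched as follows (an illustration; the hop size and frame length are assumed parameters, not values fixed by the patent):

```python
import math

def hamming(N):
    """N-point Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))
            for n in range(N)]

def frame_and_window(signal, frame_len, hop):
    """Cut the signal into overlapping frames and apply the Hamming
    window to each: s_w(n) = s(n) * w(n). At f_s = 44100 Hz, a
    10-30 ms frame is roughly 441-1323 samples (illustrative figures).
    """
    w = hamming(frame_len)
    return [[signal[i + n] * w[n] for n in range(frame_len)]
            for i in range(0, len(signal) - frame_len + 1, hop)]
```

Note the window's smooth taper: its end samples are 0.08 while the centre reaches 1.0, which is what suppresses the truncation effect at the frame edges.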
(acoustic model)
An acoustic model is in essence a characteristic-parameter model, generated by extracting characteristic parameters from a body of speech and training with a specified training algorithm. The input signal to be recognized is first converted into a sequence of feature parameter vectors; this feature sequence is then matched against the acoustic model, and by comparing the two, the distance between the feature vector sequence of the signal to be recognized and the acoustic model is computed to obtain the best recognition result. The articulation recognition method of the present invention performs model matching and comparison with a probabilistic model based on the hidden Markov model (HMM), because the HMM adapts well to the variations in phonetic features across speakers and is well suited to speaker-independent recognition. An acoustic model is a modelling unit of the HMM; a typical acoustic model comprises a plurality of states and may correspond to a phone, a syllable, a demisyllable, and so on. Training the acoustic model comprises the following steps:
(Selecting the acoustic model)
In the present invention, the selectable acoustic models include the demisyllable acoustic model and the syllable acoustic model. Because a syllable acoustic model recognizes the initial and the final as a single whole and cannot reflect the articulation characteristics of, and differences between, initials and finals, this embodiment takes the demisyllable acoustic model as an example. When a syllable is pronounced, the tension of the muscles participating in articulation passes through three phases — rising, peak and falling — and the corresponding sound is likewise divided into three parts: onset, nucleus and coda. By this rule a syllable can be divided into several parts, forming demisyllable units. Such a unit may consist of the consonant at the start of a syllable plus part of a vowel, or of part of a vowel plus the consonant at the end of a syllable. English has nearly 2000 such demisyllable units. In standard Mandarin the onset is normally a consonant (denoted C) and the nucleus is normally a vowel (denoted V), while the coda is not limited to vowel or consonant. The basic syllable structures of Mandarin are V, CV, VC, CVC, etc. Because the number of demisyllable combinations is small, a recognition system using them as acoustic models places lower demands on software and hardware; the demisyllable unit is therefore the most commonly used acoustic model in Mandarin speech recognition.
(Feature extraction: short-time log energy)
In this embodiment, the acoustic characteristic parameters extracted for the demisyllable acoustic model comprise the first 12 Mel cepstral coefficients and the short-time log energy, together with their first-order and second-order difference results, 39 parameters in total. The characteristic parameters extracted for identifying syllables are the short-time log energy and the first 12 Mel-frequency cepstral coefficients (MFCC), 13 base parameters, plus their first-order and second-order difference results. The short-time log energy is chosen as one of the characteristic parameters of speech recognition:

E = log Σ_{n=1}^{N} s_n²;

where s_n is the discrete speech sequence, N is the total number of samples, n is the sample index, and E is the short-time log energy. The log energy is chosen because it can distinguish unvoiced and silent components of small amplitude, avoiding the confusion that a linear energy parameter may cause, and it also resolves the excessive computational load of the linear energy parameter, better separating unvoiced, voiced and silent components.
(Feature extraction: Mel cepstral coefficients and their first- and second-order differences)
The basic procedure for extracting Mel cepstral coefficients is shown in Fig. 2. First, each windowed speech frame is passed through a fast Fourier transform (FFT) to obtain its power spectrum. The power spectrum is then passed through a Mel filter bank — in effect a set of normalized triangular band-pass filters — to obtain the Mel spectrum, and the log spectrum is taken. The design principle of the Mel filter bank is to smooth the spectrum, highlight the formants of the speech signal, and reasonably reduce the amount of feature information. Finally, the cepstrum of the Mel spectrum is obtained by discrete cosine transform (DCT), giving the Mel cepstrum; the Mel cepstral coefficients form a feature vector:

c_k = Σ_{i=1}^{N} m_i · cos(πk(i − 0.5)/N);

where c_k is the k-th Mel cepstral coefficient, m_i is the log output of the i-th Mel filter, k is the index of the cepstral coefficient, i is the filter index, and N is the number of triangular filters in the Mel filter bank. In this embodiment N = 20 is taken, and the 12 values c_k for k = 2, 3, …, 13 are taken as the MFCC coefficients. However, the 13 base parameters alone cannot meet the requirement of a high recognition rate in practical systems; therefore, on the basis of these parameters, the first-order and second-order differences of the 13 base parameters with respect to time are further computed, giving the first- and second-order difference log energy and the first- and second-order difference MFCC, for a total of 39 dimensions used as the characteristic parameters for syllable identification, as shown in Fig. 3.
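The time-difference step above — differencing the base features and then differencing again for the second order — can be sketched with a standard delta-regression formula; the regression width of 2 and the edge padding are assumptions, since the patent does not fix them:

```python
def delta(features, width=2):
    """First-order time difference (delta) of a per-frame feature
    sequence, using the common regression form
      d_t = sum_{w=1..width} w*(f_{t+w} - f_{t-w}) / (2*sum w^2).
    Frame indices are clamped at the sequence edges (an assumption).
    Applying delta() to its own output gives the second-order difference.
    """
    T = len(features)
    denom = 2 * sum(w * w for w in range(1, width + 1))
    out = []
    for t in range(T):
        num = sum(w * (features[min(t + w, T - 1)] - features[max(t - w, 0)])
                  for w in range(1, width + 1))
        out.append(num / denom)
    return out
```

On a linearly increasing feature track the interior deltas come out as the constant slope, as expected of a first-order difference.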
(Extracting characteristic parameters: fundamental frequency)
The characteristic parameter extracted for tone identification is the pitch frequency (abbreviated "F0"). The Sum of Magnitude Difference Squared Function (SMDSF) is selected to perform the fundamental-frequency extraction. This algorithm can accurately extract the fundamental frequency of speech at any sampling frequency. The SMDSF is expressed by the following formula:
$$D_S(\tau)=\sum_{j=0}^{L-1}\left[s_{w_2}(j+\tau)-s_{w_1}(j)\right]^{2}$$
where $s_{w_1}(j)$ is the discrete speech sequence $s(j)$ windowed by $w_1(j)$, and likewise $s_{w_2}(j)$ is $s(j)$ windowed by $w_2(j)$; $\tau=0,1,\dots,L-1$, and $L$ is the number of sampling points in each speech frame. The window functions $w_1(j)$ and $w_2(j)$ are respectively:
$$w_1(j)=\begin{cases}1,& j=0,1,\dots,L-1\\[2pt]0,&\text{otherwise}\end{cases}\qquad w_2(j)=\begin{cases}1,& j=0,1,\dots,2(L-1)\\[2pt]0,&\text{otherwise}\end{cases}$$
To evaluate the aperiodicity of articulated speech, the SMDSF also needs to be normalized, that is:
$$\hat{D}_S(\tau)=\frac{D_S(\tau)}{\sum_{j=0}^{L-1}\left[s_{w_2}(j+\tau)+s_{w_1}(j)\right]^{2}}$$
where $\tau=0,1,\dots,L-1$ and $L$ is the number of sampling points in each speech frame.
For a quasi-periodic signal with pitch period $P$, $D_S(P)$ is proportional to the energy of the aperiodic component in the signal, while the denominator is proportional to the total signal energy. The value $\hat{D}_S(P)$ therefore reflects the ratio of aperiodic-component energy to total signal energy: the weaker the periodicity at pitch period $P$, the larger the value; the more pronounced the periodicity, the smaller the value; and for a strictly periodic signal $\hat{D}_S(P)=0$. Hence $\hat{D}_S(\tau)$ can be used as a measure of the aperiodicity of the signal.
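A minimal sketch of pitch extraction with the normalized SMDSF might look like this; the search band, the 0.3 aperiodicity cutoff, and the exact normalization are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def smdsf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """F0 estimate by minimizing the normalized SMDSF described above.

    frame: one frame of speech samples (long enough for two windows);
    returns f0 in Hz, or 0.0 when the frame looks aperiodic.
    """
    s = np.asarray(frame, dtype=float)
    L = len(s) // 2                       # w1 spans L points, w2 spans ~2L
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_tau, best_val = 0, np.inf
    for tau in range(lo, min(hi, L)):
        num = np.sum((s[tau:tau + L] - s[:L]) ** 2)          # D_S(tau)
        den = np.sum((s[tau:tau + L] + s[:L]) ** 2) + 1e-10  # ~ total energy
        val = num / den                   # aperiodicity measure in [0, 1]
        if val < best_val:
            best_tau, best_val = tau, val
    return fs / best_tau if best_tau and best_val < 0.3 else 0.0

fs = 8000
t = np.arange(2 * 400) / fs
f0 = smdsf_pitch(np.sin(2 * np.pi * 125 * t), fs)   # 125 Hz test tone
assert abs(f0 - 125.0) < 5.0
```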
(Training the acoustic model)
The mean and covariance of all the acoustic feature parameters described above are used as the initial mean and covariance of the acoustic model. Estimates of the model parameters are obtained according to the Baum-Welch algorithm; the original model parameters are then replaced by these estimates and the estimation is performed again. The maximum likelihood value of each acoustic feature parameter under the hidden Markov model is computed, yielding the optimal model parameters in the maximum-likelihood sense, i.e., those corresponding to the maximum likelihood value.
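The Baum-Welch re-estimation loop can be illustrated with a discrete-observation HMM; the patent trains continuous-density (Gaussian) HMMs, but the forward-backward re-estimation idea is the same. All names and the toy data below are hypothetical:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch re-estimation step for a discrete-observation HMM.

    A: (S,S) transition matrix, B: (S,V) emission matrix, pi: (S,)
    initial probabilities, obs: sequence of symbol indices. Returns
    updated (A, B, pi) plus the log-likelihood under the OLD parameters.
    """
    S, T = A.shape[0], len(obs)
    alpha = np.zeros((T, S)); beta = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]                      # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0                                    # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    like = alpha[-1].sum()
    gamma = alpha * beta / like                       # state posteriors
    xi = np.zeros((S, S))                             # expected transitions
    for t in range(T - 1):
        xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / like
    A_new = xi / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for v in range(B.shape[1]):
        B_new[:, v] = gamma[np.array(obs) == v].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, gamma[0], np.log(like)

# EM guarantee: likelihood never decreases across re-estimation steps.
rng = np.random.default_rng(0)
A = np.full((2, 2), 0.5); B = rng.dirichlet(np.ones(3), size=2)
pi = np.array([0.5, 0.5]); obs = [0, 1, 2, 1, 0, 0, 1]
A1, B1, pi1, ll0 = baum_welch_step(A, B, pi, obs)
_, _, _, ll1 = baum_welch_step(A1, B1, pi1, obs)
assert ll1 >= ll0 - 1e-9
```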
(Articulation recognition)
The HMM-based articulation recognition process of the present invention treats the recognition of Mandarin articulation as a sequence-labeling problem, similar to a decoding problem: the trained parameters are used to find the optimal word-sequence labeling of the current input, i.e., the state sequence of maximum probability. The articulation recognition process of the present invention, shown in Figure 4, uses a left-to-right hidden Markov model (HMM) without state skips. A hidden Markov model is a probabilistic model describing the statistical characteristics of a random process and consists of two parts: a Markov chain and a general stochastic process. The Markov chain describes state transitions via transition probabilities; the stochastic process describes the relation between states and the observation sequence via observation probabilities. In this left-to-right, no-skip HMM, the number of states equals the number of phonemes in the entry, that is, each state corresponds to one phoneme.
Traditionally, speech articulation is assessed subjectively: a professional such as a speech rehabilitation therapist listens to the patient pronounce the entries of a prescribed assessment vocabulary and rates the articulation clarity. The entries of a dysarthria assessment vocabulary normally cover all Mandarin initials, finals, and tones, each entry being a combination of these initials, finals, and tones. The vocabulary comprises 50 entries covering 21 initials, 13 finals, and 4 tones, and includes 18 phoneme contrasts and 36 minimal phoneme-contrast pairs, so it can reflect the patient's innate ability with each phoneme, each phoneme-contrast ability, and overall articulation clarity. The present method uses HMMs to build the core recognition engine of the articulation recognition system. Suppose the recognition vocabulary of the system contains $V$ entries, the HMM of word $v$ is $\Phi_v$, and each word model has $N$ states; $K$ different pronunciations are used when training the acoustic model of each word. The sequence of characteristic parameters $X=(x_1,x_2,\dots,x_T)$ extracted from the input speech serves as the observation sequence. After feature extraction, the probability $P(X\mid\Phi_v)$ is computed under each HMM, and finally the entry $v^{*}$ with the maximum likelihood probability over all entries is taken as the recognition result, thereby identifying the initial, final, and tone of the articulation.
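The final argmax over per-word HMM likelihoods can be sketched as follows; the scoring functions here are toy stand-ins for $P(X\mid\Phi_v)$, and all names are hypothetical:

```python
import numpy as np

def recognize(feature_seq, word_models):
    """Pick the vocabulary entry v* whose model gives the highest score.

    word_models: dict mapping entry name -> scoring function returning
    a stand-in for log P(X | Phi_v) given a feature sequence X.
    """
    scores = {v: score(feature_seq) for v, score in word_models.items()}
    return max(scores, key=scores.get)

def make_scorer(mean):
    # Toy stand-in: negative squared distance to a word's mean vector.
    return lambda X: -float(np.sum((np.asarray(X) - mean) ** 2))

models = {"bao1": make_scorer(np.array([1.0, 0.0])),
          "mao1": make_scorer(np.array([0.0, 1.0]))}
X = [[0.9, 0.1], [1.1, -0.1]]          # feature frames resembling /bao1/
assert recognize(X, models) == "bao1"
```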
(Dysarthria assessment)
The present invention further uses the trained standard acoustic model and the articulation recognition method to assess dysarthria. By comparing the recognition result with a preset target sound, the articulation clarity of a patient is assessed: the method can judge which specific initials, finals, and tones the patient cannot articulate normally and report the specific dysarthria type (phoneme omission, substitution, or distortion).
For example, referring to Figure 5, suppose the target sound is the initial /b/ and the preset word for this target sound is "bāo" (bag). If the input speech is "māo" (cat), the entry obtained by the articulation recognition method of the present invention shows that the initial /b/ has been replaced by /m/; this case is a "substitution" disorder. If the input speech is "āo" (recessed), the initial /b/ is dropped in pronunciation; this case is an "omission" disorder. If, after the articulation recognition method of the present invention, no corresponding Chinese entry can be found for the input speech, the articulation is a "distortion" disorder.
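The comparison of the recognition result against a preset target sound might be organized as below; the lexicon, entry names, and function name are hypothetical, but the three error types follow the example above:

```python
def classify_articulation_error(target_initial, recognized_entry, lexicon):
    """Classify a dysarthric error against a preset target initial.

    lexicon maps entry name -> its initial consonant ('' when none);
    a recognized result outside the lexicon counts as distortion.
    """
    if recognized_entry not in lexicon:
        return "distortion"              # no matching Chinese entry
    produced = lexicon[recognized_entry]
    if produced == target_initial:
        return "correct"
    return "omission" if produced == "" else "substitution"

lexicon = {"bao1": "b", "mao1": "m", "ao1": ""}
assert classify_articulation_error("b", "bao1", lexicon) == "correct"
assert classify_articulation_error("b", "mao1", lexicon) == "substitution"
assert classify_articulation_error("b", "ao1", lexicon) == "omission"
assert classify_articulation_error("b", "???", lexicon) == "distortion"
```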
As shown in Figure 6, an articulation recognition system of the present invention comprises a voice acquisition device 1, a voice processing device 2, and an articulation recognition device 3.
The voice acquisition device 1 is an omnidirectional microphone for collecting the sample signal and the signal to be identified. The voice processing device 2, connected to the voice acquisition device 1, performs data conversion and preprocessing on the sample signal and the signal to be identified, and extracts the acoustic characteristic parameters of each. The articulation recognition device 3, connected to the voice processing device 2, trains the acoustic model with the acoustic characteristic parameters of the sample signal to obtain the optimal model parameters, computes the acoustic characteristic parameters of the signal to be identified according to the optimal model parameters, and obtains the recognition result. The articulation recognition device 3 of the present invention is further configured to judge the recognition result: by comparing it with the preset target sound, it determines which initials, finals, and tones in the signal to be identified are dysarthric.
The protection of the present invention is not limited to the above embodiments. Without departing from the spirit and scope of the inventive concept, variations and advantages that may occur to those skilled in the art are all included in the present invention, with the appended claims defining the scope of protection.

Claims (13)

1. An articulation recognition method, characterized by comprising the steps of:
obtaining a sample signal, filtering and denoising the sample signal, quantizing it into a binary sample signal by A/D conversion, and extracting from the binary sample signal a voice signal containing speech;
extracting acoustic characteristic parameters from the voice signal, the acoustic characteristic parameters being used to identify syllables and tones;
selecting and training an acoustic model, computing the maximum likelihood probability value of each of the acoustic characteristic parameters under a hidden Markov model, and obtaining the optimal model parameters corresponding to the maximum likelihood value;
performing articulation recognition: collecting a signal to be identified, computing the probability value of each acoustic characteristic parameter of the signal to be identified according to the optimal model parameters, and obtaining a recognition result.
2. The articulation recognition method as claimed in claim 1, characterized in that the step of extracting the voice signal containing speech comprises:
cutting the binary sample signal into a plurality of frames;
computing the mean value of the short-time autocorrelation function of at least one frame;
computing, according to the mean value, a threshold for judging the short-time threshold-crossing rate of the current frame;
judging, according to the short-time threshold-crossing rate, whether the current frame is unvoiced or voiced;
judging all frames one by one until a start frame and an end frame are obtained, thereby obtaining the voice signal.
3. The articulation recognition method as claimed in claim 2, characterized in that the short-time autocorrelation function is:
$$R_n(k)=\sum_{m=0}^{N-1-k}x_n(m)\,x'_n(m+k)$$
where $k$ denotes the number of delay points, up to a maximum delay, $R_n(k)$ is the short-time autocorrelation function, $x_n$ denotes the sampling points of the voice signal, $m$ is the index of the sampling point, $x'_n$ is the three-level quantized version of the voice signal, and $N$ is the number of sampling points of the voice signal.
4. The articulation recognition method as claimed in claim 2, characterized in that the short-time threshold-crossing rate is:
$$Z_n=\frac{1}{2}\sum_{m=1}^{N-1}\Big(\big|\operatorname{sgn}[x_n(m)-T]-\operatorname{sgn}[x_n(m-1)-T]\big|+\big|\operatorname{sgn}[x_n(m)+T]-\operatorname{sgn}[x_n(m-1)+T]\big|\Big)$$
wherein
$$\operatorname{sgn}[x]=\begin{cases}1,&x\ge 0\\[2pt]-1,&x<0\end{cases}$$
where $Z_n$ is the short-time threshold-crossing rate, $T$ is a preset threshold and is a positive number, $x_n$ denotes the sampling points of the voice signal, $m$ is the index of the sampling point, $N$ is the number of sampling points of the voice signal, and $n$ is the index of the speech frame.
5. The articulation recognition method as claimed in claim 1, characterized by further comprising, after extracting the voice signal:
emphasizing the high-frequency components of the voice signal;
performing a windowing operation on the voice signal with a window function.
6. The articulation recognition method as claimed in claim 1, characterized in that the acoustic characteristic parameters comprise Mel cepstral coefficients and their first-order and second-order difference results, and the computation of the Mel cepstral coefficients and their first-order and second-order difference results comprises:
computing the power spectrum of the voice signal by a fast Fourier transform;
passing the power spectrum through a Mel filter bank to obtain the Mel spectrum;
computing the Mel cepstral coefficients from the Mel spectrum by a discrete cosine transform;
successively differencing the Mel cepstral coefficients with respect to time to obtain the first-order and second-order difference results.
7. The articulation recognition method as claimed in claim 1, characterized in that the acoustic characteristic parameters comprise the short-time logarithmic energy, expressed as:
$$E=\log\sum_{n=1}^{N}s_n^{2}$$
where $s_n$ is the discrete speech-signal sequence, $N$ is the total number of sampling points, and $n$ is the sampling-point index.
8. The articulation recognition method as claimed in claim 1, characterized in that the step of obtaining the optimal model parameters comprises:
computing the mean and covariance of the acoustic characteristic parameters;
replacing the initial mean and covariance of the acoustic model with the mean and covariance of the acoustic characteristic parameters;
estimating the model parameters of the acoustic model to obtain parameter estimates;
substituting the parameter estimates for the parameters of the acoustic model, computing the maximum likelihood probability value of each of the acoustic characteristic parameters under the hidden Markov model, and obtaining the optimal model parameters corresponding to the maximum likelihood value.
9. The articulation recognition method as claimed in claim 1, characterized in that the parameter estimates are obtained by estimation according to the Baum-Welch algorithm.
10. The articulation recognition method as claimed in claim 1, characterized in that the computation of the recognition result comprises:
dividing the signal to be identified to obtain a word sequence formed of a plurality of words;
extracting a plurality of acoustic characteristic parameters of the current word;
computing, with the hidden Markov model and according to the optimal model parameters, the probability value of each acoustic characteristic parameter, and taking the acoustic characteristic parameter of maximum probability value as the recognition result of the word;
computing in turn the recognition result of each word in the signal to be identified to obtain the recognition result of the signal to be identified.
11. The articulation recognition method as claimed in claim 1, characterized by further comprising, after obtaining the recognition result:
comparing the recognition result with a preset target sound to obtain the dysarthric initials, finals, and tones in the signal to be identified.
12. An articulation recognition system, characterized by comprising:
a voice acquisition device for collecting a sample signal and a signal to be identified;
a voice processing device for performing data conversion and preprocessing on the sample signal and the signal to be identified, and for extracting the acoustic characteristic parameters of the sample signal and of the signal to be identified respectively;
an articulation recognition device for training an acoustic model with the acoustic characteristic parameters of the sample signal to obtain optimal model parameters, and for computing the acoustic characteristic parameters of the signal to be identified according to the optimal model parameters to obtain a recognition result.
13. The articulation recognition system as claimed in claim 12, characterized in that the articulation recognition device is further configured to judge the recognition result and determine the dysarthric initials, finals, and tones in the signal to be identified.
CN201410353819.6A 2014-07-23 2014-07-23 Speech composition recognition method and system Pending CN104123934A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410353819.6A CN104123934A (en) 2014-07-23 2014-07-23 Speech composition recognition method and system


Publications (1)

Publication Number Publication Date
CN104123934A true CN104123934A (en) 2014-10-29

Family

ID=51769324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410353819.6A Pending CN104123934A (en) 2014-07-23 2014-07-23 Speech composition recognition method and system

Country Status (1)

Country Link
CN (1) CN104123934A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104766607A (en) * 2015-03-05 2015-07-08 广州视源电子科技股份有限公司 Television program recommendation method and system
CN105719662A (en) * 2016-04-25 2016-06-29 广东顺德中山大学卡内基梅隆大学国际联合研究院 Dysarthrosis detection method and dysarthrosis detection system
CN105810192A (en) * 2014-12-31 2016-07-27 展讯通信(上海)有限公司 Speech recognition method and system thereof
CN106384587A (en) * 2015-07-24 2017-02-08 科大讯飞股份有限公司 Voice recognition method and system thereof
CN106846581A (en) * 2017-01-25 2017-06-13 胡建军 Door access control system and method
CN107358963A (en) * 2017-07-14 2017-11-17 中航华东光电(上海)有限公司 One kind removes breathing device and method in real time
WO2018014537A1 (en) * 2016-07-22 2018-01-25 百度在线网络技术(北京)有限公司 Voice recognition method and apparatus
CN107886941A (en) * 2016-09-29 2018-04-06 亿览在线网络技术(北京)有限公司 A kind of audio mask method and device
CN110232913A (en) * 2019-06-19 2019-09-13 桂林电子科技大学 A kind of sound end detecting method
CN110876609A (en) * 2019-07-01 2020-03-13 上海慧敏医疗器械有限公司 Voice treatment instrument and method for frequency band energy concentration rate measurement and audio-visual feedback
CN111276156A (en) * 2020-01-20 2020-06-12 深圳市数字星河科技有限公司 Real-time voice stream monitoring method
CN111276130A (en) * 2020-01-21 2020-06-12 河南优德医疗设备股份有限公司 MFCC cepstrum coefficient calculation method for computer language knowledge education system
CN111599347A (en) * 2020-05-27 2020-08-28 广州科慧健远医疗科技有限公司 Standardized sampling method for extracting pathological voice MFCC (Mel frequency cepstrum coefficient) features for artificial intelligence analysis
CN111696530A (en) * 2020-04-30 2020-09-22 北京捷通华声科技股份有限公司 Target acoustic model obtaining method and device
CN111599347B (en) * 2020-05-27 2024-04-16 广州科慧健远医疗科技有限公司 Standardized sampling method for extracting pathological voice MFCC (functional peripheral component interconnect) characteristics for artificial intelligent analysis

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0575815A1 (en) * 1992-06-25 1993-12-29 Atr Auditory And Visual Perception Research Laboratories Speech recognition method
US5890111A (en) * 1996-12-24 1999-03-30 Technology Research Association Of Medical Welfare Apparatus Enhancement of esophageal speech by injection noise rejection
CN1346126A (en) * 2000-09-27 2002-04-24 中国科学院自动化研究所 Three-tone model with tune and training method
CN1946029A (en) * 2006-10-30 2007-04-11 北京中星微电子有限公司 Method and its system for treating audio signal
CN101515456A (en) * 2008-02-18 2009-08-26 三星电子株式会社 Speech recognition interface unit and speed recognition method thereof
CN102063903A (en) * 2010-09-25 2011-05-18 中国科学院深圳先进技术研究院 Speech interactive training system and speech interactive training method
CN102208186A (en) * 2011-05-16 2011-10-05 南宁向明信息科技有限责任公司 Chinese phonetic recognition method
CN102237083A (en) * 2010-04-23 2011-11-09 广东外语外贸大学 Portable interpretation system based on WinCE platform and language recognition method thereof
CN102543073A (en) * 2010-12-10 2012-07-04 上海上大海润信息系统有限公司 Shanghai dialect phonetic recognition information processing method
CN102982799A (en) * 2012-12-20 2013-03-20 中国科学院自动化研究所 Speech recognition optimization decoding method integrating guide probability
CN103065629A (en) * 2012-11-20 2013-04-24 广东工业大学 Speech recognition system of humanoid robot
CN103383845A (en) * 2013-07-08 2013-11-06 上海昭鸣投资管理有限责任公司 Multi-dimensional dysarthria measuring system and method based on real-time vocal tract shape correction
CN103405217A (en) * 2013-07-08 2013-11-27 上海昭鸣投资管理有限责任公司 System and method for multi-dimensional measurement of dysarthria based on real-time articulation modeling technology
CN103705218A (en) * 2013-12-20 2014-04-09 中国科学院深圳先进技术研究院 Dysarthria identifying method, system and device




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141029