CN103310272A - Articulation method for a DIVA (Directions Into Velocities of Articulators) neural network model improved on the basis of a vocal-tract action knowledge base - Google Patents

Articulation method for a DIVA (Directions Into Velocities of Articulators) neural network model improved on the basis of a vocal-tract action knowledge base Download PDF

Info

Publication number
CN103310272A
CN103310272A
Authority
CN
China
Prior art keywords
diva
model
knowledge base
vocal organs
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013102743413A
Other languages
Chinese (zh)
Other versions
CN103310272B (en)
Inventor
张少白
徐歆冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN201310274341.3A priority Critical patent/CN103310272B/en
Publication of CN103310272A publication Critical patent/CN103310272A/en
Application granted granted Critical
Publication of CN103310272B publication Critical patent/CN103310272B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention relates to an articulation method, in particular to an articulation method for a DIVA (Directions Into Velocities of Articulators) neural network model improved on the basis of a vocal-tract action knowledge base. The method adopts an improved DIVA neural network model augmented with a vocal-tract action knowledge base. For speech sounds not contained in the speech sound map, corrected auditory feedback information is obtained by applying perturbation factors, and the neural network is trained on this corrected auditory feedback. As a result, the number of training iterations the model needs to produce an articulation is reduced and articulation accuracy is improved.

Description

Articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base
Technical field
The present invention relates to an articulation method, in particular to an articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base.
Background technology
A neuro-computational speech model simulates, on a computer, the complex processes of speech production, perception, and acquisition. Such a model is structurally complex and comprises at least a cognitive component, a motor-processing component, and a sensory-processing component. The cognitive component produces neuron activations (phoneme representations) during the speech production and speech perception stages. The motor-processing component starts from the activated phoneme representation, plans the corresponding movement, and ends with the vocal-organ movements for that phoneme item. The sensory-processing component produces auditory representations from external speech signals and activates the corresponding phoneme representations.
Research on neuro-computational speech models has produced many results. Among them, the DIVA (Directions Into Velocities of Articulators) model is one of the more advanced neuro-computational models of speech production, perception, and acquisition.
The DIVA model was developed by Professor Frank Guenther and his team at the Boston University speech laboratory. Among current neuro-computational speech models with genuine biophysical meaning, DIVA is the most thoroughly defined and tested, and it is the only adaptive neural network model that uses the pseudo-inverse control technique. The DIVA model describes the processes involved in speech acquisition, perception, and production, and can generate phonemes, syllables, or words by controlling a simulated vocal tract. Fig. 1 shows the block diagram of the DIVA model.
The characteristics of the DIVA model include:
The model comprises two subsystems, feedforward control and feedback control;
The model's target regions are defined by the fundamental frequency F0, the first three formant frequencies, and the corresponding somatosensory targets;
The model's input is a word, syllable, or phoneme. Although the model has so far focused on short, simple speech sequences, linguistic factors (rhythm and metrical structure, morphology, word boundaries, etc.) necessarily involve longer and more complex structures, and the model takes these structures into account;
The model's account of coarticulation and its correlates resembles Keating's window model, but it has the advantage over the window model of also explaining how the targets are learned;
The DIVA model has been applied with unprecedented success to the study of sensory-perceptual learning. Its approach is to classify sounds that already exist, without needing to explain how they were learned.
The DIVA model also has shortcomings, mainly the following: the model assumes that all state information is instantaneously available at any given point; the model assumes no neural delay and uses instantaneous feedback control; the reference frame used for control can only be either the somatosensory (vocal-organ) reference frame or the auditory reference frame, never both simultaneously; and the model's description of the division between cortical and sub-cortical processing, and of the relevance of the brain regions involved, is relatively coarse.
Summary of the invention
The technical problem to be solved by the invention is to address the deficiencies of the background art above by providing an articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base.
To achieve this, the invention adopts the following technical scheme:
The articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base comprises the following steps:
Step 1: construct the improved DIVA neuro-computational speech model by adding to the DIVA model a vocal-tract action knowledge base that acts on the simulated vocal organs;
Step 2: collect the formant frequencies of pronunciation units as the inputs of the DIVA neuro-computational speech model;
Step 3: map the inputs of the DIVA neural network model into the speech sound map, with all phoneme units in the speech sound map initialized to the inactive state;
Step 4: input the formant frequencies of an arbitrary pronunciation unit and train the improved, knowledge-base-augmented DIVA neuro-computational speech model:
if a phoneme unit whose formant frequencies are identical to those of the input pronunciation unit already exists in the speech sound map, the simulated vocal organs produce the input pronunciation unit directly via feedforward control;
otherwise, the simulated vocal organs learn to produce the input pronunciation unit via feedback control.
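The dispatch in step 4 above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the names `SpeechSoundMap`, `feedforward_produce`, and `feedback_learn`, and the formant-matching tolerance, are all assumptions.

```python
# Hypothetical sketch of step 4: choose feedforward or feedback control
# depending on whether a phoneme unit with matching formant frequencies
# already exists in the speech sound map. All names and the tolerance
# value are illustrative assumptions, not taken from the patent.

def match_key(formants, tolerance=25.0):
    """Quantize formant frequencies (Hz) so near-identical units match."""
    return tuple(round(f / tolerance) for f in formants)

class SpeechSoundMap:
    def __init__(self):
        self._units = {}  # quantized formant key -> stored phoneme unit

    def lookup(self, formants):
        return self._units.get(match_key(formants))

    def store(self, formants, unit):
        self._units[match_key(formants)] = unit

def produce(sound_map, formants, unit, feedforward_produce, feedback_learn):
    """Step 4: feedforward if the unit is known, otherwise learn via feedback."""
    if sound_map.lookup(formants) is not None:
        return feedforward_produce(unit)   # known unit: direct production
    learned = feedback_learn(unit)         # unknown unit: feedback learning
    sound_map.store(formants, learned)     # remember it for later feedforward use
    return learned
```

The first production of an unseen unit takes the (expensive) feedback path; every later production of the same unit takes the feedforward path, which is the source of the claimed reduction in training iterations.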
In the articulation method above, the feedback-control procedure by which the simulated vocal organs of step 4 produce the input pronunciation unit is as follows:
Step A: apply a perturbation to the pronunciation unit of the simulated vocal organs, and collect the auditory and somatosensory feedback information of the DIVA model; the somatosensory error map derives a somatosensory feedback command from the somatosensory target region and the somatosensory feedback;
Step B: map the auditory feedback information and the perturbed pronunciation unit of the DIVA model into the auditory state map;
Step C: the auditory error map derives an auditory feedback command from the input of the DIVA neural network model and the auditory feedback information of the simulated vocal organs;
Step D: the articulator velocity and position maps derive the training load of the simulated vocal organs from the somatosensory and auditory feedback commands, and the simulated vocal organs articulate under the action of the vocal-tract action knowledge base.
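The feedback loop of steps A through D can be sketched as one update step. This is a toy sketch under stated assumptions: the patent does not specify the error-to-command mappings, so simple proportional gains are used, and the toy plant returns feedback in the same space as the articulator state so the commands can be summed directly.

```python
import numpy as np

# Illustrative sketch of steps A-D: perturb the articulators, turn auditory
# and somatosensory errors into feedback commands, and combine them into an
# articulator update. The proportional gains g_aud/g_som and the linear
# error-to-command mappings are assumptions for illustration only.

def feedback_step(auditory_target, somato_target, articulators,
                  plant, perturbation, g_aud=0.1, g_som=0.1):
    # Step A: perturb the articulators and collect feedback from the plant.
    perturbed = articulators + perturbation
    auditory_fb, somato_fb = plant(perturbed)

    # Step A (cont.): somatosensory error map -> somatosensory command.
    somato_cmd = g_som * (somato_target - somato_fb)

    # Steps B-C: auditory error map -> auditory feedback command.
    auditory_cmd = g_aud * (auditory_target - auditory_fb)

    # Step D: the velocity/position map combines both commands into an
    # articulator update (here simply summed in articulator space).
    return perturbed + somato_cmd + auditory_cmd
```

Iterating this step with a well-behaved plant drives the articulator state toward the joint auditory and somatosensory targets, which is the learning behavior the feedback branch of step 4 relies on.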
By adopting the above technical scheme, the invention has the following beneficial effects: it reduces the number of training iterations the model needs to produce an articulation and improves articulation accuracy.
Description of drawings
Fig. 1 is the block diagram of DIVA model.
Fig. 2 is the block diagram of the vocal-tract action knowledge base.
Fig. 3 is the block diagram of improved DIVA model.
Embodiment
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
Fig. 2 shows the block diagram of the vocal-tract action knowledge base model. The knowledge base comprises sensory-motor components, speaking skills, and a mental syllabary.
The workflow of the vocal-tract action knowledge base model is divided into two phases, speech production and classification/perception:
Speech-production phase: activation of the knowledge base model begins with activation of the phoneme representation of a speech item; the speech pattern is processed syllable by syllable. For a high-frequency syllable the model has already acquired the planned movement: the planned movement is first activated through the speech sound map, and the vocal-tract action corresponding to each syllable then generates a motor-neuron activation pattern. Subsequent neuromuscular processing drives the movements of the vocal organs and allows the articulatory-auditory model to generate the speech signal. The sensory state of the same syllable obtained earlier is activated simultaneously through the speech sound map. State TS corresponds to state ES in Fig. 3, and the current syllable is then produced. If there is a significant difference, auditory and somatosensory error signals are transmitted through the speech sound map and used to form a new, or update an existing, planned movement for the syllable. For a low-frequency syllable, the speech sound map activates the planned movements of phonetically similar syllables, which in turn activate the movement-planning module to produce the planned movement.
Classification/perception phase: speech perception begins with an external speech signal. If the aim is phoneme recognition, this is only possible for high-frequency syllables. To this end the signal is pre-processed in peripheral and subcortical regions, and short-term memory is loaded with the external auditory state. The resulting neuron activation pattern is passed to the trained state maps, causing first a co-activation of neuron regions at the level of the speech sound map and then a co-activation of specific neurons at the level of the phoneme map; the former represents the articulation of the syllable, the latter its phonology. This neural pathway via the speech sound map, also called the dorsal pathway of speech perception, also carries the co-activated planned movement of high-frequency syllables. A second pathway of speech perception, the ventral pathway, links auditory activation patterns directly to the speech-processing module. It is assumed that the dorsal pathway is dominant during speech acquisition, while the ventral pathway dominates in later, adult speech perception.
As shown in Fig. 3, the improved DIVA model of the invention adds a vocal-tract action knowledge base module and a perturbation module, both acting on the simulated vocal organs.
The model is trained on 200 examples of different speech sounds to initialize the phoneme, sensory, and planned-movement maps. Each example is obtained during the model's babbling and imitation phases, and the "knowledge" is stored in bidirectional neural mappings between the speech sound map and the other maps. A neuron in the speech sound map represents:
(a) the realization of a vowel or consonant-vowel phoneme state;
(b) a planned-movement state;
(c) an auditory state;
(d) a somatosensory state.
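The four representations listed above can be bundled into one record per speech-sound-map node, sketched below. The field names and types are illustrative assumptions; the patent does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Tuple

# Minimal sketch of one speech-sound-map node carrying the four
# representations (a)-(d). Field names are illustrative, not from the patent.

@dataclass
class SpeechMapNode:
    phoneme: str                                 # (a) phoneme realization, e.g. "ba"
    planned_movement: Tuple[float, ...] = ()     # (b) planned-movement state
    auditory_state: Tuple[float, ...] = ()       # (c) e.g. F0 and first formants
    somatosensory_state: Tuple[float, ...] = ()  # (d) tactile/proprioceptive state
    active: bool = False                         # per step 3: initialized inactive
```

A node starts inactive, matching the initialization of the speech sound map in step 3, and is filled in as babbling and imitation associate the sensory and movement states.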
The training experiments comprise a babbling phase and an imitation phase (as embodied in the DIVA model). In the babbling phase, the model associates planned-movement states with auditory states. On this basis, the model can produce planned movements in the imitation-training phase.
During imitation training, phoneme regions appear at the level of the speech sound map. After these initial experiments, we proceeded to more complex model languages, including vowel, consonant-vowel, and consonant-vowel-vowel syllables, based on a larger consonant set. Training revealed a strict ordering of the speech sound map, an ordering related to phonetic features, phoneme alignment features, and consonant-type clustering.
To examine the workflow and speech performance of the improved DIVA process, we carried out the following learning experiments with the improved DIVA model:
1. a five-vowel system /i, e, a, o, u/;
2. a small consonant system (simple syllables combining the voiced plosives /b, d, g/ with the five vowels above);
3. a small language model comprising the five-vowel system, the voiced and voiceless plosives /b, d, g, p, t, k/, the nasals /m, n/, the lateral /l/, and three syllable types (V, CV, CCV);
4. the 200 most common syllables in the standardized test English of a six-year-old child.
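The syllable inventory of experiment 3 can be enumerated mechanically. The sketch below is an assumption-laden illustration: the text does not state which consonant clusters are phonotactically legal, so the CCV case here naively allows any consonant pair.

```python
from itertools import product

# Sketch of the syllable inventory in learning experiment 3: five vowels,
# the plosives/nasals/lateral listed above, and three syllable types
# (V, CV, CCV). Allowing every CC pair is an illustrative assumption;
# the patent does not specify the legal clusters.

VOWELS = ["i", "e", "a", "o", "u"]
CONSONANTS = ["b", "d", "g", "p", "t", "k", "m", "n", "l"]

def syllable_inventory():
    v = list(VOWELS)                                        # V syllables
    cv = [c + vo for c, vo in product(CONSONANTS, VOWELS)]  # CV syllables
    ccv = [c1 + c2 + vo for c1, c2, vo in
           product(CONSONANTS, CONSONANTS, VOWELS)]         # CCV syllables
    return v + cv + ccv
```

Under this naive assumption the inventory contains 5 + 45 + 405 = 455 syllables, which gives a sense of the scale of the speech sound map these experiments exercise.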
Step 1: construct the improved DIVA neuro-computational speech model by adding to the DIVA model a vocal-tract action knowledge base that acts on the simulated vocal organs;
Step 2: collect the formant frequencies of pronunciation units as the inputs of the DIVA neuro-computational speech model;
Step 3: map the inputs of the DIVA neural network model into the speech sound map, with all phoneme units in the speech sound map initialized to the inactive state;
Step 4: input the formant frequencies of an arbitrary pronunciation unit and train the improved, knowledge-base-augmented DIVA neuro-computational speech model:
if a phoneme unit whose formant frequencies are identical to those of the input pronunciation unit already exists in the speech sound map, the simulated vocal organs produce the input pronunciation unit directly via feedforward control;
otherwise, the simulated vocal organs learn to produce the input pronunciation unit via feedback control.
In step 4, the procedure by which the simulated vocal organs produce the input pronunciation unit via feedback control is as follows:
Step A: apply a perturbation to the pronunciation unit of the simulated vocal organs, and collect the auditory and somatosensory feedback information of the DIVA model; the somatosensory error map derives a somatosensory feedback command from the somatosensory target region and the somatosensory feedback;
Step B: map the auditory feedback information and the perturbed pronunciation unit of the DIVA model into the auditory state map;
Step C: the auditory error map derives an auditory feedback command from the input of the DIVA neural network model and the auditory feedback information of the simulated vocal organs;
Step D: the articulator velocity and position maps derive the training load of the simulated vocal organs from the somatosensory and auditory feedback commands, and the simulated vocal organs articulate under the action of the vocal-tract action knowledge base.
Mapping the perturbed pronunciation unit into the auditory state map serves to further refine the auditory state map; the vocal-tract action knowledge base is added to simulate the actions of the vocal organs more fully, thereby improving articulation accuracy and the learning efficiency of the whole DIVA model.
The modified model integrates the sensorimotor and cognitive aspects. A serious problem faced by speech and sensorimotor models of spoken-language processing is that the development of the phoneme map is modeled without modeling speech acquisition. We address this problem and introduce a feasible solution: in the early stage of speech acquisition, before the speech sound map is explicitly introduced, the action knowledge base is coupled directly to the mental lexicon. As a result, the modified DIVA model has lower articulation latency and higher accuracy than the original model.
Compared with the prior art, the invention has the following significant advantages. Built on the DIVA neural network model, it describes and simulates the functions involved in articulation at the neuroanatomical and neurophysiological levels. Adding the perturbation module lets the model produce articulations more efficiently and accurately. Adding the vocal-tract action knowledge base module enriches the original vocal-tract configuration of the DIVA model, reduces the number of training iterations needed to produce an articulation, and improves articulation accuracy. Finally, combined with a brain-computer interface (BCI), the DIVA neural network model could be used to construct a neuro-computational model of Chinese speech production and acquisition that conforms to the phonation rules of Chinese and has real physiological meaning, laying the theoretical and practical foundation for a "thought reader" suited to the thinking patterns of Chinese speakers.

Claims (2)

1. An articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base, characterized by comprising the following steps:
Step 1: construct the improved DIVA neuro-computational speech model by adding to the DIVA model a vocal-tract action knowledge base that acts on the simulated vocal organs;
Step 2: collect the formant frequencies of pronunciation units as the inputs of the DIVA neuro-computational speech model;
Step 3: map the inputs of the DIVA neural network model into the speech sound map, with all phoneme units in the speech sound map initialized to the inactive state;
Step 4: input the formant frequencies of an arbitrary pronunciation unit and train the improved, knowledge-base-augmented DIVA neuro-computational speech model:
if a phoneme unit whose formant frequencies are identical to those of the input pronunciation unit already exists in the speech sound map, the simulated vocal organs produce the input pronunciation unit directly via feedforward control;
otherwise, the simulated vocal organs learn to produce the input pronunciation unit via feedback control.
2. The articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base according to claim 1, characterized in that the feedback-control procedure by which the simulated vocal organs of step 4 produce the input pronunciation unit is as follows:
Step A: apply a perturbation to the pronunciation unit of the simulated vocal organs, and collect the auditory and somatosensory feedback information of the DIVA model; the somatosensory error map derives a somatosensory feedback command from the somatosensory target region and the somatosensory feedback;
Step B: map the auditory feedback information and the perturbed pronunciation unit of the DIVA model into the auditory state map;
Step C: the auditory error map derives an auditory feedback command from the input of the DIVA neural network model and the auditory feedback information of the simulated vocal organs;
Step D: the articulator velocity and position maps derive the training load of the simulated vocal organs from the somatosensory and auditory feedback commands, and the simulated vocal organs articulate under the action of the vocal-tract action knowledge base.
CN201310274341.3A 2013-07-02 2013-07-02 Articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base Expired - Fee Related CN103310272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310274341.3A CN103310272B (en) 2013-07-02 2013-07-02 Articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310274341.3A CN103310272B (en) 2013-07-02 2013-07-02 Articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base

Publications (2)

Publication Number Publication Date
CN103310272A true CN103310272A (en) 2013-09-18
CN103310272B CN103310272B (en) 2016-06-08

Family

ID=49135459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310274341.3A Expired - Fee Related CN103310272B (en) 2013-07-02 2013-07-02 Articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base

Country Status (1)

Country Link
CN (1) CN103310272B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104605845A (en) * 2015-01-30 2015-05-13 南京邮电大学 Electroencephalogram signal processing method based on DIVA model
CN104679249A (en) * 2015-03-06 2015-06-03 南京邮电大学 Method for implementing Chinese BCI (brain and computer interface) based on a DIVA (directional into velocities of articulators) model
CN107368895A (en) * 2016-05-13 2017-11-21 扬州大学 A kind of combination machine learning and the action knowledge extraction method planned automatically

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5586033A (en) * 1992-09-10 1996-12-17 Deere & Company Control system with neural network trained as general and local models
CN102201236A (en) * 2011-04-06 2011-09-28 中国人民解放军理工大学 Speaker recognition method combining Gaussian mixture model and quantum neural network
CN102880906A (en) * 2012-07-10 2013-01-16 南京邮电大学 Chinese vowel pronunciation method based on DIVA nerve network model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5586033A (en) * 1992-09-10 1996-12-17 Deere & Company Control system with neural network trained as general and local models
CN102201236A (en) * 2011-04-06 2011-09-28 中国人民解放军理工大学 Speaker recognition method combining Gaussian mixture model and quantum neural network
CN102880906A (en) * 2012-07-10 2013-01-16 南京邮电大学 Chinese vowel pronunciation method based on DIVA nerve network model

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104605845A (en) * 2015-01-30 2015-05-13 南京邮电大学 Electroencephalogram signal processing method based on DIVA model
CN104679249A (en) * 2015-03-06 2015-06-03 南京邮电大学 Method for implementing Chinese BCI (brain and computer interface) based on a DIVA (directional into velocities of articulators) model
CN104679249B (en) * 2015-03-06 2017-07-07 南京邮电大学 A kind of Chinese brain-computer interface implementation method based on DIVA models
CN107368895A (en) * 2016-05-13 2017-11-21 扬州大学 A kind of combination machine learning and the action knowledge extraction method planned automatically

Also Published As

Publication number Publication date
CN103310272B (en) 2016-06-08

Similar Documents

Publication Publication Date Title
CN104538024B (en) Phoneme synthesizing method, device and equipment
Räsänen Computational modeling of phonetic and lexical learning in early language acquisition: Existing models and future directions
Kröger et al. Associative learning and self-organization as basic principles for simulating speech acquisition, speech production, and speech perception
Caponetti et al. Biologically inspired emotion recognition from speech
CN103165126A (en) Method for voice playing of mobile phone text short messages
Murakami et al. Seeing [u] aids vocal learning: Babbling and imitation of vowels using a 3D vocal tract model, reinforcement learning, and reservoir computing
Prom-on et al. Training an articulatory synthesizer with continuous acoustic data.
Prom-on et al. Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach
Rasilo et al. Feedback and imitation by a caregiver guides a virtual infant to learn native phonemes and the skill of speech inversion
CN103310272B (en) Articulation method based on a DIVA neural network model improved with a vocal-tract action knowledge base
Xu et al. The PENTA Model: Concepts, Use, and Implications
Kröger et al. Phonemic, sensory, and motor representations in an action-based neurocomputational model of speech production
Krug et al. Articulatory synthesis for data augmentation in phoneme recognition
Schröder Approaches to emotional expressivity in synthetic speech
Yu A Model for Evaluating the Quality of English Reading and Pronunciation Based on Computer Speech Recognition
Li Modular design of English pronunciation proficiency evaluation system based on Speech Recognition Technology
Uchida et al. Statistical acoustic-to-articulatory mapping unified with speaker normalization based on voice conversion.
Davis The Cohort Model of auditory word recognition
Kröger et al. The LS Model (Lexicon-Syllabary Model)
Breidegard et al. Speech development by imitation
Liu Fundamental frequency modelling: An articulatory perspective with target approximation and deep learning
Lapthawan et al. Estimating underlying articulatory targets of Thai vowels by using deep learning based on generating synthetic samples from a 3D vocal tract model and data augmentation
Prom-on et al. Estimating vocal tract shapes of Thai vowels from contextual vowel variation
Gao Articulatory copy synthesis based on the speech synthesizer vocaltractlab
Fry Modeling the Acquisition of Intonation: A First Step

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130918

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000207

Denomination of invention: Articulation method for a DIVA (Directions Into Velocities of Articulators) neural network model improved on the basis of a vocal-tract action knowledge base

Granted publication date: 20160608

License type: Common License

Record date: 20161109

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000207

Date of cancellation: 20180116

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160608

Termination date: 20190702