WO2010097452A1 - A character animation tool - Google Patents

A character animation tool

Info

Publication number
WO2010097452A1
WO2010097452A1 (PCT/EP2010/052445)
Authority
WO
WIPO (PCT)
Prior art keywords
vowel
speech
character
stress
animation
Prior art date
Application number
PCT/EP2010/052445
Other languages
French (fr)
Inventor
Charlie Cullen
Original Assignee
Dublin Institute Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dublin Institute Of Technology filed Critical Dublin Institute Of Technology
Publication of WO2010097452A1 publication Critical patent/WO2010097452A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 - Animation
    • G06T13/20 - 3D [Three Dimensional] animation
    • G06T13/205 - 3D [Three Dimensional] animation driven by audio data
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 - Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 - Transforming into visible information
    • G10L2021/105 - Synthesis of the lips movements from speech, e.g. for talking heads

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Character animation is key to successful and engaging computer animation, whether for media such as films or for computer games. A known difficulty in animation is linking the motions of an animated character with spoken words. The present application addresses this problem by detecting the locations of vowels in a piece of speech, determining a stress value for each detected vowel, and then animating the character at the vowel locations in a manner consistent with the determined stress values.

Description

A CHARACTER ANIMATION TOOL
Field
The present application is directed to the field of computer animation, in particular to software tools and production workflow solutions for computer animation.
Background
Character animation is key to successful and engaging computer animation, whether for media such as films or for computer games. A known difficulty in animation is linking the motions of an animated character with spoken words. Software is known that animates a character's mouth in response to a speech signal, as a result of which the animated character appears to utter the words. Examples of such general techniques are described in KR20010038772 and WO1997036288. Whilst this approach is useful, the results tend to be regarded by viewers as unnatural. Other techniques have been employed which attempt to process speech to match an animated character's mouth; again, the results tend to be unnatural.
Some systems have investigated the role of more extensive face and body movements (notably the MIT BEAT prototype). The BEAT system performs linguistic analysis of synthesized text-to-speech (TTS) audio output in an attempt to predict the formal structure of the associated gestures and movements. However, the benefits of this system are limited and artificial insofar as it operates only on synthesized speech.
Summary
It has been identified by the inventor that the prior art methods, whilst somewhat effective, are lacking. The inventor has appreciated that in human communication linguistic content accounts for only about 7%, with the acoustic properties of speech (rhythm and prosody) accounting for a further 38% or so. Moreover, he has appreciated that the majority of human communication relies on the subtle movements and more expansive gestures that comprise 55% of our interactions. The present application focuses on providing these subtle movements and more expansive gestures and relies upon the rhythm and prosody of the speech signal, rather than the linguistic content (as with current speech recognition and lip-synching algorithms), to provide a simple system for assigning these movements and gestures to an animated character. Thus the presently described technology places the emphasis of animation on the same criteria that humans use in communication. The approach of the "stress tagging animation" technique may be compared with human-operated characters such as the Muppets, which adopt a similar approach of concentrating on the rhythms of hand and head movements rather than lip-synching accuracy. By providing a simple list of events prioritized by rhythm and prosody, the system presented herein allows developers to easily match speech with movements, in contrast to most animations, which are built from scratch. With "stress tagging", content may be re-used and characters and voices easily changed, as the timing and priority of animation events reside with the speech signal. This allows for automated tools to be provided that allocate animation to speech events, rather than the converse. In addition, as the "stress tagging" framework focuses on acoustic attributes, it is completely language independent. Thus, tools are envisaged where a particular character may be developed to respond to "stress tags", allowing it to be re-purposed in any language desired as often as needed. The present application employs a pre-defined library of movements and gestures for several distinct characters (as examples) that may be quickly allocated to the prioritized speech events on a manual, semi-automatic or fully automated basis, as required by the production.
A first embodiment provides a speech analysis system for assisting in the animation of at least one character in response to a piece of speech. The system comprises a memory for storing the piece of speech and a vowel locator for identifying the locations of vowels within the piece of speech. A vowel stress detector identifies the degree of stress associated with each identified vowel and stores the associated degree of stress for each location.
The vowel locator may determine the duration of each vowel. Suitably, the piece of speech is stored in a database in the memory. The database may store timestamps indicating locations of vowels and durations of vowels and/or a stress value for each vowel.
The vowel stress detector may score at least one characteristic of each vowel against a reference value for the characteristic. This reference value may be determined by averaging the characteristic over a windowed section of the piece of speech. The windowed section may comprise the entire piece of speech. The characteristic may comprise one or more of the following: a) pitch, b) intensity, c) duration, d) voice quality, e) jitter, and f) voice breaks.
Preferably, the at least one characteristic comprises the following characteristics: a) pitch, b) intensity and c) duration.
The animation tool may provide a character animation feature employing the locations of vowels as a trigger for a character's motion. The motion of a character selected at a particular location may be determined with reference to the degree of stress for that location. The motion of the character may be automatically selected based upon the degree of stress. Alternatively, the animation tool may allow an animator to select a particular motion from a presented list, suitably where the list is populated with possible character motions based upon the degree of stress. The list may be presented for each vowel location, allowing an animator to select an animated character's motion at each vowel location.
Description of Drawings
Figure 1 is a block diagram of an exemplary system according to the present application, Figure 2 is a flow chart for exemplary methods according to the present application, and Figure 3 is a graphical user interface for use with the system or method of Figures 1 or 2.
Detailed Description
The present application will now be described with reference to some exemplary methods and systems, in which speech data is provided to a voice analysis system 2 which in turn analyses the speech data to identify the locations of vowels and the corresponding stress levels of these vowels. The inputted speech is desirably monophonic in nature. The speech 1 may be directly inputted, for example by means of a microphone. Alternatively, a pre-recorded piece of speech may be employed. A database 3 stored in local or external memory may be employed to store different items of speech content. It will be appreciated that such a database may be readily constructed by one skilled in the art. In addition to storing the items of speech content, the database may store the results of the analysis performed upon them by a vowel locator engine 4 and a vowel stress detector 5, the operation of which will be explained in greater detail below. The voice analysis system may be any general purpose computer, including those operating under the Windows™, Macintosh™ or Linux™ operating systems. The analysis stage of the system may be performed by any suitable set of DSP audio analysis algorithms, such as those provided within MATLAB™ (The MathWorks, Inc., Natick, USA), the speech analysis software Praat (Boersma, Paul & Weenink, David (2009). Praat: doing phonetics by computer (Version 5.1) [Computer program]. Retrieved January 31, 2009, from http://www.praat.org/), or purpose-built SDKs such as the Microsoft Speech SDK. The animation tool may be implemented by any suitably configured animation engine, such as Adobe Flash using ActionScript 3 (AS3).
Once the voice analysis system has performed an analysis, the system can provide vowel stress information 6 to an animation tool 7. The manner and mode of use of the vowel stress information by the animation tool is explained below. The animation tool 7 may operate on the same computing system as the voice analysis system 2 or on a separate computing system. Similarly, the animation tool may be provided within the same software program as the vowel locator and vowel stress detector, or separate programs may be employed for each.
The mode and manner of operation of the system 2 and animation tool 7 will now be explained with reference to some exemplary modes of operation, shown in Figure 2, in which the analysis steps 20 are shown separately from the animation steps 23.
The method commences with a recorded piece of speech content which is to be used with an animated character. The piece of speech may be a single item, e.g. a sentence, or it may comprise an entire vocabulary for the character, in which different phrases are combined into an overall speech recording. This overall speech recording may be used, for example, as a library of speech from which different pieces may be retrieved as required.
A preliminary step in the method, where necessary, may be employed to convert the piece of speech from stereo to monophonic form. It will become apparent that, whilst stereo speech may be employed by analysing the left and right channels, for the present purposes it is simpler and more efficient to use a monophonic form of speech. A variety of techniques are known for creating a monophonic signal from a stereo signal, including the abandonment of one channel or the simple addition (or averaging) of the two channels, as sketched below.
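By way of illustration only, the following is a minimal sketch of this stereo-to-mono step in Python, assuming the audio has already been decoded into a NumPy array of shape (num_samples, 2); the function name and the choice of averaging are illustrative rather than part of the described system.

```python
import numpy as np

def stereo_to_mono(stereo: np.ndarray) -> np.ndarray:
    """Collapse a (num_samples, 2) stereo buffer into a monophonic signal.

    Averaging the two channels rather than simply summing them keeps the
    result in the same amplitude range and so avoids clipping.
    """
    if stereo.ndim == 1:  # already monophonic
        return stereo
    return stereo.mean(axis=1)
```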
The monophonic speech piece is then passed through a vowel detector which is employed to detect the positions of vowels in the piece of speech. Where a vowel is located, its position is marked with a time stamp. Each time stamp suitably identifies the location and duration of the associated vowel. The piece of speech and the associated time stamps may be stored together in the database. Vowel detection techniques are well known in the art. One exemplary technique would employ a simple intensity derivative detector, which takes the differential of the input wave to obtain maxima (vowel peaks). The vowel analysis may, for example, be performed using the FFT algorithm provided as part of the Flash AS3 core sound classes available from Adobe Systems Incorporated.
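The following sketch is offered purely as an assumption of how such an intensity-based detector might look in Python: it frames the monophonic signal, computes an RMS intensity per frame, and reports the start and duration of each contiguous region whose intensity rises above a threshold. A practical vowel detector would add voicing and spectral cues.

```python
import numpy as np

def locate_vowels(mono: np.ndarray, sample_rate: int,
                  frame_ms: float = 10.0, threshold_ratio: float = 0.5):
    """Rough vowel locator: frame-level RMS intensity plus thresholding.

    Returns a list of (start_seconds, duration_seconds) time stamps,
    one per contiguous high-intensity region (candidate vowel).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(mono) // frame_len
    if n_frames == 0:
        return []
    frames = mono[:n_frames * frame_len].astype(float).reshape(n_frames, frame_len)
    intensity = np.sqrt((frames ** 2).mean(axis=1))

    voiced = intensity > threshold_ratio * intensity.max()

    stamps, start = [], None
    for i, flag in enumerate(voiced):
        if flag and start is None:
            start = i                      # region begins
        elif not flag and start is not None:
            stamps.append((start * frame_ms / 1000,
                           (i - start) * frame_ms / 1000))
            start = None
    if start is not None:                  # region runs to the end of the clip
        stamps.append((start * frame_ms / 1000,
                       (n_frames - start) * frame_ms / 1000))
    return stamps
```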
Each vowel is analysed for a number of prosodic characteristics. Each prosodic characteristic is then compared with an overall mean for the particular prosodic characteristic for the entire speech clip. Exemplary prosodic characteristics which are employed include pitch, intensity and duration. These characteristics have been identified as being particularly important prosodic attributes in human speech.
Other characteristics that may be employed would include, for example but not limited to, voice quality, jitter and voice breaks. The exemplary method described herein uses a simple scoring system and applies it to the characteristics of each vowel. This scoring system ignores interrelationships between characteristics and treats individual characteristics separately and evenly, i.e. each characteristic is scored identically. It will be appreciated that the scoring system may however be adapted to include a weighted scoring formula.
In the exemplary method, however, the individual characteristics (pitch, intensity and duration) of each vowel are compared with the means for the piece of speech as a whole. Where one characteristic of a vowel exceeds its average, the vowel receives a score of 1; where two characteristics exceed their mean values, the vowel receives a score of 2, and so on. Thus, where the pitch of a vowel is above the average pitch for the piece of speech, the duration of the vowel exceeds the average duration, and the intensity exceeds the average intensity, the vowel receives a score of 3. This score is stored with the timestamp for the vowel in the database. As a result, the speech, vowel locations and importance (score) of each vowel location are stored or related together within the database.
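The scoring described above might be sketched as follows; the Vowel container and the use of simple arithmetic means over the whole clip are assumptions made for illustration, not a prescribed data model.

```python
from dataclasses import dataclass

@dataclass
class Vowel:
    start: float      # seconds from the beginning of the clip
    duration: float   # seconds
    pitch: float      # e.g. mean fundamental frequency over the vowel, Hz
    intensity: float  # e.g. mean RMS or dB level over the vowel

def score_vowels(vowels: list[Vowel]) -> list[int]:
    """Score each vowel 0-3: one point per characteristic above the clip mean."""
    mean_pitch = sum(v.pitch for v in vowels) / len(vowels)
    mean_intensity = sum(v.intensity for v in vowels) / len(vowels)
    mean_duration = sum(v.duration for v in vowels) / len(vowels)

    scores = []
    for v in vowels:
        score = int(v.pitch > mean_pitch)
        score += int(v.intensity > mean_intensity)
        score += int(v.duration > mean_duration)
        scores.append(score)
    return scores
```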
The values stored in the database may then be employed with a character animation tool by automatically or semi-automatically linking gestures to the locations of the time stamped vowels. In particular, the analysis tool may export an XML file for a piece of speech to the animation tool, in which the speech is embedded along with information identifying the locations and scores of vowels.
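The application does not specify a schema for this XML file, so the following sketch simply assumes one hypothetical layout: the audio embedded as base64 text and one element per vowel carrying its start time, duration and stress score.

```python
import base64
import xml.etree.ElementTree as ET

def export_stress_tags(wav_bytes: bytes, vowels, scores, path: str) -> None:
    """Write a hypothetical stress-tag XML file for the animation tool.

    `vowels` is the list of (start, duration) pairs and `scores` the matching
    stress scores; the element and attribute names are invented for
    illustration only.
    """
    root = ET.Element("speech")
    audio = ET.SubElement(root, "audio", encoding="base64")
    audio.text = base64.b64encode(wav_bytes).decode("ascii")

    for (start, duration), score in zip(vowels, scores):
        ET.SubElement(root, "vowel",
                      start=f"{start:.3f}",
                      duration=f"{duration:.3f}",
                      stress=str(score))

    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```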
Character animation tools are well known in the art and the techniques employed will be readily familiar to the skilled person. One common technique is the use of games physics to animate characters based on particular inputs as provided, for example, by an animator. These inputs are converted into motion of the character on the screen. The advantage of these animation tools is that the animator does not have to specify the precise movements of a character between frames. Instead, for example, the start and end points might be detailed over a particular time span, and the animation tool, using appropriate mathematics, can effectively interpolate the character's movements for each frame between the start and end points.
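As a simple, assumed illustration of that interpolation idea (host animation engines typically apply richer easing or physics), a pose can be blended linearly between two keyframes:

```python
def interpolate_pose(start_pose: dict, end_pose: dict,
                     start_time: float, end_time: float, t: float) -> dict:
    """Linearly blend a pose (joint name -> value) between two keyframes.

    `t` is the current time; values outside the keyframe span are clamped
    so the pose holds at the nearest keyframe.
    """
    span = max(end_time - start_time, 1e-9)
    alpha = min(max((t - start_time) / span, 0.0), 1.0)
    return {joint: (1 - alpha) * start_pose[joint] + alpha * end_pose[joint]
            for joint in start_pose}
```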
The present system employs such a tool and provides the timestamp and scoring data, together with the speech data, to the character animation tool. The character animation tool employs the scoring data as an input at each identified timestamp. In an automated mode of operation, different scores may be associated with different character actions or character features. For example, a score of one might be associated with a character winking, whereas a score of two might be associated with movement of the hands and a score of three might be associated with head movement. The character's action is timed to occur at the timestamp and for the duration of the timestamp. An exemplary screen shot from an animation tool using the present methods is shown in Figure 3, in which a section of speech content is represented along an abbreviated time line 64. The section of speech is selectable from the entire piece of speech content, which is represented at a smaller scale (graphical section 53). One or more slider features 55a, 55b allow a user to select a section of speech from an overall time line 52 for the speech content. Other features, including for example a moving window, allow a user to select the region of the speech content to be represented by the abbreviated time line. The vowel stress information is represented by dots 57 for the complete item of speech and by diamonds 60, 58, 56, 54 in a separate region 50 for the abbreviated time line.
The character to be animated is represented in a character region 62 above the time line with a variety of different actions (in this example, hand movements). Each diamond represents a vowel, with the degree of stress identified by differently marked diamonds. In the exemplary screenshot shown, the scoring system described above was used with a maximum score of 3. The stress is represented by the relative height of the diamonds on the screen, with diamonds having a score of 3 placed higher than diamonds having a score of 2, and so on. In addition, the diamonds contain a numeric representation of the score. Similarly, the colours of the diamonds may differ to identify different scores, e.g. a diamond representing a score of three could be red, one with a score of two could be blue and one with a score of one might be coloured green. To assist the animator, sections with no speech may also be represented 54. When an animator is using the tool, they may move along the time line selecting individual diamonds. As a diamond is selected, using a mouse for example, a motion selection tool may appear, e.g. a drop down list, allowing the animator to select an action for the character. Different actions can be pre-assigned to the drop down lists at different levels, i.e. minor actions assigned to lower stress levels and major actions assigned to higher stress levels. The animator can thus select a major action from the list of major actions for a diamond with a value of 3 and a minor action from a list of minor actions presented for a diamond with a value of 1.
The animation tool generates and stores the character actions in response to the animator's selections. It will be appreciated that the animation may be completed extremely quickly since the animator does not need to focus on timing or content. Suitably, the animation tool is one that allows for layering; thus the animator may use one layer to store the character's actions resulting from the speech, with other layers employed to account for the character's general movements about a scene. Whilst this approach may appear relatively primitive compared to animation generally, the reality is that lip/mouth movement is only used by humans for linguistic information, which in itself accounts for a very small percentage of communication (approximately 7%), with the large majority of communication being performed by the motion of other features (55%). More importantly, the context of the exact gesture is less important than the rhythm of the gestures, and the present method, by tying the gestures to vowel locations and to the relative importance of vowels in the speech, provides an effective animation tool. The automatic animation tool is of obvious importance in situations where an animator is not involved in producing the final piece of content, e.g. in a video game, where a character's actions, whilst depending on pre-recorded speech content, may have other inputs, e.g. from a player.
In a semi-automatic arrangement, the tool allows a user to select from different actions for each time stamp. Thus an animator can select different actions from a dropdown box for each timestamp. In this scenario, the contents of the drop down list may be selected based on the associated score for the timestamp.
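A minimal sketch of how an action library might drive both the automated and semi-automatic modes is given below; the library contents and the rule of picking the first entry in automatic mode are assumptions made for illustration, not part of the described tool.

```python
# Hypothetical per-character action library keyed by stress score.
# A production library would be authored for each character rather
# than hard-coded like this.
ACTION_LIBRARY = {
    1: ["wink", "blink", "small nod"],
    2: ["hand gesture", "shoulder shrug"],
    3: ["head movement", "full arm sweep"],
}

def candidate_actions(score: int) -> list[str]:
    """Return the actions offered for a given stress score.

    In semi-automatic mode the whole list populates the drop down box;
    in a fully automated mode the first entry might simply be chosen.
    """
    return ACTION_LIBRARY.get(score, [])

def automatic_action(score: int) -> str | None:
    """Pick an action without animator input (automated mode)."""
    options = candidate_actions(score)
    return options[0] if options else None
```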
This character animation technique employs acoustic, linguistic and emotional speech analysis to semi-automatically generate gestures and body movements in response to the acoustic parameters in a character's voice.
The present application provides a platform that enables the creation of computer animations for use in a wide number of applications. It is cutting edge in that, instead of basing animation on lip-synching, it uses speech events (acoustic, linguistic and emotional) to both manually and automatically define character movements, gestures and facial positions. The techniques have been demonstrated to work in practice. A software front end as described above has been implemented that takes in user data (speech) and produces a corresponding animation that is close to half complete in a fraction of the time that would be required by an animator using traditional techniques.
The techniques described herein may be used to produce cheaper, faster and more effective character animations in films, games, children's TV programmes and advertisements. The advantages include lower costs, since the overall production overhead is reduced: character animation events may be characterised by non-animators based on a speech clip, freeing animators to work on other aspects of the animation process. Moreover, the animation process is faster, since it is semi-automated using pre-defined libraries that allow up to 70% of the animation to be achieved without customization by an animator. The system is character independent, so that the gesture and movement libraries and characters may easily be changed.
In contrast to prior art methods, the system is largely language independent in that the techniques may be used to semi-automate characters in any spoken language.
The technology is character and language independent, and the use of re-usable, pre-defined gesture and movement libraries makes it a cheap, fast and effective alternative to conventional character animation techniques. The systems of the present application have been implemented with a variety of different characters and have been tested in various languages and with various voices, and the potential to reduce production costs, save time and streamline workflows has been clearly demonstrated. The process and resulting system is essentially a labour saving device that allows animators to achieve better production values in a shorter period of time, given that it takes care of 70% of the ground work, allowing animators to focus on the nuance and detail of the overall animated output.
The animation tool described above may be provided as a plug-in for proprietary animation software, including for example Autodesk Maya and Autodesk 3DSMax, both from Autodesk, Inc. of San Rafael, USA. It will be appreciated that an important feature of the Autodesk applications is their openness to third-party software. However, it will be appreciated that the present teaching may be employed within any animation tool generally. It will also be appreciated that the animation may be 2D or 3D in nature.
As described previously, the software may employ a pre-defined library of movements and gestures for a character that may be quickly allocated to the prioritized speech events on a fully automated basis. In such a context, it will be appreciated that a piece of animation incorporating an animated character, whose movements have been generated in response to a segment of speech, may be outputted as a piece of video content, e.g. an MPEG file, in which the animation is provided with the speech. This outputted content may be an animated cartoon for entertainment purposes, or it may be for educational purposes, for example a language learning product in which spoken phrases are presented to a person learning a language.

Claims

Claims
1. A computer implemented method of animating a character's actions to a piece of speech, the method comprising the steps of: analyzing the piece of speech to identify at least one location of a vowel, determining the degree of stress associated with the at least one identified vowel location and selecting the character's action at the at least one location based on the determined degree of stress.
2. A method according to claim 1 wherein the duration of the at least one vowel is determined.
3. A method according to claim 1 or claim 2 wherein the degree of stress is determined by comparing at least one characteristic of each vowel against a reference value for the characteristic.
4. A method according to claim 3, wherein the reference value is determined by averaging the characteristic over a windowed section of the piece of speech.
5. A method according to claim 4 wherein the windowed section comprises the entire piece of speech.
6. A method according to any one of claims 1 to 5, wherein the at least one characteristic comprises one or more of the following: a) pitch, b) intensity, c) duration, d) voice quality, e) jitter, and f) voice breaks.
7. A method according to any one of claims 1 to 6, wherein the at least one characteristic comprises the following characteristics: a) pitch, b) intensity and c) duration.
8. A method according to any preceding claim, wherein the locations of vowels are used as a trigger for a character's motion in the animation.
9. A method according to claim 8 wherein the character's motion at a location is determined with reference to the degree of stress for that location.
10. A method according to claim 9, wherein the motion of the character is automatically selected based upon the degree of stress.
11. A method according to claim 9, further comprising presenting an animator with a list of possible character motions and allowing the animator to select a particular motion from the list.
12. A method according to claim 11, wherein the list is populated with possible character motions based upon the degree of stress.
13. A method according to claim 11 or claim 12, wherein the list is presented for each vowel location allowing an animator to select an animated character's motion at each vowel location.
14. A computer implemented speech analysis system for assisting in the animation of at least one character to a piece of speech, the system comprising: a memory for storing the piece of speech, a vowel locator for identifying the locations of vowels within the stored piece of speech, a vowel stress detector for identifying the degree of stress associated with each identified vowel and storing the associated degree of stress for each location for use in an animation process.
15. A speech analysis system according to claim 14, wherein the vowel locator identifies the duration of each vowel.
16. A system according to claim 14 or claim 15, wherein the piece of speech is stored in a database in the memory.
17. A system according to claim 16, wherein the database stores timestamps indicating locations of vowels and durations of vowels.
18. A system according to claim 17, wherein the database further stores a stress value for each vowel.
19. A system according to any one of claims 14 to 18, wherein the vowel stress detector scores at least one characteristic of each vowel against a reference value for the characteristic.
20. A system according to claim 19, wherein the reference value is determined by averaging the characteristic over a windowed section of the piece of speech.
21. A system according to claim 20, wherein the windowed section comprises the entire piece of speech.
22. A system according to any one of claims 19 to 21, wherein the at least one characteristic comprises one or more of the following: a) pitch, b) intensity, c) duration, d) voice quality, e) jitter, and f) voice breaks.
23. A system according to any one of claims 19 to 22, wherein the at least one characteristic comprises the following characteristics: a) pitch, b) intensity and c) duration.
24. An animation system comprising the system according to any one of claims 14 to 23, wherein the animation tool provides a character animation feature employing the locations of vowels as a trigger for a character's motion.
25. A system according to claim 24, wherein the motion of a character selected at a particular location is determined with reference to the degree of stress for that location.
26. A system according to claim 25, wherein the motion of the character is automatically selected based upon the degree of stress.
27. A system according to claim 25, wherein the animation tool allows an animator to select a particular motion from a list presented.
28. A system according to claim 27, wherein the list is populated with possible character motions based upon the degree of stress.
29. A system according to claim 27 or claim 28, wherein the list is presented for each vowel location allowing an animator to select an animated character's motion at each vowel location.
PCT/EP2010/052445 2009-02-26 2010-02-25 A character animation tool WO2010097452A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0903270.7 2009-02-26
GB0903270A GB2468140A (en) 2009-02-26 2009-02-26 A character animation tool which associates stress values with the locations of vowels

Publications (1)

Publication Number Publication Date
WO2010097452A1 true WO2010097452A1 (en) 2010-09-02

Family

ID=40565755

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2010/052445 WO2010097452A1 (en) 2009-02-26 2010-02-25 A character animation tool

Country Status (2)

Country Link
GB (1) GB2468140A (en)
WO (1) WO2010097452A1 (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1991981A (en) * 2005-12-29 2007-07-04 摩托罗拉公司 Method for voice data classification
FR2905510B1 (en) * 2006-09-01 2009-04-10 Voxler Soc Par Actions Simplif REAL-TIME VOICE ANALYSIS METHOD FOR REAL-TIME CONTROL OF A DIGITAL MEMBER AND ASSOCIATED DEVICE

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5111409A (en) * 1989-07-21 1992-05-05 Elon Gasper Authoring and use systems for sound synchronized animation
WO1997036288A1 (en) 1996-03-26 1997-10-02 British Telecommunications Plc Image synthesis
KR20010038772A (en) 1999-10-27 2001-05-15 최창석 Automatic and adaptive synchronization method of image frame using speech duration time in the system integrated with speech and face animation
EP1326445A2 (en) * 2001-12-20 2003-07-09 Matsushita Electric Industrial Co., Ltd. Virtual television phone apparatus
FR2906056A1 (en) * 2006-09-15 2008-03-21 Cantoche Production Sa METHOD AND SYSTEM FOR ANIMATING A REAL-TIME AVATAR FROM THE VOICE OF AN INTERLOCUTOR

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
C. CULLEN, B. VAUGHAN, S. KOUSIDIS AND J. MCAULEY: "Emotional Speech Corpora for Analysis", 3RD INTERNATIONAL CONFERENCE ON SEMANTIC AND DIGITAL MEDIA TECHNOLOGIES, 2 December 2008 (2008-12-02) - 5 December 2008 (2008-12-05), Koblenz, DE, XP002579189, Retrieved from the Internet <URL:http://resources.smile.deri.ie/conference/2008/samt/Short/178_short.pdf> [retrieved on 20100421] *

Also Published As

Publication number Publication date
GB0903270D0 (en) 2009-04-08
GB2468140A (en) 2010-09-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10708166

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10708166

Country of ref document: EP

Kind code of ref document: A1