United States Patent
Junqua
(10) Patent No.: US 7,069,214 B2
(45) Date of Patent: Jun. 27, 2006
(54) FACTORIZATION FOR GENERATING A LIBRARY OF MOUTH SHAPES
(75) Inventor: Jean-Claude Junqua, Santa Barbara, CA (US)
(73) Assignee: Matsushita Electric Industrial Co., Ltd., Osaka (JP)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 816 days.
(21) Appl. No.: 10/095,813
(22) Filed: Mar. 12, 2002
(65) Prior Publication Data
US 2002/0152074 A1 Oct. 17, 2002
Related U.S. Application Data
(63) Continuation-in-part of application No. 09/792,928, filed on Feb. 26, 2001.
(56) References Cited
U.S. PATENT DOCUMENTS
5,608,839 A * 3/1997 Chen 704/235
6,112,177 A 8/2000 Cosatto et al.
6,188,776 B1 * 2/2001 Covell et al. 382/100
2003/0072482 A1 * 4/2003 Brand 382/154
OTHER PUBLICATIONS
Bregler et al. "Video Rewrite: Driving Visual Speech with
Audio," AVSP, 1997, pp. 153-156.*
Ezzat et al. "MikeTalk: A Talking Facial Display Based on
Morphing Visemes," Proc. of the Computer Animation
Conference, Philadelphia, Pa., Jun. 1998.*
Shih et al. "Efficient Adaptation of TTS Duration Model to
New Speakers," ICSLP, 1998.*
Bregler et al., "Video Rewrite: Driving Visual Speech with
Audio," Proc. ACM SIGGRAPH 1997, in Computer Graphics
Proceedings, Annual Conference Series, 1997.*
Bregler et al., "Video Rewrite: Visual Speech Synthesis
from Video" Proc. of the AVSP '97 Workshop, Rhodes
(Greece), Sep. 26-27, 1997.*
* cited by examiner
Primary Examiner—V. Paul Harper
(74) Attorney, Agent, or Firm—Harness, Dickey & Pierce, PLC
(57) ABSTRACT
A library of mouth shapes is created by separating speaker dependent and speaker independent variability. Preferably, speaker dependent variability is modeled by a speaker space, while the speaker independent variability (i.e., context dependency) is modeled by a set of normalized mouth shapes that need be built only once. Given a small amount of data from a new speaker, it is possible to construct a corresponding mouth shape library by estimating a point in speaker space that maximizes the likelihood of adaptation data and by combining speaker dependent and speaker independent variability. Creation of talking heads is simplified because creation of a library of mouth shapes is enabled with only a few mouth shape instances. To build the speaker space, a context independent mouth shape parametric representation is obtained. Then a supervector containing the set of context-independent mouth shapes is formed for each speaker included in the speaker space. Dimensionality reduction is used to find the axes of the speaker space.
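
The speaker-space steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes principal component analysis (via power iteration) as the dimensionality reduction, and it assumes isotropic Gaussian noise so that the maximum-likelihood point in speaker space reduces to a least-squares fit on the few observed supervector entries. The function names `build_speaker_space` and `estimate_speaker` are hypothetical, and the final step of combining the result with the normalized, context-dependent mouth shapes is not shown.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_speaker_space(supervectors, k, iters=100):
    """Return (mean, axes): k orthonormal principal axes of the training supervectors."""
    n, dim = len(supervectors), len(supervectors[0])
    mean = [sum(col) / n for col in zip(*supervectors)]
    centered = [[x - m for x, m in zip(sv, mean)] for sv in supervectors]
    axes = []
    for _axis in range(k):
        v = [j + 1.0 for j in range(dim)]  # arbitrary deterministic start vector
        for _step in range(iters):
            # Multiply by the unnormalized covariance: C v = sum_s (s . v) s
            coeffs = [dot(s, v) for s in centered]
            w = [sum(c * s[j] for c, s in zip(coeffs, centered)) for j in range(dim)]
            # Deflation: remove components along axes already found
            for u in axes:
                d = dot(w, u)
                w = [wj - d * uj for wj, uj in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                raise ValueError("k exceeds the rank of the training data")
            v = [wj / norm for wj in w]
        axes.append(v)
    return mean, axes


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def estimate_speaker(mean, axes, observed):
    """Fit speaker-space weights from a few observed entries.

    observed maps a supervector index to its measured value for the new speaker.
    Returns (weights, full) where full is the reconstructed supervector.
    """
    idx = sorted(observed)
    resid = [observed[j] - mean[j] for j in idx]
    sub = [[u[j] for j in idx] for u in axes]  # axes restricted to observed entries
    A = [[dot(a, b) for b in sub] for a in sub]  # normal equations
    rhs = [dot(a, resid) for a in sub]
    weights = solve(A, rhs)
    full = [m + sum(w * u[j] for w, u in zip(weights, axes)) for j, m in enumerate(mean)]
    return weights, full
```

In this sketch each supervector is a flat list of mouth shape parameters for one training speaker. A new speaker needs at least as many observed entries as there are axes, and those entries must constrain every axis; otherwise the normal equations are singular.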
20 Claims, 4 Drawing Sheets