WO2003049039A1 - Performance-driven facial animation techniques - Google Patents


Info

Publication number
WO2003049039A1
Authority
WO
WIPO (PCT)
Prior art keywords
facial
sequence
image
image sequence
ordinate
Prior art date
Application number
PCT/GB2002/005418
Other languages
French (fr)
Inventor
Glyn Cowe
Alan Johnston
Original Assignee
University College London
Priority date
Filing date
Publication date
Application filed by University College London filed Critical University College London
Priority to AU2002349141A priority Critical patent/AU2002349141A1/en
Publication of WO2003049039A1 publication Critical patent/WO2003049039A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings


Abstract

A method of generating a facial animation sequence comprising the steps of: a) observing a real facial image sequence - the original image sequence - and capturing the information generated thereby; b) aligning another facial image - the end image - in an appropriate manner co-ordinate-wise with the original one; c) analysing the information from the original image sequence mathematically; and d) using the results so obtained to drive the movements which generate the end image sequence. Principal components analysis is applied to successive frames from the original image sequence to generate the necessary co-ordinate frames characterising an individual's permissible facial actions and the resulting new sequence is then projected into the thus-defined co-ordinate frames to drive the end image accordingly. Alternatively the necessary vectorisations are generated by non-PCA-based analytical techniques in which the analysis proceeds from vectorial bases.

Description

PERFORMANCE-DRIVEN FACIAL ANIMATION TECHNIQUES
Field of the Invention
The invention relates to performance-driven facial animation techniques and to films or other media products generated by the operation of such techniques.
Review of Art known to the Applicants
Performance-driven facial animation has previously been approached with a variety of tracking techniques and deformable facial models. Parke introduced the first computer generated facial model (Parke 1972; Parke 1974). A polygonal mesh was painted onto a face, the face was photographed from two angles and the 3D location of each vertex was calculated by measurement and geometry. The polygonal mesh has remained the most popular representation, and Parke's model was animated simply by interpolating between key-frames.
Williams tracked coloured markers on his own face and manually defined corresponding points on a three-dimensional laser-scanned polygonal mesh of a head, with Hanning window (cosine fall-off) influence zones governing the motion of the mesh around each point (Williams 1990). Markers have often been used since in variations on this theme, and the introduction of automated dot tracking techniques and multiple cameras for 3D motion estimation has led to high quality results (Guenter, Grimm et al. 1998). Commercial packages with automated dot tracking are even available (e.g. Famous Faces).
Sophisticated underlying muscle models, based on those of Platt and Badler (Platt and Badler 1981), have furthermore been incorporated to enable more anatomically realistic movements (Waters 1987; Terzopoulos and Waters 1993), and alternative tracking strategies, such as active contour models ('snakes') (Terzopoulos and Waters 1993), deformable templates (Yuille 1991) and simpler feature trackers (for cartoon animation) (Buck, Finkelstein et al. 2000), have been employed. Essa and Pentland (Essa and Pentland 1997) and Black and Yacoob (Black and Yacoob 1995) sought denser motion information by tracking facial motion with optical flow for recognition of expression.
It is, however, very difficult to fool a human observer into believing that a computer model is a real face. Tiddeman and Perrett demonstrated a technique, based on prototyping, which allows them to transform existing facial image sequences in dimensions such as age, race and sex (Tiddeman and Perrett 2001). Prototypes are generated by averaging shape and texture information from a set of similar images (same race, for example). Each frame from the sequence can then be transformed towards this prototype. Points (179) must be located in each image and, although this can be automated using active shape models (Cootes, Taylor et al. 1995), a set of examples must first be delineated; so manual intervention cannot be avoided. Although this is not technically performance-driven facial animation, a new face is generated driven by the original motion.
Video Rewrite, a system developed by Bregler et al., automatically lip-synchs existing footage to a new audio track (Bregler, Covell et al. 1997). The video sequence remains the same, except for the mouth. Only the voice of the other actor drives the mouth and their facial expressions are ignored. They track the lips using eigenpoints (Covell and Bregler 1996) and employ hidden Markov models to learn the deformations associated with phonemes from the original audio track. Mouth shapes are predicted for each frame from the new audio track, and these are incorporated into the existing sequence by warping and blending. An extension of this approach is described by Cosatto and Graf, driven by a text-to-speech synthesiser (Cosatto and Graf 2000). Ezzat and Poggio also describe similar work (Ezzat and Poggio 1998; Ezzat and Poggio 2000).
Development of the Invention
The invention is preferably developed around the application of principal components analysis to vectors representing faces; we now proceed to discuss previous work applying PCA in this area.
Principal components analysis (PCA) is a mathematical technique that extracts uncorrelated vectors from a set of correlated vectors, in order of the variance they account for within the set. Early components thus provide strong descriptors of change within the set and later vectors have less relevance.
Sirovich and Kirby were the first to apply PCA to vectorised images of faces, principally as a means of data compression (Sirovich and Kirby 1987). Face images were turned into vectors by concatenating rows of pixel-wise grey level intensity values and transposing. They demonstrated how the weighted sum of just a small number of principal components can be used to reconstruct recognisable faces, requiring only the storage of the weights. The principal components extracted from sets of facial images in this way are often termed eigenfaces and have been successfully applied since, particularly for facial recognition (Turk and Pentland 1991; Pentland, Moghaddam et al. 1994). A problem with the application of PCA on intensity values of images is blurring, since linearly combining images results in deterioration of sharp edges. By first aligning face images onto an average shape, blurring can be dramatically reduced. Shape and texture information can thus be separated for an improved vectorisation. Beymer, and Vetter and Troje, presented such improved vectorisations using optic flow to find pixel-to-pixel correspondences between images (Beymer 1995; Vetter and Troje 1995). Once flow fields were extracted from each face to a chosen reference face, these could be averaged to define the mean shape and, for each face, shape could be encoded as the flow field deviation from this mean. By then warping faces onto the average, shape was removed, leaving only texture.
Although PCA has often been used to find axes of variation between people, variations within people have been considered less often. PCA has been applied to dot tracking data from facial sequences. Arslan et al. used it simply for dimensionality reduction in building codebooks relating acoustic data and phonemes to three-dimensional positions of dots for speech-driven facial animation (Arslan and Talkin 1998). Kshirsagar et al. used PCA on these vectors of dot positions and mapped a configuration associated with each phoneme into the principal component space (Kshirsagar, Molet et al. 2001).
Kuratate et al. captured laser scans of a face in eight different poses and used PCA to reduce the dimensionality of the data (Kuratate, Yehia et al. 1998). By relating the positions of a small number of points on the meshes to their principal component scores via a linear estimator, they were able to drive the 3D mesh by tracking points positioned analogously on an actor.
Summary of the Invention
It will be appreciated from the review above that most work in the field to date has centred around tracking the motion of an actor's face and transferring this on to a computer-generated model. The invention takes a new approach, applying mathematical analysis techniques to the information available from a real facial image sequence in order to enable that information to be used to drive the movements of another face appropriately co-ordinately aligned with the original. Given a set of examples of a particular face in motion, each example can be vectorised in a chosen manner. Once having established a set of high dimensional vectors representing examples of facial movements, one can define a subspace therein, with a co-ordinate system constraining deformations to these observed permissible actions. The resultant virtual avatar can be controlled by projecting novel deformations into this co-ordinate frame if the new sequence of movements is appropriately aligned with the original in position and scale (although this alignment need not be precise).
Specifically therefore the invention envisages an essentially space-based performance-driven facial animation technique in which frames from a preexisting real facial image sequence are analysed to generate a co-ordinate frame characterising an individual's permissible facial actions with a new sequence then being projected into the thus-defined co-ordinate frame to animate the end image accordingly.
The subsequent claims which will define the boundaries of the invention clearly include within their scope an image, for example a film image sequence, or other media product, generated by applying techniques in accordance with the invention in its broad conceptual scope.
In accordance with that overall approach there will now be described certain currently preferred embodiments of the inventive concept which demonstrate this space-based approach to performance-driven facial animation.
Brief Description of the Figures
The accompanying Figures 1 through 7 are derived from facial image sequences of two male subjects Glyn (the younger man) and Harry (the elder) with the recorded facial movements of Glyn being used to drive the facial end image of Harry as will be described below.
The detailed content of each individual Figure will become apparent as the description proceeds as will their individual relevance to the text with which they are inter-referenced.
Description of Currently Preferred Techniques for putting the Invention into Practice
This detailed description begins by presenting an example of vectorisation, demonstrating how images from a facial image sequence can be represented in vector form as pixel-wise intensity variations from their mean. The space-based approach underlying the invention is then outlined, and is finally shown to be generalisable to more sophisticated vectorisations, all essentially by way of example of current work on the concept.
Proceeding then to this detailed explanation and expansion of the concept:
A simple vectorisation: Facial motion as pixel-wise intensity variations
Consider an $n \times m$ image to be a vector of grey level intensity values, one value for each pixel of the image. These vectors (of length $N = n \times m$) can be thought of as representing locations in an N-dimensional image space. Now consider a set of M frames from a continuous recorded sequence of a face, $x_1, x_2, \ldots, x_M$, where each frame has been converted to a long vector by concatenating each row and transposing (Figure 1).
Since frames from a continuous recorded facial sequence tend to vary smoothly, these images will generally be clustered together in this space, centred approximately on their mean, $\mu = \frac{1}{M}\sum_{i=1}^{M} x_i$. Figure 2 shows how each face, $x$, in the set can be considered as a linear translation, $\phi$, from the mean, $\phi = x - \mu$ (note that $\phi$ is displayed in a different range with zero as mid-level grey, so that negative and positive values are visible).
In order to move around the subspace occupied by these particular vectors, we can set up a co-ordinate system that spans it using the examples as a basis. This space will necessarily have dimensionality of at most M, but it is unlikely that this will form a good description, since two or more example faces may be of a similar configuration and image noise will be responsible for most of the variance in those dimensions. By application of a mathematical technique, known as principal components analysis, we can define a new improved orthonormal coordinate system centred on μ, which optimally spans this subspace, with axes chosen in order of descriptive importance. That is, basis vectors are defined sequentially, each chosen to point in the direction of maximum variance unaccounted for so far by their predecessors, subject always to the constraint of orthonormality. Since noise tends to be uncorrelated, vectors describing this will be of low importance in the hierarchy and can be later discarded by truncation to a lower dimensionality.
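By way of illustration only, the following NumPy sketch (not part of the original disclosure, with names of our own choosing) shows one way the vectorisation and mean-centring just described might be carried out:

```python
import numpy as np

def vectorise_sequence(frames):
    """Stack M greyscale frames (each n x m) as columns of a data matrix.

    Each frame is flattened row by row into a length-N vector (N = n*m),
    mirroring the concatenate-rows-and-transpose vectorisation above.
    """
    X = np.stack([f.reshape(-1) for f in frames], axis=1)   # N x M data matrix
    mu = X.mean(axis=1, keepdims=True)                      # mean face, N x 1
    Phi = X - mu                                            # mean-centred columns, phi_i = x_i - mu
    return X, mu, Phi
```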
Principal Components Analysis - Creating a Puppet
Principal components analysis is a mathematical technique that seeks to linearly transform a set of correlated N-dimensional variables, $\{x_1, x_2, \ldots, x_M\}$, into an uncorrelated set that best describes the data, termed principal components, $\{u_1, u_2, \ldots, u_M\}$ (Chatfield and Collins 1980). For simplicity, we translate the data to a new set, $\{\phi_1, \phi_2, \ldots, \phi_M\}$, centred on the set's mean, $\mu$, simply by subtracting it from each datum, $\phi_i = x_i - \mu$. We define $\Phi = [\phi_1, \phi_2, \ldots, \phi_M]$, the matrix with columns consisting of the $\phi_i$'s. We proceed to show that these principal components, sequentially chosen to maximise the variance thus far accounted for, subject to the constraints of orthonormality, turn out simply to be the eigenvectors of the covariance matrix for $\{x_1, x_2, \ldots, x_M\}$.
First principal component
Consider first $u_1$. This is our first principal component and so must point in the direction of maximum variance of the data set. We thus wish to choose our first basis vector such that the magnitude of the projection of each member of the dataset onto $u_1$ is maximal,

$$\frac{\sum_{i=1}^{M}(\phi_i \cdot u_1)^2}{\|u_1\|^2} = \sum_{i=1}^{M}(\phi_i \cdot u_1)^2 \quad (3.1)$$

(since orthonormality of our basis set dictates that $u_1$ must have a magnitude of 1). This can be represented in matrix form as

$$(u_1^T\Phi)(u_1^T\Phi)^T = u_1^T\Sigma u_1 \quad (3.2)$$

where $\Sigma = \Phi\Phi^T$. It should be noted that $\frac{1}{M-1}\Sigma$ is, by definition, the covariance matrix of the set of image vectors (recall that the $\phi_i$'s are centred on their mean) and $\frac{1}{M-1}u_1^T\Sigma u_1$ gives a measure of the variance in the set that $u_1$ accounts for.

Orthonormality adds the constraint that $u_1^T u_1 = 1$. Introducing a Lagrange multiplier, $\lambda_1$, we can define a new function, $L_1(u_1)$,

$$L_1(u_1) = u_1^T\Sigma u_1 - \lambda_1(u_1^T u_1 - 1) \quad (3.3)$$

Employing the procedure of Lagrange multipliers, maximisation is now a case of finding when $\frac{\partial L_1}{\partial u_1} = 0$,

$$\frac{\partial L_1}{\partial u_1} = 2\Sigma u_1 - 2\lambda_1 u_1 \quad (3.4)$$

Setting $\frac{\partial L_1}{\partial u_1} = 0$, we have

$$\Sigma u_1 = \lambda_1 u_1 \quad (3.5)$$

This leaves us with an eigenvalue problem, where candidate solutions are the eigenvectors of $\Sigma$. Pre-multiplying by $u_1^T$ yields

$$u_1^T\Sigma u_1 = \lambda_1 u_1^T u_1 = \lambda_1 \quad (3.6)$$

which is the very function we sought to maximise (3.2), so the optimal solution is necessarily the eigenvector associated with the largest eigenvalue of $\Sigma$.
Second principal component
To find the second principal component, $u_2$, we need, similarly, to maximise

$$u_2^T\Sigma u_2 \quad (3.7)$$

subject to

$$u_2^T u_2 = 1 \quad \text{and} \quad u_1^T u_2 = 0 \quad (3.8)$$

With two Lagrange multipliers, $\lambda_2$ and $\delta$, we define the new function $L_2(u_2)$,

$$L_2(u_2) = u_2^T\Sigma u_2 - \lambda_2(u_2^T u_2 - 1) - \delta u_1^T u_2 \quad (3.9)$$

Again, maximisation is now a case of finding when $\frac{\partial L_2}{\partial u_2} = 0$, which leaves us with

$$\frac{\partial L_2}{\partial u_2} = 2\Sigma u_2 - 2\lambda_2 u_2 - \delta u_1 = 0 \quad (3.10)$$

Pre-multiplying by $u_1^T$,

$$2u_1^T\Sigma u_2 - 2\lambda_2 u_1^T u_2 - \delta u_1^T u_1 = 0 \quad (3.11)$$

this reduces to

$$2u_1^T\Sigma u_2 = \delta \quad (3.12)$$

due to the orthonormality constraints (3.8). Rearranging this, we can use the symmetry of $\Sigma$, (3.5) and (3.8) together, to show that $\delta = 0$:

$$\delta = 2u_1^T\Sigma u_2 = 2(\Sigma u_1)^T u_2 = 2(\lambda_1 u_1)^T u_2 = 2\lambda_1 u_1^T u_2 = 0 \quad (3.13)$$

This reduces (3.10) to

$$\frac{\partial L_2}{\partial u_2} = 2\Sigma u_2 - 2\lambda_2 u_2 = 0 \quad (3.14)$$

which leaves us again with the eigensystem,

$$\Sigma u_2 = \lambda_2 u_2 \quad (3.15)$$

Since $u_1$ is already the eigenvector associated with the largest eigenvalue, the next best solution will be the eigenvector associated with the second largest eigenvalue.
The other principal components
By continuing this process for each $j \in [1, M]$, with the constraints $u_j^T u_j = 1$ and $u_i^T u_j = 0$ for all $i < j$, it is apparent that the principal components are the eigenvectors of $\Sigma$ (or, equivalently, the eigenvectors of the set's covariance matrix) ordered by magnitude of their associated eigenvalues, $\lambda_j$,

$$\Sigma u_j = \lambda_j u_j \quad (3.16)$$

Pre-multiplying (3.16) by $u_j^T$, we see that

$$u_j^T\Sigma u_j = \lambda_j u_j^T u_j = \lambda_j \quad (3.17)$$

Since the variance accounted for by $u_j$ is given by $\frac{1}{M-1}u_j^T\Sigma u_j$, the corresponding eigenvalues provide a measure of this, differing only by scaling. With consideration towards these variances, lower order components can be discarded as noise, thus reducing the dimensionality to some $P < M$.
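The derivation above reduces to an eigen-decomposition of $\Sigma = \Phi\Phi^T$. As an illustrative sketch only (our own, with assumed names), this can be computed directly when N is manageable; the next section gives the shortcut used for full-size images.

```python
import numpy as np

def pca_direct(Phi, P):
    """Principal components as the top-P eigenvectors of Sigma = Phi Phi^T (eq. 3.16).

    Only practical when N (the vector length) is small; see the M x M
    shortcut in the following section for full-resolution images.
    """
    Sigma = Phi @ Phi.T                              # N x N scatter matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)         # returned in ascending order
    order = np.argsort(eigvals)[::-1]                # sort by descending eigenvalue
    U = eigvecs[:, order[:P]]                        # first P principal components (columns)
    lam = np.maximum(eigvals[order[:P]], 0.0)        # clip tiny negative values from round-off
    sigma = np.sqrt(lam / (Phi.shape[1] - 1))        # per-component standard deviations (as in Figure 3)
    return U, lam, sigma
```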
Figure 3 shows the first five principal components from an image sequence of Harry speaking, vectorised as described previously. Together, these mere five principal components account for 75% of the variance in the sequence of 317 frames. In each case, the central column always shows the mean image from the sequence. The left and right columns show the images two standard deviations, $2\sigma$, away from the mean in the negative and positive directions respectively for each principal component. More explicitly, for row $j$, from left to right, the images are $\mu - 2\sigma_j u_j$, $\mu$ and $\mu + 2\sigma_j u_j$, where the standard deviation is $\sigma_j = \sqrt{\lambda_j/(M-1)}$.
Reducing computation
Computationally, finding the eigenvalues and eigenvectors of the $N \times N$ matrix, $\Sigma = \Phi\Phi^T$, is difficult due to its large size. We therefore look to find the eigenvalues and eigenvectors of the $M \times M$ matrix $\Phi^T\Phi$ when $M \ll N$,

$$\Phi^T\Phi v = \lambda v \quad (3.18)$$

This can be exploited, since, pre-multiplying each side by $\Phi$,

$$\Phi\Phi^T\Phi v = \lambda\Phi v \quad (3.19)$$

and adding some parentheses,

$$(\Phi\Phi^T)(\Phi v) = \lambda(\Phi v) \quad (3.20)$$

we see that $\Phi^T\Phi$ and $\Phi\Phi^T$ share the same eigenvalues, and that, if $v$ is an eigenvector of $\Phi^T\Phi$, then $u = \Phi v$ will be an eigenvector of $\Phi\Phi^T$. This provides us with a useful computational shortcut.

For particularly large values of $M$ and $N$, however, memory constraints sometimes make it impractical to store the matrix of outer products, whether it be $\Phi\Phi^T$ or $\Phi^T\Phi$. In such situations, the first $P$ principal components can be learned by a neural network (Sanger 1989), or can be extracted using a convergence algorithm (Roweis 1998).
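A hedged sketch of the $M \times M$ shortcut of equations (3.18)-(3.20) follows (our own illustration, with assumed names; a thin SVD of $\Phi$ would give the same components). For a few hundred frames this keeps the eigenproblem trivially small even when N runs to tens of thousands of pixels.

```python
import numpy as np

def pca_snapshot(Phi, P):
    """Top-P eigenvectors of Phi Phi^T obtained via the small matrix Phi^T Phi."""
    small = Phi.T @ Phi                            # M x M instead of N x N
    eigvals, V = np.linalg.eigh(small)             # same eigenvalues as Phi Phi^T
    order = np.argsort(eigvals)[::-1][:P]
    lam = np.maximum(eigvals[order], 0.0)
    U = Phi @ V[:, order]                          # u = Phi v is an eigenvector of Phi Phi^T (eq. 3.20)
    U /= np.linalg.norm(U, axis=0, keepdims=True)  # re-normalise each component to unit length
    return U, lam
```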
Projecting into face space
Having found a new co-ordinate system representing an individual's face space, we can project any facial movement, ξ, from a sequence of any individual into this space, provided it is vectorised in the same manner, centred on its own sequence mean and roughly aligned (just performing a simple affine transform to bring the two eye positions and two mouth corners into correspondence has been found to be sufficient).
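As an illustration of such a rough alignment (our own sketch, not prescribed by the text), a $2 \times 3$ affine transform can be estimated by least squares from the four landmark positions (two eye centres and two mouth corners), assuming those positions are available for each sequence; the resampling of the images through this transform is omitted here.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping src landmarks onto dst landmarks.

    src_pts, dst_pts: arrays of shape (K, 2) with K >= 3, here the two eye
    centres and the two mouth corners of the driving and training faces.
    """
    K = src_pts.shape[0]
    A = np.hstack([src_pts, np.ones((K, 1))])         # K x 3, homogeneous source points
    T, *_ = np.linalg.lstsq(A, dst_pts, rcond=None)   # solve A @ T ~= dst_pts
    return T.T                                        # 2 x 3 matrix [a b tx; c d ty]
```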
Given a set of $M_{train}$ training vectors from individual one (the face we wish to drive), $x_1, x_2, \ldots, x_{M_{train}}$, and a set of $M_{drive}$ driving vectors from individual two (the face that will be doing the driving), $y_1, y_2, \ldots, y_{M_{drive}}$, we centre them both on their means and find matrices $\Phi$ and $\Psi$, such that $\Phi = [\phi_1, \phi_2, \ldots, \phi_{M_{train}}]$, where $\phi_i = x_i - \mu_{train}$, and $\Psi = [\psi_1, \psi_2, \ldots, \psi_{M_{drive}}]$, where $\psi_i = y_i - \mu_{drive}$. Principal components analysis provides us with a set of basis vectors, $u_1, u_2, \ldots, u_R$, where $R \le M_{train}$. We project into the new lower dimensional co-ordinate frame provided by the principal components by employing the basis transformation matrix, $U = [u_1, u_2, \ldots, u_R]$. For example, to project the N-dimensional vector, $\psi_i$, into the R-dimensional subspace described by the principal components basis, we apply

$$c_i = U^T\psi_i \quad (3.21)$$
Each element of $c_i$ represents a weighting on the respective basis vector.
An optional rescaling step can be included, where the distribution of the $c_i$'s can be transformed so the means and standard deviations of the weights associated with each basis vector match those for the training set. The distribution can also be rescaled for exaggeration or anti-exaggeration purposes.
In order to transform the projection, $c_i$, back to N-dimensional space translated to the standard origin, we apply the inverse transformation and add the training mean,

$$z_i = U c_i + \mu_{train} \quad (3.22)$$
In the case of the pixel-wise intensity vectorisation, the new $N \times 1$ vector, $z_i$, is then rearranged into n rows of m elements, to form an $n \times m$ image. Figures 3 and 4 demonstrate typical results from this process for the vectorisation defined.
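Putting equations (3.21) and (3.22) together with the optional rescaling, a minimal sketch might look as follows (our own, with illustrative names; `c_train` would be the projections $U^T\Phi$ of the training set). Each column of the result is reshaped to $n \times m$ to give an output frame.

```python
import numpy as np

def drive_face(U, mu_train, Psi, c_train=None):
    """Project mean-centred driving vectors Psi (N x M_drive) into the training
    face space (eq. 3.21), optionally rescale, and reconstruct (eq. 3.22)."""
    C = U.T @ Psi                                             # R x M_drive weights
    if c_train is not None:
        # optional moment matching: give each row of C the mean and standard
        # deviation of the corresponding training weights
        C = (C - C.mean(axis=1, keepdims=True)) / (C.std(axis=1, keepdims=True) + 1e-12)
        C = C * c_train.std(axis=1, keepdims=True) + c_train.mean(axis=1, keepdims=True)
    Z = U @ C + mu_train                                      # back to image space, one column per frame
    return Z
```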
A face space is defined for Harry (the first five dimensions of which are shown in
Figure 3). The top row of Figure 4 shows five frames from a real image sequence of Glyn telling a joke. These are then projected into Harry's face space using the procedure defined, and the resulting images, transformed back to image space, are shown below their corresponding frames.
Alternative Vectorisations
Alternative vectorisations could be employed to define the original space of an individual's facial movement. By concatenating the three colour planes, RGB colour images can be vectorised and the procedure outlined above can be applied.
A clear problem with the examples presented previously, however, is the blur inherent in linearly combining images. Given a facial image sequence, one approach for evading this drawback is to choose an arbitrary frame to be a reference and define the remaining frames in terms of warps from this single frame.
Figure 5 demonstrates this warping approach. Here we represent each frame, I, as a matrix containing the colour information for each pixel of the image, for example as an RGB triple. We write I(x, y) to represent the colour information for the pixel at (x, y). We choose the image shown in (a) as a reference; although this is a somewhat arbitrary choice, we select the image closest to the luminance mean, additionally ensuring it is in a 'neutral' pose with eyes open and mouth slightly open; this is because, for example, an open mouth can be warped onto a closed mouth, but a closed mouth cannot be warped onto an open mouth.
In order to warp from one image to another, it is necessary to be able to find pixel-wise correspondences between them. There are a variety of approaches for estimating these correspondences, but in these examples we apply an optic flow algorithm.
The Multi-channel Gradient Model (McGM)
The Multi-channel Gradient Model (McGM) is an optic flow algorithm modelled on the processing of the human visual system (Johnston, McOwan et al. 1999). We chose to apply the model to just two images for each frame, the reference and the target, since optic flow provides only an estimate and errors would be disproportionately magnified for frames temporally further from the reference, were fields to be combined over time. Some adaptation is thus required for this application, since the McGM would usually have a large temporal buffer of images to work with and, in this case, we have only two. This can be overcome by replacing the zeroth and first temporal derivatives with their average and difference, respectively, and discarding all those of higher order.
A coarse-to-fine implementation of the McGM was applied at three spatial scales, 0.25, 0.5 and 1.0, progressively warping a reference facial image onto the target frame.
Warping a reference
By application of the McGM, we can find the flow field (U, V), that takes us from a reference Q to the target frame, P, (shown in (b)), where U and V are matrices containing the horizontal and vertical components of the field, respectively, for each location (x, y). The target can be approximately reconstructed from the reference and the flow field, by backward mapping:
$$P(x, y) \approx R(x, y) = Q\bigl(x - U(x, y),\, y - V(x, y)\bigr) \quad (1.1)$$

where R is the reconstruction. Since $(x - U(x, y),\, y - V(x, y))$ will rarely correspond exactly to pixel locations in Q, an interpolation technique is employed. Here we use bilinear interpolation. All images in the sequence can be represented as warps from Q and the entire sequence can be reconstructed by warping this one reference frame. Each vector field (U, V) can be vectorised by concatenating each row of U and V, joining them and transposing to form one long vector.
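A minimal sketch of this backward mapping and of the flow-field vectorisation follows (ours, not part of the original text), assuming SciPy's bilinear `map_coordinates` resampler in place of a hand-written interpolator:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_reference(Q, U, V):
    """Backward-map the reference image Q through the flow field (U, V), as in eq. (1.1).

    Q is an n x m (or n x m x 3) array; U and V hold the horizontal and
    vertical flow components. order=1 gives bilinear interpolation at
    non-integer sample positions.
    """
    n, m = U.shape
    ys, xs = np.mgrid[0:n, 0:m].astype(float)
    sample_y = ys - V                                 # y - V(x, y)
    sample_x = xs - U                                 # x - U(x, y)
    if Q.ndim == 2:
        return map_coordinates(Q, [sample_y, sample_x], order=1, mode='nearest')
    # warp each colour channel independently
    return np.stack([map_coordinates(Q[..., c], [sample_y, sample_x], order=1, mode='nearest')
                     for c in range(Q.shape[-1])], axis=-1)

def vectorise_flow(U, V):
    """Concatenate the rows of U and V into one long vector, as described above."""
    return np.concatenate([U.reshape(-1), V.reshape(-1)])
```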
Once facial image sequences from two individuals have been vectorised in this manner, one can be driven by the other by application of the procedure as described previously.
Results
Figure 6 shows the first five principal components from Harry's sequence, vectorised as warps from a reference as described above. The middle column shows the chosen reference image and the left and right columns show the warp -2 standard deviations and +2 standard deviations respectively in the direction of each shown component. Together, these five components account for 85% of the variance in the whole set.
Projecting vectors from Glyn's sequence into this space results in a new sequence with Harry mimicking Glyn's facial movements. Five frames from this are shown below their corresponding frames in Figure 7 (using only 20 principal components, which account for 94% of the variance).
A difficulty with this vectorisation is the appearance of features of the face that were previously obscured. If there is no evidence of a feature in the reference image, then it is not possible to generate these features with a warp only. Teeth, for example, are often occluded by the lips. We refer to such changes as iconic. A vectorisation based on the luminance or RGB values in an image will capture such iconic changes, although with the disadvantage of blurring, so a combined approach can be applied. Figure 8 outlines such an approach, where images from the training and driving sets are reverse warped onto their respective references to provide stabilised sequences where remaining changes should essentially consist of iconic and lighting variations only. A first basis can be extracted from the set of forward training warps and a second basis can be extracted from the image-based vectorisations (luminance, or RGB, etc.) of the stabilised training sequence. We refer to the first basis as the configural basis, and the second as the image basis. There should be little blurring in the image basis, since the stabilised images that it is generated from will be aligned. Once the driving flow fields are projected onto the configural basis, and the stabilised driving images are projected onto the image basis, a new sequence can be generated by applying the projected flow fields onto the projected stabilised images, thus incorporating iconic changes with minimal blurring.
Alternatively, the feature aligned texture information and the configural information (in the form of the flow fields), can be combined together into one single vector for each frame of the sequence. The basis can then be extracted as before from this information. Such vectors can then be converted into images by simply warping the texture component by its configural component.
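As an illustrative sketch of this combined vectorisation (ours; the relative weighting `alpha` between the flow and texture parts is an assumption not prescribed by the text, and `warp_reference` is the helper from the earlier warping sketch; greyscale texture is assumed for brevity):

```python
import numpy as np

def combine_frame(flow_vec, texture_vec, alpha=1.0):
    """Concatenate configural (flow) and image (texture) information into one vector.

    alpha is an assumed scale balancing the two parts of the vector.
    """
    return np.concatenate([alpha * flow_vec, texture_vec])

def decode_frame(z, shape, alpha=1.0):
    """Split a decoded combined vector back into a flow field and a stabilised
    texture image, then warp the texture by its own configural component."""
    n, m = shape
    flow_len = 2 * n * m
    flow = z[:flow_len] / alpha
    U = flow[:n * m].reshape(n, m)
    V = flow[n * m:].reshape(n, m)
    texture = z[flow_len:].reshape(n, m)
    return warp_reference(texture, U, V)      # from the earlier warping sketch
```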
Mapping between vectorisations
Since the basis sets discussed are generated from linear combinations of the examples, it is possible to apply the same weights to the examples in an alternative vectorisation, thus producing a (non-orthogonal) version of the basis set in a second encoding. Each frame from a novel sequence can then be encoded in the first mean-centred vectorisation as $\psi_i$, projected onto the R-dimensional basis for that vectorisation, $A$, then decoded using the second R-dimensional basis, $B$. The encoding step would thus be (from 3.21)

$$c_i = A^T\psi_i \quad (3.23)$$
The decoding step would then be (from 3.22)
$$z_i = B c_i + \mu_{train} \quad (3.24)$$

Despite the non-orthogonality of the second basis set, this technique has been found to work well and allows the possibility of encoding fast in a low quality, low resolution basis, then projecting onto a much better quality, high resolution basis set, crucially enabling high quality real-time implementation of the technology.
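A minimal sketch of this cross-basis mapping, equations (3.23) and (3.24), follows (ours; `mu_train_2` is assumed to denote the training mean expressed in the second vectorisation):

```python
import numpy as np

def map_between_bases(psi, A, B, mu_train_2):
    """Encode a mean-centred frame psi in basis A (eq. 3.23) and decode it in the
    alternative basis B (eq. 3.24), e.g. low-resolution in, high-resolution out."""
    c = A.T @ psi                 # weights in the first (fast, low quality) basis
    return B @ c + mu_train_2     # reconstruction in the second vectorisation
```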
Conclusions
It is possible to generate a low-dimensional co-ordinate frame in high- dimensional space that encapsulates the dimensions of movement for an actor. Another's movements can then be projected into this space.
Only movements that can be made by combinations of those experienced in the training phase can be projected onto another face, since only those are represented by the basis set. This may seem to be a limitation, but can equally be considered advantageous, since only movements faithful to the target's repertoire can be made. It would, for example, be unnatural to see someone wink, or raise their eyebrows, were they not normally able to.
It is to be noted that PCA is not the only way to generate a set of bases. The original vectors from the training sequence could be used, for example (the transformation matrix, U, would then be the matrix generated with these vectors as its columns, after normalisation, and the inverse transformation matrix would be its pseudoinverse rather than $U^T$). PCA happens to be particularly good because it orders the bases in terms of descriptive importance, so noise can be truncated away, and orthogonality is enforced, so no pseudoinverses need to be calculated.
The scope of the claims which follow is to be interpreted accordingly.
The whole process, involving the vectorisations presented, requires no manual intervention other than the approximate alignment of the two sequences in position, rotation and scale. This is very desirable in a field where tracking often necessitates much tedious manual labour.
Specific Prior Art References
Arslan, L. M. and D. Talkin (1998). 3-D face point trajectory synthesis using an automatically derived visual phoneme similarity matrix. Auditory-Visual Speech Processing, Terrigal, NSW, Australia.
Beymer, D. (1995). Vectorizing face images by interleaving shape and texture computations. Massachusetts Institute of Technology.
Black, M. J. and Y. Yacoob (1995). Tracking and recognising rigid and non-rigid facial motions using local parametric models of image motions. International Conference on Computer Vision.
Bregler, C., M. Covell, et al. (1997). Video Rewrite: driving visual speech with audio. SIGGRAPH Conference on Computer Graphics, Los Angeles, California.
Buck, I., A. Finkelstein, et al. (2000). Performance-driven hand-drawn animation. Proceedings of the First International Symposium on Non-photorealistic Animation and Rendering.
Chatfield, C. and A. J. Collins (1980). Introduction to Multivariate Analysis. London, Chapman and Hall.
Cootes, T., C. Taylor, et al. (1995). "Active shape models - their training and application." Computer Vision, Graphics and Image Understanding 61(1): 38-59.
Cosatto, E. and H. P. Graf (2000). "Photo-realistic talking heads from image samples." IEEE Transactions on Multimedia 2(3): 152-163.
Covell, M. and C. Bregler (1996). Eigenpoints. International Conference on Image Processing, Lausanne, Switzerland.
Essa, I. and A. Pentland (1997). "Coding, analysis, interpretation, and recognition of facial expressions." IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7): 757-763.
Ezzat, T. and T. Poggio (1998). MikeTalk: A talking facial display based on morphing visemes. Computer Animation Conference, Philadelphia, Pennsylvania.
Ezzat, T. and T. Poggio (2000). "Visual speech synthesis by morphing visemes." International Journal of Computer Vision 38(1): 45-57.
Guenter, B., C. Grimm, et al. (1998). Making faces. ACM SIGGRAPH, Orlando, FL.
Johnston, A., P. W. McOwan, et al. (1999). "Robust velocity computation from a biologically motivated model of motion perception." Proceedings of the Royal Society of London B 266: 509-518.
Kshirsagar, S., T. Molet, et al. (2001). Principal components of expressive speech animation. Computer Graphics International, Hong Kong, IEEE Computer Society.
Kuratate, T., H. Yehia, et al. (1998). Kinematics-based synthesis of realistic talking faces. Auditory-Visual Speech Processing, Terrigal, NSW, Australia.
Parke, F. I. (1972). Computer generated animation of faces. Salt Lake City, University of Utah.
Parke, F. I. (1974). A parametric model for human faces. Salt Lake City, University of Utah.
Pentland, A., B. Moghaddam, et al. (1994). View-based and modular eigenspaces for face recognition. Proc. Computer Vision and Pattern Recognition Conference.
Platt, S. M. and N. I. Badler (1981). "Animating facial expression." Computer Graphics 15(3): 245-252.
Roweis, S. (1998). "EM algorithms for PCA and SPCA." Advances in Neural Information Processing Systems 10.
Sanger, T. D. (1989). "Optimal unsupervised learning in a single-layer linear feedforward neural network." Neural Networks 2: 459-473.
Sirovich, L. and M. Kirby (1987). "Low-dimensional procedure for the characterization of human faces." Journal of the Optical Society of America A 4(3): 519-524.
Terzopoulos, D. and K. Waters (1993). "Analysis and synthesis of facial image sequences using physical and anatomical models." IEEE Transactions on Pattern Analysis and Machine Intelligence 15(6): 569-579.
Tiddeman, B. and D. Perrett (2001). Moving facial image transformations using static 2D prototypes. The 9th International Conference in Central Europe on Computer Graphics, Visualisation and Computer Vision, Plzen, Czech Republic.
Turk, M. and A. Pentland (1991). "Eigenfaces for recognition." Journal of Cognitive Neuroscience 3: 71-86.
Vetter, T. and N. Troje (1995). A separated linear shape and texture space for modeling two-dimensional images of human faces. Max-Planck-Institut für biologische Kybernetik.
Waters, K. (1987). "A muscle model for animating three-dimensional facial expression." Computer Graphics 21(4): 17-24.
Williams, L. (1990). "Performance driven facial animation." Computer Graphics 24(4): 235-242.
Yuille, A. L. (1991). "Deformable templates for face recognition." Journal of Cognitive Neuroscience 3(1): 59-70.

Claims

1. A method of generating a facial animation sequence comprising the steps of:
a) observing a real facial image sequence - the original image sequence - and capturing the information generated thereby;
b) aligning another facial image - the end image - in an appropriate manner co-ordinate-wise with the original one;
c) analysing the information from the original image sequence mathematically; and
d) using the results so obtained to drive the movements which generate the end image sequence.
2. A method in accordance with Claim 1 and in which principal components analysis is applied to successive frames from the original image sequence to generate respective co-ordinate frames characterising an individual's permissible facial actions and the resulting new sequence then being projected into the thus-defined co-ordinate frames to drive and animate the end image accordingly.
3. A method in accordance with Claim 1 and in which the necessary vectorisations are generated by non-PCA-based analytical techniques in which the analysis proceeds from vectorial bases.
4. A method in accordance with Claims 1, 2 and 3 and further incorporating a post-processing step wherein the distribution of the resulting weights on each basis vector is statistically transformed to match the mean and standard deviation of its equivalents in the training set, or is statistically altered for exaggeration or anti-exaggeration purposes.
5. A facial animation technique incorporating all the essential steps of any one of the methods embodied in an appropriate combination of the teachings disclosed herein.
6. An image, for example a film image sequence, or other media product, generated by applying techniques in accordance with any of the preceding Claims.
7. A method employing multiple vectorisations as a means of combining separate encodings of both configural and iconic change as defined in the text.
8. A method employing two separate co-ordinate frames, with related axes, where the first is used for encoding novel movements and the second is used for constructing the end images.
PCT/GB2002/005418 2001-12-01 2002-11-29 Performance-driven facial animation techniques WO2003049039A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002349141A AU2002349141A1 (en) 2001-12-01 2002-11-29 Performance-driven facial animation techniques

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0128863.8 2001-12-01
GB0128863A GB0128863D0 (en) 2001-12-01 2001-12-01 Performance-driven facial animation techniques and media products generated therefrom

Publications (1)

Publication Number Publication Date
WO2003049039A1 true WO2003049039A1 (en) 2003-06-12

Family

ID=9926877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2002/005418 WO2003049039A1 (en) 2001-12-01 2002-11-29 Performance-driven facial animation techniques

Country Status (3)

Country Link
AU (1) AU2002349141A1 (en)
GB (1) GB0128863D0 (en)
WO (1) WO2003049039A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8279228B2 (en) 2006-04-24 2012-10-02 Sony Corporation Performance driven facial animation
US10574883B2 (en) 2017-05-31 2020-02-25 The Procter & Gamble Company System and method for guiding a user to take a selfie
US10614623B2 (en) 2017-03-21 2020-04-07 Canfield Scientific, Incorporated Methods and apparatuses for age appearance simulation
US10621771B2 (en) 2017-03-21 2020-04-14 The Procter & Gamble Company Methods for age appearance simulation
US10818007B2 (en) 2017-05-31 2020-10-27 The Procter & Gamble Company Systems and methods for determining apparent skin age
US11055762B2 (en) 2016-03-21 2021-07-06 The Procter & Gamble Company Systems and methods for providing customized product recommendations

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6285794B1 (en) * 1998-04-17 2001-09-04 Adobe Systems Incorporated Compression and editing of movies by multi-image morphing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6285794B1 (en) * 1998-04-17 2001-09-04 Adobe Systems Incorporated Compression and editing of movies by multi-image morphing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JOCKUSCH S ET AL: "Analysis-by-synthesis and example based animation with topology conserving neural nets", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) AUSTIN, NOV. 13 - 16, 1994, LOS ALAMITOS, IEEE COMP. SOC. PRESS, US, vol. 3 CONF. 1, 13 November 1994 (1994-11-13), pages 953 - 957, XP010146492, ISBN: 0-8186-6952-7 *
LIU Z ET AL: "EXPRESSIVE EXPRESSION MAPPING WITH RATIO IMAGES", COMPUTER GRAPHICS. SIGGRAPH 2001. CONFERENCE PROCEEDINGS. LOS ANGELES, CA, AUG. 12 - 17, 2001, COMPUTER GRAPHICS PROCEEDINGS. SIGGRAPH, NEW YORK, NY: ACM, US, 12 August 2001 (2001-08-12), pages 271 - 275, XP001049896, ISBN: 1-58113-374-X *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8279228B2 (en) 2006-04-24 2012-10-02 Sony Corporation Performance driven facial animation
US11055762B2 (en) 2016-03-21 2021-07-06 The Procter & Gamble Company Systems and methods for providing customized product recommendations
US10614623B2 (en) 2017-03-21 2020-04-07 Canfield Scientific, Incorporated Methods and apparatuses for age appearance simulation
US10621771B2 (en) 2017-03-21 2020-04-14 The Procter & Gamble Company Methods for age appearance simulation
US10574883B2 (en) 2017-05-31 2020-02-25 The Procter & Gamble Company System and method for guiding a user to take a selfie
US10818007B2 (en) 2017-05-31 2020-10-27 The Procter & Gamble Company Systems and methods for determining apparent skin age

Also Published As

Publication number Publication date
AU2002349141A1 (en) 2003-06-17
GB0128863D0 (en) 2002-01-23

Similar Documents

Publication Publication Date Title
Thies et al. Headon: Real-time reenactment of human portrait videos
Noh et al. A survey of facial modeling and animation techniques
Blanz et al. Reanimating faces in images and video
Ichim et al. Dynamic 3D avatar creation from hand-held video input
Thies et al. Real-time expression transfer for facial reenactment.
US6556196B1 (en) Method and apparatus for the processing of images
Jones et al. Multidimensional morphable models: A framework for representing and matching object classes
Vlasic et al. Face transfer with multilinear models
Essa et al. Modeling, tracking and interactive animation of faces and heads//using input from video
Chai et al. Vision-based control of 3 D facial animation
Bickel et al. Multi-scale capture of facial geometry and motion
US6967658B2 (en) Non-linear morphing of faces and their dynamics
Vetter et al. Estimating coloured 3D face models from single images: An example based approach
Bronstein et al. Calculus of nonrigid surfaces for geometry and texture manipulation
US6400828B2 (en) Canonical correlation analysis of image/control-point location coupling for the automatic location of control points
Pighin et al. Modeling and animating realistic faces from images
WO2021228183A1 (en) Facial re-enactment
Pighin et al. Realistic facial animation using image-based 3D morphing
Kwolek Model based facial pose tracking using a particle filter
Fidaleo et al. Classification and volume morphing for performance-driven facial animation
Paier et al. Example-based facial animation of virtual reality avatars using auto-regressive neural networks
Paier et al. Neural face models for example-based visual speech synthesis
WO2003049039A1 (en) Performance-driven facial animation techniques
Hou et al. Smooth adaptive fitting of 3D face model for the estimation of rigid and nonrigid facial motion in video sequences
Cowe Example-based computer-generated facial mimicry

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP