US20090041310A1 - Video-based face recognition using probabilistic appearance manifolds - Google Patents

Video-based face recognition using probabilistic appearance manifolds

Info

Publication number
US20090041310A1
Authority
US
United States
Prior art keywords
image
recognition
identification information
pose
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US10/703,288
Other versions
US7499574B1 (en)
Inventor
Ming-Hsuan Yang
Jeffrey Ho
Kuang-chih Lee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honda Motor Co Ltd
Original Assignee
Honda Motor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honda Motor Co Ltd
Priority to US10/703,288
Assigned to HONDA MOTOR CO., LTD. Assignment of assignors interest (see document for details). Assignors: HO, JEFFREY; LEE, KUANG-CHIH; YANG, MING-HSUAN
Publication of US20090041310A1
Application granted
Publication of US7499574B1
Legal status: Active
Adjusted expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G06V 40/169 - Holistic features and representations, i.e. based on the facial image taken as a whole
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/24 - Aligning, centring, orientation detection or correction of the image
    • G06V 10/242 - Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition

Definitions

  • The transition probability module 330 stores 750 the resulting transition probability matrix in the manifold database 165 for use in the recognition phase.
  • The process 610 continues populating the database if there are more individuals 760. Otherwise, it returns 795 until called for recognition.
  • FIG. 11 is a flow chart illustrating a method of recognizing the individual from a plurality of individuals according to one embodiment of the present invention.
  • The process initializes responsive to receiving 1110 one or more recognition images. For example, when an individual requests authentication and stands in proper view, a video camera sends the sequence of recognition images to the video buffer 210.
  • The appearance manifold module 420 determines 1120 which pose manifold, from the plurality of pose manifolds associated with the plurality of individuals, is closest to a first recognition image.
  • FIG. 12 is a flow chart illustrating the method of determining the pose manifold according to one embodiment of the present invention.
  • The first recognition image is projected 1210 onto a plurality of pose manifolds, either from the same appearance manifold as shown in FIG. 13 or from different appearance manifolds.
  • The appearance manifold module 420 identifies 1170 an individual associated with the closest appearance manifold. If there are more recognition images 1180, the process repeats. Otherwise, it returns 1195 and ends 695.
  • The appearance manifold module 420 determines 1220 which pose manifold is closest to the first image in the image space. Examples of image plotting in the ambient space are shown in H. Murase and S. K. Nayar, "Visual Learning and Recognition of 3-D Objects From Appearance," Int'l J. Computer Vision (1995). Identification information includes 1230 the appearance manifold associated with the closest pose manifold. The individual associated with the closest pose manifold is a candidate for the target individual's identity. In one embodiment, the identification information includes statistics on more than one candidate, since the leading candidate can change based on following recognition images considered in combination with the first recognition image.
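  • A minimal sketch of this nearest-pose-manifold test follows, assuming each linearized pose manifold is stored as a PCA mean plus an orthonormal basis (the class and function names are illustrative, not the patent's implementation):

    import numpy as np

    class PoseManifold:
        """One linearized pose manifold: PCA mean plus orthonormal basis."""
        def __init__(self, mean, basis):
            self.mean = mean    # (d,) average image of the partition
            self.basis = basis  # (d, q) orthonormal PCA directions

        def distance(self, image):
            # Distance from the image to its projection onto the affine PCA plane.
            centered = image - self.mean
            coeffs = self.basis.T @ centered            # coordinates on the plane
            residual = centered - self.basis @ coeffs   # off-plane component
            return np.linalg.norm(residual)

    def closest_pose_manifold(image, pose_manifolds):
        """Return the index and distance of the pose manifold nearest an image."""
        dists = [c.distance(image) for c in pose_manifolds]
        i = int(np.argmin(dists))
        return i, dists[i]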
  • FIG. 13 is an illustration of a recognition image projected onto a closest pose manifold according to one embodiment of the present invention.
  • A point I 1310 is a vector representation of a recognition image.
  • The linearized appearance manifold comprises the set of pose manifolds Ck1 through Ck6 1320 a-f.
  • The variable x 1330 represents the point at the minimum distance dH 1340 between the image 1310 and the set of pose manifolds 1320 a-f, which in this case lies on Ck4 1320 d. Assuming that x 1330 is closer than any pose manifold associated with another appearance manifold, an individual associated with the shown appearance manifold 1350 is the leading candidate for identification.
  • An identity k* is determined by finding the appearance manifold Mk with the minimum distance dH to the recognition image I as shown in expression (4):
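  • Expression (4) itself did not survive extraction; a plausible reconstruction, consistent with the minimum-distance description above, is:

    k^* = \arg\min_k d_H(M_k, I)   (4)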
  • The appearance manifold module 420 determines 1140 which appearance manifold is closest to the recognition image sequence.
  • The appearance manifold module 420 projects the second image onto the plurality of pose manifolds to find the pose manifold closest to the second image in the ambient space.
  • The minimum distance is derived from the second image considered in combination with the first image to incorporate temporal dependency. In one embodiment, the minimum distance is expressed as in (6) through (8), where x 1330 is a point on the manifold Mk for 1 ≤ i ≤ m.
  • A second identification information is based on the probability that the second pose manifold follows the first pose manifold.
  • The probability is determined from a conditional joint probability, including the transition probability from the training phase, as expressed in (9) through (11) using Bayes' rule, where a normalization term ensures a proper probability distribution.
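  • Expressions (9) through (11) also did not survive extraction. A plausible sketch of the recursion they describe, assuming the posterior over identity k and current pose manifold Cki is propagated with the learned transition matrix (a reconstruction consistent with the surrounding text, not the patent's verbatim formulas):

    p(k, C_{ki} \mid I_{1:t}) \propto p(I_t \mid C_{ki}) \sum_{j=1}^{m} p(C_{ki} \mid C_{kj}) \, p(k, C_{kj} \mid I_{1:t-1})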
  • The appearance manifold module 420 determines the second identification information from preprocessed identification information rather than from an image.
  • The preprocessed identification information is derived, as discussed above, from previous images.
  • The appearance manifold module 420 outputs the identity to the output device 120.
  • If the second identification information is not above the identification threshold, parallel statistics are maintained for possible identifications. As the process continues through iterations in response to receiving more recognition images, identification information is updated, and a positive identification is made when the identification threshold is surpassed. Whether or not a positive identification has been made, the process constantly updates identification information responsive to receiving new images.
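  • A minimal sketch of this decision loop, assuming per-identity evidence is folded into running scores and an identity is reported only above a confidence threshold (the function, the toy likelihoods, and the 0.9 threshold are illustrative, not the patent's implementation):

    import numpy as np

    def update_identification(posteriors, likelihoods, threshold=0.9):
        """One recognition iteration: fold new per-identity evidence into
        the running statistics and report an identity only when confident.

        posteriors  -- running probability per enrolled individual, sums to 1
        likelihoods -- evidence for the current frame, one value per individual
        """
        posteriors = posteriors * likelihoods  # combine prior evidence with the new frame
        posteriors /= posteriors.sum()         # renormalize to a distribution
        best = int(np.argmax(posteriors))
        identity = best if posteriors[best] >= threshold else None  # None: no positive ID yet
        return posteriors, identity

    # Toy usage: three enrolled individuals, with evidence gradually favoring #1.
    p = np.full(3, 1.0 / 3.0)
    for lik in ([0.5, 0.9, 0.4], [0.3, 0.95, 0.5], [0.2, 0.99, 0.3]):
        p, who = update_identification(p, np.array(lik))
        print(p.round(3), who)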
  • FIG. 14 is an illustration of a sequence of recognition images projected onto two appearance manifolds according to one embodiment of the present invention.
  • The recognition image sequence 1420 spans from time t−6 through t+3. From t−6 through t−4, the recognition image sequence is closest to appearance manifold B 1430 in the ambient space; thus, the identification information includes the associated identity. However, during t−3 through t, the recognition image sequence 1420 is closest to appearance manifold A 1410. In one embodiment, at time instances t−4 and t−3, the identification information includes both appearance manifolds 1410, 1430, since the recognition image sequence 1420 is not sufficiently close to either one for a positive identification. From t+1 through t+3, the recognition image sequence 1420 is again closest to appearance manifold B 1430. At t and t+1, there is no positive identification.
  • Referring again to FIG. 11, if an occlusion is detected 1150, the occlusion module 430 determines 1160 an occlusion adjustment.
  • FIG. 15 is a flow chart illustrating the method of determining an occlusion adjustment according to one embodiment of the present invention. An occlusion occurs when an object blocks the video camera's 110 view of the individual, or an image becomes otherwise obstructed.
  • FIG. 16( a ) illustrates a first image with an occlusion 1610 a and a second image without an occlusion 1610 b , with both images comprising the same individual in the same pose.
  • The mask generation module 510 detects 1510 the occlusion by comparing the image to its closest pose manifold as in step 1120, or to the closest pose manifold associated with the appearance manifold selected in step 1140.
  • FIG. 16( b ) illustrates the pose manifold associated with the first and second images 1620 a-b, which are preferably the same pose manifold.
  • The mask generation module 510 determines 1520 a probability that each pixel is occluded.
  • The probability is measured by how much a pixel varies in color from the corresponding pose manifold image pixels, with a large variation corresponding to a high probability and a zero or negligible variation corresponding to a low probability.
  • FIG. 16( c ) illustrates a grayscale representation of the first image 1630 a and the second image 1630 b . Pixels with a large variation are designated black, while pixels with no variation are designated white, and pixels with a medium variation are designated an appropriate shade of gray. In one embodiment, variation data is binary and thus represented in black and white.
  • The mask adjustment module 520 identifies 1530 occlusion groups and defines a resulting occlusion mask.
  • The lower left-hand region of the first binary image 1630 a includes a cluster of black pixels, indicating a large group of pixels with high variation in color across frames.
  • The cluster, and the resulting occlusion mask, thus correspond to the occlusion shown in the image 1610 a.
  • The mask adjustment module 520 applies 1540 a weighted occlusion mask to the pose manifold by reducing the influence of masked pixels on future decisions. By contrast, in one embodiment of determining the appearance manifold in step 1120, all pixels are treated equally.
  • The mask may be applied directly to the recognition image, or to the pose manifolds in determining the closest pose or appearance manifold.
  • The masked pixels are weighted by modifying the distance computation dH(Mk, It) to dH(Mk*, Wt*It).
  • The weighted projection of Wt*It on Mk* is x*.
  • The mask Wt is updated with each successive image It from the estimate at the previous frame Wt−1 as expressed in (12).
  • Wt is iteratively updated based on Wt(i) and Îx*(i) as in (13):
  • W_t^{(i+1)} = \exp\left( -\frac{1}{2\sigma^2} \, (\hat{I}_{x^*}^{(i)} - I_t) * (\hat{I}_{x^*}^{(i)} - I_t) \right)   (13)
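  • A compact sketch of this iterative mask refinement, assuming σ is a user-chosen scale and a caller-supplied routine reconstructs the best manifold approximation Î of a weighted image (the helper names are illustrative; the elementwise product mirrors (13)):

    import numpy as np

    def refine_occlusion_mask(image, reconstruct, sigma=0.1, iters=5):
        """Iteratively estimate per-pixel occlusion weights W_t per (13).

        image       -- observed frame I_t, flattened float vector in [0, 1]
        reconstruct -- callable mapping a weighted image to its manifold
                       reconstruction Î_x* (e.g., projection onto the PCA plane)
        """
        image = np.asarray(image, dtype=float)
        w = np.ones_like(image)              # start with no pixels masked
        for _ in range(iters):
            recon = reconstruct(w * image)   # Î_x*^(i): manifold fit of the weighted image
            diff = recon - image
            w = np.exp(-(diff * diff) / (2.0 * sigma**2))  # large residual -> weight near 0
        return w

    # The weighted distance dH(Mk*, Wt*It) then uses w * image in place of the raw image.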
  • The occlusion mask also exploits temporal coherency to allow more accurate facial recognition.
  • The present invention exploits temporal coherency between images during a recognition phase to increase facial recognition accuracy.
  • The manifold training module 220 establishes a matrix of probabilistic interrelationships between poses that is applied across a sequence of recognition images of an individual with varying poses.
  • The appearance manifold module 420 uses prior evidence and the probabilistic relationships between received images to make current identification decisions.
  • The occlusion module 430 achieves even further accuracy by masking non-facial parts of the recognition image from identification decisions.

Abstract

The present invention meets these needs by providing temporal coherency to recognition systems. One embodiment of the present invention comprises a manifold recognition module to use a sequence of images for recognition. A manifold training module receives a plurality of training image sequences (e.g. from a video camera), each training image sequence including an individual in a plurality of poses, and establishes relationships between the images of a training image sequence. A probabilistic identity module receives a sequence of recognition images including a target individual for recognition, and identifies the target individual based on the relationship of training images corresponding to the recognition images. An occlusion module masks occluded portions of an individual's face to prevent distorted identifications.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is related to U.S. Provisional Patent Application No. 60/425,214, filed on Nov. 7, 2002, entitled “Video-Based Recognition Using Probabilistic Appearance Manifolds,” by Ming-Hsuan Yang et al. and U.S. Provisional Patent Application No. 60/478,644 filed on Jun. 12, 2003, entitled “Video-Based Recognition Using Probabilistic Appearance Manifolds” by Ming-Hsuan Yang et al. from which priority is claimed under 35 U.S.C. § 119(e) and which applications are incorporated by reference herein in their entirety. This application is related to co-pending U.S. patent application Ser. No. [Attorney reference 23085-07343] filed on Nov. 6, 2003, entitled “Clustering Appearances of Objects Under Varying Illumination Conditions” by Ming-Hsuan Yang et al. which is incorporated by reference in its entirety.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • This invention relates generally to object recognition by computers, and more specifically, to facial recognition techniques applied to a sequence of images.
  • 2. Background Art
  • Computer vision through object recognition is a giant step in computer intelligence that provides a myriad of new capabilities. Facial recognition has particularly valuable applications in verifying a person's identity, robot interaction with humans, security surveillance, etc. With a reliable facial recognition system, computers can provide security clearance for authorized individuals, and robots can perform a set of actions designed for a specific individual. However, when currently available facial recognition systems perform identifications, they are limited to basing such identification on a single image generated under ideal circumstances. Examples of currently available facial recognition systems include R. Chellappa et al., "Human and Machine Recognition of Faces: A Survey," Proceedings of the IEEE (1995); A. Samal et al., "Automatic Recognition and Analysis of Human Faces and Facial Expressions: A Survey," Pattern Recognition (1992); and W. Y. Zhao et al., "Face Recognition: A Literature Survey," Technical Report CAR-TR-948, Center for Automation Research, University of Maryland (2000).
  • One problem with relying on ideal circumstances, such as assuming that an individual to be recognized is positioned in an ideal pose, is that circumstances are rarely ideal. In an ideal pose, a camera has full frontal view of the face without any head tilt. Any two-dimensional or three-dimensional rotations may either cause a false identification or prevent the camera from collecting a sufficient number of data points for comparison. Even when the individual attempts to position himself for the ideal image, misjudgment in orientation may still be problematic.
  • Obstacles between the individual's face and the camera create additional problems for conventional recognition systems. Since those systems are incapable of distinguishing an obstacle from the individual's face in a resulting image, the obstacle distorts any following comparisons. As with facial rotations, occluded faces can also prevent the camera from collecting sufficient data.
  • A problem related to non-ideal circumstances is that typical recognition systems use a single image, so if the single image is distorted, the identification will be affected. False identification can consequently result in security breaches, and the like. Even systems that incorporate more than one image in recognition, such as temporal voting techniques, are susceptible to false identifications. Temporal voting techniques make an identification for a first image, make an independent identification for a second image, and so on, basing recognition on the most frequent independent identification. Examples of temporal voting techniques include A. J. Howell and H. Buxton, "Towards Unconstrained Face Recognition From Image Sequences," Proc. IEEE Int'l Conf. On Automatic Face and Gesture Recognition (1996); G. Shakhnarovich et al., "Face Recognition From Long-Term Observations," Proc. European Conf. On Computer Vision (2002); and H. Wechsler et al., "Automatic Video-Based Person Authentication Using the RBF Network," Proc. Int'l Conf. On Audio and Video-Based Person Authentication (1997), which are incorporated by reference herein in their entirety. However, each identification is independent of other images. Thus, sustained pose variations and/or occlusions will still distort the outcome.
  • Therefore, what is needed is a robust facial recognition system that exploits temporal coherency between successive images to make recognition decisions. As such, the system should make accurate identification of target individuals in non-ideal circumstances such as pose variations or occlusions.
  • SUMMARY OF THE INVENTION
  • The present invention meets these needs by providing temporal coherency to recognition systems. One embodiment of the present invention comprises a manifold recognition module to perform identification using a sequence of images. A manifold training module receives a plurality of training image sequences (e.g., from a video camera), each training image sequence including an individual in a plurality of poses, and establishes relationships between the images of a training image sequence. A probabilistic identity module receives a sequence of recognition images including a target individual for recognition, and identifies the target individual based on the relationship of training images corresponding to the recognition images.
  • In one embodiment, a partitioning module partitions the training image sequence into pose manifolds, or groups of images related to a pose. The union of pose manifolds defines an appearance manifold for an individual. A linear approximation module transforms the nonlinear appearance manifold into a linear function by linearizing each pose. A transition probability module generates a matrix of probabilistic relationships between the images of the training image sequence.
  • In another embodiment, an appearance manifold module uses transition probabilities to determine which appearance manifold is closest to the recognition image sequence. In one embodiment, if identification information at a time instance falls below an identification threshold, there is no positive identification for that time. Also, positive identifications can change over time based on current identification information.
  • In still another embodiment, an occlusion module masks occluded portions of an individual's face to prevent distorted identifications. The occlusion module compares a representative pose image to a recognition image to find portions of the recognition image (e.g., individual pixels) with large variations from the pose image. A mask adjustment module examines a binary image based on the variations to construct a weighted mask. Masked portions of an image have a reduced influence on identification information.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 is a block diagram illustrating a system for manifold appearance recognition according to one embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating the manifold recognition module according to one embodiment of the present invention.
  • FIG. 3 is a block diagram illustrating the manifold training module according to one embodiment of the present invention.
  • FIG. 4 is a block diagram illustrating the probabilistic identity module according to one embodiment of the present invention.
  • FIG. 5 is a block diagram illustrating the occlusion module according to one embodiment of the present invention.
  • FIG. 6 is a flow chart illustrating a method of manifold appearance recognition according to one embodiment of the present invention.
  • FIG. 7 is a flow chart illustrating the method of populating the manifold database according to one embodiment of the present invention.
  • FIG. 8 is an illustration of the plurality of training image sequences according to one embodiment of the present invention.
  • FIG. 9 is an illustration of the image partitions with a linearly approximated appearance manifold comprising pose manifolds according to one embodiment of the present invention.
  • FIG. 10 is an illustration of the transition probability matrix according to one embodiment of the present invention.
  • FIG. 11 is a flow chart illustrating a method of recognizing the individual from a plurality of individuals according to one embodiment of the present invention.
  • FIG. 12 is a flow chart illustrating the method of determining the pose manifold according to one embodiment of the present invention.
  • FIG. 13 is an illustration of a recognition image projected onto a pose according to one embodiment of the present invention.
  • FIG. 14 is an illustration of a sequence of recognition images projected onto two appearance manifolds according to one embodiment of the present invention.
  • FIG. 15 is a flow chart illustrating the method of determining an occlusion adjustment according to one embodiment of the present invention.
  • FIGS. 16( a)-(c) illustrate occluded images according to one embodiment of the present invention.
  • DETAILED DESCRIPTIONS OF THE PREFERRED EMBODIMENTS
  • FIG. 1 is a block diagram illustrating a system for manifold appearance recognition according to one embodiment of the present invention. The system 100 is, for example a robot, a verification system, a security system, or the like. The system 100 comprises a computing environment 105 coupled to a video camera 110 and an output device 120.
  • The video camera 110 generates a sequence of images that are used for both the training and recognition processes. The images can include an individual. The video camera is, for example, a robot eye, a verification camera, a surveillance camera, or any camera capable of generating the sequence of images. In one embodiment, a first video camera is used for training and a second video camera is used for recognition. In yet another embodiment, the system 100 is loaded with images from a source other than a video camera. The output device 120 is, for example, a display, robot control system, a security response system or any other device that receives recognition output from the computing environment 105.
  • The computing environment 105 further comprises an input/output controller 130, a processor 140, a memory 150, and data storage 160, each of which is coupled to a bus 199. The input/output controller 130 receives video data from the video camera 110 for processing and sends processed video data to the output device 120. The processor 140, such as a Pentium 4 by Intel Corp. of Santa Clara, Calif., an Athlon XP by Advanced Micro Devices, Inc. of Sunnyvale, Calif., an ASIC, or an FPGA, executes instructions and manipulates data. The memory 150 provides volatile storage of software and data such as the manifold recognition module 155 shown in FIG. 2. The data storage 160 provides non-volatile storage of data such as the manifold database 165 shown in FIG. 3.
  • Note that the computing environment 105 may be a separate device such as a computer, or a system of components integrated into a host environment such as a robot or a vehicle. The described components can be implemented in hardware, in software executing within hardware, or a combination. Furthermore, the computing environment 105 includes other components such as an operating system, a BIOS, a user interface, etc. While the computing environment 105 and its components may be optimized for appearance manifold recognition, it may also be used for other computing tasks. Methods operating with the system 100 are described below.
  • FIG. 2 is a block diagram illustrating the manifold recognition module according to one embodiment of the present invention. The manifold recognition module 155 comprises a video buffer 210, a manifold training module 220, and a probabilistic identity module 230.
  • The video buffer 210 receives video data representing the sequence of images from the video camera 110 as shown in FIG. 8. In a training phase, the video buffer 210 signals the manifold training module 220 to indicate that a training image or sequence of training images is ready for processing. During processing, original and manipulated images are stored in the video buffer 210. After processing, the video buffer 210 stores the training images in the manifold database 165. In a recognition phase, the video buffer 210 signals the probabilistic identity module that a recognition image or sequence of recognition images is ready for processing. The video buffer 210 is, for example, a portion of the memory 150, a separate system, processor memory, or otherwise.
  • The manifold training module 220 generates a plurality of appearance manifolds from the plurality of training images. Each sequence of training images contains an individual in a variety of poses. The manifold training module 220 processes the images to generate an appearance manifold comprising several pose manifolds. The manifold training module 220 also determines a matrix of probabilities describing the likelihood of a first pose in a first image being followed by a second pose in a second image. The manifold training module 220 stores its results in the manifold database 165 in association with the individual's training image sequence. The manifold training module 220 and related methods are described in greater detail below.
  • The probabilistic identity module 230 receives a plurality of recognition images that contain a target individual to be recognized from the plurality of appearance manifolds. The probabilistic identity module 230 considers the interrelationships between recognition images to generate identification information. In one embodiment, the identification information relates to more than one possible identity when there is no single positive identity above an identity threshold. The probabilistic identity module 230 and related methods are described in greater detail below.
  • FIG. 3 is a block diagram illustrating the manifold training module according to one embodiment of the present invention. The manifold training module 220 comprises a partitioning module 310, a linear approximation module 320, and a transition probability module 330.
  • The partitioning module 310 generates pose manifolds by grouping related training images into partitions. In one embodiment, the partitioning module 310 uses k-means clustering for grouping. The images of each partition are slight variations of a common pose. A pose variation is a two-dimensional or three-dimensional rotation with respect to a reference pose. For example, the individual is facing forward in a first partition, is facing to the right in a second partition, and is facing to the left in a third partition. Additionally, the individual can be facing up or down, have a head tilt, or any combination of the above.
  • The linear approximation module 320 generates a linear approximation of the several pose manifolds. For each pose manifold, the linear approximation module 320 uses PCA (Principal Component Analysis) to determine subspaces represented as affine planes. In one embodiment, the partitioning module 310 generates a representative pose image by combining each partition image into a single image as shown in FIG. 10.
  • The transition probability module 330 incorporates temporal coherency by determining the relationship between pose manifolds. The relationship, determined by a distance between pose manifolds in the appearance manifold, is expressed as a probability that, given a first pose of the individual, a second pose will follow. In one embodiment, the probabilities are represented as conditional probabilities.
  • FIG. 4 is a block diagram illustrating the probabilistic identity module according to one embodiment of the present invention. The probabilistic identity module 230 comprises an identification control module 410, an appearance manifold module 420, and an occlusion module 430.
  • The identification control module 410 identifies the target individual by determining which pose manifold is closest to the recognition images. The identification control module 410 assesses identification information, and if it meets an identification threshold, outputs the identification.
  • The appearance manifold module 420 determines identification information based on a target individual's pose variations over time. The identification information may comprise one or more potential identifications. For example, a first recognition image may be nearly the same distance to pose manifolds of two individuals, so the appearance manifold module 420 continues to consider both individuals as the potential target individual for the following recognition images. The determination is based on the likelihood of a current pose manifold in a current recognition image given previous pose manifolds in previous recognition images. In one embodiment, the appearance manifold module 420 calculates a joint conditional probability comprising a transition probability between the current and immediately previous poses, and an immediately previous joint conditional probability result.
  • The occlusion module 430 determines identification information by masking portions of the target individual that are blocked from view from the identification process. Referring to FIG. 5, the occlusion module 430 further comprises a mask generation module 510 and a mask adjustment module 520. The mask generation module 510 generates an occlusion mask by determining which pixel clusters have the greatest variance from the representative pose image. In applying the mask while generating identification information, the mask adjustment module 520 reduces the weighting of masked pixels or removes them from the identification process.
  • FIG. 6 is a flow chart illustrating a method of manifold appearance recognition according to one embodiment of the present invention. The process initializes 605 in response to receiving an image in the computing environment 105 from the video camera 110. Generally, the manifold recognition module 155 operates in two phases: in a training phase, the manifold training module 220 populates 610 the manifold database 165 with a plurality of individuals by analyzing the training image sequence; and in a recognition phase, the probabilistic identity module 230 recognizes 620 a target individual from a plurality of individuals.
  • The population step 610 is now described in greater detail with reference to FIGS. 7-10. FIG. 7 is a flow chart illustrating the method of populating 610 the manifold database 165 according to one embodiment of the present invention. The population step 610 initializes responsive to receiving 710, in the video buffer 210, a set Sk of one or more consecutive training images Il for an individual k as expressed in (1):

  • S_k = \{I_1, I_2, \ldots, I_l\}   (1)
  • In one embodiment, training images are distinguished from recognition images by a user, a bit, a signal, or in response to operating in the training phase. For example, a user can mount a hard drive having training image sequences to the computing environment 105 and initialize training analysis through a user interface. In another example, an individual within view of the video camera 110 moves naturally or in a predetermined pattern of two-dimensional and three-dimensional movements. In another embodiment, if the probabilistic identity module 230 fails to recognize the target individual in the images, the manifold recognition module 155 enters the training phase, thereby treating received images as training images.
  • FIG. 8 is an illustration of the plurality of training image sequences according to one embodiment of the present invention. One of ordinary skill in the art will recognize that the plurality of training images 800 is merely illustrative, and does not limit the scope of the present invention. The plurality of training images 800 comprises several rows of training image sequences 810 a-l, each row representing a different individual (e.g., 810 a). An image sequence represents an individual in varying poses over time. Subsequent images may be categorized as separate poses, or a variant of the same pose as shown in FIG. 9. Time increments between subsequent images are, for example, 1/24th, 1/30th, 1/60th of a second, or any other time increment. Individual training images are received directly from the video camera 110, interpolated from surrounding frames, or otherwise constructed. The image format is, for example, JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), BMP (BitMaP), TIFF (Tagged Image File Format), or the like.
  • Referring to FIG. 7, the partitioning module 310 partitions 720 the training image sequence into m disjoint subsets. Each disjoint subset represents a pose and variations thereof as shown in FIG. 9. Pose variations are two-dimensional or three-dimensional rotations between images. Example poses include: facing forward, facing left, facing right, facing up, facing down, or the like. Example variations of the facing forward pose include: head tilting left, head tilting right, or the like. In another embodiment, the above variations are poses themselves. Indeed, one of ordinary skill in the art will recognize that the poses and variations are merely illustrative, and that many other embodiments are within the scope of the present invention.
  • In one embodiment, the partitioning module 310 uses a k-means clustering algorithm for grouping images, as sketched below. An example of k-means clustering is described in D. Hochbaum and D. Shmoys, "A Best Possible Heuristic for the K-Center Problem," Mathematics of Operations Research (1985). In a first iteration, m seeds having the largest Hausdorff distance (L2) from each other in the image space represent the partition centers. Each image is associated with the closest seed. In subsequent iterations, the partition centers are recalculated based on the distance between each center and its grouped images. The optimized partition centers are pose manifolds (C_ki). The total set of pose manifolds comprises an appearance manifold (M_k) for an individual.
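  • For illustration only, a minimal sketch of this partitioning step, assuming each training frame has been vectorized into one row of a NumPy array; the function name partition_sequence, the iteration count, and the seeding details are illustrative assumptions rather than the claimed method.

import numpy as np

def partition_sequence(images, m, iters=10, seed=0):
    """Partition a training image sequence into m disjoint pose subsets.

    images: (l, d) array of vectorized frames for one individual.
    Seeds are chosen greedily to be far apart in L2 (a k-center-style
    initialization), then refined with standard k-means iterations.
    """
    rng = np.random.default_rng(seed)
    centers = [images[rng.integers(images.shape[0])]]
    while len(centers) < m:
        # farthest-point seeding: the next seed maximizes its distance
        # to every seed already chosen
        dist = np.min([np.linalg.norm(images - c, axis=1) for c in centers], axis=0)
        centers.append(images[np.argmax(dist)])
    centers = np.stack(centers)
    for _ in range(iters):
        # assign each frame to its closest center, then recompute centers
        labels = np.argmin(
            np.linalg.norm(images[:, None, :] - centers[None, :, :], axis=2), axis=1)
        for i in range(m):
            if np.any(labels == i):
                centers[i] = images[labels == i].mean(axis=0)
    return labels, centers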
  • Referring again to FIG. 7, the linear approximation module 320 constructs 730 a linear approximation of each partition. FIG. 9 is an illustration of the image partitions with a linearly approximated appearance manifold comprising pose manifolds according to one embodiment of the present invention. The appearance manifold 910 is initially a nonlinear image space representation of the training image sequence 911. The image partitions comprise a left-facing pose 920 a, a front-facing pose 920 b, and a right-facing pose 920 c. Each pose has three associated images that are variations of the main pose, with a linear approximation 915 a-c representing that subspace. The union of the subspaces forms the linear approximation of the appearance manifold.
  • In one embodiment, the linear approximation module 320 constructs the linear approximations 915 a-c by calculating a PCA plane of fixed dimension for the images. The PCA plane (i.e., subspace) is constructed to provide a compact low-dimensional representation of an object (e.g., face, mug, or any 3D object).
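  • As a sketch only, the fixed-dimension PCA plane for one partition can be computed as follows, again assuming vectorized frames; pca_plane and distance_to_plane are hypothetical helper names that the later sketches reuse.

import numpy as np

def pca_plane(subset, dim):
    """Fit a dim-dimensional PCA subspace (mean plus orthonormal basis)
    to the frames of one partition, given as an (n, d) array."""
    mean = subset.mean(axis=0)
    # SVD of the centered data; the rows of vt are principal directions
    _, _, vt = np.linalg.svd(subset - mean, full_matrices=False)
    return mean, vt[:dim]

def distance_to_plane(image, mean, basis):
    """L2 distance from a vectorized image to the PCA plane; this
    residual norm serves as the point-to-subspace distance below."""
    centered = image - mean
    residual = centered - basis.T @ (basis @ centered)
    return np.linalg.norm(residual)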
  • In FIG. 7, the transition probability module 330 determines 740 transition probabilities, or temporal coherence, between training images. Transition probabilities between abutting pose manifolds are shown in FIG. 9. The expression P(C_k1|C_k2) 912 represents the probability that the second pose follows the first. Further, the expression P(C_k2|C_k3) 923 represents the probability that the third pose follows the second. A more comprehensive set of relationships between poses is represented in a transition matrix as shown in FIG. 10.
  • In one embodiment, the transition probability module 330 determines transition probabilities by counting the actual transitions between different disjoint subsets S_ki observed in the image sequence, as shown in (2):
  • p(C_{ki} \mid C_{kj}) = \frac{1}{\Lambda_{ki}} \sum_{q=2}^{l} \delta(I_{q-1} \in S_{ki}) \, \delta(I_q \in S_{kj})   (2)
  • where \delta(I_q \in S_{kj}) = 1 if I_q \in S_{kj} and 0 otherwise. The normalizing constant \Lambda_{ki} ensures that (3):
  • \sum_{j=1}^{m} p(C_{ki} \mid C_{kj}) = 1   (3)
  • where, for transitions not observed in the training sequence, p(C_{ki} \mid C_{kj}) is set to a constant.
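  • Under the same assumptions, expressions (2) and (3) reduce to counting consecutive-frame transitions between partitions and row-normalizing the counts; the small floor value standing in for the constant assigned to unobserved transitions is an illustrative choice.

import numpy as np

def transition_matrix(labels, m, floor=1e-3):
    """Count observed pose-to-pose transitions per (2), seed unobserved
    transitions with a small constant, and normalize rows per (3).

    labels[q] is the partition index of frame I_q."""
    counts = np.full((m, m), floor)
    for prev, curr in zip(labels[:-1], labels[1:]):
        counts[prev, curr] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)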
  • FIG. 10 is an illustration of the transition probability matrix according to one embodiment of the present invention. The transition probability matrix 1000 comprises five poses from a training image sequence. For this example, with respect to (3), m=5. The poses have an associated image, which is generated in one embodiment by combining partition images, such as the three images in the first partition 910. The brighter blocks have a higher transition probability. Pose 1 is thus much more likely to follow pose 2 than pose 5: rather than transitioning directly from the left-facing pose to the right-facing pose over two consecutive images, a head is highly likely to pass through at least one intermediate pose. This imposes a first-order Markov process, or finite state machine, on the piecewise linear structure.
  • The transition probability module 330 stores 750 results of the transition probability matrix in the manifold database 165 for use in the recognition phase. The process 610 continues populating the database if there are more individuals 760. Otherwise, it returns 795 until called for recognition.
  • Returning to FIG. 6, the recognition step 620 is described in greater detail with reference to FIGS. 11-17. FIG. 11 is a flow chart illustrating a method of recognizing the individual from a plurality of individuals according to one embodiment of the present invention. The process initializes responsive to receiving 1110 one or more recognition images. For example, when an individual requests authentication and stands in proper view, the video camera 110 sends the sequence of recognition images to the video buffer 210.
  • The appearance manifold module 420 determines 1120 which pose manifold, from the plurality of pose manifolds associated with the plurality of individuals, is closest to a first recognition image. FIG. 12 is a flow chart illustrating the method of determining the pose manifold according to one embodiment of the present invention. The first recognition image is projected 1210 onto a plurality of pose manifolds, either from the same appearance manifold as shown in FIG. 13 or from different appearance manifolds.
  • The appearance manifold module 420 determines 1220 which pose manifold is closest to the first image in the image space. Examples of image plotting in the ambient space are shown in H. Murase and S. K. Nayar, "Visual Learning and Recognition of 3-D Objects From Appearance," Int'l J. Computer Vision (1995). Identification information includes 1230 the appearance manifold associated with the closest pose manifold. The individual associated with the closest pose manifold is a candidate for the target individual's identity. In one embodiment, the identification information includes statistics on more than one candidate, since the leading candidate can change as subsequent recognition images are considered in combination with the first recognition image.
  • If no pose variation 1130 or occlusion 1150 is detected, the appearance manifold module 420 identifies 1170 the individual associated with the closest appearance manifold. If there are more recognition images 1180, the process repeats. Otherwise, it returns 1195 and ends 695.
  • FIG. 13 is an illustration of a recognition image projected onto a closest pose manifold according to one embodiment of the present invention. A point I 1310 is a vector representation of a recognition image. The linearized appearance manifold comprises the set of pose manifolds C_k1 through C_k6 1320 a-f. The variable x 1330 represents the point at the minimum distance d_H 1340 between the image I 1310 and the set of pose manifolds 1320 a-f, which in this case lies on C_k4 1320 d. Assuming that this distance is smaller than the distance to any pose manifold associated with another appearance manifold, the individual associated with the shown appearance manifold 1350 is the leading candidate for identification.
  • In one embodiment, for a target individual k, an identity k^* is determined by finding the appearance manifold M_k with the minimum distance d_H to the recognition image I, as shown in expression (4):
  • k^* = \arg\min_k \, d_H(I, M_k)   (4)
  • where d_H is the Hausdorff distance (L2) between the image and the closest appearance manifold. Probabilistically, expression (4) results from defining the conditional probability as shown in expression (5):
  • p(k \mid I) = \frac{1}{\Lambda} \exp\!\left( -\frac{1}{\sigma^2} \, d_H^2(I, M_k) \right)   (5)
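  • A minimal sketch of (4) and (5), reusing the hypothetical distance_to_plane helper above and approximating the distance to an appearance manifold by the nearest of its pose planes; sigma and the data layout are assumptions.

import numpy as np

def identify(image, manifolds, sigma=1.0):
    """Return the arg-min identity of (4) and the normalized posterior of (5).

    manifolds[k] is the list of (mean, basis) pose planes enrolled for
    individual k; the per-identity distance is the minimum over poses."""
    d = np.array([min(distance_to_plane(image, mu, b) for mu, b in planes)
                  for planes in manifolds])
    p = np.exp(-(d ** 2) / sigma ** 2)   # unnormalized p(k | I) per (5)
    return int(np.argmin(d)), p / p.sum()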
  • Referring again to FIG. 11, if there is a pose variation 1130, the appearance manifold module 420 determines 1140 which appearance manifolds are closest to the recognition image sequence. The appearance manifold module 420 projects the second image onto the plurality of pose manifolds to find the pose manifold closest to the second image in the ambient space. The minimum distance is derived from the second image considered in combination with the first image to incorporate temporal dependency. In one embodiment, the minimum distance is expressed as (6) through (8):
  • d_H(I, M_k) = \int_{M_k} d(x, I) \, p_{M_k}(x \mid I) \, dx   (6)
  •   = \sum_{i=1}^{m} p(C_{ki} \mid I) \int_{C_{ki}} d(x, I) \, p_{C_{ki}}(x \mid I) \, dx   (7)
  •   = \sum_{i=1}^{m} p(C_{ki} \mid I) \, d_H(I, C_{ki})   (8)
  • where x 1330 is a point on the manifold M_k and 1 \le i \le m.
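  • A sketch of expression (8) under the same assumptions, again reusing distance_to_plane; the pose posterior supplied here would come from the recursion sketched after expression (11).

import numpy as np

def expected_distance(image, planes, pose_posterior):
    """Pose-weighted manifold distance of (8): the expectation of the
    per-pose distances d_H(I, C_ki) under the posterior p(C_ki | I)."""
    d = np.array([distance_to_plane(image, mu, b) for mu, b in planes])
    return float(pose_posterior @ d)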
  • The second identification information is based on the probability that the second pose manifold follows the first pose manifold. In one embodiment, the probability is determined from a conditional joint probability that includes the transition probabilities from the training phase, as expressed in (9) through (11) using Bayes' rule:
  • p(C_{ki}^{t} \mid I_t, I_{0:t-1}) = \alpha \, p(I_t \mid C_{ki}^{t}, I_{0:t-1}) \, p(C_{ki}^{t} \mid I_{0:t-1})   (9)
  •   = \alpha \, p(I_t \mid C_{ki}^{t}) \sum_{j=1}^{m} p(C_{ki}^{t} \mid C_{kj}^{t-1}, I_{0:t-1}) \, p(C_{kj}^{t-1} \mid I_{0:t-1})   (10)
  •   = \alpha \, p(I_t \mid C_{ki}^{t}) \sum_{j=1}^{m} p(C_{ki}^{t} \mid C_{kj}^{t-1}) \, p(C_{kj}^{t-1} \mid I_{0:t-1})   (11)
  • where α is a normalization term to ensure a proper probability distribution.
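  • Expressions (9) through (11) are the forward recursion of a first-order hidden Markov model. A minimal sketch, assuming the per-pose likelihood p(I_t | C_ki) is derived from the subspace distances (for example exp(-d_H^2/sigma^2)) and the transition matrix comes from the training phase:

import numpy as np

def update_pose_posterior(prev_posterior, likelihood, transition):
    """One step of (9)-(11): predict with the learned transition matrix,
    correct with the current image likelihood, and renormalize (alpha).

    prev_posterior[j] = p(C_kj^{t-1} | I_{0:t-1});
    transition[j, i]  = p(C_ki | C_kj);
    likelihood[i]     = p(I_t | C_ki^t)."""
    prior = transition.T @ prev_posterior   # sum_j p(C_ki | C_kj) p(C_kj | past)
    posterior = likelihood * prior          # multiply in p(I_t | C_ki)
    return posterior / posterior.sum()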
  • In still another embodiment, the appearance manifold module 420 determines the second identification information from preprocessed identification information rather than from an image. The preprocessed identification information is derived, as discussed above, from a previous image.
  • If the second identification information has a probability above an identification threshold, a positive identification can be made from the appearance manifold 410 associated with the closest pose manifold. In one embodiment, the appearance manifold module 420 outputs the identity to an output device 120.
  • In one embodiment, if the second identification information is not above the identification threshold, parallel statistics are maintained for possible identifications. As the process continues through iterations in response to receiving more recognition images, identification information is updated, and a positive identification is made when the identification threshold is surpassed. Whether or not a positive identification has been made, the process constantly updates identification information responsive to receiving new images.
  • FIG. 14 is an illustration of a sequence of recognition images projected onto two appearance manifolds according to one embodiment of the present invention. The recognition image sequence 1420 spans from time t−6 through t+3. From t−6 through t−4, the recognition image sequence is closest to appearance manifold B 1430 in the ambient space, so the identification information includes the associated identity. However, during t−3 through t, the recognition image sequence 1420 is closest to appearance manifold A 1410. In one embodiment, at time instances t−4 and t−3, the identification information includes both appearance manifolds 1410, 1430, since the recognition image sequence 1420 is not sufficiently close to either one for a positive identification. From t+1 through t+3, the recognition image sequence 1420 is again closest to appearance manifold B 1430; at t and t+1, however, there is no positive identification.
  • Advantageously, with each additional recognition image, a new data point is added to the appearance manifold. Therefore, recognition accuracy increases over time.
  • In FIG. 11, if an occlusion is detected 1150, the occlusion module 430 determines 1160 an occlusion adjustment. FIG. 15 is a flow chart illustrating the method of determining an occlusion adjustment according to one embodiment of the present invention. An occlusion occurs when an object blocks the video camera's 110 view of the individual, or when an image becomes otherwise obstructed. FIG. 16(a) illustrates a first image with an occlusion 1610 a and a second image without an occlusion 1610 b, both images comprising the same individual in the same pose.
  • The mask generation module detects 1510 the occlusion by comparing the image to its closest pose manifold as in step 1120, or to the closest pose manifold associated with the appearance manifold selected in step 1140. FIG. 16(b) illustrates the pose manifolds associated with the first and second images 1620 a-b, which are preferably the same pose manifold.
  • The mask generation module determines 1520 a probability that each pixel is occluded. In one embodiment, the probability is measured by how much a pixel varies in color from the corresponding pose manifold image pixels, a large variation corresponding to a high probability and a zero or negligible variation corresponding to a low probability. FIG. 16(c) illustrates a grayscale representation of the first image 1630 a and the second image 1630 b. Pixels with a large variation are designated black, pixels with no variation are designated white, and pixels with a medium variation are designated an appropriate shade of gray. In one embodiment, variation data is binary and thus represented in black and white.
  • The mask adjustment module 520 identifies 1530 occlusion groups and defines a resulting occlusion mask. In the example of FIG. 16(c), the lower left-hand region of the first image 1630 a includes a cluster of black pixels, indicating a large group of pixels with high variation in color across frames. The cluster, and the resulting occlusion mask, thus correspond to the occlusion shown in the image 1610 a.
  • The mask adjustment module 520 applies 1540 a weighted occlusion mask to the pose manifold by reducing the influence of masked pixels on future decisions. By contrast, in one embodiment of determining the appearance manifold in step 1120, all pixels are treated equally. When the next recognition image is received, the mask may be applied directly to the recognition image, or to the pose manifolds in determining the closest pose or appearance manifold.
  • In one embodiment, the masked pixels are weighted by modifying the distance computation d_H(M_k, I_t) to d_H(M_{k^*}, W_t \ast I_t), where \ast denotes element-wise (per-pixel) multiplication. The weighted projection of W_t \ast I_t onto M_{k^*} is x^*. The mask W_t is updated for each successive image I_t from the estimate W_{t-1} at the previous frame by (12):
  • W_t^{(1)} = \exp\!\left( -\frac{1}{2\sigma^2} \, (\hat{I}_{x^*} - I_t) \ast (\hat{I}_{x^*} - I_t) \right)   (12)
  • where \hat{I}_{x^*} is the reconstructed image. In another embodiment, W_t is iteratively updated, with W_t^{(i+1)} computed from W_t^{(i)} and \hat{I}_{x^*}^{(i)} as in (13):
  • W_t^{(i+1)} = \exp\!\left( -\frac{1}{2\sigma^2} \, (\hat{I}_{x^*}^{(i)} - I_t) \ast (\hat{I}_{x^*}^{(i)} - I_t) \right)   (13)
  • until the difference between W_t^{(i)} and W_t^{(i-1)} at the i-th iteration is below a threshold value.
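  • A sketch of the iteration in (12) and (13), where reconstruct is a hypothetical helper standing in for \hat{I}_{x^*}: it projects the weighted image onto the closest pose plane and maps it back to pixel space. The tolerance and iteration cap are illustrative.

import numpy as np

def update_mask(reconstruct, image, sigma=1.0, tol=1e-4, max_iter=20):
    """Iterate (13) until the mask stabilizes: pixels whose reconstruction
    error stays large receive weights near 0 and are discounted as occluded."""
    w = np.ones_like(image)
    for _ in range(max_iter):
        residual = reconstruct(w * image) - image
        w_new = np.exp(-(residual ** 2) / (2.0 * sigma ** 2))
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w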
  • Advantageously, the occlusion mask also exploits temporal coherency to allow more accurate facial recognition.
  • In summary, the present invention exploits temporal coherency between images during the recognition phase to increase facial recognition accuracy. The manifold training module 220 establishes a matrix of probabilistic interrelationships between the poses an individual assumes across a training image sequence. The appearance manifold module 420 uses prior evidence and the probabilistic relationships between received images to make current identification decisions. The occlusion module 430 achieves even further accuracy by masking non-facial parts of the recognition image from identification decisions.
  • The above description is included to illustrate the operation of the preferred embodiments and is not meant to limit the scope of the invention. The scope of the invention is to be limited only by the following claims. From the above discussion, many variations will be apparent to one of ordinary skill in the art that would yet be encompassed by the spirit and scope of the invention.

Claims (58)

1. A method for recognizing a target individual, comprising the steps of:
receiving a first recognition image from a sequence of recognition images, said first recognition image including a first image of the target individual in a first pose, the first recognition image captured at a first time;
receiving a second recognition image of the sequence of recognition images, said second recognition image including a second image of the target individual in a second pose, the second recognition image captured at a second time;
generating a first identification information comprising a candidate identity of the target individual based on the first image; and
generating a second identification information comprising an updated identity of the target individual based on the first identification information and a likelihood that the second pose will follow the first pose in the sequence of recognition images.
2. (canceled)
3. The method of claim 1, wherein the second pose is a facial position having a two-dimensional rotation with respect to the first pose.
4. The method of claim 1, wherein the second pose is a facial position having a three-dimensional rotation with respect to the first pose.
5. The method of claim 1, wherein the step of generating the second identification information comprises generating the second identification information based on a conditional probability that defines a relationship between the first identification information and the second image.
6. The method of claim 1, wherein the step of generating the first identification information comprises the steps of:
generating a first vector representation of the first image;
projecting the first vector representation onto a plurality of vector representations representing a plurality of individuals; and
selecting one or more vector representations from the plurality of vector representations having the least distance from the first vector representation.
7. The method of claim 6, wherein the plurality of vector representations are pose manifolds associated with the plurality of individuals.
8. The method of claim 6, further comprising the steps of:
generating the plurality of vector representations from a plurality of training image sequences including the plurality of individuals.
9. The method of claim 6, wherein the plurality of vector representations are linear approximations of a plurality of image sequences including the plurality of individuals.
10. The method of claim 1, further comprising the steps of:
if the second identification information exceeds an identification threshold for an individual, identifying the target individual as the individual.
11. The method of claim 1, further comprising the steps of:
receiving a third recognition image, the third recognition image including a third image of the target individual at a third time; and
generating a third identification information, said third identification information based on the third image and the second identification information.
12. A method for recognizing a target individual, comprising the steps of:
receiving a first recognition image from a sequence of recognition images, said first recognition image including a first image of the target individual at a first time;
receiving a second recognition image of the sequence of recognition images, said second recognition image including a second image of the target individual at a second time;
generating a first identification information from the first image;
detecting that a portion of the second recognition image is at least partially occluded;
generating a weighted mask including the portion of the second image; and
generating a second identification information from the second image, as adjusted by the weighted mask, and the first identification information.
13. The method of claim 12, wherein the step of detecting the occlusion further comprises the step of:
determining that a group of pixels in the second image exceeds a variance threshold.
14. The method of claim 13, wherein the step of determining that the group of pixels in the second image exceeds a variance threshold comprises:
determining that a first group of pixels in the second image exceeds a variance threshold with respect to a corresponding group of pixels in the first image.
15. A method for recognizing a target individual from an image, comprising the steps of:
receiving a first identification information comprising a candidate identity of the target individual based on one or more recognition images at a previous time;
generating a second identification information comprising an updated identity of the target individual based on a recognition image at a current time; and
determining an identification of the target individual from a plurality of individuals based on a conditional probability given the first identification information and the recognition image at the current time.
16-17. (canceled)
18. A computer program product, comprising:
a computer-readable medium having computer program instructions and data embodied thereon for recognizing a target individual from a plurality of individuals, comprising the steps of:
receiving a first recognition image from a sequence of recognition images, said first recognition image including a first image of the target individual in a first pose, the first recognition image captured at a first time;
receiving a second recognition image of the sequence of recognition images, said second recognition image including a second image of the target individual in a second pose, the second recognition image captured at a second time;
generating a first identification information comprising a candidate identity of the target individual from the first image; and
generating a second identification information comprising an updated identity of the target individual based on the first identification information and a likelihood that the second pose will follow the first pose in the sequence of recognition images.
19. (canceled)
20. The computer program product of claim 18, wherein the second pose is a facial position having a two-dimensional rotation with respect to the first pose.
21. The computer program product of claim 18, wherein the second pose is a facial position having a three-dimensional rotation with respect to the first pose.
22. The computer program product of claim 18, wherein the step of generating the second identification information comprises generating the second identification information based on a conditional probability that defines a relationship between the first identification information and the second image.
23. The computer program product of claim 18, wherein the step of generating the first identification information comprises the steps of:
generating a first vector representation of the first image;
projecting the first vector representation onto a plurality of vector representations representing a plurality of individuals; and
selecting one or more vector representations from the plurality of vector representations having the least distance from the first vector representation.
24. The computer program product of claim 23, wherein the plurality of vector representations are pose manifolds associated with the plurality of individuals.
25. The computer program product of claim 23, further comprising the steps of:
generating the plurality of vector representations from a plurality of training image sequences including the plurality of individuals.
26. The computer program product of claim 23, wherein the plurality of vector representations are linear approximations of a plurality of image sequences including the plurality of individuals.
27. The computer program product of claim 18, further comprising the steps of:
if the second identification information exceeds an identification threshold for an individual, identifying the target individual as the individual.
28. The computer program product of claim 18, further comprising the steps of:
receiving a third recognition image at a third time, the third recognition image including a third image of the target individual at a third time; and
generating a third identification information, said third identification information based on the third image and the second identification information.
29. A computer program product, comprising:
a computer-readable medium having computer program instructions and data embodied thereon for recognizing a target individual from a plurality of individuals, comprising the steps of:
receiving a first recognition image from a sequence of recognition images, said first recognition image including a first image of the target individual at a first time;
receiving a second recognition image of the sequence of recognition images, said second recognition image including a second image of the target individual at a second time;
generating a first identification information from the first image;
detecting that a portion of the second recognition image is at least partially occluded;
generating a weighted mask including the portion of the second image; and
generating a second identification information from the second image, as adjusted by the weighted mask, and the first identification information.
30. The computer program product of claim 29, wherein the step of detecting the occlusion further comprises the step of:
determining that a group of pixels in the second image exceeds a variance threshold.
31. The computer program product of claim 30, wherein the step of determining that the group of pixels in the second image exceeds a variance threshold comprises:
determining that a first group of pixels in the second image exceeds a variance threshold with respect to a corresponding group of pixels in the first image.
32. A recognition module for recognizing a target individual, comprising:
a video buffer to receive a first recognition image from a sequence of recognition images, said first recognition image including a first image of the target individual in a first pose, the first recognition image captured at a first time, and a second recognition image of the sequence of recognition images, said second recognition image including a second image of the target individual in a second pose, the second recognition image captured at a second time;
an identity module to generate a first identification information comprising a candidate identity of the target individual based on the first image, and generate a second identification information comprising an updated identity of the target individual based on the first identification information and a likelihood that the second pose will follow the first pose in the sequence of recognition images.
33. (canceled)
34. The recognition module of claim 32, wherein the second pose is a facial position having a two-dimensional rotation with respect to the first pose.
35. The recognition module of claim 32, wherein the second pose is a facial position having a three-dimensional rotation with respect to the first pose.
36. The recognition module of claim 32, wherein the identity module further comprises a transition module to define a relationship between the first identification information and the second image, and the identity module generates the second identification information based on the relationship.
37. The recognition module of claim 32, wherein the identity module further comprises an identification control module to generate a first vector representation of the first image, project the first vector representation onto a plurality of vector representations representing a plurality of individuals, and select one or more vector representations from the plurality of vector representations having the least distance from the first vector representation.
38. The recognition module of claim 37, wherein the plurality of vector representations are pose manifolds associated with the plurality of individuals.
39. The recognition module of claim 37, further comprising a linear approximation module to generate the plurality of vector representations from a plurality of training image sequences including the plurality of individuals.
40. The recognition module of claim 37, wherein the plurality of vector representations are linear approximations of a plurality of image sequences including the plurality of individuals.
41. The recognition module of claim 32, wherein if the second identification information exceeds an identification threshold for an individual, the identity module identifies the target individual as the individual.
42. The recognition module of claim 32, wherein the video buffer receives a third recognition image, said third recognition image including a third image of the target individual at a third time, and the identity module generates a third identification information, said third identification information based on the third image and the second identification information.
43. A recognition module for recognizing a target individual, comprising:
a video buffer to receive a first recognition image from a sequence of recognition images, said first recognition image including a first image of the target individual at a first time, and a second recognition image of the sequence of recognition images, said second recognition image including a second image of the target individual at a second time;
an identity module to generate a first identification information from the first image;
an occlusion module to detect that a portion of the second recognition image is at least partially occluded;
a mask generation module to generate a weighted mask including the portion; and
a mask adjustment module to generate the second identification information based on the second recognition image, as adjusted by the weighted mask, and the first identification information.
44. The recognition module of claim 43, wherein the mask generation module determines that a group of pixels in the second image exceeds a variance threshold.
45. The recognition module of claim 44, wherein the mask generation module determines that a first group of pixels in the second image exceeds a variance threshold with respect to a corresponding group of pixels in the first image.
46. A recognition module for recognizing a target individual, comprising:
a buffer means for receiving a first recognition image from a sequence of recognition images, said first recognition image including a first image of the target individual in a first pose, the first recognition image captured at a first time, and a second recognition image of the sequence of recognition images, said second recognition image including a second image of the target individual in a second pose, the second recognition image captured at a second time;
an identity means for generating a first identification information comprising a candidate identity of the target individual based on the first image, and for generating a second identification information comprising an updated identity of the target individual based on the first identification information and a likelihood that the second pose will follow the first pose in the sequence of recognition images.
47. (canceled)
48. The recognition module of claim 46, wherein the second pose is a facial position having a two-dimensional rotation with respect to the first pose.
49. The recognition module of claim 46, wherein the second pose is a facial position having a three-dimensional rotation with respect to the first pose.
50. The recognition module of claim 46, wherein the identity means further comprises a transition means to define a relationship between the first identification information and the second image, and the identity means generates the second identification information based on the relationship.
51. The recognition module of claim 46, wherein the identity means further comprises an identification control means for generating a first vector representation of the first image, projecting the first vector representation onto a plurality of vector representations representing a plurality of individuals, and selecting one or more vector representations from the plurality of vector representations having the least distance from the first vector representation.
52. The recognition module of claim 51, wherein the plurality of vector representations are pose manifolds associated with the plurality of individuals.
53. The recognition module of claim 51, further comprising a linear approximation means for generating the plurality of vector representations from a plurality of training image sequences including the plurality of individuals.
54. The recognition module of claim 51, wherein the plurality of vector representations are linear approximations of a plurality of image sequences including the plurality of individuals.
55. The recognition module of claim 46, wherein if the second identification information exceeds an identification threshold for an individual, the identity means identifies the target individual as the individual.
56. The recognition module of claim 46, wherein the buffer means receives a third recognition image, the third recognition image including a third image of the target individual at a third time, and the identity means generates a third identification information, said third identification information based on the third image and the second identification information.
57. A recognition module for recognizing a target individual, comprising:
a buffer means for receiving a first recognition image from a sequence of recognition images, said first recognition image including a first image of the target individual at a first time, and a second recognition image of the sequence of recognition images, said second recognition image including a second image of the target individual at a second time;
an identity means for generating a first identification information from the first image;
an occlusion means for detecting that a portion of the second recognition image is at least partially occluded;
a mask generation means for generating a weighted mask including the portion of the second image; and
a mask adjustment means for generating the second identification information from the second image, as adjusted by the weighted mask, and the first identification information.
58. The recognition module of claim 57, wherein the mask generation means determines that a group of pixels in the second image exceeds a variance threshold.
59. The recognition module of claim 58, wherein the mask generation means determines that a first group of pixels in the second image exceeds a variance threshold with respect to a corresponding group of pixels in the first image.
US10/703,288 2002-11-07 2003-11-06 Video-based face recognition using probabilistic appearance manifolds Active 2025-11-03 US7499574B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/703,288 US7499574B1 (en) 2002-11-07 2003-11-06 Video-based face recognition using probabilistic appearance manifolds

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US42521402P 2002-11-07 2002-11-07
US47864403P 2003-06-12 2003-06-12
US10/703,288 US7499574B1 (en) 2002-11-07 2003-11-06 Video-based face recognition using probabilistic appearance manifolds

Publications (2)

Publication Number Publication Date
US20090041310A1 true US20090041310A1 (en) 2009-02-12
US7499574B1 US7499574B1 (en) 2009-03-03

Family

ID=32314578

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/703,288 Active 2025-11-03 US7499574B1 (en) 2002-11-07 2003-11-06 Video-based face recognition using probabilistic appearance manifolds

Country Status (5)

Country Link
US (1) US7499574B1 (en)
EP (1) EP1565887A4 (en)
JP (1) JP4486594B2 (en)
AU (1) AU2003301795A1 (en)
WO (1) WO2004042539A2 (en)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2462851B (en) 2008-08-21 2010-09-15 4Sight Imaging Ltd Image processing
TWI382354B (en) * 2008-12-02 2013-01-11 Nat Univ Tsing Hua Face recognition method
US9020192B2 (en) 2012-04-11 2015-04-28 Access Business Group International Llc Human submental profile measurement
US9558396B2 (en) 2013-10-22 2017-01-31 Samsung Electronics Co., Ltd. Apparatuses and methods for face tracking based on calculated occlusion probabilities
US9721079B2 (en) 2014-01-15 2017-08-01 Steve Y Chen Image authenticity verification using speech
US9594949B1 (en) * 2015-08-31 2017-03-14 Xerox Corporation Human identity verification via automated analysis of facial action coding system features
US10896318B2 (en) * 2017-09-09 2021-01-19 Apple Inc. Occlusion detection for facial recognition processes


Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9123210D0 (en) * 1991-11-01 1991-12-18 Marconi Gec Ltd Filter
JPH07302327A (en) * 1993-08-11 1995-11-14 Nippon Telegr & Teleph Corp <Ntt> Method and device for detecting image of object
JPH1013832A (en) * 1996-06-25 1998-01-16 Nippon Telegr & Teleph Corp <Ntt> Moving picture recognizing method and moving picture recognizing and retrieving method
JP3943223B2 (en) * 1997-02-12 2007-07-11 富士通株式会社 Pattern recognition apparatus and method for performing classification using candidate table
JPH1125269A (en) * 1997-07-02 1999-01-29 Sanyo Electric Co Ltd Facial picture recognizing device and method therefor
JP2000099722A (en) * 1998-09-22 2000-04-07 Toshiba Corp Personal face recognizing device and its method
JP2000163396A (en) * 1998-11-25 2000-06-16 Nippon Telegr & Teleph Corp <Ntt> Device and method for clustering data having classes of unknown number and recording medium stored with program for executing the same method
JP2000220333A (en) * 1999-01-29 2000-08-08 Toshiba Corp Device and method for certifying person
US7117157B1 (en) * 1999-03-26 2006-10-03 Canon Kabushiki Kaisha Processing apparatus for determining which person in a group is speaking
JP4092059B2 (en) * 2000-03-03 2008-05-28 日本放送協会 Image recognition device
WO2002039371A2 (en) * 2000-11-03 2002-05-16 Koninklijke Philips Electronics N.V. Estimation of facial expression intensity using a bidirectional star topology hidden markov model

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5164992A (en) * 1990-11-01 1992-11-17 Massachusetts Institute Of Technology Face recognition system
US6400828B2 (en) * 1996-05-21 2002-06-04 Interval Research Corporation Canonical correlation analysis of image/control-point location coupling for the automatic location of control points
US6345110B1 (en) * 1996-10-11 2002-02-05 Mitsubishi Electric Research Laboratories, Inc. Identifying images using a tree structure
US6272231B1 (en) * 1998-11-06 2001-08-07 Eyematic Interfaces, Inc. Wavelet-based facial motion capture for avatar animation
US6741756B1 (en) * 1999-09-30 2004-05-25 Microsoft Corp. System and method for estimating the orientation of an object
US6873713B2 (en) * 2000-03-16 2005-03-29 Kabushiki Kaisha Toshiba Image processing apparatus and method for extracting feature of object
US6671391B1 (en) * 2000-05-26 2003-12-30 Microsoft Corp. Pose-adaptive face detection system and process
US7330566B2 (en) * 2003-05-15 2008-02-12 Microsoft Corporation Video-based gait recognition

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100183218A1 (en) * 2008-06-20 2010-07-22 Aisin Seiki Kabushiki Kaisha Object determining device and program thereof
US8705850B2 (en) * 2008-06-20 2014-04-22 Aisin Seiki Kabushiki Kaisha Object determining device and program thereof
US20100008550A1 (en) * 2008-07-14 2010-01-14 Lockheed Martin Corporation Method and apparatus for facial identification
US9405995B2 (en) 2008-07-14 2016-08-02 Lockheed Martin Corporation Method and apparatus for facial identification
US20100061609A1 (en) * 2008-09-05 2010-03-11 Siemens Medical Solutions Usa, Inc. Quotient Appearance Manifold Mapping For Image Classification
US9202140B2 (en) * 2008-09-05 2015-12-01 Siemens Medical Solutions Usa, Inc. Quotient appearance manifold mapping for image classification
US9721144B2 (en) * 2011-01-12 2017-08-01 Gary S. Shuster Graphic data alteration to enhance online privacy
US11600108B2 (en) 2011-01-12 2023-03-07 Gary S. Shuster Video and still image data alteration to enhance privacy
US20160004903A1 (en) * 2011-01-12 2016-01-07 Gary S. Shuster Graphic data alteration to enhance online privacy
US10223576B2 (en) 2011-01-12 2019-03-05 Gary S. Shuster Graphic data alteration to enhance online privacy
US8953843B1 (en) * 2012-07-17 2015-02-10 Google Inc. Selecting objects in a sequence of images
US8977003B1 (en) * 2012-07-17 2015-03-10 Google Inc. Detecting objects in a sequence of images
US9483997B2 (en) 2014-03-10 2016-11-01 Sony Corporation Proximity detection of candidate companion display device in same room as primary display using infrared signaling
US9696414B2 (en) 2014-05-15 2017-07-04 Sony Corporation Proximity detection of candidate companion display device in same room as primary display using sonic signaling
US9858024B2 (en) 2014-05-15 2018-01-02 Sony Corporation Proximity detection of candidate companion display device in same room as primary display using sonic signaling
US10070291B2 (en) 2014-05-19 2018-09-04 Sony Corporation Proximity detection of candidate companion display device in same room as primary display using low energy bluetooth
US10474908B2 (en) * 2017-07-06 2019-11-12 GM Global Technology Operations LLC Unified deep convolutional neural net for free-space estimation, object detection and object pose estimation
US11853390B1 (en) * 2018-08-03 2023-12-26 Amazon Technologies, Inc. Virtual/augmented reality data evaluation
US11276196B2 (en) * 2019-04-16 2022-03-15 Sony Interactive Entertainment Inc. Video processing

Also Published As

Publication number Publication date
AU2003301795A1 (en) 2004-06-07
JP2006505875A (en) 2006-02-16
WO2004042539A2 (en) 2004-05-21
WO2004042539A9 (en) 2005-07-21
JP4486594B2 (en) 2010-06-23
WO2004042539A3 (en) 2004-09-02
AU2003301795A8 (en) 2004-06-07
EP1565887A4 (en) 2009-05-27
US7499574B1 (en) 2009-03-03
EP1565887A2 (en) 2005-08-24

Similar Documents

Publication Publication Date Title
US7499574B1 (en) Video-based face recognition using probabilistic appearance manifolds
US6590999B1 (en) Real-time tracking of non-rigid objects using mean shift
US8885943B2 (en) Face detection method and apparatus
US7912253B2 (en) Object recognition method and apparatus therefor
Lee et al. Video-based face recognition using probabilistic appearance manifolds
Sung et al. Example-based learning for view-based human face detection
JP4479478B2 (en) Pattern recognition method and apparatus
US7167578B2 (en) Probabilistic exemplar-based pattern tracking
US7139411B2 (en) Pedestrian detection and tracking with night vision
US8553931B2 (en) System and method for adaptively defining a region of interest for motion analysis in digital video
US7773781B2 (en) Face detection method and apparatus and security system employing the same
US8050453B2 (en) Robust object tracking system
US7957560B2 (en) Unusual action detector and abnormal action detecting method
US20070258646A1 (en) Human detection method and apparatus
US8355576B2 (en) Method and system for crowd segmentation
JP2012190159A (en) Information processing device, information processing method, and program
Rosales Recognition of human action using moment-based feature
Foresti et al. Face detection for visual surveillance
Pérez et al. Comparison between genetic algorithms and the baum-welch algorithm in learning hmms for human activity classification
Selvi et al. FPGA implementation of a face recognition system
Zuo et al. Facial feature extraction by a cascade of model-based algorithms
Sultana et al. A new approach for efficient face detection using bpv algorithm based on mathematical modeling
Ramakrishna et al. A comparative study on face detection algorithms
Sultana et al. A New Approach for Efficient Face Detection Using BPV Algorithm Based
Anandathirtha et al. Experiential sampling for object detection in video

Legal Events

Date Code Title Description
AS Assignment

Owner name: HONDA MOTOR CO., LTD., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YANG, MING-HSUAN;HO, JEFFREY;LEE, KUANG-CHIH;REEL/FRAME:014647/0289;SIGNING DATES FROM 20040302 TO 20040504

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12