US20100027835A1 - Recognizing actions of animate objects in video - Google Patents
Recognizing actions of animate objects in video
- Publication number
- US20100027835A1 (application US12/183,078)
- Authority
- US
- United States
- Prior art keywords
- action
- postures
- graph
- component
- posture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/196—Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
- G06V30/1983—Syntactic or structural pattern recognition, e.g. symbolic string recognition
- G06V30/1988—Graph matching
Definitions
- Compensating a human being to monitor video data captured by a surveillance camera remains costly.
- many retail stores have video surveillance cameras that transmit video to a control room that includes multiple display screens, such that video from different surveillance cameras are provided to different display screens.
- One or more human beings monitor the display screens in search of suspicious or illegal activity and dispatch a security officer to a particular location if suspicious or illegal activity is observed on one of the display screens.
- Use of a human is expensive, as a retail store must compensate the human being that is monitoring the display screens.
- a brief lapse in concentration can result in misappropriation of valuable goods.
- the animate object may be a human, an animal, a projected object, and/or other suitable animate object.
- An action graph is used in connection with automatically recognizing actions.
- the action graph includes nodes that represent multiple posture models (states) that are representative of different portions of an action (e.g., a particular body position).
- the action graph includes transitional probabilities that describe a probability that an animate object will transfer between postures for various actions.
- the action graph can include postures that are shared between multiple actions that can be determined through use of the action graph.
- video data that includes a plurality of video frames can be received.
- the video frames may include silhouettes of the animate object.
- Postures of the animate object are recognized by comparing information derived from the video frames with posture models in the action graph.
- a most likely path of postures can be ascertained. Once the most likely path is determined, probabilities of the path corresponding to a particular action can be ascertained. If a probability of the path corresponding to an action is above a threshold, the action can be the determined action.
- the action graph can be automatically learned based upon training data. More particularly, training data that includes multiple postures of various actions can be received, and clusters of postures can be generated. For instance, clustering postures can be based at least in part upon a determined amount of shape and motion dissimilarity between postures. Once the clusters are ascertained, transitional probabilities corresponding to the clusters can be learned for multiple actions.
- FIG. 1 is a functional block diagram of an example system that facilitates automatically recognizing an action of an animate object in video data.
- FIG. 2 is an example depiction of an action graph.
- FIG. 3 is an example depiction of a component that can be used in connection with recognizing an action of an animate object in video data.
- FIG. 4 is a functional block diagram of an example system that facilitates annotating video based at least in part upon a recognized action of an animate object in video data.
- FIG. 5 is a functional block diagram of an example system that facilitates automatically learning an action graph.
- FIG. 6 is a functional block diagram of an example system that facilitates preparing data for use in connection with learning an action graph.
- FIG. 7 is an example depiction of a component that can be used to learn an action graph.
- FIG. 8 is an example depiction of a component that can be used to learn a new action in an existing action graph.
- FIG. 9 is a flow diagram that illustrates an example methodology for recognizing an action of an animate object in video data.
- FIG. 10 is a flow diagram that illustrates an example methodology for recognizing an action of an animate object in video data.
- FIG. 11 is a flow diagram that illustrates an example methodology for learning an action graph.
- FIG. 12 is a flow diagram that illustrates an example methodology for updating an action graph with a new action.
- FIG. 13 is a flow diagram that illustrates an example methodology for recognizing an action of an animate object in video data.
- FIG. 14 is an example silhouette.
- FIG. 15 is an example shape contour.
- FIG. 16 is an example depiction of a shape contour fitted with an ellipse.
- FIG. 17 is an example computing system.
- the animate object may be a human, an animal, a moving object (e.g., a thrown ball), or other suitable animate object.
- the system 100 may be retained in a video camera unit or may be in a separate computing device.
- the system 100 includes a receiver component 102 that receives video data, wherein the video data includes images of an animate object.
- the video data may be a video feed that is received in real-time from a video camera.
- the video data may be received from a data storage device.
- the video data may be received in any suitable video format, may be compressed or uncompressed, sampled or unsampled, etc.
- the video data may include a silhouette of a human that is undertaking a particular action, such as walking, running, jumping, sliding, etc.
- the system 100 also includes a determiner component 104 that is in communication with the receiver component 102 and receives the video data.
- the determiner component 104 can automatically determine an action undertaken by the animate object in the received video data. More specifically, in response to receiving the video data, the determiner component 104 can detect one or more postures of the animate object in the video data.
- a posture may be a particular position of segments of the animate object, a particular position of joints of the animate object, a spatiotemporal position, etc.
- the determiner component 104 can access a data store 106 that includes an action graph 108 , wherein the action graph 108 comprises a plurality of nodes that are representative of multiple possible postures of the animate object. Moreover, at least one node in the action graph 108 can be shared amongst multiple actions.
- the action graph 108 can include edges that are representative of probabilities of transition between nodes (postures) for different actions.
- the receiver component 102 can receive video data of an animate object.
- the animate object may be a human in the act of running.
- the video data can be received by the determiner component 104 , and the determiner component 104 can detect multiple postures of the human (captured in the received video) in a sequence.
- the determiner component 104 can access the action graph 108 and compare the sequence of postures with sequences of postures of actions determinable through use of the action graph. Based at least in part upon the comparison, the determiner component 104 can output a determined action.
- the output action can be output to a data repository and stored therein, output to a display device, output to a printer, and/or the like.
- the system 100 may be used in a variety of applications.
- the system 100 may be utilized in a security application, wherein the system 100 can detect suspicious activity of an individual.
- the system 100 may be used in a retail establishment in connection with detecting shoplifting.
- the system 100 may be used in an airport to detect suspicious activity.
- the system 100 may be used to detect actions of animals (e.g., in a zoo to determine whether an animal is beginning to become aggressive).
- other applications are contemplated.
- the action graph 200 is shown for purposes of explanation as three separate graphs (a first graph 202 that corresponds to an action of running, a second graph 204 that corresponds to an action of walking, and a third graph 206 that corresponds to an action of sliding). Further, it can be noted that the graphs 202 , 204 , and 206 share postures. Accordingly, the three graphs may be represented as a single graph that can be used to determine multiple actions in video data, wherein the determination of an action can be based at least in part upon postures that are shared between actions that are determinable by way of the graph 200 .
- each action can be encoded in one or multiple paths between postures.
- the three actions of running, walking, and sliding share nine postures (e.g., states in a state model). It can be discerned that one of the actions may undergo a subset of all postures corresponding to the action. For instance, a human that undertakes the action of running may go through postures S 1 , S 4 , S 3 and S 6 (but not S 5 and S 8 , which are also postures that correspond to running). Similarly, a human that undertakes the action of walking may go through postures S 6 , S 4 , S 0 , S 7 and S 5 .
- a human that undertakes the action of sliding may go through postures S 6 , S 2 , S 4 , S 7 , and S 8 . It can thus be discerned that the three example actions of running, walking, and sliding that can be determined through use of the action graph 200 can share postures, and each action can have numerous paths in the action graph.
- action paths in the graph 200 may be cyclic, and therefore there may be no specific beginning and ending postures for the action from the recognition point of view.
- Links between postures represented in the action graph 200 can have corresponding probabilities that, for a particular action, an animate object will transition from one posture to another. Thus, for example, there may be a particular probability that a human will transition from posture S 1 to posture S 4 when the human is running. Therefore, when the determiner component 104 ( FIG. 1 ) detects a sequence of postures, the determiner component 104 can analyze the action graph 200 and determine a most-likely action based upon the detected postures and the probabilities and postures of the action graph 200 .
- a most likely action that can generate the observation of X can be formatted as:
- $$\psi^{*} = \arg\max_{\psi \in \Psi,\, S \subset \Omega} p(X, S, \psi) = \arg\max_{\psi \in \Psi,\, S \subset \Omega} p(\psi)\, p(S \mid \psi)\, p(X \mid S, \psi) = \arg\max_{\psi \in \Psi,\, S \subset \Omega} p(\psi)\, p(s_1, \ldots, s_n \mid \psi)\, p(x_1, \ldots, x_n \mid s_1, \ldots, s_n, \psi) \quad (1)$$
- $p(\psi)$ is a prior probability of action $\psi$
- $p(S \mid \psi)$ is a probability of S given action $\psi$
- $p(X \mid S, \psi)$ is a probability of X given S and $\psi$.
- equation (1) can be written as:
- $$\psi^{*} = \arg\max_{\psi \in \Psi,\, S \subset \Omega} p(\psi)\, p(s_1, s_2, \ldots, s_n \mid \psi) \prod_{t=1}^{n} p(x_t \mid s_t) \quad (2)$$
- $p(x_t \mid s_t)$ is a probability for $x_t$ to be generated from salient posture (e.g., state) $s_t$.
- the first term of equation (2) can be a Markov Model with known states, a Visible Markov Model, or other suitable model.
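- equation (2) can be represented as a set of weighted directed graphs, G, that can be built upon the set of postures
- $A_k = \{p(\omega_j \mid \omega_i, \psi_k)\}_{i,j=1}^{M}$ can be a transitional probability matrix of the kth action, and $A = \{p(\omega_j \mid \omega_i)\}_{i,j=1}^{M}$ can be a global transitional probability matrix of all actions that can be determined through use of the action graph 200 .
- G can be an action graph (such as the action graph 200 ).
- a system that follows equation (2) can be described by a quadruplet $\Gamma = (\Omega, \Lambda, G, \Psi)$, where
- $\Omega = \{\omega_1, \omega_2, \ldots, \omega_M\}$
- $\Lambda = \{p(x \mid \omega_1), p(x \mid \omega_2), \ldots, p(x \mid \omega_M)\}$
- $G = (\Omega, A, A_1, A_2, \ldots, A_L)$
- $\Psi = (\psi_1, \psi_2, \ldots, \psi_L)$.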
- the determiner component 104 is illustrated as comprising several components. It is to be understood, however, that the determiner component 104 may include more or fewer components, that functionality described as being undertaken by components may be combined or split into multiple components, and that some components may reside outside the determiner component 104 (e.g., as a separate function).
- the determiner component 104 includes a posture recognizer component 302 that can recognize a plurality of salient postures of an animate object in received video data.
- the posture recognizer component 302 can receive video data that includes a silhouette of the animate object and can extract features from the silhouette that are indicative of a particular salient posture.
- the determiner component 104 can normalize a silhouette and obtain resampled points of a resulting contour (e.g., a shape descriptor).
- a center of gravity may be located in the contour so as to facilitate detecting motion in the silhouette (e.g., a motion vector).
- the posture recognizer component 302 can compare the shape descriptor and motion vector with learned postures and determine the posture of the animate object. For instance, the recognizer component 302 can determine the posture with a particular probability, wherein if the probability is above a threshold it can be determined that the animate object is at a particular posture. If the highest probability is below a threshold, it may be determined that the posture is not a learned posture.
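The thresholding step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `recognize_posture`, the dictionary of per-posture probabilities, and the choice to accept a probability exactly equal to the threshold are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def recognize_posture(
    probabilities: Dict[str, float], threshold: float
) -> Tuple[Optional[str], float]:
    """Pick the learned posture with the highest probability.

    `probabilities` maps a learned posture label to the probability that the
    current frame shows that posture (hypothetical input format).
    Returns (label, probability), or (None, probability) when even the best
    candidate falls below the threshold, i.e. the posture is not a learned one.
    """
    if not probabilities:
        return None, 0.0
    best = max(probabilities, key=probabilities.get)
    score = probabilities[best]
    if score < threshold:
        return None, score
    return best, score
```

For instance, with candidate probabilities of 0.3 and 0.2 and a threshold of 0.25, the first posture is reported; raising the threshold to 0.5 reports no learned posture.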
- the determiner component 104 may also include a sequence determiner component 304 that determines a sequence of observed postures.
- the sequence determiner component 304 can receive multiple postures determined by the posture recognizer component 302 and place a subset of the postures in a sequence (in accordance with time).
- the sequence may relate to transition from one recognized posture to another.
- the animate object may be a human, and the human may be undertaking the action of walking.
- the posture recognizer component 302 can recognize numerous postures of the human while the human is walking, and the sequence determiner component 304 can receive such postures.
- the human may walk at a slower pace than most other humans, and therefore some recognized postures may be redundant.
- the sequence determiner component 304 can take into consideration variations such as the above when placing postures in a sequence.
- the determiner component 104 may also include a path determiner component 306 that can determine a most likely path in the action graph 108 ( FIG. 1 ) that corresponds to the determined sequence.
- the path determiner component 306 can then locate a most probable path in an action graph G that generates X.
- the posture recognizer component 302 can determine postures with a particular probability.
- the posture recognizer component 302 can determine that, with thirty percent certainty, the posture corresponds to a first learned posture, and that, with twenty percent certainty, the posture corresponds to a second learned posture.
- the action graph can include transitional probabilities between certain postures. Given such probabilities, the path determiner component 306 can locate a most likely path in the action graph that generates X.
- the determiner component 104 may also include a probability determiner component 308 that can determine a likelihood of each action $\psi_i$ given X, where $\psi_i \in \Psi$. Further, the determiner component 104 can include a selector component 310 that selects an action that has a highest likelihood as the action that corresponds to the received video data (the sequence X). In an example, the selector component 310 may only select an action if the probability determined by the probability determiner component 308 is above a threshold.
- the probability determiner component 308 can perform Action Specific Viterbi Decoding (ASVD) in the action graph and can compute the likelihood for an action as follows:
- $L(\psi_i)$ is the likelihood of X belonging to action $\psi_i$.
- the selector component 310 can select $\psi_k$ as the action corresponding to X if the following condition is met:
- $TH_l$ is a threshold that can be manually set or can be learned.
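Action-specific Viterbi decoding followed by thresholded selection might look like the sketch below. Several points are assumptions because the corresponding equations do not survive in this text: the likelihood is taken to be the objective of equation (2) with first-order Markov transitions, the initial posture distribution is taken to be uniform, and the selection condition is taken to compare the best action's likelihood, normalized by the sum over all actions, against the threshold. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def action_log_likelihood(emissions: Matrix, transitions: Matrix, prior: float) -> float:
    """Log of max over posture paths of p(action) * prod p(s_t|s_{t-1}) * prod p(x_t|s_t).

    emissions[t][s]   = p(x_t | posture s)
    transitions[i][j] = p(posture j | posture i, action)
    Assumption: the first posture is drawn from a uniform distribution.
    """
    n = len(transitions)
    best = [_log(1.0 / n) + _log(emissions[0][s]) for s in range(n)]
    for t in range(1, len(emissions)):
        # Standard Viterbi recursion in log space (scores only, no backtrace).
        best = [
            max(best[i] + _log(transitions[i][j]) for i in range(n))
            + _log(emissions[t][j])
            for j in range(n)
        ]
    return _log(prior) + max(best)


def select_action(
    emissions: Matrix,
    action_transitions: Dict[str, Matrix],
    priors: Dict[str, float],
    threshold: float,
) -> Optional[str]:
    """Return the most likely action, or None if it fails the threshold test."""
    logs = {
        name: action_log_likelihood(emissions, matrix, priors[name])
        for name, matrix in action_transitions.items()
    }
    best = max(logs, key=logs.get)
    if logs[best] == float("-inf"):
        return None
    # Normalized likelihood of the best action, computed stably in log space.
    total = sum(math.exp(value - logs[best]) for value in logs.values())
    return best if 1.0 / total > threshold else None
```

Working in log space avoids numerical underflow on long frame sequences, which is why the products of equation (2) appear here as sums.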
- the probability determiner component 308 can search the action graph for a Viterbi path with respect to the global transitional probability (described above) and determine likelihoods for each action supported in the action graph. This can be referred to as Global Viterbi Decoding (GVD).
- the probability determiner component 308 can determine an action that generates s*, for example, through use of a unigram or bi-gram model, such as the following:
- the probability determiner component 308 can use equation (9), equation (10), or other suitable algorithm to determine the likelihood of an action to generate the path s*. As noted above, the selector component 310 may select a most likely action after the probability determiner component 308 has determined likelihoods that the sequence X corresponds to one or more actions.
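Global Viterbi decoding with uni-gram or bi-gram scoring can be illustrated as below. Equations (9) and (10) are not reproduced in this text, so the scoring functions are an assumed reading: the uni-gram score multiplies the action prior by per-posture probabilities along the path s*, and the bi-gram score multiplies the prior by action-specific transition probabilities along s*. The uniform initial posture distribution and all names are likewise assumptions.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def global_viterbi_path(emissions: Matrix, global_transitions: Matrix) -> List[int]:
    """Most likely posture path s* under the global transition matrix A."""
    n = len(global_transitions)
    score = [_log(1.0 / n) + _log(emissions[0][s]) for s in range(n)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        # For each posture j, remember the best predecessor posture.
        prev = [
            max(range(n), key=lambda i, j=j: score[i] + _log(global_transitions[i][j]))
            for j in range(n)
        ]
        score = [
            score[prev[j]] + _log(global_transitions[prev[j]][j]) + _log(emissions[t][j])
            for j in range(n)
        ]
        backpointers.append(prev)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for prev in reversed(backpointers):
        state = prev[state]
        path.append(state)
    path.reverse()
    return path


def unigram_log_score(path: Sequence[int], posture_probs: Sequence[float], prior: float) -> float:
    """log p(action) + sum over t of log p(s*_t | action)."""
    return _log(prior) + sum(_log(posture_probs[s]) for s in path)


def bigram_log_score(path: Sequence[int], transitions: Matrix, prior: float) -> float:
    """log p(action) + sum over t of log p(s*_t | s*_{t-1}, action)."""
    return _log(prior) + sum(_log(transitions[a][b]) for a, b in zip(path, path[1:]))
```

The path is decoded once with the global matrix and then rescored per action, which is cheaper than running one Viterbi pass per action as in ASVD.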
- the determiner component 104 can include components that can decode an action using any of a variety of algorithms, including ASVD, Unigram with Global Viterbi Decoding (UGVD), Bi-gram with Global Viterbi Decoding (BGVD), Uni-gram with Maximum Likelihood Decoding (UMLD), and/or Bi-gram with Maximum Likelihood Decoding (BMLD).
- the system 400 includes the receiver component 102 that receives video data.
- the receiver component 102 may include a modifier component 402 that can modify the video to place the video in a format suitable for processing by the determiner component 104 .
- the video data may include video of a first resolution, while the determiner component 104 is configured to process video data of a second resolution.
- the modifier component 402 can alter the resolution from the first resolution to the second resolution.
- the determiner component 104 may be configured to process silhouettes of animate objects (e.g., humans), and the received video data may be full-color video.
- the modifier component 402 can extract silhouettes from the video data, sample the silhouettes to create a contour, and orient the silhouette at a desired orientation.
- Other modifications are also contemplated and intended to fall within the scope of the hereto-appended claims.
- the determiner component 104 can receive the video data in a format suitable for processing and, as described above, can determine an action being undertaken by the animate object in the video data. More particularly, the determiner component 104 can access the action graph 108 and can decode an action corresponding to at least a portion of the video data through analysis of the action graph.
- the system 400 may also include an annotater component 404 that can annotate portions of the video data with information pertaining to an action that is determined to correspond to the portions of the video data.
- an annotater component 404 can annotate video provided to a security officer to highlight that an individual in the video is acting in a suspicious manner.
- Annotation undertaken by the annotater component 404 may include audio annotation (e.g., an audio alarm), annotating the video with text or graphics, etc.
- the system 500 includes a data store 502 that includes a plurality of posture samples 504 .
- the posture samples 504 can, for instance, be derived from kinematics and kinetics of animate object motion and/or automatically learned given sufficient training data. For instance, if silhouettes are used as training data, silhouettes may be clustered into M clusters.
- the system 500 additionally includes a learner component 506 that can receive the posture samples 504 and, based at least in part upon the posture samples (which take into consideration temporal information), can learn a system 508 (e.g., the system $\Gamma$) that includes a learned action graph 510 .
- the system 508 may be used to determine motion of an animate object in a received video.
- a posture can represent a set of similar poses of an animate object, such as a human.
- the learner component 506 can take into consideration the temporal nature of animate object motion (such as human motion), and the similarity between poses can be measured, wherein such measurement may take into account segment and/or joint shape as well as motion.
- the system 600 includes a silhouette generator component 602 that receives video data, wherein the video data includes images of at least one animate object.
- the silhouette generator component 602 can receive the video data and automatically generate silhouette images of the at least one animate object in the video data.
- the system 600 can additionally include a normalizer component 604 that can perform a scale normalization on received silhouettes. Such scale normalization undertaken by the normalizer component 604 can account for changes in body size, for instance.
- a resampler component 606 may also be included in the system 600 .
- the resampler component 606 can resample the normalized silhouette to create multiple points along a silhouette contour.
- a point selector component 608 can then select a relatively small number of points along the contour to create a shape descriptor 610 (e.g., a set of points that describes the shape of the contour).
- the point selector component 608 may select points based at least in part upon noise and computational efficiency.
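The normalization and resampling steps above can be sketched as follows. The text does not specify how scale is normalized or how points are chosen, so this example assumes normalization to unit height about the centroid of the contour points and uniform resampling by arc length along a closed contour; both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_scale(contour: Sequence[Point]) -> List[Point]:
    """Center the contour on its point centroid and scale it to unit height."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    height = max(ys) - min(ys)
    if height == 0.0:
        raise ValueError("contour has zero height")
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    return [((x - cx) / height, (y - cy) / height) for x, y in contour]


def resample_closed_contour(contour: Sequence[Point], count: int) -> List[Point]:
    """Return `count` points evenly spaced by arc length along a closed polygon."""
    n = len(contour)
    lengths = [math.dist(contour[i], contour[(i + 1) % n]) for i in range(n)]
    perimeter = sum(lengths)
    if perimeter == 0.0 or count <= 0:
        raise ValueError("degenerate contour or non-positive point count")
    step = perimeter / count
    points: List[Point] = []
    segment = 0          # index of the polygon edge currently being walked
    segment_start = 0.0  # arc length at the start of that edge
    for k in range(count):
        target = k * step
        while segment < n - 1 and segment_start + lengths[segment] <= target:
            segment_start += lengths[segment]
            segment += 1
        a = contour[segment]
        b = contour[(segment + 1) % n]
        edge = lengths[segment]
        f = (target - segment_start) / edge if edge > 0.0 else 0.0
        points.append((a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1])))
    return points
```

A small point count keeps later distance computations cheap, which matches the stated trade-off between noise and computational efficiency.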
- the system 600 can also include an orientation estimator component 612 that can detect a change in orientation of the animate object and local motion of a gravity center of the animate object.
- an orientation estimator component 612 can estimate the change in motion of the human body by fitting an ellipse into the resampled silhouette shape. The estimated change in motion can be used by the learner component 506 ( FIG. 5 ) to learn the system 508 and the action graph 510 .
- the learner component 506 may include a shape dissimilarity determiner component 702 that determines dissimilarity between different shape descriptors.
- the contour of a silhouette can be normalized and resampled to a relatively small number of points.
- $f_{sp} = \{x_1, x_2, \ldots, x_b\}$
- Dissimilarity of the two shapes can be defined as:
- $$d_{sp} = \frac{1}{1 + e^{-a\,(d_h(f_{sp},\, f'_{sp}) - c)}} \quad (12)$$
- $d_h(X, Y)$ is a Hausdorff distance between X and Y; a and c are two constants.
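Equation (12) passes the Hausdorff distance between two shape descriptors through a sigmoid, so the result lies between 0 and 1. A minimal sketch follows, assuming the symmetric Hausdorff distance over 2D points with Euclidean point distances; the function names are illustrative and the brute-force distance is chosen for clarity rather than speed.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def hausdorff_distance(x: Sequence[Point], y: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""

    def directed(p: Sequence[Point], q: Sequence[Point]) -> float:
        # Largest distance from a point of p to its nearest neighbour in q.
        return max(min(math.dist(u, v) for v in q) for u in p)

    return max(directed(x, y), directed(y, x))


def shape_dissimilarity(
    f_sp: Sequence[Point], f_sp_prime: Sequence[Point], a: float, c: float
) -> float:
    """d_sp = 1 / (1 + exp(-a * (d_h(f_sp, f_sp') - c))), as in equation (12)."""
    d_h = hausdorff_distance(f_sp, f_sp_prime)
    return 1.0 / (1.0 + math.exp(-a * (d_h - c)))
```

The constant c sets the distance at which the dissimilarity equals 0.5, and a controls how sharply it rises around that point.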
- the learner component 506 can additionally include a motion dissimilarity determiner component 704 that determines motion dissimilarity between motion feature vectors of silhouettes.
- the dissimilarity of x and x′ in terms of motion can be defined as follows:
- the learner component 506 can also include a clusterer component 706 that can cluster silhouettes based at least in part upon the aforementioned dissimilarities. More specifically, dissimilarity of two silhouettes can be defined as a product of motion and shape dissimilarity:
- the clusterer component 706 can use any suitable clustering algorithm to cluster the J silhouettes into M clusters.
- the clusterer component 706 may use Normalized Cuts (NCuts), Dominant Sets (DS), Non-Euclidean Relational Fuzzy (NERF) C-Means, and/or other suitable clustering algorithm in connection with clustering silhouettes.
- the learner component 506 may further include an estimator component 708 that estimates salient postures that can be used in an action graph. For instance, after clustering, the estimator component 708 can fit a Gaussian Mixture Model (GMM) using a suitable expectation and maximization (EM) algorithm to the shape component of a cluster to represent spatial distribution of contours of silhouettes belonging to a particular posture cluster. The estimator component 708 can fit another Gaussian to the motion component of the cluster to obtain a compact representation of a model of a posture. This can be represented as follows:
- $$p_{sp}(y_{sp} \mid s) = \sum_{k=1}^{C} \pi_{k,s}\, N(y_{sp}; \mu_{k,s}, \Sigma_{k,s})$$
- $$p_{mt}(y_{mt} \mid s) = N(y_{mt}; \mu_{mt,s}, \Sigma_{mt,s}) \quad (16)$$
- $p_{sp}(y_{sp} \mid s)$ is a GMM with C components for shape and $p_{mt}(y_{mt} \mid s)$ is a Gaussian for motion
- s represents salient postures/states (or clusters of silhouettes)
- $N(\cdot)$ is a Gaussian function
- $y_{mt}$ represents the motion feature vector
- $\mu_{mt,s}$ is a mean motion vector for salient posture s
- $\Sigma_{mt,s}$ is a $3 \times 3$ matrix denoting covariance of the motion features
- $y_{sp}$ represents 2D coordinates of a point on the contours of silhouettes
- $\mu_{k,s}$ is the center of the kth Gaussian for posture s
- $\Sigma_{k,s}$ is a $2 \times 2$ covariance matrix, and $\pi_{k,s}$ is the mixture weight of the kth Gaussian
- an estimated model for a posture that can be used in an action graph can be defined as:
- $$p(x \mid s) = p_{mt}(y_{mt} \mid s) \prod_{i=1}^{b} p_{sp}(y_{sp}^{i} \mid s)$$
- x is a silhouette
- $y_{mt}$ and $y_{sp}^{i}$ represent respectively the motion feature and the ith point on the resampled contour of x.
- the learner component 506 can also include a linker component 710 that can link learned postures (posture models) with transitional probabilities.
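- the action-specific and global transitional probabilities can be defined as follows:
- $$p(\omega_j \mid \omega_i) = \frac{\sum_{t=1}^{J} p(\omega_j \mid x_t)\, p(\omega_i \mid x_{t-1})}{\sum_{t=1}^{J} p(\omega_i \mid x_t)}$$
- $$p(\omega_j \mid \omega_i, \psi_l) = \frac{\sum_{t=1}^{J_l} p(\omega_j \mid x_t, \psi_l)\, p(\omega_i \mid x_{t-1}, \psi_l)}{\sum_{t=1}^{J_l} p(\omega_i \mid x_t, \psi_l)}$$
- J is a total number of training silhouettes for all actions and $J_l$ is a number of silhouettes contained in training samples for the action $\psi_l$.
- $p(\omega_i)$ and $p(\omega_i \mid \psi_l)$ can be obtained through marginalization of $p(\omega_i \mid \omega_j)$ and $p(\omega_i \mid \omega_j, \psi_l)$, respectively.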
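Estimating the action-specific and global transitional probabilities can be illustrated with a simplified count-based version. The definitions above weight each frame by posture posteriors; the sketch below instead assumes every training frame has already been assigned to exactly one posture cluster (a hard assignment), so the estimate reduces to normalized transition counts. The function names and the input layout are assumptions for the example.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def estimate_transitions(sequences: Sequence[Sequence[int]], n_postures: int) -> Matrix:
    """Row-normalized counts of posture i followed by posture j."""
    counts = [[0.0] * n_postures for _ in range(n_postures)]
    for sequence in sequences:
        for i, j in zip(sequence, sequence[1:]):
            counts[i][j] += 1.0
    matrix: Matrix = []
    for row in counts:
        total = sum(row)
        # A posture never seen as a predecessor keeps an all-zero row.
        matrix.append([c / total for c in row] if total > 0.0 else [0.0] * n_postures)
    return matrix


def estimate_action_graph(
    labelled: Dict[str, Sequence[Sequence[int]]], n_postures: int
) -> Tuple[Matrix, Dict[str, Matrix]]:
    """Return (global matrix A, {action: action-specific matrix A_l})."""
    action_specific = {
        action: estimate_transitions(sequences, n_postures)
        for action, sequences in labelled.items()
    }
    pooled = [seq for sequences in labelled.values() for seq in sequences]
    return estimate_transitions(pooled, n_postures), action_specific
```

Pooling the sequences of every action yields the global matrix, while each per-action matrix uses only that action's sequences, mirroring the split between J and J_l.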
- the learner component 506 can extend an existing system to recognize a new action.
- both the action graph and posture models may desirably be updated.
- the learner component 506 can limit the addition of the new action to insertion of new postures that describe $\psi_{L+1}$ into the action graph, modification of A, and insertion of $A_{L+1}$.
- In a first case, $\Omega$ includes all postures that are required to describe the action $\psi_{L+1}$. In this case, postures can be shared and new paths can be inserted into the action graph by updating A and $A_{L+1}$.
- In a second case, $\Omega$ does not include all postures that are required to describe the action $\psi_{L+1}$. In this case, new postures can be created for $\psi_{L+1}$ and the action graph can be expanded by updating A and $A_{L+1}$.
- An example approach is to locate salient postures for a new action first and thereafter to decide whether such postures have already been learned by the learner component 506 by comparing the located postures to those residing in the existing action graph.
- the clusterer component 706 can operate as described above to generate clusters of silhouettes (or other training samples).
- Similarity can be determined in any suitable manner.
- since postures can be modeled by a single Gaussian for motion and a GMM for shape, similarity between two postures can be measured by Kullback-Leibler (KL) divergence.
- KL divergence for motion between postures s and s′ can be defined as follows:
- $$KL_{mt} = KL(p_{mt} \,\|\, p'_{mt}) = D\big(N(y_{mt}; \mu_{mt}, \Sigma_{mt}) \,\|\, N(y_{mt}; \mu'_{mt}, \Sigma'_{mt})\big) = \frac{1}{2}\left[\log\frac{\det(\Sigma'_{mt})}{\det(\Sigma_{mt})} + \mathrm{tr}\big(\Sigma'^{-1}_{mt}\Sigma_{mt}\big) - d + (\mu_{mt} - \mu'_{mt})^{T}\,\Sigma'^{-1}_{mt}\,(\mu_{mt} - \mu'_{mt})\right],$$
- $KL(p \,\|\, p')$ represents a KL-divergence between distribution p and p′
- $D(N \,\|\, N')$ is the KL-divergence between two Gaussians, N and N′.
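The Gaussian KL-divergence above takes a particularly simple form when both covariance matrices are diagonal, because the determinant, trace, and quadratic terms all reduce to per-dimension sums. The sketch below makes that simplifying assumption (the text itself uses full covariance matrices); the function name and argument layout are hypothetical.

```python
import math
from typing import Sequence


def kl_diag_gaussians(
    mu: Sequence[float],
    var: Sequence[float],
    mu_prime: Sequence[float],
    var_prime: Sequence[float],
) -> float:
    """KL( N(mu, diag(var)) || N(mu_prime, diag(var_prime)) ).

    Assumes strictly positive variances and equal dimensionality d.
    """
    d = len(mu)
    # log det(Sigma') / det(Sigma) for diagonal matrices.
    log_det_ratio = sum(math.log(vp / v) for v, vp in zip(var, var_prime))
    # tr(Sigma'^{-1} Sigma)
    trace = sum(v / vp for v, vp in zip(var, var_prime))
    # (mu - mu')^T Sigma'^{-1} (mu - mu')
    quadratic = sum((m - mp) ** 2 / vp for m, mp, vp in zip(mu, mu_prime, var_prime))
    return 0.5 * (log_det_ratio + trace - d + quadratic)
```

Note that the divergence is not symmetric: swapping the two postures generally gives a different value.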
- KL-divergence for shape between postures s and s′ can be defined as follows:
- $$KL_{sp} = KL(p_{sp} \,\|\, p'_{sp}) \approx \sum_{a} \pi_a \log \frac{\sum_{a'} \pi_{a'}\, e^{-D(N_a \| N_{a'})}}{\sum_{b} \pi'_b\, e^{-D(N_a \| N'_b)}}$$
- s′ may be deemed similar to s if the following condition is met:
- $\overline{KL}_{mt}$, $\sigma_{KL_{mt}}$, $\overline{KL}_{sp}$, and $\sigma_{KL_{sp}}$ are the means and standard deviations of the KL-divergences of all pairs of postures in the system $\Gamma$ before updating, and $\alpha_{sp} \in (0,1]$ and $\alpha_{mt} \in (0,1]$ are constants.
- the learner component 506 can further include a union component 804 that merges $\Omega$ and $\Omega'$. More particularly, the union component can create $\Omega_{new}$ as the union of $\Omega$ and $\Omega'$, such that $\Lambda_{new}$ is the posture models (learned postures) of $\Omega_{new}$.
- the learner component 506 additionally includes a system estimator component 806 that estimates the transitional probabilities $A_{L+1}$ and $A'$ from the K training samples for $\psi_{L+1}$ based at least in part upon $\Lambda_{new}$.
- the system estimator component 806 can update A as follows:
- $\beta \in (0, 1)$ is a weighting factor controlling the contribution of new action samples to the global transition. Since the number of training samples K may be relatively small compared to a number of samples used to train A, $A'$ is often much less reliable than A. Accordingly, the system estimator component 806 can limit the contribution of $A'$ to final global transitional probabilities by the factor $\beta$, which can be selected to reflect the ratio of the size of the new training samples to the size of the samples used to estimate A.
- training samples for a new action may only capture a relatively small proportion of possible posture transitions that correspond to the new action. Accordingly, A L+1 may not be a reliable estimation of a true transition.
- the system estimator component 806 may employ a smoothing technique to facilitate compensating for a relatively small number of samples. In an example, the following linear model may be used to smooth A L+1 :
- Equation (22) can be interpreted as an interpolation of bi-gram and uni-gram transitional probabilities.
- the transitional probability can be set to be the uni-gram probability of the second posture of the bi-gram. If the system estimator component 806 provides too much weight to the uni-gram probability, faulty estimation may result if $s_i$ is very frequent. Therefore, the system estimator component 806 may decrease the value of the weight exponentially with a number of bi-gram observations.
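The smoothing behaviour described above can be sketched as a linear interpolation whose uni-gram weight decays exponentially with the number of bi-gram observations. The exact form of equation (22) is not reproduced in this text, so the weight `exp(-bigram_count)` and the count-based bi-gram estimate are assumptions chosen only to match the stated properties; the function and parameter names are hypothetical.

```python
import math


def smoothed_transition(
    bigram_count: float, first_posture_count: float, unigram_prob: float
) -> float:
    """Interpolate bi-gram and uni-gram estimates of p(s_j | s_i, new action).

    bigram_count        : number of observed s_i -> s_j transitions
    first_posture_count : number of observed transitions leaving s_i
    unigram_prob        : p(s_j | new action), the uni-gram probability of
                          the second posture of the bi-gram
    """
    # Assumed weight: equals 1 for an unseen bi-gram, decays toward 0 with evidence.
    weight = math.exp(-bigram_count)
    bigram_prob = bigram_count / first_posture_count if first_posture_count > 0 else 0.0
    return (1.0 - weight) * bigram_prob + weight * unigram_prob
```

With this form an unseen bi-gram falls back entirely to the uni-gram probability, while a frequently observed bi-gram is governed almost entirely by its own relative frequency. The interpolated rows are not guaranteed to sum to one, so a renormalization step may be needed afterwards.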
- the learner component 506 can also include the linker component 710 , which can link learned postures (posture models) with transitional probabilities as described above.
- With reference now to FIGS. 9-13, various example methodologies are illustrated and described. While the methodologies are described as being a series of acts that are performed in a sequence, it is to be understood that the methodologies are not limited by the order of the sequence. For instance, some acts may occur in a different order than what is described herein. In addition, an act may occur concurrently with another act. Furthermore, in some instances, not all acts may be required to implement a methodology described herein.
- the acts described herein may be computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media.
- the computer-executable instructions may include a routine, a sub-routine, programs, a thread of execution, and/or the like.
- results of acts of the methodologies may be stored in a computer-readable medium, displayed on a display device, and/or the like.
- the methodology 900 starts at 902 , and at 904 video data is received, wherein the video data can include a plurality of video frames that comprise images of an animate object.
- a data store is accessed that comprises data representable by an action graph.
- the action graph can include a plurality of nodes that are representative of a plurality of postures of animate objects. These nodes can be referred to as posture models.
- at least one node of the action graph can correspond to multiple actions that are determinable through use of the action graph. For instance, a posture may be common between actions of walking and running, and a node can represent such posture with respect to both actions in the action graph.
- an action undertaken by the animate object in the plurality of video frames is determined based at least in part upon the action graph. For instance, determining the action may include extracting features from video data received at 904 . Further, determining the action may comprise determining at least one posture of the animate object in the video data based at least in part upon the extracted features. The methodology 900 then completes at 910 .
- FIG. 10 an example methodology 1000 for determining an action undertaken by a human being captured in video data is illustrated.
- the methodology 1000 starts at 1002 , and at 1004 a sequence of silhouettes is received. For instance, video data can be sampled and silhouettes of human beings can be generated.
- postures corresponding to the silhouettes can be recognized. For instance, contours of the human being in the video frame can be generated, and such contours can be compared with postures in the action graph.
- a most-likely path in the action graph that corresponds to the recognized postures is determined.
- the action graph can have corresponding transitional probabilities (global and local), and a most likely path can be determined based at least in part upon the recognized postures and the transitional probabilities.
- a most-likely action that corresponds to the determined path can be determined. For instance, a particular probability that a sequence of silhouettes in the video frame corresponds to a particular action can be determined, and if the probability is above a threshold the most-likely action can be output as a determined action.
- the methodology 1000 then completes at 1012 .
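The acts of methodology 1000 (recognized postures, most-likely path, most-likely action, threshold test) can be sketched as below. This is an illustrative sketch under several assumptions: per-frame posture likelihoods are supplied directly as `emissions[t][s]`, the initial posture distribution is uniform, the path is found with the global transitional probabilities, each action is then scored with its own transition matrix, and the threshold is applied to the best action's share of the total likelihood. All function and parameter names are hypothetical.

```python
import math

def _log(p):
    # Zero-probability events map to -inf instead of raising.
    return math.log(p) if p > 0 else float("-inf")

def viterbi_path(emissions, trans):
    """Most likely posture path; emissions[t][s] ~ p(x_t | s), trans[i][j] ~ p(j | i)."""
    n_states = len(trans)
    score = [_log(e) for e in emissions[0]]  # uniform start assumed
    back = []
    for frame in emissions[1:]:
        prev, score, ptr = score, [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + _log(trans[i][j]))
            ptr.append(best_i)
            score.append(prev[best_i] + _log(trans[best_i][j]) + _log(frame[j]))
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last frame
        state = ptr[state]
        path.append(state)
    return path[::-1]

def score_actions(path, action_trans, priors):
    """Log-likelihood of the decoded path under each action's transitions."""
    scores = {}
    for action, trans in action_trans.items():
        total = _log(priors[action])
        for i, j in zip(path, path[1:]):
            total += _log(trans[i][j])
        scores[action] = total
    return scores

def recognize(emissions, global_trans, action_trans, priors, threshold):
    path = viterbi_path(emissions, global_trans)
    scores = score_actions(path, action_trans, priors)
    best = max(scores, key=scores.get)
    # Share of total likelihood held by the best action (assumed threshold form).
    confidence = 1.0 / sum(math.exp(v - scores[best]) for v in scores.values())
    return (best if confidence > threshold else None), path, confidence
```

Working in log space keeps long sequences from underflowing; returning `None` corresponds to the case in which no action clears the threshold.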
- the methodology 1100 starts at 1102 , and at 1104 training data is received.
- the training data may be a plurality of sequences of video data that includes an animate object moving in accordance with a particular action.
- the plurality of sequences of video data may be silhouettes, although other training data is also contemplated.
- shape and motion dissimilarities are determined from images of animate objects in the training data.
- postures are clustered based at least in part upon the determined shape and motion dissimilarities.
- transitional probabilities are estimated between clusters. Estimation of transitional probabilities has been described in detail above.
- clusters e.g., posture models
- the methodology 1100 completes at 1114 .
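Once postures have been clustered, the estimation of transitional probabilities between clusters in methodology 1100 can be illustrated with simple counting. The sketch below assumes each training sequence has already been mapped to cluster (posture) indices and uses plain relative frequencies; the function name `estimate_transitions` and the input layout are hypothetical, and the estimation procedure described elsewhere in this document may differ.

```python
def estimate_transitions(labeled_sequences, num_postures):
    """Estimate global and per-action transition matrices by counting.

    labeled_sequences: list of (action_name, [posture indices]) pairs.
    Returns (global_matrix, {action_name: matrix}).
    """
    def empty():
        return [[0] * num_postures for _ in range(num_postures)]

    def normalize(counts):
        out = []
        for row in counts:
            s = sum(row)
            # Rows with no observations stay all-zero.
            out.append([c / s for c in row] if s else [0.0] * num_postures)
        return out

    global_counts = empty()
    per_action = {}
    for action, seq in labeled_sequences:
        counts = per_action.setdefault(action, empty())
        for i, j in zip(seq, seq[1:]):
            counts[i][j] += 1         # action-specific transition
            global_counts[i][j] += 1  # pooled over all actions
    return normalize(global_counts), {a: normalize(c) for a, c in per_action.items()}
```

Because posture 0 below is shared by both actions, its row in the global matrix pools evidence from walking and running, while each action-specific matrix keeps only its own transitions.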
- the methodology 1200 starts at 1202 , and at 1204 a video of an animate object undertaking a new action is received, wherein “new action” refers to an action that has not yet been supported in an underlying action graph.
- postures that describe the new action are determined, and at 1208 the determined postures are compared with postures existent in the underlying action graph.
- determined postures that are found to be similar to postures existent in the underlying action graph are removed from the action graph.
- the methodology 1200 then completes at 1212 .
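The comparison of new postures with postures already in the action graph can be sketched as a nearest-neighbor test. In this illustration postures are plain feature vectors and Euclidean distance stands in for the shape and motion dissimilarity; both choices, along with the names `merge_new_postures` and `max_distance`, are assumptions made for the example.

```python
import math

def merge_new_postures(existing, candidates, max_distance):
    """Return (updated posture list, index of each candidate in that list).

    A candidate within max_distance of a known posture reuses that node;
    otherwise it is appended as a new node.
    """
    postures = list(existing)
    mapping = []
    for cand in candidates:
        nearest = min(
            range(len(postures)),
            key=lambda i: math.dist(postures[i], cand),
            default=None,
        )
        if nearest is not None and math.dist(postures[nearest], cand) <= max_distance:
            mapping.append(nearest)            # similar posture: share the node
        else:
            postures.append(cand)              # dissimilar posture: new node
            mapping.append(len(postures) - 1)
    return postures, mapping
```

The returned mapping lets the new action's training sequences be rewritten in terms of graph nodes, so that shared postures contribute to the existing nodes rather than duplicating them.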
- the methodology 1300 starts at 1302 , and at 1304 a plurality of video frames is received.
- the plurality of video frames can include a sequence of silhouettes of a human being.
- a plurality of postures of the human being in the sequence of silhouettes are determined.
- the plurality of postures are compared with postures represented in an action graph.
- the action graph can include multiple postures pertaining to numerous actions.
- a first posture in the action graph may be linked to a second posture in the action graph by a probability that for a first action the human being will transition from the first posture to the second posture.
- at least one posture in the action graph can correspond to more than one action.
- a most likely action that corresponds to the determined plurality of postures can be determined based at least in part upon the comparison.
- the most likely action can be output as a determined action.
- the methodology 1300 completes at 1314 .
- FIG. 14 an example depiction of a silhouette 1400 of a human being is illustrated.
- the silhouette may be received in connection with determining an action undertaken by a human that is represented by the silhouette.
- the silhouette may be used as training data.
- FIG. 15 an example resampled image 1500 is depicted, wherein the resampled image includes multiple points that represent a contour.
- a contour may be used in connection with determining shape dissimilarity between the contour and another contour, for example.
- an example oriented image 1600 is depicted, wherein an ellipse is fit over the contour and a center of gravity is discerned. Such information can be used in connection with determining dissimilarity of motion between contours.
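The two depictions above (a contour resampled to a fixed number of points, and an orientation with a center of gravity) can be sketched in a few lines. This is only an illustration: the contour is assumed to be a closed polygon given as `(x, y)` tuples, resampling is uniform in arc length, and the orientation is taken from second-order moments of the contour points as a stand-in for an explicit ellipse fit. Function names are hypothetical.

```python
import math

def resample_contour(points, b):
    """Resample a closed polygonal contour to b points equally spaced by arc length."""
    closed = points + [points[0]]
    seg = [math.dist(p, q) for p, q in zip(closed, closed[1:])]
    step = sum(seg) / b
    out, k, walked = [], 0, 0.0
    for m in range(b):
        target = m * step
        # Advance to the segment that contains the target arc length.
        while k < len(seg) - 1 and walked + seg[k] < target:
            walked += seg[k]
            k += 1
        t = (target - walked) / seg[k] if seg[k] else 0.0
        (x0, y0), (x1, y1) = closed[k], closed[k + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def centroid_and_orientation(points):
    """Center of gravity and major-axis angle (radians) from second moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), angle
```

Differences in the centroid and angle between consecutive frames give a simple motion cue that could accompany the shape descriptor when comparing postures.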
- FIG. 17 a high-level illustration of an example computing device 1700 that can be used in accordance with the systems and methodologies disclosed herein is illustrated.
- the computing device 1700 may be used in a system that can be used to determine an action undertaken by an animate object in video data and/or used to learn a system that can be used to automatically determine actions in video data.
- the computing device 1700 includes at least one processor 1702 that executes instructions that are stored in a memory 1704 .
- the instructions may be, for instance, instructions for implementing functionality described as being carried out by one or more components discussed above or instructions for implementing one or more of the methods described above.
- the processor 1702 may access the memory by way of a system bus 1706 .
- the memory 1704 may also store images, one or more action graphs, etc.
- the computing device 1700 additionally includes a data store 1708 that is accessible by the processor 1702 by way of the system bus 1706 .
- the data store 1708 may include executable instructions, silhouettes, training data, etc.
- the computing device 1700 also includes an input interface 1710 that allows external devices to communicate with the computing device 1700 .
- the input interface 1710 may be used to receive instructions from an external computer device, receive video data from a video source, etc.
- the computing device 1700 also includes an output interface 1712 that interfaces the computing device 1700 with one or more external devices.
- the computing device 1700 may transmit data to a personal computer by way of the output interface 1712 .
- the computing device 1700 may be a distributed system. Thus, for instance, several devices may be in communication by way of a network connection and may collectively perform tasks described as being performed by the computing device 1700 .
- a system or component may be a process, a process executing on a processor, or a processor. Additionally, a component or system may be localized on a single device or distributed across several devices.
Abstract
Description
- Popularity of video surveillance systems has increased over the last several years. Such popularity increase can be attributed, at least in part, to advancements in video technology, reduction in price of video cameras, as well as increase in video storage capacity. For instance, many consumer-level video cameras can generate relatively high resolution video data, and such cameras often are equipped with hard drives that can be used to retain several hours of video data. Furthermore, even if a video camera is not equipped with a hard drive, the video camera can be placed in communication with a data store (e.g., by way of a firewire cable) and video data can be directed to the data store for short-term or long-term storage. Thus, capturing video and storing video are relatively inexpensive.
- Compensating a human being, however, to monitor video data captured by a surveillance camera remains costly. For instance, many retail stores have video surveillance cameras that transmit video to a control room that includes multiple display screens, such that video from different surveillance cameras are provided to different display screens. One or more human beings monitor the display screens in search of suspicious or illegal activity and dispatch a security officer to a particular location if suspicious or illegal activity is observed on one of the display screens. Use of a human, however, is expensive, as a retail store must compensate the human being that is monitoring the display screens. Furthermore, a brief lapse in concentration can result in misappropriation of valuable goods.
- Accordingly, systems have been developed that can be used to analyze video data and automatically determine particular actions that are being undertaken by an individual in the video data. Such systems, however, are generally inefficient to operate and difficult to train.
- The following is a brief summary of subject matter that is described in greater detail herein. This summary is not intended to be limiting as to the scope of the claims.
- Described herein are various technologies pertaining to automatic recognition of an action of an animate object in video data. For instance, the animate object may be a human, an animal, a projected object, and/or other suitable animate object. An action graph is used in connection with automatically recognizing actions. The action graph includes nodes that represent multiple posture models (states) that are representative of different portions of an action (e.g., a particular body position). Further, the action graph includes transitional probabilities that describe a probability that an animate object will transfer between postures for various actions. Pursuant to an example, the action graph can include postures that are shared between multiple actions that can be determined through use of the action graph.
- With respect to determining an action, video data that includes a plurality of video frames can be received. In an example, the video frames may include silhouettes of the animate object. Postures of the animate object are recognized by comparing information derived from the video frames with posture models in the action graph. By analyzing transitional probabilities of the action graph, a most likely path of postures can be ascertained. Once the most likely path is determined, probabilities of the path corresponding to a particular action can be ascertained. If a probability of the path corresponding to an action is above a threshold, the action can be the determined action.
- The action graph can be automatically learned based upon training data. More particularly, training data that includes multiple postures of various actions can be received, and clusters of postures can be generated. For instance, clustering postures can be based at least in part upon a determined amount of shape and motion dissimilarity between postures. Once the clusters are ascertained, transitional probabilities corresponding to the clusters can be learned for multiple actions.
- Other aspects will be appreciated upon reading and understanding the attached figures and description.
- FIG. 1 is a functional block diagram of an example system that facilitates automatically recognizing an action of an animate object in video data.
- FIG. 2 is an example depiction of an action graph.
- FIG. 3 is an example depiction of a component that can be used in connection with recognizing an action of an animate object in video data.
- FIG. 4 is a functional block diagram of an example system that facilitates annotating video based at least in part upon a recognized action of an animate object in video data.
- FIG. 5 is a functional block diagram of an example system that facilitates automatically learning an action graph.
- FIG. 6 is a functional block diagram of an example system that facilitates preparing data for use in connection with learning an action graph.
- FIG. 7 is an example depiction of a component that can be used to learn an action graph.
- FIG. 8 is an example depiction of a component that can be used to learn a new action in an existing action graph.
- FIG. 9 is a flow diagram that illustrates an example methodology for recognizing an action of an animate object in video data.
- FIG. 10 is a flow diagram that illustrates an example methodology for recognizing an action of an animate object in video data.
- FIG. 11 is a flow diagram that illustrates an example methodology for learning an action graph.
- FIG. 12 is a flow diagram that illustrates an example methodology for updating an action graph with a new action.
- FIG. 13 is a flow diagram that illustrates an example methodology for recognizing an action of an animate object in video data.
- FIG. 14 is an example silhouette.
- FIG. 15 is an example shape contour.
- FIG. 16 is an example depiction of a shape contour fitted with an ellipse.
- FIG. 17 is an example computing system.
- Various technologies pertaining to action recognition in general, and automatic human action recognition in particular, will now be described with reference to the drawings, where like reference numerals represent like elements throughout. In addition, several functional block diagrams of example systems are illustrated and described herein for purposes of explanation; however, it is to be understood that functionality that is described as being carried out by certain system components may be performed by multiple components. Similarly, for instance, a component may be configured to perform functionality that is described as being carried out by multiple components.
- With reference now to FIG. 1, an example system 100 that facilitates automatically determining an action of an animate object in received video data is illustrated. In an example, the animate object may be a human, an animal, a moving object (e.g., a thrown ball), or other suitable animate object. The system 100 may be retained in a video camera unit or may be in a separate computing device. The system 100 includes a receiver component 102 that receives video data, wherein the video data includes images of an animate object. For instance, the video data may be a video feed that is received in real-time from a video camera. In another example, the video data may be received from a data storage device. Furthermore, the video data may be received in any suitable video format, may be compressed or uncompressed, sampled or unsampled, etc. In a particular example, the video data may include a silhouette of a human that is undertaking a particular action, such as walking, running, jumping, sliding, etc.
- The system 100 also includes a determiner component 104 that is in communication with the receiver component 102 and receives the video data. The determiner component 104 can automatically determine an action undertaken by the animate object in the received video data. More specifically, in response to receiving the video data, the determiner component 104 can detect one or more postures of the animate object in the video data. A posture may be a particular position of segments of the animate object, a particular position of joints of the animate object, a spatiotemporal position, etc. Further, the determiner component 104 can access a data store 106 that includes an action graph 108, wherein the action graph 108 comprises a plurality of nodes that are representative of multiple possible postures of the animate object. Moreover, at least one node in the action graph 108 can be shared amongst multiple actions. In addition, the action graph 108 can include edges that are representative of probabilities of transition between nodes (postures) for different actions.
- In operation, the receiver component 102 can receive video data of an animate object. For instance, the animate object may be a human in the act of running. The video data can be received by the determiner component 104, and the determiner component 104 can detect multiple postures of the human (captured in the received video) in a sequence. The determiner component 104 can access the action graph 108 and compare the sequence of postures with sequences of postures of actions determinable through use of the action graph. Based at least in part upon the comparison, the determiner component 104 can output a determined action. For instance, the determined action can be output to a data repository and stored therein, output to a display device, output to a printer, and/or the like.
- The system 100 may be used in a variety of applications. For instance, the system 100 may be utilized in a security application, wherein the system 100 can detect suspicious activity of an individual. In a detailed example, the system 100 may be used in a retail establishment in connection with detecting shoplifting. In another example, the system 100 may be used in an airport to detect suspicious activity. In still yet another example, the system 100 may be used to detect actions of animals (e.g., in a zoo to determine whether an animal is beginning to become aggressive). Of course, it is to be understood that other applications are contemplated.
- Now referring to FIG. 2, an example depiction of an action graph 200 is presented. The action graph 200 is shown for purposes of explanation as three separate graphs (a first graph 202 that corresponds to an action of running, a second graph 204 that corresponds to an action of walking, and a third graph 206 that corresponds to an action of sliding). Further, it can be noted that the graphs graph 200.
- In action graphs in general, each action can be encoded in one or multiple paths between postures. In the example graph 200, the three actions of running, walking, and sliding share nine postures (e.g., states in a state model). It can be discerned that an instance of an action may pass through only a subset of the postures corresponding to that action. For instance, a human that undertakes the action of running may go through postures S1, S4, S3 and S6 (but not S5 and S8, which are also postures that correspond to running). Similarly, a human that undertakes the action of walking may go through postures S6, S4, S0, S7 and S5. In another example, a human that undertakes the action of sliding may go through postures S6, S2, S4, S7, and S8. It can thus be discerned that the three example actions of running, walking, and sliding that can be determined through use of the action graph 200 can share postures, and each action can have numerous paths in the action graph. In addition, action paths in the graph 200 may be cyclic, and therefore there may be no specific beginning and ending postures for the action from the recognition point of view.
- Links between postures represented in the action graph 200 can have corresponding probabilities that, for a particular action, an animate object will transition from one posture to another. Thus, for example, there may be a particular probability that a human will transition from posture S1 to posture S4 when the human is running. Therefore, when the determiner component 104 (FIG. 1) detects a sequence of postures, the determiner component 104 can analyze the action graph 200 and determine a most-likely action based upon the detected postures and the probabilities and postures of the action graph 200.
- The following is a mathematical description of action graphs, wherein silhouettes are employed to obtain postures of an animate object in video data, and wherein the action graph is based at least in part upon postures learned from silhouettes. It is to be understood, however, that other forms of animate objects may be used in connection with detecting postures and/or learning postures. For instance, rather than using silhouettes, full-color images in video data may be used to detect postures or learn postures. In another example, grayscale images in video data may be used to detect postures and/or learn postures.
- In an example, X={x1, x2, . . . , xn} can be a sequence of n silhouettes, and Ω={ω1, ω2, . . . , ωM} can be a set of M salient postures that constitute actions that are desirably determined by way of the action graph 200. A corresponding posture sequence derived from X can be denoted as S={s1, s2, . . . , sn}, where st∈Ω, t=1,2, . . . , n. Furthermore, Ψ={ψ1, ψ2, . . . , ψL} can denote a set of L actions, and X can be generated from one of the L actions. A most likely action that can generate the observation of X can be formulated as:
- where p(ψ) is a prior probability of action ψ, p(S|ψ) is a probability of S given action ψ, and p(X|S,ψ) is a probability of X given S and ψ.
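Read together with the three terms just defined, equation (1) plausibly takes the form below. This is a reconstruction inferred from the chain rule of probability and from the later equations, not a verbatim copy; the notation $S \in \Omega^{n}$ for a length-n posture sequence is an assumption.

```latex
\psi^{*}
  = \arg\max_{\psi \in \Psi,\; S \in \Omega^{n}} p(X, S, \psi)
  = \arg\max_{\psi \in \Psi,\; S \in \Omega^{n}}
      p(\psi)\, p(S \mid \psi)\, p(X \mid S, \psi)
  \tag{1}
```

The second equality is simply the factorization $p(X,S,\psi) = p(\psi)\,p(S \mid \psi)\,p(X \mid S,\psi)$.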
- Further, it can be assumed that i) xt is statistically independent of ψ given S; ii) xt statistically depends only on st; and iii) st is independent of the future states and only depends on its previous state st−1. Accordingly, equation (1) can be written as:
-
- where p(xt|st) is a probability for xt to be generated from salient posture (e.g., state) st. Further, it can be assumed that the set of postures can be known from or computed from training data, and the first term of equation (2) can be a Markov Model with known states, a Visible Markov Model, or other suitable model.
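Under assumptions i) through iii), the factors of equation (1) simplify, and a form of equation (2) consistent with the description above and with equation (6) can be sketched as follows. It is an inferred form; in particular, treating $p(s_1 \mid s_0, \psi)$ as the initial posture probability is an assumed convention.

```latex
% Assumptions i) and ii):  p(X | S, psi) = prod_t p(x_t | s_t)
% Assumption iii):         p(S | psi)    = prod_t p(s_t | s_{t-1}, psi)
\psi^{*}
  = \arg\max_{\psi \in \Psi,\; S \in \Omega^{n}}
      p(\psi)\,
      \prod_{t=1}^{n} p(s_{t} \mid s_{t-1}, \psi)\,
      \prod_{t=1}^{n} p(x_{t} \mid s_{t})
  \tag{2}
```

The first product is the Markov chain over postures for action $\psi$, and the second product collects the per-frame posture likelihoods.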
- Thus, equation (2) can be represented as a set of weighted directed graphs, G, that can be built upon the set of postures:

$$G = \{\Omega, A, A_1, A_2, \ldots, A_L\} \tag{3}$$

- where each posture can serve as a node, $A_k = \{p(\omega_j \mid \omega_i, \psi_k)\}_{i,j=1:M}^{k=1:L}$ can be a transitional probability matrix of the kth action, and $A = \{p(\omega_j \mid \omega_i)\}_{i,j=1}^{M}$ can be a global transitional probability matrix of all actions that can be determined through use of the action graph 200. Thus, G can be an action graph (such as the action graph 200).
- With the graphical interpretation (e.g., the action graph 200 or other suitable action graph), a system that follows equation (2) can be described by a quadruplet,

$$\Gamma = (\Omega, \Lambda, G, \Psi) \tag{4}$$

where

$$\begin{aligned}
\Omega &= \{\omega_1, \omega_2, \ldots, \omega_M\} \\
\Lambda &= \{p(x \mid \omega_1), p(x \mid \omega_2), \ldots, p(x \mid \omega_M)\} \\
G &= (\Omega, A, A_1, A_2, \ldots, A_L) \\
\Psi &= (\psi_1, \psi_2, \ldots, \psi_L).
\end{aligned} \tag{5}$$

- Turning now to
FIG. 3, an example depiction of the determiner component 104 is illustrated. The determiner component 104 is illustrated as comprising several components. It is to be understood, however, that the determiner component 104 may include more or fewer components, that functionality described as being undertaken by components may be combined or split into multiple components, and that some components may reside outside the determiner component 104 (e.g., as a separate function).
- As illustrated, the determiner component 104 includes a posture recognizer component 302 that can recognize a plurality of salient postures of an animate object in received video data. Pursuant to an example, the posture recognizer component 302 can receive video data that includes a silhouette of the animate object and can extract features from the silhouette that are indicative of a particular salient posture. For instance, the determiner component 104 can normalize a silhouette and obtain resampled points of a resulting contour (e.g., a shape descriptor). Further, a center of gravity may be located in the contour so as to facilitate detecting motion in the silhouette (e.g., a motion vector). The posture recognizer component 302 can compare the shape descriptor and motion vector with learned postures and determine the posture of the animate object. For instance, the posture recognizer component 302 can determine the posture with a particular probability, wherein if the probability is above a threshold it can be determined that the animate object is at a particular posture. If the highest probability is below a threshold, it may be determined that the posture is not a learned posture.
- The determiner component 104 may also include a sequence determiner component 304 that determines a sequence of observed postures. For instance, the sequence determiner component 304 can receive multiple postures determined by the posture recognizer component 302 and place a subset of the postures in a sequence (in accordance with time). For example, the sequence may relate to transition from one recognized posture to another. In another example, the animate object may be a human, and the human may be undertaking the action of walking. The posture recognizer component 302 can recognize numerous postures of the human while the human is walking, and the sequence determiner component 304 can receive such postures. The human, however, may walk at a slower pace than most other humans, and therefore some recognized postures may be redundant. The sequence determiner component 304 can take into consideration variations such as the above when placing postures in a sequence.
- The determiner component 104 may also include a path determiner component 306 that can determine a most likely path in the action graph 108 (FIG. 1) that corresponds to the determined sequence. Continuing with the example described with respect to FIG. 2, an action of a sequence X={x1,x2, . . . ,xn} can be received by the path determiner component 306. The path determiner component 306 can then locate a most probable path in an action graph G that generates X. In an example, the posture recognizer component 302 can determine postures with a particular probability. For instance, the posture recognizer component 302 can determine that, with thirty percent certainty, the posture corresponds to a first learned posture, and that, with twenty percent certainty, the posture corresponds to a second learned posture. The action graph can include transitional probabilities between certain postures. Given such probabilities, the path determiner component 306 can locate a most likely path in the action graph that generates X.
- The determiner component 104 may also include a probability determiner component 308 that can determine a likelihood of each action ψi given X, where ψi∈Ψ. Further, the determiner component 104 can include a selector component 310 that selects an action that has a highest likelihood as the action that corresponds to the received video data (the sequence X). In an example, the selector component 310 may only select an action if the probability determined by the probability determiner component 308 is above a threshold.
- Pursuant to an example, the
probability determiner component 308 can search for an Action Specific Viterbi Decoding (ASVD) in the action graph and can compute the likelihood for an action as follows:

$$L(\psi_i) = \max_{\psi_i \in \Psi,\; s \in \Omega} p(\psi_i) \prod_{t=1}^{n} p(s_t \mid s_{t-1}, \psi_i) \prod_{t=1}^{n} p(x_t \mid s_t), \tag{6}$$

- where $L(\psi_i)$ is the likelihood of X belonging to action $\psi_i$. The selector component 310 can select $\psi_k$ as the action corresponding to X if the following condition is met:
- where THl is a threshold that can be manually set or can be learned.
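Action Specific Viterbi Decoding as described by equation (6) can be sketched as follows: one Viterbi recursion per action using that action's transition matrix, followed by a threshold test. The selection condition is not reproduced in this text, so the sketch assumes it compares the best action's share of the summed likelihoods with the threshold; the uniform initial posture distribution and all names are likewise assumptions.

```python
import math

def _log(p):
    # Zero-probability events map to -inf instead of raising.
    return math.log(p) if p > 0 else float("-inf")

def action_log_likelihood(emissions, trans, prior):
    """log L(psi): the maximum over posture paths of
    p(psi) * prod p(s_t | s_{t-1}, psi) * prod p(x_t | s_t).

    emissions[t][s] stands for p(x_t | s); trans[i][j] for p(j | i, psi).
    """
    n_states = len(trans)
    score = [_log(e) for e in emissions[0]]  # uniform start assumed
    for frame in emissions[1:]:
        score = [
            max(score[i] + _log(trans[i][j]) for i in range(n_states)) + _log(frame[j])
            for j in range(n_states)
        ]
    return _log(prior) + max(score)

def select_action(emissions, action_trans, priors, threshold):
    """Pick the most likely action, or None if it does not clear the threshold."""
    logs = {
        action: action_log_likelihood(emissions, trans, priors[action])
        for action, trans in action_trans.items()
    }
    best = max(logs, key=logs.get)
    if logs[best] == float("-inf"):
        return None, 0.0
    # Share of the summed likelihoods held by the best action (assumed test).
    share = 1.0 / sum(math.exp(v - logs[best]) for v in logs.values())
    return (best if share > threshold else None), share
```

Because each action is decoded with its own transition matrix, the best posture path may differ from action to action, which distinguishes this sketch from decoding a single path with the global transition matrix.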
- In another example, the probability determiner component 308 can search the action graph for a Viterbi path with respect to the global transitional probability (described above) and determine likelihoods for each action supported in the action graph. This can be referred to as Global Viterbi Decoding (GVD).
- In GVD, the most likely path is the path $s^{*} = \{s^{*}_1, s^{*}_2, \ldots, s^{*}_n\}$ that satisfies

$$s^{*} = \arg\max_{s_t \in \Omega} \prod_{t=1}^{n} p(s_t \mid s_{t-1})\, p(x_t \mid s_t). \tag{8}$$

- The probability determiner component 308 can determine an action that generates $s^{*}$, for example, through use of a uni-gram or bi-gram model, such as the following:

$$L(\psi_i) = \arg\max_{\psi_i \in \Psi} p(\psi_i) \prod_{t=1}^{n} p(s^{*}_t \mid \psi_i) \quad \text{(uni-gram)} \tag{9}$$

$$L(\psi_i) = \arg\max_{\psi_i \in \Psi} p(\psi_i) \prod_{t=1}^{n} p(s^{*}_t \mid s^{*}_{t-1}, \psi_i) \quad \text{(bi-gram)} \tag{10}$$

- In yet another example, the probability determiner component 308 can use Maximum Likelihood Decoding (MLD) in connection with determining likelihoods with respect to different actions. More particularly, the probability determiner component 308 can search for a sequence of most likely postures in the action graph rather than a most likely sequence of postures (Viterbi path), e.g.,

$$s^{*} = \arg\max_{s_t \in \Omega} \prod_{t=1}^{n} p(x_t \mid s_t). \tag{11}$$

- The probability determiner component 308 can use equation (9), equation (10), or other suitable algorithm to determine the likelihood of an action to generate the path $s^{*}$. As noted above, the selector component 310 may select a most likely action after the probability determiner component 308 has determined likelihoods that the sequence X corresponds to one or more actions.
- From the above it can be discerned that the determiner component 104 can include components that can decode an action using any of a variety of algorithms, including ASVD, Uni-gram with Global Viterbi Decoding (UGVD), Bi-gram with Global Viterbi Decoding (BGVD), Uni-gram with Maximum Likelihood Decoding (UMLD), and/or Bi-gram with Maximum Likelihood Decoding (BMLD).
- With reference now to
FIG. 4, an example system 400 that facilitates annotating video in accordance with a recognized action in the video is illustrated. The system 400 includes the receiver component 102 that receives video data. The receiver component 102 may include a modifier component 402 that can modify the video to place the video in a format suitable for processing by the determiner component 104. For instance, the video data may include video of a first resolution, while the determiner component 104 is configured to process video data of a second resolution. The modifier component 402 can alter the resolution from the first resolution to the second resolution. In another example, the determiner component 104 may be configured to process silhouettes of animate objects (e.g., humans), and the received video data may be full-color video. The modifier component 402 can extract silhouettes from the video data, sample the silhouettes to create a contour, and orient the silhouette at a desired orientation. Other modifications are also contemplated and intended to fall within the scope of the hereto-appended claims.
- The determiner component 104 can receive the video data in a format suitable for processing and, as described above, can determine an action being undertaken by the animate object in the video data. More particularly, the determiner component 104 can access the action graph 108 and can decode an action corresponding to at least a portion of the video data through analysis of the action graph.
- The system 400 may also include an annotater component 404 that can annotate portions of the video data with information pertaining to an action that is determined to correspond to the portions of the video data. For instance, the system 400 may be used in a security context, and a suspicious action can be detected by the determiner component 104. The annotater component 404 can annotate video provided to a security officer to highlight that an individual in the video is acting in a suspicious manner. Annotation undertaken by the annotater component 404 may include audio annotation (e.g., an audio alarm), annotating the video with text or graphics, etc.
- With reference now to
FIG. 5, a system 500 that facilitates learning a system that includes an action graph is illustrated, wherein the system can be used to determine actions of an animate object in video. The system 500 includes a data store 502 that includes a plurality of posture samples 504. The posture samples 504 can, for instance, be derived from kinematics and kinetics of animate object motion and/or automatically learned given sufficient training data. For instance, if silhouettes are used as training data, silhouettes may be clustered into M clusters.
- The system 500 additionally includes a learner component 506 that can receive the posture samples 504 and, based at least in part upon the posture samples (which take into consideration temporal information), the learner component 506 can learn a system 508 (e.g., the system Γ) that includes a learned action graph 510. The system 508 may be used to determine motion of an animate object in a received video. As noted above, a posture can represent a set of similar poses of an animate object, such as a human. The learner component 506 can take into consideration the temporal nature of animate object motion (such as human motion), and the similarity between poses can be measured, wherein such measurement may take into account segment and/or joint shape as well as motion.
- With reference now to
FIG. 6, an example system 600 that facilitates obtaining posture samples is illustrated. The system 600 includes a silhouette generator component 602 that receives video data, wherein the video data includes images of at least one animate object. The silhouette generator component 602 can automatically generate silhouette images of the at least one animate object in the video data. - The
system 600 can additionally include a normalizer component 604 that can perform a scale normalization on received silhouettes. Such scale normalization undertaken by the normalizer component 604 can account for changes in body size, for instance. - A
resampler component 606 may also be included in the system 600. The resampler component 606 can resample the normalized silhouette to create multiple points along a silhouette contour. A point selector component 608 can then select a relatively small number of points along the contour to create a shape descriptor 610 (e.g., a set of points that describes the shape of the contour). The point selector component 608 may select points based at least in part upon noise and computational efficiency. - The
system 600 can also include an orientation estimator component 612 that can detect a change in orientation of the animate object and local motion of a gravity center of the animate object. In an example, if the shape descriptor describes a human, motion of the human includes a change of orientation of the human body and local motion of the gravity center of the human body. The orientation estimator component 612 can estimate the change in motion of the human body by fitting an ellipse into the resampled silhouette shape. The estimated change in motion can be used by the learner component 506 (FIG. 5) to learn the system 508 and the action graph 510. - Now referring to
FIG. 7, an example depiction of the learner component 506 is illustrated. The learner component 506 may include a shape dissimilarity determiner component 702 that determines dissimilarity between different shape descriptors. As described above, the contour of a silhouette can be normalized and resampled to a relatively small number of points. For instance, it can be assumed that fsp={x1,x2, . . . ,xb} and f′sp={x′1,x′2, . . . ,x′b} are two shapes that are each described by a set of b points (e.g., points selected by the point selector component 608). Dissimilarity of the two shapes can be defined as: -
- where dh(X, Y) is a Hausdorff distance between X and Y; a and c are two constants.
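As a non-limiting illustration, the Hausdorff distance dh(X, Y) referenced above may be computed as in the following minimal sketch. It assumes each shape is a list of 2D (x, y) points; the function names `directed_hausdorff` and `hausdorff` are hypothetical, and the mapping from dh to the shape dissimilarity through the constants a and c is deliberately omitted because that expression is not reproduced in the text above.

```python
import math


def directed_hausdorff(X, Y):
    """Largest distance from any point in X to its nearest neighbour in Y."""
    return max(min(math.dist(x, y) for y in Y) for x in X)


def hausdorff(X, Y):
    """Symmetric Hausdorff distance d_h(X, Y) between two point sets."""
    return max(directed_hausdorff(X, Y), directed_hausdorff(Y, X))


# Two toy "contours": a unit square and the same square shifted 3 units in x.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 3, y) for x, y in square]
print(hausdorff(square, shifted))  # 3.0
```

The brute-force search is quadratic in the number of points, which may be acceptable here because each shape descriptor is described as having a relatively small number of points.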
- The
learner component 506 can additionally include a motion dissimilarity determiner component 704 that determines motion dissimilarity between motion feature vectors of silhouettes. As noted above, motion features can include a change of orientation of an animate object and local motion of its gravity center. The orientation of the animate object can be estimated by fitting an ellipse into a silhouette shape. It can be assumed that fm=(δx,δy,δθ) and f′m=(δx′,δy′,δθ′) are motion vectors of silhouettes x and x′, respectively. The dissimilarity of x and x′ in terms of motion can be defined as follows: -
- where corr(.,.) represents correlation.
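As a non-limiting illustration, a motion feature vector (δx, δy, δθ) may be obtained as in the following sketch. The text does not state how the ellipse is fitted, so the sketch assumes a moment-based fit: the gravity center is the mean of the contour points, and the orientation is the major-axis angle given by the second-order central moments. The function names are hypothetical.

```python
import math


def centroid_and_orientation(points):
    """Return (cx, cy, theta) of the moment-equivalent ellipse of 2D points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second-order central moments of the point set.
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # Major-axis angle of the ellipse with the same second moments.
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return cx, cy, theta


def motion_feature(prev_points, curr_points):
    """Motion vector (dx, dy, dtheta) between two consecutive silhouettes."""
    px, py, pt = centroid_and_orientation(prev_points)
    cx, cy, ct = centroid_and_orientation(curr_points)
    # Note: dtheta is not wrapped; orientation is only defined modulo pi.
    return (cx - px, cy - py, ct - pt)


# A horizontal segment, then the same segment tilted 45 degrees and moved.
line = [(-2, 0), (-1, 0), (0, 0), (1, 0), (2, 0)]
diag = [(x + 1, x + 2) for x, _ in line]
print(motion_feature(line, diag))
```

In this toy example the gravity center moves by (1, 2) and the orientation changes by a quarter of π.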
- The
learner component 506 can also include a clusterer component 706 that can cluster silhouettes based at least in part upon the aforementioned dissimilarities. More specifically, dissimilarity of two silhouettes can be defined as a product of motion and shape dissimilarity: -
$$d = d_{sp} \cdot d_{mt}. \tag{14}$$
- Values of dissimilarity may be placed in a form suitable for processing, such as in a matrix. For instance, D=[dij], i,j=1, . . . ,J, can be a dissimilarity matrix of all pairs of J training silhouettes, where D is a J×J symmetric matrix. The
clusterer component 706 can use any suitable clustering algorithm to cluster the J silhouettes into M clusters. For instance, the clusterer component 706 may use Normalized Cuts (NCuts), Dominant Sets (DS), Non-Euclidean Relational Fuzzy (NERF) C-Means, and/or another suitable clustering algorithm in connection with clustering silhouettes. - The
learner component 506 may further include an estimator component 708 that estimates salient postures that can be used in an action graph. For instance, after clustering, the estimator component 708 can fit a Gaussian Mixture Model (GMM) using a suitable expectation-maximization (EM) algorithm to the shape component of a cluster to represent the spatial distribution of contours of silhouettes belonging to a particular posture cluster. The estimator component 708 can fit another Gaussian to the motion component of the cluster to obtain a compact representation of a model of a posture. This can be represented as follows: -
$$p_{sp}(y_{sp}\mid s)=\sum_{k=1}^{C}\pi_{k,s}\,N(y_{sp};\mu_{k,s},\Sigma_{k,s}) \tag{15}$$
$$p_{mt}(y_{mt}\mid s)=N(y_{mt};\mu_{mt,s},\Sigma_{mt,s}) \tag{16}$$
- where psp(ysp|s) is a GMM with C components for shape and pmt(ymt|s) is a Gaussian for motion, and where s denotes a salient posture/state (or cluster of silhouettes), N(.) is a Gaussian function, ymt represents the motion feature vector, μmt,s is a mean motion vector for salient posture s, Σmt,s is a 3×3 matrix denoting covariance of the motion features, ysp represents 2D coordinates of a point on the contours of silhouettes, μk,s is the center of the kth Gaussian for posture s, Σk,s is a 2×2 covariance matrix, and πk,s is a mixture proportion such that the πk,s sum to 1 over k=1, . . . ,C. Accordingly, an estimated model for a posture that can be used in an action graph (e.g., a posture model) can be defined as:
-
$$p(x\mid s)=p_{mt}(y_{mt}\mid s)\prod_{i=1}^{b}p_{sp}(y_{sp}^{i}\mid s) \tag{17}$$
- where x is a silhouette, and $y_{mt}$ and $y_{sp}^{i}$ represent, respectively, the motion feature and the ith point on the resampled contour of x.
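As a non-limiting illustration, the posture model of equation (17) may be evaluated in the log domain as in the following sketch. The dictionary layout of `model` (keys `pi`, `mu`, `sigma`, `mu_mt`, `sigma_mt`) and all function names are assumptions made for this example, as are the toy parameter values; the sketch uses only the standard library and a Cholesky factorization to evaluate each Gaussian.

```python
import math


def cholesky(A):
    """Lower-triangular L with L * L^T = A, for a small SPD matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def gaussian_logpdf(y, mean, cov):
    """Log density of a multivariate Gaussian N(y; mean, cov)."""
    n = len(y)
    L = cholesky(cov)
    diff = [yi - mi for yi, mi in zip(y, mean)]
    z = []  # solve L z = diff by forward substitution
    for i in range(n):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + sum(v * v for v in z))


def gmm_logpdf(y, weights, means, covs):
    """Log density of a Gaussian mixture, using log-sum-exp for stability."""
    terms = [math.log(w) + gaussian_logpdf(y, m, c)
             for w, m, c in zip(weights, means, covs)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def posture_loglik(contour, motion, model):
    """log p(x|s): motion Gaussian plus the sum of shape GMM terms, as in (17)."""
    shape = sum(gmm_logpdf(p, model["pi"], model["mu"], model["sigma"])
                for p in contour)
    return gaussian_logpdf(motion, model["mu_mt"], model["sigma_mt"]) + shape


eye2 = [[1.0, 0.0], [0.0, 1.0]]
eye3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model = {  # hypothetical posture model with C = 2 shape components
    "pi": [0.5, 0.5], "mu": [[0.0, 0.0], [4.0, 0.0]], "sigma": [eye2, eye2],
    "mu_mt": [0.0, 0.0, 0.0], "sigma_mt": eye3,
}
print(posture_loglik([(0.0, 0.0), (4.0, 0.5)], (0.1, 0.0, 0.0), model))
```

Working with log-likelihoods avoids numerical underflow from the product over the b contour points.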
- The
learner component 506 can also include a linker component 710 that can link learned postures (posture models) with transitional probabilities. For example, the linker component 710 can estimate action-specific and global transitional probability matrices {Ai}, i=1, . . . ,L, and A from training samples given the statistical independence assumptions described above. The action-specific and global transitional probabilities can be defined as follows: -
- where J is a total number of training silhouettes for all actions and Jl is a number of silhouettes contained in training samples for the action ψl. p(ωi) and p(ωi|ψl) can be obtained through marginalization of p(ωi|ωj) and p(ωi|ωj,ψl) respectively.
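As a non-limiting illustration, global and action-specific transitional probabilities may be estimated as in the following simplified sketch. It is not the estimator defined above, whose expression is not reproduced in the text; instead it assumes that every training frame has already been hard-assigned to a single posture index and simply counts posture-to-posture transitions. The function name `transition_probs` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def transition_probs(sequences):
    """Return probs[prev][nxt] = p(nxt | prev), estimated by counting bi-grams."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs


# Toy posture-index sequences; postures 0 and 2 are shared by both actions.
training = {"walk": [[0, 1, 2, 1, 0]], "run": [[0, 3, 2, 3, 0]]}

# One matrix per action, plus a global matrix pooled over all actions.
action_specific = {a: transition_probs(seqs) for a, seqs in training.items()}
global_A = transition_probs([s for seqs in training.values() for s in seqs])

print(action_specific["walk"][0])  # {1: 1.0}
print(global_A[0])                 # {1: 0.5, 3: 0.5}
```

The shared posture 0 has a deterministic successor within each action but an ambiguous successor in the pooled global matrix, which is one reason both kinds of matrices can be useful.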
- Now referring to
FIG. 8, another example depiction of the learner component 506 is illustrated. In this example depiction, the learner component 506 can extend an existing system to recognize a new action. For instance, Γ={Ω,G,Λ,Ψ} can be a system that has been trained to recognize L actions of an animate object. It may be desirable to add a new action ψL+1 to the system Γ. In an example, the new action ψL+1 may have K training sequences of silhouettes, $\{y_t^k\}$ with t=1, . . . ,Tk and k=1, . . . ,K, where Tk is a number of frames in the kth training sequence. When a new action is included in the system, both the action graph and posture models may desirably be updated. The learner component 506 can limit the addition of the new action to insertion of new postures that describe ψL+1 into the action graph, modification of A, and insertion of AL+1. - In general, two cases can be considered: 1) Ω includes all postures that are required to describe the action ψL+1. In this case, postures can be shared and new paths can be inserted into the action graph by updating A and AL+1. 2) Ω does not include all postures that are required to describe the action ψL+1. In this instance, new postures can be created for ψL+1 and the action graph can be expanded by updating A and AL+1.
- Thus, it may be desirable to determine whether new postures are needed and, if so, how to create them. An example approach is to first locate salient postures for a new action and thereafter decide whether such postures have already been learned by the
learner component 506 by comparing the located postures to those residing in the existing action graph. - To that end, the
learner component 506 can include the clusterer component 706, which clusters silhouettes pertaining to the new action into m postures Ω′={ω′1,ω′2, . . . ,ω′m}, whose prototypes are Λ′={p′(x|ω′1),p′(x|ω′2), . . . ,p′(x|ω′m)}. The clusterer component 706 can operate as described above to generate clusters of silhouettes (or other training samples). - The
learner component 506 can also include a comparator component 802 that compares each new posture ω′i, i=1, . . . ,m, with each existing posture in Ω. If the comparator component 802 determines that ω′i is similar to a posture existent in Ω, then the comparator component 802 can discard ω′i. If the comparator component 802 determines that ω′i is not similar to any posture existent in Ω, then the comparator component 802 can cause ω′i to be retained in Ω′. - Similarity can be determined in any suitable manner. In an example, since postures can be modeled by a single Gaussian for motion and a GMM for shape, similarity between two postures can be measured by Kullback-Leibler (KL) divergence. KL divergence for motion between postures s and s′ can be defined as follows:
-
- where KL(p∥p′) represents a KL-divergence between distribution p and p′; D(N∥N′) is the KL-divergence between two Gaussians, N and N′.
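For reference, and as a non-limiting illustration, the KL-divergence between two Gaussians has a well-known closed form. The expression below states that standard result for N=N(μ,Σ) and N′=N(μ′,Σ′); the symbol n (the dimension of the Gaussians, which would be 3 for the motion feature vector) is introduced here for illustration only, and the way this term enters the motion divergence defined above is not reproduced in the text.

$$D(N\,\|\,N')=\frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma'^{-1}\Sigma\right)+(\mu'-\mu)^{\top}\Sigma'^{-1}(\mu'-\mu)-n+\ln\frac{\det\Sigma'}{\det\Sigma}\right]$$

The divergence is zero when the two Gaussians coincide, and it is generally not symmetric in N and N′.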
- KL-divergence for shape between postures s and s′ can be defined as follows:
-
- In an example, s′ may be deemed similar to s if the following condition is met:
-
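$$(KL_{mt}-\overline{KL}_{mt})<\alpha_{sp}\,\sigma_{KL_{mt}} \quad\text{or}\quad (KL_{sp}-\overline{KL}_{sp})<\alpha_{mt}\,\sigma_{KL_{sp}},$$
- where $\overline{KL}_{mt}$, $\sigma_{KL_{mt}}$, $\overline{KL}_{sp}$, and $\sigma_{KL_{sp}}$ are the means and standard deviations of the KL-divergences of all pairs of postures in the system Γ before updating, and αsp∈(0,1] and αmt∈(0,1] are constants. - The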
learner component 506 can further include a union component 804 that merges Ω and Ω′. More particularly, the union component 804 can create Ωnew as the union of Ω and Ω′, such that Λnew is the set of posture models (learned postures) of Ωnew. - The
learner component 506 additionally includes a system estimator component 806 that estimates the transitional probabilities AL+1 and A′ from the K training samples for ψL+1 based at least in part upon Λnew. The system estimator component 806 can update A as follows: -
$$A_{new}=A+\beta A', \tag{21}$$
- where β∈(0, 1) is a weighting factor controlling the contribution of new action samples to the global transition. Since the number of training samples K may be relatively small compared to the number of samples used to train A, A′ is often much less reliable than A. Accordingly, the
system estimator component 806 can limit the contribution of A′ to final global transitional probabilities by the factor β, which can be selected to reflect the ratio of the size of the new training samples to the size of the samples used to estimate A. - In some instances, training samples for a new action may only capture a relatively small proportion of possible posture transitions that correspond to the new action. Accordingly, AL+1 may not be a reliable estimation of a true transition. The
system estimator component 806 may employ a smoothing technique to facilitate compensating for a relatively small number of samples. In an example, the following linear model may be used to smooth AL+1: -
- where si,sj∈Ωnew and p(sj,si,ψL+1) is a joint probability of a frame being in posture sj followed by another frame being in posture si. Equation (22) can be interpreted as an interpolation of bi-gram and uni-gram transitional probabilities. For unseen events, the transitional probability can be set to be the uni-gram probability of the second posture of the bi-gram. If the
system estimator component 806 provides too much weight to the uni-gram probability, faulty estimation may result if si is very frequent. Therefore, the system estimator component 806 may decrease the value of the weight exponentially with the number of bi-gram observations. - The
learner component 506 can also include the linker component 710, which can link learned postures (posture models) with transitional probabilities as described above. - With reference now to
FIGS. 9-13, various example methodologies are illustrated and described. While the methodologies are described as being a series of acts that are performed in a sequence, it is to be understood that the methodologies are not limited by the order of the sequence. For instance, some acts may occur in a different order than what is described herein. In addition, an act may occur concurrently with another act. Furthermore, in some instances, not all acts may be required to implement a methodology described herein. - Moreover, the acts described herein may be computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media. The computer-executable instructions may include a routine, a sub-routine, programs, a thread of execution, and/or the like. Still further, results of acts of the methodologies may be stored in a computer-readable medium, displayed on a display device, and/or the like.
- Referring now to
FIG. 9, an example methodology 900 for recognizing an action being undertaken by an animate object in a video is illustrated. The methodology 900 starts at 902, and at 904 video data is received, wherein the video data can include a plurality of video frames that comprise images of an animate object. - At 906, a data store is accessed that comprises data representable by an action graph. For instance, the action graph can include a plurality of nodes that are representative of a plurality of postures of animate objects. These nodes can be referred to as posture models. Further, at least one node of the action graph can correspond to multiple actions that are determinable through use of the action graph. For instance, a posture may be common between actions of walking and running, and a node can represent such posture with respect to both actions in the action graph.
- At 908, an action undertaken by the animate object in the plurality of video frames is determined based at least in part upon the action graph. For instance, determining the action may include extracting features from video data received at 904. Further, determining the action may comprise determining at least one posture of the animate object in the video data based at least in part upon the extracted features. The
methodology 900 then completes at 910. - Turning now to
FIG. 10, an example methodology 1000 for determining an action undertaken by a human being captured in video data is illustrated. The methodology 1000 starts at 1002, and at 1004 a sequence of silhouettes is received. For instance, video data can be sampled and silhouettes of human beings can be generated. - At 1006, postures corresponding to the silhouettes can be recognized. For instance, contours of the human being in the video frame can be generated, and such contours can be compared with postures in the action graph.
- At 1008, a most-likely path in the action graph that corresponds to the recognized postures is determined. For instance, the action graph can have corresponding transitional probabilities (global and local), and a most-likely path can be determined based at least in part upon the recognized postures and the transitional probabilities.
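As a non-limiting illustration, one standard way to find a most-likely path through a graph of postures is Viterbi decoding, sketched below. The sketch assumes that per-frame posture log-likelihoods and log transition probabilities are already available as nested lists; the function name `viterbi`, the two-posture tables, and the uniform prior are hypothetical values chosen for the example.

```python
import math


def viterbi(emission_logp, trans_logp, prior_logp):
    """Most-likely posture path and its log score.

    emission_logp[t][s] is log p(x_t | s), trans_logp[i][j] is log p(j | i),
    and prior_logp[s] is the log probability of starting in posture s.
    """
    n_states = len(prior_logp)
    score = [prior_logp[s] + emission_logp[0][s] for s in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of posture j at frame t
    for t in range(1, len(emission_logp)):
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + trans_logp[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + trans_logp[best_i][j]
                             + emission_logp[t][j])
        score = new_score
        back.append(pointers)
    # Trace the best final posture back to the first frame.
    state = max(range(n_states), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best


log = math.log
prior = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
emis = [[log(0.8), log(0.2)], [log(0.8), log(0.2)], [log(0.05), log(0.95)]]
path, best = viterbi(emis, trans, prior)
print(path)  # [0, 0, 1]
```

Running the same decoder once per action with that action's transition matrix, and once with the global matrix, would yield action-specific and global paths, respectively.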
- At 1010, a most-likely action that corresponds to the determined path can be determined. For instance, a particular probability that a sequence of silhouettes in the video frame corresponds to a particular action can be determined, and if the probability is above a threshold the most-likely action can be output as a determined action. The
methodology 1000 then completes at 1012. - Referring now to
FIG. 11, a methodology 1100 for learning an action graph is illustrated. The methodology 1100 starts at 1102, and at 1104 training data is received. Pursuant to an example, the training data may be a plurality of sequences of video data that includes an animate object moving in accordance with a particular action. For instance, the plurality of sequences of video data may be silhouettes, although other training data is also contemplated. - At 1106, shape and motion dissimilarities are determined from images of animate objects in the training data. At 1108, postures are clustered based at least in part upon the determined shape and motion dissimilarities.
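As a non-limiting illustration, the input to the clustering act can be assembled as in the following sketch, which builds the J×J symmetric dissimilarity matrix from the product of shape and motion dissimilarities (equation (14)). The function name `dissimilarity_matrix` is hypothetical, and the two toy dissimilarity functions are placeholders that stand in for the shape and motion dissimilarities described above; the clustering algorithm itself is left to any suitable implementation.

```python
def dissimilarity_matrix(samples, d_sp, d_mt):
    """Symmetric J x J matrix with D[i][j] = d_sp(i, j) * d_mt(i, j)."""
    J = len(samples)
    D = [[0.0] * J for _ in range(J)]
    for i in range(J):
        for j in range(i + 1, J):
            d = d_sp(samples[i], samples[j]) * d_mt(samples[i], samples[j])
            D[i][j] = D[j][i] = d  # fill both halves to keep D symmetric
    return D


# Placeholder samples: (shape value, motion value) pairs instead of silhouettes.
samples = [(0, 0), (1, 2), (3, 1)]


def toy_d_sp(a, b):
    """Placeholder shape dissimilarity."""
    return abs(a[0] - b[0])


def toy_d_mt(a, b):
    """Placeholder motion dissimilarity."""
    return abs(a[1] - b[1])


D = dissimilarity_matrix(samples, toy_d_sp, toy_d_mt)
for row in D:
    print(row)
```

Only the upper triangle is computed, which halves the number of pairwise comparisons for J training silhouettes.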
- At 1110, transitional probabilities are estimated between clusters. Estimation of transitional probabilities has been described in detail above. At 1112 clusters (e.g., posture models) are linked. The
methodology 1100 completes at 1114. - With reference now to
FIG. 12, an example methodology 1200 that facilitates adding an action to an existing action graph is illustrated. The methodology 1200 starts at 1202, and at 1204 a video of an animate object undertaking a new action is received, wherein “new action” refers to an action that has not yet been supported in an underlying action graph. At 1206, postures that describe the new action are determined, and at 1208 the determined postures are compared with postures existent in the underlying action graph. At 1210 determined postures that are found to be similar to postures existent in the underlying action graph are removed from the action graph. The methodology 1200 then completes at 1212. - Now referring to
FIG. 13, an example methodology 1300 that facilitates outputting a most-likely determined action is illustrated. The methodology 1300 starts at 1302, and at 1304 a plurality of video frames is received. For instance, the plurality of video frames can include a sequence of silhouettes of a human being. At 1306, a plurality of postures of the human being in the sequence of silhouettes is determined. - At 1308, the plurality of postures are compared with postures represented in an action graph. Pursuant to an example, the action graph can include multiple postures pertaining to numerous actions. Further, a first posture in the action graph may be linked to a second posture in the action graph by a probability that for a first action the human being will transition from the first posture to the second posture. In yet another example, at least one posture in the action graph can correspond to more than one action.
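As a non-limiting illustration, an action graph whose postures are shared between actions can be scored against a recognized posture sequence as in the following sketch. It assumes hard posture assignments and a bi-gram score per action; the function names, the toy transition tables, and the small `floor` probability used for unseen transitions are all assumptions made for this example.

```python
import math


def sequence_loglik(postures, trans, floor=1e-6):
    """Bi-gram log-likelihood of a posture sequence under one action.

    trans[prev][nxt] is p(nxt | prev) for the action; transitions that were
    never observed fall back to the assumed `floor` probability.
    """
    total = 0.0
    for prev, nxt in zip(postures, postures[1:]):
        total += math.log(trans.get(prev, {}).get(nxt, floor))
    return total


def most_likely_action(postures, action_trans):
    """Return the best-scoring action and the per-action log scores."""
    scores = {a: sequence_loglik(postures, t) for a, t in action_trans.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Postures 0 and 2 are shared nodes; postures 1 and 3 are action-specific.
action_trans = {
    "walk": {0: {1: 1.0}, 1: {2: 0.5, 0: 0.5}, 2: {1: 1.0}},
    "run": {0: {3: 1.0}, 3: {2: 0.5, 0: 0.5}, 2: {3: 1.0}},
}
best, scores = most_likely_action([0, 1, 2, 1], action_trans)
print(best)  # walk
```

A threshold on the winning score could additionally be applied before the action is output, so that sequences matching no known action are rejected.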
- At 1310, a most likely action that corresponds to the determined plurality of postures can be determined based at least in part upon the comparison. At 1312, the most likely action can be output as a determined action. The
methodology 1300 completes at 1314. - Referring briefly to
FIG. 14, an example depiction of a silhouette 1400 of a human being is illustrated. The silhouette may be received in connection with determining an action undertaken by a human that is represented by the silhouette. In another example, the silhouette may be used as training data. - Turning to
FIG. 15, an example resampled image 1500 is depicted, wherein the resampled image includes multiple points that represent a contour. Such a contour may be used in connection with determining shape dissimilarity between the contour and another contour, for example. - With reference to
FIG. 16, an example oriented image 1600 is depicted, wherein an ellipse is fit over the contour and a center of gravity is discerned. Such information can be used in connection with determining dissimilarity of motion between contours. - Now referring to
FIG. 17, a high-level illustration of an example computing device 1700 that can be used in accordance with the systems and methodologies disclosed herein is illustrated. For instance, the computing device 1700 may be used in a system that can be used to determine an action undertaken by an animate object in video data and/or used to learn a system that can be used to automatically determine actions in video data. The computing device 1700 includes at least one processor 1702 that executes instructions that are stored in a memory 1704. The instructions may be, for instance, instructions for implementing functionality described as being carried out by one or more components discussed above or instructions for implementing one or more of the methods described above. The processor 1702 may access the memory 1704 by way of a system bus 1706. In addition to storing executable instructions, the memory 1704 may also store images, one or more action graphs, etc. - The
computing device 1700 additionally includes a data store 1708 that is accessible by the processor 1702 by way of the system bus 1706. The data store 1708 may include executable instructions, silhouettes, training data, etc. The computing device 1700 also includes an input interface 1710 that allows external devices to communicate with the computing device 1700. For instance, the input interface 1710 may be used to receive instructions from an external computer device, receive video data from a video source, etc. The computing device 1700 also includes an output interface 1712 that interfaces the computing device 1700 with one or more external devices. For example, the computing device 1700 may transmit data to a personal computer by way of the output interface 1712. - Additionally, while illustrated as a single system, it is to be understood that the
computing device 1700 may be a distributed system. Thus, for instance, several devices may be in communication by way of a network connection and may collectively perform tasks described as being performed by the computing device 1700. - While the systems and methods discussed above have been described in connection with determining actions undertaken by an animate object, it is to be understood that concepts described herein may be extended to other domains. For instance, the systems and methods discussed above may be used in connection with voice detection, where nodes of the action graph can represent particular pitches of the human voice, and wherein transitional probabilities can be probabilities pertaining to changes in pitch for particular words or phrases. In another example, postures of a human mouth may be determined and used to recognize words spoken from the human mouth.
- As used herein, the terms “component” and “system” are intended to encompass hardware, software, or a combination of hardware and software. Thus, for example, a system or component may be a process, a process executing on a processor, or a processor. Additionally, a component or system may be localized on a single device or distributed across several devices.
- It is noted that several examples have been provided for purposes of explanation. These examples are not to be construed as limiting the hereto-appended claims. Additionally, it may be recognized that the examples provided herein may be permutated while still falling under the scope of the claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/183,078 US8396247B2 (en) | 2008-07-31 | 2008-07-31 | Recognizing actions of animate objects in video |
Publications (2)
Publication Number | Publication Date |
---|---|
US20100027835A1 true US20100027835A1 (en) | 2010-02-04 |
US8396247B2 US8396247B2 (en) | 2013-03-12 |
Family
ID=41608401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/183,078 Expired - Fee Related US8396247B2 (en) | 2008-07-31 | 2008-07-31 | Recognizing actions of animate objects in video |
Country Status (1)
Country | Link |
---|---|
US (1) | US8396247B2 (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ZHANG, ZHENGYOU; LI, WANQING; LIU, ZICHENG; SIGNING DATES FROM 20080728 TO 20080730; REEL/FRAME: 021438/0117 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MICROSOFT CORPORATION; REEL/FRAME: 034564/0001. Effective date: 20141014 |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
 | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20210312 |