US20120129605A1 - Method and device for detecting and tracking non-rigid objects in movement, in real time, in a video stream, enabling a user to interact with a computer system - Google Patents

Method and device for detecting and tracking non-rigid objects in movement, in real time, in a video stream, enabling a user to interact with a computer system

Info

Publication number
US20120129605A1
Authority
US
United States
Prior art keywords
interest
image
points
region
movement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/300,509
Inventor
Nicolas Livet
Thomas Pasquier
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Total Immersion
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Total Immersion filed Critical Total Immersion
Assigned to TOTAL IMMERSION reassignment TOTAL IMMERSION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Pasquier, Thomas, LIVET, NICOLAS
Publication of US20120129605A1 publication Critical patent/US20120129605A1/en
Assigned to QUALCOMM CONNECTED EXPERIENCES, INC. reassignment QUALCOMM CONNECTED EXPERIENCES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TOTAL IMMERSION, SA
Assigned to QUALCOMM INCORPORATED reassignment QUALCOMM INCORPORATED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: QUALCOMM CONNECTED EXPERIENCES, INC.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/002 Specific input/output arrangements not covered by G06F 3/01 - G06F 3/16
    • G06F 3/005 Input arrangements through a video camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Definitions

  • In the example of FIG. 3, reference 310-1 designates a point of interest according to its position in the region of interest 200-1, whereas reference 310-2 designates that same point of interest according to its position in the region of interest 200-2.
  • FIG. 4 is a diagrammatic illustration of certain steps implemented in accordance with the invention to identify, in continuous operation, variations in arrangement of objects between two consecutive (or close) images of a sequence of images.
  • the images here are acquired via an image sensor such as a video camera, in particular a video camera of webcam type, connected to a computer system implementing the method described here.
  • a first step of initializing is executed.
  • An object of this step is in particular to define features of at least one region of interest, for example a shape, a size and an initial position.
  • It is also possible for a region of interest not to be defined in an initial state, the system being on standby for a triggering event, for example a particular movement of the user facing the video camera (the moving pixels in the image being analyzed in search of a particular movement), the location of a particular color such as the color of skin, or the recognition of a particular predetermined object whose position defines that of the region of interest.
  • the size and the shape of the region of interest may be predefined or be determined according to features of the detected event.
  • the initializing step 410 may thus take several forms depending on the object to track in the image sequence and depending on the application implemented.
  • the initial position of the region of interest is predetermined (off-line determination) and the tracking algorithm is on standby for a disturbance.
  • In step 415, a region of interest whose features have been determined beforehand (on initialization or in the preceding image) is positioned in the current image to extract the corresponding image part. If the current image is the first image of the video stream to be processed, that image becomes the preceding image, a new current image is acquired and step 415 is repeated.
  • Step 460 is only carried out if there are validated points of interest. As indicated earlier, this step consists in eliminating zones from the created mask, for example disks of a predetermined diameter, around points of interest validated beforehand.
  • Points of interest are then searched for in the region of the preceding image corresponding to the mask of interest so defined (step 435 ), the mask of interest here being the mask of interest created at step 430 or the mask of interest created at step 430 and modified during step 460 .
  • the search for points of interest is, for example, limited to the detection of twenty points of interest. Naturally, this number may be different and may be estimated according to the size of the mask of interest.
  • This search is advantageously carried out with the algorithm known by the name FAST.
  • a Bresenham circle, for example with a perimeter of 16 pixels, is constructed around each pixel of the image. If k contiguous pixels (k typically having a value of 9, 10, 11 or 12) contained in that circle all have a greater intensity than the central pixel, or all have a lower intensity than the central pixel, that central pixel is considered to be a point of interest. It is also possible to identify points of interest with an approach based on image gradients, as provided by the technique known as Harris corner detection.
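  • By way of illustration only, the following minimal sketch shows how such a detection step (step 435) could be realized with OpenCV's FAST detector, restricting detection to the mask of interest; the function name, the FAST threshold and the limit of twenty points are illustrative assumptions, not the exact implementation described in the text.

      import cv2
      import numpy as np

      def detect_points_in_mask(gray_roi, mask, max_points=20):
          # gray_roi: 8-bit grayscale region of interest of the preceding image
          # mask: 8-bit mask of interest, non-zero where detection is allowed
          fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
          keypoints = fast.detect(gray_roi, mask)
          # keep the strongest responses, limited here (arbitrarily) to twenty points
          keypoints = sorted(keypoints, key=lambda kp: kp.response, reverse=True)[:max_points]
          return np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)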
  • the points of interest detected in the preceding image according to the mask of interest as well as, where applicable, the points of interest detected and validated beforehand are used to identify the corresponding points of interest in the current image.
  • a search for corresponding points of interest in the current image is thus carried out (step 440 ), preferably using a method known under the name of optical flow.
  • This technique gives better robustness when the image is blurred, in particular thanks to the use of pyramids of images smoothed by a Gaussian filter. This is for example the approach implemented by Lucas, Kanade and Tomasi in the algorithm known under the name KLT.
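  • As a minimal sketch of this tracking step (step 440), and assuming 8-bit grayscale images, the pyramidal Lucas-Kanade implementation available in OpenCV could be used as follows; the window size and the number of pyramid levels are illustrative values, not values prescribed by the text.

      import cv2
      import numpy as np

      def track_points(prev_gray, curr_gray, prev_pts):
          # prev_pts: Nx1x2 float32 array of points of interest in the preceding image
          curr_pts, status, err = cv2.calcOpticalFlowPyrLK(
              prev_gray, curr_gray, prev_pts, None,
              winSize=(21, 21), maxLevel=3,  # pyramid of Gaussian-smoothed images
              criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
          status = status.reshape(-1).astype(bool)
          # keep only the points for which tracking succeeded
          return prev_pts[status], curr_pts[status]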
  • movement parameters are estimated for objects tracked in the region of interest of the preceding image relative to the region of interest of the current image (step 445 ).
  • Such parameters, also termed degrees of freedom, comprise, for example, a parameter of translation along the x-axis, a parameter of translation along the y-axis, a rotation parameter and/or a scale parameter; the transformation which maps a set of two-dimensional points from one plane to another and which groups together these four parameters is termed a similarity.
  • These movement parameters may, for example, be estimated by a nonlinear least squares error (NLSE) minimization approach such as the Gauss-Newton method. This method is directed to minimizing a re-projection error over the set of the tracked points of interest.
  • a threshold, typically expressed in pixels and having a predetermined value, is advantageously used to authorize a certain margin of error between the theoretical position of the point in the current image (obtained by applying the parameters estimated at step 445) and its real position (obtained by the tracking method of step 440).
  • the valid points of interest, here referenced 455, are considered as belonging to an object whose movement is tracked, whereas the non-valid points (also termed outliers) are considered as belonging to the image background or to portions of an object which are not visible in the image.
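  • The following sketch illustrates one possible way of estimating the movement parameters (step 445) and validating points of interest by re-projection error; it relies on OpenCV's RANSAC-based similarity estimator rather than the Gauss-Newton scheme mentioned above, and the pixel threshold is an illustrative value.

      import cv2
      import numpy as np

      def estimate_motion_and_validate(prev_pts, curr_pts, max_error=3.0):
          # prev_pts, curr_pts: Nx1x2 float32 matched points (preceding / current image)
          # Estimate a 4 degree-of-freedom similarity (tx, ty, rotation, scale).
          matrix, _ = cv2.estimateAffinePartial2D(prev_pts, curr_pts, method=cv2.RANSAC)
          if matrix is None:
              return None, np.zeros(len(prev_pts), dtype=bool)
          # Re-project the points of the preceding image with the estimated parameters.
          projected = cv2.transform(prev_pts, matrix)
          residuals = np.linalg.norm(projected - curr_pts, axis=2).reshape(-1)
          valid = residuals < max_error  # inliers; the others are treated as outliers
          # Scale and rotation could be recovered from the matrix if needed, e.g.
          # s = np.hypot(matrix[0, 0], matrix[1, 0]); theta = np.arctan2(matrix[1, 0], matrix[0, 0])
          return matrix, valid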
  • the valid points of interest are tracked in the following image and are used to modify the mask of interest created by comparison of a region of interest of the current image with the corresponding region of interest of the following image (step 460), in order to exclude portions from the mask of pixels in movement between the current and following images, as described with reference to FIG. 2 d.
  • This modified mask of interest makes it possible to eliminate portions of images in which points of interest are recursively tracked.
  • the valid points of interest are thus kept for several processing operations on successive images and in particular enable stabilization of the tracking of objects.
  • the new region of interest (or modified region of interest) which is used for processing the current image and the following image is then estimated thanks to the previously estimated degrees of freedom (step 445 ). For example, if the degrees of freedom are x and y translations, the new position of the region of interest is estimated according to the previous position of the region of interest, using those two items of information. If a change (or changes) of scale is estimated and considered in this step, it is possible, according to the scenario considered, also to modify the size of the new region of interest which is used in the current and following images of the video stream.
  • the estimation of a change (or changes) of scale is used for detecting the triggering of an action in similar manner to the click of a mouse.
  • It is also possible to use changes of orientation, particularly those around the viewing axis of the video camera (referred to as roll), in order, for example, to enable the rotation of a virtual element displayed in a scene or to control a button of “potentiometer” type, for instance to adjust the sound volume of an application.
  • if tracking is lost, the algorithm preferably returns to the initializing step. Such a loss of tracking, leading to the initializing step being re-executed, may be identified by measuring the movements of a user: it may thus be decided to reinitialize the method when those movements are stable or non-existent for a predetermined period, or when a tracked object leaves the field of view of the image sensor.
  • FIG. 5 illustrates more precisely certain aspects of the invention when four parameters characterize a movement of an object tracked in consecutive (or close) images of a sequence of images. These four parameters here are a translation denoted (T x , T y ), a rotation denoted θ around the optical axis of the image sensor and a scale factor denoted s. These four parameters represent a similarity, that is to say the transformation enabling a point M of a plane to be transformed into a point M′:
  • X M′ = s · (X M − X O ) · cos(θ) − s · (Y M − Y O ) · sin(θ) + T x + X O
  • Y M′ = s · (X M − X O ) · sin(θ) + s · (Y M − Y O ) · cos(θ) + T y + Y O
  • the points M s and M sθ represent the transformation of the point M according to the change of scale s alone and according to the change of scale s combined with the rotation θ, respectively.
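  • For reference, a direct transcription of the two relationships above (with (X O , Y O ) taken as the centre around which the similarity is expressed) could look as follows; it is only a numerical restatement of the formulas, with illustrative variable names.

      import math

      def apply_similarity(xm, ym, xo, yo, s, theta, tx, ty):
          # Transforms the point M into M' according to the similarity (Tx, Ty, theta, s)
          # expressed around the centre (Xo, Yo).
          xm2 = s * (xm - xo) * math.cos(theta) - s * (ym - yo) * math.sin(theta) + tx + xo
          ym2 = s * (xm - xo) * math.sin(theta) + s * (ym - yo) * math.cos(theta) + ty + yo
          return xm2, ym2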
  • the partial derivatives of each point considered, that is to say the movements associated with each of those points, are weighted according to the associated movement.
  • the points of interest moving the most have greater importance in the estimation of the parameters, which avoids the points of interest linked to the background disturbing the tracking of objects.
  • Y O′ = Y O + W GC · (Y GC − Y O ) + W T · T y
  • (X GC , Y GC ) represents the center of gravity of the points of interest in the current image, W GC represents the weight given to the influence of the current center of gravity, and W T the weight given to the influence of the translation.
  • the parameter W GC is positively correlated here with the velocity of movement of the tracked object whereas the parameter W T may be fixed depending on the desired influence of the translation.
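  • A minimal sketch of this update of the centre of the region of interest is given below; the symmetric update of the X coordinate and the choice of weights are assumptions made for the illustration, since only the Y relationship is given above.

      def update_roi_center(xo, yo, x_gc, y_gc, tx, ty, w_gc, w_t):
          # (xo, yo): previous centre of the region of interest
          # (x_gc, y_gc): centre of gravity of the points of interest in the current image
          # (tx, ty): estimated translation; w_gc, w_t: weighting parameters
          xo_new = xo + w_gc * (x_gc - xo) + w_t * tx  # assumed analogous to the Y relationship
          yo_new = yo + w_gc * (y_gc - yo) + w_t * ty
          return xo_new, yo_new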
  • FIG. 6 comprising FIGS. 6 a , 6 b and 6 c , illustrates an example of implementation of the invention in the context of a driving simulation game in which two regions of interest enable the tracking of a user's hands in real time, characterizing a vehicle steering wheel movement, in a sequence of images.
  • FIG. 6 a is a pictorial presentation of the context of the game
  • FIG. 6 b represents the display of the game as perceived by a user
  • FIG. 6 c illustrates the estimation of the movement parameters, or degrees of freedom, of the tracked objects to deduce therefrom a movement of a vehicle steering wheel.
  • FIG. 6 a comprises an image 600 extracted from the sequence of images provided by the image sensor used. The latter is placed facing the user, as if it were fastened to the windshield of the vehicle driven by the user.
  • This image 600 here contains a zone 605 comprising two circular regions of interest 610 and 615 associated with a steering wheel 620 drawn in overlay by computer graphics.
  • the image 600 also comprises elements of the real scene in which the user is situated.
  • the frame of reference Ow here corresponds to an overall frame of reference (“world” frame of reference)
  • the frame of reference Owh is a local frame of reference linked to the steering wheel 620
  • the frames of reference Oa 1 and Oa 2 are two local frames of reference linked to the regions of interest 610 and 615 .
  • the vectors Va 1 (Xva 1 , Yva 1 ) and Va 2 (Xva 2 , Yva 2 ) are the movement vectors resulting from the analysis of the movement of the user's hands in the regions of interest 610 and 615 , expressed in the frames of reference Oa 1 and Oa 2 , respectively.
  • ⁇ 1 and ⁇ 2 represent the rotation of the user's hands.
  • ⁇ 1 may be computed by the following relationship:
  • ⁇ 1 a tan2( Yva 1 wh, D/ 2)
  • ⁇ 2 may be computed in similar manner.
  • the new diameter D′ of the steering wheel is computed on the basis of its previous diameter D and of the movement of the user's hands (determined via the two regions of interest 610 and 615).
  • the game scenario may in particular compute a corresponding computer graphics image.
  • FIG. 7 illustrates an example of a device which may be used to identify the movements of objects represented in images provided by a video camera and to trigger particular actions according to identified movements.
  • the device 700 is for example a mobile telephone of smartphone type, a personal digital assistant, a micro-computer or a workstation.
  • the device 700 preferably comprises a communication bus 702 to which are connected:
  • the device 700 may also have the following items:
  • the communication bus allows communication and interoperability between the different elements included in the device 700 or connected to it.
  • the representation of the bus is non-limiting and, in particular, the central processing unit may communicate instructions to any element of the device 700 directly or by means of another element of the device 700 .
  • the executable code of the programs can be received by the intermediary of the communication network 728 , via the interface 726 , in order to be stored in an identical fashion to that described previously.
  • program or programs may be loaded into one of the storage means of the device 700 before being executed.
  • the central processing unit 704 will control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, these instructions being stored on the hard disk 720 or in the read-only memory 706 or in the other aforementioned storage elements.
  • the program or programs which are stored in a non-volatile memory for example the hard disk 720 or the read only memory 706 , are transferred into the random-access memory 708 , which then contains the executable code of the program or programs according to the invention, as well as registers for storing the variables and parameters necessary for implementation of the invention.
  • the communication apparatus comprising the device according to the invention can also be a programmed apparatus.
  • This apparatus then contains the code of the computer program or programs, for example fixed in an application-specific integrated circuit (ASIC).

Abstract

The invention relates in particular to the detection of interactions with a software application according to a movement of an object situated in the field of an image sensor. After having received a first and a second image and having identified a first region of interest in the first image, a second region of interest, corresponding to the first region of interest, is identified in the second image. The first and second regions of interest are compared and a mask of interest characterizing a variation of at least one feature of corresponding points in the first and second regions of interest is determined. A movement of the object is then determined from said mask of interest. The movement is analyzed and, in response, a predetermined action is triggered or not triggered.

Description

  • The present invention concerns the detection of objects by the analysis of images, and their tracking, in a video stream representing a sequence of images and more particularly a method and a device for detecting and tracking non-rigid objects in movement, in real time, in a video stream, enabling a user to interact with a computer system.
  • Augmented reality in particular seeks to insert one or more virtual objects in images of a video stream representing a sequence of images. According to the type of application, the position and orientation of those virtual objects may be determined by data that are external to the scene represented by the images, for example coordinates obtained directly from a game scenario, or by data linked to certain elements of that scene, for example coordinates of a particular point in the scene such as the hand of a player. When the nature of the objects present in the real scene has been identified and the position and the orientation have been determined by data linked to certain elements of that scene, it may be necessary to track those elements according to movements of the video camera or movements of those elements themselves in the scene. The operations of tracking elements and embedding virtual objects in the real images may be executed by different computers or by the same computer.
  • Furthermore, in such applications, it may be proposed to users to interact, in the real scene represented, at least partially, by the stream of images, with a computer system in order in particular to trigger particular actions or scenarios which for example enable the interaction with virtual elements superposed on the images.
  • The same applies in numerous other types of applications, for example in video game applications.
  • With these aims, it is necessary to detect particular movements, such as hand movements, in order to identify one or more predetermined commands. Such commands are comparable to those initiated by a computer pointing device such as a mouse.
  • The applicant has developed algorithms for visual tracking of textured objects, having varied geometries, not using any marker and whose originality lies in the matching of particular points between a current image of a video stream and a set of key images which are automatically obtained on initializing the system. However, such algorithms, described in French patent applications 0753482, 0752810, 0902764, 0752809 and 0957353, do not enable the detection of movements of objects that are not textured or that have a practically uniform texture such as the hands of a user. Furthermore, they are essentially directed to the tracking of rigid objects.
  • Although solutions are known enabling a user to interact with a computer system, in a scene represented by a sequence of images, those solutions are generally complex to implement.
  • More particularly, a first solution consists in using tactile sensors which are associated, for example, with the joints of a user or actor. Although this approach is often dedicated to movement tracking applications, in particular for cinematographic special effects, it is also possible to track the position and the orientation of an actor and, in particular, of his hands and feet to enable him to interact with a computer system in a virtual scene. However, the use of this technique proves to be costly since it requires the insertion, in the scene represented by the stream of images analyzed, of cumbersome sensors which may furthermore suffer from disturbance linked to their environment (for example electromagnetic interference).
  • Another solution, developed in particular in the European projects “OCETRE” and “HOLONICS” consists in using several image sources, for example several video cameras, to enable real time three dimensional reconstruction of the environment and of the spatial movements of the users. An example of such approaches is in particular described in the document entitled “Holographic and action capture techniques”, T. Rodriguez, A. Cabo de Leon, B. Uzzan, N. Livet, E. Boyer, F. Geffray, T. Balogh, Z. Megyesi and A. Barsi, August 2007, SIGGRAPH '07, ACM SIGGRAPH 2007, Emerging Technologies. It is to be noted that these applications may enable the geometry of the real scene to be reproduced but do not currently enable precise movements to be identified. Furthermore, to meet real time constraints, it is necessary to set up complex and costly hardware architectures.
  • Touch screens are also known for viewing augmented reality scenes which enable interactions of a user with a computer system to be determined. However, these screens are costly and poorly adapted to the applications of augmented reality.
  • As regards the interactions of users in the field of video games, an image is typically captured from a webcam type video camera connected to a computer or to a console. After having been stored in a memory of the system to which the video camera is connected, this image is generally analyzed by an object tracking algorithm, also referred to as blobs tracking, to compute in real time the contours of certain elements of the user who is moving in the image by using, in particular, an optical flow algorithm. The position of those shapes in the image enables certain parts of the displayed image to be modified or deformed. This solution thus enables the disturbance in a zone of the image to be located in two degrees of freedom.
  • However, the limits of this approach are mainly a lack of precision, since it is not possible to maintain the proper execution of the process during a displacement of the video camera, and a lack of semantics, since it is not possible to distinguish between movements in the foreground and in the background. Furthermore, this solution uses optical flow image analysis which, in particular, does not provide robustness to changes in lighting or to noise.
  • Also known is an approach to real time detection of an interaction between a user and a computer system in an augmented reality scene, based on an image of a sequence of images, the interaction resulting from the modification of the appearance of the representation of an object present in the image. However, this method, described in particular in French patent application No. 0854382, does not enable precise movements of the user to be identified and only applies to sufficiently textured zones of the image.
  • The invention enables at least one of the problems set forth above to be solved.
  • The invention is thus directed to a computer method for detecting interactions with a software application according to a movement of at least one object situated in the field of an image sensor connected to a computer implementing the method, said image sensor providing a stream of images to said computer, the method comprising the following steps,
      • receiving at least one first image from said image sensor;
      • identifying at least one first region of interest in said first image, said at least one first region of interest corresponding to a part of said at least one first image;
      • receiving at least one second image from said image sensor;
      • identifying at least one second region of interest of said at least one second image, said at least one second region of interest corresponding to said at least one first region of interest of said at least one first image;
      • comparing said at least one first and second regions of interest and determining a mask of interest characterizing a variation of at least one feature of corresponding points in said at least one first and second regions of interest;
      • determining a movement of said at least one object from said mask of interest, said at least one object being at least partially represented in at least one of said at least one first and second regions of interest; and
      • analyzing said movement and, in response to said analyzing step, triggering or not triggering a predetermined action.
  • The method according to the invention thus enables objects to be tracked, in particular deformable objects with little texture, in particular for augmented reality applications. Furthermore, the limited quantity of processing enables the method to be implemented in devices having limited resources (in particular in terms of computation) such as mobile platforms. Moreover, the method may be used with an image sensor of low quality.
  • The method according to the invention enables fast movements of objects to be tracked, even in the presence of blur in the images acquired by the image sensor. In addition, the processing according to the method of the invention does not depend on specific color properties of the moving objects, and it is thus possible to track objects such as a hand or a textured object in movement in front of the image sensor used.
  • The number of degrees of freedom defining the movements of each tracked object may be set for each region of interest.
  • It is possible to track several zones of interest simultaneously, in particular in order to enable multiple control. Thus, for example, the tracking of two hands enables the number of possible interactions between a user and a software application to be increased.
  • Advantageously, said step of determining a movement comprises a step of determining and matching at least one pair of points of interest in said at least one first and second images, at least one point of said at least one pair of points of interest belonging to said mask of interest. The method according to the invention thus enables the advantages linked to the tracking of points of interest to be combined while limiting the zones where those points are located in order to limit the processing and to concentrate on the tracked object.
  • According to a particular embodiment, said step of determining a movement comprises a step of determining and matching a plurality of pairs of points of interest in said at least one first and second images, at least one point of each of said pairs of points of interest belonging to said mask of interest, said movement being estimated on the basis of a transformation of a first set of points of interest into a second set of points of interest, the points of interest of said first and second sets belonging to said plurality of pairs of points of interest, the points of interest of said first set of points of interest furthermore belonging to said at least one first image and the points of interest of said second set of points of interest furthermore belonging to said at least one second image. The general movement of a part of an object may thus be determined from the movements of a set of points of interest.
  • Said transformation preferably implements a weighting function based on a distance between two points of interest from the same pairs of points of interest of said plurality of pairs of points of interest in order to improve the estimation of the movement of the tracked object.
  • Still according to a particular embodiment, the method further comprises a step of validating at least one point of interest of said at least one first image, belonging to said at least one pair of points of interest, according to said determined movement, said at least one validated point of interest being used to track said object in at least one third image following said at least one second image and said at least one validated point of interest being used for modifying a mask of interest created on the basis of said at least one second and third images. It is thus possible to use points of interest which are the same from image to image if they efficiently contribute to the general movement estimation of the tracked object. Furthermore, the validated points of interest are used to select new points of interest in order to avoid an excessive accumulation of points of interest in a limited region.
  • Said step of comparing said at least one first and second regions of interest comprises a step of performing subtraction, point by point, of values of corresponding points of said at least one first and second regions of interest and a step of comparing a result of said subtraction to a predetermined threshold. Such an embodiment makes it possible to combine the effectiveness of the method and limiting processing resources.
  • According to a particular embodiment, the method further comprises a step of detecting at least one predetermined feature in said at least one first image, said at least one first region of interest being at least partially identified in response to said detecting step. The method according to the invention may thus be automatically initialized or re-initialized according to elements of the content of the processed image. Such a predetermined feature is, for example, a predetermined shape and/or a predetermined color.
  • Advantageously, the method further comprises a step of estimating at least one modified second region of interest in said at least one second image, said at least one modified second region of interest of said at least one second image being estimated according to said at least one first region of interest of said at least one first image and of said at least one second region of interest of said at least one second image. The method according to the invention thus makes it possible to anticipate the processing of the following image for the object tracking. Said estimation of said at least one modified second region of interest of said at least one second image for example implements an object tracking algorithm of KLT type.
  • Said movement may in particular be characterized by a translation, a rotation and/or a scale factor.
  • When said movement is characterized by a scale factor, whether or not said predetermined action is triggered may be determined on the basis of said scale factor. Thus, a scale factor may, for example, characterize a mouse click.
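  • For instance, a minimal sketch of such a trigger could be written as follows; the threshold value and the symmetric test for movements away from the camera are illustrative assumptions.

      def is_click(scale_factor, threshold=1.25):
          # A marked change of scale between two images (for example a hand moved
          # quickly towards or away from the camera) is interpreted as a "mouse click".
          return scale_factor > threshold or scale_factor < 1.0 / threshold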
  • According to a particular embodiment, the movements of at least two objects situated in the field of said image sensor are determined, whether or not said predetermined action is triggered being determined according to a combination of the movements associated with said at least two objects. It is thus possible to determine a movement of an object on the basis of movements of other objects, in particular other objects subjected to constraints of relative position.
  • The invention is also directed to a computer program comprising instructions adapted to the implementation of each of the steps of the method described earlier when said program is executed on a computer as well as a device comprising means adapted to the implementation of each of the steps of the method described earlier. The advantages of this computer program and of this method are similar to those referred to earlier.
  • Other advantages, objects and features of the present invention will emerge from the following detailed description, given by way of non-limiting example, relative to the accompanying drawings in which:
  • FIG. 1, comprising FIGS. 1 a and 1 b, illustrates two successive images of a stream of images that may be used to determine the movement of objects and the interaction of a user;
  • FIG. 2, comprising FIGS. 2 a to 2 d, illustrates examples of variation in a region of interest of an image with the corresponding region of interest of a following image;
  • FIG. 3 is a diagrammatic illustration of the determination of a movement of an object of which at least one part is represented in a region and in a mask of interest of two consecutive images;
  • FIG. 4 is a diagrammatic illustration of certain steps implemented in accordance with the invention to identify, in continuous operation, variations in position of objects between two consecutive (or close) images of a sequence of images;
  • FIG. 5 illustrates certain aspects of the invention when four parameters characterize a movement of an object tracked in consecutive (or close) images of a sequence of images;
  • FIG. 6, comprising FIGS. 6 a, 6 b and 6 c, illustrates an example of implementation of the invention in the context of a driving simulation game in which two regions of interest enable the tracking of a user's hands in real time, characterizing a vehicle steering wheel movement, in a sequence of images; and,
  • FIG. 7 illustrates an example of a device adapted to implement the invention.
  • In general terms, the invention concerns the tracking of objects in particular regions of images in a stream of images, those regions, termed regions of interest, comprising a part of the tracked objects and a part of the scene represented in the images. It has been observed that the analysis of regions of interest makes it possible to speed up the processing time and to improve the movement detection of objects.
  • The regions of interest are, preferably, defined as two-dimensional shapes, in an image. These shapes are, for example, rectangles or circles. They are preferably constant and predetermined. The regions of interest may be characterized by points of interest, that is to say singular points, such as points having a high luminance gradient, and the initial position of the regions of interest may be predetermined, be determined by a user, by an event such as the appearance of a shape or a color or according to predefined features, for example using key images. These regions may also be moved depending on the movement of tracked objects or have a fixed position and orientation in the image. The use of several regions of interest makes it possible, for example, to observe several concomitant interactions of a user (a region of interest may correspond to each of his hands) and/or several concomitant interactions of several users.
  • The points of interest are used in order to find the variation of the regions of interest, in a stream of images, from one image to a following (or close) image, according to techniques of tracking points of interest based, for example, on algorithms known under the name of FAST, for the detection, and KLT (initials of Kanade, Lucas and Tomasi), for tracking in the following image. The points of interest of a region of interest may vary over the images analyzed, in particular according to the distortion of the objects tracked and their movements which may mask parts of the scene represented in the images and/or make parts of those objects leave the zones of interest.
  • Furthermore, the objects whose movements may create an interaction are tracked in each region of interest according to a mechanism for tracking points of interest in masks defined in the regions of interest.
  • FIGS. 1 and 2 illustrate the general principle of the invention.
  • FIG. 1, comprising FIGS. 1 a and 1 b, illustrates two successive images of a stream of images that may be used to determine the movement of objects and the interaction of a user.
  • As illustrated in FIG. 1 a, image 100-1 represents a scene having fixed elements (not represented), such as elements of decor, and mobile elements, here linked to animate characters (real or virtual). The image 100-1 here comprises a region of interest 105-1. As indicated previously, several regions of interest may be processed simultaneously; however, in the interest of clarity, a single region of interest is represented here, the processing of the regions of interest being similar for each of them. It is considered that the shape of the region of interest 105-1 as well as its initial position are predetermined.
  • Image 100-2 of FIG. 1 b represents an image following the image 100-1 of FIG. 1 a in a sequence of images. It is possible to define, in the image 100-2, a region of interest 105-2, corresponding to the position and to the dimensions of the region of interest 105-1 defined in the preceding image, in which disturbances may be estimated. The region of interest 105-1 is thus compared to the region of interest 105-2 of FIG. 1 b, for example by subtracting those image parts one from another, pixel by pixel (pixel being an acronym for PICture ELement), in order to extract therefrom a map of pixels that are considered to be in movement. These pixels in movement constitute a mask of pixels of interest (presented in FIG. 2).
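  • A minimal sketch of this comparison, assuming 8-bit grayscale regions of interest of identical size and an illustrative threshold value, could be written as follows.

      import cv2

      def motion_mask(roi_prev, roi_curr, threshold=25):
          # roi_prev, roi_curr: grayscale regions of interest 105-1 and 105-2
          diff = cv2.absdiff(roi_prev, roi_curr)               # pixel-by-pixel subtraction
          _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
          return mask                                          # non-zero pixels are "in movement"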
  • Points of interest, generically referenced 110 in FIG. 1 a, may be determined in the image 100-1, in particular in the region of interest 105-1, according to standard algorithms for image analysis. These points of interest may be advantageously detected at positions in the region of interest which belong to the mask of pixels of interest.
  • The points of interest 110 defined in the region of interest 105-1 are tracked in the image 100-2, preferably in the region of interest 105-2, for example using the KLT tracking principles by comparing portions of the images 100-1 and 100-2 that are associated with the neighborhoods of the points of interest.
  • These matches denoted 115 between the image 100-1 and the image 100-2 make it possible to estimate the movements of the hand represented with the reference 120-1 in image 100-1 and the reference 120-2 in image 100-2. It is thus possible to obtain the new position of the hand in the image 100-2.
  • The movement of the hand may next be advantageously used to move the region of interest 105-2 from the image 100-2 to the modified region of interest 125 which may be used for estimating the movement of the hand in an image following the image 100-2 of the image stream. The method of tracking objects may thus continue recursively.
  • It is to be noted here that, as stated earlier, certain points of interest present in the image 100-1 have disappeared from the image 100-2 due, in particular, to the presence and movements of the hand.
  • The determination of points of interest in an image is, preferably, limited to the zone corresponding to the region of interest as located in the current image, or to a zone comprising all or part thereof when a mask of interest of pixels in movement is defined in that region of interest.
  • According to a particular embodiment, estimation is made of information characterizing the relative positions and orientations of the objects to track (for example the hand referenced 120-1 in FIG. 1 a) in relation to a reference linked to the video camera from which the images come. Such information is, for example, two-dimensional position information (x, y), orientation information (θ) and information on the distance to the video camera, that is to say the scale (s) of the objects to track.
  • Similarly, it is possible to track the modifications that have occurred in the region of interest 125 that is defined in the image 100-2 relative to the region of interest 105-1 of the image 100-1 according to a movement estimated between the image 100-2 and the following image of the stream of images. For these purposes, a new region of interest is first of all identified in the following image on the basis of the region of interest 125. When the region of interest has been identified, it is compared with the region of interest 125 in order to determine the modified elements, forming a mask comprising parts of objects whose movements must be determined.
  • FIG. 2, comprising FIGS. 2 a to 2 c, illustrates the variation of a region of interest of one image in comparison with the corresponding region of interest, at the same position, of a following image, as described with reference to FIG. 1. The image resulting from this comparison, having the same shape as the region of interest, is formed of pixels which here may take two states, a first state being associated, by default, with each pixel. A second state is associated with the pixels corresponding to the pixels of the regions of interest whose variation exceeds a predetermined threshold. This second state forms a mask used here to limit the search for points of interest to zones which are situated on tracked objects or that are close to those tracked objects in order to characterize the movement of the tracked objects and, possibly, to trigger particular actions.
  • FIG. 2 a represents a region of interest of a first image whereas FIG. 2 b represents the corresponding region of interest of a following image, at the same position. As illustrated in FIG. 2 a, the region of interest 200-1 comprises a hand 205-1 as well as another object 210-1. Similarly, the corresponding region of interest, referenced 200-2 and illustrated in FIG. 2 b, comprises the hand and the object, here referenced 205-2 and 210-2, respectively. The hand, generically referenced 205, has moved substantially whereas the object, generically referenced 210, has only moved slightly.
  • FIG. 2 c illustrates the image 215 resulting from the comparison of the regions of interest 200-1 and 200-2. The black part, forming a mask of interest, represents the pixels whose difference is greater than a predetermined threshold whereas the white part represents the pixels whose difference is less than that threshold. The black part comprises in particular the part referenced 220 corresponding to the difference in position of the hand 205 between the regions of interest 200-1 and 200-2. It also comprises the part 225 corresponding to the difference in position of the object 210 between those regions of interest. The part 230 corresponds to the part of the hand 205 present in both these regions of interest.
  • The image 215 represented in FIG. 2 c may be analyzed to deduce therefrom an interaction between the user who moved his hand in the field of the video camera from which come the images from which are extracted the regions of interest 200-1 and 200-2 and a computer system processing those images. Such an analysis may in particular consist in identifying the movement of points of interest belonging to the mask of interest so formed, the search for points of interest then preferably being limited to the mask of interest.
  • However, a skeletonizing step making it possible in particular to eliminate adjoining movements such as the movement referenced 225 is, preferably, carried out before analyzing the movement of the points of interest belonging to the mask of interest. This skeletonizing step may take the form of a morphological processing operation such as for example operations of opening or closing applied to the mask of interest.
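  • By way of illustration only, such a morphological clean-up of the mask of interest could be sketched as follows; the use of OpenCV, the elliptical structuring element and the kernel size are assumptions, not part of the method described above.

```python
import cv2

def clean_mask(mask, kernel_size=5):
    """Morphological clean-up of an 8-bit binary mask of interest
    (moving pixels assumed to be 255)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    # Opening removes small isolated blobs such as the adjoining movement 225.
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Closing fills small holes inside the remaining moving regions.
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
```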
  • Furthermore, advantageously, the mask of interest obtained is modified in order to eliminate the parts situated around the points of interest identified recursively between the image from which is extracted the region of interest 200-1 and the image preceding it.
  • FIG. 2 d thus illustrates the mask of interest represented in FIG. 2 c, here referenced 235, from which the parts 240 situated around the points of interest identified by 245 have been eliminated. The parts 240 are, for example, circular and here have a predetermined radius.
  • The mask of interest 235 thus has cropped from it the zones in which already detected points of interest are situated and where it is therefore not necessary to detect new ones. In other words, this modified mask of interest 235 excludes a part of the mask of interest 220 in order to avoid the accumulation of points of interest in the same zone of the region of interest.
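  • The exclusion of zones around already validated points of interest may, for example, be sketched as follows; the helper name, the use of OpenCV and the default radius of 10 pixels are illustrative assumptions.

```python
import cv2

def exclude_tracked_points(mask, validated_points, radius=10):
    """Zero out disks of a predetermined radius around already validated
    points of interest so that no new points are detected there."""
    out = mask.copy()
    for (x, y) in validated_points:
        cv2.circle(out, (int(x), int(y)), radius, color=0, thickness=-1)
    return out
```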
  • Again, the mask of interest 235 may be used to identify points of interest whose movements may be analyzed in order to trigger, where applicable, a particular action.
  • FIG. 3 is again a diagrammatic illustration of the determination of a movement of an object of which at least one part is represented in a region and a mask of interest of two consecutive (or close) images. The image 300 here corresponds to the mask of interest resulting from the comparison of the regions of interest 200-1 and 200-2 as described with reference to FIG. 2 d. However, a skeletonizing step has been carried out to eliminate the disturbances (in particular the disturbance 225). Thus, the image 300 comprises a mask 305 which may be used for identifying new points of interest whose movements characterize the movement of objects in that region of interest.
  • By way of illustration, the point of interest corresponding to the end of the user's index finger is shown. Reference 310-1 designates this point of interest according to its position in the region of interest 200-1 and reference 310-2 designates that point of interest according to its position in the region of interest 200-2. Thus, by using standard techniques for tracking points of interest, for example an algorithm for tracking by optical flow, it is possible, on the basis of the point of interest 310-1 of the region of interest 200-1, to find the corresponding point of interest 310-2 of the region of interest 200-2 and, consequently, to find the corresponding translation.
  • The analysis of the movements of several points of interest, in particular of the point of interest 310-1 and of the points of interest detected and validated beforehand, for example the points of interest 245, makes it possible to determine a set of movement parameters for the tracked object, in particular which are linked to a translation, a rotation and/or a change of scale.
  • FIG. 4 is a diagrammatic illustration of certain steps implemented in accordance with the invention to identify, in continuous operation, variations in arrangement of objects between two consecutive (or close) images of a sequence of images.
  • The images here are acquired via an image sensor such as a video camera, in particular a video camera of webcam type, connected to a computer system implementing the method described here.
  • After having acquired a current image 400 and if that image is the first to be processed, that is to say if a preceding image 405 from the same video stream has not been processed beforehand, a first step of initializing (step 410) is executed. An object of this step is in particular to define features of at least one region of interest, for example a shape, a size and an initial position.
  • As described earlier, a region of interest may be defined relative to a corresponding region of interest determined in a preceding image (in recursive phase of tracking, in this case the initializing 410 is not necessary) or according to predetermined features and/or particular events (corresponding to the initializing phase).
  • Thus, by way of illustration, it is possible for a region of interest not to be defined in an initial state, the system being on standby for a triggering event, for example a particular movement of the user facing the video camera (the moving pixels in the image are analyzed in search for a particular movement), the location of a particular color such as the color of skin or the recognition of a particular predetermined object whose position defines that of the region of interest. Like the position, the size and the shape of the region of interest may be predefined or be determined according to features of the detected event.
  • The initializing step 410 may thus take several forms depending on the object to track in the image sequence and depending on the application implemented.
  • It may in particular be a static initialization. In this case, the initial position of the region of interest is predetermined (off-line determination) and the tracking algorithm is on standby for a disturbance.
  • The initializing phase may also comprise a step of recognizing objects of a specific type. For example, the principles of detecting descriptors of Haar wavelet type may be implemented. The principle of these descriptors is described in particular in the paper by Viola and Jones, “Rapid object detection using a boosted cascade of simple features”, Computer Vision and Pattern Recognition, 2001. These descriptors in particular enable the detection of a face, the eyes or a hand in an image or a part of an image. During the initializing phase, it is thus possible to search for particular objects either in the whole image, in order to position the region of interest on the detected object, or in a region of interest itself to trigger the tracking of the recognized object.
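  • As an illustration of such an initialization by object recognition, a possible sketch using the Haar cascades distributed with OpenCV is given below; the frontal-face cascade and the detection parameters are assumptions standing in for the face, eye or hand detector mentioned above.

```python
import cv2

# A frontal-face cascade shipped with OpenCV, used here as a stand-in
# for the object detector that positions the region of interest.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_initial_roi(gray_image):
    detections = cascade.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    if len(detections) == 0:
        return None              # no object found: stay in standby
    x, y, w, h = detections[0]   # position the region of interest on the detected object
    return (x, y, w, h)
```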
  • Another approach consists in segmenting an image and in identifying certain color properties and certain predefined shapes. When a shape and/or a segmented region of the processed image is similar to the object searched for, for example the color of the skin and the outline of the hand, the tracking process is initialized as described earlier.
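  • A minimal sketch of this color-based initialization is given below, assuming OpenCV and purely illustrative HSV bounds for skin color; the bounds and the choice of the largest segmented blob as the hand are assumptions.

```python
import cv2
import numpy as np

def detect_hand_by_color(bgr_image):
    """Very rough skin segmentation in HSV space; the bounds are illustrative."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))
    contours, _ = cv2.findContours(skin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)   # assume the largest skin blob is the hand
    return cv2.boundingRect(hand)               # (x, y, w, h) of the region of interest
```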
  • In a following step (step 415), a region of interest whose features have been determined beforehand (on initialization or in the preceding image) is positioned in the current image to extract the corresponding image part. If the current image is the first image of the video stream to be processed, that image becomes the preceding image, a new current image is acquired and step 415 is repeated.
  • This image part thus extracted is then compared with the corresponding region of interest of the preceding image (step 420). Such a comparison may in particular consist in subtracting, from each pixel of the region of interest considered of the current image, the corresponding pixel of the corresponding region of interest of the preceding image.
  • The detection of the points in movement is thus carried out, according to this example, by the absolute difference of parts of the current image and of the preceding image. This difference makes it possible to create a mask of interest capable of being used to distinguish a moving object from the decor, which is essentially static. However, as the object/decor segmentation is not expected to be perfect, it is possible to update such a mask of interest recursively on the basis of the movements in order to identify the movements of the pixels of the tracked object and the movements of the pixels which belong to the background of the image.
  • Thresholding is then preferably carried out on the difference between pixels according to a predetermined threshold value (step 425). Such thresholding may, for example, be carried out on the luminance. If coding over 8 bits is used, its value is, for example, 100. It makes it possible to isolate the pixels having a movement considered to be sufficiently great between two consecutive (or close) images. The difference between the pixels of the current and preceding images is then binary coded, for example black if the difference exceeds the predetermined threshold characterizing the movement and white in the opposite case. The binary image formed by the pixels whose difference exceeds the predetermined threshold forms a mask of interest or tracking in the region of interest considered (step 430).
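  • A possible implementation of steps 420 to 430, assuming 8-bit grayscale regions of interest and OpenCV, could look like the following sketch; the threshold value of 100 is the example value given above.

```python
import cv2

def mask_of_interest(prev_roi_gray, curr_roi_gray, threshold=100):
    """Binary mask of the pixels whose luminance varies by more than the
    predetermined threshold between two corresponding regions of interest."""
    diff = cv2.absdiff(curr_roi_gray, prev_roi_gray)
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    return mask   # 255 where the movement is considered significant
```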
  • If points of interest have been validated beforehand, the mask is modified (step 460) in order to exclude from the mask zones in which points of interest are recursively tracked. Thus, as represented by the use of a dashed line, step 460 is only carried out if there are validated points of interest. As indicated earlier, this step consists in eliminating zones, for example disks of a predetermined diameter, from the created mask around points of interest validated beforehand.
  • Points of interest are then searched for in the region of the preceding image corresponding to the mask of interest so defined (step 435), the mask of interest here being the mask of interest created at step 430 or the mask of interest created at step 430 and modified during step 460.
  • The search for points of interest is, for example, limited to the detection of twenty points of interest. Naturally, this number may be different and may be estimated according to the size of the mask of interest.
  • This search is advantageously carried out with the algorithm known by the name FAST. According to this algorithm, a Bresenham circle, for example with a perimeter of 16 pixels, is constructed around each pixel of the image. If k contiguous pixels (k typically having a value of 9, 10, 11 or 12) contained in that circle all have an intensity greater than that of the central pixel, or all have an intensity lower than that of the central pixel, that central pixel is considered as a point of interest. It is also possible to identify points of interest with an approach based on image gradients, as provided in the approach known by the name of Harris corner detection.
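  • By way of illustration, the detection of points of interest restricted to the mask of interest might be sketched as follows with the FAST detector available in OpenCV; the limit of twenty points follows the preceding paragraph, whereas the FAST threshold value is an assumption.

```python
import cv2

fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)

def detect_points(prev_roi_gray, mask, max_points=20):
    """Detect FAST points of interest only inside the mask of interest."""
    keypoints = fast.detect(prev_roi_gray, mask)
    # Keep the strongest responses, up to the chosen limit.
    keypoints = sorted(keypoints, key=lambda k: k.response, reverse=True)[:max_points]
    return [k.pt for k in keypoints]
```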
  • The points of interest detected in the preceding image according to the mask of interest as well as, where applicable, the points of interest detected and validated beforehand are used to identify the corresponding points of interest in the current image.
  • A search for corresponding points of interest in the current image is thus carried out (step 440), preferably using a method known under the name of optical flow. The use of this technique gives better robustness when the image is blurred, in particular thanks to the use of pyramids of images smoothed by a Gaussian filter. This is for example the approach implemented by Lucas, Kanade and Tomasi in the algorithm known under the name KLT.
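  • A corresponding sketch of the pyramidal Lucas-Kanade (KLT) tracking of those points, assuming OpenCV, is given below; the window size, pyramid depth and termination criteria are illustrative assumptions.

```python
import cv2
import numpy as np

def track_points(prev_gray, curr_gray, prev_points):
    """Pyramidal Lucas-Kanade tracking of the points of interest of the
    preceding image into the current image."""
    p0 = np.float32(prev_points).reshape(-1, 1, 2)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, p0, None,
        winSize=(21, 21), maxLevel=3,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    matched = [(tuple(a.ravel()), tuple(b.ravel()))
               for a, b, ok in zip(p0, p1, status.ravel()) if ok]
    return matched   # list of (position in preceding image, position in current image)
```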
  • When the points of interest of the current image, corresponding to the points of interest of the preceding image (which are determined according to the mask of interest or by recursive tracking), have been identified, movement parameters are estimated for objects tracked in the region of interest of the preceding image relative to the region of interest of the current image (step 445). Such parameters, also termed degrees of freedom, comprise, for example, a parameter of translation along the x-axis, a parameter of translation along the y-axis, a rotation parameter and/or a scale parameter; the transformation grouping together these four parameters and making a set of two-dimensional points pass from one plane to another is termed a similarity. These parameters are, preferably, estimated using the Nonlinear Least Squares Error (NLSE) or Gauss-Newton method. This method is directed to minimizing a re-projection error over the set of the tracked points of interest. In order to improve the estimation of the parameters of the model (position and orientation), it is advantageous, in a specific embodiment, to estimate those parameters separately. Thus, for example, it is relevant to apply the least squares error, in a first phase, in order to estimate only the translation parameters (x, y), these being easier to identify, then, during a second iteration, to compute the parameters of scale change and/or of rotation (possibly less precisely).
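  • As a simplified alternative to the iterative Gauss-Newton estimation described above, the four parameters of the similarity can also be obtained by a direct linear least-squares fit once the rotation and scale are grouped into a = s·cos(θ) and b = s·sin(θ); the following sketch, which assumes NumPy, illustrates that variant rather than the exact method of the embodiment.

```python
import numpy as np

def estimate_similarity(prev_pts, curr_pts, origin):
    """Least-squares estimate of (Tx, Ty, theta, s) for the similarity
    M' = s * R(theta) * (M - O) + T + O, linearised with
    a = s*cos(theta) and b = s*sin(theta)."""
    ox, oy = origin
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(prev_pts, curr_pts):
        dx, dy = x - ox, y - oy
        rows.append([dx, -dy, 1.0, 0.0]); rhs.append(xp - ox)   # X equation
        rows.append([dy,  dx, 0.0, 1.0]); rhs.append(yp - oy)   # Y equation
    (a, b, tx, ty), *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    s = float(np.hypot(a, b))
    theta = float(np.arctan2(b, a))
    return tx, ty, theta, s
```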
  • In a following step, the points of interest of the preceding image for which a match has been found in the current image are preferably analyzed in order to recursively determine valid points of interest relative to the movement estimated in the preceding step. For these purposes, it is verified, for each previously determined point of interest of the preceding image (determined according to the mask of interest or by recursive tracking), whether the movement of the corresponding point of interest of the current image, relative to that point of interest, is in accordance with the identified movement. In the affirmative, the point of interest is considered as valid whereas in the opposite case, it is considered as not valid. A threshold, typically expressed in pixels and having a predetermined value, is advantageously used to authorize a certain margin of error between the theoretical position of the point in the current image (obtained by applying the parameters of step 445) and its real position (obtained by the tracking method of step 440).
  • The valid points of interest, here referenced 455, are considered as belonging to an object whose movement is tracked whereas the non-valid points (also termed outliers) are considered as belonging to the image background or to portions of an object which are not visible in the image.
  • As indicated earlier, the valid points of interest are tracked in the following image and are used to modify the mask of interest created by comparison of a region of interest of the current image with the corresponding region of interest of the following image (step 460), in order to exclude from that mask of pixels in movement between the current and following images the portions situated around those points, as described with reference to FIG. 2 d. This modified mask of interest makes it possible to eliminate portions of images in which points of interest are recursively tracked. The valid points of interest are thus kept for several processing operations on successive images and in particular enable stabilization of the tracking of objects.
  • The new region of interest (or modified region of interest) which is used for processing the current image and the following image is then estimated thanks to the previously estimated degrees of freedom (step 445). For example, if the degrees of freedom are x and y translations, the new position of the region of interest is estimated according to the previous position of the region of interest, using those two items of information. If a change (or changes) of scale is estimated and considered in this step, it is possible, according to the scenario considered, also to modify the size of the new region of interest which is used in the current and following images of the video stream.
  • In parallel, when the different degrees of freedom have been computed, it is possible to estimate a particular interaction according to those parameters (step 470).
  • According to a particular embodiment, the estimation of a change (or changes) of scale is used for detecting the triggering of an action in similar manner to the click of a mouse. Similarly, it is possible to use changes of orientation, particularly those around the viewing axis of the video camera (referred to as roll) in order, for example, to enable the rotation of a virtual element displayed in a scene or to control a button of “potentiometer” type in order, for example, to adjust a volume of sound of an application.
  • This detection of an interaction according to the scale factor, for example to detect an action such as a mouse click, may be implemented in the following manner: the number of images over which the norm of the movement vector (translation) and the scale factor (determined according to corresponding regions of interest) are less than certain predetermined values is counted. Such a number characterizes a stability in the movement of the tracked objects. If the number of images over which the movement is stable exceeds a certain threshold, the system enters a state of standby for the detection of a click. A click is then detected by measuring the average of the absolute differences of the scale factors between current and preceding images, this being performed over a given number of images. If the average thus computed exceeds a certain threshold, the click is validated.
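  • A rough sketch of such a click detector is given below; all the numerical values (number of stable images, observation window, movement and scale tolerances, click threshold) are illustrative assumptions standing in for the predetermined thresholds mentioned above.

```python
class ClickDetector:
    """Scale-based 'click': wait for a run of stable frames, then validate a
    click when the average absolute scale variation over a window of frames
    exceeds a threshold. All numerical values are illustrative."""

    def __init__(self, stable_frames=15, window=10,
                 motion_eps=2.0, scale_eps=0.02, click_threshold=0.08):
        self.stable_frames = stable_frames
        self.window = window
        self.motion_eps = motion_eps
        self.scale_eps = scale_eps
        self.click_threshold = click_threshold
        self.stable_count = 0
        self.scale_history = []
        self.armed = False

    def update(self, translation_norm, scale_factor):
        if not self.armed:
            # Count images over which translation and scale change stay small.
            if translation_norm < self.motion_eps and abs(scale_factor - 1.0) < self.scale_eps:
                self.stable_count += 1
            else:
                self.stable_count = 0
            if self.stable_count >= self.stable_frames:
                self.armed = True            # enter standby for the detection of a click
                self.scale_history = []
            return False
        # Standby state: observe the scale factor over a window of images.
        self.scale_history.append(scale_factor)
        self.scale_history = self.scale_history[-self.window:]
        if len(self.scale_history) < self.window:
            return False
        diffs = [abs(b - a) for a, b in zip(self.scale_history, self.scale_history[1:])]
        clicked = sum(diffs) / len(diffs) > self.click_threshold
        if clicked:
            self.armed = False
            self.stable_count = 0
        return clicked
```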
  • When an object is no longer tracked in a sequence of images (either because it disappears from the image, or because it has been lost), the algorithm preferably returns to the initializing step. Furthermore, loss of tracking leading to the initializing step being re-executed may be identified by measuring the movements of a user. Thus, it may be decided to reinitialize the method when those movements are stable or non-existent for a predetermined period or when a tracked object leaves the field of view of the image sensor.
  • FIG. 5 illustrates more precisely certain aspects of the invention when four parameters characterize a movement of an object tracked in consecutive (or close) images of a sequence of images. These four parameters here are a translation denoted (Tx, Ty), a rotation denoted θ around the optical axis of the image sensor and a scale factor denoted s. These four parameters represent a similarity, which is the transformation enabling a point M of a plane to be transformed into a point M′.
  • As illustrated in FIG. 5, O represents the origin of a frame of reference 505 for the object in the preceding image and O′ represents the origin of a frame of reference 510 of the object in the current image, the frame of reference 510 being obtained in accordance with the object tracking method, the image frame of reference here bearing the reference 500. It is then possible to express the transformation of the point M to the point M′ by the following system of non-linear equations:

  • X M′ =s·(X M −X O)·cos(θ)−s·(Y M −Y O)·sin(θ)+T x +X O

  • Y M′ =s·(X M −X O)·sin(θ)+s·(Y M −Y O)·cos(θ)+T y +Y O
  • where (XM, YM) are the coordinates of the point M expressed in the image frame of reference, (X0, Y0) are the coordinates of the point O in the image frame of reference and (XM′, YM′) are the coordinates of the point M′ in the image frame of reference.
  • The points Ms and M respectively represent the transformation of the point M according to the change in scale s alone and according to the change of scale s combined with the rotation θ.
  • As described earlier, it is possible to use the nonlinear least squares error approach to solve this system by using all the points of interest tracked in step 440 described with reference to FIG. 4.
  • To compute the new position of the object in the current image (step 465 of FIG. 4), it suffices theoretically to apply the estimated translation (Tx,Ty) to the previous position of the object in the following manner:

  • X 0′ =X 0 +T x

  • Y 0′ =Y 0 +T y
  • where (X0′, Y0′) are the coordinates of the point O′ in the image frame of reference.
  • Advantageously, the partial derivatives of each point considered, that is to say the movements associated with each of those points, are weighted according to the associated movement. Thus, the points of interest moving the most have greater importance in the estimation of the parameters, which avoids the points of interest linked to the background disturbing the tracking of objects.
  • It has thus been observed that it is advantageous to add an influence of the center of gravity of the points of interest tracked in the current image to the preceding equation. This center of gravity approximately corresponds to the local center of gravity of the movement (the points tracked in the current image come from moving points in the preceding image). The center of the region of interest thus tends to translate to the center of the movement so long as the distance of the object to the center of gravity is greater than the estimated translation movement. The origin of the frame of reference in the current image, characterizing the movement of the tracked object, is advantageously computed according to the following relationship:

  • X O′ =X O +W GC·(X GC −X O)+W T ·T x

  • Y O′ =Y O +W GC·(Y GC −Y O)+W T ·T y
  • where (XGC, YGC) represent the center of gravity of the points of interest in the current image and WGC represents the weight on the influence of the current center of gravity and WT the weight on the influence of the translation. The parameter WGC is positively correlated here with the velocity of movement of the tracked object whereas the parameter WT may be fixed depending on the desired influence of the translation.
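  • The update of the origin of the region of interest according to this relationship may be sketched as follows; the function name and the default weight on the translation are assumptions.

```python
def update_roi_origin(origin, translation, gravity_center, w_gc, w_t=1.0):
    """New origin O' combining the estimated translation with the attraction
    towards the center of gravity of the tracked points of interest,
    following the relationship given above."""
    xo, yo = origin
    tx, ty = translation
    xgc, ygc = gravity_center
    xo_new = xo + w_gc * (xgc - xo) + w_t * tx
    yo_new = yo + w_gc * (ygc - yo) + w_t * ty
    return xo_new, yo_new
```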
  • FIG. 6, comprising FIGS. 6 a, 6 b and 6 c, illustrates an example of implementation of the invention in the context of a driving simulation game in which two regions of interest enable the tracking of a user's hands in real time, characterizing a vehicle steering wheel movement, in a sequence of images.
  • More specifically, FIG. 6 a is a pictorial presentation of the context of the game, whereas FIG. 6 b represents the display of the game as perceived by a user. FIG. 6 c illustrates the estimation of the movement parameters, or degrees of freedom, of the tracked objects to deduce therefrom a movement of a vehicle steering wheel.
  • FIG. 6 a comprises an image 600 extracted from the sequence of images provided by the image sensor used. The latter is placed facing the user, as if it were fastened to the windshield of the vehicle driven by the user. This image 600 here contains a zone 605 comprising two circular regions of interest 610 and 615 associated with a steering wheel 620 drawn in overlay by computer graphics. The image 600 also comprises elements of the real scene in which the user is situated.
  • The initial position of the regions 610 and 615 is fixed on a predetermined horizontal line, at equal distances on respective opposite sides of a point representing the center of the steering wheel, while awaiting a disturbance. When the user positions his hands in these two regions, he is able to turn the steering wheel either to the left, or to the right. The movement of the regions 610 and 615 is here constrained by the radius of the circle corresponding to the steering wheel 620. The image representing the steering wheel moves with the hands of the user, for example according to the average movement of both hands.
  • The radius of the circle corresponding to the steering wheel 620 may also vary when the user moves his hands towards or away from the center of that circle.
  • These two degrees of freedom are next advantageously used to control the orientation of a vehicle (position of the hands on the circle corresponding to the steering wheel 620) and its velocity (scale factor linked to the position of the hands relative to the center of the circle corresponding to the steering wheel 620).
  • FIG. 6 b, illustrating the display 625 of the application, comprises the image portion 605 extracted from the image 600. This display enables the user to observe and control his movements. The image portion 605 may advantageously be represented as a car rear-view mirror in which the driver may observe his actions.
  • The regions 610 and 615 of the image 600 enable the movements of the steering wheel 620 to be controlled, that is to say to control the direction of the vehicle referenced 630 on the display 625 as well as its velocity relative to the elements 635 of the decor, the vehicle 630 and the elements 635 of the decor being created here by computer graphics. In accordance with the standard driving applications, the vehicle may move in the decor and hit certain elements.
  • FIG. 6 c describes more precisely the estimation of the degrees of freedom linked to each of the regions of interest, from which the degrees of freedom of the steering wheel are deduced. In this implementation, the parameters to estimate are the orientation θ of the steering wheel and its diameter D.
  • In order to analyze the components of the movement, several frames of reference are defined. The frame of reference Ow here corresponds to an overall frame of reference (“world” frame of reference), the frame of reference Owh is a local frame of reference linked to the steering wheel 620 and the frames of reference Oa1 and Oa2 are two local frames of reference linked to the regions of interest 610 and 615. The vectors Va1(Xva1, Yva1) and Va2(Xva2, Yva2) are the movement vectors resulting from the analysis of the movement of the user's hands in the regions of interest 610 and 615, expressed in the frames of reference Oa1 and Oa2, respectively.
  • The new orientation θ′ of the steering wheel is computed relative to its previous orientation θ and on the basis of the movement of the user's hands (determined via the two regions of interest 610 and 615). The movement of the steering wheel is thus a constrained movement linked to the movement of several regions of interest. The new orientation θ′ may be computed in the following manner:

  • θ′=θ+((Δθ1+Δθ2)/2)
  • where Δθ1 and Δθ2 represent the rotations of the user's hands.
  • Δθ1 may be computed by the following relationship:

  • Δθ1=atan2(Yva1wh, D/2)
  • with Yva1 wh=Xva1*sin(−(θ+π))+Yva1*cos(−(θ+π)) characterizing the translation along the y-axis in the frame of reference Owh.
  • Δθ2 may be computed in similar manner.
  • Similarly, the new diameter D′ of the steering wheel is computed on the basis of its previous diameter D and on the basis of the movement of the user's hands (determined via the two regions of interest 610 and 615). It may be computed in the following manner:

  • D′=D+((Xva1wh+Xva2wh)/2)
  • with Xva1 wh=Xva1*cos(−(θ+π))−Yva1*sin(−(θ+π)) and
  • Xva2 wh=Xva2*cos(−θ)−Yva2*sin(−θ)
  • Thus, knowing the angular position of the steering wheel and its diameter, the game scenario may in particular compute a corresponding computer graphics image.
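  • The relationships above may be combined, for example, in the following sketch which updates the orientation and the diameter of the steering wheel from the two hand movement vectors; the helper names are assumptions, and Δθ2 is computed here with the same form as Δθ1, as suggested above.

```python
import math

def update_wheel(theta, diameter, va1, va2):
    """Combine the hand movement vectors Va1 and Va2 (expressed in the local
    frames Oa1 and Oa2) into the new wheel orientation and diameter,
    following the relationships given above."""
    def to_wheel_frame(vx, vy, angle):
        # Rotate a local movement vector into the steering wheel frame Owh.
        return (vx * math.cos(angle) - vy * math.sin(angle),
                vx * math.sin(angle) + vy * math.cos(angle))

    x1, y1 = to_wheel_frame(va1[0], va1[1], -(theta + math.pi))
    x2, y2 = to_wheel_frame(va2[0], va2[1], -theta)

    d_theta1 = math.atan2(y1, diameter / 2.0)
    d_theta2 = math.atan2(y2, diameter / 2.0)

    new_theta = theta + (d_theta1 + d_theta2) / 2.0
    new_diameter = diameter + (x1 + x2) / 2.0
    return new_theta, new_diameter
```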
  • FIG. 7 illustrates an example of a device which may be used to identify the movements of objects represented in images provided by a video camera and to trigger particular actions according to identified movements. The device 700 is for example a mobile telephone of smartphone type, a personal digital assistant, a micro-computer or a workstation.
  • The device 700 preferably comprises a communication bus 702 to which are connected:
      • a central processing unit or microprocessor 704 (CPU);
      • a read only memory 706 (ROM) able to include the operating system and programs such as “Prog”;
      • a random access memory or cache memory (RAM) 708, comprising registers adapted to record variables and parameters created and modified during the execution of the aforementioned programs;
      • a video acquisition card 710 connected to a video camera 712; and
      • a graphics card 714 connected to a screen or a projector 716.
  • Optionally, the device 700 may also have the following items:
      • a hard disk 720 able to contain the aforesaid programs “Prog” and data processed or to be processed according to the invention;
      • a keyboard 722 and a mouse 724 or any other pointing device such as an optical stylus, a touch screen or a remote control enabling the user to interact with the programs according to the invention, in particular during the phases of installation and/or initialization;
      • a communication interface 726 connected to a distributed communication network 728, for example the Internet, the interface being able to transmit and receive data; and,
      • a reader for memory cards (not shown) adapted to read or write thereon data processed or to be processed according to the invention.
  • The communication bus allows communication and interoperability between the different elements included in the device 700 or connected to it. The representation of the bus is non-limiting and, in particular, the central processing unit may communicate instructions to any element of the device 700 directly or by means of another element of the device 700.
  • The executable code of each program enabling the programmable apparatus to implement the processes according to the invention may be stored, for example, on the hard disk 720 or in read only memory 706.
  • According to a variant, the executable code of the programs can be received by the intermediary of the communication network 728, via the interface 726, in order to be stored in an identical fashion to that described previously.
  • More generally, the program or programs may be loaded into one of the storage means of the device 700 before being executed.
  • The central processing unit 704 will control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, these instructions being stored on the hard disk 720 or in the read-only memory 706 or in the other aforementioned storage elements. On powering up, the program or programs which are stored in a non-volatile memory, for example the hard disk 720 or the read only memory 706, are transferred into the random-access memory 708, which then contains the executable code of the program or programs according to the invention, as well as registers for storing the variables and parameters necessary for implementation of the invention.
  • It should be noted that the communication apparatus comprising the device according to the invention can also be a programmed apparatus. This apparatus then contains the code of the computer program or programs for example fixed in an application specific integrated circuit (ASIC).
  • Naturally, to satisfy specific needs, a person skilled in the art will be able to make amendments to the preceding description.

Claims (17)

1. A computer implemented method of detecting movement of at least one object situated in a field of an image sensor, the image sensor providing a stream of images to the computer, the method comprising:
receiving at least one first image from the image sensor;
identifying at least one first region of interest in the first image, wherein the at least one first region of interest corresponds to a part of the at least one first image;
receiving at least one second image from the image sensor;
identifying at least one second region of interest in the at least one second image, wherein the at least one second region of interest corresponds to the at least one first region of interest of the at least one first image;
comparing the at least one first and second regions of interest and determining a mask of interest characterizing a variation of at least one feature of corresponding points in the at least one first and second regions of interest;
determining a movement of the at least one object from the mask of interest, wherein the at least one object is at least partially represented in at least one of the at least one first and second regions of interest;
analyzing the movement; and
determining whether to trigger an action.
2. The method according to claim 1, wherein determining the movement comprises determining and matching at least one pair of points of interest in the at least one first and second images, wherein at least one point of the at least one pair of points of interest belong to the mask of interest.
3. The method according to claim 2, wherein determining the movement comprises determining and matching a plurality of pairs of points of interest in the at least one first and second images, wherein at least one point of each of the pairs of points of interest belong to the mask of interest, wherein the movement is estimated on the basis of a transformation of a first set of points of interest into a second set of points of interest, wherein the points of interest of the first and second sets belong to the plurality of pairs of points of interest, wherein the points of interest of the first set of points of interest additionally belong to at least one first image, and wherein the points of interest of the second set of points of interest additionally belong to at least one second image.
4. The method according to claim 3, wherein the transformation implements a weighting function based on a distance between two points of interest from the same pairs of points of interest of the plurality of pairs of points of interest.
5. The method according to claim 3, further comprising validating at least one point of interest of the at least one first image, belonging to the at least one pair of points of interest, according to the determined movement, wherein the at least one validated point of interest is used to track the object in at least one third image following the at least one second image and the at least one validated point of interest is used for modifying a mask of interest created on the basis of the at least one second and third images.
6. The method according to claim 1, wherein comparing the at least one first and second regions of interest comprises performing subtraction, point by point, of values of corresponding points of the at least one first and second regions of interest and comparing a result of the subtraction to a threshold.
7. The method according to claim 1, further comprising detecting at least one feature in the at least one first image, wherein the at least one first region of interest is at least partially identified in response to the detecting.
8. The method according to claim 7, wherein the at least one feature includes at least one of a shape and a color.
9. The method according to claim 1, further comprising estimating at least one modified second region of interest in the at least one second image, wherein the at least one modified second region of interest of the at least one second image is estimated according to the at least one first region of interest of the at least one first image and of the at least one second region of interest of the at least one second image.
10. The method according to claim 9, wherein the estimating comprises performing an object tracking algorithm of KLT type.
11. The method according to claim 1, wherein the movement comprises at least one of a translation, a rotation, a scale factor.
12. The method according to claim 11, wherein the movement comprises a scale factor and wherein whether the action is triggered is determined based at least in part on the scale factor.
13. The method according to claim 1, wherein movements of at least two objects situated in the field of the image sensor are determined, and wherein whether the action is triggered is determined based at least in part on a combination of the movements associated with the at least two objects.
14. (canceled)
15. (canceled)
16. A non-transitory computer readable medium having instructions, which, when executed cause the computer to perform the method of claim 1.
17. A device configured to perform the method of claim 1.

