WO1999022336A1 - Objet specification in a bit mapped image - Google Patents

Objet specification in a bit mapped image

Info

Publication number
WO1999022336A1
Authority
WO
WIPO (PCT)
Prior art keywords
tag
tags
video image
test
vector quantity
Application number
PCT/US1998/022551
Other languages
French (fr)
Inventor
Anthony J. Isadore Barreca
Original Assignee
Magic Circle Media, Inc.
Application filed by Magic Circle Media, Inc. filed Critical Magic Circle Media, Inc.
Priority to AU11199/99A priority Critical patent/AU1119999A/en
Publication of WO1999022336A1 publication Critical patent/WO1999022336A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence


Abstract

An object specification method (10) for identifying and tracking tags (16) delineating an object (20) in a video image (22). The object specification method (10) is applicable for the placing of tags (16) in still video images (22) and further tracks the tags (16) as the video image (22) is progressed through frames to become a moving video image (22). A computer system (12) is used by an originating user to perform the inventive operation (10) culminating in a calculate vectors and vector change frames operation (56) wherein a product is created suitable for use by an end user.

Description

OBJECT SPECIFICATION IN A BIT MAPPED IMAGE
TECHNICAL FIELD
The present invention relates to the field of computer video image manipulation, and more specifically to an improved means and method for identifying objects of concern within a video image, such that identification of the objects can be maintained even as the objects move within the image. The predominant current usage of the present inventive object specification system is in the identification of moving objects in a digitized movie wherein it is desirable to treat visually identifiable objects individually.
BACKGROUND ART
Manipulation of digitized video images, both still pictures and moving video presentations, is an important aspect of the present trend toward the introduction of "multimedia" into many aspects of our lives, as well as in modern aspects of more traditional endeavors such as, for example, the creation of motion pictures. United States Patent No. 5,590,262, issued to Isadore-Barreca, teaches a method for converting a conventional "moving picture" video into a computer/user interface means. In accomplishing the method of that previous invention, it is necessary to identify, within the video presentation, particular objects of concern. As discussed in the above referenced patent disclosure, such identification can be quite laborious, and it was anticipated that methods for transferring some of that labor from the human operator to the computer might be developed in the future. It was disclosed that the designation of "hot spots", consisting of objects within a moving video, was "... accomplished by viewing each key frame and, at least until a more automated system is developed therefor, manually designating which, if any, objects or items of interest in the key frame are to be designated as the hot spots." A copending U.S. Patent Application No. 08/435,439 was subsequently directed to a method and means for automating the identification of such objects and maintaining such identification through time. Briefly, that disclosure teaches a means and method wherein an object is first identified within a single frame of a moving video image. It was anticipated that such initial identification could be accomplished using extensive originating user input, or by more automated methods, and two alternative general methods for accomplishing such identification are taught as being the then best known embodiments for accomplishing the invention. Although the known methods for identifying an object in a video image are useful for many purposes, clearly it would be desirable to have an improved means and method for better and more easily identifying, for use with a computer, objects within a digitized video image, and for tracking such objects in a moving video.
DISCLOSURE OF INVENTION Accordingly, it is an object of the present invention to provide a means and method for identifying visually perceptible objects in a digitized video image.
It is still another object of the present invention to provide a means and method for identifying visually perceptible objects which will accurately define the limits of such objects.
It is yet another object of the present invention to provide a means and method for identifying visually perceptible objects in a digitized video image which requires a minimum of operator intervention.
It is still another object of the present invention to provide a means and method for identifying visually perceptible objects using a computer which will not overly tax the capabilities of readily available computers.
It is yet another object of the present invention to provide a means and method for identifying and tracking visually perceptible objects in a video image even as such objects move within the image. Briefly, the inventive object specification system uses a computer to allow an operator to place a "tag" on an edge of an object in an image. The computer then will identify other potential tags and the operator will accept the potential tags as desired, according to the application. Potential tags are identified according to criteria which are inherent to the method and further according to criteria which can be modified as required, and tag identification is a function of the combining of such criteria according to the present inventive method.
Although the object specification method and means described and claimed herein was originally intended to aid in tracking moving objects within a digital video image, these can also be used to pick out objects in any digital bit map, including, for example, still photographs, or individual 2D or 3D graphic images. Regarding the present invention, it should be understood that computers will play a part both in the inventive method for identifying objects within a video presentation and, also, computers (more than likely, different computers than those used for originally identifying the objects) will be employed to eventually use the end product of the present invention. In this regard, there will be reference hereinafter to "originating user(s)", those being the users who use the present inventive method for initially identifying ("tagging") objects in a video image. References to "end user(s)" will be to those persons who, rather than directly using the present inventive method, will use the computer/user interface or other product produced by the originating user according to the present inventive method.
An advantage of the present invention is that operator effort is minimized in the selection of tags to identify an image in a digitized picture.
A further advantage of the present invention is that usefulness of tags is optimized by assisting in the selection of tag locations. Yet another advantage of the present invention is that objects identified according to the inventive method may be more easily tracked by a computer where the objects are in a moving video image.
Still another advantage of the present invention is that the inventive method is easy to implement using conventional readily available computers and peripheral devices. These and other objects and advantages of the present invention will become clear to those skilled in the art in view of the description of the best presently known mode of carrying out the invention and the industrial applicability of the preferred embodiment as described herein and as illustrated in the several figures of the drawing.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a flow chart depicting an object specification method according to the present invention;
Fig. 2 is a computer system having displayed thereon a video image such as is acted upon according to the present inventive method; Fig. 3 is a potential tag location diagram.
BEST MODE FOR CARRYING OUT INVENTION
The best presently known mode for carrying out the invention is an object specification method instituted, primarily, through the use of a computer. The predominant expected usage of the inventive object specification method is in the segregation of visually identifiable objects in a digitized video image such that the objects can be separately processed or treated by the computer. The inventive object specification method is depicted in a flow diagram in Fig. 1 and is designated therein by the general reference character 10. Fig. 2 is an elevational view of a computer system 12 such as is used by the originating user to practice the present invention.
Referring to Fig. 1, the object specification method 10 begins with a "place tags" operation 14. The place tags operation is discussed in detail in the issued United States Patent No. 5,590,262 ("the '262 patent"), discussed previously herein. Briefly, in the place tags operation 14 the originating user places a plurality of tags 16 (Fig. 2) on an outer edge 18 of an object 20 within a video image 22 displayed on a display screen 24 of the computer system 12. Such operations are well known, such that one skilled in the art will understand how to cause a computer 26 of the computer system 12 to place the tags 16 when the originating user points with a mouse 28 and clicks with a mouse button 30. As was discussed in more detail in the '262 patent, a tag 16 is a set of pixels that is a subset of the pixels in a single frame of the video image 22 defining the object 20 that will be specified by the originating user and tracked by the computer system 12. By definition, the tag 16 must contain at least one edge 18 of an object 20. The minimum tag size is 9 × 9 pixels, but tags may be multiples of the basic Sobel mask size (3N × 3N), where N is equal to or greater than 3.
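The size rule above is concrete enough to sketch in code. The following Python fragment is a minimal sketch with hypothetical names (the patent does not give a data structure); it represents a tag as a square pixel block and enforces the stated 3N × 3N, N ≥ 3 constraint:

```python
from dataclasses import dataclass

@dataclass
class Tag:
    """A square block of pixels placed on an object edge (hypothetical representation)."""
    x: int     # left column of the tag within the frame
    y: int     # top row of the tag within the frame
    size: int  # side length in pixels; must be 3N with N >= 3

    def __post_init__(self):
        # Tags are multiples of the basic 3x3 Sobel mask: side = 3N with N >= 3,
        # which yields the 9x9 minimum stated in the text.
        if self.size % 3 != 0 or self.size < 9:
            raise ValueError(f"tag size {self.size} is not 3N with N >= 3")

    def pixels(self, frame):
        """Return this tag's pixel block from a frame (an H x W x C numpy array)."""
        return frame[self.y:self.y + self.size, self.x:self.x + self.size]
```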
As one who is familiar with the placing of tags 16 understands, each tag will have a region of interest 32, which is generally that portion of the tag 16 which lies within the object 20, while the remainder of the tag 16 lies outside the region of interest. The determination and use in tracking of the region of interest ("ROI") 32 is known in the art, due primarily to the teachings of the '262 patent; briefly, the ROI 32 is determined as follows: Using the nearest neighbor tags 16 on either side of a tag 16 for which the ROI 32 is to be calculated, compute the angle formed by these three tags 16 with the subject tag 16 placed at the vertex. Then, the pixels in the subject tag 16 that are subtended by the angle computed in the previous step form the ROI 32 for that tag 16. Determination of the ROI 32 may also require the use of a tag interiority test to determine which side of the edge 18 is inside the ROI 32 and which is outside. A tag interiority test is briefly described hereinafter in relation to a "functional specification of heuristics". Having established the ROI 32, it can be appreciated that the edge 18 might then be redefined as being that juncture within the tag 16 with an ROI 32 that can be found using standard computational techniques for finding differences in color, saturation, brightness, or other attributes of digitally expressed color spaces.
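As a sketch of the ROI rule just described (the angle at the subject tag subtended by its two nearest neighbors), the fragment below marks which pixels of a tag fall inside that angle. The center-relative angle test and all names are assumptions for illustration, not the patented implementation:

```python
import numpy as np

def roi_mask(subject, left_neighbor, right_neighbor, size):
    """Boolean size x size mask of pixels subtended by the angle formed by the
    two neighbor tag centers with the subject tag center at the vertex.
    Inputs are (x, y) tag-center coordinates in frame space."""
    cx, cy = subject
    a1 = np.arctan2(left_neighbor[1] - cy, left_neighbor[0] - cx)
    a2 = np.arctan2(right_neighbor[1] - cy, right_neighbor[0] - cx)

    # Angle of every pixel in the tag, measured from the tag's own center.
    ys, xs = np.mgrid[0:size, 0:size]
    ang = np.arctan2(ys - size / 2.0, xs - size / 2.0)

    # A pixel is "subtended" if its angle lies in the (shorter) sweep from a1 to a2.
    lo, hi = sorted((a1, a2))
    if hi - lo <= np.pi:
        return (ang >= lo) & (ang <= hi)
    return (ang <= lo) | (ang >= hi)
```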
Returning again to Fig. 1, following the place tags operation 14 is an apply map and validity tests operation 34. A map is an algorithm that accepts as input a single tag 16 and that outputs another tag 16 that is based on the first but that is different in some way. In addition to the new output tag, a map may also generate additional optional information. Types of maps include: Filters, which are used to average or eliminate noise within a given color space or to emphasize features; Transforms, which include color space conversions, convolutions used as part of a feature extraction process such as the known Sobel process or Laplace edge detection, and difference functions used to enhance the process of feature extraction; and Statistics, whereby statistical characteristics of tags, such as those of groups with ROIs, are extracted. That is, in the apply map and validity tests operation 34, maps are used to "clean up" the tag so that it is more readily identified in later tracking operations. It is anticipated by the inventor that validity tests to be applied in the apply map and validity tests operation 34 will include an algorithm that assesses whether an individual tag 16 is valid when it is evaluated within a set of tags 16 defining an object 20. An example of a tag validity algorithm is a Tag-In-Context test, described herein in relation to a functional specification of heuristics. In a "valid?" decision operation, if a tag 16 is not valid, the originating user is returned to the place tags operation 14 to try again. When all tags 16 on the edge 18 have been established as valid, the object specification method 10 proceeds to a "roll mode" 38 wherein subsequent operations are performed on successive pairs of frames of the moving video image 22. Again, this sort of analysis has been described in greater detail in the '262 patent. First in the roll mode 38, the object specification method 10 performs a score possible matches and create list operation 40. Fig. 3 is a potential tag location diagram 42 showing a plurality of potential tag locations 16a relative to a tag 16 (the tag 16 being the location of that tag 16 in the just previous frame of the moving video image 22). As can be seen in the view of Fig. 3, the potential tag locations 16a are pixel groups which are removed from the tag by a factor "r = n", where r is the distance the potential tag location 16a is removed from the tag 16 and n can be a distance of one pixel, one unit of pixels, or one tag size, from the previous location of the tag 16. For each subsequent frame of the video image 22, a list is created for each tag 16 providing a score for each potential tag location 16a. The score possible matches and create list operation 40 is performed for each tag 16 on each object 20 of interest (since there may be more than one object 20 of interest in a frame of the video image 22) and for each frame of the moving video image 22. For the sake of clarity, this multiple looping is not shown in the simplified flow diagram of Fig. 1.
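The score possible matches and create list operation 40 can be illustrated with a short sketch. Candidate locations offset by r = n from the tag's previous position are scored here by sum of absolute differences; that metric is an assumption (the patent leaves the scoring function unspecified), and the Tag type is the hypothetical one sketched earlier:

```python
import numpy as np

def score_candidates(prev_frame, next_frame, tag, n=1):
    """Build the per-tag list of (offset, score) pairs for the potential tag
    locations 16a of Fig. 3. Lower score = better match under the assumed
    sum-of-absolute-differences metric. Assumes the tag and all candidates
    lie fully inside the frame."""
    ref = tag.pixels(prev_frame).astype(float)
    scores = []
    for dy in (-n, 0, n):          # the 8 surrounding locations at distance n,
        for dx in (-n, 0, n):      # plus (0, 0) as the "tag did not move" candidate
            y, x = tag.y + dy, tag.x + dx
            cand = next_frame[y:y + tag.size, x:x + tag.size].astype(float)
            scores.append(((dx, dy), float(np.abs(ref - cand).sum())))
    return scores
```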
Next in the object specification method 10 is a select anchor tag operation 44. An anchor tag 16b (Fig. 2) is located at that tag 16 which has the tightest cluster of scores for the several potential tag locations 16a (Fig. 3). The anchor tag 16b is the tag 16 which is used as the principal point of reference in applying certain validity tests which will be discussed hereinafter. The anchor tag 16b is selected, in the select anchor tag operation 44, by evaluating statistics associated with scoring the set of possible match tags.
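One plausible reading of "tightest cluster of scores" is minimum variance across a tag's candidate scores. A minimal sketch, assuming that reading (the patent does not fix the statistic):

```python
import statistics

def select_anchor(tag_score_lists):
    """Given one (offset, score) list per tag, as produced by score_candidates,
    return the index of the tag whose scores cluster most tightly (smallest
    population variance) -- an assumed stand-in for operation 44."""
    def spread(scores):
        return statistics.pvariance([s for _, s in scores])
    return min(range(len(tag_score_lists)), key=lambda i: spread(tag_score_lists[i]))
```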
Once the anchor tag 16b is selected, a rigid geometry test 46 is applied to determine if the object 20 is rigid (from the point of view of an observer of the image 22) such that all of the tags 16 appear to be moving in a consistent manner - as compared to a just previous frame of the video image 22. The rigid geometry test 46 tests whether the spatial relations between tags 16 delineating an object remain constant from frame to frame.
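A direct way to test that spatial relations remain constant is to compare pairwise distances between tag centers across the two frames. A sketch, with an assumed pixel tolerance (the patent states no threshold):

```python
import math

def rigid_geometry_test(prev_centers, curr_centers, tol=1.5):
    """True if every pairwise distance between tag centers is (nearly) unchanged
    from the previous frame, i.e. the tag set moved as a rigid body.
    Centers are (x, y) tuples; tol is an assumed tolerance in pixels."""
    for i in range(len(prev_centers)):
        for j in range(i + 1, len(prev_centers)):
            d_prev = math.dist(prev_centers[i], prev_centers[j])
            d_curr = math.dist(curr_centers[i], curr_centers[j])
            if abs(d_prev - d_curr) > tol:
                return False
    return True
```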
When a frame "fails" the rigid geometry test 46 flow continues to an articulated motion test 48 wherein it is determined whether a portion of the object 20 is. itself, rigid but is moving in relation to a remainder of the object 22 (such as. for example, a stationary dog wagging its tail). That is. the articulated motion test 48 determines when parts or sections of rigid objects
20 move with respect to the remainder of the object.
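The articulated-motion idea (internally rigid parts moving relative to one another) might be sketched by greedily grouping tags into subsets that each pass the rigid test. The greedy grouping strategy is purely an assumption; the patent does not specify a partitioning algorithm:

```python
def articulated_motion_test(prev_centers, curr_centers, tol=1.5):
    """Assumed sketch: split the tag set into groups that are each internally
    rigid across the two frames (the dog and its wagging tail). Articulated
    motion is reported when more than one non-trivial rigid group emerges."""
    groups = []
    for i in range(len(prev_centers)):
        placed = False
        for g in groups:
            # Tag i joins a group only if its distance to every member is preserved.
            if all(rigid_geometry_test([prev_centers[i], prev_centers[j]],
                                       [curr_centers[i], curr_centers[j]], tol)
                   for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])
    return len(groups) > 1 and all(len(g) >= 2 for g in groups)
```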
When it is determined that there is neither a rigid object nor articulated motion, flow continues to a rotation shift test 50 wherein it is determined if the object 20 is moving in such a manner that it appears to be both changing in dimension and moving - such as where it might be rotating on two axes at the same time. The rotation shift test 50 identifies objects 20 that exhibit perspective, rotation, or shading changes.
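One way to model "changing in dimension while moving" is to ask whether a single similarity transform (rotation plus uniform scale plus translation) carries the previous tag centers onto the current ones. The least-squares Procrustes fit below is an assumed mechanism for illustration, not the patent's method:

```python
import numpy as np

def rotation_shift_test(prev_centers, curr_centers, tol=1.5):
    """Fit one rotation + uniform scale + translation mapping previous tag
    centers onto current ones; pass if the residual is small. Needs at least
    two tags; tol (pixels) is an assumed threshold."""
    P = np.asarray(prev_centers, dtype=float)
    C = np.asarray(curr_centers, dtype=float)
    P0 = P - P.mean(axis=0)            # translation handled by centering
    C0 = C - C.mean(axis=0)
    U, S, Vt = np.linalg.svd(P0.T @ C0)  # orthogonal Procrustes for the rotation
    R = U @ Vt
    s = S.sum() / (P0 ** 2).sum()        # best uniform scale factor
    residual = np.abs(C0 - s * (P0 @ R)).max()
    return residual <= tol
```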
Where any of the rigid geometry test 46, the articulated motion test 48 or the rotation shift test 50 is "passed", then it will be possible to describe the movement of the tags 16 as a vector continuation from the previous frame, and so flow is directed to a store tag locations operation 52, and the roll mode 38 is repeated for each object 20 in the video image 22 and, eventually, for each frame of the video image.
Where none of the rigid geometry test 46, the articulated motion test 48 or the rotation shift test 50 is "passed", then it will not be possible to describe the movement of the tags 16 as a vector continuation from the previous frame and the new location of a tag 16 cannot be definitively located, and so flow is directed to a get help from user operation 54, wherein the originating user can, effectively, "place" a tag (if required) or direct the flow of the object specification method 10 as may otherwise be provided for in future iterations of the inventive method.
When all of the tags 16 have been located for all of the frames of the moving video image 22, a calculate vectors and vector change frames operation 56 uses the data previously obtained in the object specification method 10 to calculate, for each tag 16, a motion vector which will continue through the running of the moving video image 22 until a new motion vector for a given frame is recorded. In this manner, when the moving video image 22 is provided to an end user (as, for example, a part of a program to be purchased and used by the end user), the inventive object specification method 10 will not have to be accomplished each time the moving video image 22 is presented to the end user. Indeed, the inventive object specification method 10 could not reasonably be accomplished each time the moving video image 22 is presented to the end user, for reasons including that it requires operator intervention and far too much computing power to accomplish in "real time". Rather, the computer 26 (which will typically be a different computer 26 than the one originally employed to accomplish the object specification method 10) will keep track of the location of the object 20 using the tag vectors established in the calculate vectors and vector change frames operation 56.
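The vector-change encoding lends itself to a compact sketch: record a tag's motion vector only on the frames where it changes, and let each recorded vector persist until the next record. Function name and representation are assumptions:

```python
def vector_change_frames(positions):
    """Compress a tag's per-frame (x, y) positions into (frame_index, vector)
    records: a new record is emitted only when the frame-to-frame motion
    vector changes, and each vector is assumed to persist until the next."""
    records = []
    current = None
    for f in range(1, len(positions)):
        v = (positions[f][0] - positions[f - 1][0],
             positions[f][1] - positions[f - 1][1])
        if v != current:
            records.append((f, v))
            current = v
    return records
```

An end-user player can then advance each tag by its currently active vector on every frame, consulting the records only at the change frames, which is what makes playback feasible without re-running the method.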
As previously discussed herein, various tests are provided for determining the validity of a tag 16, both initially and as the tag 16 moves through time in the moving video image 22. Also, the validity of the position of tags 16 is tested in relation to the position of other tags to determine that the tags 16 adequately delineate the boundaries of the object 20. Object validity tests are applied during each of the rigid geometry test 46, the articulated motion test 48 and the rotation shift test 50. The object validity tests examine a set of tags 16 to determine whether or not the specified set is sufficient to validly characterize an object 20. An example of an object validity test is the Horizon test, described herein in relation to a functional specification of tests.
Regarding the functional specification of tests: A tag interiority test is a part of the apply map and validity tests operation 34. The tag interiority test automatically determines the tag 16 ROI 32 by determining which side of an edge 18 is interior, that is, which side is a part of the object 20 partially defined by the tag, and computing the relevant characteristics of that ROI 32 (such as color and/or luminance). The tag interiority test uses a topological approach that looks at the overall tag set to perform its inside/outside determination.
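A "topological approach that looks at the overall tag set" might, for example, probe a point on each side of the tag's edge and test it against the polygon formed by all the tag centers. The ray-casting test and the probe step below are assumptions for illustration only:

```python
def point_in_polygon(pt, polygon):
    """Standard ray-casting point-in-polygon test over (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def interior_side(tag_center, edge_normal, tag_polygon, step=3.0):
    """Probe a point on each side of the edge; the side whose probe falls
    inside the polygon of tag centers is taken as the ROI side. step is an
    assumed probe distance in pixels."""
    cx, cy = tag_center
    nx, ny = edge_normal
    if point_in_polygon((cx + step * nx, cy + step * ny), tag_polygon):
        return +1   # the +normal side is interior
    if point_in_polygon((cx - step * nx, cy - step * ny), tag_polygon):
        return -1   # the -normal side is interior
    return 0        # indeterminate; fall back to user inspection
```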
A tag horizon test ensures that when an originating user creates a tag set bounding (delineating) a particular object 20, no significant part of that object 20 is omitted. It may be that visual inspection by the originating user will be required to ensure that this is adequately accomplished. The tag horizon test is also presently a part of the apply map and validity tests operation 34. A tag in context test identifies tags 16 that can be ignored when tracking. These tags 16 are those that can be moved without causing a change in the tag code (that is, without being able to detect a change in the tag's characterization).
Various modifications are possible within the scope of the invention. For example, the specific tests discussed herein could be augmented or replaced with other tests. Indeed, the inventor is still experimenting with the use of various tests to be included within the broader operations described herein.
Since the object specification method 10 of the present invention may be readily produced and integrated into existing systems and methods for identifying and tracking objects in a video image, and since the advantages as described herein are provided, it is expected that it will be readily accepted in the industry. For these and other reasons, it is expected that the utility and industrial applicability of the invention will be both significant in scope and long lasting in duration.
All of the above are only some of the examples of available embodiments of the present invention. Those skilled in the art will readily observe that numerous other modifications and alterations may be made without departing from the spirit and scope of the invention.
Accordingly, the above disclosure is not intended as limiting and the appended claims are to be interpreted as encompassing the entire scope of the invention.

Claims

IN THE CLAIMS:
1. A method for identifying an object in a digitized video image, comprising the steps of:
(a) delimiting the object with a plurality of tags;
(b) checking that each of said tags meets specified requirements.
2. The method of claim 1, and further including: a roll mode which includes the steps of:
(c) identifying potential tag matches; and
(d) scoring said potential tag matches to determine an anchor tag.
3. The method of claim 1, wherein: the digitized video image is a moving picture video; and steps subsequent to (a) are performed on frames of the digitized video image subsequent to an initial frame of the moving picture video.
4. The method of claim 3, and further including: motion tests to determine if movement of a tag can be predicted according to a vector quantity.
5. The method of claim 4, wherein: the motion tests include a rigid geometry test.
6. The method of claim 4, wherein: the motion tests include a rotation shift test.
7. The method of claim 5, wherein: the motion tests include an articulated motion test.
8. The method of claim 4, and further including: a calculate vectors operation wherein movement of the tags is stored as a vector quantity.
9. The method of claim 8, wherein: a new vector quantity is calculated and recorded when movement of a tag is no longer predictable by a previous vector quantity.
10. A system for tracking an object in a moving video image, comprising: a computer with a display screen for displaying the image; and a pointing device for allowing an originating user to place a plurality of tags delineating the object; wherein when the user places a tag on the object the validity of the tag is tested to determine if it is placed on an identifiable edge of the object.
11. The system of claim 10, wherein: the validity of the tag is tested to determine if it is placed on an identifiable edge of the object repeatedly as the moving video image is progressed through time.
12. The system of claim 10, wherein: the validity of the positioning of the tag is further tested over time to determine if the tag delineates the object.
13. The system of claim 10, wherein: each tag is scored to determine if the tag has moved from a previous tag location.
14. The system of claim 13, wherein: an anchor tag is selected according to the clustering of scores.
15. The system of claim 10, wherein: a rigid geometry test is applied to determine if movement of the tags can be described by a vector quantity.
16. The system of claim 10, wherein: an articulated motion test is applied to determine if movement of some of the tags can be described by a vector quantity.
17. The system of claim 10, wherein: a rotation shift test is applied to see if movement of any of the tags can be described by a vector quantity.
PCT/US1998/022551 1997-10-24 1998-10-23 Objet specification in a bit mapped image WO1999022336A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU11199/99A AU1119999A (en) 1997-10-24 1998-10-23 Objet specification in a bit mapped image

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US95722297A 1997-10-24 1997-10-24
US08/957,222 1997-10-24

Publications (1)

Publication Number Publication Date
WO1999022336A1 true WO1999022336A1 (en) 1999-05-06

Family

ID=25499259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/022551 WO1999022336A1 (en) 1997-10-24 1998-10-23 Objet specification in a bit mapped image

Country Status (2)

Country Link
AU (1) AU1119999A (en)
WO (1) WO1999022336A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6862556B2 (en) 2000-07-13 2005-03-01 Belo Company System and method for associating historical information with sensory data and distribution thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4739401A (en) * 1985-01-25 1988-04-19 Hughes Aircraft Company Target acquisition system and method
US5243418A (en) * 1990-11-27 1993-09-07 Kabushiki Kaisha Toshiba Display monitoring system for detecting and tracking an intruder in a monitor area
US5657251A (en) * 1995-10-02 1997-08-12 Rockwell International Corporation System and process for performing optimal target tracking
US5809161A (en) * 1992-03-20 1998-09-15 Commonwealth Scientific And Industrial Research Organisation Vehicle monitoring system


Also Published As

Publication number Publication date
AU1119999A (en) 1999-05-17

Similar Documents

Publication Publication Date Title
AU714213B2 (en) Object identification in a moving video image
US6124864A (en) Adaptive modeling and segmentation of visual image streams
US6954498B1 (en) Interactive video manipulation
US9303525B2 (en) Method and arrangement for multi-camera calibration
JP4226730B2 (en) Object region information generation method, object region information generation device, video information processing method, and information processing device
US7944454B2 (en) System and method for user monitoring interface of 3-D video streams from multiple cameras
US9877010B2 (en) Camera tracker target user interface for plane detection and object creation
US6870945B2 (en) Video object tracking by estimating and subtracting background
JP5967904B2 (en) Method and apparatus for real-time detection and tracking of moving non-rigid objects in a video stream that allow a user to interact with a computer system
Zollmann et al. Image-based ghostings for single layer occlusions in augmented reality
US11188739B2 (en) Processing uncertain content in a computer graphics system
US20100002909A1 (en) Method and device for detecting in real time interactions between a user and an augmented reality scene
US9973744B2 (en) 3D tracked point visualization using color and perspective size
US20060088189A1 (en) Statistically comparing and matching plural sets of digital data
US5767857A (en) Method, apparatus, and software product for generating outlines for raster-based rendered images
KR20040044442A (en) Automatic 3D modeling system and method
US10853990B2 (en) System and method for processing a graphic object
JP2008102972A (en) Automatic 3d modeling system and method
Wang et al. People as scene probes
Chao et al. Augmented 3-D keyframe extraction for surveillance videos
Bradley et al. Augmenting non-rigid objects with realistic lighting
WO1999022336A1 (en) Objet specification in a bit mapped image
JP3601951B2 (en) Moving object display method, display system using the same, and program recording medium therefor
JPH07121710A (en) Method and device for image segmentation
CN117593421A (en) Electronic marking method and device for offline display object

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: KR

NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase