WO2004086751A2 - Method for estimating logo visibility and exposure in video - Google Patents


Info

Publication number
WO2004086751A2
Authority
WO
WIPO (PCT)
Prior art keywords
tuples
logo
pattern
points
invariants
Prior art date
Application number
PCT/CH2004/000182
Other languages
French (fr)
Other versions
WO2004086751A3 (en)
Inventor
Sergei Startchik
Original Assignee
Sergei Startchik
Priority date
Filing date
Publication date
Application filed by Sergei Startchik filed Critical Sergei Startchik
Priority to EP20040722778 priority Critical patent/EP1611738A2/en
Publication of WO2004086751A2 publication Critical patent/WO2004086751A2/en
Publication of WO2004086751A3 publication Critical patent/WO2004086751A3/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635 - Overlay text, e.g. embedded captions in a TV program
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 - Matching configurations of points or features

Definitions

  • This step is an optional step in the search algorithm.
  • Fig. 1 An example image of a logo and the tuple being learned.
  • Fig. 5 Algorithm for finding a learned 5-tuple in the new image (pixel-based algorithm).
  • Fig. 6 Algorithm for finding a learned 5-tuple in the new image (sub-pixel algorithm).
  • The algorithms for learning tuples and for searching for them can be implemented as image processing software modules. This software can be run on a DSP within an embedded system or on a standard computer that is connected to a camera.

Abstract

The visibility and exposure of logos in video is of important commercial concern for advertisement on television. During a tennis match, for example, a logo placed on a flat or round surface appears from different viewpoints, partially visible or occluded. The logo visibility time, its size (relative to the screen), its position (relative to the terrain) and the percentage of the non-occluded part allow its advertising impact to be computed directly. This patent discloses a method to automatically compute the visible part of a given logo in a video sequence. For each frame of the video sequence the method computes four parameters that describe the visibility of a given logo. The first is the percentage of the visible part with respect to the whole logo. The second and third are the position and size of the logo with respect to the image. The fourth is the quality (blur, motion) of the visible part. The advantage of the new method is its capacity to work with real video sequences including high occlusion, illumination changes and blur. These conditions are realistic, making the method very useful and robust for this application. (Fig. 1) The obtained information can be used for logo modification or replacement as well as for quality enhancement.

Description

Method for estimating logo visibility and exposure in video.
References cited
US 2003/0016921 A1
TW 434520, US 6,100,941, WO 0007367, US 4817166, WO 0152547
Keywords: Logo visibility, Multimedia, Advertisement, Video sequence, geometric invariants, chromatic invariants, point tuples.
10 claims, 7 drawing sheets
I. TECHNICAL FIELD OF THE INVENTION
The visibility and exposure of logos in video is of important commercial concern for advertisement on television. During a tennis match, for example, a logo placed on a flat or round surface appears from different viewpoints, partially visible or occluded. The logo visibility time, its size (relative to the screen), its position (relative to the terrain) and the percentage of the non-occluded part allow its advertising impact to be computed directly. This patent discloses a method for the automatic computation of the visible part of a given logo in a video sequence. For each frame of the video sequence the method computes four parameters that describe the visibility of a given logo. The first parameter is the percentage of the visible part with respect to the whole logo. The second and third are the position and size of the logo with respect to the image. The fourth is the quality (motion blur) of the visible part.
The present invention belongs to the domain of registration (or matching) methods in computer vision. In particular, methods for the registration of two-dimensional (planar) patterns such as logos under projective transformation (modelling the image acquisition by a camera) are addressed. The registration of patterns is based on the registration of features of those patterns. Therefore, registration methods are classified with respect to the features that are used.
II. BACKGROUND ART
Technically speaking, some methods use template matching, where an example of the logo original is matched to the image by applying all possible transformations to the original and evaluating the match. The best match found gives the position of the logo. This method is very sensitive to occlusion, size changes and blur. Another family of methods uses local features of the pattern. In particular, if the shape is composed of several curves and edges, a comparison of their features can be used for registration. However, the limitation of this method is the requirement for the pattern/logo to have special features, which cannot be guaranteed in practice. The algorithm is unable to locate logos that do not have such features.
Existing patents are only distantly related to the current invention. For example, US 2003/0016921 A1 suggests a logo changing system. Other patents related to the application of logo (or other pattern) location in a video sequence are TW 434520, US 6,100,941 and WO 0007367. Methods in adjacent applications were also explored in US 4817166. The aspect of logo modification in a video sequence is covered in WO 0152547.
The advantage of the new method is its capacity to work under real conditions, including high occlusion of the logo, illumination changes and blur. These conditions are very frequent in sport events (see Fig. 2), making the method very useful for this application.
III. DISCLOSURE OF THE INVENTION
The method for the registration of the visible part of the logo is based on dense and massive matching of point tuples. Each point tuple is matched by comparing its properties, described by values that are invariant to the geometric transformation produced by the camera and to the chromatic transformations produced by illumination changes.
The general algorithm of registration is presented in Fig. 3 and comprises several stages explained in the following sections.
First, an example image is used to specify a user-defined zone. Second, as many locally unique individual tuples as possible are found in that zone and their geometric and chromatic features are computed. The information about each tuple, as well as the relative position of each tuple with respect to the user-defined zone, is stored. The learning of locally unique tuples from the example image is described in section III.3. The set of all tuples corresponds to the logo representation.
Third, the obtained representation is used to search for the learned logo in a given video sequence by analysing each individual frame. The stage of computing the visibility of a particular logo in one frame consists of registering the numerous learned tuples corresponding to the example image and verifying their consistency. The stage of registration of an individual tuple is explained in section III.1. The stage of computing the registration of a particular logo from registered tuples is described in section III.4.
The main advantage of the invention is that a top-down approach is used to look for point tuples and that no support region is used to find points. Therefore, high-resolution matching can be performed independently of any occlusion.
1. Registering an individual tuple
Registering a tuple of several points P_i in two images is done by comparing the joint geometric and chromatic properties of those points. Depending on the complexity of the transformations involved in image formation, a varying number of points and properties is required to perform the registration robustly.
The geometric transformation from the plane with the logo in the scene to the sensor of the camera (and so the image) can be modelled by a 2D affine, a 2D projective or a more complex transformation. An additional transformation would be required to account for lens distortion, but in the case of television cameras the distance to the object reduces the distortion effect. If a 2D affine transformation is used to model the camera, four points are necessary to compute two values that are independent of that transformation and thus characterize the tuple independently of it. If a 2D projective transformation is used, five points are already necessary to compute two different geometric invariant values. The known geometric invariant to projective transformation is the pair of cross-ratios of five points defined by:
I_1 = ( |m_431| |m_521| ) / ( |m_421| |m_531| )
I_2 = ( |m_421| |m_532| ) / ( |m_432| |m_521| )     (1)

where the determinants are defined as:

            | x_i  x_j  x_k |
|m_ijk| =   | y_i  y_j  y_k |     (2)
            |  1    1    1  |
where x_i, y_i are the coordinates of the point P_i in the image. These two invariants will be used to search for matching tuples.
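As a concrete check, the two cross-ratio invariants can be sketched as follows. Since the patent's equation is an image placeholder, the index choice below is one standard formulation of the five-point projective invariants and is an assumption, not the patent's exact formula:

```python
import numpy as np

def det3(p, q, r):
    # Determinant of the 3x3 matrix of homogeneous image coordinates
    # [x_p x_q x_r; y_p y_q y_r; 1 1 1].
    return np.linalg.det(np.array([
        [p[0], q[0], r[0]],
        [p[1], q[1], r[1]],
        [1.0, 1.0, 1.0],
    ]))

def projective_invariants(pts):
    # Two cross-ratio invariants of five coplanar points; unchanged by
    # any 2D projective transformation of the image plane.
    p1, p2, p3, p4, p5 = pts
    i1 = (det3(p4, p3, p1) * det3(p5, p2, p1)) / (det3(p4, p2, p1) * det3(p5, p3, p1))
    i2 = (det3(p4, p2, p1) * det3(p5, p3, p2)) / (det3(p4, p3, p2) * det3(p5, p2, p1))
    return i1, i2
```

Applying an arbitrary homography to the five points and recomputing leaves both values unchanged up to floating-point error, which is exactly the property the matching stage relies on.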
Another property of the tuple of points that can be used as a stable characteristic is a value based on the chromatic values of the individual points that is invariant to illumination changes. If the changing illumination (sun, clouds, reflections) is modelled by a scaling transformation in the rgb domain, two points are needed to compute one invariant value. The invariant to this chromatic transformation is a ratio of rgb values. Therefore, for the five points selected above, four invariant values can be computed.
First, we describe the way individual tuples are searched for in the image. Several types of transformations are considered here. Let us assume, first, that no illumination changes are present and that the geometric transformation is a 2D projective transformation. Adaptation to illumination changes is described in section 5. Thus, a five-tuple will be used for the search. A tuple of five points is characterized by two geometric invariants I_1, I_2 and five chromatic values c_1, c_2, c_3, c_4, c_5. The search tries to find a tuple satisfying such values.
The general algorithm for finding a tuple of points in the image that satisfies this constraint is described in Fig. 5. First, the areas of the color c_1 are found in the image, as well as the areas of the other four colors c_2, c_3, c_4, c_5. Then, tuples of points that strictly satisfy the geometric constraint of the projective invariants I_1, I_2 are found. Finally, the tuple that has the minimum difference between the learned chromatic values and the chromatic values of the found points:
min Σ_i ( c_i - c_i^learned )^2     (3)
is retained to obtain the best position of the tuple.
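The pixel-based search of Fig. 5 can be sketched as below. The helper names and the tolerance are illustrative assumptions, and the geometric test on (I_1, I_2) is passed in as a caller-supplied predicate rather than spelled out:

```python
import itertools
import numpy as np

def find_color_areas(img, value, tol=4):
    # Coordinates of pixels whose gray value lies within tol of the
    # searched chromatic value (the "areas of the color c_i").
    ys, xs = np.nonzero(np.abs(img.astype(int) - value) <= tol)
    return list(zip(xs.tolist(), ys.tolist()))

def search_tuple(img, learned_colors, geometric_ok):
    # Enumerate candidate 5-tuples (one point per learned color),
    # keep those passing the projective-invariant predicate, and
    # retain the tuple with the minimum chromatic error of eq. (3).
    candidates = [find_color_areas(img, c) for c in learned_colors]
    best, best_err = None, float("inf")
    for combo in itertools.product(*candidates):
        if not geometric_ok(combo):          # geometric constraint first
            continue
        err = sum(abs(int(img[y, x]) - c)    # chromatic error
                  for (x, y), c in zip(combo, learned_colors))
        if err < best_err:
            best, best_err = combo, err
    return best
```

A real implementation would prune the combinatorial product aggressively (for example by ordering candidates spatially), since exhaustive enumeration grows with the product of the five candidate-set sizes.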
2. Optimal searching with subpixel accuracy
If the learning of point tuples is performed in a pixel-based manner in the original image, the search should be done without relying on the pixel grid of the image being searched. Therefore, sub-pixel computations should be done during the search. This is achieved by selecting, instead of individual pixels, triples of neighbouring pixels whose intensity (or color) values establish an interval containing the value that is searched for, as shown in Fig. 7. The algorithm with subpixel precision is described in Fig. 6.
First, for each of the five points, instead of individual pixels, triangles of pixels (P_t, Q_t, R_t as shown in Fig. 7) are selected whose values define an interval containing the searched value. For example, if the searched graylevel intensity value is 128, a triangle of pixels with values 140, 124 and 127 contains the value 128 somewhere between them and can thus be selected.
Such a selection of triangles is done for each point as shown in Fig. 7. Now the position of each point S_t can vary inside the triangle P_t, Q_t, R_t; its coordinates are expressed in barycentric form with respect to the three points defining the triangle. This form defines each point position with two parameters u_t, v_t, so P_t = [x_t(u_t, v_t), y_t(u_t, v_t)]. Thus four points will be defined by eight parameters. For each of the four points an interpolated intensity value c_t(u_t, v_t) can be computed.
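The triangle selection and the barycentric interpolation can be sketched as follows; the function names are illustrative:

```python
import numpy as np

def brackets_value(tri_vals, target):
    # A triple of neighbouring pixels is selected if the searched value
    # lies between its minimum and maximum intensity (cf. Fig. 7).
    return min(tri_vals) <= target <= max(tri_vals)

def barycentric_point(tri_xy, tri_vals, u, v):
    # Position and interpolated intensity of a point S_t inside the
    # triangle (P_t, Q_t, R_t), parameterised by (u_t, v_t):
    # S = (1 - u - v) * P + u * Q + v * R, same weights for the values.
    w = np.array([1.0 - u - v, u, v])
    xy = w @ np.asarray(tri_xy, dtype=float)
    val = float(w @ np.asarray(tri_vals, dtype=float))
    return xy, val
```

With the example from the text, the triple (140, 124, 127) brackets the searched value 128, while a triple whose minimum exceeds 128 would be rejected.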
Using these point definitions, one can compute the position of the 5th point, whose position will thus be defined by the eight parameters u_1, v_1, u_2, v_2, u_3, v_3, u_4, v_4 and by I_1, I_2.
The geometric constraint, expressed with the geometric invariant values, is satisfied by checking whether the corresponding corners of the triangles P_1, P_2, P_3, P_4, P_5 satisfy the invariant values (within a certain tolerance).
For the fifth point the chromatic value is computed by interpolation, and the final expression for c_5 will depend on the eight parameters: c_5(u_1, v_1, u_2, v_2, u_3, v_3, u_4, v_4). Computing the chromatic values with those expressions c_i(u_i, v_i) and subtracting the learned values, one obtains the error function for the tuple position:
min_{u_i, v_i} Σ_i ( c_i(u_i, v_i) - c_i^learned )^2     (4)
Then, the interpolated values are computed and a closed-form solution is produced. Minimising this expression with respect to the eight parameters gives the position of the five points inside the triangles.
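One term of this minimisation can be sketched with a simple grid search over the barycentric parameters of a single point. This is an illustrative stand-in: in the full algorithm the fifth point is tied to the other four through I_1 and I_2, and the text's closed-form solution replaces the grid:

```python
import numpy as np

def refine_point(interp_fn, learned_val, steps=21):
    # Grid search over the barycentric parameters (u, v) of one point,
    # minimising one squared term of the error function (4).
    # interp_fn(u, v) returns the interpolated intensity at (u, v).
    grid = np.linspace(0.0, 1.0, steps)
    best_uv, best_err = (0.0, 0.0), float("inf")
    for u in grid:
        for v in grid:
            if u + v > 1.0:          # stay inside the triangle
                continue
            err = (interp_fn(u, v) - learned_val) ** 2
            if err < best_err:
                best_uv, best_err = (u, v), err
    return best_uv, best_err
```

For the triangle with corner values (140, 124, 127) from the example above, searching for the value 128 drives the error essentially to zero at some interior position.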
3. Learning the representative tuples
The algorithm for learning a suitable representation of a given pattern is outlined in Fig. 4 and illustrated in Fig. 1.
First, an area, the Learning zone (10), that contains an unoccluded pattern, the Logo (6), is specified by the user as in Fig. 1. Then, this area is searched for tuples of points that are representative of this whole area. In other words, tuples need to be found whose properties (geometric and chromatic invariant values) are sufficiently different from the properties of all other point tuples in the learning zone. Tuples satisfying such unicity conditions will not have similar tuples in the whole learning area and can thus be easily found at the search stage without mismatches.
The algorithm for the search for such unique tuples is described in Fig. 4. Fig. 1 illustrates the operation of representative tuple search. First, the Candidate tuple (14) is selected so that all points have different color or intensity values, to increase discrimination between points.
Then, a fixed neighbourhood, the Search zone (9), is defined around each of the points of the tuple in the Image frame (8). For each tuple of neighbouring points from those search zones, the Potential neighbour tuple (13), its geometric and chromatic invariants are computed and their values are compared with the invariants of the Candidate tuple (14) that is being tested for unicity. In practice, the Search zone (9) is taken to be the whole Learning zone (10) to obtain full unicity. Also, instead of comparing the values of each Candidate tuple (14) with all Potential neighbour tuples (13) in the Search zone (9), one needs only to compute invariant values for all possible tuples in the Learning zone (10) and select those that do not have similar ones.
The similarity between the Candidate tuple (14) and the Potential neighbour tuple (13) is measured according to several criteria. First, the difference of the absolute intensity values of corresponding points should satisfy:
| c_i^candidate - c_i^neighbor | < T_c     (5)
Second, the difference between the geometric invariant values should be below a certain threshold:

| I_1^candidate - I_1^neighbor | < T_g
| I_2^candidate - I_2^neighbor | < T_g     (6)
Finally, the difference between the chromatic values for the whole tuple, both for the values of individual points and for the combined chromatic values, should be below a threshold:

Σ_i ( c_i(u_i, v_i) - c_i^learned(u_i, v_i) ) < Δ     (7)
Tuples that satisfy this unicity criterion are stored as part of the logo representation. The use of their pixels in other tuples is then reduced (they will, however, be used in neighbour tuples for comparison). Because the redundancy of the representation is important, each pixel participates in a fixed number of tuples (for example, ten). The use of one pixel in too many tuples is avoided to prevent extreme dependence on this point.
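The unicity test of criteria (5) and (6) can be sketched as follows; the tuple encoding and the thresholds are illustrative assumptions:

```python
def is_unique(candidate, neighbours, t_c=8.0, t_g=0.02):
    # A candidate tuple is kept only if no neighbouring tuple is close
    # to it in both absolute intensity values and geometric invariant
    # values. Tuples are encoded as (colors, (I1, I2)).
    cand_colors, (i1_c, i2_c) = candidate
    for nb_colors, (i1_n, i2_n) in neighbours:
        colors_close = all(abs(a - b) < t_c
                           for a, b in zip(cand_colors, nb_colors))
        invariants_close = (abs(i1_c - i1_n) < t_g
                            and abs(i2_c - i2_n) < t_g)
        if colors_close and invariants_close:
            return False   # a similar tuple exists in the search zone
    return True
```

A neighbour must be similar in both respects to disqualify the candidate: a tuple with matching colors but clearly different invariants still counts as unique.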
4. Massive matching with many tuples
The learning process described above will gradually use all the pixels in the Learning zone (10) (cf. Fig. 1). Since each pixel is used several times in one or more Learned tuples (11), this area will be covered with several "layers" of tuples in a dense manner. With more tuples using a pixel, the whole representation gains in reliability. At the end, the whole representation corresponds to a large number of tuples characterised by their absolute intensity values, their geometric invariant values and their relative chromatic invariants. When such a dense representation is constructed, it can be used for locating the logo in a new image.
The operation of locating a logo in a new image corresponds to the search for the learned tuples in this image. The whole image is analysed for the presence of the learned tuples one by one. The density of the representation with tuples makes it possible to deal with almost arbitrary occlusion, since every point is virtually related to (at least) four other points in various parts of the image. However the Logo (6) is occluded by an Occluding object (7), as shown in Fig. 2, there will always be visible points that are not hidden by this object and that are related by tuples.
Once all tuples (and thus all points) that belong to the logo have been identified, one can proceed with their processing. First, the position of the logo frame is estimated. Every learned tuple contains information about its position relative to the Learning zone (10). This information (which is in fact a transformation) can be inverted, and the position of the Learning zone (10) (or reference frame) obtained from the position of the tuple. There will be false matches between tuples; therefore, taking the frame position confirmed by the majority of found tuples allows the right frame to be found. This frame is also used to compute the position of the logo on the screen as well as its size relative to the screen.
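The majority-vote confirmation of the frame position can be sketched as below. Quantising the proposed origins is an assumption of this sketch, used so that nearby proposals pool their votes while outliers from false matches are outvoted:

```python
from collections import Counter

def vote_frame_position(proposals, quantum=5):
    # Each matched tuple proposes a frame origin (x, y) obtained by
    # inverting its stored transformation. The quantised position with
    # the most votes is taken as the frame position.
    buckets = Counter((round(x / quantum), round(y / quantum))
                      for x, y in proposals)
    (bx, by), _ = buckets.most_common(1)[0]
    return bx * quantum, by * quantum
```

Eight consistent proposals near (100, 50) outvote two spurious ones, so the frame is recovered despite false matches.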
Once the reference frame is constructed, one can compute the visible part. This is estimated as the ratio of the number of pixels inside the frame covered by found tuples to the total number of pixels in the frame. This is done by computing the ratio of the number of visually found points to the total number of points associated with the particular logo. A different measure can be applied, since not all points are visible all the time, and a measure relative to pixels can be applied instead.
When the logo position and its visible part have been identified, several postprocessing operations can be invoked. First, the points inside the logo frame that were not identified as belonging to the logo (and thus belong to the occluding object) can be restored to the usual values of the logo. This operation would remove the occluding object and restore the full visibility of the logo.
Another operation that can be performed once all the point tuples have been identified is the modification of the visible part of the logo in order to improve its visual quality or provide a neat image of the logo. If the number of identified points is sufficient, many other points can be added to improve the resolution of the logo when it is viewed as a remote object.
Finally, the last operation that can be performed is the replacement of the logo by visual information that is perceptually different from the logo's appearance. This operation is useful for hiding the logo if its visibility in the video sequence is not desired.
5. Searching with illumination changes
When illumination changes occur, the chromatic values in the observed image are transformed. This transformation can be modelled with several approximating transformations, for example scaling of each of the RGB channels, a linear transformation in RGB space, etc. The scaling transformation corresponds to the scaling of every chromatic channel by an independent factor:

r' = s_r · r,   g' = s_g · g,   b' = s_b · b     (8)
Therefore, the absolute chromatic values of the analysed image cannot be used directly. The only reliable information that remains is the chromatic invariants, which are computed from the color values of the points in the tuple and are independent of such a transformation.
Chromatic invariants for this diagonal transform require the chromatic values of two points for their computation:

I_1 = r_1 / r_2,   I_2 = g_1 / g_2,   I_3 = b_1 / b_2     (9)

Therefore, we compute these invariant values for every pair of pixels in the tuple. These values then play the role of an additional constraint for the selection of tuples, making the selection independent of illumination changes: all pairs that have the same invariant value are retrieved. In the search image, these values can be computed only by combining several N-tuples. Even if the number of such combinations is high, we are obliged to compute them since no other information is available.
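A minimal sketch of these ratio invariants, assuming the diagonal scaling model of equation (8); the pixel values and gains are illustrative:

```python
def chromatic_invariants(p1, p2):
    # Equation (9): per-channel ratios between two points of a tuple.
    # Under r' = s_r*r, g' = s_g*g, b' = s_b*b the gains cancel, so
    # these values survive the illumination change.
    (r1, g1, b1), (r2, g2, b2) = p1, p2
    return (r1 / r2, g1 / g2, b1 / b2)

def scale_illumination(pixel, gains):
    # Diagonal illumination change of equation (8).
    return tuple(s * c for s, c in zip(gains, pixel))

p1, p2 = (120.0, 60.0, 30.0), (60.0, 30.0, 90.0)
gains = (0.5, 1.2, 0.8)
print(chromatic_invariants(p1, p2))
print(chromatic_invariants(scale_illumination(p1, gains),
                           scale_illumination(p2, gains)))  # same ratios
```

Because the per-channel gains divide out, two points of a tuple index the same table entry before and after the lighting change.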
Since the intensity values are discrete, a table can be constructed to perform this search. This step is an optional step in the search algorithm that is outlined in Fig. 5 and Fig. 6.
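Such a table can be sketched as a hash map keyed by the quantised invariant value; the tuple identifiers, quantisation step and values below are illustrative:

```python
def build_invariant_index(learned_tuples, quant=0.05):
    # Hash learned tuples by their quantised invariant value so that a
    # candidate invariant computed in a new image is looked up in O(1).
    index = {}
    for tuple_id, invariant in learned_tuples:
        key = round(invariant / quant)
        index.setdefault(key, []).append(tuple_id)
    return index

def lookup(index, invariant, quant=0.05):
    # Return the learned tuples whose invariant falls in the same bin.
    return index.get(round(invariant / quant), [])

learned = [("t1", 1.32), ("t2", 0.80), ("t3", 1.31)]
idx = build_invariant_index(learned)
print(lookup(idx, 1.30))  # → ['t1', 't3']
```

Quantisation absorbs the small deviations introduced by noise and discretised intensities, at the cost of occasionally merging nearby invariant values into one bin.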
IV. BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 An example image of a logo and tuple being learned
Fig. 2 Example of logo occlusion.
Fig. 3 General algorithm for estimating visibility.
Fig. 4 Algorithm for finding the candidate five-point tuple.
Fig. 5 Algorithm for finding learned 5-tuple in the new image (pixel-based algorithm).
Fig. 6 Algorithm for finding learned 5-tuple in the new image (sub-pixel algorithm).
Fig. 7 Subpixel search
V. DESCRIPTION OF THE PREFERRED EMBODIMENT
This section describes the implementation of the described invention and how it can be realized in practice.
The algorithms for learning tuples and searching for them can be implemented as image processing software modules. This software can run on a DSP within an embedded system or on a standard computer connected to a camera.

Claims

VI. WE CLAIM:
1. A method for automatic computation of the presence, position and visibility of 2D surface patterns (such as logos) in a video sequence based on matching sets of point tuples, related by geometric and illumination invariants, learned from an example image and registered in every frame of the video sequence.
2. A method as in claim 1 where the point tuples correspond to groups of five or more points linked by 2D projective geometric invariants, groups of four or more points linked by 2D affine geometric invariants, or sets of more than five points linked by invariants of transformations of higher order, including lens distortion.
3. A method as in claim 1 where the tuples of four, five or more points are characterised with chromatic invariants that are independent of illumination changes approximated by scaling or linear transformation in RGB space.
4. A method as in claim 1 where the algorithm of learning representative tuples of the pattern and their relationship from an example image of the pattern consists of several steps:
- take an image from the video sequence that contains the logo and select the reference frame (or rectangle) containing the fully visible pattern
- find point tuples that have geometric and chromatic invariant properties that are unique with respect to the whole pattern inside the reference frame
- store the properties of each tuple in an indexing table for fast access
- store relative coordinates of each tuple found with respect to the reference rectangle of the pattern.
5. A method as in claim 1 where the registration of tuples is done in a top-down manner without using any type of surrounding support region of each individual point.
6. A method as in claim 1 where every point of the learned pattern is used in several representative tuples that include points from various parts of the pattern thus making the representation of the pattern dense, distributed and redundant.
7. A method as in claim 1 where the algorithm of registering a learned pattern in a given image frame of a video sequence consists of the following steps:
- the search for all possible tuples from the learned pattern
- computation of the area of the visible part of the logo by evaluating the area covered by found tuples with respect to the whole pattern area
- estimation of the position, orientation and visual quality of the pattern
8. A method as in claim 1 where the geometric invariants of tuples are attributed a canonical value corresponding to the frontal view of the pattern in order to rectify the observed image to the frontal view of the pattern or to recover the orientation of the camera with respect to the plane that bears this pattern.
9. A method as in claim 1 where the pattern registered with numerous tuples can be completed with tuples from its model that were not registered in order to achieve full visibility of the pattern with increased quality; the registered tuples can be replaced with information that is perceptually different from the original logo in order to make the pattern unrecognizable by humans.
10. A method as in claim 1 where sets of point tuples characterising the logo as well as geometric and illumination invariants characterising the tuples can be used to estimate the similarity between different logos for similarity search in logo databases.
PCT/CH2004/000182 2003-03-27 2004-03-24 Method for estimating logo visibility and exposure in video WO2004086751A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20040722778 EP1611738A2 (en) 2003-03-27 2004-03-24 Method for estimating logo visibility and exposure in video

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CHPCT03/00199 2003-03-27
CH0300199 2003-03-27

Publications (2)

Publication Number Publication Date
WO2004086751A2 true WO2004086751A2 (en) 2004-10-07
WO2004086751A3 WO2004086751A3 (en) 2005-02-03

Family

ID=33035093

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CH2004/000182 WO2004086751A2 (en) 2003-03-27 2004-03-24 Method for estimating logo visibility and exposure in video

Country Status (2)

Country Link
EP (1) EP1611738A2 (en)
WO (1) WO2004086751A2 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4817166A (en) 1986-05-05 1989-03-28 Perceptics Corporation Apparatus for reading a license plate
TW434520B (en) 1998-06-30 2001-05-16 Sony Corp Two-dimensional code recognition processing method, device therefor and medium
WO2000007367A2 (en) 1998-07-28 2000-02-10 Koninklijke Philips Electronics N.V. Apparatus and method for locating a commercial disposed within a video data stream
US6100941A (en) 1998-07-28 2000-08-08 U.S. Philips Corporation Apparatus and method for locating a commercial disposed within a video data stream
WO2001052547A1 (en) 2000-01-14 2001-07-19 Koninklijke Philips Electronics N.V. Simplified logo insertion in encoded signal
US20030016921A1 (en) 2001-07-23 2003-01-23 Seung-Kyu Paek Video reproducing/recording system to change a logo image/sound, and method of changing the logo image/sound

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DOERMAN D. S. ET AL.: "Logo recognition using geometric invariants", 2ND INT. CONF. ON DOCUMENT ANALYSIS AND RECOGNITION, October 1993 (1993-10-01), pages 894 - 897
G. MEDIONI; G.GUY; H. ROM; A. FRANÇOIS: "Real-Time Billboard Substitution in a Video Stream", PROC. 10TH TYRRHENIAN INTERNAT. WORKSHOP ON DIGITAL COMMUNICATION, 1998, pages 71 - 84

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7783130B2 (en) 2005-01-24 2010-08-24 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Spatial standard observer
US8139892B2 (en) 2005-01-24 2012-03-20 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration (Nasa) Spatial standard observer
WO2008109608A1 (en) * 2007-03-05 2008-09-12 Sportvision, Inc. Automatic measurement of advertising effectiveness
CN107153809A (en) * 2016-03-04 2017-09-12 无锡天脉聚源传媒科技有限公司 A kind of method and device for confirming TV station's icon
CN107153809B (en) * 2016-03-04 2020-10-09 无锡天脉聚源传媒科技有限公司 Method and device for confirming television station icon
CN106792153A (en) * 2016-12-01 2017-05-31 腾讯科技(深圳)有限公司 A kind of video labeling processing method and processing device
CN106792153B (en) * 2016-12-01 2020-07-28 腾讯科技(深圳)有限公司 Video identification processing method and device and computer readable storage medium
CN113469216A (en) * 2021-05-31 2021-10-01 浙江中烟工业有限责任公司 Retail terminal poster identification and integrity judgment method, system and storage medium
CN113469216B (en) * 2021-05-31 2024-02-23 浙江中烟工业有限责任公司 Retail terminal poster identification and integrity judgment method, system and storage medium

Also Published As

Publication number Publication date
EP1611738A2 (en) 2006-01-04
WO2004086751A3 (en) 2005-02-03


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2004722778

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2004722778

Country of ref document: EP