CN100413327C - A video object annotation method based on contour spatio-temporal features - Google Patents

A video object annotation method based on contour spatio-temporal features

Info

Publication number
CN100413327C
CN100413327C CNB2006100533980A CN200610053398A
Authority
CN
China
Prior art keywords
key frame
color
frame
background
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2006100533980A
Other languages
Chinese (zh)
Other versions
CN1997114A (en)
Inventor
Zhuang Yueting (庄越挺)
Dong Zhaohua (董兆华)
Xiao Jun (肖俊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CNB2006100533980A priority Critical patent/CN100413327C/en
Publication of CN1997114A publication Critical patent/CN1997114A/en
Application granted granted Critical
Publication of CN100413327C publication Critical patent/CN100413327C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

This invention discloses a method for annotating foreground objects in video, comprising the following steps: (a) divide a video sequence into several segments, each containing one key frame and several non-key frames; (b) for each key frame, ask the user to provide hints designating key parts of the foreground and background objects, and label the key frame accordingly; (c) for each non-key frame, label it according to the key-frame results, taking the color distribution and shape of the foreground and the color of the background as known information.

Description

A video object annotation method based on contour spatio-temporal features
Technical field
The present invention relates to the field of video processing, and in particular to a video object annotation method based on contour spatio-temporal features.
Background technology
With the popularization of digital cameras and camcorders, interactive image and video processing has become a very active frontier research direction. Efficiently extracting the foreground of a video by interactive means, then compositing the extracted foreground into a new video sequence or applying operations such as cartoon-style editing to it, has become an important technique in the video field.
A video consists of successive image frames, and segmenting foreground from background enables interactive operations on every frame of the video stream. One could apply a still-image foreground extraction method frame by frame to obtain the foreground and background of each frame, and hence of the whole video. This approach suffers from two problems. First, it requires a large amount of tedious repeated work, since the user must interactively mark foreground and background on every frame. Second, it treats each frame independently, ignoring the temporal continuity between frames, so even small differences between consecutive frames produce visible jumps.
If object motion in the video stream could be tracked accurately, foreground and background could be extracted interactively on key frames only, and the interactive knowledge could then be propagated to the non-key frames via the motion-tracking results, extracting their foreground and background automatically. Hertzmann et al. use optical-flow estimation to track object motion [3], but current optical-flow estimation is not robust enough on ordinary video, so it cannot by itself recover the foreground on non-key frames; the flow estimate can, however, serve as a constraint that dynamically updates the tracking process. Following this idea, the present invention proposes a method that extracts foreground and background interactively on key frames, and labels non-key frames using their temporal correlation with the key frame and their own spatial coherence.
A large body of work performs foreground extraction on the basis of contour extraction. Hall et al. proposed a user-supervised contour extraction method [2] in which the user outlines the foreground object in some frames and the contours of the remaining frames are obtained by interpolation, yielding the foreground regions. This method requires considerable manual outlining, and for fast-moving video either more frames must be outlined by hand or the interpolation error on intermediate frames grows. Agarwala et al. proposed a contour extraction method based on optimization and user interaction [1], which reduces the amount of interaction. These methods are nevertheless limited: they represent the object shape by an approximate silhouette, so objects with rich edge detail easily lose that detail, and they require a clear boundary between foreground object and background.
Other work performs foreground extraction on the basis of partitioning the object into blocks. Wang et al. proposed an interactive graph-cut video foreground extraction method [6] that pre-segments the images with Mean-Shift to reduce the number of parts. They add a local cost function to the global cost function, statistically modeling the background and the regions straddling the marked boundary, and then apply a min-cut. Li et al.'s algorithm [4] also extracts video objects with a graph-cut algorithm; it combines the color correlation of each part with the foreground and background color distributions on the key frame, maximizes the color difference between adjacent regions straddling the object edge, and also takes the temporal correlation of object motion into account. However, when the object color is similar to the surrounding background color, both methods misjudge the edge.
[1] A. Agarwala, A. Hertzmann, D. H. Salesin, and S. M. Seitz. Keyframe-Based Tracking for Rotoscoping and Animation. In Proceedings of ACM SIGGRAPH 2004. 2004. pp. 584-591
[2] J. Hall, D. Greenhill and G. Jones. Segmenting Film Sequences using Active Surfaces. In International Conference on Image Processing (ICIP). 1997. pp. 751-754
[3] A. Hertzmann and K. Perlin. Painterly Rendering for Video and Interaction. In Proceedings of the 1st International Symposium on Non-photorealistic Animation and Rendering. 2000. pp. 7-12
[4] Y. Li, J. Sun and H. Y. Shum. Video Object Cut and Paste. In Proceedings of ACM SIGGRAPH 2005. 2005. pp. 595-600
[5] L. Vincent and P. Soille. Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations. IEEE Trans. on Pattern Analysis and Machine Intelligence. 1991. 13(6), pp. 583-598
[6] J. Wang, P. Bhat, R. A. Colburn, M. Agrawala and M. F. Cohen. Interactive Video Cutout. In Proceedings of ACM SIGGRAPH 2005. 2005. pp. 585-594
Summary of the invention
The purpose of this invention is to provide a video object annotation method based on contour spatio-temporal features.
It comprises the following steps:
(1) Divide a video into several segments, each containing several frames, of which one is a key frame and the rest are non-key frames;
(2) For the key frame, require the user to input hints designating key parts of the foreground and background objects, then label the key frame, determining the membership of each part on the frame;
(3) For each non-key frame, label it according to the key-frame results, using the color distribution and shape information of the foreground and the color information of the background.
Dividing a video into segments: the number of frames in each segment is inversely proportional to the speed of object motion in the video. When objects move quickly each segment contains few frames; when they move slowly each segment contains many frames.
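As a concrete illustration of this inverse relation, the segment length could be picked from an estimated motion speed as sketched below. The base constant and the clamping bounds are illustrative assumptions, not values given in the patent.

```python
def frames_per_segment(motion_speed, base=30.0, min_len=4, max_len=30):
    """Frames per video segment, inversely proportional to motion speed.

    motion_speed: average per-frame displacement magnitude (pixels/frame).
    base, min_len, max_len: hypothetical tuning constants (assumptions).
    """
    if motion_speed <= 0:
        return max_len                  # static scene: longest segments
    n = int(round(base / motion_speed))
    return max(min_len, min(max_len, n))
```

For example, at 3 pixels/frame this gives 10-frame segments, which matches the segment length used in the embodiments below.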
For the key frame, requiring the user to input hints designating key parts of the foreground and background objects: on the key-frame image the user outlines foreground or background with the mouse, drawing points, line segments and polygons. These points, line segments and polygons are hard constraints of the key-frame labeling, that is, the foreground or background membership of these parts cannot change during labeling;
Labeling the key frame, determining the membership of each part on the frame, comprises the following steps:
(1) Preprocess the image with the immersion watershed algorithm, grouping adjacent pixels whose colors differ within a threshold into the same region;
(2) For each region, take the average color of the region as its region color;
(3) Cluster the color values of the pixels on the user-designated foreground and background, obtaining a set of background color centers and foreground color centers;
(4) Define the data difference of a region block as the minimum distance between its region color and the foreground (or background) color centers, and define the difference between adjacent region blocks as the distance between their colors;
(5) From the data differences and the differences between adjacent blocks, construct a graph for graph cutting with each region block as a node, then compute a min-cut on the graph, obtaining a near-optimal labeling of the image.
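A minimal sketch of steps (1)-(2): the patent preprocesses with the immersion watershed of Vincent and Soille, but as a simpler stand-in (an assumption, not the patented preprocessing) one can merge adjacent pixels whose colors differ by less than a threshold into region blocks with union-find:

```python
import numpy as np

def presegment(img, thresh=20.0):
    """Group 4-adjacent pixels whose colors differ by < thresh into regions."""
    h, w = img.shape[:2]
    parent = list(range(h * w))

    def find(a):                      # union-find with path compression
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    img = img.astype(float)
    for y in range(h):
        for x in range(w):
            if x + 1 < w and np.linalg.norm(img[y, x] - img[y, x + 1]) < thresh:
                union(y * w + x, y * w + x + 1)
            if y + 1 < h and np.linalg.norm(img[y, x] - img[y + 1, x]) < thresh:
                union(y * w + x, (y + 1) * w + x)
    roots = np.array([find(i) for i in range(h * w)])
    _, labels = np.unique(roots, return_inverse=True)   # relabel to 0..n-1
    return labels.reshape(h, w)
```

On a synthetic image whose left half is black and right half is white, this produces exactly two region blocks; the per-region mean color of step (2) is then a simple masked average.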
For non-key frames, labeling according to the key-frame results, the color distribution and shape information of the foreground, and the color information of the background comprises the following steps:
(1) According to the key-frame labeling, cluster the colors of foreground and background; this clustering will be used in the data differences of the non-key frames;
(2) From the key-frame labeling, obtain the contour of the foreground object; use the belief propagation algorithm to estimate the object motion within a certain motion range, obtaining the approximate position of the object contour on the non-key frame; this contour information supplements the adjacent-block differences;
(3) From the data differences and the adjacent-block differences, construct a graph on the non-key frame with each region block as a node, compute a min-cut on this graph, and obtain the labeling of the non-key frame.
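The color clustering shared by key-frame steps (3)-(4) above and by step (1) here can be sketched with a plain Lloyd k-means; this is an assumption, since the patent does not name the clustering algorithm it uses:

```python
import numpy as np

def kmeans(colors, k, iters=20, seed=0):
    """Cluster seed-pixel colors into k color centers (naive Lloyd k-means)."""
    colors = np.asarray(colors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = colors[rng.choice(len(colors), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(colors[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = colors[assign == j].mean(axis=0)
    return centers

def data_difference(region_color, centers):
    """Minimum distance from a region's mean color to a set of class centers."""
    centers = np.asarray(centers, dtype=float)
    c = np.asarray(region_color, dtype=float)
    return float(np.linalg.norm(centers - c, axis=1).min())
```

A reddish region block then scores much closer to the foreground centers of a red/green seed set than to blue background centers, which is exactly the comparison the data difference relies on.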
Beneficial effects of the invention
Existing video foreground labeling methods commonly mark the foreground object edge inaccurately when the foreground and background colors are similar. The present invention uses the belief propagation algorithm to transfer the key-frame interaction information and the foreground shape to the non-key frames, and jointly considers, on each non-key frame, the color correlation of each part (region block) with the foreground and background distributions, the color difference between adjacent regions, and the shape information when solving for the non-key-frame labeling. Experimental results show that the invention resolves the problem of inaccurate labeling of foreground object edges.
Description of drawings
Fig. 1 is a flowchart of the video object annotation method based on contour spatio-temporal features; the three boxes indicate the three steps of the invention, whose inputs are the video sequence and the user interaction on the key frame;
Fig. 2 shows the construction of the graphs for graph cutting on the key frame and the non-key frames of the present invention; the solid box is the graph on the key frame and the dashed box the graph on a non-key frame;
Fig. 3 shows the message-passing process in the Markov network;
Fig. 4 illustrates the computation of contour-edge curvature in the present invention;
Fig. 5 shows the user interaction on a key frame of the present invention and the labeling result;
Fig. 6 shows the contour motion-estimation result and the labeling result of the present invention;
Fig. 7(a) is the video foreground labeling result of Li et al.'s method,
Fig. 7(b) is the labeling result of the present invention;
Fig. 8 compares the labeling results of the present invention and of Li et al.'s method; the first row is the original video sequence, the middle row the result of Li et al.'s method, and the last row the result of the present invention.
Embodiment
The present invention uses a graph-cut algorithm to label the foreground on key frames and non-key frames. Since the labeling is binary, define the label set X = {0, 1}, where 0 denotes background and 1 denotes foreground. A 2D graph is constructed on each image, as shown in Fig. 2; the solid box represents the key frame and the dashed box an intermediate frame. Let the 2D graph be G = {V, ε}, where V is the set of regions on the image and ε the set of edges connecting the regions and the label terminals. For brevity, Fig. 2 omits some of the edges between regions and label terminals.
To improve processing speed, every frame of the video is first preprocessed with the watershed algorithm [5], partitioning it into small region blocks. The watershed algorithm over-segments, so it preserves object contours well; the nodes shown in Fig. 2 are therefore not pixels but these over-segmented regions. For the key frame, solving the labeling problem amounts to minimizing the Gibbs energy E(X):
E(X) = \sum_{i \in V} E_d(x_i) + \alpha \sum_{(i,j) \in \varepsilon} E_l(x_i, x_j)    (1)
where E_d(x_i) is the data term, namely the correlation of region i's average color with the color distributions of foreground and background, and E_l(x_i, x_j) is the color difference between two regions i and j straddling the object edge. α is a tuning parameter balancing the two terms in the whole energy function; this work takes α = 1.5. α can be set empirically: for video with clearly visible object contours it can be set smaller, and for video whose background and foreground colors are similar it can be set larger. The functions in formula (1) are defined as follows:
E_d(x_i = 1) = 0,\quad E_d(x_i = 0) = \infty \quad \forall i \in F
E_d(x_i = 1) = \infty,\quad E_d(x_i = 0) = 0 \quad \forall i \in B
E_d(x_i = 1) = \frac{d_i^F}{d_i^F + d_i^B},\quad E_d(x_i = 0) = \frac{d_i^B}{d_i^F + d_i^B} \quad \forall i \notin F \cup B    (2)
E_l(x_i, x_j) = |x_i - x_j| \, e^{-\alpha \|c_i - c_j\|^2}    (3)
where d_i^F = \min_m \|c_i - K_m^F\|, d_i^B = \min_n \|c_i - K_n^B\|, and \|\cdot\| denotes Euclidean distance. F is the set of foreground seed points designated by the user, B the set of background seed points, c_i the average color of region i, K_m^F the m-th color center obtained by clustering the foreground seed points, and K_n^B the n-th color center obtained by clustering the background seed points. E_l(x_i, x_j) is 0 when two adjacent regions are given the same label, i.e. belong to the same object; it takes a nonzero value only when the two regions are labeled differently, i.e. when the object boundary passes between them.
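Equations (2)-(3) can be written out directly. The following is a minimal sketch under the definitions above: seed regions get hard 0/infinity costs, and all other regions are scored by their distance to the nearest color center of each class.

```python
import numpy as np

INF = float("inf")

def E_d(label, region_color, fg_centers, bg_centers, in_F=False, in_B=False):
    """Data term of equation (2); label is 1 for foreground, 0 for background."""
    if in_F:                                   # hard foreground seed
        return 0.0 if label == 1 else INF
    if in_B:                                   # hard background seed
        return 0.0 if label == 0 else INF
    c = np.asarray(region_color, dtype=float)
    dF = np.linalg.norm(np.asarray(fg_centers, float) - c, axis=1).min()
    dB = np.linalg.norm(np.asarray(bg_centers, float) - c, axis=1).min()
    return dF / (dF + dB) if label == 1 else dB / (dF + dB)

def E_l(xi, xj, ci, cj, alpha=1.5):
    """Link term of equation (3): zero for equal labels, small across strong edges."""
    diff = np.asarray(ci, dtype=float) - np.asarray(cj, dtype=float)
    return abs(xi - xj) * float(np.exp(-alpha * np.dot(diff, diff)))
```

Note how the link term vanishes for equal labels and decays with the squared color difference, so cutting between similar-colored neighbors is expensive while cutting across a strong color edge is cheap.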
Since the intermediate non-key frames themselves carry no interaction information that can be used directly, the belief propagation algorithm (introduced below) is used to estimate the motion of the user interaction information on the key frame, yielding approximate user interaction information on the intermediate frames. This interaction information is estimated in the same way as pixels off the contour edge: its observation function depends only on brightness and its potential function only on spatial continuity, so the whole motion-estimation process is the same as the contour tracking described below except that λ_G and λ_C in formulas (11) and (12) are set to 0, disabling the gradient and curvature terms. Numerically, the propagated interaction information yields the function E_d; the function E_l is obtained exactly as on the key frame. However, to exploit the temporal continuity between video frames, the energy function of a non-key frame differs from that of the key frame. It can be expressed as:
E(X) = \sum_{i \in V} E_d(x_i) + \alpha \sum_{(i,j) \in \varepsilon} E_l(x_i, x_j) + \beta \sum_{(i,j) \in \varepsilon} E_s(x_i, x_j)    (4)
Comparing formulas (1) and (4), the energy function on a non-key frame adds a shape-constraint term E_s(x_i, x_j). A contour tracking algorithm based on shape features computes, from the foreground contour features on the key frame and the temporal continuity between consecutive frames, the approximate position of the object contour on the non-key frame, from which E_s(x_i, x_j) is obtained. The shape-feature-based contour tracking algorithm is described in detail below.
The present invention minimizes formulas (1) and (4) with the max-flow algorithm proposed by Boykov et al., an approximately globally optimal method suitable for image labeling problems.
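The patent relies on the Boykov max-flow algorithm; as a stand-in for illustration only, here is textbook Edmonds-Karp max-flow / min-cut on a tiny adjacency-matrix graph. Nodes are region blocks plus a source (foreground terminal) and a sink (background terminal); after the cut, blocks still reachable from the source take the foreground label.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on capacity matrix cap; returns (flow, fg mask)."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        # bottleneck capacity along the path, then augment
        v, bott = t, float("inf")
        while v != s:
            u = parent[v]
            bott = min(bott, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bott
            flow[v][u] -= bott
            v = u
        total += bott
    # nodes reachable from s in the residual graph form the source side of the cut
    seen = [False] * n
    seen[s] = True
    q = deque([s])
    while q:
        u = q.popleft()
        for v in range(n):
            if not seen[v] and cap[u][v] - flow[u][v] > 0:
                seen[v] = True
                q.append(v)
    return total, seen
```

On a 4-node example (source 0, two region blocks 1 and 2, sink 3) with terminal weights playing the role of E_d, the min-cut assigns block 1 to the foreground side and block 2 to the background side.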
The present invention reduces the spatio-temporal characteristics of the object contour to four: brightness preservation, gradient preservation, spatial continuity, and curvature preservation. The constraints implied by these four spatio-temporal characteristics guide the contour tracking, and the belief propagation algorithm is used to approximately reason about the dynamic changes of these spatio-temporal constraints.
Solving for the object motion means assigning labels to the motion so that the posterior probability P(X|Y) is maximized, where X = {x_i} is the label set, x_i = (u_i, v_i), with u and v denoting the horizontal and vertical displacements, and Y = {I, I'} are the observed key frame and non-key frame. A Markov network is constructed as shown in Fig. 3. The posterior probability P(X|Y) can be expressed as:
P(X|Y) \propto \prod_i \varphi_i(x_i, y_i) \prod_i \prod_{j \in N(i)} \psi_{ij}(x_i, x_j)    (5)
\varphi_i(x_i, y_i) is the observation function, used to compute the probability P(y_i | x_i); \psi_{ij}(x_i, x_j) is the potential function, used to measure the compatibility of the labels of adjacent nodes.
Markov theory holds that in a Markov random field the conditional probability of a node is influenced only by its neighbors. The main purpose of belief propagation is to pass messages between adjacent nodes on a four-connected graph; each message is a vector over the possible labels. Let m_{ij}^t be the message node i sends to node j at time t, m_i^t the local evidence of node i at time t, and b_i the belief of node i. Belief propagation is an iterative algorithm; each iteration proceeds as:
m_{ij}^{t+1}(x_j) = \frac{1}{Z} \max_{x_i} \psi_{ij}(x_i, x_j) \, m_i^t(x_i) \prod_{k \in N(i) \setminus j} m_{ki}^t(x_i)    (6)
m_i^t(x_i) is identical at every iteration; its value is \varphi_i(x_i, y_i). N(i) \setminus j denotes the set of nodes adjacent to node i other than j, and Z is a normalization constant. The final belief is:
b_i(x_i) = \frac{1}{Z} m_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)    (7)
and the label value is:
x_i = \arg\max_{x_k} b_i(x_k)    (8)
In a numerical implementation, the multiplications in formulas (6) and (7) are too expensive, so the computation is transformed into log space, giving:
m_{ij}^{t+1}(x_j) = \max_{x_i} \Big( \psi_{ij}(x_i, x_j) + m_i^t(x_i) + \sum_{k \in N(i) \setminus j} m_{ki}^t(x_i) \Big)    (9)
b_i(x_i) = m_i(x_i) + \sum_{j \in N(i)} m_{ji}(x_i)    (10)
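A minimal log-space max-product sketch of equations (8)-(10), written for a 3-node chain rather than the four-connected grid of the patent (an assumption made for brevity; all nodes are assumed to share one label set). On a chain, the incoming-message sum of (9) has a single term, and the per-node argmax of the beliefs recovers the exact MAP labeling.

```python
import numpy as np

def bp_chain_map(unary, pairwise):
    """MAP labels on a chain MRF by log-space max-product belief propagation.

    unary:    list of n log-score arrays, one per node (plays the role of m_i).
    pairwise: list of n-1 matrices; pairwise[i][a, b] is the log compatibility
              of node i taking label a and node i+1 taking label b.
    """
    n = len(unary)
    fwd = [np.zeros_like(u) for u in unary]   # message into node i from the left
    bwd = [np.zeros_like(u) for u in unary]   # message into node i from the right
    for i in range(n - 1):                    # left-to-right pass, eq. (9)
        fwd[i + 1] = ((unary[i] + fwd[i])[:, None] + pairwise[i]).max(axis=0)
    for i in range(n - 1, 0, -1):             # right-to-left pass, eq. (9)
        bwd[i - 1] = ((unary[i] + bwd[i])[None, :] + pairwise[i - 1]).max(axis=1)
    beliefs = [unary[i] + fwd[i] + bwd[i] for i in range(n)]   # eq. (10)
    return [int(b.argmax()) for b in beliefs]                  # eq. (8)
```

With agreement-rewarding pairwise scores and a strong unary preference at one end, all three nodes settle on the same label, which matches the brute-force maximum of the total score.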
Between consecutive video frames, the brightness, gradient and curvature of a moving object change little, and the motion at adjacent moments is continuous. Analyzing these constraints, the brightness and gradient values affect the observation function, while the spatial continuity and curvature of the motion affect the potential function, so these functions can be expressed as:
\varphi_i(x_i) = \exp(-(\lambda_I E_I(x_i) + \lambda_G E_G(x_i)))    (11)
\psi_{ij}(x_i, x_j) = \exp(-(\lambda_N E_N(x_i, x_j) + \lambda_C E_C(x_i, x_j)))    (12)
where E_I is the image brightness preservation constraint, E_G the gradient preservation constraint, E_N the spatial continuity constraint, and E_C the curvature preservation constraint; \lambda_I, \lambda_G, \lambda_N and \lambda_C are the weights of the corresponding sub-energy terms.
Suppose f(x, y, t) is the gray value of the pixel at coordinates (x, y) on frame t, and f(x+u, y+v, t+dt) the gray value of the pixel at coordinates (x+u, y+v) on frame t+dt, where u and v are the horizontal and vertical displacements of the pixel. By Taylor expansion,
f(x+u, y+v, t+dt) = f(x, y, t) + f_x u + f_y v + f_t \, dt + O(\partial^2)    (13)
Since O(\partial^2) is a very small quantity, we have:
f(x+u, y+v, t+dt) \approx f(x, y, t) + f_x u + f_y v + f_t \, dt    (14)
Since an object's brightness changes very little between consecutive frames, the image brightness constraint minimizes the difference between f(x+u, y+v, t+dt) and f(x, y, t); therefore
E_I(x_i) = f_x u_i + f_y v_i + f_t \, dt    (15)
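Equation (15) can be checked numerically with finite differences; a sketch with dt = 1, where np.gradient supplies f_x and f_y and the frame difference supplies f_t. On a linear ramp translated by exactly one pixel, the residual for the true motion vector is zero.

```python
import numpy as np

def brightness_residual(f1, f2, u, v):
    """Brightness-constancy residual f_x*u + f_y*v + f_t of equation (15), dt = 1."""
    fx = np.gradient(f1, axis=1)   # horizontal intensity derivative
    fy = np.gradient(f1, axis=0)   # vertical intensity derivative
    ft = f2 - f1                   # temporal derivative between the two frames
    return fx * u + fy * v + ft
```

The gradient term E_G of equation (16) is the same expression with the gradient magnitude image in place of the intensity image.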
In general, the gradient is large on the object contour, which distinguishes it from non-contour parts, so the gradient value is used here as an important indicator of whether a position lies on the object contour. Let g(x, y, t) be the gradient of the pixel at coordinates (x, y) on frame t; in the same way we obtain:
E_G(x_i) = g_x u_i + g_y v_i + g_t \, dt    (16)
To preserve the object's spatial continuity, adjacent parts of the object should move continuously; therefore:
E_N(x_i, x_j) = |u_i - u_j| + |v_i - v_j|    (17)
During the object's motion, the contour shape remains roughly unchanged; that is, the curvature at each point on the object contour stays constant. We approximate the curvature by the second difference of the contour line,
c = \| p_j + p_k - 2 p_i \|    (18)
where p_i, p_j and p_k are three adjacent points on the contour line, as shown in Fig. 4.
The curvature preservation energy function is:
E_C = \| (p_j^{t+dt} + p_k^{t+dt} - 2 p_i^{t+dt}) - (p_j^t + p_k^t - 2 p_i^t) \|^2    (19)
Since p_i^{t+dt} - p_i^t = (u_i \cdot dt, v_i \cdot dt), the above can be converted into:
E_C(x_i, x_j) = \big( (u_j + u_k - 2 u_i)^2 + (v_j + v_k - 2 v_i)^2 \big) \cdot (dt)^2    (20)
Approximating (u_k, v_k) by (u_i, v_i), the above then depends only on the labels of i and j, giving:
E_C(x_i, x_j) = \big( (u_j - u_i)^2 + (v_j - v_i)^2 \big) \cdot (dt)^2    (21)
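The two pairwise constraints, spatial continuity (17) and curvature preservation (21), reduce to one-line functions of the neighboring motion labels (dt = 1 here): both vanish exactly when two adjacent contour points move with the same velocity.

```python
def E_N(ui, vi, uj, vj):
    """Spatial-continuity term of equation (17)."""
    return abs(ui - uj) + abs(vi - vj)

def E_C(ui, vi, uj, vj, dt=1.0):
    """Curvature-preservation term of equation (21), after the (u_k, v_k) approximation."""
    return ((uj - ui) ** 2 + (vj - vi) ** 2) * dt ** 2
```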
This yields the observation and potential functions (11) and (12). Using the belief propagation algorithm according to formulas (9), (10) and (8), the motion vector (u, v) of each point on the contour edge is obtained, and hence the contour position on the non-key frame. As shown in Fig. 6, (a) is the contour obtained from the key-frame labeling result, and (b) is the result of tracking the key-frame contour with belief propagation. The tracking is fairly accurate; although there is some error around the head, as contour information the result is sufficient for the final labeling of the non-key frame.
The shape term in formula (4) is as follows:
E_s(x_i, x_j) = 1 - e^{-d_{ij}}    (22)
where d_{ij} is the minimum distance from the midpoint between i and j to the contour edge. The closer an edge lies to the contour, the smaller this term, and the more likely the cut passes there.
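Equation (22) is direct to implement given the tracked contour as a point set; this sketch takes the minimum Euclidean distance from the edge midpoint to the contour points:

```python
import numpy as np

def E_s(midpoint, contour_points):
    """Shape term of equation (22): cheap to cut near the tracked contour."""
    pts = np.asarray(contour_points, dtype=float)
    d = np.linalg.norm(pts - np.asarray(midpoint, dtype=float), axis=1).min()
    return 1.0 - np.exp(-d)
```

A midpoint lying on the contour costs 0 to cut across, and the cost grows toward 1 as the midpoint moves away, steering the graph cut toward separating foreground from background along the tracked contour.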
Embodiment 1
Foreground labeling of a segment of indoor video proceeds as follows:
(1) First divide it into several segments, each containing 10 frames, one of which is the key frame. Preprocess these frames with the immersion watershed algorithm so that each image is composed of small blocks.
(2) On the key frame, the user interactively designates some foreground parts and background parts, as shown in Fig. 5(a). Applying the graph-cut algorithm with formula (1) labels the key frame, giving the result shown in Fig. 5(b).
(3) The foreground contour on the key frame is shown in Fig. 6(a). The belief propagation algorithm then transfers this contour information to the non-key frames, and the shape term is computed with formula (22). Applying the graph-cut algorithm with formula (4) labels the non-key frames, giving the result shown in Fig. 6(b).
The parameters are set as follows: α = 1.5, β = 0.8, λ_I = 1.0, λ_G = 1.0, λ_N = 1.0, λ_C = 2.0.
Embodiment 2
Foreground labeling of a segment of outdoor video proceeds as follows:
(1) First divide it into several segments, each containing 10 frames, one of which is the key frame. Preprocess these frames with the immersion watershed algorithm so that each image is composed of small blocks.
(2) On the key frame, the user interactively designates some foreground parts and background parts with strokes. Applying the graph-cut algorithm with formula (1) labels the key frame, giving the foreground labeling of the key frame.
(3) The belief propagation algorithm transfers the foreground contour information on the key frame to the non-key frames, and the shape term is computed with formula (22). Applying the graph-cut algorithm with formula (4) labels the non-key frames, giving their foreground labeling.
The parameters can be set as: α = 1.0, with the other parameters as in Embodiment 1. The resulting video foreground labeling is shown in the third row of Fig. 8.

Claims (3)

1. A video object annotation method based on contour spatio-temporal features, characterized by comprising the steps of:
(1) dividing a video into several segments, each containing several frames, of which one is a key frame and the rest are non-key frames;
(2) for the key frame, requiring the user to input hints designating key parts of the foreground and background objects, then labeling the key frame, determining the membership of each part on the frame;
(3) for each non-key frame, labeling it according to the key-frame results, using the color distribution and shape information of the foreground and the color information of the background;
said labeling of the key frame, determining the membership of each part on the frame, comprising the steps of:
(1) preprocessing the image with the immersion watershed algorithm, grouping adjacent pixels whose colors differ within a threshold into the same region;
(2) for each region, taking the average color of the region as its region color;
(3) clustering the color values of the pixels on the user-designated foreground and background, obtaining a set of background color centers and foreground color centers;
(4) defining the data difference of a region block as the minimum distance between its region color and the foreground (or background) color centers, and defining the difference between adjacent region blocks as the distance between their colors;
(5) from the data differences and the differences between adjacent blocks, constructing a graph for graph cutting with each region block as a node, then computing a min-cut on the graph, obtaining a near-optimal labeling of the image;
said labeling of non-key frames according to the key-frame results, the color distribution and shape information of the foreground, and the color information of the background comprising the steps of:
(1) clustering the colors of foreground and background according to the key-frame labeling, the clustering being used in the data differences of the non-key frames;
(2) obtaining the contour of the foreground object from the key-frame labeling, using the belief propagation algorithm to estimate the object motion within a certain motion range, obtaining the approximate position of the object contour on the non-key frame, this contour information supplementing the adjacent-block differences;
(3) from the data differences and the adjacent-block differences, constructing a graph on the non-key frame with each region block as a node, computing a min-cut on this graph, and obtaining the labeling of the non-key frame.
2. The video object annotation method based on contour spatio-temporal features according to claim 1, characterized in that said dividing of a video into segments is performed according to the speed of object motion in the video: the number of frames in each segment is inversely proportional to the speed of object motion, so that each segment contains few frames when objects move quickly, and many frames otherwise.
3. The video object annotation method based on contour spatio-temporal features according to claim 1, characterized in that, for the key frame, the user is required to input hints designating key parts of the foreground and background objects as follows: on the key-frame image the user outlines foreground or background with the mouse, drawing points, line segments and polygons; these points, line segments and polygons are hard constraints of the key-frame labeling, that is, the foreground or background membership of the outlined parts cannot change during labeling.
CNB2006100533980A 2006-09-14 2006-09-14 A video object mask method based on the profile space and time feature Expired - Fee Related CN100413327C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100533980A CN100413327C (en) 2006-09-14 2006-09-14 A video object mask method based on the profile space and time feature

Publications (2)

Publication Number Publication Date
CN1997114A CN1997114A (en) 2007-07-11
CN100413327C true CN100413327C (en) 2008-08-20

Family

ID=38252017

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100533980A Expired - Fee Related CN100413327C (en) 2006-09-14 2006-09-14 A video object mask method based on the profile space and time feature

Country Status (1)

Country Link
CN (1) CN100413327C (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799929B (en) * 2009-02-11 2013-01-23 富士通株式会社 Designated color layer extracting device and method
CN101834981B (en) * 2010-05-04 2011-11-23 崔志明 Video background extracting method based on online cluster
CN103065300B (en) * 2012-12-24 2015-03-25 安科智慧城市技术(中国)有限公司 Method for video labeling and device for video labeling
CN105224669B (en) * 2015-10-10 2018-11-30 浙江大学 A kind of motion retrieval method based on GMM semantic feature
CN106682595A (en) * 2016-12-14 2017-05-17 南方科技大学 Image content marking method and apparatus thereof
CN109934852B (en) * 2019-04-01 2022-07-12 重庆理工大学 Video description method based on object attribute relation graph
CN110009654B (en) * 2019-04-10 2022-11-25 大连理工大学 Three-dimensional volume data segmentation method based on maximum flow strategy
CN112927238B (en) * 2019-12-06 2022-07-01 四川大学 Core sequence image annotation method combining optical flow and watershed segmentation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998021688A1 (en) * 1996-11-15 1998-05-22 Sarnoff Corporation Method and apparatus for efficiently representing, storing and accessing video information
EP1225518A2 (en) * 2001-01-20 2002-07-24 Samsung Electronics Co., Ltd. Apparatus and method for generating object-labelled images in a video sequence
US6560281B1 (en) * 1998-02-24 2003-05-06 Xerox Corporation Method and apparatus for generating a condensed version of a video sequence including desired affordances
US6956573B1 (en) * 1996-11-15 2005-10-18 Sarnoff Corporation Method and apparatus for efficiently representing storing and accessing video information

Also Published As

Publication number Publication date
CN1997114A (en) 2007-07-11

Similar Documents

Publication Publication Date Title
CN100413327C (en) A video object mask method based on the profile space and time feature
Xie et al. Semantic instance annotation of street scenes by 3d to 2d label transfer
Criminisi et al. Bilayer segmentation of live video
Nagaraja et al. Video segmentation with just a few strokes
Grady et al. Random walks for interactive alpha-matting
CN109035293B (en) Method suitable for segmenting remarkable human body example in video image
Levin et al. A closed-form solution to natural image matting
EP1899897B1 (en) Video object cut and paste
CN110853026B (en) Remote sensing image change detection method integrating deep learning and region segmentation
CN100583158C (en) Cartoon animation fabrication method based on video extracting and reusing
CN103559719A (en) Interactive graph cutting method
CN100505884C (en) A shed image division processing method
Rahnama et al. R3SGM: Real-time raster-respecting semi-global matching for power-constrained systems
CN101588459A (en) A kind of video keying processing method
Xiao et al. Accurate motion layer segmentation and matting
CN103400386A (en) Interactive image processing method used for video
Chen et al. Background estimation using graph cuts and inpainting
CN103279961A (en) Video segmentation method based on depth recovery and motion estimation
CN101989353A (en) Image matting method
Lu et al. Coherent parametric contours for interactive video object segmentation
Zhang et al. Multi-view video based multiple objects segmentation using graph cut and spatiotemporal projections
Zhou et al. An efficient two-stage region merging method for interactive image segmentation
Adam et al. On scene segmentation and histograms-based curve evolution
CN102270338B (en) Method for effectively segmenting repeated object based on image representation improvement
Liang et al. Helixsurf: A robust and efficient neural implicit surface learning of indoor scenes with iterative intertwined regularization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080820

Termination date: 20120914