CN101504775B - Roaming video automatic generation method based on image set - Google Patents

Roaming video automatic generation method based on image set

Info

Publication number
CN101504775B
Authority
CN
China
Prior art keywords
image
embedded
difference
images
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100968558A
Other languages
Chinese (zh)
Other versions
CN101504775A (en)
Inventor
丛林
童若峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN2009100968558A
Publication of CN101504775A
Application granted
Publication of CN101504775B
Expired - Fee Related
Anticipated expiration


Abstract

The invention provides a method for automatically generating roaming videos from image sets, comprising the following steps: first, local matching; second, global optimization; third, image synthesis; and fourth, concatenating all frames into an output video. The method takes a series of images as input, finds an optimal browsing sequence, and forms a continuously zooming image-browsing video with visual continuity. The invention provides a more vivid way of browsing images, letting users browse as if roaming through the image scenes. Given a series of images of a tourist attraction, the method can generate a video that smoothly links the attraction's different scenes, suitable for use as a promotional film. Production is fast, requires no excessive manpower or resources for scene design and image selection, and the whole video-generation process is highly automated.

Description

Method for automatically generating a roaming video from an image set
Technical field
The present invention relates to image browsing methods based on continuous zooming.
Background technology
Zoom-based image browsing belongs to the field of non-photorealistic image/video processing. A non-photorealistic art form known as the "zoomquilt" has become popular online: artists hand-draw a series of images and assemble them into a Flash video that continuously zooms into the images, so that playback appears visually continuous. Producing such a Flash animation, however, requires artists to carefully design and draw the scenes, consuming considerable manpower.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for automatically generating a roaming video from an image set, one that can generate the image-browsing video automatically by computer. To this end, the present invention adopts the following technical solution, comprising the steps of:
(1) Local matching: let image A be any image in the set and image B be any other image. After shrinking image B, compare it with image A at different positions and compute the image difference between them; find the position in image A that minimizes the difference with the shrunken image B, take that position as the embedding position of the shrunken image B in image A, and store this minimum image difference, together with the position, in an image-difference table as the final difference between the two images. Repeat the above to match the images in the set pairwise, storing each minimum image difference in the image-difference table.
(2) Global optimization: from the image-difference table produced by local matching, the system finds the ordering of the original images that minimizes the sum of image differences and uses it as the playback sequence; in this sequence, each embedded image follows the image it is embedded into.
(3) Image synthesis: following the playback sequence, in order, let the original images in the sequence be i, i+1, i+2, ..., i+(n-1), i+n. Shrink image i+1 and embed it into image i at the embedding position found during local matching, then zoom into the composite of i and i+1 until i+1 fills the window. Repeat this process for i+1 and i+2, and so on up to i+(n-1) and i+n. The composite image at each zoom stage serves as an interpolated image between the two original images, and all these images become frames of the output video.
Finally, all frames are concatenated into the output video.
For a zoom-based image browsing method, the image difference is defined to include the following elements: color difference, texture difference, and image complexity.
Color is a salient attribute of an image. Compared with other features, color features are simple to compute, stable, and insensitive to rotation, translation, and scale changes, showing strong robustness. Color features include the color histogram, dominant color, mean brightness, and so on. This method measures color difference using the mean per-pixel sum of squared differences (SSD) in the HSV color space.
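As an illustration (not the patent's actual implementation), the HSV color-difference measure could be sketched in Python as follows. The 4:1:1 channel weighting is the one given later in the detailed description; the function names are illustrative, and the inputs are assumed to already be HSV arrays of equal size.

```python
import numpy as np

def mean_ssd(a, b):
    """Mean per-pixel sum of squared differences between two equal-size arrays."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(d * d))

def hsv_color_difference(patch_a, patch_b, weights=(4.0, 1.0, 1.0)):
    """Weighted HSV color difference: alpha*D(H) + beta*D(S) + gamma*D(V).

    patch_a, patch_b: H x W x 3 arrays already converted to the HSV space,
    with channel weights alpha:beta:gamma = 4:1:1 as in the description.
    """
    alpha, beta, gamma = weights
    dh = mean_ssd(patch_a[..., 0], patch_b[..., 0])  # D(H)
    ds = mean_ssd(patch_a[..., 1], patch_b[..., 1])  # D(S)
    dv = mean_ssd(patch_a[..., 2], patch_b[..., 2])  # D(V)
    return alpha * dh + beta * ds + gamma * dv
```

A lower value means the shrunken image blends better into the candidate region in color terms.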
Texture analysis has long been an important research direction in computer vision. Its methods can be roughly divided into statistical and structural approaches. Statistical methods analyze the spatial distribution of image intensities and can be further divided into traditional model-based statistical methods and spectral-analysis methods, such as Markov random field models and Fourier spectral features. Structural methods first assume that a texture pattern is composed of texture elements arranged according to some regular rule, so texture analysis becomes the problem of identifying these elements and quantitatively analyzing their spatial arrangement. This method measures texture difference with the mean per-pixel SSD of the gradient images.
Image complexity has been a hot research topic in computer vision in recent years: the human eye has difficulty picking out an object from a complex scene but can easily distinguish one in a simple scene, so embedding the shrunken image into a relatively complex region is a better choice.
Global optimization is also a widely studied problem, with many applicable methods such as simulated annealing and ant colony algorithms; dynamic programming is likewise well suited to this family of problems.
Graph-cut algorithms have been widely used in computer vision in recent years to solve labeling and energy-minimization problems, with concrete applications including image segmentation, image stitching, and video stitching; in image stitching in particular, graph cuts work especially well.
By adopting the technical solution of the present invention, the method takes a series of images as input, finds an optimal browsing sequence, and forms a continuously zooming image-browsing video with visual continuity. The invention also has the following beneficial effects:
1. It provides a more vivid way of browsing images, letting users browse as if roaming through the image scenes.
2. For a series of images of a tourist attraction, the method can generate a video that smoothly links the attraction's different scenes, usable as a promotional film for the attraction.
3. Production is fast: no excessive manpower or resources are needed to design scenes or select images, and the whole video-generation process is highly automated.
Description of drawings
Fig. 1 is the flowchart of the automatic generation method provided by the present invention.
Fig. 2 is the sub-flowchart of the local matching step provided by the present invention.
Fig. 3 is the sub-flowchart of the image synthesis step provided by the present invention.
Fig. 4 illustrates the process of the automatic generation method: an image set is taken as input, the optimal playback sequence is output, composite interpolation images are generated between adjacent images in the sequence, and the output video is formed.
Fig. 5 is a simplified schematic of an example processed by the automatic generation method. The first row shows part of an image playback sequence arranged in order; the second row shows interpolated intermediate frames generated in the video to connect two adjacent images.
Embodiment
First, define the notation used in the following description: the image sequence is denoted I_1, ..., I_n; I_i is the target image; the shrunken version of I_{i+1} is denoted P_{i+1}; the composite of I_i and P_{i+1} is denoted S_{i,i+1}; and the mean per-pixel sum of squared differences is abbreviated SSD.
Fig. 1 is the basic flowchart of the invention: a series of images is taken as input, and a video that continuously zooms while switching between images is produced as output.
Each stage of the invention is described in detail below:
1. Local matching
Local matching takes a series of images as input (flow shown in Fig. 2). Its basic procedure: let image A be any image in the set and image B be any other image. After shrinking image B, compare it with image A at different positions and compute the image difference between them; find the position in image A that minimizes the difference with the shrunken image B, take it as the embedding position of the shrunken image B in image A, and store this minimum image difference, together with the position, in the image-difference table as the final difference between the two images. Repeat for every pair of images in the set, storing each minimum difference in the image-difference table.
The image difference covers four aspects: color difference, texture difference, the image complexity at the embedding position, and the location of the embedding position within the image.
HSV color difference: Δc = α·D(H) + β·D(S) + γ·D(V), where α:β:γ = 4:1:1 are the weights and D(H), D(S), D(V) are the mean per-pixel SSD on the three channels.
Texture difference: ΔT is defined as the mean per-pixel SSD of the two median-filtered gradient images.
Image complexity: the variance V measures the complexity of the target image region.
Embedding position: a penalty value P_t(p) is defined for a position p. Compared with an embedding position near the image border, one near the image center yields a smaller image-difference value and a smaller penalty P_t(p); for the central position, the penalty is 0. That is, positions near the image center receive a small penalty, and positions near the border a large one.
Comparison proceeds as shown in Fig. 2: for a candidate embedding position, the color difference is compared first; if it exceeds a set threshold, the position is skipped and comparison moves to other positions; otherwise the texture difference, image complexity, and embedding-position factors are compared in turn until a comprehensive score for the position is obtained.
Combining the above factors, the cost of embedding the shrunken version (of size r) of image i into image j is defined as:
cost_r(i, j) = min_{p in C} ( w_1·Δc(i, p) − w_2·V(p) + w_3·ΔT(i, p) + w_4·P_t(p) )
where p is an embedding position, C is the set of all candidate positions, and w_1 = 0.4, w_2 = w_3 = w_4 = 0.2 are the weights on the image-difference factors described above. Finally, taking the different possible sizes of image i into account, the cost of pasting image i into image j is defined as:
Cost(i, j) = min_{r in R} ( cost_r(i, j) )
where R is the set of resolutions considered (typically 16×16, 32×32, and 64×64).
After computing Cost(i, j) for every pair of images, we obtain the table BestMap (the image-difference table), which stores the differences between all images.
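The pairwise cost Cost(i, j) can be sketched as follows. This is a simplified stand-in, not the patented implementation: grayscale SSD replaces the HSV color difference, raw gradient SSD replaces the median-filtered texture difference, the shrink is a naive strided downsample, and the disjoint sub-region scan and all names are illustrative assumptions.

```python
import numpy as np

def embed_cost(img_i, img_j, resolutions=(16, 32, 64), w=(0.4, 0.2, 0.2, 0.2)):
    """Sketch of Cost(i, j): best cost over resolutions r and positions p of
    embedding a shrunken image i into image j. Returns (cost, (y, x), r)."""
    w1, w2, w3, w4 = w
    H, W = img_j.shape
    best = (np.inf, None, None)
    for r in resolutions:
        # naive shrink of img_i to r x r (a real system would properly resample)
        sy, sx = max(1, img_i.shape[0] // r), max(1, img_i.shape[1] // r)
        small = img_i[::sy, ::sx][:r, :r].astype(float)
        if small.shape != (r, r):
            continue
        gi = np.gradient(small)
        for y in range(0, H - r + 1, r):            # disjoint sub-regions
            for x in range(0, W - r + 1, r):
                region = img_j[y:y + r, x:x + r].astype(float)
                dc = np.mean((region - small) ** 2)          # color-difference stand-in
                var = np.var(region)                         # complexity (bigger is better)
                gj = np.gradient(region)
                dt = np.mean((gj[0] - gi[0]) ** 2 + (gj[1] - gi[1]) ** 2)  # texture
                pt = np.hypot(y + r / 2 - H / 2, x + r / 2 - W / 2)  # position penalty
                cost = w1 * dc - w2 * var + w3 * dt + w4 * pt
                if cost < best[0]:
                    best = (cost, (y, x), r)
    return best
```

Note the minus sign on the complexity term: a more complex region lowers the cost, matching the description's preference for embedding into complex regions.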
2. Global optimization
Global optimization takes the image-difference table produced by local matching as input. The problem is to produce an optimal playback sequence such that no image in the sequence is repeated and the sum of image differences between adjacent images is minimal. It is solved by dynamic programming: build a two-dimensional table BestSeq whose element BestSeq(i, j).C_t denotes the minimum total cost of a sequence of length i ending with image j; the table also stores the index of the previous image in the sequence. The dynamic-programming recurrence is:
BestSeq(i, j).C_t = min_{k=1,...,L} ( BestSeq(i−1, k).C_t + Cost(j, k) + Penalty )
where Cost(j, k), looked up from BestMap, is the cost of pasting image j into image k, and Penalty prevents repeated images in the sequence: if image j already appears earlier in the sequence, Penalty is a very large number; otherwise it is 0.
After dynamic programming finishes, the element with the minimum total cost is found in the last row of the table, and backtracking through the stored indices yields the optimal sequence.
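A minimal sketch of the BestSeq recurrence, assuming the pairwise costs form an L×L matrix. The patent stores only a cost and a back-pointer per cell; this sketch keeps the whole best partial path per (length, last-image) state so the repeated-image Penalty can be checked directly, which makes the backtracking explicit.

```python
def best_sequence(cost, big_penalty=1e9):
    """DP sketch of BestSeq(i, j).Ct = min_k BestSeq(i-1, k).Ct + Cost(j, k) + Penalty.

    cost[j][k] is the cost of pasting image j into image k (j follows k in the
    sequence). Penalty is a huge number when image j already occurs in the best
    partial sequence ending at k, as in the description. Returns the lowest-cost
    ordering of all L images.
    """
    L = len(cost)
    # state: for each last image j, (total cost, path) of the best length-i sequence
    states = {j: (0.0, [j]) for j in range(L)}
    for _ in range(L - 1):
        new_states = {}
        for j in range(L):
            best = (float("inf"), None)
            for k, (c, path) in states.items():
                pen = big_penalty if j in path else 0.0
                total = c + cost[j][k] + pen
                if total < best[0]:
                    best = (total, path + [j])
            new_states[j] = best
        states = new_states
    # "last row": pick the minimum-cost state and return its backtracked path
    return min(states.values())[1]
```

Keeping only one best path per state is a heuristic (it cannot enumerate all repeat-free paths), but it mirrors the table-plus-backpointer scheme of the description.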
3. Image synthesis
After the optimal playback sequence is obtained, each pair of consecutive images in the sequence is composited at every zoom stage (flow shown in Fig. 3). The synthesis strategy of the invention uses the graph-cut result as a guide: it finds the optimal synthesis boundary between the embedded image P_{i+1} and I_i, then expands outward from this boundary to generate a kernel K containing alpha values for all pixels of P_{i+1}.
The graph-cut energy function is defined as:
E(L) = α·Σ_p E_data(p, L(p)) + Σ_{p,q} E_smooth(p, q, L(p), L(q))
where E_data is the unary (first-order) energy and E_smooth is the pairwise (second-order) energy. α is a relative weight, and L(p) is the label assigned to pixel p: 1 means the pixel is taken from P_{i+1}, and 0 means it is taken from I_i.
The unary energy is defined as:
E_data(p, 0) = n·k − dist(p, center)
E_data(p, 1) = dist(p, center)
where n is the size of P_{i+1}, k = 0.6, dist is the function computing the distance between two pixels, and center is the geometric center of P_{i+1}.
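A direct transcription of the unary energy, under the assumptions that `p` and `center` are (row, col) pixel coordinates and that `dist` is the Euclidean distance (the description does not specify the metric):

```python
import math

def e_data(p, label, n, center, k=0.6):
    """Unary energy from the description:
       E_data(p, 0) = n*k - dist(p, center)   (cost of taking p from the target image)
       E_data(p, 1) = dist(p, center)         (cost of taking p from the embedded image)
    """
    d = math.hypot(p[0] - center[0], p[1] - center[1])
    return n * k - d if label == 0 else d
```

Pixels near the center of P_{i+1} are thus cheap to label 1 and expensive to label 0, biasing the cut toward keeping the embedded image's interior intact.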
The pairwise energy is defined as:
E_smooth(p, q, L(p), L(q)) = ||L(p).p − L(q).p|| + ||L(p).q − L(q).q||
where L(a).b denotes the color value of pixel b in the image selected by L(a): L(a) = 0 selects the target image and L(a) = 1 selects the shrunken image.
With the energy function defined above, the max-flow method finds the optimal synthesis boundary between the embedded image P_{i+1} and I_i; this boundary divides the image into two regions, A and E. From the boundary, the kernel K containing alpha values for all pixels of P_{i+1} is generated according to the following rules:
● pixels in region A are assigned a given alpha value;
● pixels in region E are processed by 4-connected expansion and assigned the alpha value n/w − δ·step(p),
where n is the size of the current P_{i+1}, w is the window size, δ is a given constant, and step(p) is the number of expansion steps taken to reach pixel p.
The composite image is finally obtained as:
S = K·P + (1 − K)·T_r
where T_r denotes the target image region corresponding to P.
In practice the graph-cut result may be unsatisfactory: if the region A obtained by the graph cut is smaller than 1/4 of the whole region area, the graph-cut result is discarded and a Gaussian function is used instead to generate the alpha kernel K:
K(x, y) = A·exp( −[ (x − x_0)² / (2σ_x²) + (y − y_0)² / (2σ_y²) ] )
A = n/w,  σ_x² = (n/k)²,  σ_y² = (n/k)²
where (x_0, y_0) is the coordinate of the geometric center of P, σ_x² and σ_y² are the variances, n is the size of the current P_{i+1}, w is the window size, and k, which controls the smoothness of the kernel, is set here to 4.5.
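The fallback Gaussian alpha kernel and the composite S = K·P + (1 − K)·T_r can be sketched as follows, assuming square grayscale patches; the array layout and the center convention are illustrative assumptions.

```python
import numpy as np

def gaussian_alpha_kernel(size, window, k=4.5):
    """Fallback kernel K(x, y) = A * exp(-[(x-x0)^2/(2*sx^2) + (y-y0)^2/(2*sy^2)]),

    with A = n/w and sx^2 = sy^2 = (n/k)^2 as in the description, where n is the
    embedded image size and w the window size.
    """
    n, w = float(size), float(window)
    amp = n / w                              # peak alpha A = n/w
    sigma2 = (n / k) ** 2                    # sx^2 = sy^2 = (n/k)^2
    c = (size - 1) / 2.0                     # geometric center (x0, y0)
    ys, xs = np.mgrid[0:size, 0:size]
    return amp * np.exp(-(((xs - c) ** 2) / (2 * sigma2)
                          + ((ys - c) ** 2) / (2 * sigma2)))

def composite(kernel, embedded, target_region):
    """S = K*P + (1 - K)*T_r : alpha-blend the embedded patch into the target region."""
    return kernel * embedded + (1.0 - kernel) * target_region
```

The kernel peaks at n/w in the center and falls off smoothly, so the embedded image fades gradually into its surroundings instead of showing a hard seam.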
4. Maintaining inter-frame continuity
Within the sequence, when P_{i+1} has filled the window, the composite switches from S_{i,i+1} to S_{i+1,i+2}, which would produce a visual jump. To prevent it, the preceding composite is blended with the following one over several steps by inserting transition frames:
Tran = α·S_{i,i+1} + (1 − α)·S_{i+1,i+2}
where Tran denotes a transition frame; the initial value of α is 0.9, and it is reduced by 0.025 at each zoom step until it reaches 0.
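A sketch of the transition-frame blending, assuming the two composites are same-size float arrays. With the stated schedule (α from 0.9 down by 0.025 per step, ending at 0), this yields 37 frames including the final pure S_{i+1,i+2} frame.

```python
import numpy as np

def transition_frames(s_prev, s_next, alpha0=0.9, step=0.025):
    """Generate Tran = alpha*S_{i,i+1} + (1-alpha)*S_{i+1,i+2} for alpha
    decreasing from alpha0 by `step` per zoom step down to 0."""
    frames = []
    alpha = alpha0
    while alpha > 0:
        frames.append(alpha * s_prev + (1.0 - alpha) * s_next)
        alpha = round(alpha - step, 6)       # rounding avoids float drift past 0
    frames.append(s_next.copy())             # alpha == 0: pure next composite
    return frames
```

Each transition frame is itself emitted as a video frame, so the handover between consecutive composites is spread over roughly a second and a half at typical frame rates.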
5. Video generation
Through the above stages, one video frame is generated at each zoom step, and all these frames form the output zoom video. Fig. 5 shows some of the frames connecting adjacent images.

Claims (10)

1. A method for automatically generating a roaming video from an image set, characterized in that the method comprises the following steps:
(1) local matching: let image A be any image in the set and image B be any other image; after shrinking image B, compare it with image A at different positions and compute the image difference between them; find the position in image A that minimizes the difference with the shrunken image B, take that position as the embedding position of the shrunken image B in image A, and store the minimum image difference, together with this position, in an image-difference table as the final difference between the two images; repeat the above to match the images in the set pairwise, storing each minimum image difference in the image-difference table;
(2) global optimization: from the image-difference table produced by local matching, find the image ordering that minimizes the sum of image differences and use it as the playback sequence, wherein each embedded image follows the image it is embedded into;
(3) image synthesis: following the playback sequence, in order, let the original images in the sequence be i, i+1, i+2, ..., i+(n-1), i+n; shrink image i+1 and embed it into image i at the embedding position found during local matching, then zoom into the composite of i and i+1 until i+1 fills the window; repeat this process for i+1 and i+2, and so on up to i+(n-1) and i+n; the composite image at each zoom stage serves as an interpolated image between the two original images, and all these images become frames of the output video;
finally, all frames are concatenated into the output video.
2. The method for automatically generating a roaming video from an image set according to claim 1, characterized in that the image difference comprises two aspects: color difference and texture difference; wherein the color difference is measured by the mean per-pixel sum of squared differences in the HSV color space, and the texture difference is measured by the mean per-pixel sum of squared differences of the gradient images.
3. The method for automatically generating a roaming video from an image set according to claim 2, characterized in that the image difference further comprises two aspects: the image complexity at the embedding position and the location of the embedding position within the image; the image complexity is measured by the variance, and the greater the image complexity at the embedding position, the smaller the image-difference value; an embedding position near the image center yields a smaller image-difference value than one near the image border.
4. The method for automatically generating a roaming video from an image set according to claim 2 or 3, characterized in that when computing the image difference, the color difference is computed first; if the color difference exceeds a given threshold, the comparison for this image difference is abandoned; otherwise the remaining factors are computed and a full comparison is made.
5. The method for automatically generating a roaming video from an image set according to claim 1, characterized in that in step (1), image A is divided into disjoint sub-regions of the same size as the shrunken image B, and the image-difference comparison is performed between the shrunken image B and these sub-regions.
6. The method for automatically generating a roaming video from an image set according to claim 1, characterized in that in step (2), the playback sequence is obtained by dynamic programming; for two adjacent images in the playback sequence, the earlier image is the embedded-into image and the shrunken version of the later image is the embedded image, and the output information further includes the optimal initial shrinking size of each embedded image in the playback sequence and its embedding-position information within the embedded-into image.
7. The method for automatically generating a roaming video from an image set according to claim 1, characterized in that, with the earlier of two adjacent images in the playback sequence being the embedded-into image and the shrunken version of the later image being the embedded image, in step (3) image fusion is performed between the embedded image and the embedded-into image in the composite image at each zoom stage.
8. The method for automatically generating a roaming video from an image set according to claim 7, characterized in that the image fusion adopts the following procedure: using the graph-cut technique, obtain the minimum-energy boundary for fusing the embedded image with the embedding target region of the embedded-into image; under that minimum-energy boundary, assign the pixels of the embedded image inside the boundary an initial alpha value and perform 4-connected expansion from the boundary, with the alpha value obtained at each expansion step decreasing linearly, finally obtaining the kernel K of alpha values of the embedded image; using K, alpha-blend the colors of the embedded image and the embedding target region of the embedded-into image to obtain the composite image.
9. The method for automatically generating a roaming video from an image set according to claim 1, characterized in that the composite images are output as video frames, and the method maintains inter-frame continuity when synthesizing the video.
10. The method for automatically generating a roaming video from an image set according to claim 9, characterized in that the step of maintaining inter-frame continuity alpha-blends the preceding composite image with the current composite image.
CN2009100968558A 2009-03-19 2009-03-19 Roaming video automatic generation method based on image set Expired - Fee Related CN101504775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100968558A CN101504775B (en) 2009-03-19 2009-03-19 Roaming video automatic generation method based on image set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100968558A CN101504775B (en) 2009-03-19 2009-03-19 Roaming video automatic generation method based on image set

Publications (2)

Publication Number Publication Date
CN101504775A CN101504775A (en) 2009-08-12
CN101504775B true CN101504775B (en) 2011-08-31

Family

ID=40977010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100968558A Expired - Fee Related CN101504775B (en) 2009-03-19 2009-03-19 Roaming video automatic generation method based on image set

Country Status (1)

Country Link
CN (1) CN101504775B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241899B (en) * 2019-03-26 2021-01-12 广西三笔科技有限公司 Intelligent river channel control method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040179262A1 (en) * 2002-11-25 2004-09-16 Dynamic Digital Depth Research Pty Ltd Open GL
CN1662933A (en) * 2002-05-24 2005-08-31 德耐皮克斯情报图象公司 Method and apparatus for comprehensive and multi-scale 3D image documentation and navigation
WO2007016596A2 (en) * 2005-07-29 2007-02-08 Pamela Barber Digital imaging method and apparatus
US20070150188A1 (en) * 2005-05-27 2007-06-28 Outland Research, Llc First-person video-based travel planning system


Also Published As

Publication number Publication date
CN101504775A (en) 2009-08-12

Similar Documents

Publication Publication Date Title
CN102360490B (en) Color conversion and editing propagation-based method for enhancing seasonal feature of image
CN100541524C (en) Content-based method for filtering internet cartoon medium rubbish information
CN102567727B (en) Method and device for replacing background target
CN107452010A Automatic image matting algorithm and device
CN108509978A (en) The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN
CN104732506A (en) Character picture color style converting method based on face semantic analysis
CN103208115B (en) Based on the saliency method for detecting area of geodesic line distance
CN102959951A (en) Image processing device, image processing method, image processing program, and integrated circuit
CN106991529A (en) City night lights economic index evaluation method based on cross-domain multidimensional big data
CN105205846B (en) Ink animation production method
CN102903128A (en) Video image content editing and spreading method based on local feature structure keeping
CN105957078A (en) Multi-view video segmentation method based on graph cut
CN101436310A (en) Method for automatically generating middle frame during two-dimension cartoon making process
CN101216948A (en) Cartoon animation fabrication method based on video extracting and reusing
CN101098475A (en) Interactive time-space accordant video matting method in digital video processing
CN105787948A (en) Quick graph cutting method based on multiple deformation resolutions
CN105447900A (en) Animation recording method and device
CN103051915A (en) Manufacture method and manufacture device for interactive three-dimensional video key frame
CN106023276A (en) Pencil drawing making method and pencil drawing making device based on image processing
CN101329768A (en) Method for rapidly modeling of urban street based on image sequence
CN101504775B (en) Roaming video automatic generation method based on image set
Fish et al. Image morphing with perceptual constraints and stn alignment
CN102542528B (en) Image conversion processing method and system
Ye et al. Hybrid scheme of image’s regional colorization using mask r-cnn and Poisson editing
Chen et al. iSlideShow: a content-aware slideshow system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110831