US20060034528A1

US20060034528A1 - System and method for non-iterative global motion estimation

Info

Publication number: US20060034528A1
Application number: US10/916,599
Authority: US
Inventors: Yeping Su; Ming-Tin Sun; Yuh-Feng Hsu
Original assignee: Industrial Technology Research Institute ITRI; Washington University in St Louis WUSTL
Current assignee: Industrial Technology Research Institute ITRI; Washington University in St Louis WUSTL
Priority date: 2004-08-12
Filing date: 2004-08-12
Publication date: 2006-02-16
Also published as: US7684628B2

Abstract

A fast non-iterative Global Motion Estimation (GME) algorithm is disclosed for estimating the perspective transform global motion parameters from the Motion Vectors (MV) obtained from the block matching process that includes grouping a plurality of motion vectors in the input video stream into a predetermined number of groups of motion vectors, calculating a set of global motion parameters from each of the predetermined groups of the motion vector, and processing the set of global motion parameters generated from the calculation to obtain a final estimation.

Description

BACKGROUND

1. Field of the Invention
The present invention relates generally to methods and systems for estimating Global Motions (GMs) in a video sequence, and more particularly, to methods and systems for estimating and compensating for GMs in a video sequence through a novel non-iterative motion estimation.
2. Background of the Invention
Utilization of a video camera or a digital still camera (DSC) to record a scene is well known in the art. The scene recorded by the video camera is formed of a video sequence that comprises a number of individual images, or frames, taken at regular intervals. When the intervals are sufficiently small, displaying the successive frames adequately recreates the motion of the recorded scene.
In general, the motion in the video sequence, or the differences between successive frames, is due to movements of an object being recorded or the motion of the camera itself, resulting from adjustments by the user to the camera functionalities, such as zooming, involuntary movements, or jitters. The motions caused by camera movements result in Global Motions (GMs) in the video sequence, meaning the entire scene shifts and moves, as opposed to a local motion, such as a movement by an object being recorded, against a steady background. Some GMs such as jitters are generally unintended and undesired during a recordation process. A number of systems and methods have been proposed to estimate and compensate for GMs.
It is known in the art that GMs in a video sequence are often modeled by parametric transforms of 2D images. The process of estimating the transform parameters from images is known as Global Motion Estimation (GME). GME is an important tool widely used in computer vision, video processing, and other related fields. As an example, for MPEG-4 GME, global motions are described in a parametric form, with models ranging from a simple translational model with two parameters to a general perspective model with eight parameters. Among these models, the model with eight parameters is the most general in MPEG-4 GME. According to this model, the GM between a reference frame and a current frame can be represented by coordinates (x′, y′) that are calculated from the coordinates (x, y) by the following equations: $x^{'} = \frac{m_{0} x + m_{1} y + m_{2}}{m_{6} x + m_{7} y + 1}, y^{'} = \frac{m_{3} x + m_{4} y + m_{5}}{m_{6} x + m_{7} y + 1}$
GMs can only be calculated by finding all eight parameters, m₀ to m₇, of the frames. Many algorithms have been proposed for MPEG-4 GME, both in the pixel-domain and in the compressed-domain. Most of the algorithms dealing with the perspective model, however, are iterative because the perspective transform model is nonlinear with respect to the GM parameters. Although acceptable performance can be achieved through the iterative approach, the computational cost may be prohibitive for real-time encoding or for applications with limited computational power such as those in wireless devices.
Furthermore, the conventional GME algorithm is considered as the most time consuming and cost ineffective operation in modern MPEG-4 Advanced Simple Profile (ASP) video coding. As computational cost is the major concern for some applications involving GME, it is desirable to design an algorithm with less computational complexities.

BRIEF SUMMARY OF THE INVENTION

In accordance with the present invention, there is provided a non-iterative method for estimating global motions between a plurality of image frames in an input video stream that includes grouping a plurality of motion vectors in the input video stream into a predetermined number of groups of motion vectors, calculating a set of global motion parameters from each of the predetermined groups of the motion vector, and processing the set of global motion parameters generated from the calculation to obtain a final estimation.
In one embodiment, the step of grouping the motion vectors is based on a fixed spatial distance among the motion vectors within each of the predetermined number of groups.
Also in accordance with the present invention, there is provided a method for estimating global motions between a reference image frame and a current image frame that includes employing a perspective model with eight global motion parameters (m₀-m₇), wherein $x^{'} = f_{x} (x, y | m) = \frac{m_{0} x + m_{1} y + m_{2}}{m_{6} x + m_{7} y + 1}$ $y^{'} = f_{y} (x, y | m) = \frac{m_{3} x + m_{4} y + m_{5}}{m_{6} x + m_{7} y + 1}$

- where (x,y) and (x′,y′) are the coordinates in the current and the reference image frames, respectively, with the set of eight global motion parameters m=[m₀, . . . ,m₇], and calculating the set of eight global motion parameters m using algebraic distance as below: $\chi^{2} = \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} x_{i}^{'} \cdot (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{0} x_{i} + m_{1} y_{i} + m_{2}) \\ y_{i}^{'} \cdot (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{3} x_{i} + m_{4} y_{i} + m_{5}) \end{matrix} \right] \right\|^{2} = \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} - x_{i} \cdot m_{0} - y_{i} \cdot m_{1} - m_{2} + x_{i}^{'} x_{i} \cdot m_{6} + x_{i}^{'} y_{i} \cdot m_{7} + x_{i}^{'} \\ - x_{i} \cdot m_{3} - y_{i} \cdot m_{4} - m_{5} + y_{i}^{'} x_{i} \cdot m_{6} + y_{i}^{'} y_{i} \cdot m_{7} + y_{i}^{'} \end{matrix} \right] \right\|^{2}$ where $x_{i}^{'} = {MV}_{xi} + x_{i}$ and $y_{i}^{'} = {MV}_{yi} + y_{i}$.

In one embodiment, the algebraic distance equation may be solved with an over-determined linear system as follows: $\left( \begin{matrix} x_{0} & y_{0} & 1 & 0 & 0 & 0 & - x_{0} x_{0}^{'} & - y_{0} x_{0}^{'} \\ 0 & 0 & 0 & x_{0} & y_{0} & 1 & - x_{0} y_{0}^{'} & - y_{0} y_{0}^{'} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{N - 1} & y_{N - 1} & 1 & 0 & 0 & 0 & - x_{N - 1} x_{N - 1}^{'} & - y_{N - 1} x_{N - 1}^{'} \\ 0 & 0 & 0 & x_{N - 1} & y_{N - 1} & 1 & - x_{N - 1} y_{N - 1}^{'} & - y_{N - 1} y_{N - 1}^{'} \end{matrix} \right) \left( \begin{matrix} m_{0} \\ m_{1} \\ m_{2} \\ m_{3} \\ m_{4} \\ m_{5} \\ m_{6} \\ m_{7} \end{matrix} \right) = \left( \begin{matrix} x_{0}^{'} \\ y_{0}^{'} \\ \vdots \\ x_{N - 1}^{'} \\ y_{N - 1}^{'} \end{matrix} \right)$
In accordance with the present invention, there is additionally provided a non-iterative method for estimating global motions between a plurality of image frames in an input video stream that includes grouping a plurality of motion vectors in the input video stream into a predetermined number of groups of motion vectors, calculating a set of global motion parameters from each of the predetermined groups of the motion vector having a plurality of global motion parameters, and processing the set of global motion parameters generated from the calculation to obtain a final estimation. The step of calculating a set of global motion parameters further includes calculating the plurality of global motion parameters using algebraic distance, and calculating the algebraic distance using an over-determined linear system.
In accordance with the present invention, there is further provided a system for estimating global motions between image frames of an input video stream that includes a grouping device for grouping a plurality of motion vectors contained in the input video stream to obtain a predetermined number of groups of motion vectors, a calculation device for calculating a global motion estimation from each of the predetermined groups of motion vectors to obtain a set of global motion parameters {m_j}_{j=1:J}, with each global motion estimation m_j comprising eight global motion parameters (m₀, . . . ,m₇), and a post-processing device for obtaining a final estimation from the set of global motion parameters.
In one embodiment, the system further includes means for calculating a histogram of the global motion parameters {m_j}_{j=1:J} with four bins in each of eight dimensions, means for choosing a bin from the four bins that includes the largest number of m_j, and means for averaging over the m_j of the chosen bin to obtain the final estimate.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing GM correspondences between current and reference image frames;
FIG. 2 is a flow chart of a method of non-iterative MV-based GME in accordance with one embodiment of the present invention;
FIG. 3 is a block diagram of a system for obtaining non-iterative MV-based GME in accordance with one embodiment of the present invention;
FIG. 4 is a schematic diagram showing MV groupings in accordance with one embodiment of the present invention;
FIGS. 5(a) and 5(b) are charts comparing the rate-distortion (R-D) performance of a conventional pixel-GME algorithm and the MV-GME algorithm consistent with the present invention; and
FIG. 6 is a chart comparing the number of bits used by simulation between the conventional pixel-GME algorithm and the MV-GME algorithm consistent with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Before one or more embodiments of the invention are described in detail, one skilled in the art will appreciate that the invention is not limited in its application to the details of construction, the arrangements of components, and the arrangement of steps set forth in the following detailed description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.
A fast non-iterative Global Motion Estimation (GME) algorithm is disclosed for estimating the perspective transform global motion parameters from the Motion Vectors (MV) obtained from the block matching process. The present invention employs a non-iterative motion vector based GME algorithm to estimate the eight GM parameters of the most general model as described above. The algorithm of the present invention utilizes motion vectors (MVs) in an input video stream to estimate the GM parameters. The MV-based algorithm of the present invention is able to reduce the computational complexity of global motion estimation with minimum quality loss. In addition, the algorithm of the present invention may be implemented in a conventional MPEG-4 encoder after a block-based motion estimation (BME) process. Specifically, the MV-based algorithm of the present invention is linear and non-iterative, and therefore can estimate the perspective GM parameters efficiently and robustly.
In accordance with one embodiment of the present invention, the MV-based GME algorithm estimates GM parameters with the general perspective model from a sampled MV field. A BME is first performed on parts of the image frame to estimate the block MVs, and then the block MVs are used to estimate the GM parameters. One embodiment of the present invention provides a method for estimating GMs between image frames in an input video stream. According to this embodiment, a plurality of MVs included in the input video stream is grouped into J groups of MVs. The method then calculates a GME for each of the groups of motion vectors to obtain J sets of GME. The J sets of GME are further processed to obtain a final estimation. The GMs may be estimated using algebraic distance and an over-determined linear system.
The present invention further provides a system for estimating GMs between image frames in an input video stream. The system includes a grouping device for grouping a plurality of MVs contained in the input video stream to obtain J groups of MVs, a calculation device for calculating a GME from each of the J groups of MVs to obtain a set of GME {m_j}_{j=1:J}, with each GME m_j comprising eight GM parameters (m₀, . . . ,m₇), and a post-processing device for obtaining a final estimation from the set of GME.
In accordance with the present invention, an image frame recorded by a digital camera is first divided into a number of blocks, each of which includes a matrix of pixels. The block motion estimation of each of the blocks is calculated first, and the resulting motion vectors are processed to obtain a final estimate. Unlike the pixel-by-pixel estimation used in the conventional method, which is iterative and therefore time-consuming, the present invention performs GME on the MVs of the blocks, thus reducing the number of calculation steps and the computational complexity.
For example, consider a point on an object moving in the 3D space. Its position can be expressed in the 3D coordinates as X=(X,Y,Z)^T∈R³, and (X(t),t) defines its moving trajectory in the 3D space over time. Image acquisition systems project the 3D world onto a 2D image plane and sample it on a usually uniform grid x=(x,y)^T∈R². Upon this projection, a 2D motion trajectory (x(t),t) is obtained. In general, an MV field is a vector-valued function of motion trajectories on continuous spatial coordinates. In practical applications, this function is commonly described in a parametric form as transformations with sets of parameters or the motion trajectories of some reference points.
Various 2-D parametric models have been defined in MPEG-4 standards and the eight-parameter perspective model is the most general one, in which the transformation is defined as $\begin{matrix} x^{'} = f_{x} (x, y | m) = \frac{m_{0} x + m_{1} y + m_{2}}{m_{6} x + m_{7} y + 1}, \quad y^{'} = f_{y} (x, y | m) = \frac{m_{3} x + m_{4} y + m_{5}}{m_{6} x + m_{7} y + 1} & (1) \end{matrix}$

- where (x,y) and (x′,y′) are the coordinates in the current and the reference images respectively, with the set of GM parameters m=[m₀, . . . ,m₇]. An embodiment of the present invention focuses on the perspective transforms as the most general GM model defined in MPEG-4.
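
As an illustration of equation (1), the following minimal Python sketch maps current-frame coordinates to reference-frame coordinates. The function name `perspective_warp` and the sample parameter sets are assumptions made for this example and do not appear in the original disclosure.

```python
import numpy as np

def perspective_warp(m, x, y):
    """Map current-frame (x, y) to reference-frame (x', y') with equation (1).

    m is the 8-element parameter vector [m0, ..., m7]; x and y may be
    scalars or arrays of equal shape.
    """
    m = np.asarray(m, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = m[6] * x + m[7] * y + 1.0          # shared denominator
    xp = (m[0] * x + m[1] * y + m[2]) / denom
    yp = (m[3] * x + m[4] * y + m[5]) / denom
    return xp, yp

# Hypothetical parameter sets: no motion, and a pure translation (pan).
identity = [1, 0, 0, 0, 1, 0, 0, 0]
pan = [1, 0, 5, 0, 1, -3, 0, 0]

print(perspective_warp(identity, 3.0, 4.0))
print(perspective_warp(pan, 3.0, 4.0))
```

When m₆ and m₇ are both zero the denominator equals one and the model reduces to a six-parameter affine transform, which is why only these two parameters make the model nonlinear.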

FIG. 1 shows the concept of global motion compensation with the perspective model, where the correspondence between a current frame 12 and a reference frame 11 is illustrated.
In an embodiment of the present invention, each of the image frames taken by a camera is divided into a number of MV blocks by an exemplary conventional GME method. In applications such as MPEG-2 to MPEG-4 ASP transcoding, the MVs from the block-matching are readily available. Considering MVs from BME as noisy samples of the motion vector field, the goal of a practical GME algorithm is to achieve accurate estimation of the global motion parameters robustly and efficiently.
A difficulty of the GME method using MVs, however, is the estimation of the GM parameter m from the MV set, which is available from the compressed video bit-stream and defined as {(x_i,y_i),(MVx_i,MVy_i)}_{i=0:N−1}, where (MVx_i,MVy_i) denotes the i-th motion vector located at (x_i,y_i) in the current picture, with N denoting the total number of MVs. Applying the Euclidean distance, the parameters can be calculated by solving the following least-squares (LS) problem, which is nonlinear because the perspective model is nonlinear. $\begin{matrix} m = \underset{m}{\arg \min} \sum_{i = 0}^{N - 1} \left\| r_{i} \right\|^{2} = \underset{m}{\arg \min} \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} {MVx}_{i} - f_{x} (x_{i}, y_{i} | m) + x_{i} \\ {MVy}_{i} - f_{y} (x_{i}, y_{i} | m) + y_{i} \end{matrix} \right] \right\|^{2} & (2) \end{matrix}$
To solve this nonlinear LS problem, however, an iterative optimization procedure needs to be employed. As the computational burden of iterative procedures will increase the cost of the GME module, these procedures might be cost-prohibitive for many applications.
Instead, in accordance with an embodiment of the present invention, an algebraic distance is used in the target function such that the LS calculation becomes linear as shown by the following formula: $\begin{matrix} \chi^{2} = \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} x_{i}^{'} \cdot (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{0} x_{i} + m_{1} y_{i} + m_{2}) \\ y_{i}^{'} \cdot (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{3} x_{i} + m_{4} y_{i} + m_{5}) \end{matrix} \right] \right\|^{2} = \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} - x_{i} \cdot m_{0} - y_{i} \cdot m_{1} - m_{2} + x_{i}^{'} x_{i} \cdot m_{6} + x_{i}^{'} y_{i} \cdot m_{7} + x_{i}^{'} \\ - x_{i} \cdot m_{3} - y_{i} \cdot m_{4} - m_{5} + y_{i}^{'} x_{i} \cdot m_{6} + y_{i}^{'} y_{i} \cdot m_{7} + y_{i}^{'} \end{matrix} \right] \right\|^{2} & (3) \end{matrix}$
where x_i′=MVx_i+x_i and y_i′=MVy_i+y_i. It is known in the art that the LS formulation in (3) is prone to outliers, largely due to the inaccuracies in the BME processes and local motions. Many robust regression algorithms have been established to solve the outlier problem, such as those using M-estimators. The preferred embodiment of the present invention avoids the use of iterative algorithms but is still able to handle the outliers. Accordingly, the equation in (3) may be solved by employing the following over-determined linear system: $\begin{matrix} \left( \begin{matrix} x_{0} & y_{0} & 1 & 0 & 0 & 0 & - x_{0} x_{0}^{'} & - y_{0} x_{0}^{'} \\ 0 & 0 & 0 & x_{0} & y_{0} & 1 & - x_{0} y_{0}^{'} & - y_{0} y_{0}^{'} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{N - 1} & y_{N - 1} & 1 & 0 & 0 & 0 & - x_{N - 1} x_{N - 1}^{'} & - y_{N - 1} x_{N - 1}^{'} \\ 0 & 0 & 0 & x_{N - 1} & y_{N - 1} & 1 & - x_{N - 1} y_{N - 1}^{'} & - y_{N - 1} y_{N - 1}^{'} \end{matrix} \right) \left( \begin{matrix} m_{0} \\ m_{1} \\ m_{2} \\ m_{3} \\ m_{4} \\ m_{5} \\ m_{6} \\ m_{7} \end{matrix} \right) = \left( \begin{matrix} x_{0}^{'} \\ y_{0}^{'} \\ \vdots \\ x_{N - 1}^{'} \\ y_{N - 1}^{'} \end{matrix} \right) & (4) \end{matrix}$
or, $A_{2N \times 8} m_{8 \times 1} = b_{2N \times 1}$, whose least-squares solution satisfies the normal equations A^TAm=A^Tb. This matrix equation can be solved by using standard matrix inversion routines, or more robustly, using pseudo-inverse via Singular Value Decomposition (SVD).
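
The construction of A and b in equation (4) and the pseudo-inverse solution can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the helper names `build_system` and `estimate_gm`, the sample block positions, and the synthetic parameter vector `m_true` are all hypothetical. `numpy.linalg.pinv` computes the pseudo-inverse via SVD.

```python
import numpy as np

def build_system(xy, mv):
    """Assemble A (2N x 8) and b (2N) of equation (4) from N motion vectors."""
    xy = np.asarray(xy, dtype=float)
    mv = np.asarray(mv, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    xp, yp = x + mv[:, 0], y + mv[:, 1]        # x' = MVx + x, y' = MVy + y
    A = np.zeros((2 * len(x), 8))
    # Even rows carry the x' equations, odd rows the y' equations.
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = x, y, 1.0
    A[0::2, 6], A[0::2, 7] = -x * xp, -y * xp
    A[1::2, 3], A[1::2, 4], A[1::2, 5] = x, y, 1.0
    A[1::2, 6], A[1::2, 7] = -x * yp, -y * yp
    b = np.empty(2 * len(x))
    b[0::2], b[1::2] = xp, yp
    return A, b

def estimate_gm(xy, mv):
    """Least-squares estimate of m = [m0..m7] via the SVD-based pseudo-inverse."""
    A, b = build_system(xy, mv)
    return np.linalg.pinv(A) @ b

# Synthetic check: generate noise-free MVs from a known m and recover it.
m_true = np.array([1.02, 0.01, 3.0, -0.01, 0.98, -2.0, 1e-5, -2e-5])
xy = np.array([[16, 16], [336, 16], [16, 272], [336, 272],
               [176, 144], [96, 208]], dtype=float)
d = m_true[6] * xy[:, 0] + m_true[7] * xy[:, 1] + 1.0
xp = (m_true[0] * xy[:, 0] + m_true[1] * xy[:, 1] + m_true[2]) / d
yp = (m_true[3] * xy[:, 0] + m_true[4] * xy[:, 1] + m_true[5]) / d
mv = np.column_stack([xp - xy[:, 0], yp - xy[:, 1]])

m_est = estimate_gm(xy, mv)
print(np.round(m_est, 6))
```

With noise-free input the linear system is exactly consistent, so the algebraic solution coincides with the true parameters; with noisy block MVs the two distance measures generally give slightly different estimates.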
By using the algebraic distance in the target function, the estimation problem of the prior art is drastically simplified, which makes the non-iterative approach feasible. Although the algebraic distance may result in some accuracy deviation, the performance degradation is insignificant, as shown in the simulation results of FIGS. 5 and 6, which will be described in detail later.
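
The relationship between the two distance measures can be written out in one step. The symbol D_i below is introduced only for this illustration and denotes the denominator of equation (1) evaluated at (x_i, y_i). Multiplying each Euclidean residual of equation (2) by D_i gives the corresponding algebraic residual of equation (3):

$D_{i} = m_{6} x_{i} + m_{7} y_{i} + 1$

$x_{i}^{'} \cdot D_{i} - (m_{0} x_{i} + m_{1} y_{i} + m_{2}) = D_{i} \left( x_{i}^{'} - f_{x} (x_{i}, y_{i} | m) \right), \quad y_{i}^{'} \cdot D_{i} - (m_{3} x_{i} + m_{4} y_{i} + m_{5}) = D_{i} \left( y_{i}^{'} - f_{y} (x_{i}, y_{i} | m) \right)$

$\chi^{2} = \sum_{i = 0}^{N - 1} D_{i}^{2} \left\| r_{i} \right\|^{2}$

The algebraic target is therefore a weighted version of the Euclidean one. The two coincide exactly when m₆=m₇=0, and they stay close whenever the perspective terms keep D_i near one across the frame.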
FIG. 2 illustrates a flow chart of a non-iterative GME algorithm in accordance with one embodiment of the present invention. Referring to FIG. 2, at step 31, the input MV blocks of each image frame are grouped into J groups of MV blocks. The grouping of the MV blocks is illustrated in FIG. 4, which will be described later. In accordance with one embodiment, each group of MVs includes four or more MVs.
At step 32, the GME in each group is solved through the equation A^TAm=A^Tb, obtained from equation (4) described above, using an SVD-based pseudo-inverse. In an exemplary embodiment, A is an 8×8 matrix, which corresponds to N=4 MVs per group and therefore 2N=8 rows. As a result, a group of GM parameters {m_j}_{j=1:J} is obtained.
Generally, the further apart the MVs within a group are, the better the discriminative power the MVs possess for the estimation of global motion parameters. This can be seen from how ill-conditioned A^TA is for inversion: the larger the spatial distance, the smaller the condition number of A, and hence of A^TA, whose condition number is the square of that of A. Specifically, in terms of the SVD, the condition number is defined as the ratio of the largest to the smallest singular value of a matrix. When the condition number is sufficiently large, the matrix is nearly singular and the inverse of the matrix becomes unreliable.
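
The effect of spatial separation on conditioning can be checked numerically. The sketch below is an illustration only: the helper `design_matrix` and both sets of block positions are assumptions, and zero motion vectors are used so that x′=x and y′=y. The absolute numbers depend on the coordinate origin and scale chosen here.

```python
import numpy as np

def design_matrix(xy):
    """Matrix A of equation (4) for zero motion vectors (x' = x, y' = y)."""
    xy = np.asarray(xy, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    A = np.zeros((2 * len(x), 8))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = x, y, 1.0
    A[0::2, 6], A[0::2, 7] = -x * x, -y * x
    A[1::2, 3], A[1::2, 4], A[1::2, 5] = x, y, 1.0
    A[1::2, 6], A[1::2, 7] = -x * y, -y * y
    return A

# Four positions one pixel apart versus four positions 300 pixels apart.
tight = [(100, 100), (101, 100), (100, 101), (101, 101)]
wide = [(0, 0), (300, 0), (0, 300), (300, 300)]

# numpy.linalg.cond returns the ratio of largest to smallest singular value.
cond_tight = np.linalg.cond(design_matrix(tight))
cond_wide = np.linalg.cond(design_matrix(wide))
print(f"tight group: {cond_tight:.3g}   wide group: {cond_wide:.3g}")
```

The tightly clustered group yields a far larger condition number than the widely spread group, which matches the argument above for maximizing the distance among the MVs of a group.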
The method of the present invention can pick any four motion vectors to generate a GM parameter. For example, four motion vectors may be picked near the four corners. However, to prevent the situation in which a particular choice of four motion vectors happens to be corrupted by local motions, an exemplary preferred embodiment of the present invention groups all the motion vectors into groups of four or more motion vectors, and uses them to achieve a robust estimation. The grouping of the input MVs follows a fixed spatial aperture/pattern, as shown in FIG. 4. By fixing the distance among the MVs within a group, each group of MVs has the same and maximum allowable spatial diversity.
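
One possible fixed-aperture grouping is sketched below. FIG. 4 is not reproduced in this text, so the specific pattern used here (pairing each block with the blocks half a frame away horizontally, vertically, and diagonally) is a hypothetical choice made for illustration, as is the function name `group_blocks`.

```python
def group_blocks(rows, cols):
    """Split a rows x cols grid of block MVs into non-overlapping groups of four.

    Every group uses the same offsets (rows // 2, cols // 2), so the distance
    among the members of a group is identical for all groups. If rows or cols
    is odd, the last row or column is left unused by this simple pattern.
    """
    dr, dc = rows // 2, cols // 2
    groups = []
    for r in range(dr):
        for c in range(dc):
            groups.append([(r, c), (r, c + dc), (r + dr, c), (r + dr, c + dc)])
    return groups

# CIF frame with 16x16 macroblocks: 288 / 16 = 18 rows, 352 / 16 = 22 columns.
groups = group_blocks(18, 22)
print(len(groups), groups[0])
```

For a CIF frame this pattern yields J=99 groups that together cover all 396 macroblock positions exactly once.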
Referring again to FIG. 2, at step 33, a final estimation is calculated from the group of GM parameters {m_j}_{j=1:J} obtained from step 32. In accordance with an embodiment of the present invention, the final estimation is performed by a histogram-based post-processing approach. In other words, the histogram of {m_j}_{j=1:J} is calculated with 4 bins in each of 8 dimensions, and the bin containing the largest number of m_j is chosen. The number of bins is chosen according to the number of groups. The simulations shown and discussed herein are based on 4-bin calculations for each of 8 dimensions. The final estimate is obtained by averaging all the m_j within the chosen bin. A more accurate result may be obtained by creating more bins. However, as shown in the simulation results, four bins already provide exceptionally accurate results.
By using the MV grouping, the present invention divides the input MV data set into small non-overlapping subsets and obtains one GM estimation from each subset. Some of the resulting m_j will be corrupted by outliers, but the majority will lie around the true global motion value. The histogram-based approach in the above step is therefore able to eliminate the effects of outliers and provide a robust estimation.
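
The histogram-based post-processing can be sketched as follows. The text does not specify how bin edges are chosen or whether the histogram is joint or per-dimension, so this sketch makes two assumptions: each dimension is split into equal-width bins between its minimum and maximum, and the vote is taken over joint 8-dimensional cells. The name `histogram_vote` and the sample data are hypothetical.

```python
import numpy as np

def histogram_vote(estimates, bins=4):
    """Average the estimates that fall in the most populated 8-D histogram cell."""
    est = np.asarray(estimates, dtype=float)            # shape (J, 8)
    lo = est.min(axis=0)
    span = est.max(axis=0) - lo
    span[span == 0.0] = 1.0                             # constant dimension: one bin
    idx = np.floor((est - lo) / span * bins).astype(int)
    idx = np.clip(idx, 0, bins - 1)                     # maximum goes in the last bin
    # Encode the 8 per-dimension bin indices as a single integer cell id.
    code = idx @ (bins ** np.arange(est.shape[1]))
    values, counts = np.unique(code, return_counts=True)
    winner = values[np.argmax(counts)]
    return est[code == winner].mean(axis=0)

# Five consistent estimates plus two outliers corrupted by local motion.
true_m = np.array([1.0, 0.0, 2.0, 0.0, 1.0, -1.0, 0.0, 0.0])
inliers = [true_m + 0.01 * k for k in range(5)]
outliers = [true_m + 10.0, true_m - 6.0]

final = histogram_vote(inliers + outliers)
print(np.round(final, 3))
```

In this example the two outliers land in cells of their own, so the final estimate is the mean of the five consistent estimates. A joint vote can split the inliers across cells when a dimension contains no outliers, which is one reason the binning rule is flagged here as an assumption.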
The above-described method of the present invention may be implemented in a system depicted in FIG. 3. Referring to FIG. 3, system 40 includes a block-based matching device 41 for dividing an input video stream 401 into a plurality of MVs. The plurality of MVs are next grouped by grouping device 42 to result in J groups of MVs. As described above, each of the J groups includes four or more MVs, and is grouped based on a fixed spatial aperture/pattern such that the distance among the MVs within each group is fixed.
System 40 further includes calculation device 43 for calculating the GM of each of the J groups of MVs. The calculation follows the perspective model with eight GM parameters (m₀-m₇) as described with reference to equations (1)-(4). After the calculation, J sets of GM parameters {m_j}_{j=1:J} are obtained.
The J sets of GM parameters {m_j}_{j=1:J} are then processed in post-processing device 44. In an exemplary embodiment of the present invention, post-processing device 44 first calculates a histogram from the J sets of GM parameters {m_j}_{j=1:J} with 4 bins in each of eight dimensions, chooses the bin containing the largest number of m_j, and averages the GM parameters (m₀-m₇) of the m_j in the chosen bin to obtain a final estimation 402. Final estimation 402 is then output to a processor (not shown) for further processing.
FIGS. 5(a), 5(b) and 6 illustrate simulation results of the method and system of the present invention in comparison with the conventional methods and systems. The GME algorithm of the present invention is tested in the MPEG-4 GMC encoding as a fast estimation of the GM parameters in place of the default pixel-domain iterative GME in the reference software Momusys. The GME algorithm of the present invention is denoted as MV-GME, in which integer-pixel MVs from the 16-by-16 Macroblock-based full-search BME are fed to the fast GME routine.
Several CIF-sized video sequences are used in the simulations, which contain typical camera zooming and panning motions. The Rate-Distortion performance comparison of the Pixel-GME and the MV-GME algorithms is shown in FIGS. 5(a) and 5(b). The MV-GME algorithm performs very close to the Pixel-GME method in coding efficiency. FIG. 6 shows the comparison of bits used when coding the first 30 frames of the MIT sequence, with PSNR≈30 dB using fixed QP.
In terms of the computational cost, the MV-GME method only requires a small fraction of the computational requirements of the Pixel-GME since only one MV out of a block of pixels is involved in the estimation and the estimation is non-iterative. Average computations per call of the MV-GME and Pixel-GME routines in terms of runtime are compared in Table 1, which shows the significant increase in speed in the system and method of the present invention. The computation reduction is calculated as the runtime ratio of two GME algorithms: $\overline{T}_{MVGME} / \overline{T}_{PixelGME}$, where T denotes the total runtime of the GME subroutine, and the experiments are conducted on a PC with 2 GHz P4 CPU.

TABLE 1: Computational Comparison

Sequence    Resolution/#frames    Computation Reduction
MIT         CIF/59                0.5%
Bicycle     CIF/150               0.37%
Pigeon      CIF/300               0.68%
Office      CIF/300               0.98%
Quad        CIF/300               1.04%

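
Because the computation reduction is a runtime ratio, its reciprocal gives the speed-up factor. The short sketch below only restates the Table 1 figures in that form; the variable names are chosen for this example.

```python
# Runtime ratio T_MVGME / T_PixelGME from Table 1, in percent.
ratios_percent = {
    "MIT": 0.5,
    "Bicycle": 0.37,
    "Pigeon": 0.68,
    "Office": 0.98,
    "Quad": 1.04,
}

# A ratio of p percent means MV-GME runs 100 / p times faster than Pixel-GME.
speedups = {name: 100.0 / pct for name, pct in ratios_percent.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

On these figures the MV-GME routine runs roughly two orders of magnitude faster than the pixel-domain routine on every tested sequence.
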
The foregoing disclosure of the preferred embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many variations and modifications of the embodiments described herein will be apparent to one of ordinary skill in the art in light of the above disclosure. The scope of the invention is to be defined only by the claims appended hereto, and by their equivalents.
Further, in describing representative embodiments of the present invention, the specification may have presented the method and/or process of the present invention as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As one of ordinary skill in the art would appreciate, other sequences of steps may be possible. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. In addition, the claims directed to the method and/or process of the present invention should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the present invention.

Claims

1. A non-iterative method for estimating global motions between a plurality of image frames in an input video stream, comprising:

grouping a plurality of motion vectors in the input video stream into a predetermined number of groups of motion vectors;

calculating a set of global motion parameters from each of the predetermined groups of the motion vector; and

processing the set of global motion parameters generated from the calculation to obtain a final estimation.

2. The method of claim 1, wherein each of the predetermined groups includes N motion vectors, N is an integer having a value of at least 4.

3. The method of claim 1, wherein the plurality of motion vectors are obtained through a block-based motion estimation method on the input video stream.

4. The method of claim 1, wherein the step of grouping the motion vectors is based on a fixed spatial distance among the motion vectors within each of the predetermined number of groups.

5. The method of claim 1, wherein the calculating results in a group of global motion parameters {m_j}_{j=1:J}, and wherein calculating the global motion estimates is based on a perspective model with eight global motion parameters (m₀-m₇) as below:

$x^{'} = f_{x} (x, y | m) = \frac{m_{0} x + m_{1} y + m_{2}}{m_{6} x + m_{7} y + 1}$

$y^{'} = f_{y} (x, y | m) = \frac{m_{3} x + m_{4} y + m_{5}}{m_{6} x + m_{7} y + 1}$

where (x,y) and (x′,y′) are the coordinates in the current and the reference image frames, respectively, with the eight global motion parameters m=[m₀, . . . ,m₇],

wherein the eight global motion parameters m is calculated using algebraic distance:

$\chi^{2} = \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} x_{i}^{'} \cdot (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{0} x_{i} + m_{1} y_{i} + m_{2}) \\ y_{i}^{'} \cdot (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{3} x_{i} + m_{4} y_{i} + m_{5}) \end{matrix} \right] \right\|^{2} = \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} - x_{i} \cdot m_{0} - y_{i} \cdot m_{1} - m_{2} + x_{i}^{'} x_{i} \cdot m_{6} + x_{i}^{'} y_{i} \cdot m_{7} + x_{i}^{'} \\ - x_{i} \cdot m_{3} - y_{i} \cdot m_{4} - m_{5} + y_{i}^{'} x_{i} \cdot m_{6} + y_{i}^{'} y_{i} \cdot m_{7} + y_{i}^{'} \end{matrix} \right] \right\|^{2}$

where $x_{i}^{'} = {MV}_{xi} + x_{i}$ and $y_{i}^{'} = {MV}_{yi} + y_{i}$.

6. The method of claim 5, wherein the algebraic distance is determined using an over-determined linear system as below:

$\left( \begin{matrix} x_{0} & y_{0} & 1 & 0 & 0 & 0 & - x_{0} x_{0}^{'} & - y_{0} x_{0}^{'} \\ 0 & 0 & 0 & x_{0} & y_{0} & 1 & - x_{0} y_{0}^{'} & - y_{0} y_{0}^{'} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{N - 1} & y_{N - 1} & 1 & 0 & 0 & 0 & - x_{N - 1} x_{N - 1}^{'} & - y_{N - 1} x_{N - 1}^{'} \\ 0 & 0 & 0 & x_{N - 1} & y_{N - 1} & 1 & - x_{N - 1} y_{N - 1}^{'} & - y_{N - 1} y_{N - 1}^{'} \end{matrix} \right) \left( \begin{matrix} m_{0} \\ m_{1} \\ m_{2} \\ m_{3} \\ m_{4} \\ m_{5} \\ m_{6} \\ m_{7} \end{matrix} \right) = \left( \begin{matrix} x_{0}^{'} \\ y_{0}^{'} \\ \vdots \\ x_{N - 1}^{'} \\ y_{N - 1}^{'} \end{matrix} \right)$

7. The method of claim 6, wherein the matrix equation is solvable by using at least one of standard matrix inversion routines and pseudo-inverse via Singular Value Decomposition.

8. The method of claim 5, wherein the processing step further comprises:

calculating a histogram of the global motion parameter {m_j}_{j=1:J} with four bins in each of eight dimensions;

choosing a bin from the four bins that includes a largest amount of m_j; and

averaging over the m_j of the chosen bin to obtain the final estimate.

9. A method for estimating global motions between a reference image frame and a current image frame, comprising:

employing a perspective model with eight global motion parameters (m₀-m₇), wherein

$\begin{matrix} x^{'} = f_{x} (x, y | m) = \frac{m_{0} x + m_{1} y + m_{2}}{m_{6} x + m_{7} y + 1}, \quad y^{'} = f_{y} (x, y | m) = \frac{m_{3} x + m_{4} y + m_{5}}{m_{6} x + m_{7} y + 1} & (1) \end{matrix}$

where (x,y) and (x′,y′) are the coordinates in the current and the reference images frames, respectively, with the set of eight global motion parameters m=[m₀, . . . ,m₇], and

calculating the set of eight global motion parameter m using algebraic distance as below:

$\begin{matrix} \chi^{2} = \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} x_{i}^{'} \cdot (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{0} x_{i} + m_{1} y_{i} + m_{2}) \\ y_{i}^{'} \cdot (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{3} x_{i} + m_{4} y_{i} + m_{5}) \end{matrix} \right] \right\|^{2} = \sum_{i = 0}^{N - 1} \left\| \left[ \begin{matrix} - x_{i} \cdot m_{0} - y_{i} \cdot m_{1} - m_{2} + x_{i}^{'} x_{i} \cdot m_{6} + x_{i}^{'} y_{i} \cdot m_{7} + x_{i}^{'} \\ - x_{i} \cdot m_{3} - y_{i} \cdot m_{4} - m_{5} + y_{i}^{'} x_{i} \cdot m_{6} + y_{i}^{'} y_{i} \cdot m_{7} + y_{i}^{'} \end{matrix} \right] \right\|^{2} & (2) \end{matrix}$ where $x_{i}^{'} = {MV}_{xi} + x_{i}$ and $y_{i}^{'} = {MV}_{yi} + y_{i}$.

10. The method of claim 9, wherein the algebraic distance equation (2) is solvable by an over-determined linear system as follows:

$\left( \begin{matrix} x_{0} & y_{0} & 1 & 0 & 0 & 0 & - x_{0} x_{0}^{'} & - y_{0} x_{0}^{'} \\ 0 & 0 & 0 & x_{0} & y_{0} & 1 & - x_{0} y_{0}^{'} & - y_{0} y_{0}^{'} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{N - 1} & y_{N - 1} & 1 & 0 & 0 & 0 & - x_{N - 1} x_{N - 1}^{'} & - y_{N - 1} x_{N - 1}^{'} \\ 0 & 0 & 0 & x_{N - 1} & y_{N - 1} & 1 & - x_{N - 1} y_{N - 1}^{'} & - y_{N - 1} y_{N - 1}^{'} \end{matrix} \right) \left( \begin{matrix} m_{0} \\ m_{1} \\ m_{2} \\ m_{3} \\ m_{4} \\ m_{5} \\ m_{6} \\ m_{7} \end{matrix} \right) = \left( \begin{matrix} x_{0}^{'} \\ y_{0}^{'} \\ \vdots \\ x_{N - 1}^{'} \\ y_{N - 1}^{'} \end{matrix} \right)$

wherein the over-determined linear system is represented as A_{2N×8} m_{8×1} = b_{2N×1}, which is equivalent to AᵀAm = Aᵀb.
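
As a rough sketch of how the linear system above could be assembled and solved, the following Python builds the 2N×8 matrix A and the vector b from one group of motion vectors, then solves AᵀAm = Aᵀb by Gauss-Jordan elimination with partial pivoting. The names `build_system` and `solve_normal_equations`, the tuple-based inputs, and the choice of elimination as the solver are assumptions made for illustration; the claim does not prescribe a particular solver.

```python
def build_system(points, mvs):
    """Assemble A (2N x 8) and b (2N) for one group of N motion vectors."""
    A, b = [], []
    for (x, y), (mvx, mvy) in zip(points, mvs):
        xp, yp = x + mvx, y + mvy  # x' = MV_x + x, y' = MV_y + y
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp])
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp])
        b.extend([xp, yp])
    return A, b


def solve_normal_equations(A, b):
    """Solve (A^T A) m = A^T b directly, with no iterative refinement."""
    n = len(A[0])
    # Augmented matrix [A^T A | A^T b].
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * bi for r, bi in zip(A, b))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("degenerate motion-vector group")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]
```

Each motion vector contributes two rows, so a group needs at least four motion vectors to supply the eight equations required for the eight unknowns. With more than four, the normal equations give the least-squares solution of the over-determined system.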

11. A non-iterative method for estimating global motions between a plurality of image frames in an input video stream, comprising:

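grouping a plurality of motion vectors in the input video stream into a predetermined number of groups of motion vectors;

calculating, from each of the predetermined groups of motion vectors, a set of global motion parameters having a plurality of global motion parameters, including

calculating the plurality of global motion parameters using algebraic distance, and

calculating the algebraic distance using an over-determined linear system; and

processing the set of global motion parameters generated from the calculation to obtain a final estimation.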

12. The method of claim 11, wherein the processing further comprises calculating a histogram of the global motion parameters.

13. A system for estimating global motions between image frames of an input video stream, comprising:

a grouping device for grouping a plurality of motion vectors contained in the input video stream to obtain a predetermined number of groups of motion vectors;

a calculation device for calculating a global motion estimation from each of the predetermined groups of motion vectors to obtain a set of global motion parameters {m_j}_{j=1:J}, with each global motion estimation m_j comprising eight global motion parameters (m₀, . . . , m₇); and

a post-processing device for obtaining a final estimation from the set of global motion parameters.

14. The system of claim 13, wherein each of the predetermined groups of motion vectors comprises N motion vectors, wherein N is an integer of at least 4.

15. The system of claim 13, wherein the plurality of motion vectors are obtained using a block-based motion estimation method on the input video stream.

16. The system of claim 13, wherein the grouping is based on a fixed spatial distance among the motion vectors within each group.
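
One possible reading of grouping by a fixed spatial distance is sketched below. It assumes, purely for illustration, that the motion vectors lie on a regular block grid and that each group consists of the four motion vectors at the corners of a square whose side is `d` blocks. The function name `group_motion_vectors`, the `block_size` and `d` parameters, and the use of block centers as coordinates are all hypothetical choices that the claim does not specify.

```python
def group_motion_vectors(mv_field, block_size=16, d=4):
    """Form groups of four motion vectors whose blocks are d blocks apart.

    mv_field[r][c] is the (mvx, mvy) motion vector of the block at
    row r, column c. Returns a list of (points, mvs) pairs, where points
    holds the (x, y) block-center coordinates of the four members.
    """
    rows, cols = len(mv_field), len(mv_field[0])
    groups = []
    for r in range(rows - d):
        for c in range(cols - d):
            # Corners of a d-by-d square of blocks (assumed group shape).
            corners = [(r, c), (r, c + d), (r + d, c), (r + d, c + d)]
            points = [(cc * block_size + block_size / 2,
                       rr * block_size + block_size / 2)
                      for rr, cc in corners]
            mvs = [mv_field[rr][cc] for rr, cc in corners]
            groups.append((points, mvs))
    return groups
```

Keeping the members of a group well separated avoids nearly collinear or nearly coincident points, which would make the per-group linear system poorly conditioned.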

17. The system of claim 13, wherein the global motion estimates are based on a perspective model with eight global motion parameters (m₀-m₇) as below:

\begin{matrix}
\begin{aligned}
x' &= f_{x}(x, y \mid m) = \frac{m_{0} x + m_{1} y + m_{2}}{m_{6} x + m_{7} y + 1} \\
y' &= f_{y}(x, y \mid m) = \frac{m_{3} x + m_{4} y + m_{5}}{m_{6} x + m_{7} y + 1}
\end{aligned} & (1)
\end{matrix}

where (x,y) and (x′,y′) are the coordinates in the current and the reference image frames, respectively, with the set of eight global motion parameters m = [m₀, . . . , m₇].

18. The system of claim 17, wherein the set of eight global motion parameters m is calculated using algebraic distance:

\begin{matrix}
\begin{aligned}
\chi^{2} &= \sum_{i=0}^{N-1} \left\| \begin{bmatrix} x_{i}' (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{0} x_{i} + m_{1} y_{i} + m_{2}) \\ y_{i}' (m_{6} x_{i} + m_{7} y_{i} + 1) - (m_{3} x_{i} + m_{4} y_{i} + m_{5}) \end{bmatrix} \right\|^{2} \\
&= \sum_{i=0}^{N-1} \left\| \begin{bmatrix} -x_{i} m_{0} - y_{i} m_{1} - m_{2} + x_{i}' x_{i} m_{6} + x_{i}' y_{i} m_{7} + x_{i}' \\ -x_{i} m_{3} - y_{i} m_{4} - m_{5} + y_{i}' x_{i} m_{6} + y_{i}' y_{i} m_{7} + y_{i}' \end{bmatrix} \right\|^{2} \\
&\text{where } x_{i}' = MV_{xi} + x_{i} \text{ and } y_{i}' = MV_{yi} + y_{i}.
\end{aligned} & (2)
\end{matrix}

19. The system of claim 18, wherein the equation (2) is solvable using an over-determined linear system as below:

\begin{pmatrix}
x_{0} & y_{0} & 1 & 0 & 0 & 0 & -x_{0} x_{0}' & -y_{0} x_{0}' \\
0 & 0 & 0 & x_{0} & y_{0} & 1 & -x_{0} y_{0}' & -y_{0} y_{0}' \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x_{N-1} & y_{N-1} & 1 & 0 & 0 & 0 & -x_{N-1} x_{N-1}' & -y_{N-1} x_{N-1}' \\
0 & 0 & 0 & x_{N-1} & y_{N-1} & 1 & -x_{N-1} y_{N-1}' & -y_{N-1} y_{N-1}'
\end{pmatrix}
\begin{pmatrix} m_{0} \\ m_{1} \\ m_{2} \\ m_{3} \\ m_{4} \\ m_{5} \\ m_{6} \\ m_{7} \end{pmatrix}
=
\begin{pmatrix} x_{0}' \\ y_{0}' \\ \vdots \\ x_{N-1}' \\ y_{N-1}' \end{pmatrix}

20. The system of claim 13, wherein the post-processing device further comprises:

means for calculating a histogram of the global motion parameters {m_j}_{j=1:J} with four bins in each of eight dimensions;

means for choosing a bin from the four bins that includes the largest number of m_j; and

means for averaging over the m_j of the chosen bin to obtain the final estimate.
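
The histogram-based post-processing can be illustrated with the following Python sketch. It takes one reading of the claim: a joint histogram over the eight parameters with four bins per dimension, from which the most populated cell is chosen and its members averaged. The function name `histogram_estimate`, the equal-width bins spanning the minimum and maximum of each parameter, and the sparse dictionary of cells are assumptions, not details given in the claims.

```python
from collections import defaultdict


def histogram_estimate(estimates, bins=4):
    """Pick the most populated histogram cell and average its members.

    estimates is a list of J parameter vectors m_j, each of length 8.
    """
    dims = len(estimates[0])
    lo = [min(m[k] for m in estimates) for k in range(dims)]
    hi = [max(m[k] for m in estimates) for k in range(dims)]

    def bin_index(value, k):
        # Equal-width bins between the observed min and max (assumed binning).
        width = (hi[k] - lo[k]) / bins
        if width == 0.0:
            return 0
        return min(int((value - lo[k]) / width), bins - 1)

    # Sparse joint histogram: only occupied cells are stored.
    cells = defaultdict(list)
    for m in estimates:
        cells[tuple(bin_index(m[k], k) for k in range(dims))].append(m)

    best = max(cells.values(), key=len)
    return [sum(m[k] for m in best) / len(best) for k in range(dims)]
```

Estimates computed from groups that cover locally moving objects tend to fall in sparsely populated cells, so averaging only the dominant cell keeps them out of the final estimate.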