WO2000018134A1 - Frame skipping without having to perform motion estimation - Google Patents


Info

Publication number
WO2000018134A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
motion
current image
distortion measure
distortion
Prior art date
Application number
PCT/US1999/021830
Other languages
French (fr)
Inventor
Sriram Sethuraman
Ravi Krishnamurthy
Original Assignee
Sarnoff Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sarnoff Corporation filed Critical Sarnoff Corporation
Publication of WO2000018134A1 publication Critical patent/WO2000018134A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/69: using error resilience involving reversible variable length codes [RVLC]
    • H04N19/107: selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/124: quantisation
    • H04N19/126: details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H04N19/132: sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/149: data rate or code amount at the encoder output, estimated by means of a model, e.g. mathematical model or statistical model
    • H04N19/15: data rate or code amount at the encoder output, by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
    • H04N19/152: data rate or code amount at the encoder output, by measuring the fullness of the transmission buffer
    • H04N19/154: measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/162: user input
    • H04N19/17: coding unit being an image region, e.g. an object
    • H04N19/192: adaptation method, adaptation tool or adaptation type being iterative or recursive
    • H04N19/198: computation of encoding parameters including smoothing of a sequence of encoding parameters, e.g. by averaging, by choice of the maximum, minimum or median value
    • H04N19/503: predictive coding involving temporal prediction
    • H04N19/587: predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H04N19/61: transform coding in combination with predictive coding
    • H04N19/89: pre-processing or post-processing involving methods or arrangements for detection of transmission errors at the decoder

Definitions

  • the present invention relates to image processing, and, in particular, to video compression.
  • a common video compression technique is motion-compensated inter-frame differencing, in which blocks of image data are encoded based on the pixel-to-pixel differences between each block in an image currently being encoded and a selected block in a reference image.
  • the process of selecting a block in the reference image for a particular block in the current image is called motion estimation.
  • the goal of motion estimation is to find a block in the reference image that closely matches the block in the current image such that the magnitudes of the pixel-to-pixel differences between those two blocks are small, thereby enabling the block in the current image to be encoded in the resulting compressed bitstream using a relatively small number of bits.
  • a block in the current image is compared with different blocks of the same size and shape within a defined search region in the reference image.
  • the search region is typically defined based on the corresponding location of the block in the current image with allowance for inter-frame motion by a specified number of pixels (e.g., 8) in each direction.
  • Each comparison involves the computation of a mathematical distortion measure that quantifies the differences between the two blocks of image data.
  • One typical distortion measure is the sum of absolute differences (SAD) which corresponds to the sum of the absolute values of the corresponding pixel-to-pixel differences between the two blocks, although other distortion measures may also be used.
  • the block of reference image data yielding the smallest distortion measure is selected and is referred to as the "best integer-pixel location," because the distance between that block and the corresponding location of the block of current image data may be represented by a motion vector having X (horizontal) and Y (vertical) components that are both integers representing displacements in integer numbers of pixels.
  • the process of selecting the best integer-pixel location is referred to as full-pixel or integer-pixel motion estimation.
  • after the best integer-pixel location has been selected, half-pixel motion estimation may be performed to refine the motion vector.
  • the block of current image data is compared to reference image data corresponding to different half-pixel locations surrounding the best integer-pixel location, where the comparison for each half-pixel location is based on interpolated reference image data.
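  • the block-matching search described above can be sketched as follows. This is an illustrative reconstruction, not code from the patent: the frame representation (plain 2-D lists of intensity values), the 8x8 block size, the +/-8-pixel search window, and the function names are all assumptions.

```python
# Illustrative sketch of SAD-based integer-pixel motion estimation.

def sad(cur, ref, bx, by, dx, dy, bsize):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(bsize):
        for x in range(bsize):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def integer_pixel_search(cur, ref, bx, by, bsize=8, search=8):
    """Exhaustively search the +/-`search` window around the block's own
    location and return (best_dx, best_dy, best_sad), skipping displacements
    that would read outside the reference frame."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, sad(cur, ref, bx, by, 0, 0, bsize))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy and by + dy + bsize <= h and
                    0 <= bx + dx and bx + dx + bsize <= w):
                continue
            d = sad(cur, ref, bx, by, dx, dy, bsize)
            if d < best[2]:
                best = (dx, dy, d)
    return best
```

For a frame that is a pure translation of the reference, the search recovers the shift as the motion vector with zero SAD.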
  • the primary goal in video compression processing is to reduce the number of bits used to represent sequences of video images while still maintaining an acceptable level of image quality during playback of the resulting compressed video bitstream.
  • Another goal in many video compression applications is to maintain a relatively uniform bit rate, for example, to satisfy transmission bandwidth and/or playback processing constraints.
  • Video compression processing often involves the tradeoff between bit rate and playback quality. This tradeoff typically involves reducing the average number of bits per image in the original video sequence by selectively decreasing the playback quality in each image that is encoded into the compressed video bitstream. Alternatively or in addition, the tradeoff can involve skipping certain images in the original video sequence, thereby encoding only a subset of those original images into the resulting compressed video bitstream.
  • a video encoder may be able to skip additional images adaptively as needed to satisfy bit rate requirements.
  • the decision to skip an additional image is typically based on a distortion measure (e.g., SAD) of the motion-compensated interframe differences, and so can be made only after motion estimation has been performed for the particular image.
  • the motion-compensated interframe differences derived from the motion estimation processing are then used to further encode the image data (e.g., depending on the exact video compression algorithm, using such techniques as discrete cosine transform (DCT) processing, quantization, run-length encoding, and variable-length encoding).
  • the motion-compensated interframe differences are no longer needed, and processing continues to the next image in the video sequence.
  • the present invention is directed to a technique for generating an estimate of a motion-compensated distortion measure for a particular image in a video sequence without actually having to perform motion estimation for that image.
  • the estimated distortion measure can be used during video encoding to determine whether to skip the image without first having to perform motion estimation.
  • motion estimation processing is avoided and the computational load of the video compression processing is accordingly reduced.
  • motion estimation processing can then be implemented, as needed, to generate motion-compensated interframe differences for subsequent compression processing. Under such a video compression scheme, motion estimation processing is implemented only when the resulting interframe differences will be needed to encode the corresponding image.
  • the present invention is a method for processing a sequence of video images, comprising the steps of (a) generating a raw distortion measure for a current image in the sequence relative to a reference image; (b) using the raw distortion measure to generate an estimate of a motion-compensated distortion measure for the current image relative to the reference image without having to perform motion estimation on the current image; (c) determining whether or how to encode the current image based on the estimate of the motion-compensated distortion measure; and (d) generating a compressed video bitstream for the sequence of video images based on the determination of step (c).
  • Fig. 1 shows pseudocode for an algorithm for generating a raw (i.e., non-motion-compensated) distortion measure for an image, according to one embodiment of the present invention
  • Fig. 2 shows pseudocode for an algorithm for estimating a motion-compensated distortion measure for an image, according to one embodiment of the present invention
  • Figs. 3A-3C provide pseudocode for an algorithm for determining what frames to code and how to code them, according to one embodiment of the present invention.
  • Fig. 1 shows pseudocode for an algorithm for generating a raw (i.e., non-motion-compensated) distortion measure for an image, according to one embodiment of the present invention.
  • the particular raw distortion measure generated using the algorithm of Fig. 1 is the mean absolute difference (MAD).
  • the algorithm in Fig. 1 can be interpreted as applying to gray-scale images in which each pixel is represented by a single multi-bit intensity value. It will be understood that the algorithm can be easily extended to color images in which each pixel is represented by two or more different multi-bit components (e.g., red, green, and blue components in an RGB format or an intensity (Y) and two color (U and V) components in a YUV format).
  • the algorithm of Fig. 1 distinguishes two different types of pixels in the current image: Type I being those pixels having an intensity value sufficiently similar to the corresponding pixel value in the reference image and Type II being those pixels having a pixel value sufficiently different from that of the corresponding pixel in the reference image.
  • the "corresponding" pixel is the pixel in the reference image having the same location (i.e., same row and column) as a pixel in the current image.
  • when a portion of the scene does not change from the reference image to the current image, the pixels in that portion of the current image will typically be characterized as being of Type I.
  • in relatively spatially uniform portions (i.e., portions in which the pixels have roughly the same value), even when there is some motion, those pixels will also typically be characterized as being of Type I.
  • in portions of the current image that change significantly relative to the reference image, the absolute differences between the pixels in the current image and the corresponding pixels in the reference image will be relatively large, and most of those current-image pixels will typically be characterized as being of Type II.
  • the variables n1 and n2 are counters for these two different types of pixels, respectively, and the variables dist1 and dist2 are intermediate distortion measures for these two different types of pixels, respectively.
  • these four variables are initialized to zero at Lines 1-2 in Fig. 1.
  • the absolute difference ad between the current pixel value and the corresponding pixel value in the reference frame is generated (Line 4). If ad is less than a specified threshold value thresh, then the current pixel is determined to be of Type I, and dist1 and n1 are incremented by ad and 1, respectively (Line 5).
  • otherwise, the current pixel is determined to be of Type II, and dist2 and n2 are incremented by ad and 1, respectively (Line 6).
  • a typical threshold value for the parameter thresh is about 20.
  • the intermediate distortion measures dist1 and dist2 are then normalized in Lines 8 and 9, respectively.
  • in a typical videoconferencing scene, for example, relative movement of a person's head from frame to frame (e.g., a side-to-side motion) will result in some portions of the background wall being newly covered by pixels corresponding to the head and other portions of the wall that were previously occluded by the head being newly exposed.
  • the raw distortion measure MAD is a mean absolute difference that is corrected for double-image effects.
  • a typical value for the parameter factor is 0.5.
  • the term dist1*n2*(1-factor) corrects for double-image effects by treating pixels removed from Type II as Type I pixels, so that the average distortion level in similar areas is added back.
  • the distortion dist1 of Type I pixels is considered as an estimate for the residual and coding noise. It is assumed that this cannot be removed by motion compensation.
  • the Type II pixels occupy roughly twice the area as compared to the "perfectly" motion-compensated images, and the term factor reflects this, and is nominally chosen as 0.5.
  • the term factor is allowed to vary, since motion compensation is typically not perfect.
  • the unoccluded region can be motion compensated; however, that fraction of pixels (n2*(1-factor)) is expected to have a residual plus coding noise similar to Type I pixels. Hence, the term dist1*n2*(1-factor) is used as an estimate for the distortion of these unoccluded Type II pixels.
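  • the raw-distortion computation described above can be sketched as follows. The actual pseudocode of Fig. 1 is not reproduced in this text, so while the Type I/Type II classification, thresholds, and normalization follow the description directly, the final combination formula (Type I noise plus occlusion-corrected Type II term plus dist1*n2*(1-factor) for the motion-compensable Type II fraction) is an assumption reconstructed from the surrounding bullets.

```python
# Hedged reconstruction of the Fig. 1 raw (non-motion-compensated) MAD.

def raw_mad(cur, ref, thresh=20, factor=0.5):
    """Raw distortion measure with double-image correction; `cur` and `ref`
    are same-sized 2-D lists of intensity values."""
    n1 = n2 = 0
    dist1 = dist2 = 0.0
    for row_c, row_r in zip(cur, ref):
        for pc, pr in zip(row_c, row_r):
            ad = abs(pc - pr)
            if ad < thresh:          # Type I: similar to reference pixel
                dist1 += ad; n1 += 1
            else:                    # Type II: significantly changed area
                dist2 += ad; n2 += 1
    total = n1 + n2
    if n1:
        dist1 /= n1                  # normalization (Lines 8-9 of Fig. 1)
    if n2:
        dist2 /= n2
    # Assumed combination: Type I residual noise, the occluded Type II
    # fraction weighted by `factor`, and Type I-level noise added back for
    # the unoccluded Type II fraction n2*(1-factor).
    return (dist1 * n1 + dist2 * n2 * factor + dist1 * n2 * (1 - factor)) / total
```

With identical frames the measure is zero; with a uniformly changed frame, only the factor-weighted Type II term survives.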
  • Fig. 2 shows pseudocode for an algorithm for estimating a motion-compensated distortion measure for an image, according to one embodiment of the present invention.
  • the particular distortion measure estimated using the algorithm of Fig. 2 is the motion-compensated mean absolute difference S.
  • the algorithm of Fig. 2 derives an estimate Se for the distortion measure S from the raw distortion measure MAD derived using the algorithm of Fig. 1. This estimated distortion measure Se can be used to determine whether to skip images during video encoding without having to perform motion estimation processing for each image.
  • the raw distortion measure MAD(I) for the current frame and the raw distortion measure MAD(I-1) for the previous frame are used to determine a measure H of the percentage change in MAD from the previous frame to the current frame (Line 1 of Fig. 2).
  • Other suitable expressions characterizing the change in the raw distortion measure MAD from the previous frame to the current frame could also conceivably be used.
  • if the percentage change H is less than a first threshold value T1 (Line 2), the estimated distortion measure Se(I) for the current frame is assumed to be the same as the actual motion-compensated distortion measure S(I-1) for the previous frame (Line 3). Otherwise, if the percentage change H is less than a second threshold value T2 (Line 4) (where T2 is greater than T1), then the estimated distortion measure Se(I) for the current frame is determined using the expression in Line 5, where the factor k is a parameter preferably specified between 0 and 1. Otherwise, the percentage change H is greater than the second threshold value T2 (Line 6), and the estimated distortion measure Se(I) for the current frame is determined using the expression in Line 7. Typical values for T1 and T2 are 0.1 and 0.5, respectively.
  • the raw distortion measure MAD(I) is a measure of the non-motion-compensated pixel differences between the current frame and its reference frame.
  • the raw distortion measure MAD(I-1) is a measure of the raw pixel differences between the previous frame and its reference frame, which may be the same as or different from the reference frame for the current frame.
  • the percentage change H is a measure of the relative change between the two raw distortion measures MAD(I) and MAD(I-1), which are themselves measures of rates of change between those images and their corresponding reference images. Motion compensation does a fairly good job predicting image data when there is little or no change in distortion from frame to frame.
  • in that case, the actual motion-compensated distortion measure S(I-1) for the previous frame will be a good estimate Se(I) of the motion-compensated distortion measure S(I) for the current frame, as in Line 3 of Fig. 2.
  • when the distortion from frame to frame is changing (e.g., during scene changes or other non-uniform changes in imagery), motion compensation will not do as good a job predicting the image data.
  • in that case, the actual motion-compensated distortion measure S(I-1) for the previous frame will not necessarily be a good indication of the actual motion-compensated distortion measure S(I) for the current frame.
  • when the percentage change H is large (e.g., H > T2), it may be safer to estimate the actual motion-compensated distortion measure S(I) for the current frame from the raw distortion measure MAD(I) for the current frame, as in the expression in Line 7 of Fig. 2.
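  • the three-case estimator of Fig. 2 can be sketched as follows. The text gives the thresholds (T1 = 0.1, T2 = 0.5) and the structure of the three cases but not the exact expressions of Lines 5 and 7, so the blend k*S(I-1) + (1-k)*MAD(I) for the middle case and the direct use of MAD(I) in the high-change case are assumptions consistent with the surrounding discussion.

```python
# Hedged sketch of the Fig. 2 motion-compensated distortion estimator.

def estimate_motion_compensated_distortion(mad_cur, mad_prev, s_prev,
                                           t1=0.1, t2=0.5, k=0.5):
    """Estimate Se(I) from MAD(I), MAD(I-1), and the previous frame's
    actual motion-compensated distortion S(I-1), without motion estimation."""
    # percentage change H in the raw distortion measure (Line 1 of Fig. 2)
    h = abs(mad_cur - mad_prev) / mad_prev if mad_prev else float("inf")
    if h < t1:
        return s_prev                          # little change: reuse S(I-1)
    if h < t2:
        return k * s_prev + (1 - k) * mad_cur  # assumed blend (Line 5)
    return mad_cur                             # large change: fall back on MAD (Line 7)
```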
  • the estimated distortion measure Se generated using the algorithms of Figs. 1 and 2 can be used to determine whether to skip the current image, that is, whether to avoid encoding the current image into the compressed video bitstream during video encoding processing.
  • an adaptive frame-skipping scheme enables a video coder to maintain control over the transmitted frame rate and the quality of the reference frames. In cases of high motion, this ensures a graceful degradation in frame quality and frame rate.
  • the coder can be in one of two states: steady state or transient state.
  • in the steady state, all attempts are made to meet a specified frame rate, and, if this is not possible, an attempt is made to maintain a certain minimum frame rate.
  • if even the minimum frame rate cannot be maintained, the coder switches into a transient state, where large frame skips are allowed until the buffer level depletes and the next frame can be transmitted.
  • the transient state typically occurs during scene changes and sudden large motions. It is desirable for the coder to return from the transient state to the steady state in a relatively short period of time.
  • images may be designated as the following different types of frames for compression processing:
    o An intra (I) frame, which is encoded using only intra-frame compression techniques.
    o A predicted (P) frame, which is encoded using inter-frame compression techniques based on a previous I or P frame, and which can itself be used as a reference frame to encode one or more other frames.
    o A bi-directional (B) frame, which is encoded using bi-directional inter-frame compression techniques based on a previous I or P frame and a subsequent I or P frame, and which cannot be used to encode another frame.
    o A PB frame, which corresponds to two images (a P frame and a temporally preceding B frame) that are encoded as a single frame with a single set of overhead data (as in the H.263 video compression algorithm).
  • B frames are supported in H.263+ and MPEG; PB frames are supported in H.263.
  • the B and PB frames are used for two purposes in two different situations.
  • the system is designed for applications where control over the rate and quality of reference frames is required.
  • the parameters that are adjusted include the rate for the frame, the acceptable distortion level in the frame, and the frame-rate. An attempt is made to maintain these parameters by performing an intelligent mode decision as to when to encode B or PB frames and by intelligently skipping frames, when warranted.
  • R Number of bits needed to encode the current frame as a P frame.
  • the same model can be applied for a B frame except with a quantizer that is typically higher than that of a corresponding P frame;
  • H Number of bits needed to encode the overhead (i.e., header and motion information);
  • X1, X2 Parameters of the quadratic model, which are recursively updated from frame to frame.
  • the estimate Se, generated using the algorithms of Figs. 1 and 2 without having to perform motion estimation, is preferably used in Equation (1) for the motion-compensated distortion measure S.
  • MAD Raw distortion measure for the current frame where the distortion measure is based on the mean absolute difference.
  • H Overhead bits (e.g., for motion vectors) other than bits used to transmit residuals for the current frame. If this information is unavailable, H is assumed to be zero.
  • CBR constant bit rate
  • smin Smallest skip desired for encoding the next frame (e.g., 1 / average target frame rate).
  • smax Largest skip allowed between frames at steady state.
  • skip Pointer corresponding to the number of frames to skip from the previously encoded frame.
  • Bframeskip Pointer corresponding to frame stored as a potential B frame.
  • B Buffer occupancy at frame skip before encoding frame skip.
  • B Bp-(Rp*skip), where Bp is the buffer occupancy after encoding the previous frame.
  • PC2 Similar to PCFD2, except that the determination is made after motion estimation, e.g., by comparing the average motion vector magnitude to a specified threshold level.
  • Figs. 3A-3C provide pseudocode for an algorithm for determining what frames to code and how to code them, according to one embodiment of the present invention.
  • the algorithm contains seven routines: START, LOOP1-LOOP5, and TRANSIENT.
  • START is called during steady-state processing after coding a reference frame.
  • TRANSIENT is called during transient processing.
  • all attempts are made to meet the preset specified frame rate, and, if this is not possible, an attempt is made to maintain a certain minimum frame rate.
  • the coder switches into the transient state, where large frame skips are allowed until the buffer level depletes and the next frame can be transmitted.
  • the transient state typically occurs at the start of the transmission, during scene changes, and during sudden large motions.
  • the processing of the START routine begins at Line A1 in Fig. 3A with the initialization of the current frame pointer skip to the minimum skip value smin.
  • the smallest frame skip value may be 2, corresponding to a coding scheme in which an attempt is made to encode every other image in the original video sequence.
  • the raw distortion measure MAD is computed for the current frame skip using the algorithm of Fig. 1.
  • Equation (1) is then evaluated using Se to estimate R, the number of bits needed to encode the current frame as a P frame. If encoding the current frame as a P frame does not make the buffer too full, then the flag PCFD1 is set to 1 (i.e., true). Otherwise, PCFD1 is set to 0 (i.e., false).
  • PCFD1 is true (Line A2), indicating that the current frame can be transmitted as a P frame
  • motion estimation is performed for the current frame
  • the actual motion-compensated distortion measure S is calculated
  • the number of bits R is reevaluated using S in Equation (1) instead of Se
  • the values for the flags PC1 and PC2 are determined (Line A3).
  • the flag PC1 indicates the impact to the buffer from encoding the current frame skip as a P frame based on the motion-compensated distortion measure S.
  • PC1 is set to 1 if frame skip can be encoded as a P frame.
  • the flag PC2 indicates whether the motion estimation results indicate that motion (e.g., the average motion vector magnitude for the frame) is larger than a specified threshold. If so, then PC2 is set to 1.
  • the LOOP1 routine starts by storing the current frame smin as a possible B frame
  • the next frame is selected by setting skip equal to 2*smin (Line B8).
  • the LOOP2 routine is called when there is not enough room in the buffer to transmit the current frame skip-smin as a P frame. Under those circumstances, frame smin will not be encoded and the LOOP2 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
  • the parameter skip is set to smin+1 to point to the next frame in the video sequence (Line C1 in Fig. 3B), and the frames from smin+1 to smin+floor(smin/2), where "floor" is a truncation operation, are then sequentially analyzed (Lines C2, C14, C15) to see if any of them can be encoded (Lines C3-C13).
  • the number of bits to encode is calculated based on the raw distortion measure MAD, and the flags PCFD1 and PCFD2 are set to indicate whether there is room in the buffer and whether motion is large, respectively (Line C3).
  • the flag PCFD2 is set without actually performing motion estimation, by comparing the raw distortion measure MAD to a specified threshold level. If MAD is greater than the threshold level, then motion is assumed to be large and PCFD2 is set to 1.
  • the LOOP3 routine is called when the processing in the LOOP2 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP3 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
  • the parameter skip is set to smin+floor(smin/2)+1 (Line D1 in Fig. 3B), and the frames from there up to 2*smin-1 are then sequentially analyzed (Lines D2, D5, D6) to see if any of them can be encoded (Lines D3-D4).
  • Initializing the parameter skip to smin+floor(smin/2)+1 allows the P and B frames to be closer together for the given B skip, which improves coding efficiency in an H.263 PB frame, where the P and B frames are tightly coupled. With true B frames, this strategy may need to be changed.
  • the number of bits R to encode is calculated based on the estimated distortion measure Se generated from the raw distortion measure MAD, and the flags PCFD1 and PCFD2 are set to indicate whether there is room in the buffer and whether motion is large, respectively (Line D3). If both those conditions are met, then the current frame skip is encoded as a P frame, and processing returns to the START routine (Line D4).
  • the LOOP4 routine is called when the processing in the LOOP3 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP4 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
  • the parameter skip is set to 2*smin+1 (Line E1 in Fig. 3C), and the frames from there up to smax-1 are then sequentially analyzed (Lines E2, E6, E7) to see if any of them can be encoded (Lines E3-E5).
  • the number of bits R to encode are calculated based on the estimated distortion measure Se, which is in turn based on the raw distortion measure MAD, and the flag PBCFD is set (Line E3).
  • the LOOP5 routine is called when the processing in the LOOP4 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP5 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
  • the parameter skip is set to smax+1 (Line F1 in Fig. 3C), and the frames from there up to smin+smax are then sequentially analyzed (Lines F2, F5, F6) to see if any of them can be encoded (Lines F3-F4).
  • the number of bits R to encode are calculated based on the estimated distortion measure Se, which is in turn based on the raw distortion measure MAD, and the flag PBCFD is set (Line F3).
  • the TRANSIENT routine is called when the processing in the LOOP5 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, processing switches from the steady state into the transient state, where the TRANSIENT routine selects one or more frames for encoding as P frames until the TRANSIENT routine determines that processing can return to the steady state. In alternative embodiments, the TRANSIENT routine may encode at least some of the frames as B frames.
  • the algorithm presented in Figs. 3A-3C provides a complete approach to frame skipping, PB decision, and quality control when the quantizer step variation is constrained to be within certain bounds from one reference frame to the next.
  • the scheme maintains the user-defined minimum frame rate during steady-state operation and attempts to transmit data at high quality and at an "acceptable" frame rate (greater than the minimum frame rate). It provides a graceful degradation in quality and frame rate when there is an increase in motion or complexity. B frames are used both for improving the frame rate and for improving the coded quality. However, during a scene change or when the motion increases very rapidly, the frame-rate or reference-frame-quality targets may not be met. In this situation, processing goes into a transient state to "catch up" and slowly re-enter a new steady state.
  • the scheme requires minimal additional computational complexity and no additional storage (beyond that required to store the incoming frames).
  • the present invention can be embodied in the form of methods and apparatuses for practicing those methods.
  • the present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
  • the present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
  • program code When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.

Abstract

A raw distortion measure (e.g., MAD), relative to a reference frame, is generated for the current image in a video stream. The raw distortion measure is then used to generate an estimate (e.g., Se) of a motion-compensated distortion measure (e.g., S) for the current image relative to the reference image without having to perform motion estimation on the current image. The estimate of the motion-compensated distortion measure is then used to determine whether or how to encode the current image, and a compressed video bitstream is generated for the sequence of video images based, in part, on that determination. The present invention enables a video coder to determine when to skip frames without first having to perform computationally expensive motion estimation and motion compensation processing, which computation load would otherwise be wasted for frames that are skipped.

Description

FRAME SKIPPING WITHOUT HAVING TO PERFORM MOTION ESTIMATION
BACKGROUND OF THE INVENTION Field of the Invention
The present invention relates to image processing, and, in particular, to video compression.
Cross-Reference to Related Applications
This application claims the benefit of the filing date of U.S. provisional application no. 60/100,939, filed on 09/18/98 as attorney docket no. SAR 12728PROV.
Description of the Related Art
In video compression processing, it is known to encode images using motion-compensated inter-frame differencing in which blocks of image data are encoded based on the pixel-to-pixel differences between each block in an image currently being encoded and a selected block in a reference image. The process of selecting a block in the reference image for a particular block in the current image is called motion estimation. The goal of motion estimation is to find a block in the reference image that closely matches the block in the current image such that the magnitudes of the pixel-to-pixel differences between those two blocks are small, thereby enabling the block in the current image to be encoded in the resulting compressed bitstream using a relatively small number of bits. In a typical motion estimation algorithm, a block in the current image is compared with different blocks of the same size and shape within a defined search region in the reference image. The search region is typically defined based on the corresponding location of the block in the current image with allowance for inter-frame motion by a specified number of pixels (e.g., 8) in each direction. Each comparison involves the computation of a mathematical distortion measure that quantifies the differences between the two blocks of image data. One typical distortion measure is the sum of absolute differences (SAD) which corresponds to the sum of the absolute values of the corresponding pixel-to-pixel differences between the two blocks, although other distortion measures may also be used.
There are a number of methods for identifying the block of reference image data that "best" matches the block of current image data. In a "brute force" exhaustive approach, each possible comparison over the search region is performed and the best match is identified based on the lowest distortion value. In order to reduce the computational load, alternative schemes, such as log-based or layered schemes, are often implemented in which only a subset of the possible comparisons are performed. In either case, the result is the selection of a block of reference image data as the block that "best" matches the block of current image data. This selected block of reference image data is referred to as the "best integer-pixel location," because the distance between that block and the corresponding location of the block of current image data may be represented by a motion vector having X (horizontal) and Y (vertical) components that are both integers representing displacements in integer numbers of pixels. The process of selecting the best integer-pixel location is referred to as full-pixel or integer-pixel motion estimation.
In order to improve the overall encoding scheme even further, half-pixel motion estimation may be performed. In half-pixel motion estimation, after performing integer-pixel motion estimation to select the best integer-pixel location, the block of current image data is compared to reference image data corresponding to different half-pixel locations surrounding the best integer-pixel location, where the comparison for each half-pixel location is based on interpolated reference image data.
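Half-pixel refinement requires interpolated reference data at the half-pixel positions. The text does not specify the interpolation filter, so the bilinear upsampling below is an assumption (it matches the common H.263-style half-pel scheme): each half-pixel value is the average of its two or four integer-pixel neighbors.

```python
import numpy as np

def half_pixel_plane(ref: np.ndarray) -> np.ndarray:
    """Bilinearly upsample a reference image by 2x so that half-pixel
    displacements become integer offsets in the upsampled grid."""
    r = ref.astype(np.float64)  # avoid integer overflow in the sums
    h, w = r.shape
    up = np.zeros((2 * h - 1, 2 * w - 1))
    up[::2, ::2] = r                                    # original pixels
    up[1::2, ::2] = (r[:-1, :] + r[1:, :]) / 2.0        # vertical half-pels
    up[::2, 1::2] = (r[:, :-1] + r[:, 1:]) / 2.0        # horizontal half-pels
    up[1::2, 1::2] = (r[:-1, :-1] + r[:-1, 1:] +
                      r[1:, :-1] + r[1:, 1:]) / 4.0     # diagonal half-pels
    return up
```

The eight half-pixel candidates around the best integer-pixel location can then be compared using the same SAD distortion measure as the integer-pixel stage.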
Even though some of these motion estimation techniques require fewer computations than other techniques, they all require a significant computational effort.
The primary goal in video compression processing is to reduce the number of bits used to represent sequences of video images while still maintaining an acceptable level of image quality during playback of the resulting compressed video bitstream. Another goal in many video compression applications is to maintain a relatively uniform bit rate, for example, to satisfy transmission bandwidth and/or playback processing constraints.
Video compression processing often involves the tradeoff between bit rate and playback quality. This tradeoff typically involves reducing the average number of bits per image in the original video sequence by selectively decreasing the playback quality in each image that is encoded into the compressed video bitstream. Alternatively or in addition, the tradeoff can involve skipping certain images in the original video sequence, thereby encoding only a subset of those original images into the resulting compressed video bitstream.
Conventional video compression algorithms dictate a regular pattern of image skipping, e.g., skip every other image in the original video sequence. In addition, a video encoder may be able to skip additional images adaptively as needed to satisfy bit rate requirements. The decision to skip an additional image is typically based on a distortion measure (e.g., SAD) of the motion-compensated interframe differences and only after motion estimation has been performed for the particular image. When the decision is made not to skip the current frame, the motion-compensated interframe differences derived from the motion estimation processing are then used to further encode the image data (e.g., depending on the exact video compression algorithm, using such techniques as discrete cosine transform (DCT) processing, quantization, run-length encoding, and variable-length encoding). On the other hand, when the decision is made to skip the current frame, the motion-compensated interframe differences are no longer needed, and processing continues to the next image in the video sequence.
SUMMARY OF THE INVENTION The present invention is directed to a technique for generating an estimate of a motion- compensated distortion measure for a particular image in a video sequence without actually having to perform motion estimation for that image. In preferred embodiments, the estimated distortion measure can be used during video encoding to determine whether to skip the image without first having to perform motion estimation. When the decision is made to skip the image, motion estimation processing is avoided and the computational load of the video compression processing is accordingly reduced. When the decision is made to encode the image, motion estimation processing can then be implemented, as needed, to generate motion-compensated interframe differences for subsequent compression processing. Under such a video compression scheme, motion estimation processing is implemented only when the resulting interframe differences will be needed to encode the corresponding image.
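The flow summarized above might be sketched as follows. All of the callables here (`raw_distortion`, `estimate_se`, `encode_p_frame`, `rate_budget`) are illustrative placeholders, not routines named in the text; the point is only that motion estimation happens solely inside the encode branch.

```python
def encode_sequence(frames, raw_distortion, estimate_se, encode_p_frame,
                    rate_budget):
    """Sketch of the invention's flow: compute a cheap raw distortion
    measure, estimate the motion-compensated distortion Se from it, and
    run motion estimation only for frames that will actually be encoded."""
    reference = frames[0]
    encoded = [reference]  # assume the first frame is intra-coded
    for frame in frames[1:]:
        mad = raw_distortion(frame, reference)  # no motion estimation
        se = estimate_se(mad)                   # estimated MC distortion
        if rate_budget(se):                     # enough bits to encode?
            # Motion estimation/compensation happens only here.
            reference = encode_p_frame(frame, reference)
            encoded.append(reference)
        # else: skip the frame -- motion estimation was never performed
    return encoded
```

With stub callables (e.g., frames as scalars, `estimate_se` as a simple scaling), the skip decision visibly avoids the encode branch for high-distortion frames.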
According to one embodiment, the present invention is a method for processing a sequence of video images, comprising the steps of (a) generating a raw distortion measure for a current image in the sequence relative to a reference image; (b) using the raw distortion measure to generate an estimate of a motion-compensated distortion measure for the current image relative to the reference image without having to perform motion estimation on the current image; (c) determining whether or how to encode the current image based on the estimate of the motion-compensated distortion measure; and (d) generating a compressed video bitstream for the sequence of video images based on the determination of step (c).
BRIEF DESCRIPTION OF THE DRAWINGS Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which: Fig. 1 shows pseudocode for an algorithm for generating a raw (i.e., non-motion-compensated) distortion measure for an image, according to one embodiment of the present invention;
Fig. 2 shows pseudocode for an algorithm for estimating a motion-compensated distortion measure for an image, according to one embodiment of the present invention; and
Figs. 3A-3C provide pseudocode for an algorithm for determining what frames to code and how to code them, according to one embodiment of the present invention.
DETAILED DESCRIPTION Generating a Raw Distortion Measure for a Current Image
Fig. 1 shows pseudocode for an algorithm for generating a raw (i.e., non-motion-compensated) distortion measure for an image, according to one embodiment of the present invention. The particular raw distortion measure generated using the algorithm of Fig. 1 is a mean absolute difference MAD. The algorithm in Fig. 1 can be interpreted as applying to gray-scale images in which each pixel is represented by a single multi-bit intensity value. It will be understood that the algorithm can be easily extended to color images in which each pixel is represented by two or more different multi-bit components (e.g., red, green, and blue components in an RGB format or an intensity (Y) and two color (U and V) components in a YUV format).
The algorithm of Fig. 1 distinguishes two different types of pixels in the current image: Type I being those pixels having an intensity value sufficiently similar to the corresponding pixel value in the reference image, and Type II being those pixels having a pixel value sufficiently different from that of the corresponding pixel in the reference image. In this algorithm, the "corresponding" pixel is the pixel in the reference image having the same location (i.e., same row and column) as a pixel in the current image.
When there is no motion in the depicted imagery between a portion of the reference image and the corresponding portion of the current image, the pixels in that portion of the current image will typically be characterized as being of Type I. Similarly, when there is motion between relatively spatially uniform portions (i.e., portions in which the pixels have roughly the same value), those pixels will also typically be characterized as being of Type I. If, however, there is motion between spatially non-uniform portions, the absolute differences between the pixels in the current image and the corresponding pixels in the reference image will be relatively large, and most of those current-image pixels will typically be characterized as being of Type II. The variables n1 and n2 are counters for these two different types of pixels, respectively, and the variables dist1 and dist2 are intermediate distortion measures for these two different types of pixels, respectively. For each new image, these four variables are initialized to zero at Lines 1-2 in Fig. 1. For each pixel in the current image (Line 3), the absolute difference ad between the current pixel value and the corresponding pixel value in the reference frame is generated (Line 4). If ad is less than a specified threshold value thresh, then the current pixel is determined to be of Type I, and dist1 and n1 are incremented by ad and 1, respectively (Line 5). Otherwise, the current pixel is determined to be of Type II, and dist2 and n2 are incremented by ad and 1, respectively (Line 6). In order to pick up significant edges, a typical value for the parameter thresh is about 20. The intermediate distortion measures dist1 and dist2 are then normalized in Lines 8 and 9, respectively.
In the case of the video-conferencing paradigm of a talking head in front of a uniform background (e.g., a uniformly painted wall), relative movement of the person's head from frame to frame (e.g., a side-to-side motion) will result in some portions of the wall being newly covered by pixels corresponding to the head and other portions of the wall that were previously occluded by the head being newly exposed. Such a situation will result in two different significant edges in the raw interframe differences: one edge corresponding to those portions of the background newly covered by the head and a second edge corresponding to those portions of the background newly uncovered by the head. These two edges are referred to as double-image effects.
The raw distortion measure MAD, generated using the expression in Line 10, is a mean absolute difference that is corrected for double-image effects. In order to avoid double counting of significant edges, a typical value for the parameter factor is 0.5. The term dist1*n2*(1-factor) corrects for double-image effects by treating pixels removed from Type II as Type I pixels, so that the average distortion level in similar areas is added back. The distortion dist1 of the Type I pixels is considered an estimate of the residual and coding noise; it is assumed that this cannot be removed by motion compensation. The Type II pixels occupy roughly twice the area they would in "perfectly" motion-compensated images, and the term factor reflects this; it is nominally chosen as 0.5. The term factor is allowed to vary, since motion compensation is typically not perfect. It is assumed that the unoccluded region can be motion compensated; however, the fraction of pixels (n2*(1-factor)) is expected to have residual plus coding noise similar to that of the Type I pixels. Hence, the term dist1*n2*(1-factor) is used as an estimate of the distortion of these unoccluded Type II pixels.
Generating an Estimated Motion-Compensated Distortion Measure from the Raw Distortion Measure
Fig. 2 shows pseudocode for an algorithm for estimating a motion-compensated distortion measure for an image, according to one embodiment of the present invention. The particular distortion measure estimated using the algorithm of Fig. 2 is the motion-compensated mean absolute difference S. The algorithm of Fig. 2 derives an estimate Se for the distortion measure S from the raw distortion measure MAD derived using the algorithm of Fig. 1. This estimated distortion measure Se can be used to determine whether to skip images during video encoding without having to perform motion estimation processing for each image.
According to the algorithm of Fig. 2, the raw distortion measure MAD(I) for the current frame and the raw distortion measure MAD(I-1) for the previous frame are used to determine a measure H of the percentage change in MAD from the previous frame to the current frame (Line 1 of Fig. 2). Other suitable expressions characterizing the change in the raw distortion measure MAD from the previous frame to the current frame could also conceivably be used.
If the percentage change H is less than a first threshold value T1 (Line 2), then the estimated distortion measure Se(I) for the current frame is assumed to be the same as the actual motion-compensated distortion measure S(I-1) for the previous frame (Line 3). Otherwise, if the percentage change H is less than a second threshold value T2 (Line 4) (where T2 is greater than T1), then the estimated distortion measure Se(I) for the current frame is determined using the expression in Line 5, where the factor k is a parameter preferably specified between 0 and 1. Otherwise, the percentage change H is greater than the second threshold value T2 (Line 6), and the estimated distortion measure Se(I) for the current frame is determined using the expression in Line 7. Typical values for T1 and T2 are 0.1 and 0.5, respectively.
The motivation behind the processing of Fig. 2 is as follows. The raw distortion measure MAD(I) is a measure of the non-motion-compensated pixel differences between the current frame and its reference frame. Similarly, the raw distortion measure MAD(I-1) is a measure of the raw pixel differences between the previous frame and its reference frame, which may be the same as or different from the reference frame for the current frame. The percentage change H is a measure of the relative change between the two raw distortion measures MAD(I) and MAD(I-1), which are themselves measures of rates of change between those images and their corresponding reference images. Motion compensation does a fairly good job predicting image data when there is little or no change in distortion from frame to frame. As such, when the percentage change H is small (e.g., H&lt;T1), the actual motion-compensated distortion measure S(I-1) for the previous frame will be a good estimate Se(I) of the motion-compensated distortion measure S(I) for the current frame, as in Line 3 of Fig. 2. However, when the distortion from frame to frame is changing (e.g., during scene changes or other non-uniform changes in imagery), motion compensation will not do as good a job predicting the image data. In these situations, the actual motion-compensated distortion measure S(I-1) for the previous frame will not necessarily be a good indication of the actual motion-compensated distortion measure S(I) for the current frame. Thus, when the percentage change H is large (e.g., H&gt;T2), it may be safer to estimate the actual motion-compensated distortion measure S(I) for the current frame from the raw distortion measure MAD(I) for the current frame, as in the expression in Line 7 of Fig. 2. Selecting the factor k to be between 0 and 1 (e.g., preferably 0.8) assumes that motion compensation will typically reduce the distortion measure by some specified degree.
The expression in Line 5 of Fig. 2 provides a linear interpolation between these two "extreme" cases for situations where the percentage change H is neither small nor large (e.g., T1&lt;H&lt;T2). As such, the algorithm of Fig. 2 provides a piecewise-linear, continuous relationship between the raw distortion measure MAD and the estimated motion-compensated distortion measure Se for all values of MAD. Experimental results confirm that the algorithms of Figs. 1 and 2 provide a reliable estimate Se of the actual motion-compensated distortion measure S, where the estimated distortion measure Se is almost always within 20% of the actual distortion measure S, and usually within 10-15%.
Determining Whether To Skip the Current Image Using the Estimated Distortion Measure
The estimated distortion measure Se generated using the algorithms of Figs. A and B can be used to determine whether to skip the current image, that is, whether to avoid encoding the current image into the compressed video bitstream during video encoding processing. In one embodiment of the present invention, an adaptive frame-skipping scheme enables a video coder to maintain control over the transmitted frame rate and the quality of the reference frames. In cases of high motion, this ensures a graceful degradation in frame quality and the frame rate.
The coder can be in one of two states: steady state or transient state. In the steady state, all attempts are made to meet a specified frame rate, and, if this is not possible, an attempt is made to maintain a certain minimum frame rate. When it becomes impossible to maintain even the minimum frame rate, the coder switches into a transient state, where large frame skips are allowed until the buffer level depletes and the next frame can be transmitted. In addition to the start of transmission, the transient state typically occurs during scene changes and sudden large motions. It is desirable for the coder to go from the transient state to the steady state in a relatively short period of time. Depending on the video compression algorithm, images may be designated as the following different types of frames for compression processing: o An intra (I) frame which is encoded using only intra-frame compression techniques, o A predicted (P) frame which is encoded using inter-frame compression techniques based on a previous I or P frame, and which can itself be used as a reference frame to encode one or more other frames, o A bi-directional (B) frame which is encoded using bi-directional inter-frame compression techniques based on a previous I or P frame and a subsequent I or P frame, and which cannot be used to encode another frame, and o A PB frame which corresponds to two images — a P frame and a temporally preceding B frame — that are encoded as a single frame with a single set of overhead data (as in the H.263 video compression algorithm).
According to one embodiment of the present invention, in the transient state, only I and P frames are allowed, while, in the steady state, B frames (H.263+, MPEG) and PB frames (H.263) are also allowed. In the steady state, the B and PB frames are used for two purposes in two different situations. First, when motion is large, B frames are used to increase the frame rate to acceptable levels. Second, when motion is small, using B and/or PB frames enables achievement of higher compression efficiency. The system is designed for applications where control over the rate and quality of reference frames is required. The parameters that are adjusted include the rate for the frame, the acceptable distortion level in the frame, and the frame-rate. An attempt is made to maintain these parameters by performing an intelligent mode decision as to when to encode B or PB frames and by intelligently skipping frames, when warranted.
These decisions are based on estimates of rate and distortion parameters that are measured as the frames are read into the frame buffer, which fits in very well with the H.263+ Video Codec Near-Term Model 8 (TMN 8, Study Group 16, ITU-T, Document Q15-A-59, Release 0, June 1997) and certain rate control schemes that can be used for MPEG and H.263. The strategy also ensures that a minimal amount of storage is used for the incoming frames that are encoded. Other strategies are possible that use more storage but enable maintenance of better control over frame rate and reference frame quality. The present strategy also ensures that the computational overhead for extra motion estimation is minimal. If additional computational power is available for motion estimation, the performance of the algorithm can be improved further. This strategy is based on a quadratic rate distortion model that relates the rate for encoding the frame and the SAD (sum of absolute differences) after motion compensation. This model is shown in Equation (1) as follows:
(R - H) / S = X1/Q + X2/(Q^2), (1)
where:
R: Number of bits needed to encode the current frame as a P frame. The same model can be applied for a B frame, except with a quantizer that is typically higher than that of a corresponding P frame;
H: Number of bits needed to encode the overhead (i.e., header and motion information);
S: Motion-compensated interframe SAD over the current frame;
Q: Average quantizer step size over the previous frame; and
X1, X2: Parameters of the quadratic model, which are recursively updated from frame to frame.
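Solving Equation (1) for R gives the bit estimate used throughout the algorithm. A minimal sketch in Python, with illustrative values for X1 and X2 (the model parameters are recursively updated from frame to frame in practice, so these defaults are assumptions, not values from the patent):

```python
def estimate_bits(S, Q, H=0.0, X1=1.0, X2=0.5):
    """Estimate R, the bits needed to encode a frame, from Equation (1):
    (R - H) / S = X1/Q + X2/Q**2, i.e., R = H + S*(X1/Q + X2/Q**2).
    X1 and X2 defaults are illustrative only."""
    return H + S * (X1 / Q + X2 / (Q * Q))
```

For example, with a motion-compensated SAD of S = 1000, quantizer Q = 10, and overhead H = 100 bits, the model predicts R = 205 bits.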
Since it is desirable to avoid estimating motion for frames that are not going to be encoded, the estimate Se, generated using the algorithms of Figs. A and B without having to perform motion estimation, is preferably used in Equation (1) for the motion-compensated distortion measure S.
Although the model has been described using the sum of absolute differences as the cost function, the present invention can be implemented using other suitable cost functions.
Consider, as an example, a sequence of three frames A, C, and E, where temporally Frame A is the first of the three frames and Frame E is the last of the three frames. The following discussion is for PB frames or for coding schemes having at most one B frame between reference frames.
Generalizations to coders with more than one B frame between reference frames will be described later. Assuming that Frame A is encoded as a reference frame (i.e., as either an I or a P frame), the decision needs to be made as to how to encode Frames C and E, if at all. The following four choices are possible:
(1) Encode Frame C as a B frame and encode Frame E as a reference frame;
(2) Encode Frames C and E together as a PB frame;
(3) Encode Frame C as a reference frame and restart process to determine how to encode Frame E; and
(4) Skip Frame C and encode Frame E as a reference frame.
If possible, it is desirable to encode Frames C and E together as a PB frame. When motion is large and the buffer occupancy is not too high, Frame C may need to be encoded as a reference frame, in which case, the process is restarted to determine how to encode Frame E. When motion is large and the buffer occupancy is too high, Frame C may need to be skipped, in which case, Frame E will be encoded as a reference frame. The subsequent discussion assumes that the time reference is at Frame A.
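The preference order among these four choices can be sketched as a small decision function. All names and the exact branch structure here are illustrative, not taken from the patent's pseudocode:

```python
from enum import Enum

class Choice(Enum):
    B_THEN_P = 1        # (1) C as a B frame, E as a reference frame
    PB_FRAME = 2        # (2) C and E together as a PB frame
    C_AS_REFERENCE = 3  # (3) C as a reference frame; restart for E
    SKIP_C = 4          # (4) skip C; E as a reference frame

def choose(motion_large, buffer_too_high):
    """Illustrative mapping of the stated preferences: PB when possible;
    with large motion, C becomes a reference frame unless the buffer is
    too full, in which case C is skipped and E becomes the reference."""
    if not motion_large:
        return Choice.PB_FRAME
    if not buffer_too_high:
        return Choice.C_AS_REFERENCE
    return Choice.SKIP_C
```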
Notation
The following notation is used in the algorithm described in detail later in this specification.
MAD Raw distortion measure for the current frame, where the distortion measure is based on the mean absolute difference.
S Actual motion-compensated distortion measure for the current frame, where the distortion measure is based on the mean absolute difference.
Se Estimate of the actual motion-compensated distortion measure S for the current frame, where the distortion measure is based on the raw mean absolute difference MAD.
R Number of bits needed to encode the current frame, generated according to Equation (1) using either the estimated distortion measure Se or the actual distortion measure S.
H Overhead bits (e.g., for motion vectors) other than bits used to transmit residuals for the current frame. If this information is unavailable, H is assumed to be zero.
Rp Bits output to the channel in one picture interval in the constant bit rate (CBR) case.
smin Smallest skip desired for encoding the next frame (e.g., 1 / average target frame rate).
smax Largest skip allowed between frames at steady state.
skip Pointer corresponding to the number of frames to skip from the previously encoded frame.
Bframeskip Pointer corresponding to frame stored as a potential B frame.
Bmax Total size of the buffer.
B Buffer occupancy at frame skip before encoding frame skip. For a constant bit rate channel, B=Bp-(Rp*skip), where Bp is the buffer occupancy after encoding the previous frame.
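For the CBR case, the B = Bp - (Rp*skip) relation above reduces to a one-line helper, sketched here with illustrative numbers:

```python
def buffer_before_encode(Bp, Rp, skip):
    """Buffer occupancy at frame `skip`, before encoding it: Bp bits
    left after the previous frame, minus Rp bits output to the channel
    per picture interval over `skip` intervals (CBR channel)."""
    return Bp - Rp * skip

# e.g., 20000 bits left after the previous frame, 8000 bits output per
# picture interval, skipping 2 intervals leaves B = 4000 bits.
```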
The algorithm relies on the following flags:
PCFD1: Indicates whether there is enough room in the buffer to transmit the current frame as a P frame, where that determination is made without first performing motion estimation for the current frame. In one embodiment, if (R(Se) + B < x*Bmax), where R is generated using Equation (1) based on the estimated distortion measure Se, then there is room in the buffer and PCFD1=1. Otherwise, there is not enough room in the buffer and PCFD1=0. In one implementation, x=80%, although the tightness of the constraint can be varied by changing the value of x.
PC1: Similar to PCFD1, except that R is generated using Equation (1) after performing motion estimation and based on the actual distortion measure S.
PCFD2: Indicates whether motion in the current frame relative to its reference frame is "large," where that determination is made without first performing motion estimation for the current frame. In this case, the magnitude of motion is based on the raw distortion measure MAD. If MAD is greater than a specified threshold level, then motion is said to be large and PCFD2=1. Otherwise, motion is not large and PCFD2=0.
PC2: Similar to PCFD2, except that the determination is made after motion estimation, e.g., by comparing the average motion vector magnitude to a specified threshold level.
PBCFD: Indicates whether the current frame and a previous frame stored as a potential B frame can be coded together as a PB frame, where that determination is made without first performing motion estimation for the current frame. In one embodiment, if (R(Se) + (bits to encode B frame) + B < x*Bmax), then the two frames can be encoded together as a PB frame and PBCFD=1. Otherwise, they cannot and PBCFD=0.
Pmeet: Indicates whether a previously stored frame can be transmitted as a P frame. If so, Pmeet=1.
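The flag tests above reduce to simple threshold comparisons. A hedged sketch (function names are illustrative; the default x = 0.8 corresponds to the 80% buffer-fullness constraint described in the embodiment above):

```python
def pcfd1(R_Se, B, Bmax, x=0.8):
    """Room in the buffer for a P frame, judged from R estimated with
    the raw-distortion-based Se (no motion estimation performed)."""
    return 1 if R_Se + B < x * Bmax else 0

def pcfd2(MAD, threshold):
    """Motion judged 'large' from the raw distortion measure alone."""
    return 1 if MAD > threshold else 0

def pbcfd(R_Se, b_frame_bits, B, Bmax, x=0.8):
    """Room in the buffer to code the stored B-frame candidate and the
    current frame together as a PB frame."""
    return 1 if R_Se + b_frame_bits + B < x * Bmax else 0
```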
Figs. 3A-3C provide pseudocode for an algorithm for determining what frames to code and how to code them, according to one embodiment of the present invention. The algorithm contains seven routines: START, LOOP1-LOOP5, and TRANSIENT. START is called during steady-state processing after coding a reference frame. TRANSIENT is called during transient processing. As described earlier, in the steady state, all attempts are made to meet the preset specified frame rate, and, if this is not possible, an attempt is made to maintain a certain minimum frame rate. When it becomes impossible to maintain even the minimum frame rate, the coder switches into the transient state, where large frame skips are allowed until the buffer level depletes and the next frame can be transmitted. The transient state typically occurs at the start of the transmission, during scene changes, and during sudden large motions.
START Routine
The processing of the START routine begins at Line A1 in Fig. 3A with the initialization of the current frame pointer skip to the minimum skip value smin. For example, in one embodiment, the smallest frame skip value may be 2, corresponding to a coding scheme in which an attempt is made to encode every other image in the original video sequence. The raw distortion measure MAD is computed for the current frame skip using the algorithm of Fig. 1. After using the algorithm of Fig. 2 to generate the estimated motion-compensated distortion measure Se from the raw distortion measure MAD, Equation (1) is then evaluated using Se to estimate R, the number of bits needed to encode the current frame as a P frame. If encoding the current frame as a P frame does not make the buffer too full, then the flag PCFD1 is set to 1 (i.e., true). Otherwise, PCFD1 is set to 0 (i.e., false).
If PCFD1 is true (Line A2), indicating that the current frame can be transmitted as a P frame, then motion estimation is performed for the current frame, the actual motion-compensated distortion measure S is calculated, the number of bits R is reevaluated using S in Equation (1) instead of Se, and the values for flags PC1 and PC2 are determined (Line A3). The flag PC1 indicates the impact to the buffer from encoding the current frame skip as a P frame based on the motion-compensated distortion measure S. Like PCFD1, PC1 is set to 1 if frame skip can be encoded as a P frame. The flag PC2 indicates whether the motion estimation results indicate that motion (e.g., the average motion vector magnitude for the frame) is larger than a specified threshold. If so, then PC2 is set to 1. If there is enough room in the buffer to encode frame skip as a P frame (Line A4) and the estimated motion is large (Line A5), then the current frame skip is encoded as a P frame and processing returns to the beginning of the START routine to determine how to encode the next frame in the sequence (Line A6). Otherwise, if there is enough room in the buffer to encode frame skip as a P frame (Line A4), but the estimated motion is not large (Lines A5 and A7), then the flag Pmeet is set to 1, indicating that there is enough room in the buffer to transmit frame skip as a P frame, and processing continues to the LOOP1 routine (Line A8). Otherwise, if there is not enough room in the buffer to encode frame skip as a P frame (Lines A4 and A10), then the flag Pmeet is set to 0 and processing continues to the LOOP2 routine (Line A11). Similarly, if the estimated impact to the buffer based on the raw distortion measure indicates that frame skip cannot be transmitted as a P frame (Lines A2 and A13), then the flag Pmeet is set to 0 and processing continues to the LOOP2 routine (Line A14).
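The branch structure of the START routine can be sketched as follows. The `state` object and its methods are hypothetical stand-ins for the measures and flags described above (not an interface from the patent); the return value names the routine to run next.

```python
def start_routine(state):
    """Sketch of the START routine's branch structure."""
    state.skip = state.smin                # Line A1: smallest skip
    if not state.pcfd1():                  # Line A2: estimated buffer check
        state.pmeet = 0
        return "LOOP2"                     # Lines A13-A14
    state.run_motion_estimation()          # Line A3: compute S, PC1, PC2
    if not state.pc1():                    # Line A10: no room after ME
        state.pmeet = 0
        return "LOOP2"                     # Line A11
    if state.pc2():                        # Line A5: motion is large
        state.encode_p_frame()
        return "START"                     # Line A6: restart for next frame
    state.pmeet = 1                        # Line A7: room, but small motion
    return "LOOP1"                         # Line A8
```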
LOOP1 Routine
As described in the previous section, the LOOP1 routine is called when there is enough room in the buffer to encode the current frame skip=smin as a P frame, but the motion is not large. Under those circumstances, frame smin will be encoded either (1) as a B frame followed by a P frame or (2) in combination with a subsequent frame as a PB frame.
In particular, the LOOP1 routine starts by storing the current frame smin as a possible B frame (Line B1 in Fig. 3A). The parameter skip is then incremented (Line B2) and the frames from smin+1 to 2*smin-1 are then sequentially checked (Lines B3, B6, B7) to see if any of them can be encoded as a P frame (Lines B4 and B5). This is done by estimating the impact to the buffer and the size of the motion without performing motion estimation (Line B4). If there is enough room in the buffer and the motion is large, then the frame skip is encoded as a P frame and processing returns to the beginning of the START routine for the next frame in the video sequence (Line B5). Note that, when smin is 2, only skip = 3 is evaluated in this "do while" loop.
If these conditions are not met for any of these frames, then the next frame is selected by setting skip equal to 2*smin (Line B8). The number of bits R needed to encode frame skip as a P frame is then estimated without performing motion estimation and the flag PBCFD is set (Line B9). If it is estimated that there is enough room in the buffer to encode frame smin and frame skip as a PB frame, then PBCFD is set to 1. If that condition is satisfied, then motion estimation is performed for frame skip, and frames smin and skip=2*smin are encoded together as a PB frame (Line B10). Otherwise, there is not enough room to encode those frames as a PB frame, and frame smin is encoded as a P frame (Line B11). In either case, processing returns to the START routine (Line B12).
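The LOOP1 control flow can be sketched under the same hypothetical `state` interface used above (all names illustrative):

```python
def loop1(state):
    """LOOP1 sketch: store frame smin as a B candidate, scan frames
    smin+1 .. 2*smin-1 for one that can go out as a P frame; failing
    that, try a PB frame at 2*smin, else encode smin as a P frame."""
    state.store_b_frame(state.smin)                     # Line B1
    for skip in range(state.smin + 1, 2 * state.smin):  # Lines B2-B7
        if state.pcfd1(skip) and state.pcfd2(skip):     # Lines B4-B5
            state.encode_p_frame(skip)
            return "START"
    skip = 2 * state.smin                               # Line B8
    if state.pbcfd(skip):                               # Lines B9-B10
        state.encode_pb_frame(state.smin, skip)
    else:                                               # Line B11
        state.encode_p_frame(state.smin)
    return "START"                                      # Line B12
```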
LOOP2 Routine
As described in the section for the START routine, the LOOP2 routine is called when there is not enough room in the buffer to transmit the current frame skip=smin as a P frame. Under those circumstances, frame smin will not be encoded and the LOOP2 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
In particular, the parameter skip is set to smin+1 to point to the next frame in the video sequence (Line C1 in Fig. 3B), and the frames from smin+1 to smin+floor(smin/2), where "floor" is a truncation operation, are then sequentially analyzed (Lines C2, C14, C15) to see if any of them can be encoded (Lines C3-C13). For each frame analyzed, the number of bits to encode is calculated based on the raw distortion measure MAD and the flags PCFD1 and PCFD2 are set to indicate whether there is room in the buffer and whether motion is large, respectively (Line C3). The flag PCFD2 is set without actually performing motion estimation, by comparing the raw distortion measure MAD to a specified threshold level. If MAD is greater than the threshold level, then motion is assumed to be large and PCFD2 is set to 1.
If there is room in the buffer to encode the current frame skip as a P frame (Line C4) and if the motion is large (Line C5), then motion estimation is performed and the impact to the buffer (PC1) and the motion (PC2) are reevaluated using the actual distortion measure S (Line C6). If there is still enough room in the buffer (Line C7) and the motion is still large (Line C8), then the current frame skip is encoded as a P frame and processing returns to the START routine (Line C8). Otherwise, if the motion-compensated results indicate that there is enough room in the buffer (Line C7), but the actual motion is not large (Lines C8 and C9), then the current frame skip is stored as a B frame, the pointer Bframeskip is set equal to skip, the flag Pmeet is set to 1 indicating that there is enough room in the buffer to transmit frame skip as a P frame, and processing continues to the LOOP3 routine (Line C9). Otherwise, if the motion-compensated results indicate that there is not enough room in the buffer (Lines C7 and C11), then the current frame skip is stored as a B frame, the pointer Bframeskip is set equal to skip, the flag Pmeet is set to 0 indicating that there is not enough room in the buffer to transmit frame skip as a P frame, and processing continues to the LOOP3 routine (Line C11). If the non-motion-compensated data indicate that there is enough room in the buffer (Line C4), but the estimated motion is not large (Lines C5 and C13), then the current frame skip is stored as a B frame, the pointer Bframeskip is set equal to skip, the flag Pmeet is set to 1 indicating that there is enough room in the buffer to transmit frame skip as a P frame, and processing continues to the LOOP3 routine (Line C13). If, however, the non-motion-compensated data indicate that there is not enough room in the buffer (Lines C4 and C14), then processing continues to the next frame (Line C14).
If none of the frames from smin+1 to smin+floor(smin/2) satisfies the condition of Line C4, then the flag Pmeet is set to 0 indicating that there is not enough room in the buffer to transmit the last frame skip=smin+floor(smin/2) as a P frame, and processing continues to the LOOP3 routine (Line C16).
LOOP3 Routine
As indicated in the previous section, the LOOP3 routine is called when the processing in the LOOP2 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP3 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
In particular, the parameter skip is set to smin+floor(smin/2)+1 (Line D1 in Fig. 3B), and the frames from there up to 2*smin-1 are then sequentially analyzed (Lines D2, D5, D6) to see if any of them can be encoded (Lines D3-D4). Initializing the parameter skip to smin+floor(smin/2)+1 allows the P and the B frames to be closer together for the given B skip, which improves coding efficiency in an H.263 PB frame when the P and B frames are tightly coupled. With true B frames, this strategy may need to be changed. For each frame analyzed, the number of bits R to encode is calculated based on the estimated distortion measure Se generated from the raw distortion measure MAD, and the flags PCFD1 and PCFD2 are set to indicate whether there is room in the buffer and whether motion is large, respectively (Line D3). If both of those conditions are met, then the current frame skip is encoded as a P frame, and processing returns to the START routine (Line D4).
If the end of the range of frames up to 2*smin-1 is reached without encoding any of them as a P frame, then skip is set equal to the next frame 2*smin (Line D7). The number of bits R needed to encode frame skip as a P frame is then estimated from MAD without performing motion estimation and the flag PBCFD is set (Line D8). If it is estimated that there is enough room in the buffer to encode the previous frame Bframeskip stored as a potential B frame (in LOOP2) and the current frame skip=2*smin as a PB frame (Line D9), then motion estimation is performed for the current frame skip and, if not already performed, for the previous frame stored as a B frame (Line D10). Those frames are then encoded together as a PB frame, and processing returns to the START routine (Line D11).
Otherwise, if those two frames cannot be encoded together as a PB frame (Lines D9 and D12), then, if the previous frame Bframeskip stored as a B frame (in LOOP2) can be transmitted as a P frame (i.e., Pmeet=1), then that previous frame Bframeskip is encoded as a P frame and processing returns to the START routine (Line D12). Otherwise, if that previous frame cannot be transmitted as a P frame (i.e., Pmeet=0) (Lines D12 and D13), but the non-motion-compensated data indicate that there is room in the buffer (i.e., PCFD1=1) and that motion is large (i.e., PCFD2=1), then the current frame skip=2*smin is encoded as a P frame, and processing returns to the START routine (Line D13). Otherwise, processing continues to the LOOP4 routine (Line D14).
LOOP4 Routine
As indicated in the previous section, the LOOP4 routine is called when the processing in the LOOP3 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP4 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
In particular, the parameter skip is set to 2*smin+1 (Line E1 in Fig. 3C), and the frames from there up to smax-1 are then sequentially analyzed (Lines E2, E6, E7) to see if any of them can be encoded (Lines E3-E5). For each frame analyzed, the number of bits R to encode is calculated based on the estimated distortion measure Se, which is in turn based on the raw distortion measure MAD, and the flag PBCFD is set (Line E3). If it is estimated that there is enough room in the buffer to encode the previous frame Bframeskip stored as a B frame (in LOOP2) and the current frame skip as a PB frame (i.e., PBCFD=1), then motion estimation is performed for the current frame skip and, if necessary, for the previous frame Bframeskip stored as a B frame. Those frames are then encoded together as a PB frame, and processing returns to the START routine (Line E4).
Otherwise, if those two frames cannot be encoded together as a PB frame (i.e., PBCFD=0) (Lines E4 and E5) and if the current frame skip should be coded as a P frame (i.e., PCFD1=PCFD2=1), then the current frame is encoded as a P frame and processing returns to the START routine (Line E5). If the end of the range of frames up to smax-1 is reached without encoding any of them, then processing continues to the LOOP5 routine (Line E8).
LOOP5 Routine
As indicated in the previous section, the LOOP5 routine is called when the processing in the LOOP4 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, the LOOP5 routine attempts to select the next frame to be coded and determine how that next frame should be encoded.
In particular, the parameter skip is set to smax+1 (Line F1 in Fig. 3C), and the frames from there up to smin+smax are then sequentially analyzed (Lines F2, F5, F6) to see if any of them can be encoded (Lines F3-F4). For each frame analyzed, the number of bits R to encode is calculated based on the estimated distortion measure Se, which is in turn based on the raw distortion measure MAD, and the flag PBCFD is set (Line F3). If it is estimated that there is enough room in the buffer to encode the previous frame Bframeskip stored as a B frame (in LOOP2) and the current frame skip as a PB frame (i.e., PBCFD=1), then motion estimation is performed for the current frame skip and, if necessary, for the previous frame Bframeskip stored as a B frame. Those frames are then encoded together as a PB frame, and processing returns to the START routine (Line F4).
If the end of the range of frames up to smin+smax is reached without encoding any of them, then processing continues to the TRANSIENT routine (Line F7).
TRANSIENT Routine
As indicated in the previous section, the TRANSIENT routine is called when the processing in the LOOP5 routine fails to determine conclusively which frame to encode next and/or how to encode it. In that case, processing switches from the steady state into the transient state, where the TRANSIENT routine selects one or more frames for encoding as P frames until the TRANSIENT routine determines that processing can return to the steady state. In alternative embodiments, the TRANSIENT routine may encode at least some of the frames as B frames.
In particular, for the current frame skip, the raw distortion measure MAD and the number of bits R to encode are calculated based on the estimated distortion measure Se, and the flag PCFD1 is set (Line G1). If it is estimated that there is enough room in the buffer to transmit the current frame skip as a P frame (i.e., PCFD1=1) (Line G2), then motion estimation is performed for the current frame skip and the current frame is encoded as a P frame (Line G3). If the buffer occupancy is less than a specified threshold limit BO, then processing returns to the steady state of the START routine (Line G4). Otherwise, skip is set to smin to select the next frame in the video sequence and processing returns to the start of the TRANSIENT routine to process that next frame (Line G5). If the current frame skip could not be transmitted as a P frame (Lines G2 and G7), then skip is incremented and processing returns to the start of the TRANSIENT routine to process the next frame, without encoding the current frame (Line G7).
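The TRANSIENT routine's skip-until-a-frame-fits behavior can be sketched under the same hypothetical `state` interface used earlier (all names illustrative):

```python
def transient_routine(state):
    """TRANSIENT sketch: keep skipping frames until a P frame fits in
    the buffer; once the buffer drains below the threshold BO, rejoin
    the steady state via the START routine."""
    while True:
        if state.pcfd1(state.skip):           # Line G2: room for a P frame
            state.encode_p_frame(state.skip)  # Line G3 (after motion est.)
            if state.buffer_occupancy() < state.BO:
                return "START"                # Line G4: back to steady state
            state.skip = state.smin           # Line G5: next frame
        else:
            state.skip += 1                   # Line G7: skip this frame
```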
The algorithm presented in Figs. 3A-3C provides a complete approach to frame skipping, PB decision, and quality control when the quantizer step variation is constrained to be within certain bounds from one reference frame to the next. The scheme maintains the user-defined minimum frame rate during steady-state operations and attempts to transmit data at a high quality and at an "acceptable" frame rate (greater than the minimum frame rate). It provides a graceful degradation in quality and frame rate when there is an increase in motion or complexity. B frames are used both for improving the frame rate and the coded quality. However, in situations of scene change, or when the motion increases very rapidly, the demands of frame rate or reference frame quality may not be met. In this situation, processing goes into a transient state to "catch up" and slowly re-enter a new steady state. The scheme requires minimal additional computational complexity and no additional storage (beyond that required to store the incoming frames).
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the principle and scope of the invention as expressed in the following claims.

Claims

What is claimed is:
1. A method for processing a sequence of video images, comprising the steps of:
(a) generating a raw distortion measure for a current image in the sequence relative to a reference image;
(b) using the raw distortion measure to generate an estimate of a motion-compensated distortion measure for the current image relative to the reference image without having to perform motion estimation on the current image;
(c) determining whether or how to encode the current image based on the estimate of the motion-compensated distortion measure; and
(d) generating a compressed video bitstream for the sequence of video images based on the determination of step (c).
2. The invention of claim 1, wherein step (a) comprises the steps of: (1) generating a first intermediate distortion measure and a second intermediate distortion measure, wherein: the first intermediate distortion measure characterizes one or more relatively low-distortion portions of the current image; and the second intermediate distortion measure characterizes one or more relatively high-distortion portions of the current image; and (2) generating the raw distortion measure from the first and second intermediate distortion measures.
3. The invention of claim 2, wherein step (a)(2) applies a correction for double-image effects resulting from relative motion between the current image and the reference image.
4. The invention of claim 1, wherein step (b) comprises the steps of:
(1) generating a measure of change in distortion from a previous image in the sequence to the current image; and
(2) generating the estimate of the motion-compensated distortion measure based on the measure of change in distortion.
5. The invention of claim 4, wherein the estimate of the motion-compensated distortion measure is generated using a piecewise-linear, continuous function.
6. The invention of claim 1, wherein step (c) comprises the steps of:
(1) determining whether there is enough room in a corresponding buffer to transmit the current image as a P frame based on the estimate of the motion-compensated distortion measure;
(2) determining whether motion in the current image is larger than a specified threshold level based on the raw distortion measure; and
(3) determining whether or how to encode the current image based on the results of steps (c)(1) and (c)(2).
7. The invention of claim 6, wherein step (c)(1) comprises the step of estimating a number of bits needed to encode the current image as a P frame based on a quadratic rate distortion model.
8. The invention of claim 1, wherein step (c) comprises the step of determining whether to (1) skip the current image, (2) encode the current image as a B frame, (3) encode the current image as part of a PB frame, or (4) encode the current image as a reference frame.
9. The invention of claim 1, wherein: the processing can be in either a steady state or a transient state; in the steady state, the current image is either skipped or encoded as either a P frame, a B frame, or part of a PB frame; and in the transient state, the current image is either skipped or encoded as either a P frame or a B frame.
10. The invention of claim 9, wherein the processing automatically switches from the transient state to the steady state when a corresponding buffer level is below a specified threshold level.
PCT/US1999/021830 1998-09-18 1999-09-20 Frame skipping without having to perform motion estimation WO2000018134A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10093998P 1998-09-18 1998-09-18
US60/100,939 1998-09-18
US25594699A 1999-02-23 1999-02-23
US09/255,946 1999-02-23

Publications (1)

Publication Number Publication Date
WO2000018134A1 true WO2000018134A1 (en) 2000-03-30

Family

ID=26797724

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/021830 WO2000018134A1 (en) 1998-09-18 1999-09-20 Frame skipping without having to perform motion estimation

Country Status (3)

Country Link
JP (1) JP3641172B2 (en)
KR (1) KR100323683B1 (en)
WO (1) WO2000018134A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1202579A1 (en) * 2000-10-31 2002-05-02 Interuniversitair Microelektronica Centrum Vzw A method and apparatus for adaptive encoding framed data sequences
EP1204279A3 (en) * 2000-10-31 2002-05-15 Interuniversitair Microelektronica Centrum Vzw A method and apparatus for adaptive encoding framed data sequences
WO2002085038A1 (en) * 2001-04-16 2002-10-24 Mitsubishi Denki Kabushiki Kaisha Method and system for determining distortion in a video signal
WO2006094033A1 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Adaptive frame skipping techniques for rate controlled video encoding
WO2006094001A2 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Region-of-interest coding with background skipping for video telephony
WO2006093999A2 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Region-of-interest coding in video telephony using rho domain bit allocation
WO2006094000A2 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Quality metric-biased region-of-interest coding for video telephony
US7616690B2 (en) 2000-10-31 2009-11-10 Imec Method and apparatus for adaptive encoding framed data sequences
US7733566B2 (en) 2006-06-21 2010-06-08 Hoya Corporation Supporting mechanism
DE102015121148A1 (en) * 2015-12-04 2017-06-08 Technische Universität München Reduce the transmission time of pictures
CN110832858A (en) * 2017-07-03 2020-02-21 Vid拓展公司 Motion compensated prediction based on bi-directional optical flow

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100739133B1 (en) * 2001-04-17 2007-07-13 엘지전자 주식회사 B-frame coding method in digital video coding
KR20120072202A (en) 2010-12-23 2012-07-03 한국전자통신연구원 Apparatus for estimating motion and method thereof
CN102271269B (en) * 2011-08-15 2014-01-08 清华大学 Method and device for converting frame rate of binocular stereo video
KR102308373B1 (en) 2021-06-08 2021-10-06 주식회사 스누아이랩 Video Deblurring Device for Face Recognition and Driving Method Thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0514865A2 (en) * 1991-05-24 1992-11-25 Mitsubishi Denki Kabushiki Kaisha Image coding system
EP0772362A2 (en) * 1995-10-30 1997-05-07 Sony United Kingdom Limited Video data compression

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LEE J ET AL: "Rate-Distortion Optimized Frame Type Selection for MPEG Encoding", IEEE Transactions on Circuits and Systems for Video Technology, US, IEEE Inc., New York, vol. 7, no. 3, 1 June 1997 (1997-06-01), pages 501-510, XP000690588, ISSN: 1051-8215 *
LEE J: "A Fast Frame Type Selection Technique for Very Low Bit Rate Coding Using MPEG-1", Real-Time Imaging, GB, Academic Press Limited, vol. 5, no. 2, 1 April 1999 (1999-04-01), pages 83-94, XP000831435, ISSN: 1077-2014 *
TIHAO CHIANG ET AL: "A New Rate Control Scheme Using Quadratic Rate Distortion Model", IEEE Transactions on Circuits and Systems for Video Technology, US, IEEE Inc., New York, vol. 7, no. 1, pages 246-250, XP000678897, ISSN: 1051-8215 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USRE44457E1 (en) 2000-10-31 2013-08-27 Imec Method and apparatus for adaptive encoding framed data sequences
US7616690B2 (en) 2000-10-31 2009-11-10 Imec Method and apparatus for adaptive encoding framed data sequences
EP1204279A3 (en) * 2000-10-31 2002-05-15 Interuniversitair Microelektronica Centrum Vzw A method and apparatus for adaptive encoding framed data sequences
EP1202579A1 (en) * 2000-10-31 2002-05-02 Interuniversitair Microelektronica Centrum Vzw A method and apparatus for adaptive encoding framed data sequences
WO2002085038A1 (en) * 2001-04-16 2002-10-24 Mitsubishi Denki Kabushiki Kaisha Method and system for determining distortion in a video signal
EP2046048A3 (en) * 2005-03-01 2013-10-30 Qualcomm Incorporated Region-of-interest coding with background skipping for video telephony
US8514933B2 (en) 2005-03-01 2013-08-20 Qualcomm Incorporated Adaptive frame skipping techniques for rate controlled video encoding
WO2006093999A3 (en) * 2005-03-01 2006-12-28 Qualcomm Inc Region-of-interest coding in video telephony using rho domain bit allocation
WO2006094000A3 (en) * 2005-03-01 2006-12-28 Qualcomm Inc Quality metric-biased region-of-interest coding for video telephony
US7724972B2 (en) 2005-03-01 2010-05-25 Qualcomm Incorporated Quality metric-biased region-of-interest coding for video telephony
WO2006093999A2 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Region-of-interest coding in video telephony using rho domain bit allocation
US8768084B2 (en) 2005-03-01 2014-07-01 Qualcomm Incorporated Region-of-interest coding in video telephony using RHO domain bit allocation
WO2006094001A3 (en) * 2005-03-01 2007-01-04 Qualcomm Inc Region-of-interest coding with background skipping for video telephony
EP2309747A3 (en) * 2005-03-01 2013-06-26 Qualcomm Incorporated Region-of-interest coding using RHO domain bit allocation
WO2006094000A2 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Quality metric-biased region-of-interest coding for video telephony
WO2006094001A2 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Region-of-interest coding with background skipping for video telephony
WO2006094033A1 (en) * 2005-03-01 2006-09-08 Qualcomm Incorporated Adaptive frame skipping techniques for rate controlled video encoding
US8693537B2 (en) 2005-03-01 2014-04-08 Qualcomm Incorporated Region-of-interest coding with background skipping for video telephony
US7733566B2 (en) 2006-06-21 2010-06-08 Hoya Corporation Supporting mechanism
DE102015121148A1 (en) * 2015-12-04 2017-06-08 Technische Universität München Reduce the transmission time of pictures
WO2017093205A1 (en) 2015-12-04 2017-06-08 Technische Universität München Reducing the transmission time of images
CN110832858B (en) * 2017-07-03 2023-10-13 Vid拓展公司 Apparatus and method for video encoding and decoding
CN110832858A (en) * 2017-07-03 2020-02-21 Vid拓展公司 Motion compensated prediction based on bi-directional optical flow

Also Published As

Publication number Publication date
KR20000023277A (en) 2000-04-25
JP2000125302A (en) 2000-04-28
KR100323683B1 (en) 2002-02-07
JP3641172B2 (en) 2005-04-20

Similar Documents

Publication Publication Date Title
US7224734B2 (en) Video data encoding apparatus and method for removing a continuous repeat field from the video data
EP2250813B1 (en) Method and apparatus for predictive frame selection supporting enhanced efficiency and subjective quality
EP0798930B1 (en) Video coding apparatus
US6819714B2 (en) Video encoding apparatus that adjusts code amount by skipping encoding of image data
WO2000018134A1 (en) Frame skipping without having to perform motion estimation
JP4702059B2 (en) Method and apparatus for encoding moving picture
US7095784B2 (en) Method and apparatus for moving picture compression rate control using bit allocation with initial quantization step size estimation at picture level
JPH10136375A (en) Motion compensation method for moving image
JP3755155B2 (en) Image encoding device
US20040234142A1 (en) Apparatus for constant quality rate control in video compression and target bit allocator thereof
JP3593929B2 (en) Moving picture coding method and moving picture coding apparatus
JP2000201354A (en) Moving image encoder
JP2000197049A (en) Dynamic image variable bit rate encoding device and method therefor
US6763138B1 (en) Method and apparatus for coding moving picture at variable bit rate
JP4644097B2 (en) Moving picture coding program, program storage medium, and coding apparatus
JP3480067B2 (en) Image coding apparatus and method
US7133448B2 (en) Method and apparatus for rate control in moving picture video compression
KR100390167B1 (en) Video encoding method and video encoding apparatus
JP2001016594A (en) Motion compensation method for moving image
KR100413979B1 (en) Predictive coding method and device thereof
JP2005303555A (en) Moving image encoding apparatus and its method
JP3711573B2 (en) Image coding apparatus and image coding method
JPH0646411A (en) Picture coder
JPH08126012A (en) Moving picture compressor
Ryu Block matching algorithm using neural network

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): BR CA CN IN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
122 Ep: PCT application non-entry in European phase