WO2007007923A1 - Apparatus for encoding and decoding multi-view image - Google Patents


Info

Publication number
WO2007007923A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
motion
center
decoding
disparity
Prior art date
Application number
PCT/KR2005/002226
Other languages
French (fr)
Inventor
Dong Sik Yi
Kyung Hoon Bae
Won Kyoung Lee
Original Assignee
3R Inc.
Application filed by 3R Inc. filed Critical 3R Inc.
Publication of WO2007007923A1 publication Critical patent/WO2007007923A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/57 Motion estimation characterised by a search window with variable size or shape
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • FIG. 1 is a block diagram illustrating an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention
  • FIG. 2 is a block diagram illustrating an apparatus for encoding a multi-view image according to an embodiment of the present invention
  • FIG. 3 is a diagram illustrating relation between motion prediction and disparity prediction of a multi-view image in an apparatus for encoding a multi-view image according to an embodiment of the present invention
  • FIG. 4 is a diagram illustrating a method of predicting a motion in a conventional full search block matching algorithm
  • FIG. 5 is a diagram illustrating a method of predicting a motion in a diamond search algorithm according to an embodiment of the present invention.
  • FIG. 6 is a block diagram illustrating an apparatus for decoding a multi-view image according to an embodiment of the present invention. Best Mode for Carrying Out the Invention
  • an apparatus for encoding and decoding a multi-view image includes an apparatus 100 for encoding a multi-view image, which receives and encodes center, left, and right images from three cameras 300 installed horizontally at the same height and at equal intervals, and an apparatus 200 for decoding a multi-view image, which decodes the encoded images by the reverse process.
  • the apparatus 100 for encoding a multi-view image includes a center image encoding unit 110, a left image encoding unit 120, a right image encoding unit 130, a multiplexer 140, and a buffer 150.
  • the center image encoding unit 110 includes an image encoding means 111, a motion predicting means 112, a motion compensating means 113, and a subtracter 114.
  • the center image encoding unit 110 generates texture information and motion information on the center image by encoding the input center image while predicting and compensating for its motion.
  • the image encoding means 111 provides the texture information on the center image generated by encoding the center image to the multiplexer 140 and a reconstructed center image as a reference image to the motion predicting means 112 and the subtracter 114.
  • the motion predicting means 112 generates the motion information by predicting the motion of the center image in reference with the reconstructed center image and provides the generated motion information to the motion compensating means 113 and the multiplexer 140.
  • the motion compensating means 113 generates a motion compensation value based on the motion information predicted by the motion predicting means 112 and provides the generated motion compensation value to the subtracter 114.
  • the subtracter 114 generates a difference image of the center image by subtracting the motion compensation value input from the motion compensating means 113 from the reconstructed center image input from the image encoding means 111.
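As an illustrative sketch (not part of the disclosure), the subtracter's operation is a pixel-wise subtraction of the motion compensation value from the image; the function name and the toy 2x2 frames below are assumptions for illustration.

```python
# Sketch of the encoder-side subtracter: the difference image is the
# frame minus the motion-compensated prediction, pixel by pixel.
def difference_image(current, prediction):
    """Residual that is subsequently coded: current - prediction."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]

current = [[10, 12], [14, 16]]     # toy 2x2 luminance block
prediction = [[9, 12], [15, 16]]   # motion-compensated prediction
print(difference_image(current, prediction))  # [[1, 0], [-1, 0]]
```

Most residual entries are near zero when the prediction is good, which is what makes the difference image cheaper to code than the frame itself.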
  • the left or right image encoding unit 120 or 130 includes a disparity and motion predicting means 121, a disparity and motion compensating means 122, a subtracter 123, and an image encoding means 124.
  • the left or right image encoding unit 120 or 130 generates texture information on the center image and disparity and motion information on the left or right image by predicting and compensating for the motion of the input left or right image, predicting and compensating for the disparity with respect to the input center image used as a reference image, and encoding the left or right image.
  • the disparity and motion predicting means 121 generates disparity information by predicting the disparity of the left or right image in reference with the input center image used as a reference image and the motion information by predicting a motion of the next left or right image in reference with the previous left or right image used as a reference image, and provides the generated disparity information and the motion information to the disparity and motion compensating means 122 and the multiplexer 140.
  • the disparity and motion compensating means 122 generates a disparity compensation value and a motion compensation value based on the disparity information and the motion information predicted by the disparity and motion predicting means 121 and provides the generated disparity compensation value and motion compensation value to the subtracter 123.
  • the subtracter 123 generates a difference image of the left or right image by subtracting the disparity compensation value and the motion compensation value input from the disparity and motion compensating means 122 from the left or right image input from the camera 300.
  • the image encoding means 124 provides the texture information generated by reconstructing and encoding the center image used as the reference image to the multiplexer 140.
  • the multiplexer 140 generates a data stream by multiplexing the texture information, the motion information, and the disparity information respectively input from the center image encoding unit 110, the left image encoding unit 120, and the right image encoding unit 130.
  • the buffer 150 stores the data stream multiplexed by the multiplexer 140.
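The multiplexer/demultiplexer pairing can be sketched as view-tagged interleaving; the tags and list-based "stream" below are assumptions for illustration, not the patent's actual bitstream syntax.

```python
# Toy multiplexer: tag each information unit with its view ("C", "L", "R")
# so the decoder-side demultiplexer can route the streams back to the
# correct decoding unit.
def multiplex(center_units, left_units, right_units):
    stream = []
    for view, units in (("C", center_units), ("L", left_units), ("R", right_units)):
        stream.extend((view, u) for u in units)
    return stream

def demultiplex(stream):
    routed = {"C": [], "L": [], "R": []}
    for view, unit in stream:
        routed[view].append(unit)
    return routed

stream = multiplex(["texture0", "motion1"], ["disparity0"], ["disparity0"])
print(demultiplex(stream)["L"])  # ['disparity0']
```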
  • MPEG4 handles both natural and synthetic images: the natural image is acquired from a camera, while the synthetic image is a mixture of the natural image and an artificial image generated by computer graphics or the like.
  • MPEG4 has considerably superior compression efficiency and image quality compared to conventional methods.
  • a source consisting of multimedia data, including a binary format for scene, video, audio, animated text, and the like, is divided into individual media by a demultiplexer, and the divided data are partly combined for special applications and provided to a user.
  • MPEG4 maintains three VOP forms: I-VOP (Intra-coded Video Object Plane), P-VOP (Predictive-coded VOP), and B-VOP (Bidirectionally predictive-coded VOP).
  • I-VOP is encoded by performing the DCT (Discrete Cosine Transform) operation directly on the VOP, without using motion compensation.
  • P-VOP is encoded by performing the DCT operation on the difference component remaining after motion compensation is performed with reference to an I-VOP or another P-VOP.
  • B-VOP uses motion compensation like P-VOP, but is encoded by performing the DCT operation on the difference component remaining after motion compensation is performed from two frames on the time axis.
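The contrast between I-VOP and P-VOP coding can be illustrated with a one-dimensional type-II DCT. MPEG4 actually applies an 8x8 two-dimensional block DCT; the 4-sample, unnormalized version below is a deliberate simplification for illustration.

```python
import math

def dct_ii(x):
    """Unnormalized type-II DCT of a 1-D signal (simplified stand-in for
    the 8x8 block DCT used by MPEG4)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

pixels = [16, 16, 16, 16]   # flat block: all energy lands in coefficient 0
coeffs_i = dct_ii(pixels)   # I-VOP: DCT of the pixels themselves
residual = [1, 0, -1, 0]    # P-VOP: DCT of (pixels - motion-compensated prediction)
coeffs_p = dct_ii(residual)
```

The small residual produces small coefficients, which is why coding the P-VOP difference component costs far fewer bits than intra-coding the pixels.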
  • an apparatus for encoding a multi-view image according to the embodiment first generates texture information by encoding a first center image, which is used as a reference image. The apparatus then generates disparity information by predicting and encoding the disparity between the first left and right images and the first center image. From the second center, left, and right images onward, motion information is generated by predicting and encoding motion up to predetermined center, left, and right images, for example thirty frames, with reference to the previous images.
  • the spatial and temporal redundancy of adjacent images can be removed by encoding the first center image as texture information of an I-VOP, which has a large data size, and the other images as disparity or motion information of P-VOPs, which have a small data size, so the amount of data can be decreased.
  • accordingly, the compression rate of the images can be increased, and it is possible to implement a high image-quality three-dimensional moving picture based on MPEG4, which supports a low transmission speed.
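The reference structure described above can be written out as a schedule. The dictionary layout, view names, and default frame count below are illustrative assumptions, not the patent's data structure.

```python
# Which reference each (view, frame) pair uses under the scheme above:
# frame 1 of the center view is intra-coded (texture, I-VOP); frame 1 of
# the left/right views is disparity-predicted from the first center image;
# every later frame is motion-predicted from the previous frame of its view.
def prediction_schedule(n_frames=30):
    schedule = {("center", 1): ("texture", None),
                ("left", 1): ("disparity", ("center", 1)),
                ("right", 1): ("disparity", ("center", 1))}
    for t in range(2, n_frames + 1):
        for view in ("center", "left", "right"):
            schedule[(view, t)] = ("motion", (view, t - 1))
    return schedule

print(prediction_schedule(2)[("left", 1)])  # ('disparity', ('center', 1))
```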
  • a conventional MPEG4 encoding apparatus uses a full search block matching algorithm for motion prediction, in which the pixels of two images a and b within a predetermined region to be predicted for motion are compared in units of macro blocks, and the displacement to the macro block with the least error is assigned as the motion vector c.
  • a range for searching for the prediction of the motion can be determined by a parameter file or the like.
  • the full search block matching algorithm shows good performance, but when the search range is reduced to shorten the processing time, the accuracy of the prediction decreases.
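A minimal full search over a +/-radius window, using SAD as the block-matching error, can be sketched as follows; the nested-list frame layout and function names are assumptions for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def full_search(ref, block, top, left, radius):
    """Exhaustively test every displacement in [-radius, radius]^2; return
    the vector (dy, dx) of the candidate in `ref` with the least SAD."""
    h, w = len(block), len(block[0])
    best_vec, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(ref) or x + w > len(ref[0]):
                continue  # candidate would fall outside the reference frame
            candidate = [row[x:x + w] for row in ref[y:y + h]]
            cost = sad(block, candidate)
            if cost < best_cost:
                best_vec, best_cost = (dy, dx), cost
    return best_vec

# The 2x2 block appears shifted by (1, 1) in the reference frame:
ref = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]
print(full_search(ref, [[9, 8], [7, 6]], 0, 0, 2))  # (1, 1)
```

The cost of this exhaustive scan grows quadratically with the radius, which is the processing-time problem the diamond search below it addresses.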
  • in the diamond search algorithm, searching is performed using a large diamond d until the least value of SAD (sum of absolute differences) is assigned to the center of the large diamond d, and then an optimized motion vector is searched for using a small diamond e.
  • as a result, the diamond search algorithm requires a very short processing time.
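The two-stage search can be sketched as follows. The cost function is passed in as a callable (in practice it would be SAD against the reference frame), and the specific pattern coordinates follow the commonly used large/small diamond layout, which is an assumption since the patent does not list them.

```python
# Large diamond search pattern (LDSP) and small diamond search pattern
# (SDSP), as (dy, dx) offsets around the current search center.
LDSP = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2),
        (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def diamond_search(cost, start=(0, 0), max_steps=64):
    """Walk the large diamond until the least cost sits at its center,
    then refine once with the small diamond."""
    center = start
    for _ in range(max_steps):
        candidates = [(center[0] + dy, center[1] + dx) for dy, dx in LDSP]
        best = min(candidates, key=cost)
        if best == center:  # minimum at the diamond's center: stop moving
            break
        center = best
    candidates = [(center[0] + dy, center[1] + dx) for dy, dx in SDSP]
    return min(candidates, key=cost)

# Toy cost with a single minimum at (3, 2): the search converges there.
print(diamond_search(lambda v: abs(v[0] - 3) + abs(v[1] - 2)))  # (3, 2)
```

Each step evaluates at most nine candidates instead of the full window, which is where the processing-time saving over the full search comes from.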
  • an apparatus 200 for decoding a multi-view image includes a demultiplexer 210, a center image decoding unit 220, a left image decoding unit 230, a right image decoding unit 240, and an image memory 250.
  • the demultiplexer 210 demultiplexes the multiplexed data stream input from the multiplexer 140 of the apparatus for encoding a multi-view image and provides data streams of the center, left, and right images to the center image decoding unit 220, the left image decoding unit 230, and the right image decoding unit 240, respectively.
  • the center image decoding unit 220 includes an image decoding means 221, an image reconstructing means 222, a motion decoding means 223, and a motion compensating means 224.
  • the center image decoding unit 220 generates a reconstructed center image by decoding the data stream of the input center image and compensating for the motion.
  • the image decoding means 221 decodes the texture information included in the received data stream of the center image as input and provides the decoded texture information to the image reconstructing means 222.
  • the image reconstructing means 222 reconstructs the image decoded by the image decoding means 221 and the image compensated by the motion compensating means 224.
  • the motion decoding means 223 decodes the motion information included in the data stream of the received center image and provides the decoded motion information to the motion compensating means 224.
  • the motion compensating means 224 compensates a previous image that is input from the image memory 250 as a reference image with the decoded motion information input from the motion decoding means 223 and provides the compensated previous image to the image reconstructing means 222.
  • the left image decoding unit 230 or the right image decoding unit 240 includes an image decoding means 231, a disparity and motion decoding means 232, a disparity and motion compensating means 233, and an image reconstructing means 234.
  • the left or right image decoding unit 230 or 240 generates a reconstructed left or right image by decoding the input data stream of the left or right image and compensating the disparity and the motion.
  • the image decoding means 231 decodes the texture information of the center image, which is used as a reference image, included in the input data stream of the left or right image and provides the decoded texture information to the disparity and motion compensating means 233.
  • the disparity and motion decoding means 232 decodes the disparity information and the motion information included in the input data stream of the left or right image and provides the decoded disparity information and the motion information to the disparity and motion compensating means 233.
  • the disparity and motion compensating means 233 compensates the reference image input from the image decoding means 231 with the disparity information input from the disparity and motion decoding means 232, compensates the previous image input from the image memory 250 with the motion information input from the disparity and motion decoding means 232, and provides the compensated images to the image reconstructing means 234.
  • the image reconstructing means 234 reconstructs the image compensated by the disparity and motion compensating means 233.
  • the image memory 250 stores the previous image that becomes a reference for compensating the motion of the center image.
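On the decoder side, reconstruction inverts the encoder's subtracter: the motion- or disparity-compensated reference is added back to the decoded difference image. A toy sketch, with names and 2x2 frames that are illustrative assumptions:

```python
# Decoder-side reconstruction: compensated reference plus decoded residual
# recovers the frame the encoder originally subtracted against.
def reconstruct(prediction, residual):
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]

prediction = [[9, 12], [15, 16]]   # compensated previous frame (image memory)
residual = [[1, 0], [-1, 0]]       # decoded difference image
print(reconstruct(prediction, residual))  # [[10, 12], [14, 16]]
```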
  • An apparatus for encoding and decoding a multi-view image removes spatial and temporal redundancy by predicting motion and disparity among the center, left, and right images, thereby decreasing the amount of data transmitted for a three-dimensional moving picture based on MPEG4.
  • an apparatus for encoding and decoding a multi-view image can decrease the processing time required for predicting motion by using a diamond search algorithm, thereby enabling real-time compression and transmission of a three-dimensional moving picture based on MPEG4.
  • accordingly, an apparatus for encoding and decoding a multi-view image is capable of implementing, in real time, a high image-quality three-dimensional moving picture based on MPEG4, which supports a low transmission speed.

Abstract

The present invention provides a method of encoding and decoding a three-view image based on motion prediction, encoding and decoding with motion compensation, disparity prediction, encoding and decoding with disparity compensation, and a data structure for implementing a high image-quality three-dimensional moving picture based on MPEG4, which supports a low transmission speed.

Description

APPARATUS FOR ENCODING AND DECODING MULTI-VIEW IMAGE
Technical Field
[1] The present invention relates to an apparatus for encoding and decoding a multi-view image, and more particularly, to an apparatus for encoding and decoding a three-view image capable of implementing a high quality three-dimensional moving picture transmittable at a low speed with a high compression rate, based on MPEG4 (Motion Picture Experts Group 4). Background Art
[2] MPEG (Motion Picture Experts Group) is a method of compressing a moving picture and representing a code for transmission of the information. Research on MPEG has been performed from MPEG1 and MPEG2 through MPEG4 to MPEG7.
[3] MPEG2, standardized as ISO (International Organization for Standardization) 13818, is video compression technology requiring processing at a high transmission speed of around 4 to 100 Mbps, for application in fields requiring high-quality images and sound, such as digital TV, interactive TV, and DVD (Digital Video Disc).
[4] In addition, MPEG4 is video compression technology for implementing a moving picture at a low transmission rate below 64 Kbps for multimedia communication over the wired Internet and wireless networks such as cellular networks.
[5] Recently, apparatuses for encoding and decoding a multi-view image based on MPEG2 or MPEG4 have been researched. For example, an apparatus for encoding and decoding a three-view image based on MPEG2 is disclosed in Korean Unexamined Patent Application Publication No. 2004-65014. In addition, a binocular, that is, two-view, system for three-dimensional processing of a moving picture based on MPEG4 is disclosed in Korean Examined Patent Registration Publication No. 397511.
[6] However, since the apparatus for encoding and decoding a three-view image disclosed in Korean Unexamined Patent Application Publication No. 2004-65014 is based on MPEG2, which requires a high transmission speed of 4 to 100 Mbps, it is difficult to apply the apparatus to a high image-quality multi-view image based on MPEG4, which requires a low transmission speed below 64 Kbps.
[7] In addition, although the binocular system and method for processing a three-dimensional moving picture is based on MPEG4, its view is limited to two views, so when an observer moves out of the limited viewing region, or when the focus does not match, the observer cannot perceive a three-dimensional effect, suffers eye fatigue, and feels dizzy, thereby limiting practical application of the system and the method. Disclosure of Invention
Technical Problem
[8] In order to solve the aforementioned problems, an object of the present invention is to provide an apparatus for encoding and decoding a multi-view image capable of implementing a high quality three-dimensional moving picture with a high compression rate based on MPEG4 in which the transmission speed is low. Technical Solution
[9] According to an aspect of the present invention, there is provided an apparatus for encoding a multi-view image comprising: a center image encoding unit generating texture information on a center image by encoding a first center image and motion information on the center image by predicting, compensating, and encoding a motion for center images from a second center image to a predetermined center image with respect to a previous center image which is used as a reference image; a left image encoding unit generating disparity information on a left image by predicting, compensating, and encoding a disparity of a first left image and the first center image, which is used as a reference image, and generating motion information on the left image by predicting, compensating, and encoding a motion for left images from a second left image to a predetermined left image with respect to a previous left image which is used as a reference image; a right image encoding unit generating disparity information on a right image by predicting, compensating, and encoding a disparity of a first right image and the first center image, which is used as a reference image, and generating motion information on the right image by predicting, compensating, and encoding a motion for right images from a second right image to a predetermined right image with respect to a previous right image which is used as a reference image; and a multiplexer generating a data stream by multiplexing the texture information, the motion information, and the disparity information respectively input from the center image encoding unit, the left image encoding unit, and the right image encoding unit.
[10] In the aspect above, the center image encoding unit of the apparatus may comprise: an image encoding means generating the texture information on the center image by encoding the first center image, reconstructing the first center image, and providing the reconstructed first center image as the reference image; a motion predicting means generating the motion information on the center image by predicting a motion from the second center image to the predetermined center image with respect to the previous center image which is used as a reference image; a motion compensating means generating a motion compensation value based on the motion information predicted by the motion predicting means; and a subtracter generating a difference image of the center image by subtracting the motion compensation value compensated by the motion compensating means from the center image.
[11] In addition, the left image encoding unit or the right image encoding unit may comprise: a disparity and motion predicting means generating the disparity information on the first left image or the first right image by predicting a disparity of the first left image or the first right image and the first center image, which is used as the reference image, and generating the motion information on the left or right image by predicting a motion from the second left or right image to the predetermined left or right image with respect to the previous left or right image which is used as the reference image; a disparity and motion compensating means generating a disparity compensation value and a motion compensation value based on the disparity information and the motion information predicted by the disparity and motion predicting means; and a subtracter generating a difference image of the left or right image by subtracting the disparity compensation value or the motion compensation value compensated by the disparity and motion compensating means from the left or right image.
[12] In addition, the left image encoding unit and the right image encoding unit may further comprise an image encoding means generating the texture information by reconstructing and encoding the first center image used as the reference image, respectively.
[13] In addition, the center image encoding unit, the left image encoding unit, and the right image encoding unit may predict the motion using a diamond search algorithm.
[14] According to another aspect of the present invention, there is provided an apparatus for decoding a multi-view image comprising: a demultiplexer demultiplexing a multiplexed data stream and providing data streams of center, left, and right images; a center image decoding unit generating a reconstructed first center image by decoding texture information included in the data stream of the center image input from the demultiplexer and center images from a reconstructed second center image to a reconstructed predetermined center image by decoding and compensating motion information included in the center image data stream with respect to a previous center image which is used as a reference image; a left image decoding unit generating a first center image by decoding texture information included in the data stream of the left image, a first left image by decoding and compensating disparity information included in the data stream of the left image with respect to the first center image which is used as a reference image, and left images from a second left image to a predetermined left image by decoding and compensating motion information included in the data stream of the left image with respect to a previous left image which is used as a reference image; and a right image decoding unit generating a first center image by decoding texture information included in the data stream of the right image, a first right image by decoding and compensating disparity information included in the data stream of the right image with respect to the first center image which is used as a reference image, and right images from a second right image to a predetermined right image by decoding and compensating motion information included in the data stream of the right image with respect to a previous right image which is used as a reference image; and an image memory storing the reference images.
[15] In the aspect above, the center image decoding unit may comprise: an image decoding means decoding the texture information included in the data stream of the center image; a motion decoding means decoding the motion information included in the data stream of the center image; a motion compensating means compensating the previous center image stored in the image memory with the motion information decoded by the motion decoding means; and an image reconstructing means generating the first center image from the decoded texture information and center images from the second center image to the predetermined center image from the decoded and compensated motion information.
[16] In addition, the left image decoding unit and the right image decoding unit respectively may comprise: an image decoding means decoding the texture information included in the data stream of the left image or the right image; a disparity and motion decoding means decoding the disparity information and the motion information included in the data stream of the left image or the right image; a disparity and motion compensating means compensating the texture information decoded by the image decoding means with the disparity information decoded by the disparity and motion decoding means and the previous left or right image stored in the image memory with the motion information decoded by the disparity and motion decoding means; and an image reconstructing means generating the first left or right image from the decoded and compensated disparity information and images from the second left or right image to the predetermined left or right image from the decoded and compensated motion information.
Brief Description of the Drawings
[17] The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
[18] FIG. 1 is a block diagram illustrating an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention;
[19] FIG. 2 is a block diagram illustrating an apparatus for encoding a multi-view image according to an embodiment of the present invention;
[20] FIG. 3 is a diagram illustrating the relation between motion prediction and disparity prediction of a multi-view image in an apparatus for encoding a multi-view image according to an embodiment of the present invention;
[21] FIG. 4 is a diagram illustrating a method of predicting a motion in a conventional full search block matching algorithm;
[22] FIG. 5 is a diagram illustrating a method of predicting a motion in a diamond search algorithm according to an embodiment of the present invention; and
[23] FIG. 6 is a block diagram illustrating an apparatus for decoding a multi-view image according to an embodiment of the present invention.
Best Mode for Carrying Out the Invention
[24] Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.
[25] As illustrated in FIG. 1, an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention includes an apparatus 100 for encoding a multi-view image, which receives and encodes center, left, and right images from three cameras 300 installed horizontally at equal intervals and at the same height, and an apparatus 200 for decoding a multi-view image, which decodes the encoded image by the reverse process.
[26] Referring to FIGS. 1 and 2, the apparatus 100 for encoding a multi-view image according to an embodiment of the present invention includes a center image encoding unit 110, a left image encoding unit 120, a right image encoding unit 130, a multiplexer 140, and a buffer 150.
[27] The center image encoding unit 110 includes an image encoding means 111, a motion predicting means 112, a motion compensating means 113, and a subtracter 114. The center image encoding unit 110 generates texture information and motion information on the center image by encoding the received center image while predicting and compensating its motion.
[28] The image encoding means 111 provides the texture information on the center image generated by encoding the center image to the multiplexer 140 and a reconstructed center image as a reference image to the motion predicting means 112 and the subtracter 114.
[29] The motion predicting means 112 generates the motion information by predicting the motion of the center image in reference with the reconstructed center image and provides the generated motion information to the motion compensating means 113 and the multiplexer 140.
[30] The motion compensating means 113 generates a motion compensation value based on the motion information predicted by the motion predicting means 112 and provides the generated motion compensation value to the subtracter 114.
[31] The subtracter 114 generates a difference image of the center image by subtracting the motion compensation value input from the motion compensating means 113 from the reconstructed center image input from the image encoding means 111.
[32] The left or right image encoding unit 120 or 130 includes a disparity and motion predicting means 121, a disparity and motion compensating means 122, a subtracter 123, and an image encoding means 124. The left or right image encoding unit 120 or 130 generates texture information on the center image, disparity information on the left or right image, and motion information on the left or right image by predicting and compensating the motion of the received left or right image, predicting and compensating the disparity with respect to the input center image used as a reference image, and encoding the left or right image.
[33] The disparity and motion predicting means 121 generates disparity information by predicting the disparity of the left or right image in reference with the input center image used as a reference image and the motion information by predicting a motion of the next left or right image in reference with the previous left or right image used as a reference image, and provides the generated disparity information and the motion information to the disparity and motion compensating means 122 and the multiplexer 140.
[34] The disparity and motion compensating means 122 generates a disparity compensation value and a motion compensation value based on the disparity information and the motion information predicted by the disparity and motion predicting means 121 and provides the generated disparity compensation value and motion compensation value to the subtracter 123.
[35] The subtracter 123 generates a difference image of the left or right image by subtracting the disparity compensation value and the motion compensation value input from the disparity and motion compensating means 122 from the left or right image input from the camera 300.
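The difference-image step performed by the subtracters above can be sketched as plain pixel-wise subtraction (a minimal illustration; the function name and the nested-list image layout are my own, not from the patent):

```python
def residual_image(frame, compensation):
    """Difference image as formed by subtracter 114 or 123 (a sketch):
    the motion/disparity compensation value is subtracted from the input
    image pixel by pixel, so only the small prediction error remains."""
    return [[f - c for f, c in zip(frame_row, comp_row)]
            for frame_row, comp_row in zip(frame, compensation)]

frame = [[120, 121], [119, 122]]
comp = [[118, 120], [119, 120]]
print(residual_image(frame, comp))  # small residual -> fewer bits to encode
```

A good prediction makes the residual values cluster near zero, which is what lets the later transform and entropy coding compress them efficiently.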
[36] The image encoding means 124 provides the texture information generated by reconstructing and encoding the center image used as the reference image to the multiplexer 140.
[37] The multiplexer 140 generates a data stream by multiplexing the texture information, the motion information, and the disparity information respectively input from the center image encoding unit 110, the left image encoding unit 120, and the right image encoding unit 130.
[38] The buffer 150 stores the data stream multiplexed by the multiplexer 140.
[39] Generally, MPEG4 (Motion Picture Experts Group 4) encoding is divided into natural image encoding and synthetic image encoding. A natural image is acquired from a camera, and a synthetic image is a mixture of a natural image and an artificial image generated by computer graphics or the like. MPEG4 provides considerably better compression efficiency and image quality than conventional methods. In MPEG4, a source consisting of multimedia data, including a binary format for scenes, video, audio, animation, text, and the like, is separated into individual media streams by a demultiplexer, and the separated data is partly combined for specific applications and provided to a user.
[40] MPEG4 maintains three forms, I-VOP (Intra-coded Video Object Plane), P-VOP (Predictive coded VOP), and B-VOP (Bidirectionally predictive coded VOP), similar to conventional methods. Here, an I-VOP is encoded by performing only a DCT (Discrete Cosine Transform) operation on the VOP, without using motion compensation. A P-VOP is encoded by performing the DCT operation on the difference component remaining after motion compensation with respect to an I-VOP or another P-VOP. A B-VOP also uses motion compensation like a P-VOP, but is encoded by performing the DCT operation on the difference component remaining after motion compensation from two frames on the time axis.
[41] In this MPEG4 compression method, two-dimensional spatial redundancy and temporal redundancy between frames of the image data must be removed for efficient compression of a sequence of screens that changes over time.
[42] Referring to FIG. 3, an apparatus for encoding a multi-view image according to an embodiment of the present invention first generates texture information by encoding a first center image that is used as a reference image. The apparatus then generates disparity information by predicting and encoding the disparity between the first left and right images and the first center image, which is the reference image. From the second center, left, and right images onward, motion information is generated by predicting and encoding the motion of up to a predetermined number of center, left, and right images, for example thirty frames, with reference to the respective previous images.
[43] Accordingly, in the apparatus for encoding a multi-view image in the embodiment, the spatial and temporal redundancy of adjacent images can be removed by encoding the first center image as texture information of an I-VOP, which has a large data size, and the other images as disparity information or motion information of P-VOPs, which have a small data size, so the amount of data can be decreased. According to the apparatus for encoding a multi-view image in the embodiment, the compression rate of the images can be increased, making it possible to implement a high image-quality three-dimensional moving picture based on MPEG4, which supports low transmission speeds.
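The prediction structure of FIG. 3 — one intra-coded first center image, disparity-predicted first left and right images, and motion-predicted frames thereafter — can be sketched as a simple per-view plan (the function name and labels are illustrative, not terminology from the patent):

```python
def prediction_plan(n_frames=30):
    """Sketch of the multi-view prediction structure described above:
    the first center frame is intra-coded (I-VOP texture); the first
    left and right frames are disparity-predicted from it; every later
    frame in each view is motion-predicted from the previous frame of
    the same view. Thirty frames matches the example in the text."""
    plan = {"center": [("I", "texture")],
            "left": [("P", "disparity vs first center")],
            "right": [("P", "disparity vs first center")]}
    for t in range(1, n_frames):
        for view in ("center", "left", "right"):
            plan[view].append(("P", f"motion vs {view}[{t - 1}]"))
    return plan

plan = prediction_plan()
print(plan["left"][:2])  # first left frame uses disparity, the rest use motion
```

Only one frame in the whole group is coded as a large I-VOP; everything else carries compact disparity or motion information, which is where the data reduction claimed above comes from.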
[44] Referring to FIG. 4, a conventional MPEG4 encoding apparatus uses a full search block matching algorithm for motion prediction, in which the pixels of two images a and b within a predetermined search region are compared in units of a macro block, and the displacement of the macro block which has the least error is assigned as the motion vector c. In the full search block matching algorithm, the search range for motion prediction can be set by a parameter file or the like. When the motion lies within the search range, the full search block matching algorithm performs well. However, when a large motion carries the image outside the search range, the accuracy of the prediction decreases. In other words, when the accuracy of motion prediction increases, the compression rate and the image quality improve overall, since the amount of data in the difference image after motion compensation is decreased. However, when the search range is widened to improve the compression rate and the image quality, the amount of calculation increases and becomes inappropriate for real-time processing. Since most of the processing time for MPEG4 compression is spent predicting the motion, an algorithm with a comparatively wide search range and a short processing time is required. A diamond search algorithm as illustrated in FIG. 5 is used according to an embodiment of the present invention.
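A full search of the kind shown in FIG. 4 can be sketched as follows (a minimal illustration using nested lists for images; the function names and parameters are my own):

```python
def sad(ref, block, x, y, bs):
    """Sum of absolute differences between a bs x bs block and the
    reference image at position (x, y); out-of-range candidates cost inf."""
    if y < 0 or x < 0 or y + bs > len(ref) or x + bs > len(ref[0]):
        return float("inf")
    return sum(abs(ref[y + j][x + i] - block[j][i])
               for j in range(bs) for i in range(bs))

def full_search(ref, cur, bx, by, bs, rng):
    """Full-search block matching (FIG. 4, a sketch): every candidate
    displacement within +/-rng pixels is tried in units of a macro block,
    and the displacement with the least SAD becomes the motion vector c."""
    block = [row[bx:bx + bs] for row in cur[by:by + bs]]
    cost, dx, dy = min((sad(ref, block, bx + dx, by + dy, bs), dx, dy)
                       for dy in range(-rng, rng + 1)
                       for dx in range(-rng, rng + 1))
    return (dx, dy), cost
```

Note the cost: (2·rng+1)² SAD evaluations per macro block, which is exactly the calculation burden the paragraph above says makes a wide search range unsuitable for real-time processing.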
[45] Referring to FIG. 5, in the diamond search algorithm according to an embodiment of the present invention, searching is performed using a large diamond d until the least SAD (sum of absolute differences) value falls at the center of the large diamond d, and the search is then refined using a small diamond e to find the optimal motion vector. Because far fewer candidate positions are compared than in a full search, the diamond search algorithm has a very short processing time.
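The two-stage search just described can be sketched as follows. FIG. 5 does not specify the exact diamond offsets, so the large and small patterns below are the commonly used nine-point and five-point diamonds — an assumption on my part:

```python
LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
        (1, 1), (1, -1), (-1, 1), (-1, -1)]      # large diamond d (assumed shape)
SDSP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # small diamond e (assumed shape)

def sad(ref, block, x, y, bs):
    """SAD of a bs x bs block vs the reference at (x, y); inf if out of range."""
    if y < 0 or x < 0 or y + bs > len(ref) or x + bs > len(ref[0]):
        return float("inf")
    return sum(abs(ref[y + j][x + i] - block[j][i])
               for j in range(bs) for i in range(bs))

def diamond_search(ref, cur, bx, by, bs):
    """Diamond search (FIG. 5, a sketch): step the large diamond until its
    minimum SAD lands on the center, then refine once with the small
    diamond to obtain the final motion vector."""
    block = [row[bx:bx + bs] for row in cur[by:by + bs]]
    cx, cy = bx, by
    while True:
        _, dx, dy = min((sad(ref, block, cx + dx, cy + dy, bs), dx, dy)
                        for dx, dy in LDSP)
        if (dx, dy) == (0, 0):   # minimum at the center: stop coarse stage
            break
        cx, cy = cx + dx, cy + dy
    _, dx, dy = min((sad(ref, block, cx + dx, cy + dy, bs), dx, dy)
                    for dx, dy in SDSP)
    return (cx + dx - bx, cy + dy - by)
```

Each coarse step evaluates at most nine positions and the refinement five, so the total number of SAD computations grows with the path length to the match rather than with the square of the search range.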
[46] Referring to FIGS. 1 and 6, an apparatus 200 for decoding a multi-view image according to an embodiment of the present invention includes a demultiplexer 210, a center image decoding unit 220, a left image decoding unit 230, a right image decoding unit 240, and an image memory 250.
[47] The demultiplexer 210 demultiplexes the multiplexed data stream input from the multiplexer 140 of the apparatus for encoding a multi-view image and provides data streams of the center, left, and right images to the center image decoding unit 220, the left image decoding unit 230, and the right image decoding unit 240, respectively.
[48] The center image decoding unit 220 includes an image decoding means 221, an image reconstructing means 222, a motion decoding means 223, and a motion compensating means 224. The center image decoding unit 220 generates a reconstructed center image by decoding the data stream of the input center image and compensating for the motion.
[49] The image decoding means 221 decodes the texture information included in the received data stream of the center image and provides the decoded texture information to the image reconstructing means 222.
[50] The image reconstructing means 222 reconstructs the image decoded by the image decoding means 221 and the image compensated by the motion compensating means 224.
[51] The motion decoding means 223 decodes the motion information included in the data stream of the received center image and provides the decoded motion information to the motion compensating means 224.
[52] The motion compensating means 224 compensates a previous image that is input from the image memory 250 as a reference image with the decoded motion information input from the motion decoding means 223 and provides the compensated previous image to the image reconstructing means 222.
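The compensate-then-reconstruct step performed by means 224 and 222 can be sketched as shifting the reference image by the decoded motion vector and adding the decoded residual back. A single whole-frame wrap-around shift stands in for per-macro-block compensation here; that simplification, and the names, are mine:

```python
def reconstruct(prev, mv, residual):
    """Decoder-side sketch: the motion compensating means shifts the
    previous reference image by the decoded motion vector (dx, dy), and
    the image reconstructing means adds the decoded residual back to
    recover the current image."""
    dx, dy = mv
    h, w = len(prev), len(prev[0])
    shifted = [[prev[(y - dy) % h][(x - dx) % w] for x in range(w)]
               for y in range(h)]
    return [[shifted[y][x] + residual[y][x] for x in range(w)]
            for y in range(h)]
```

This is the inverse of the encoder's subtracter: whatever prediction error the encoder transmitted as the difference image is added back onto the compensated reference.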
[53] The left image decoding unit 230 or the right image decoding unit 240 includes an image decoding means 231, a disparity and motion decoding means 232, a disparity and motion compensating means 233, and an image reconstructing means 234. The left or right image decoding unit 230 or 240 generates a reconstructed left or right image by decoding the input data stream of the left or right image and compensating the disparity and the motion.
[54] The image decoding means 231 decodes the texture information of the center image, which is used as a reference image, included in the input data stream of the left or right image and provides the decoded texture information to the disparity and motion compensating means 233.
[55] The disparity and motion decoding means 232 decodes the disparity information and the motion information included in the input data stream of the left or right image and provides the decoded disparity information and the motion information to the disparity and motion compensating means 233.
[56] The disparity and motion compensating means 233 compensates the reference image input from the image decoding means 231 with the disparity information input from the disparity and motion decoding means 232, compensates the previous image input from the image memory 250 with the motion information input from the disparity and motion decoding means 232, and provides the compensated images to the image reconstructing means 234.
[57] The image reconstructing means 234 reconstructs the image compensated by the disparity and motion compensating means 233.
[58] The image memory 250 stores the previous image that becomes a reference for compensating the motion of the center image.
[59] While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.
Industrial Applicability
[60] An apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention removes spatial and temporal redundancy by predicting the motion and the disparity among the center, left, and right images, thereby decreasing the amount of data transmitted in a three-dimensional moving picture based on MPEG4.
[61] In addition, an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention can decrease the processing time required for predicting the motion by using a diamond search algorithm, making it possible to compress and transmit a three-dimensional moving picture based on MPEG4 in real time.
[62] Accordingly, an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention is capable of implementing a high image-quality three-dimensional moving picture based on MPEG4, which supports low transmission speeds, in real time.

Claims

[1] An apparatus for encoding a multi-view image comprising: a center image encoding unit generating texture information on a center image by encoding a first center image and motion information on the center image by predicting, compensating, and encoding a motion for center images from a second center image to a predetermined center image with respect to a previous center image which is used as a reference image; a left image encoding unit generating disparity information on a left image by predicting, compensating, and encoding a disparity of a first left image and the first center image, which is used as a reference image, and generating motion information on the left image by predicting, compensating, and encoding a motion for left images from a second left image to a predetermined left image with respect to a previous left image which is used as a reference image; a right image encoding unit generating disparity information on a right image by predicting, compensating, and encoding a disparity of a first right image and the first center image, which is used as a reference image, and generating motion information on the right image by predicting, compensating, and encoding a motion for right images from a second right image to a predetermined right image with respect to a previous right image which is used as a reference image; and a multiplexer generating a data stream by multiplexing the texture information, the motion information, and the disparity information respectively input from the center image encoding unit, the left image encoding unit, and the right image encoding unit.
[2] The apparatus of claim 1, wherein the center image encoding unit comprises: an image encoding means generating the texture information on the center image by encoding the first center image, reconstructing the first center image, and providing the reconstructed first center image as the reference image; a motion predicting means generating the motion information on the center image by predicting a motion from the second center image to the predetermined center image with respect to the previous center image which is used as a reference image; a motion compensating means generating a motion compensation value based on the motion information predicted by the motion predicting means; and a subtracter generating a difference image of the center image by subtracting the motion compensation value compensated by the motion compensating means from the center image.
[3] The apparatus according to claim 1, wherein the left image encoding unit or the right image encoding unit comprises: a disparity and motion predicting means for generating the disparity information on the first left image or the first right image by predicting a disparity of the first left image or the first right image and the first center image, which is used as the reference image, and generating the motion information on the left or right image by predicting a motion from the second left or right image to the predetermined left or right image with respect to the previous left or right image which is used as the reference image; a disparity and motion compensating means for generating a disparity compensation value and a motion compensation value based on the disparity information and the motion information predicted by the disparity and motion predicting means; and a subtracter generating a difference image of the left or right image by subtracting the disparity compensation value or the motion compensation value compensated by the disparity and motion compensating means from the left or right image.
[4] The apparatus according to claim 3, wherein the left image encoding unit and the right image encoding unit further comprise an image encoding means for generating the texture information by reconstructing and encoding the first center image used as the reference image, respectively.
[5] The apparatus according to claim 1, wherein the center image encoding unit, the left image encoding unit, and the right image encoding unit predict the motion using a diamond search algorithm.
[6] An apparatus for decoding a multi-view image comprising: a demultiplexer demultiplexing a multiplexed data stream and providing data streams of center, left, and right images; a center image decoding unit generating a reconstructed first center image by decoding texture information included in the data stream of the center image input from the demultiplexer and center images from a reconstructed second center image to a reconstructed predetermined center image by decoding and compensating motion information included in the center image data stream with respect to a previous center image which is used as a reference image; a left image decoding unit generating a first center image by decoding texture information included in the data stream of the left image, a first left image by decoding and compensating disparity information included in the data stream of the left image with respect to the first center image which is used as a reference image, and left images from a second left image to a predetermined left image by decoding and compensating motion information included in the data stream of the left image with respect to a previous left image which is used as a reference image; and a right image decoding unit generating a first center image by decoding texture information included in the data stream of the right image, a first right image by decoding and compensating disparity information included in the data stream of the right image with respect to the first center image which is used as a reference image, and right images from a second right image to a predetermined right image by decoding and compensating motion information included in the data stream of the right image with respect to a previous right image which is used as a reference image; and an image memory storing the reference images.
[7] The apparatus according to claim 6, wherein the center image decoding unit comprises: an image decoding means for decoding the texture information included in the data stream of the center image; a motion decoding means for decoding the motion information included in the data stream of the center image; a motion compensating means for compensating the previous center image stored in the image memory with the motion information decoded by the motion decoding means; and an image reconstructing means for generating the first center image from the decoded texture information and center images from the second center image to the predetermined center image from the decoded and compensated motion information.
[8] The apparatus according to claim 6, wherein the left image decoding unit and the right image decoding unit respectively comprise: an image decoding means for decoding the texture information included in the data stream of the left image or the right image; a disparity and motion decoding means for decoding the disparity information and the motion information included in the data stream of the left image or the right image; a disparity and motion compensating means for compensating the texture information decoded by the image decoding means with the disparity information decoded by the disparity and motion decoding means and the previous left or right image stored in the image memory with the motion information decoded by the disparity and motion decoding means; and an image reconstructing means for generating the first left or right image from the decoded and compensated disparity information and images from the second left or right image to the predetermined left or right image from the decoded and compensated motion information.
PCT/KR2005/002226 2005-07-11 2005-07-11 Apparatus for encoding and decoding multi-view image WO2007007923A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2005-0062060 2005-07-11
KR1020050062060A KR100762783B1 (en) 2005-07-11 2005-07-11 Apparatus for encoding and decoding multi-view image

Publications (1)

Publication Number Publication Date
WO2007007923A1 true WO2007007923A1 (en) 2007-01-18

Family

ID=37637279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/002226 WO2007007923A1 (en) 2005-07-11 2005-07-11 Apparatus for encoding and decoding multi-view image

Country Status (2)

Country Link
KR (1) KR100762783B1 (en)
WO (1) WO2007007923A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010126221A2 (en) * 2009-04-27 2010-11-04 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3d video data processing method thereof
US8248461B2 (en) 2008-10-10 2012-08-21 Lg Electronics Inc. Receiving system and method of processing data
KR101479185B1 (en) * 2007-05-16 2015-01-06 톰슨 라이센싱 Methods and apparatus for the use of slice groups in encoding multi-view video coding (mvc) information

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
KR101342783B1 (en) 2009-12-03 2013-12-19 한국전자통신연구원 Method for generating virtual view image and apparatus thereof
US9674534B2 (en) 2012-01-19 2017-06-06 Samsung Electronics Co., Ltd. Method and apparatus for encoding multi-view video prediction capable of view switching, and method and apparatus for decoding multi-view video prediction capable of view switching

Citations (4)

Publication number Priority date Publication date Assignee Title
US6043838A (en) * 1997-11-07 2000-03-28 General Instrument Corporation View offset estimation for stereoscopic video coding
US6057884A (en) * 1997-06-05 2000-05-02 General Instrument Corporation Temporal and spatial scaleable coding for video object planes
US6377309B1 (en) * 1999-01-13 2002-04-23 Canon Kabushiki Kaisha Image processing apparatus and method for reproducing at least an image from a digital data sequence
KR20040065014A (en) * 2003-01-13 2004-07-21 전자부품연구원 Apparatus and method for compressing/decompressing multi-viewpoint image

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JPH09261653A (en) * 1996-03-18 1997-10-03 Sharp Corp Multi-view-point picture encoder

Non-Patent Citations (1)

Title
CHEUNG C.-H. ET AL.: "A NOVEL CROSS-DIAMOND SEARCH ALGORITHM FOR BLOCK MOTION ESTIMATION", IEEE TRANSACTION ON CIRCUITS AND SYSTEM FOR VIDEO TECHNOLOGY, vol. 12, no. 12, December 2002 (2002-12-01), XP011071898 *

Cited By (11)

Publication number Priority date Publication date Assignee Title
KR101479185B1 (en) * 2007-05-16 2015-01-06 톰슨 라이센싱 Methods and apparatus for the use of slice groups in encoding multi-view video coding (mvc) information
KR101482642B1 (en) * 2007-05-16 2015-01-15 톰슨 라이센싱 Methods and apparatus for the use of slice groups in decoding multi-view video coding (mvc) information
US9288502B2 (en) 2007-05-16 2016-03-15 Thomson Licensing Methods and apparatus for the use of slice groups in decoding multi-view video coding (MVC) information
US9313515B2 (en) 2007-05-16 2016-04-12 Thomson Licensing Methods and apparatus for the use of slice groups in encoding multi-view video coding (MVC) information
US9883206B2 (en) 2007-05-16 2018-01-30 Thomson Licensing Methods and apparatus for the use of slice groups in encoding multi-view video coding (MVC) information
US10158886B2 (en) 2007-05-16 2018-12-18 Interdigital Madison Patent Holdings Methods and apparatus for the use of slice groups in encoding multi-view video coding (MVC) information
US8248461B2 (en) 2008-10-10 2012-08-21 Lg Electronics Inc. Receiving system and method of processing data
US9712803B2 (en) 2008-10-10 2017-07-18 Lg Electronics Inc. Receiving system and method of processing data
WO2010126221A2 (en) * 2009-04-27 2010-11-04 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3d video data processing method thereof
WO2010126221A3 (en) * 2009-04-27 2010-12-23 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3d video data processing method thereof
US8730303B2 (en) 2009-04-27 2014-05-20 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3D video data processing method thereof

Also Published As

Publication number Publication date
KR100762783B1 (en) 2007-10-05
KR20070007442A (en) 2007-01-16

Similar Documents

Publication Publication Date Title
US6567427B1 (en) Image signal multiplexing apparatus and methods, image signal demultiplexing apparatus and methods, and transmission media
US6999513B2 (en) Apparatus for encoding a multi-view moving picture
CA2238900C (en) Temporal and spatial scaleable coding for video object planes
EP2538674A1 (en) Apparatus for universal coding for multi-view video
KR100417932B1 (en) Image encoder, image encoding method, image decoder, image decoding method, and distribution media
JP5072996B2 (en) System and method for 3D video coding
CN100512431C (en) Method and apparatus for encoding and decoding stereoscopic video
RU2511595C2 (en) Image signal decoding apparatus, image signal decoding method, image signal encoding apparatus, image encoding method and programme
EP2302939B1 (en) Method and system for frame buffer compression and memory reduction for 3D video
US20090190662A1 (en) Method and apparatus for encoding and decoding multiview video
EP0966161A2 (en) Apparatus and method for video encoding and decoding
EP1707010A1 (en) Method, medium, and apparatus for 3-dimensional encoding and/or decoding of video
US20070211796A1 (en) Method and apparatus for encoding and decoding multi-view video to provide uniform picture quality
JP2010507961A (en) Multi-view video scalable coding and decoding method, and coding and decoding apparatus
MX2008002391A (en) Method and apparatus for encoding multiview video.
Omori et al. A 120 fps high frame rate real-time HEVC video encoder with parallel configuration scalable to 4K
JP2007266749A (en) Encoding method
KR101898822B1 (en) Virtual reality video streaming with viewport information signaling
WO2007007923A1 (en) Apparatus for encoding and decoding multi-view image
Haskell et al. Mpeg video compression basics
KR101386651B1 (en) Multi-View video encoding and decoding method and apparatus thereof
WO2013146636A1 (en) Image encoding device, image decoding device, image encoding method, image decoding method and program
Santamaria et al. Coding of volumetric content with MIV using VVC subpictures
KR100566100B1 (en) Apparatus for adaptive multiplexing/demultiplexing for 3D multiview video processing and its method
KR20040065170A (en) Video information decoding apparatus and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1), EPO FORM 1205A SENT ON 09/04/08 .

122 Ep: pct application non-entry in european phase

Ref document number: 05780698

Country of ref document: EP

Kind code of ref document: A1