WO2007007923A1 - Apparatus for encoding and decoding multi-view image - Google Patents


Info

Publication number
WO2007007923A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
motion
center
decoding
disparity
Prior art date
Application number
PCT/KR2005/002226
Other languages
French (fr)
Inventor
Dong Sik Yi
Kyung Hoon Bae
Won Kyoung Lee
Original Assignee
3R Inc.
Application filed by 3R Inc. filed Critical 3R Inc.
Publication of WO2007007923A1 publication Critical patent/WO2007007923A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/57 Motion estimation characterised by a search window with variable size or shape
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • FIG. 1 is a block diagram illustrating an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention
  • FIG. 2 is a block diagram illustrating an apparatus for encoding a multi-view image according to an embodiment of the present invention
  • FIG. 3 is a diagram illustrating relation between motion prediction and disparity prediction of a multi-view image in an apparatus for encoding a multi-view image according to an embodiment of the present invention
  • FIG. 4 is a diagram illustrating a method of predicting a motion in a conventional full search block matching algorithm
  • FIG. 5 is a diagram illustrating a method of predicting a motion in a diamond search algorithm according to an embodiment of the present invention.
  • FIG. 6 is a block diagram illustrating an apparatus for decoding a multi-view image according to an embodiment of the present invention. Best Mode for Carrying Out the Invention
  • an apparatus for encoding and decoding a multi-view image includes an apparatus 100 for encoding a multi-view image, which receives and encodes center, left, and right images from three cameras 300 installed horizontally at the same height and at equal intervals, and an apparatus 200 for decoding a multi-view image, which decodes the encoded images by the reverse process.
  • the apparatus 100 for encoding a multi-view image includes a center image encoding unit 110, a left image encoding unit 120, a right image encoding unit 130, a multiplexer 140, and a buffer 150.
  • the center image encoding unit 110 includes an image encoding means 111, a motion predicting means 112, a motion compensating means 113, and a subtracter 114.
  • the center image encoding unit 110 generates texture information and motion information on the center image by encoding the input center image while predicting and compensating for its motion.
  • the image encoding means 111 provides the texture information on the center image generated by encoding the center image to the multiplexer 140 and a reconstructed center image as a reference image to the motion predicting means 112 and the subtracter 114.
  • the motion predicting means 112 generates the motion information by predicting the motion of the center image in reference with the reconstructed center image and provides the generated motion information to the motion compensating means 113 and the multiplexer 140.
  • the motion compensating means 113 generates a motion compensation value based on the motion information predicted by the motion predicting means 112 and provides the generated motion compensation value to the subtracter 114.
  • the subtracter 114 generates a difference image of the center image by subtracting the motion compensation value input from the motion compensating means 113 from the reconstructed center image input from the image encoding means 111.
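As an illustrative sketch (not part of the disclosure), the subtracter's operation is a pixel-wise subtraction of the motion compensation value from the image; the function name and the toy 2x2 frames below are assumptions for illustration.

```python
# Sketch of the encoder-side subtracter: the difference image is the
# frame minus the motion-compensated prediction, pixel by pixel.
def difference_image(current, prediction):
    """Residual that is subsequently coded: current - prediction."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]

current = [[10, 12], [14, 16]]     # toy 2x2 luminance block
prediction = [[9, 12], [15, 16]]   # motion-compensated prediction
print(difference_image(current, prediction))  # [[1, 0], [-1, 0]]
```

Most residual entries are near zero when the prediction is good, which is what makes the difference image cheaper to code than the frame itself.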
  • the left or right image encoding unit 120 or 130 includes a disparity and motion predicting means 121, a disparity and motion compensating means 122, a subtracter 123, and an image encoding means 124.
  • the left or right image encoding unit 120 or 130 generates texture information on the center image and disparity and motion information on the left or right image by predicting and compensating for the motion of the input left or right image, predicting and compensating for the disparity with respect to the input center image used as a reference image, and encoding the left or right image.
  • the disparity and motion predicting means 121 generates disparity information by predicting the disparity of the left or right image in reference with the input center image used as a reference image and the motion information by predicting a motion of the next left or right image in reference with the previous left or right image used as a reference image, and provides the generated disparity information and the motion information to the disparity and motion compensating means 122 and the multiplexer 140.
  • the disparity and motion compensating means 122 generates a disparity compensation value and a motion compensation value based on the disparity information and the motion information predicted by the disparity and motion predicting means 121 and provides the generated disparity compensation value and motion compensation value to the subtracter 123.
  • the subtracter 123 generates a difference image of the left or right image by subtracting the disparity compensation value and the motion compensation value input from the disparity and motion compensating means 122 from the left or right image input from the camera 300.
  • the image encoding means 124 provides the texture information generated by reconstructing and encoding the center image used as the reference image to the multiplexer 140.
  • the multiplexer 140 generates a data stream by multiplexing the texture information, the motion information, and the disparity information respectively input from the center image encoding unit 110, the left image encoding unit 120, and the right image encoding unit 130.
  • the buffer 150 stores the data stream multiplexed by the multiplexer 140.
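The multiplexer/demultiplexer pairing can be sketched as view-tagged interleaving; the tags and list-based "stream" below are assumptions for illustration, not the patent's actual bitstream syntax.

```python
# Toy multiplexer: tag each information unit with its view ("C", "L", "R")
# so the decoder-side demultiplexer can route the streams back to the
# correct decoding unit.
def multiplex(center_units, left_units, right_units):
    stream = []
    for view, units in (("C", center_units), ("L", left_units), ("R", right_units)):
        stream.extend((view, u) for u in units)
    return stream

def demultiplex(stream):
    routed = {"C": [], "L": [], "R": []}
    for view, unit in stream:
        routed[view].append(unit)
    return routed

stream = multiplex(["texture0", "motion1"], ["disparity0"], ["disparity0"])
print(demultiplex(stream)["L"])  # ['disparity0']
```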
  • MPEG4 handles both natural and synthetic images: the natural image is acquired from a camera, while the synthetic image is a mixture of the natural image and an artificial image generated by computer graphics or the like.
  • MPEG4 has considerably superior compression efficiency and image quality compared to conventional methods.
  • a source consisting of multimedia data, including a binary format for scene, video, audio, animated text, and the like, is divided into individual media by a demultiplexer, and the divided data are partly combined for special applications and provided to a user.
  • MPEG4 maintains three VOP forms: I-VOP (Intra-coded Video Object Plane), P-VOP (Predictive-coded VOP), and B-VOP (Bidirectionally predictive-coded VOP).
  • I-VOP is encoded by performing the DCT (Discrete Cosine Transform) operation directly on the VOP, without using motion compensation.
  • P-VOP is encoded by performing the DCT operation on the difference component remaining after motion compensation is performed with reference to an I-VOP or another P-VOP.
  • B-VOP uses motion compensation like P-VOP, but is encoded by performing the DCT operation on the difference component remaining after motion compensation is performed from two frames on the time axis.
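The contrast between I-VOP and P-VOP coding can be illustrated with a one-dimensional type-II DCT. MPEG4 actually applies an 8x8 two-dimensional block DCT; the 4-sample, unnormalized version below is a deliberate simplification for illustration.

```python
import math

def dct_ii(x):
    """Unnormalized type-II DCT of a 1-D signal (simplified stand-in for
    the 8x8 block DCT used by MPEG4)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

pixels = [16, 16, 16, 16]   # flat block: all energy lands in coefficient 0
coeffs_i = dct_ii(pixels)   # I-VOP: DCT of the pixels themselves
residual = [1, 0, -1, 0]    # P-VOP: DCT of (pixels - motion-compensated prediction)
coeffs_p = dct_ii(residual)
```

The small residual produces small coefficients, which is why coding the P-VOP difference component costs far fewer bits than intra-coding the pixels.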
  • an apparatus for encoding a multi-view image according to the embodiment first generates texture information by encoding a first center image, which is used as a reference image. The apparatus then generates disparity information by predicting and encoding the disparity between the first left and right images and the first center image. From the second center, left, and right images onward, motion information is generated by predicting and encoding motion up to predetermined center, left, and right images, for example thirty frames, with reference to the previous images.
  • the spatial and temporal redundancy of adjacent images can be removed by encoding the first center image as texture information of an I-VOP, which has a large data size, and the other images as disparity or motion information of P-VOPs, which have a small data size, so the amount of data can be decreased.
  • accordingly, the compression rate of the images can be increased, and it is possible to implement a high image-quality three-dimensional moving picture based on MPEG4, which supports a low transmission speed.
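The reference structure described above can be written out as a schedule. The dictionary layout, view names, and default frame count below are illustrative assumptions, not the patent's data structure.

```python
# Which reference each (view, frame) pair uses under the scheme above:
# frame 1 of the center view is intra-coded (texture, I-VOP); frame 1 of
# the left/right views is disparity-predicted from the first center image;
# every later frame is motion-predicted from the previous frame of its view.
def prediction_schedule(n_frames=30):
    schedule = {("center", 1): ("texture", None),
                ("left", 1): ("disparity", ("center", 1)),
                ("right", 1): ("disparity", ("center", 1))}
    for t in range(2, n_frames + 1):
        for view in ("center", "left", "right"):
            schedule[(view, t)] = ("motion", (view, t - 1))
    return schedule

print(prediction_schedule(2)[("left", 1)])  # ('disparity', ('center', 1))
```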
  • a conventional MPEG4 encoding apparatus uses a full search block matching algorithm for motion prediction, in which the pixels of two images a and b within a predetermined region to be predicted for motion are compared in units of macro blocks, and the displacement to the macro block with the least error is assigned as the motion vector c.
  • a range for searching for the prediction of the motion can be determined by a parameter file or the like.
  • the full search block matching algorithm shows good performance, but when the search range is reduced to shorten the processing time, the accuracy of the prediction decreases.
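A minimal full search over a +/-radius window, using SAD as the block-matching error, can be sketched as follows; the nested-list frame layout and function names are assumptions for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def full_search(ref, block, top, left, radius):
    """Exhaustively test every displacement in [-radius, radius]^2; return
    the vector (dy, dx) of the candidate in `ref` with the least SAD."""
    h, w = len(block), len(block[0])
    best_vec, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(ref) or x + w > len(ref[0]):
                continue  # candidate would fall outside the reference frame
            candidate = [row[x:x + w] for row in ref[y:y + h]]
            cost = sad(block, candidate)
            if cost < best_cost:
                best_vec, best_cost = (dy, dx), cost
    return best_vec

# The 2x2 block appears shifted by (1, 1) in the reference frame:
ref = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]
print(full_search(ref, [[9, 8], [7, 6]], 0, 0, 2))  # (1, 1)
```

The cost of this exhaustive scan grows quadratically with the radius, which is the processing-time problem the diamond search below it addresses.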
  • in the diamond search algorithm, searching is performed using a large diamond d until the least value of SAD (sum of absolute differences) is assigned to the center of the large diamond d, and then an optimized motion vector is searched for using a small diamond e.
  • as a result, the diamond search algorithm requires a very short processing time.
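The two-stage search can be sketched as follows. The cost function is passed in as a callable (in practice it would be SAD against the reference frame), and the specific pattern coordinates follow the commonly used large/small diamond layout, which is an assumption since the patent does not list them.

```python
# Large diamond search pattern (LDSP) and small diamond search pattern
# (SDSP), as (dy, dx) offsets around the current search center.
LDSP = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2),
        (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def diamond_search(cost, start=(0, 0), max_steps=64):
    """Walk the large diamond until the least cost sits at its center,
    then refine once with the small diamond."""
    center = start
    for _ in range(max_steps):
        candidates = [(center[0] + dy, center[1] + dx) for dy, dx in LDSP]
        best = min(candidates, key=cost)
        if best == center:  # minimum at the diamond's center: stop moving
            break
        center = best
    candidates = [(center[0] + dy, center[1] + dx) for dy, dx in SDSP]
    return min(candidates, key=cost)

# Toy cost with a single minimum at (3, 2): the search converges there.
print(diamond_search(lambda v: abs(v[0] - 3) + abs(v[1] - 2)))  # (3, 2)
```

Each step evaluates at most nine candidates instead of the full window, which is where the processing-time saving over the full search comes from.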
  • an apparatus 200 for decoding a multi-view image includes a demultiplexer 210, a center image decoding unit 220, a left image decoding unit 230, a right image decoding unit 240, and an image memory 250.
  • the demultiplexer 210 demultiplexes the multiplexed data stream input from the multiplexer 140 of the apparatus for encoding a multi-view image and provides data streams of the center, left, and right images to the center image decoding unit 220, the left image decoding unit 230, and the right image decoding unit 240, respectively.
  • the center image decoding unit 220 includes an image decoding means 221, an image reconstructing means 222, a motion decoding means 223, and a motion compensating means 224.
  • the center image decoding unit 220 generates a reconstructed center image by decoding the data stream of the input center image and compensating for the motion.
  • the image decoding means 221 decodes the texture information included in the received data stream of the center image as input and provides the decoded texture information to the image reconstructing means 222.
  • the image reconstructing means 222 reconstructs the image decoded by the image decoding means 221 and the image compensated by the motion compensating means 224.
  • the motion decoding means 223 decodes the motion information included in the data stream of the received center image and provides the decoded motion information to the motion compensating means 224.
  • the motion compensating means 224 compensates a previous image that is input from the image memory 250 as a reference image with the decoded motion information input from the motion decoding means 223 and provides the compensated previous image to the image reconstructing means 222.
  • the left image decoding unit 230 or the right image decoding unit 240 includes an image decoding means 231, a disparity and motion decoding means 232, a disparity and motion compensating means 233, and an image reconstructing means 234.
  • the left or right image decoding unit 230 or 240 generates a reconstructed left or right image by decoding the input data stream of the left or right image and compensating the disparity and the motion.
  • the image decoding means 231 decodes the texture information of the center image, which is used as a reference image, included in the input data stream of the left or right image and provides the decoded texture information to the disparity and motion compensating means 233.
  • the disparity and motion decoding means 232 decodes the disparity information and the motion information included in the input data stream of the left or right image and provides the decoded disparity information and the motion information to the disparity and motion compensating means 233.
  • the disparity and motion compensating means 233 compensates the reference image input from the image decoding means 231 with the disparity information input from the disparity and motion decoding means 232, compensates the previous image input from the image memory 250 with the motion information input from the disparity and motion decoding means 232, and provides the compensated images to the image reconstructing means 234.
  • the image reconstructing means 234 reconstructs the image compensated by the disparity and motion compensating means 233.
  • the image memory 250 stores the previous image that becomes a reference for compensating the motion of the center image.
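On the decoder side, reconstruction inverts the encoder's subtracter: the motion- or disparity-compensated reference is added back to the decoded difference image. A toy sketch, with names and 2x2 frames that are illustrative assumptions:

```python
# Decoder-side reconstruction: compensated reference plus decoded residual
# recovers the frame the encoder originally subtracted against.
def reconstruct(prediction, residual):
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]

prediction = [[9, 12], [15, 16]]   # compensated previous frame (image memory)
residual = [[1, 0], [-1, 0]]       # decoded difference image
print(reconstruct(prediction, residual))  # [[10, 12], [14, 16]]
```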
  • An apparatus for encoding and decoding a multi-view image removes spatial and temporal redundancy by predicting motion and disparity among the center, left, and right images, thereby decreasing the amount of data transmitted for a three-dimensional moving picture based on MPEG4.
  • an apparatus for encoding and decoding a multi-view image can decrease the processing time required for predicting motion by using a diamond search algorithm, thereby enabling real-time compression and transmission of a three-dimensional moving picture based on MPEG4.
  • accordingly, an apparatus for encoding and decoding a multi-view image is capable of implementing, in real time, a high image-quality three-dimensional moving picture based on MPEG4, which supports a low transmission speed.

Abstract

The present invention provides a method of encoding and decoding a three-view image based on motion prediction, encoding and decoding with motion compensation, disparity prediction, encoding and decoding with disparity compensation, and a data structure for implementing a high image-quality three-dimensional moving picture based on MPEG4, which supports a low transmission speed.

Description

APPARATUS FOR ENCODING AND DECODING MULTI-VIEW IMAGE
Technical Field
[1] The present invention relates to an apparatus for encoding and decoding a multi-view image, and more particularly, to an apparatus for encoding and decoding a three-view image capable of implementing a high quality three-dimensional moving picture transmittable at a low speed with a high compression rate, based on MPEG4 (Motion Picture Experts Group 4). Background Art
[2] MPEG (Motion Picture Experts Group) is a method of compressing a moving picture and representing a code for transmission of the information. Research on MPEG has been performed from MPEG1 and MPEG2 through MPEG4 to MPEG7.
[3] MPEG2, standardized as ISO (International Organization for Standardization) 13818, is video compression technology requiring processing at a high transmission speed of around 4 to 100 Mbps, for application in fields requiring high-quality images and sound, such as digital TV, interactive TV, and DVD (Digital Video Disc).
[4] In addition, MPEG4 is video compression technology for implementing a moving picture at a low transmission rate below 64 Kbps for multimedia communication over the wired Internet and wireless networks such as cellular networks.
[5] Recently, apparatuses for encoding and decoding a multi-view image based on MPEG2 or MPEG4 have been researched. For example, an apparatus for encoding and decoding a three-view image based on MPEG2 is disclosed in Korean Unexamined Patent Application Publication No. 2004-65014. In addition, a binocular, that is, two-view, system for three-dimensional processing of a moving picture based on MPEG4 is disclosed in Korean Examined Patent Registration Publication No. 397511.
[6] However, since the apparatus for encoding and decoding a three-view image disclosed in Korean Unexamined Patent Application Publication No. 2004-65014 is based on MPEG2, which requires a high transmission speed of 4 to 100 Mbps, it is difficult to apply the apparatus to a high image-quality multi-view image based on MPEG4, which requires a low transmission speed below 64 Kbps.
[7] In addition, although the binocular system and method for processing a three-dimensional moving picture is based on MPEG4, its view is limited to two views, so when an observer moves out of the limited viewing region, or when the focus does not match, the observer cannot perceive a three-dimensional effect, suffers eye fatigue, and feels dizzy, thereby limiting practical application of the system and the method. Disclosure of Invention
Technical Problem
[8] In order to solve the aforementioned problems, an object of the present invention is to provide an apparatus for encoding and decoding a multi-view image capable of implementing a high quality three-dimensional moving picture with a high compression rate based on MPEG4 in which the transmission speed is low. Technical Solution
[9] According to an aspect of the present invention, there is provided an apparatus for encoding a multi-view image comprising: a center image encoding unit generating texture information on a center image by encoding a first center image and motion information on the center image by predicting, compensating, and encoding a motion for center images from a second center image to a predetermined center image with respect to a previous center image which is used as a reference image; a left image encoding unit generating disparity information on a left image by predicting, compensating, and encoding a disparity of a first left image and the first center image, which is used as a reference image, and generating motion information on the left image by predicting, compensating, and encoding a motion for left images from a second left image to a predetermined left image with respect to a previous left image which is used as a reference image; a right image encoding unit generating disparity information on a right image by predicting, compensating, and encoding a disparity of a first right image and the first center image, which is used as a reference image, and generating motion information on the right image by predicting, compensating, and encoding a motion for right images from a second right image to a predetermined right image with respect to a previous right image which is used as a reference image; and a multiplexer generating a data stream by multiplexing the texture information, the motion information, and the disparity information respectively input from the center image encoding unit, the left image encoding unit, and the right image encoding unit.
[10] In the aspect above, the center image encoding unit of the apparatus may comprise: an image encoding means generating the texture information on the center image by encoding the first center image, reconstructing the first center image, and providing the reconstructed first center image as the reference image; a motion predicting means generating the motion information on the center image by predicting a motion from the second center image to the predetermined center image with respect to the previous center image which is used as a reference image; a motion compensating means generating a motion compensation value based on the motion information predicted by the motion predicting means; and a subtracter generating a difference image of the center image by subtracting the motion compensation value compensated by the motion compensating means from the center image.
[11] In addition, the left image encoding unit or the right image encoding unit may comprise: a disparity and motion predicting means generating the disparity information on the first left image or the first right image by predicting a disparity of the first left image or the first right image and the first center image, which is used as the reference image, and generating the motion information on the left or right image by predicting a motion from the second left or right image to the predetermined left or right image with respect to the previous left or right image which is used as the reference image; a disparity and motion compensating means generating a disparity compensation value and a motion compensation value based on the disparity information and the motion information predicted by the disparity and motion predicting means; and a subtracter generating a difference image of the left or right image by subtracting the disparity compensation value or the motion compensation value compensated by the disparity and motion compensating means from the left or right image.
[12] In addition, the left image encoding unit and the right image encoding unit may further comprise an image encoding means generating the texture information by reconstructing and encoding the first center image used as the reference image, respectively.
[13] In addition, the center image encoding unit, the left image encoding unit, and the right image encoding unit may predict the motion using a diamond search algorithm.
[14] According to another aspect of the present invention, there is provided an apparatus for decoding a multi-view image comprising: a demultiplexer demultiplexing a multiplexed data stream and providing data streams of center, left, and right images; a center image decoding unit generating a reconstructed first center image by decoding texture information included in the data stream of the center image input from the demultiplexer and center images from a reconstructed second center image to a reconstructed predetermined center image by decoding and compensating motion information included in the center image data stream with respect to a previous center image which is used as a reference image; a left image decoding unit generating a first center image by decoding texture information included in the data stream of the left image, a first left image by decoding and compensating disparity information included in the data stream of the left image with respect to the first center image which is used as a reference image, and left images from a second left image to a predetermined left image by decoding and compensating motion information included in the data stream of the left image with respect to a previous left image which is used as a reference image; and a right image decoding unit generating a first center image by decoding texture information included in the data stream of the right image, a first right image by decoding and compensating disparity information included in the data stream of the right image with respect to the first center image which is used as a reference image, and right images from a second right image to a predetermined right image by decoding and compensating motion information included in the data stream of the right image with respect to a previous right image which is used as a reference image; and an image memory storing the reference images.
[15] In the aspect above, the center image decoding unit may comprise: an image decoding means decoding the texture information included in the data stream of the center image; a motion decoding means decoding the motion information included in the data stream of the center image; a motion compensating means compensating the previous center image stored in the image memory with the motion information decoded by the motion decoding means; and an image reconstructing means generating the first center image from the decoded texture information and center images from the second center image to the predetermined center image from the decoded and compensated motion information.
[16] In addition, the left image decoding unit and the right image decoding unit respectively may comprise: an image decoding means decoding the texture information included in the data stream of the left image or the right image; a disparity and motion decoding means decoding the disparity information and the motion information included in the data stream of the left image or the right image; a disparity and motion compensating means compensating the texture information decoded by the image decoding means with the disparity information decoded by the disparity and motion decoding means and the previous left or right image stored in the image memory with the motion information decoded by the disparity and motion decoding means; and an image reconstructing means generating the first left or right image from the decoded and compensated disparity information and images from the second left or right image to the predetermined left or right image from the decoded and compensated motion information.
Brief Description of the Drawings
[17] The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
[18] FIG. 1 is a block diagram illustrating an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention;
[19] FIG. 2 is a block diagram illustrating an apparatus for encoding a multi-view image according to an embodiment of the present invention;
[20] FIG. 3 is a diagram illustrating the relation between motion prediction and disparity prediction of a multi-view image in an apparatus for encoding a multi-view image according to an embodiment of the present invention;
[21] FIG. 4 is a diagram illustrating a method of predicting a motion in a conventional full search block matching algorithm;
[22] FIG. 5 is a diagram illustrating a method of predicting a motion in a diamond search algorithm according to an embodiment of the present invention; and
[23] FIG. 6 is a block diagram illustrating an apparatus for decoding a multi-view image according to an embodiment of the present invention.
Best Mode for Carrying Out the Invention
[24] Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.
[25] As illustrated in FIG. 1, an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention includes an apparatus 100 for encoding a multi-view image, which receives and encodes center, left, and right images from three cameras 300 installed horizontally at equal intervals and at the same height, and an apparatus 200 for decoding a multi-view image, which decodes the encoded image by the reverse process.
[26] Referring to FIGS. 1 and 2, the apparatus 100 for encoding a multi-view image according to an embodiment of the present invention includes a center image encoding unit 110, a left image encoding unit 120, a right image encoding unit 130, a multiplexer 140, and a buffer 150.
[27] The center image encoding unit 110 includes an image encoding means 111, a motion predicting means 112, a motion compensating means 113, and a subtracter 114. The center image encoding unit 110 generates texture information and motion information on the center image by encoding the received center image while predicting and compensating its motion.
[28] The image encoding means 111 provides the texture information on the center image generated by encoding the center image to the multiplexer 140 and a reconstructed center image as a reference image to the motion predicting means 112 and the subtracter 114.
[29] The motion predicting means 112 generates the motion information by predicting the motion of the center image in reference with the reconstructed center image and provides the generated motion information to the motion compensating means 113 and the multiplexer 140.
[30] The motion compensating means 113 generates a motion compensation value based on the motion information predicted by the motion predicting means 112 and provides the generated motion compensation value to the subtracter 114.
[31] The subtracter 114 generates a difference image of the center image by subtracting the motion compensation value input from the motion compensating means 113 from the reconstructed center image input from the image encoding means 111.
[32] The left or right image encoding unit 120 or 130 includes a disparity and motion predicting means 121, a disparity and motion compensating means 122, a subtracter 123, and an image encoding means 124. The left or right image encoding unit 120 or 130 generates texture information on the center image, disparity information on the left or right image, and motion information on the left or right image by predicting and compensating the motion of the received left or right image, predicting and compensating the disparity with respect to the input center image used as a reference image, and encoding the left or right image.
[33] The disparity and motion predicting means 121 generates disparity information by predicting the disparity of the left or right image in reference with the input center image used as a reference image and the motion information by predicting a motion of the next left or right image in reference with the previous left or right image used as a reference image, and provides the generated disparity information and the motion information to the disparity and motion compensating means 122 and the multiplexer 140.
[34] The disparity and motion compensating means 122 generates a disparity compensation value and a motion compensation value based on the disparity information and the motion information predicted by the disparity and motion predicting means 121 and provides the generated disparity compensation value and motion compensation value to the subtracter 123.
[35] The subtracter 123 generates a difference image of the left or right image by subtracting the disparity compensation value and the motion compensation value input from the disparity and motion compensating means 122 from the left or right image input from the camera 300.
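The difference-image step performed by the subtracters above can be sketched as plain pixel-wise subtraction (a minimal illustration; the function name and the nested-list image layout are my own, not from the patent):

```python
def residual_image(frame, compensation):
    """Difference image as formed by subtracter 114 or 123 (a sketch):
    the motion/disparity compensation value is subtracted from the input
    image pixel by pixel, so only the small prediction error remains."""
    return [[f - c for f, c in zip(frame_row, comp_row)]
            for frame_row, comp_row in zip(frame, compensation)]

frame = [[120, 121], [119, 122]]
comp = [[118, 120], [119, 120]]
print(residual_image(frame, comp))  # small residual -> fewer bits to encode
```

A good prediction makes the residual values cluster near zero, which is what lets the later transform and entropy coding compress them efficiently.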
[36] The image encoding means 124 provides the texture information generated by reconstructing and encoding the center image used as the reference image to the multiplexer 140.
[37] The multiplexer 140 generates a data stream by multiplexing the texture information, the motion information, and the disparity information respectively input from the center image encoding unit 110, the left image encoding unit 120, and the right image encoding unit 130.
[38] The buffer 150 stores the data stream multiplexed by the multiplexer 140.
[39] Generally, MPEG4 (Motion Picture Experts Group 4) encoding is divided into natural image encoding and synthetic image encoding. A natural image is acquired from a camera, and a synthetic image is a mixture of a natural image and an artificial image generated by computer graphics or the like. MPEG4 provides considerably better compression efficiency and image quality than conventional methods. In MPEG4, a source consisting of multimedia data, including a binary format for scenes, video, audio, animation, text, and the like, is separated into individual media streams by a demultiplexer, and the separated data is partly combined for specific applications and provided to a user.
[40] MPEG4 maintains three forms, I-VOP (Intra-coded Video Object Plane), P-VOP (Predictive coded VOP), and B-VOP (Bidirectionally predictive coded VOP), similar to conventional methods. Here, an I-VOP is encoded by performing only a DCT (Discrete Cosine Transform) operation on the VOP, without using motion compensation. A P-VOP is encoded by performing the DCT operation on the difference component remaining after motion compensation with respect to an I-VOP or another P-VOP. A B-VOP also uses motion compensation like a P-VOP, but is encoded by performing the DCT operation on the difference component remaining after motion compensation from two frames on the time axis.
[41] In this MPEG4 compression method, two-dimensional spatial redundancy and temporal redundancy between frames of the image data must be removed for efficient compression of a sequence of screens that changes over time.
[42] Referring to FIG. 3, an apparatus for encoding a multi-view image according to an embodiment of the present invention first generates texture information by encoding a first center image that is used as a reference image. The apparatus then generates disparity information by predicting and encoding the disparity between the first left and right images and the first center image, which is the reference image. From the second center, left, and right images onward, motion information is generated by predicting and encoding the motion of up to a predetermined number of center, left, and right images, for example thirty frames, with reference to the respective previous images.
[43] Accordingly, in the apparatus for encoding a multi-view image in the embodiment, the spatial and temporal redundancy of adjacent images can be removed by encoding the first center image as texture information of an I-VOP, which has a large data size, and the other images as disparity information or motion information of P-VOPs, which have a small data size, so the amount of data can be decreased. According to the apparatus for encoding a multi-view image in the embodiment, the compression rate of the images can be increased, making it possible to implement a high image-quality three-dimensional moving picture based on MPEG4, which supports low transmission speeds.
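The prediction structure of FIG. 3 — one intra-coded first center image, disparity-predicted first left and right images, and motion-predicted frames thereafter — can be sketched as a simple per-view plan (the function name and labels are illustrative, not terminology from the patent):

```python
def prediction_plan(n_frames=30):
    """Sketch of the multi-view prediction structure described above:
    the first center frame is intra-coded (I-VOP texture); the first
    left and right frames are disparity-predicted from it; every later
    frame in each view is motion-predicted from the previous frame of
    the same view. Thirty frames matches the example in the text."""
    plan = {"center": [("I", "texture")],
            "left": [("P", "disparity vs first center")],
            "right": [("P", "disparity vs first center")]}
    for t in range(1, n_frames):
        for view in ("center", "left", "right"):
            plan[view].append(("P", f"motion vs {view}[{t - 1}]"))
    return plan

plan = prediction_plan()
print(plan["left"][:2])  # first left frame uses disparity, the rest use motion
```

Only one frame in the whole group is coded as a large I-VOP; everything else carries compact disparity or motion information, which is where the data reduction claimed above comes from.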
[44] Referring to FIG. 4, a conventional MPEG4 encoding apparatus uses a full search block matching algorithm for motion prediction, in which the pixels of two images a and b within a predetermined search region are compared in units of a macro block, and the displacement of the macro block which has the least error is assigned as the motion vector c. In the full search block matching algorithm, the search range for motion prediction can be set by a parameter file or the like. When the motion lies within the search range, the full search block matching algorithm performs well. However, when a large motion carries the image outside the search range, the accuracy of the prediction decreases. In other words, when the accuracy of motion prediction increases, the compression rate and the image quality improve overall, since the amount of data in the difference image after motion compensation is decreased. However, when the search range is widened to improve the compression rate and the image quality, the amount of calculation increases and becomes inappropriate for real-time processing. Since most of the processing time for MPEG4 compression is spent predicting the motion, an algorithm with a comparatively wide search range and a short processing time is required. A diamond search algorithm as illustrated in FIG. 5 is used according to an embodiment of the present invention.
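A full search of the kind shown in FIG. 4 can be sketched as follows (a minimal illustration using nested lists for images; the function names and parameters are my own):

```python
def sad(ref, block, x, y, bs):
    """Sum of absolute differences between a bs x bs block and the
    reference image at position (x, y); out-of-range candidates cost inf."""
    if y < 0 or x < 0 or y + bs > len(ref) or x + bs > len(ref[0]):
        return float("inf")
    return sum(abs(ref[y + j][x + i] - block[j][i])
               for j in range(bs) for i in range(bs))

def full_search(ref, cur, bx, by, bs, rng):
    """Full-search block matching (FIG. 4, a sketch): every candidate
    displacement within +/-rng pixels is tried in units of a macro block,
    and the displacement with the least SAD becomes the motion vector c."""
    block = [row[bx:bx + bs] for row in cur[by:by + bs]]
    cost, dx, dy = min((sad(ref, block, bx + dx, by + dy, bs), dx, dy)
                       for dy in range(-rng, rng + 1)
                       for dx in range(-rng, rng + 1))
    return (dx, dy), cost
```

Note the cost: (2·rng+1)² SAD evaluations per macro block, which is exactly the calculation burden the paragraph above says makes a wide search range unsuitable for real-time processing.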
[45] Referring to FIG. 5, in the diamond search algorithm according to an embodiment of the present invention, searching is performed using a large diamond d until the least SAD (sum of absolute differences) value falls at the center of the large diamond d, and the search is then refined using a small diamond e to find the optimal motion vector. Because far fewer candidate positions are compared than in a full search, the diamond search algorithm has a very short processing time.
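The two-stage search just described can be sketched as follows. FIG. 5 does not specify the exact diamond offsets, so the large and small patterns below are the commonly used nine-point and five-point diamonds — an assumption on my part:

```python
LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
        (1, 1), (1, -1), (-1, 1), (-1, -1)]      # large diamond d (assumed shape)
SDSP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # small diamond e (assumed shape)

def sad(ref, block, x, y, bs):
    """SAD of a bs x bs block vs the reference at (x, y); inf if out of range."""
    if y < 0 or x < 0 or y + bs > len(ref) or x + bs > len(ref[0]):
        return float("inf")
    return sum(abs(ref[y + j][x + i] - block[j][i])
               for j in range(bs) for i in range(bs))

def diamond_search(ref, cur, bx, by, bs):
    """Diamond search (FIG. 5, a sketch): step the large diamond until its
    minimum SAD lands on the center, then refine once with the small
    diamond to obtain the final motion vector."""
    block = [row[bx:bx + bs] for row in cur[by:by + bs]]
    cx, cy = bx, by
    while True:
        _, dx, dy = min((sad(ref, block, cx + dx, cy + dy, bs), dx, dy)
                        for dx, dy in LDSP)
        if (dx, dy) == (0, 0):   # minimum at the center: stop coarse stage
            break
        cx, cy = cx + dx, cy + dy
    _, dx, dy = min((sad(ref, block, cx + dx, cy + dy, bs), dx, dy)
                    for dx, dy in SDSP)
    return (cx + dx - bx, cy + dy - by)
```

Each coarse step evaluates at most nine positions and the refinement five, so the total number of SAD computations grows with the path length to the match rather than with the square of the search range.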
[46] Referring to FIGS. 1 and 6, an apparatus 200 for decoding a multi-view image according to an embodiment of the present invention includes a demultiplexer 210, a center image decoding unit 220, a left image decoding unit 230, a right image decoding unit 240, and an image memory 250.
[47] The demultiplexer 210 demultiplexes the multiplexed data stream input from the multiplexer 140 of the apparatus for encoding a multi-view image and provides data streams of the center, left, and right images to the center image decoding unit 220, the left image decoding unit 230, and the right image decoding unit 240, respectively.
[48] The center image decoding unit 220 includes an image decoding means 221, an image reconstructing means 222, a motion decoding means 223, and a motion compensating means 224. The center image decoding unit 220 generates a reconstructed center image by decoding the data stream of the input center image and compensating for the motion.
[49] The image decoding means 221 decodes the texture information included in the received data stream of the center image and provides the decoded texture information to the image reconstructing means 222.
[50] The image reconstructing means 222 reconstructs the image decoded by the image decoding means 221 and the image compensated by the motion compensating means 224.
[51] The motion decoding means 223 decodes the motion information included in the data stream of the received center image and provides the decoded motion information to the motion compensating means 224.
[52] The motion compensating means 224 compensates a previous image that is input from the image memory 250 as a reference image with the decoded motion information input from the motion decoding means 223 and provides the compensated previous image to the image reconstructing means 222.
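The compensate-then-reconstruct step performed by means 224 and 222 can be sketched as shifting the reference image by the decoded motion vector and adding the decoded residual back. A single whole-frame wrap-around shift stands in for per-macro-block compensation here; that simplification, and the names, are mine:

```python
def reconstruct(prev, mv, residual):
    """Decoder-side sketch: the motion compensating means shifts the
    previous reference image by the decoded motion vector (dx, dy), and
    the image reconstructing means adds the decoded residual back to
    recover the current image."""
    dx, dy = mv
    h, w = len(prev), len(prev[0])
    shifted = [[prev[(y - dy) % h][(x - dx) % w] for x in range(w)]
               for y in range(h)]
    return [[shifted[y][x] + residual[y][x] for x in range(w)]
            for y in range(h)]
```

This is the inverse of the encoder's subtracter: whatever prediction error the encoder transmitted as the difference image is added back onto the compensated reference.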
[53] The left image decoding unit 230 or the right image decoding unit 240 includes an image decoding means 231, a disparity and motion decoding means 232, a disparity and motion compensating means 233, and an image reconstructing means 234. The left or right image decoding unit 230 or 240 generates a reconstructed left or right image by decoding the input data stream of the left or right image and compensating the disparity and the motion.
[54] The image decoding means 231 decodes the texture information of the center image, which is used as a reference image, included in the input data stream of the left or right image and provides the decoded texture information to the disparity and motion compensating means 233.
[55] The disparity and motion decoding means 232 decodes the disparity information and the motion information included in the input data stream of the left or right image and provides the decoded disparity information and the motion information to the disparity and motion compensating means 233.
[56] The disparity and motion compensating means 233 compensates the reference image input from the image decoding means 231 with the disparity information input from the disparity and motion decoding means 232, compensates the previous image input from the image memory 250 with the motion information input from the disparity and motion decoding means 232, and provides the compensated images to the image reconstructing means 234.
[57] The image reconstructing means 234 reconstructs the image compensated by the disparity and motion compensating means 233.
[58] The image memory 250 stores the previous image that becomes a reference for compensating the motion of the center image.
[59] While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.
Industrial Applicability
[60] An apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention removes spatial and temporal redundancy by predicting the motion and the disparity among the center, left, and right images, thereby decreasing the amount of data transmitted in a three-dimensional moving picture based on MPEG4.
[61] In addition, an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention can decrease the processing time required for predicting the motion by using a diamond search algorithm, making it possible to compress and transmit a three-dimensional moving picture based on MPEG4 in real time.
[62] Accordingly, an apparatus for encoding and decoding a multi-view image according to an embodiment of the present invention is capable of implementing a high image-quality three-dimensional moving picture based on MPEG4, which supports low transmission speeds, in real time.

Claims

[1] An apparatus for encoding a multi-view image comprising: a center image encoding unit generating texture information on a center image by encoding a first center image and motion information on the center image by predicting, compensating, and encoding a motion for center images from a second center image to a predetermined center image with respect to a previous center image which is used as a reference image; a left image encoding unit generating disparity information on a left image by predicting, compensating, and encoding a disparity of a first left image and the first center image, which is used as a reference image, and generating motion information on the left image by predicting, compensating, and encoding a motion for left images from a second left image to a predetermined left image with respect to a previous left image which is used as a reference image; a right image encoding unit generating disparity information on a right image by predicting, compensating, and encoding a disparity of a first right image and the first center image, which is used as a reference image, and generating motion information on the right image by predicting, compensating, and encoding a motion for right images from a second right image to a predetermined right image with respect to a previous right image which is used as a reference image; and a multiplexer generating a data stream by multiplexing the texture information, the motion information, and the disparity information respectively input from the center image encoding unit, the left image encoding unit, and the right image encoding unit.
[2] The apparatus of claim 1, wherein the center image encoding unit comprises: an image encoding means generating the texture information on the center image by encoding the first center image, reconstructing the first center image, and providing the reconstructed first center image as the reference image; a motion predicting means generating the motion information on the center image by predicting a motion from the second center image to the predetermined center image with respect to the previous center image which is used as a reference image; a motion compensating means generating a motion compensation value based on the motion information predicted by the motion predicting means; and a subtracter generating a difference image of the center image by subtracting the motion compensation value compensated by the motion compensating means from the center image.
[3] The apparatus according to claim 1, wherein the left image encoding unit or the right image encoding unit comprises: a disparity and motion predicting means for generating the disparity information on the first left image or the first right image by predicting a disparity of the first left image or the first right image and the first center image, which is used as the reference image, and generating the motion information on the left or right image by predicting a motion from the second left or right image to the predetermined left or right image with respect to the previous left or right image which is used as the reference image; a disparity and motion compensating means for generating a disparity compensation value and a motion compensation value based on the disparity information and the motion information predicted by the disparity and motion predicting means; and a subtracter generating a difference image of the left or right image by subtracting the disparity compensation value or the motion compensation value compensated by the disparity and motion compensating means from the left or right image.
[4] The apparatus according to claim 3, wherein the left image encoding unit and the right image encoding unit further comprise an image encoding means for generating the texture information by reconstructing and encoding the first center image used as the reference image, respectively.
[5] The apparatus according to claim 1, wherein the center image encoding unit, the left image encoding unit, and the right image encoding unit predict the motion using a diamond search algorithm.
[6] An apparatus for decoding a multi-view image comprising: a demultiplexer demultiplexing a multiplexed data stream and providing data streams of center, left, and right images; a center image decoding unit generating a reconstructed first center image by decoding texture information included in the data stream of the center image input from the demultiplexer and center images from a reconstructed second center image to a reconstructed predetermined center image by decoding and compensating motion information included in the center image data stream with respect to a previous center image which is used as a reference image; a left image decoding unit generating a first center image by decoding texture information included in the data stream of the left image, a first left image by decoding and compensating disparity information included in the data stream of the left image with respect to the first center image which is used as a reference image, and left images from a second left image to a predetermined left image by decoding and compensating motion information included in the data stream of the left image with respect to a previous left image which is used as a reference image; and a right image decoding unit generating a first center image by decoding texture information included in the data stream of the right image, a first right image by decoding and compensating disparity information included in the data stream of the right image with respect to the first center image which is used as a reference image, and right images from a second right image to a predetermined right image by decoding and compensating motion information included in the data stream of the right image with respect to a previous right image which is used as a reference image; and an image memory storing the reference images.
[7] The apparatus according to claim 6, wherein the center image decoding unit comprises: an image decoding means for decoding the texture information included in the data stream of the center image; a motion decoding means for decoding the motion information included in the data stream of the center image; a motion compensating means for compensating the previous center image stored in the image memory with the motion information decoded by the motion decoding means; and an image reconstructing means for generating the first center image from the decoded texture information and center images from the second center image to the predetermined center image from the decoded and compensated motion information.
[8] The apparatus according to claim 6, wherein the left image decoding unit and the right image decoding unit respectively comprise: an image decoding means for decoding the texture information included in the data stream of the left image or the right image; a disparity and motion decoding means for decoding the disparity information and the motion information included in the data stream of the left image or the right image; a disparity and motion compensating means for compensating the texture information decoded by the image decoding means with the disparity information decoded by the disparity and motion decoding means and the previous left or right image stored in the image memory with the motion information decoded by the disparity and motion decoding means; and an image reconstructing means for generating the first left or right image from the decoded and compensated disparity information and images from the second left or right image to the predetermined left or right image from the decoded and compensated motion information.
PCT/KR2005/002226 2005-07-11 2005-07-11 Apparatus for encoding and decoding multi-view image WO2007007923A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2005-0062060 2005-07-11
KR1020050062060A KR100762783B1 (en) 2005-07-11 2005-07-11 Apparatus for encoding and decoding multi-view image

Publications (1)

Publication Number Publication Date
WO2007007923A1 true WO2007007923A1 (en) 2007-01-18

Family

ID=37637279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/002226 WO2007007923A1 (en) 2005-07-11 2005-07-11 Apparatus for encoding and decoding multi-view image

Country Status (2)

Country Link
KR (1) KR100762783B1 (en)
WO (1) WO2007007923A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010126221A2 (en) * 2009-04-27 2010-11-04 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3d video data processing method thereof
US8248461B2 (en) 2008-10-10 2012-08-21 Lg Electronics Inc. Receiving system and method of processing data
KR101479185B1 (en) * 2007-05-16 2015-01-06 톰슨 라이센싱 Methods and apparatus for the use of slice groups in encoding multi-view video coding (mvc) information

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
KR101342783B1 (en) 2009-12-03 2013-12-19 한국전자통신연구원 Method for generating virtual view image and apparatus thereof
US9674534B2 (en) 2012-01-19 2017-06-06 Samsung Electronics Co., Ltd. Method and apparatus for encoding multi-view video prediction capable of view switching, and method and apparatus for decoding multi-view video prediction capable of view switching

Citations (4)

Publication number Priority date Publication date Assignee Title
US6043838A (en) * 1997-11-07 2000-03-28 General Instrument Corporation View offset estimation for stereoscopic video coding
US6057884A (en) * 1997-06-05 2000-05-02 General Instrument Corporation Temporal and spatial scaleable coding for video object planes
US6377309B1 (en) * 1999-01-13 2002-04-23 Canon Kabushiki Kaisha Image processing apparatus and method for reproducing at least an image from a digital data sequence
KR20040065014A (en) * 2003-01-13 2004-07-21 전자부품연구원 Apparatus and method for compressing/decompressing multi-viewpoint image

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JPH09261653A (en) * 1996-03-18 1997-10-03 Sharp Corp Multi-view-point picture encoder

Non-Patent Citations (1)

Title
CHEUNG C.-H. ET AL.: "A NOVEL CROSS-DIAMOND SEARCH ALGORITHM FOR BLOCK MOTION ESTIMATION", IEEE TRANSACTION ON CIRCUITS AND SYSTEM FOR VIDEO TECHNOLOGY, vol. 12, no. 12, December 2002 (2002-12-01), XP011071898 *

Cited By (11)

Publication number Priority date Publication date Assignee Title
KR101479185B1 (en) * 2007-05-16 2015-01-06 톰슨 라이센싱 Methods and apparatus for the use of slice groups in encoding multi-view video coding (mvc) information
KR101482642B1 (en) * 2007-05-16 2015-01-15 톰슨 라이센싱 Methods and apparatus for the use of slice groups in decoding multi-view video coding (mvc) information
US9288502B2 (en) 2007-05-16 2016-03-15 Thomson Licensing Methods and apparatus for the use of slice groups in decoding multi-view video coding (MVC) information
US9313515B2 (en) 2007-05-16 2016-04-12 Thomson Licensing Methods and apparatus for the use of slice groups in encoding multi-view video coding (MVC) information
US9883206B2 (en) 2007-05-16 2018-01-30 Thomson Licensing Methods and apparatus for the use of slice groups in encoding multi-view video coding (MVC) information
US10158886B2 (en) 2007-05-16 2018-12-18 Interdigital Madison Patent Holdings Methods and apparatus for the use of slice groups in encoding multi-view video coding (MVC) information
US8248461B2 (en) 2008-10-10 2012-08-21 Lg Electronics Inc. Receiving system and method of processing data
US9712803B2 (en) 2008-10-10 2017-07-18 Lg Electronics Inc. Receiving system and method of processing data
WO2010126221A2 (en) * 2009-04-27 2010-11-04 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3d video data processing method thereof
WO2010126221A3 (en) * 2009-04-27 2010-12-23 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3d video data processing method thereof
US8730303B2 (en) 2009-04-27 2014-05-20 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3D video data processing method thereof

Also Published As

Publication number Publication date
KR100762783B1 (en) 2007-10-05
KR20070007442A (en) 2007-01-16

Similar Documents

Publication Publication Date Title
US6567427B1 (en) Image signal multiplexing apparatus and methods, image signal demultiplexing apparatus and methods, and transmission media
US6999513B2 (en) Apparatus for encoding a multi-view moving picture
CA2238900C (en) Temporal and spatial scaleable coding for video object planes
EP2538674A1 (en) Apparatus for universal coding for multi-view video
KR100417932B1 (en) Image encoder, image encoding method, image decoder, image decoding method, and distribution media
JP5072996B2 (en) System and method for 3D video coding
CN100512431C (en) Method and apparatus for encoding and decoding stereoscopic video
RU2511595C2 (en) Image signal decoding apparatus, image signal decoding method, image signal encoding apparatus, image encoding method and programme
EP2302939B1 (en) Method and system for frame buffer compression and memory reduction for 3D video
US20090190662A1 (en) Method and apparatus for encoding and decoding multiview video
EP0966161A2 (en) Apparatus and method for video encoding and decoding
EP1707010A1 (en) Method, medium, and apparatus for 3-dimensional encoding and/or decoding of video
US20070211796A1 (en) Method and apparatus for encoding and decoding multi-view video to provide uniform picture quality
JP2010507961A (en) Multi-view video scalable coding and decoding method, and coding and decoding apparatus
MX2008002391A (en) Method and apparatus for encoding multiview video.
Omori et al. A 120 fps high frame rate real-time HEVC video encoder with parallel configuration scalable to 4K
JP2007266749A (en) Encoding method
KR101898822B1 (en) Virtual reality video streaming with viewport information signaling
WO2007007923A1 (en) Apparatus for encoding and decoding multi-view image
Haskell et al. Mpeg video compression basics
KR101386651B1 (en) Multi-View video encoding and decoding method and apparatus thereof
WO2013146636A1 (en) Image encoding device, image decoding device, image encoding method, image decoding method and program
Santamaria et al. Coding of volumetric content with MIV using VVC subpictures
KR100566100B1 (en) Apparatus for adaptive multiplexing/demultiplexing for 3D multiview video processing and its method
KR20040065170A (en) Video information decoding apparatus and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1), EPO FORM 1205A SENT ON 09/04/08 .

122 Ep: pct application non-entry in european phase

Ref document number: 05780698

Country of ref document: EP

Kind code of ref document: A1