US20060109908A1 - Method of retrieving video picture and apparatus therefor - Google Patents
- Publication number
- US20060109908A1 US20060109908A1 US11/328,054 US32805406A US2006109908A1 US 20060109908 A1 US20060109908 A1 US 20060109908A1 US 32805406 A US32805406 A US 32805406A US 2006109908 A1 US2006109908 A1 US 2006109908A1
- Authority
- US
- United States
- Prior art keywords
- information
- picture
- section
- retrieval
- shape
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
- G06F16/7854—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using shape
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
- G06F16/7857—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using texture
Definitions
- the present invention relates to a retrieval technique for video data and, more particularly, to a method of retrieving a video picture and apparatus therefor, which use the function of coding a video picture in units of arbitrary shape objects and are implemented by MPEG4 as an international standard scheme for video coding in the process of standardization in ISO/IEC JTC1/SC29/WG11.
- MPEG4 in the process of standardization, the function of coding a video picture in units of arbitrary shape objects (e.g., a human figure in a picture), which cannot be implemented by MPEG1 or MPEG2 that is a conventional international standard scheme for video coding, can be implemented.
- information representing the shape or size of each object is required. This information is coded together with texture information representing changes in luminance and chrominance inside each object, and the resultant data is transmitted or stored.
- An index for retrieval is assigned to each classified group.
- retrieval processing can be performed in units of objects in each frame as minimum units.
- a method of retrieving a video picture comprising decoding a coded bit stream of video data representing an arbitrary shape object and including shape information and texture information, and supplying a retrieval condition for retrieval of a desired picture and retrieving a picture meeting the retrieval condition by using shape information of the object decoded by the above decoding.
- an apparatus for retrieving a video picture comprising a decoder section which decodes a coded bit stream of video picture data representing an arbitrary shape object and including shape information and texture information, a retrieval condition input section which inputs a retrieval condition for retrieval of a desired picture, and a retrieval section which retrieves a picture meeting the retrieval condition by using shape information of the object decoded by the decoder section.
- a method of retrieving a video picture comprising decoding a coded bit stream of an arbitrary shape object including shape information and texture information and corresponding to video data coded by MPEG4 when retrieving a desired picture from the video data coded by MPEG4, retrieving a video picture meeting a supplied retrieval condition for retrieval of a desired video picture by using shape information of a decoded object, and presenting a retrieved result.
- a video picture retrieving apparatus for retrieving a desired picture from video data coded by MPEG4, comprising a decoder section for decoding a coded bit stream of an arbitrary shape object which includes shape information and texture information and corresponds to video data coded by MPEG4, a retrieval condition input section which inputs a retrieval condition for retrieval of a desired picture, a retrieval section which retrieves a video picture meeting a retrieval condition by using the shape information of the object decoded by the decoder section, and an output section which presents a retrieved result obtained by the retrieval section.
- a method of retrieving a video picture and apparatus therefor which can perform sophisticated video picture retrieval in consideration of the contents of a picture by using shape information (size, shape, motion, and position in a picture) of each object without using any complicated signal processing section.
- FIG. 1 is a view for explaining a coding area including an object
- FIGS. 2A to 2C are views for explaining the arrangement of coded shape data in detail
- FIG. 3 is a view for explaining the attribute of each macroblock
- FIG. 4 is a block diagram for explaining an outline of an MPEG4 system
- FIG. 5 is a block diagram showing the schematic arrangement of a video picture retrieving apparatus according to an embodiment of the present invention.
- FIG. 6 is a block diagram showing the schematic arrangement of a video picture retrieving apparatus having a display section for synthesizing objects and displaying the resultant information according to the second embodiment of the present invention.
- FIGS. 7A and 7B are flowcharts showing two processes of providing a retrieved result to a user.
- a shape information coding technique used in the present invention will be briefly described first.
- a shape information coding method in MPEG4 is described in “Standardization Trends in MPEG4 for Multimedia”, The Journal of The Institute of Image Information and Television Engineers, Vol. 51, No. 12, pp. 1984-1986, 1997.
- picture information is coded in units of macroblocks each containing shape information in addition to texture information as video data.
- a macroblock is one of the blocks obtained by dividing picture information into “16×16”-pixel blocks.
- This picture information is expressed by binary data called an alpha-map prepared as information indicating the shape and distribution of an object in texture information as video data.
- a coding area containing an object in MPEG4 will be described with reference to FIG. 1 .
- a coding area (called a Bounding-Box or Bounding-Rectangle) containing an object (called a VOP (Video Object Plane) in MPEG4) as a coding target is set in a picture (frame), and this area is divided into “16×16”-pixel blocks.
- the object is then coded in units of “16×16”-pixel blocks, i.e., macroblocks.
- the sizes (vop_width, vop_height) and position vectors (spatial_reference (vop_horizontal_mc_spatial_ref, vop_vertical_mc_spatial_ref)) of Bounding-Boxes are coded in units of VOPs.
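As a non-normative illustration, the Bounding-Box parameters above can be derived from a binary alpha-map roughly as follows; the function name and the list-of-lists representation are assumptions of this sketch, not MPEG4 syntax:

```python
# Sketch: derive Bounding-Box parameters from a binary alpha-map.
# The name bounding_box and the 2-D list input are assumptions for
# this example only.

def bounding_box(alpha_map):
    """alpha_map: 2-D list of 0/1 values, 1 marking an object pixel.
    Returns (left, top, vop_width, vop_height) with the box expanded
    to whole 16x16 macroblocks, or None if there is no object pixel."""
    rows = [y for y, row in enumerate(alpha_map) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(alpha_map[0]))
            if any(row[x] for row in alpha_map)]
    left = (min(cols) // 16) * 16
    top = (min(rows) // 16) * 16
    right = (max(cols) // 16 + 1) * 16
    bottom = (max(rows) // 16 + 1) * 16
    return left, top, right - left, bottom - top
```

Because the box is expanded to whole macroblocks, its width and height are always multiples of 16, matching the division described above.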
- FIGS. 2A, 2B, and 2C show the format of coded data.
- This format corresponds to the arrangement of coded data in MPEG4 (see “Standardization Trends in MPEG4 for Multimedia”, The Journal of The Institute of Image Information and Television Engineers, Vol. 51, No. 12, p. 1974, 1997).
- the header information of each frame is written, and a macroblock follows this header information.
- each macroblock includes shape information A1, motion vector information A2, and DCT coefficient information A3.
- the shape information A1 includes mode information S1, shape motion vector information S2, and coded binary picture information S3.
- the mode information S1 is information indicating the attribute of each macroblock.
- the shape motion vector information S2 is motion vector information for motion compensation prediction of the shape of each macroblock.
- the coded binary picture information S3 is information obtained by handling the detailed shape of each macroblock as a binary picture and coding the binary picture.
- Macroblocks are classified into three types, i.e., a “transparent macroblock” in which the 16×16 pixels include no object pixel; an “opaque macroblock” in which all the 16×16 pixels are object pixels; and a “boundary macroblock” in which some of the 16×16 pixels are object pixels.
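The three attributes can be sketched as a simple test on the alpha values of one macroblock; the function and label names are illustrative only:

```python
# Sketch of the three macroblock attributes; names are illustrative.

def classify_macroblock(block):
    """block: 16x16 list of 0/1 alpha values for one macroblock."""
    object_pixels = sum(sum(row) for row in block)
    if object_pixels == 0:
        return "transparent"   # no object pixel in the 16x16 block
    if object_pixels == 16 * 16:
        return "opaque"        # every pixel belongs to the object
    return "boundary"          # the object edge crosses the block
```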
- MODE 1 indicates that the macroblock is a transparent macroblock.
- MODE 2 indicates that the macroblock is an opaque macroblock.
- MODE 3 indicates that the macroblock is coded binary picture (intraframe) information.
- MODE 5 indicates that the macroblock is constituted by zero motion vector information and non-zero motion vector information (MV≠0).
- MODE 7 indicates that the macroblock is constituted by non-zero motion vector information and coded binary picture (interframe) information.
- the shape motion vector information S2 appears when mode 6 (MODE 6) or mode 7 (MODE 7) is set.
- the coded binary picture information S3 appears when mode 3 (MODE 3) or mode 7 (MODE 7) is set.
- a target scene designated by a user is retrieved by using such mode information and shape motion vector information in shape information in MPEG4.
- An MPEG4 system is disclosed in “Standardization Trends in MPEG4 for Multimedia”, The Journal of The Institute of Image Information and Television Engineers, Vol. 51, No. 12, p. 1962, 1997. An outline of the MPEG4 system will be briefly described below.
- the MPEG4 system has an arrangement like the one shown in FIG. 4 .
- a coder apparatus is comprised of a video object coder section 11 for coding a video object, an audio object coder section 12 for coding an audio object, a scene description object coder section 13 for coding a scene description object, and a media multiplexer section 14 for multiplexing and transmitting these coded objects.
- a decoder apparatus is comprised of a media demultiplexer section 15, a video object decoder section 16, an audio object decoder section 17, a scene description object decoder section 18, and an object reconstruction section 19.
- the media demultiplexer section 15 demultiplexes the multiplex data transmitted from the coder apparatus to obtain the original video object, an audio object, and a scene description object.
- the video object decoder section 16 decodes the coded video object demultiplexed by the media demultiplexer section 15 into the original video object.
- the audio object decoder section 17 decodes the coded audio object demultiplexed by the media demultiplexer section 15 into the original audio object.
- the scene description object decoder section 18 decodes the coded scene description object demultiplexed by the media demultiplexer section 15 into the original scene description object.
- the object reconstruction section 19 synthesizes the video and audio objects in accordance with the scene description object to reconstruct the picture to be displayed.
- the supplied video and audio objects and the like are respectively coded by the corresponding coder sections 11 and 12.
- the media multiplexer section 14 multiplexes these coded objects with the scene description object, which is obtained by the scene description object coder section 13 and describes how the respective objects are synthesized and provided to a user.
- the multiplex bit stream is then transmitted or stored.
- the media demultiplexer section 15 demultiplexes this transmitted or stored bit stream into the respective objects. These objects are then reconstructed into the original objects by the corresponding object decoder sections 16, 17, and 18. Thereafter, the object reconstruction section 19 synthesizes these objects in accordance with the scene description, and the display section presents the resultant information to the user.
- a video picture retrieving apparatus has the arrangement shown in FIG. 5. More specifically, the video picture retrieving apparatus is basically comprised of a decoder section 101, a retrieval section 102, a retrieved result output section 103, and a retrieval key information input section 104. Of these components, the decoder section 101 serves to decode shape information. The decoder section 101 decodes the coded bit stream of an arbitrary shape object supplied through a coded bit stream input line 105 into shape information, and outputs the decoded shape information to a decoded information output line 106.
- the retrieval section 102 retrieves the picture or scene desired by the user from the shape information supplied through the decoded information output line 106 . More specifically, when the user inputs conditions and the like for a desired picture or scene with the retrieval key information input section 104 , the information is supplied as retrieval key information to the retrieval section 102 through a retrieval key information input line 107 . The retrieval section 102 compares this retrieval key information with the shape information from the decoder section 101 to retrieve the desired picture or scene defined by the retrieval key information, and outputs the retrieved result to the retrieved result output section 103 .
- the retrieved result output section 103 is, for example, a display or printer, and presents the retrieved result from the retrieval section 102 to the user.
- key information is input by a user via the retrieval key information input section 104 (step F1).
- the bit stream is decoded every frame or every several frames (step F2).
- a desired scene is retrieved by the retrieval section 102, using the key information obtained in step F1 and the decoded result obtained in step F2 (step F3).
- in step F4, the retrieved result output section 103 provides the retrieved result.
- when no retrieved result is obtained, the processing returns to step F2 to restart the decoding of the bit stream.
- after the retrieved result is provided in step F4, it is determined in step F5 whether or not the entire bit stream has been decoded in step F2. If the entire bit stream has been decoded, the processing is terminated.
- in step F5, even when the user forcibly terminates the processing, the decoding is determined as having been completed. In this case, the processing may be cut off.
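A minimal sketch of the FIG. 7A flow, with the decoder section 101 and the comparison done by the retrieval section 102 abstracted into an iterable of decoded shape records and a matching predicate (both assumptions of this example):

```python
# Sketch of the FIG. 7A flow: hits are yielded as soon as they are
# found (step F4), while decoding continues until the stream ends
# (step F5). frames and matches stand in for the decoder section 101
# and the retrieval section 102.

def retrieve_sequential(frames, key, matches):
    """frames: iterable of decoded shape records (steps F2/F5);
    key: retrieval key information (step F1);
    matches: predicate comparing the key with one record (step F3)."""
    for frame_no, shape_info in enumerate(frames):
        if matches(key, shape_info):
            yield frame_no, shape_info   # provided immediately (step F4)
```

Using a generator here mirrors the sequential presentation: each hit can be displayed before the rest of the bit stream has been decoded.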
- in the FIG. 7A embodiment, the retrieved results are sequentially provided.
- the FIG. 7B embodiment provides the retrieved results together after the completion of decoding of the bit stream.
- key information is provided by a user via the retrieval key information input section 104 (step F6).
- the bit stream is decoded every frame or every several frames in the decoder section 101 (step F7).
- in step F8, a desired scene is retrieved by the retrieval section 102 using the key information obtained in step F6 and the decoded result obtained in step F7.
- when a retrieved result is obtained, the processing advances to step F9, while when no retrieved result is obtained, the processing returns to step F7 to restart decoding of the bit stream.
- in step F9, the indexes (e.g., the number (or time information) of the top frame of a scene obtained as a result) indicating the retrieved results are sequentially created by the retrieval section 102.
- the indexes are stored in the retrieval section 102 until they are requested by the retrieved result output section 103.
- in step F10, it is determined whether or not the decoding of the entire bit stream in step F7 has been completed.
- if not, the processing returns to step F7 to restart the decoding, while if the entire bit stream has been decoded, the processing is terminated. Even when the user forcibly terminates the processing in step F10, the decoding is determined as having been completed. In this case, the processing may be cut off.
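The FIG. 7B flow can likewise be sketched as a loop that only accumulates indexes (step F9) and returns them together once decoding ends; the names are again illustrative stand-ins for the sections of FIG. 5:

```python
# Sketch of the FIG. 7B flow: matching frame numbers are collected as
# indexes (step F9) and returned together only after the whole bit
# stream has been processed (step F10). Names are illustrative.

def retrieve_batched(frames, key, matches):
    """frames: iterable of decoded shape records; key: retrieval key
    information (step F6); matches: comparison predicate (step F8)."""
    indexes = []                                     # index store (step F9)
    for frame_no, shape_info in enumerate(frames):   # decode loop (F7/F10)
        if matches(key, shape_info):
            indexes.append(frame_no)
    return indexes                                   # provided together at the end
```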
- This system having such an arrangement executes retrieval processing by using alpha-map data of the video data compressed/coded by MPEG4.
- the video data compressed/coded by MPEG4 has a picture component and an alpha-map information component obtained by binarizing an object shape or position information in the picture.
- the alpha-map information is therefore sent as the shape information A1 having the format shown in FIG. 2C. This information is used for retrieval processing.
- the coded bit stream of an arbitrary shape object as information of an alpha-map is supplied to the decoder section 101 through the coded bit stream input line 105 .
- the decoder section 101 decodes the coded bit stream into the shape information A1 and supplies the decoded shape information A1 to the retrieval section 102 through the decoded information output line 106.
- the retrieval section 102 compares the retrieval key information supplied from the user through the retrieval key information input line 107 with the shape information A1 supplied through the decoded information output line 106 to retrieve a desired picture or scene.
- assume that a given motion picture is compressed/coded by MPEG4, and the user wants to retrieve a picture of a close-up scene of a given character in the motion picture.
- a user inputs information, e.g., the approximate size and location of the character in a picture, with the retrieval key information input section 104 (an input terminal, operation unit (not shown), or the like). This information is input as retrieval key information to the retrieval section 102 through retrieval key information input line 107 .
- the retrieval section 102 compares the retrieval key information from the user with the shape information A1 sequentially supplied through the decoded information output line 106 to search for information similar to the retrieval key information. If such information is present, it is supplied to the retrieved result output section 103 through a retrieved result output line 108 and is then presented by the retrieved result output section 103, i.e., displayed or printed. This presented information is a reconstructed picture of MPEG4 at this time. Upon seeing this picture, the user can know whether the picture is the target picture.
- the decoder section 101 may decode only the shape information A1 of the arbitrary shape object and retrieve the information, instead of decoding all the object data.
- a method of using only some of the three types of information in the shape information A1 in MPEG4, i.e., “mode information S1”, “shape motion vector S2”, and “coded binary picture information S3”, is also available.
- when the mode information S1 is used, information indicating the approximate position of a target object in a picture is supplied from the user as retrieval key information.
- the retrieval section 102 may extract a picture in which mode 2 (MODE 2) to mode 7 (MODE 7) macroblocks are distributed so as to almost coincide with the retrieval key information, without completely reconstructing the shape.
- a scene corresponding to a request to retrieve “a close-up scene” from the user can be retrieved by searching for a scene in which the number of macroblocks corresponding to mode 2 (MODE 2) to mode 7 (MODE 7) gradually increases for every frame.
- a scene corresponding to a request to retrieve a scene including two objects can be retrieved by searching for a scene in which macroblocks corresponding to mode 2 (MODE 2) to mode 7 (MODE 7) can be grouped into two sets.
- the retrieval section 102 retrieves a frame having the maximum number of macroblocks corresponding to mode 2 (MODE 2) to mode 7 (MODE 7)
- the retrieved result output section 103 may display a close-up of the target object.
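The close-up heuristic above, i.e., a frame-by-frame increase in the number of mode 2 to mode 7 macroblocks, might be sketched as follows; the minimum run length is an assumed parameter, not taken from the text:

```python
# Sketch of the close-up heuristic: report runs of frames in which the
# per-frame count of non-transparent (mode 2 to mode 7) macroblocks
# strictly increases. min_run is an assumed threshold.

def find_closeup_scenes(counts, min_run=3):
    """counts: per-frame numbers of non-transparent macroblocks.
    Returns (start, end) frame-index pairs of increasing runs that
    last at least min_run frames."""
    scenes, start = [], 0
    for i in range(1, len(counts) + 1):
        # a run ends at the stream end or when the count stops growing
        if i == len(counts) or counts[i] <= counts[i - 1]:
            if i - start >= min_run:
                scenes.append((start, i - 1))
            start = i
    return scenes
```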
- the size of the object may be estimated by decoding at least the size values (vop_width, vop_height) of the Bounding-Box and the value of its position (spatial_reference).
- the information of the reconstructed Bounding-Box is output from the line 106 shown in FIG. 5.
- when a picture including a target object is to be retrieved and the user knows the approximate position of the object in the picture, the picture can be retrieved by determining the position of the object in the picture in accordance with mode information contained in shape information in a data format conforming to MPEG4. If, however, a rougher approximation of the position of the target object is acceptable, a target picture may be determined by decoding only a position vector.
- a picture can also be retrieved by using state information as key information, e.g., information indicating that the object is gradually crushed in the vertical direction or information indicating that the shape abruptly changes. That is, by retrieving a target picture using state information as key information, the user can search out the corresponding picture.
- the shape motion vector S2 indicates how the shape changes with time. If, therefore, key information indicating that an object is gradually crushed in the vertical direction is supplied, a corresponding motion vector may be searched out. If key information indicating that a shape abruptly changes is supplied, a scene whose motion vector abruptly changes may be searched out.
- the above retrieving method is used when the state of a picture is known.
- a target object or picture can be retrieved by using a camera parameter as retrieval information.
- a corresponding embodiment will be described below.
- the retrieval section 102 estimates a camera parameter from shape information (alpha-map) of MPEG4, and a picture is retrieved by using the estimated camera parameter as retrieval key information. This case will be described below as the first example.
- a zoom parameter for the camera can be estimated by obtaining a state in which the size of an object changes with time on the basis of the number of macroblocks of mode 2 (MODE 2) to mode 7 (MODE 7) or the value of (vop_width, vop_height).
- a pan/tilt parameter for the camera can be estimated by obtaining a change in the position of an object with time on the basis of shape motion vector information or the position vector (spatial_reference).
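These two estimates can be illustrated with a hypothetical sketch that compares the Bounding-Box parameters of two VOPs; the tuple layouts and function names are assumptions of this example:

```python
# Sketch of the two camera-parameter estimates: zoom from the change
# of the Bounding-Box area, pan/tilt from the displacement of the
# position vector. Tuple layouts are assumptions for this example.

def estimate_zoom(size_a, size_b):
    """size_*: (vop_width, vop_height) of two VOPs. A ratio above 1
    suggests the camera zoomed in between them."""
    return (size_b[0] * size_b[1]) / (size_a[0] * size_a[1])

def estimate_pan_tilt(ref_a, ref_b):
    """ref_*: (horizontal, vertical) spatial_reference of two VOPs.
    Returns the (pan, tilt) displacement."""
    return (ref_b[0] - ref_a[0], ref_b[1] - ref_a[1])
```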
- a method of obtaining a camera parameter will be described in detail below as the second example.
- decoded shape information is deformed by affine transform to perform matching between frames.
- detailed camera parameters such as “zoom”, “pan”, and “tilt” can be obtained.
- the amount of processing for matching can be reduced by using only decoded pixel values in “boundary macroblocks” instead of using all the pixel values of decoded shape information.
- a camera parameter is estimated from shape information (alpha-map) of MPEG4, and a picture is retrieved by using the estimated camera parameter as retrieval key information.
- MPEG4 uses a technique of writing a scenario indicating how a target object in a picture is developed, and developing the picture according to the contents of the scenario. This scenario is implemented by information called a scene description object. The third example in which a target picture is retrieved from the information of this scene description object will be described next.
- FIG. 6 shows a selecting section for selecting a representative frame as a unit for presenting a retrieval result from information of a scene description object.
- This selecting section includes a scene description object output section 201, an object synthesis section 202, and a display section 203.
- the scene description object output section 201 outputs information as a scenario which has been written by a contents producer to designate the composition of a picture.
- a plurality of objects are generally reconstructed by the decoder section 101 which has decoded a bit stream. These objects are synthesized in accordance with the scene description object output section 201 . Thereafter, the resultant object is supplied to the display section 203 to be presented to the user. In this manner, the object synthesis section 202 synthesizes objects and outputs the resultant object.
- the data from the scene description object output section 201 is multiplexed with data of another object and supplied.
- the display section 203 may or may not be identical to the retrieved result output section 103.
- the scene description object decoded by the scene description object decoder section 18 on the decoder apparatus side is supplied from the scene description object output section 201 to the object synthesis section 202 through a scene description object input line 204 .
- the object synthesis section 202 analyzes the information (e.g., “enlarging and displaying object B” or “synthesizing object A with the foreground of object B”) of a scene description object to search for a frame coinciding with a predetermined condition, and sets the frame as a representative frame.
- the predetermined condition is, for example, that when a specific object is closed up, the area of the object is computed and the frame corresponding to the maximum area of the object is set as a representative frame in the object synthesis section 202.
- sophisticated video picture retrieval can be implemented in consideration of the contents of a picture without requiring any complicated signal processing unit.
- the retrieval operation is performed using the shape information provided in the macroblock.
- the retrieval operation may be performed using the header information.
- the header block includes the information spatial_reference, vop_width, and vop_height shown in FIG. 1.
- the retrieval operation is performed on the basis of the above information of the header.
- the video picture may be retrieved using the position of the object within the frame, which is indicated by the information spatial_reference, the horizontal size of the object, which is indicated by the information vop_width, the vertical size of the object, which is indicated by the information vop_height, and the area of the bounding box surrounding the object, which is indicated by vop_width and vop_height.
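A possible sketch of such header-only retrieval follows, with the header fields held in a dictionary whose keys are illustrative rather than the coded field layout:

```python
# Sketch of header-only retrieval: select frames whose Bounding-Box,
# read from spatial_reference, vop_width, and vop_height alone, meets
# a position and area condition. The dict keys are illustrative.

def match_headers(headers, min_area, region):
    """headers: per-frame dicts with 'spatial_reference' (x, y),
    'vop_width', and 'vop_height'; region: (x0, y0, x1, y1) bounds
    that the box origin must fall inside. Returns matching indexes."""
    x0, y0, x1, y1 = region
    hits = []
    for frame_no, h in enumerate(headers):
        x, y = h["spatial_reference"]
        area = h["vop_width"] * h["vop_height"]
        if area >= min_area and x0 <= x <= x1 and y0 <= y <= y1:
            hits.append(frame_no)
    return hits
```

No shape decoding is needed for this filter, which is the appeal of the header-based variant described above.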
- a video picture retrieving method and apparatus which can implement sophisticated video picture retrieval in consideration of the contents of a picture without requiring any complicated signal processing unit.
Abstract
An apparatus for retrieving a video picture includes a decoder section for decoding a coded bit stream of video picture data representing an arbitrary shape object and including shape information and texture information, a retrieval condition input section for inputting a retrieval condition for retrieval of a desired picture, a retrieval section for retrieving a picture meeting the retrieval condition by using shape information of the object decoded by the decoder section, and a display section for outputting the retrieved result obtained by the retrieval section.
Description
- The present invention relates to a retrieval technique for video data and, more particularly, to a method of retrieving a video picture and apparatus therefor, which use the function of coding a video picture in units of arbitrary shape objects and are implemented by MPEG4 as an international standard scheme for video coding in the process of standardization in ISO/IEC JTC1/SC29/WG11.
- According to MPEG4 in the process of standardization, the function of coding a video picture in units of arbitrary shape objects (e.g., a human figure in a picture), which cannot be implemented by MPEG1 or MPEG2 that is a conventional international standard scheme for video coding, can be implemented.
- To implement this function, information representing the shape or size of each object is required. This information is coded together with texture information representing changes in luminance and chrominance inside each object, and the resultant data is transmitted or stored.
- In a conventional video picture retrieving technique, a change in luminance in a frame (e.g., edge information), change in luminance between frames (e.g., optical flow), or the like is detected, and video sequences are classified by checking changes in these pieces of information. An index for retrieval is assigned to each classified group.
- When these processes are performed by a decoder apparatus, a complicated signal processing unit is often required after a reconstructed picture is generated. For this reason, techniques of reducing the processing amount by analyzing a video picture on the basis of motion vector information obtained in the process of generating a reconstructed picture or DCT (Discrete Cosine Transform) coefficient information have also been proposed (for example, Jpn. Pat. Appln. KOKAI Publication Nos. 6-113280 and 7-152779 and Japanese Patent Application No. 8-178778).
- In any case, there is a limit to the technique of analyzing video pictures in units of frames and retrieving a video picture with high precision.
- When a video picture is to be retrieved from conventional coded video data (MPEG1 or MPEG2), since retrieval processing is performed in units of frames as minimum units, it is difficult to perform video picture retrieval with high precision.
- In contrast to this, according to MPEG4, retrieval processing can be performed in units of objects in each frame as minimum units.
- It is an object of the present invention to provide a method of retrieving a video picture and apparatus therefor, which are designed to process a video picture using MPEG4 as a video coding scheme, detect the size, shape, and motion of each object and its position in a picture by using the shape information of each object of a coded bit stream based on MPEG4, and can perform high-precision video picture retrieval by using these pieces of information without using any complicated signal processing unit.
- According to the present invention, there is provided a method of retrieving a video picture, comprising decoding a coded bit stream of video data representing an arbitrary shape object and including shape information and texture information, and supplying a retrieval condition for retrieval of a desired picture and retrieving a picture meeting the retrieval condition by using shape information of the object decoded by the above decoding.
- According to the present invention, there is provided an apparatus for retrieving a video picture, comprising a decoder section which decodes a coded bit stream of video picture data representing an arbitrary shape object and including shape information and texture information, a retrieval condition input section which inputs a retrieval condition for retrieval of a desired picture, and a retrieval section which retrieves a picture meeting the retrieval condition by using shape information of the object decoded by the decoder section.
- According to the present invention, there is provided a method of retrieving a video picture, comprising decoding a coded bit stream of an arbitrary shape object including shape information and texture information and corresponding to video data coded by MPEG4 when retrieving a desired picture from the video data coded by MPEG4, retrieving a video picture meeting a supplied retrieval condition for retrieval of a desired video picture by using shape information of a decoded object, and presenting a retrieved result.
- According to the present invention, there is provided a video picture retrieving apparatus for retrieving a desired picture from video data coded by MPEG4, comprising a decoder section for decoding a coded bit stream of an arbitrary shape object which includes shape information and texture information and corresponds to video data coded by MPEG4, a retrieval condition input section which inputs a retrieval condition for retrieval of a desired picture, a retrieval section which retrieves a video picture meeting a retrieval condition by using the shape information of the object decoded by the decoder section, and an output section which presents a retrieved result obtained by the retrieval section.
- According to the present invention, there is provided a method of retrieving a video picture and apparatus therefor, which can perform sophisticated video picture retrieval in consideration of the contents of a picture by using shape information (size, shape, motion, and position in a picture) of each object without using any complicated signal processing section.
- Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.
- The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate presently preferred embodiments of the invention, and together with the general description given above and the detailed description of the preferred embodiments given below, serve to explain the principles of the invention.
-
FIG. 1 is a view for explaining a coding area including an object; -
FIGS. 2A to 2C are views for explaining the arrangement of coded shape data in detail; -
FIG. 3 is a view for explaining the attribute of each macroblock; -
FIG. 4 is a block diagram for explaining an outline of an MPEG4 system; -
FIG. 5 is a block diagram showing the schematic arrangement of a video picture retrieving apparatus according to an embodiment of the present invention; -
FIG. 6 is a block diagram showing the schematic arrangement of a video picture retrieving apparatus having a display section for synthesizing objects and displaying the resultant information according to the second embodiment of the present invention; and -
FIGS. 7A and 7B are flowcharts showing two processes of providing a retrieved result to a user. - A video picture retrieving apparatus according to an embodiment of the present invention will be described below with reference to the views of the accompanying drawing.
- A shape information coding technique used in the present invention will be briefly described first.
- A shape information coding method in MPEG4 is described in “Standardization Trends in MPEG4 for Multimedia”, The Journal of The Institute of Image Information and Television Engineers, Vol. 51, No. 12, pp. 1984-1986, 1997.
- According to this reference, in MPEG4 as an international standard video coding scheme, picture information is coded in units of macroblocks each containing shape information in addition to texture information as video data. In this case, a macroblock is one of the blocks obtained by dividing picture information in “16×16” pixels. This picture information is expressed by binary data called an alpha-map prepared as information indicating the shape and distribution of an object in texture information as video data.
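As a concrete illustration of the macroblock structure described above, the following sketch partitions a binary alpha-map into 16×16-pixel blocks. This is illustrative code only, not part of MPEG4 or of the present invention; the alpha-map is modeled as a plain 2-D list of 0/1 values.

```python
# Illustrative sketch only: partition a binary alpha-map (a 2-D list of
# 0/1 values) into the 16x16-pixel macroblocks that MPEG4 codes in units of.
MB_SIZE = 16

def macroblocks(alpha_map):
    """Yield (mb_row, mb_col, block) for each 16x16 macroblock."""
    height, width = len(alpha_map), len(alpha_map[0])
    for r in range(0, height, MB_SIZE):
        for c in range(0, width, MB_SIZE):
            block = [row[c:c + MB_SIZE] for row in alpha_map[r:r + MB_SIZE]]
            yield r // MB_SIZE, c // MB_SIZE, block

# A 32x32 alpha-map whose left half is object pixels yields a 2x2 grid
# of macroblocks.
alpha = [[1 if x < 16 else 0 for x in range(32)] for _ in range(32)]
grid = list(macroblocks(alpha))
```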
- A coding area containing an object in MPEG4 will be described with reference to
FIG. 1 . As shown in FIG. 1, a coding area (called a Bounding-Box or Bounding-Rectangle) containing an object (called a VOP (Video Object Plane) in MPEG4) as a coding target is set in a picture (frame), and this area is divided into “16×16”-pixel blocks. The object is then coded in units of “16×16”-pixel blocks, i.e., macroblocks. - In this case, the sizes (vop_width, vop_height) and position vectors (spatial_reference (vop_horizontal_mc_spatial_ref, vop_vertical_mc_spatial_ref)) of Bounding-Boxes are coded in units of VOPs.
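A hedged sketch of deriving a Bounding-Box for a VOP from a binary alpha-map follows. The names mirror the fields above (vop_width, vop_height, spatial_reference), but the pad-to-a-multiple-of-16 rule used here is a simplification for illustration, not the normative MPEG4 procedure.

```python
# Sketch (assumption-laden): derive a Bounding-Box for a VOP from a
# binary alpha-map given as a 2-D list of 0/1 values.

def bounding_box(alpha_map):
    ys = [y for y, row in enumerate(alpha_map) if any(row)]
    xs = [x for row in alpha_map for x, v in enumerate(row) if v]
    left, top = min(xs), min(ys)
    # Extend the tight box so that it divides evenly into 16x16 macroblocks
    # (simplified padding rule, not the normative one).
    vop_width = -(-(max(xs) - left + 1) // 16) * 16
    vop_height = -(-(max(ys) - top + 1) // 16) * 16
    spatial_reference = (left, top)  # position vector of the Bounding-Box
    return spatial_reference, vop_width, vop_height
```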
-
FIGS. 2A, 2B, and 2C show the format of coded data. This format corresponds to the arrangement of coded data in MPEG4 (see “Standardization Trends in MPEG4 for Multimedia”, The Journal of The Institute of Image Information and Television Engineers, Vol. 51, No. 12, p. 1974, 1997). According to the format shown in FIG. 2A, the header information of each frame is written, and a macroblock follows this header information. As shown in FIG. 2B, each macroblock includes shape information A1, motion vector information A2, and DCT coefficient information A3. As shown in FIG. 2C, the shape information A1 includes mode information S1, shape motion vector information S2, and coded binary picture information S3. - The mode information S1 is information indicating the attribute of each macroblock. The shape motion vector information S2 is motion vector information for motion compensation prediction of the shape of each macroblock. The coded binary picture information S3 is information obtained by handling the detailed shape of each macroblock as a binary picture and coding the binary picture.
- The attribute of each macroblock will be described next with reference to
FIG. 3 . Macroblocks are classified into three types, i.e., a “transparent macroblock” in which the 16×16 pixels include no object pixel; an “opaque macroblock” in which all the 16×16 pixels are object pixels; and a “boundary macroblock” in which some of the 16×16 pixels are object pixels. - In MPEG4, the mode information S1 of the shape information A1 is used to define the following seven modes:
(mode 1; MOOD 1) transparent
(mode 2; MOOD 2) opaque
(mode 3; MOOD 3) coded binary picture (intraframe)
(mode 4; MOOD 4) motion compensation (MV = 0)
(mode 5; MOOD 5) motion compensation (MV = 0) + coded binary picture (interframe)
(mode 6; MOOD 6) motion compensation (MV ≠ 0)
(mode 7; MOOD 7) motion compensation (MV ≠ 0) + coded binary picture (interframe)
- MOOD 1 indicates that the macroblock is a transparent macroblock. MOOD 2 indicates that the macroblock is an opaque macroblock.
MOOD 3 indicates that the macroblock is coded binary picture (intraframe) information. MOOD 4 indicates that the macroblock is zero motion vector information (MV=0). MOOD 5 indicates that the macroblock is constituted by zero motion vector information and coded binary picture (interframe) information. MOOD 6 indicates that the macroblock is non-zero motion vector information (MV≠0). MOOD 7 indicates that the macroblock is constituted by non-zero motion vector information and coded binary picture (interframe) information. - The shape motion vector information S2 appears when mode 6 (MOOD 6) or mode 7 (MOOD 7) is set. The coded binary picture information S3 appears when mode 3 (MOOD 3), mode 5 (MOOD 5), or mode 7 (MOOD 7) is set.
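The transparent / opaque / boundary attribute described above can be sketched as follows for one 16×16 macroblock of the binary alpha-map. Only the three block attributes are modeled here, not the seven coding modes themselves; the function name is illustrative.

```python
# Minimal sketch: classify one 16x16 macroblock (a 2-D list of 0/1
# values) into the three attributes described in the text.

def classify_macroblock(block):
    pixels = [v for row in block for v in row]
    if not any(pixels):
        return "transparent"  # no object pixel (MOOD 1)
    if all(pixels):
        return "opaque"       # every pixel is an object pixel (MOOD 2)
    return "boundary"         # some, but not all, pixels are object pixels
```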
- According to the present invention, a target scene designated by a user is retrieved by using such mode information and shape motion vector information in shape information in MPEG4.
- An MPEG4 system is disclosed in “Standardization Trends in MPEG4 for Multimedia”, The Journal of The Institute of Image Information and Television Engineers, Vol. 51, No. 12, p. 1962, 1997. An outline of the MPEG4 system will be briefly described below. The MPEG4 system has an arrangement like the one shown in
FIG. 4 . - As shown in
FIG. 4 , in the MPEG4 system, a coder apparatus is comprised of a video object coder section 11 for coding a video object, an audio object coder section 12 for coding an audio object, a scene description object coder section 13 for coding a scene description object, and a media multiplexer section 14 for multiplexing and transmitting these coded objects. - A decoder apparatus is comprised of a
media demultiplexer section 15, a video object decoder section 16, an audio object decoder section 17, a scene description object decoder section 18, and an object reconstruction section 19. The media demultiplexer section 15 demultiplexes the multiplex data transmitted from the coder apparatus to obtain the original video object, audio object, and scene description object. The video object decoder section 16 decodes the coded video object demultiplexed by the media demultiplexer section 15 into the original video object. The audio object decoder section 17 decodes the coded audio object demultiplexed by the media demultiplexer section 15 into the original audio object. The scene description object decoder section 18 decodes the coded scene description object demultiplexed by the media demultiplexer section 15 into the original scene description object. The object reconstruction section 19 synthesizes the video and audio objects in accordance with the scene description object to reconstruct the picture to be displayed. - In the arrangement shown in
FIG. 4 , the supplied video and audio objects and the like are respectively coded by the corresponding coder sections. The media multiplexer section 14 multiplexes these coded objects with the scene description object, which is obtained by the scene description object coder section 13 and describes how the respective objects are synthesized and provided to a user. The multiplexed bit stream is then transmitted or stored. - On the decoder apparatus side, the
media demultiplexer section 15 demultiplexes this transmitted or stored bit stream into the respective objects. These objects are then reconstructed into the original objects by the corresponding object decoder sections. The object reconstruction section 19 synthesizes these objects in accordance with the scene description, and the display section presents the resultant information to the user. - The present invention will be described below with reference to the views of the accompanying drawing in consideration of the above outline of the MPEG4 system.
- A video picture retrieving apparatus according to an embodiment of the present invention has the arrangement shown in
FIG. 5 . More specifically, the video picture retrieving apparatus is basically comprised of a decoder section 101, a retrieval section 102, a retrieved result output section 103, and a retrieval key information input section 104. Of these components, the decoder section 101 serves to decode shape information. The decoder section 101 decodes the coded bit stream of an arbitrary shape object supplied through a coded bit stream input line 105 into shape information, and outputs the decoded shape information to a decoded information output line 106. - The
retrieval section 102 retrieves the picture or scene desired by the user from the shape information supplied through the decoded information output line 106. More specifically, when the user inputs conditions and the like for a desired picture or scene with the retrieval key information input section 104, the information is supplied as retrieval key information to the retrieval section 102 through a retrieval key information input line 107. The retrieval section 102 compares this retrieval key information with the shape information from the decoder section 101 to retrieve the desired picture or scene defined by the retrieval key information, and outputs the retrieved result to the retrieved result output section 103. The retrieved result output section 103 is, for example, a display or printer, and presents the retrieved result from the retrieval section 102 to the user. - There will now be described the flow of the above processing in conjunction with
FIGS. 7A and 7B . - First, as shown in
FIGS. 5 and 7A , key information is input by a user via the retrieval key information input section 104 (step F1). The bit stream is decoded every frame or every several frames (step F2). A desired scene is retrieved by the retrieval section 102, using the key information obtained in step F1 and the decoded result obtained in step F2 (step F3). When the retrieval result is obtained, the processing advances to step F4, wherein the retrieved result output section 103 provides the retrieved result. When the retrieval result is not obtained, the processing returns to step F2 to restart the decoding of the bit stream.
- In step F5, even when the user forcefully terminates a processing, the decoding is determined as having been completed. In this case, the processing may be cut off.
- In
FIG. 7A embodiment, the retrieved results are sequentially provided. In contrast,FIG. 7B embodiment provides the retrieved results together after the completion of decoding of the bit stream. In other words, first, key information is provided by a user via the retrieval key information input section (step F6). The bit stream is decoded every frame or every several frames in the decoder section 101 (step F7). - In step 8, a desired seine is retrieved by the
retrieval section 102 using the key information obtained in step F6 and the decoded result obtained in step F7. When the retrieval result is obtained, the processing advances to step F9; when the retrieval result is not obtained, the processing returns to step F7 to restart decoding of the bit stream. - In step F9, indexes indicating the retrieved results (e.g., the number (or time information) of the top frame of a scene obtained as a result) are sequentially created by the
retrieval section 102. The indexes are stored in the retrieval section 102 until they are requested by the retrieved result output section 103. - In step F10, it is determined whether or not the decoding of the entire bit stream has been completed in step F7. When the decoding of the entire bit stream is not completed, the processing returns to step F7 to restart the decoding; if the entire bit stream has been decoded, the processing is terminated. Even when the user forcibly terminates the processing in step F10, the decoding is determined to have been completed. In this case, the processing may be cut off.
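The FIG. 7B variant can be sketched in the same hedged style: collect indexes (here, the frame numbers of matching scenes) while decoding and provide them together at the end. All names are illustrative stand-ins, not a real MPEG4 API.

```python
# Sketch of the FIG. 7B flow: build an index list during decoding and
# return it after the whole stream has been processed (steps F6-F10,
# simplified).

def retrieve_batched(frames, key_info, matches):
    indexes = []                               # step F9: stored indexes
    for frame_no, shape in enumerate(frames):  # steps F7/F8
        if matches(shape, key_info):
            indexes.append(frame_no)
    return indexes                             # provided after step F10
```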
- This system having such an arrangement executes retrieval processing by using alpha-map data of the video data compressed/coded by MPEG4. The video data compressed/coded by MPEG4 has a picture component and an alpha-map information component obtained by binarizing an object shape or position information in the picture. The alpha-map information is therefore sent as the shape information A1 having the format shown in
FIG. 2C . This information is used for retrieval processing. - In this video picture retrieving apparatus, the coded bit stream of an arbitrary shape object as information of an alpha-map is supplied to the
decoder section 101 through the coded bit stream input line 105. The decoder section 101 decodes the coded bit stream into the shape information A1 and supplies the decoded shape information A1 to the retrieval section 102 through the decoded information output line 106. The retrieval section 102 compares the retrieval key information supplied from the user through the retrieval key information input line 107 with the shape information A1 supplied through the decoded information output line 106 to retrieve a desired picture or scene. - Assume that a given motion picture is compressed/coded by MPEG4, and the user wants to retrieve a picture of a close-up scene of a given character in the motion picture. In this case, if the user knows the overall contents of the motion picture and the picture layout of the desired scene, the user inputs information, e.g., the approximate size and location of the character in a picture, with the retrieval key information input section 104 (an input terminal, operation unit (not shown), or the like). This information is input as retrieval key information to the
retrieval section 102 through the retrieval key information input line 107. - The retrieval section 102 compares the retrieval key information from the user with the shape information A1 sequentially supplied through the decoded information output line 106 to search for information similar to the retrieval key information. If such information is present, the information is supplied to the retrieved result output section 103 through a retrieved result output line 108. The information is then presented by the retrieved result output section 103. That is, the information is displayed or printed. The presented information is a reconstructed MPEG4 picture at this time. Upon seeing this picture, the user can know whether the picture is the target picture. - Note that the
decoder section 101 may decode only the shape information A1 of the arbitrary shape object and retrieve the information instead of decoding all the object data. - A method of using only some of the three types of shape information A1 in MPEG4, i.e., “mode information S1”, “shape motion vector S2”, and “coded binary picture information S3”, is also available. Consider a case wherein information indicating the approximate position of a target object in a picture is supplied from the user as retrieval key information. In this case, since the target object is present in macroblocks in each of which the mode information S1 is set to one of mode 2 (MOOD 2) to mode 7 (MOOD 7), the
retrieval section 102 may extract a picture in which mode 2 (MOOD 2) to mode 7 (MOOD 7) are distributed so as to almost coincide with the retrieval key information, without completely reconstructing the shape. - For example, a scene corresponding to a request from the user to retrieve “a close-up scene” can be retrieved by searching for a scene in which the number of macroblocks corresponding to mode 2 (MOOD 2) to mode 7 (MOOD 7) gradually increases for every frame. A scene corresponding to a request to retrieve a scene including two objects can be retrieved by searching for a scene in which the macroblocks corresponding to mode 2 (MOOD 2) to mode 7 (MOOD 7) can be grouped into two sets.
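The close-up heuristic above can be sketched as follows: count the macroblocks whose mode is MOOD 2 to MOOD 7 (i.e., blocks containing object pixels) in each frame, and flag a scene in which that count grows frame after frame. The mode maps are modeled here as flat lists of integer MOOD values per frame, and the growth threshold is an illustrative choice.

```python
# Hedged sketch of close-up detection from mode information alone.

def object_mb_count(mode_map):
    """Number of macroblocks containing object pixels (MOOD 2..7)."""
    return sum(1 for mood in mode_map if 2 <= mood <= 7)

def looks_like_close_up(mode_maps, min_growth_frames=3):
    """True if the object-macroblock count keeps growing frame by frame."""
    counts = [object_mb_count(m) for m in mode_maps]
    growing = 0
    for prev, cur in zip(counts, counts[1:]):
        growing = growing + 1 if cur > prev else 0
        if growing >= min_growth_frames:
            return True
    return False
```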
- Assume that a predetermined scene continues, and a given frame is selected as a representative frame of the scene from a plurality of frames constituting the scene. In this case, when the
retrieval section 102 retrieves a frame having the maximum number of macroblocks corresponding to mode 2 (MOOD 2) to mode 7 (MOOD 7), the retrieved result output section 103 may display a close-up of the target object. In addition, the size of the object may be estimated by decoding at least the value of the size (the values of vop_width, vop_height) of the Bounding-Box and the value of the position (spatial_reference) thereof. In this case, the information of the reconstructed Bounding-Box is output from the line 106 shown in FIG. 5 . - According to the method of the above embodiment, when a picture including a target object is to be retrieved, and the user knows the approximate position of the object in the picture, the picture can be retrieved by determining the position of the object in the picture in accordance with the mode information contained in the shape information in a data format conforming to MPEG4. If, however, a rougher approximation of the position of the target object is acceptable, the target picture may be determined by decoding only the position vector.
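The representative-frame rule above can be sketched briefly: among the frames of a scene, pick the one with the most object macroblocks (MOOD 2 to MOOD 7), and estimate the object size from the decoded Bounding-Box values. The inputs here are illustrative; in the apparatus they would come from the decoder section 101 over the decoded information output line 106.

```python
# Sketch: representative-frame selection and coarse object-size
# estimation from decoded shape information.

def representative_frame(object_mb_counts):
    """Index of the frame with the most object macroblocks."""
    return max(range(len(object_mb_counts)), key=object_mb_counts.__getitem__)

def estimated_object_area(vop_width, vop_height):
    """Coarse size estimate from the Bounding-Box dimensions."""
    return vop_width * vop_height
```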
- A picture can also be retrieved by using state information as key information, e.g., information indicating that the object is gradually crushed in the vertical direction or information indicating that the shape abruptly changes. That is, by retrieving a target picture using state information as key information, the user can search out the corresponding picture.
- In MPEG4, the shape motion vector S2 indicates how the shape changes with time. If, therefore, key information indicating that an object is gradually crushed in the vertical direction is supplied, a corresponding motion vector may be searched out. If key information indicating that a shape abruptly changes is supplied, a scene whose motion vector abruptly changes may be searched out.
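Retrieval by shape change can be sketched as below. The per-frame value here is assumed to be the mean magnitude of the shape motion vectors (S2), and "abrupt" is taken to mean a frame-to-frame jump larger than a threshold; both the aggregation and the threshold are illustrative choices, not values taken from MPEG4.

```python
# Hedged sketch: flag frames where the shape motion abruptly changes.

def abrupt_change_frames(mean_mv_magnitudes, jump=5.0):
    """Indexes of frames whose mean shape-MV magnitude jumps sharply."""
    return [i + 1 for i, (prev, cur)
            in enumerate(zip(mean_mv_magnitudes, mean_mv_magnitudes[1:]))
            if abs(cur - prev) > jump]
```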
- The above retrieving method is used when the state of a picture is known. However, a target object or picture can be retrieved by using a camera parameter as retrieval information. A corresponding embodiment will be described below.
- Although a target object or picture is retrieved by using a camera parameter as retrieval information, since MPEG4 has no camera parameter as information, a camera parameter is estimated from a picture. When a camera parameter is supplied as retrieval key information, the
retrieval section 102 estimates a camera parameter from shape information (alpha-map) of MPEG4, and a picture is retrieved by using the estimated camera parameter as retrieval key information. This case will be described below as the first example. - In MPEG4, since shape information (alpha-map) is prepared, and the alpha-map is made up of a plurality of macroblocks, mode information of each of these macroblocks is used. More specifically, a zoom parameter for the camera can be estimated by obtaining a state in which the size of an object changes with time on the basis of the number of macroblocks of mode 2 (MOOD 2) to mode 7 (MOOD 7) or the value of (vop_width, vop_height).
- In addition, a pan/tilt parameter for the camera can be estimated by obtaining a change in the position of an object with time on the basis of shape motion vector information or position vector (spatial_reference).
- A method of obtaining a camera parameter will be described in detail below as the second example.
- To obtain a more precise camera parameter than that in the first example, decoded shape information is deformed by affine transform to perform matching between frames. With this operation, detailed camera parameters, such as “zoom”, “pan”, and “tilt”, can be obtained.
- The amount of processing for matching can be reduced by using only decoded pixel values in “boundary macroblocks” instead of using all the pixel values of decoded shape information.
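A much-simplified sketch of the second example follows: try a few candidate zoom factors, warp the previous frame's shape by each, and keep the best-matching factor. Restricting the comparison to pixels inside boundary macroblocks (the boundary_pixels set) reduces the work, as noted above. A real implementation would fit a full affine transform; the shapes here are modeled as sparse {(x, y): 0/1} dictionaries, which is an assumption of this sketch.

```python
# Hedged sketch: pick the zoom factor that best matches two shapes,
# comparing only pixels inside boundary macroblocks.

def best_zoom(prev_shape, cur_shape, boundary_pixels,
              candidates=(0.5, 1.0, 2.0)):
    def warped(s, x, y):
        # Value of prev_shape scaled by factor s, sampled at (x, y).
        return prev_shape.get((int(x / s), int(y / s)), 0)
    def score(s):
        return sum(1 for (x, y) in boundary_pixels
                   if warped(s, x, y) == cur_shape.get((x, y), 0))
    return max(candidates, key=score)
```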
- In the case described above, a camera parameter is estimated from shape information (alpha-map) of MPEG4, and a picture is retrieved by using the estimated camera parameter as retrieval key information. MPEG4 uses a technique of writing a scenario indicating how a target object in a picture is developed, and developing the picture according to the contents of the scenario. This scenario is implemented by information called a scene description object. The third example in which a target picture is retrieved from the information of this scene description object will be described next.
-
FIG. 6 shows a selecting section for selecting a representative frame as a unit for presenting a retrieval result from information of a scene description object. This selecting section includes a scene description object output section 201, an object synthesis section 202, and a display section 203. In this case, the scene description object output section 201 outputs information as a scenario which has been written by a contents producer to designate the composition of a picture. - In a coding scheme based on MPEG4, a plurality of objects (for example, objects A and B in
FIG. 6 ) are generally reconstructed by the decoder section 101 which has decoded a bit stream. These objects are synthesized in accordance with the scene description object output from the scene description object output section 201. Thereafter, the resultant object is supplied to the display section 203 to be presented to the user. In this manner, the object synthesis section 202 synthesizes objects and outputs the resultant object. - In this case, the data from the scene description
object output section 201 is multiplexed with data of another object and supplied. The display section 203 may or may not be identical to the retrieved result output section 103. - In the third example, when a given frame is to be selected from a predetermined scene to be displayed as a representative frame of the scene on the retrieved
result output section 103, the scene description object decoded by the scene description object decoder section 18 on the decoder apparatus side is supplied from the scene description object output section 201 to the object synthesis section 202 through a scene description object input line 204. - The
object synthesis section 202 analyzes the information (e.g., “enlarging and displaying object B” or “synthesizing object A with the foreground of object B”) of a scene description object to search for a frame coinciding with a predetermined condition, and sets the frame as a representative frame. - The above “predetermined condition” is, for example, a condition indicating that when a specific object is closed up, the area of the object is computed and a frame corresponding to the maximum area of the object is set as a representative frame in the
object synthesis section 202. - As described above, according to the present invention, by using shape information (size, shape, motion, and position in a picture) of an object, sophisticated video picture retrieval can be implemented in consideration of the contents of a picture without requiring any complicated signal processing unit.
- In the above embodiments, the retrieval operation is performed using the shape information provided in the macroblock. However, the retrieval operation may be performed using the header information. In this case, the header block includes the information spatial_reference, vop_width, and vop_height shown in
FIG. 1 . The retrieval operation is performed on the basis of the above header information. In other words, the video picture may be retrieved using the position of the object within the frame, which is indicated by the information spatial_reference, the horizontal size of the object, which is indicated by the information vop_width, the vertical size of the object, which is indicated by the information vop_height, and the area of the bounding box surrounding the object, which is indicated by vop_width and vop_height. - As has been described in detail, according to the present invention, by using shape information (size, shape, motion, and position in a picture) of an object, there is provided a video picture retrieving method and apparatus which can implement sophisticated video picture retrieval in consideration of the contents of a picture without requiring any complicated signal processing unit.
- Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
Claims (6)
1-10. (canceled)
11. A method of retrieving a video picture, comprising the steps of:
decoding a coded bit stream of video data representing an arbitrary shape object and including shape information, texture information, and scene description information of a scene description object;
synthesizing a plurality of decoded objects in accordance with the scene description information;
displaying a synthesized scene obtained by the synthesizing step;
analyzing the scene description information;
comparing the information with a predetermined condition; and
selecting a frame meeting a predetermined condition as a representative frame.
12-21. (canceled)
22. An apparatus for retrieving a video picture, comprising:
a decoder section which decodes a coded bit stream of video data representing an arbitrary shape object and including shape information, texture information, and information of a scene description object;
a synthesizing section which synthesizes a plurality of decoded objects in accordance with the information of the scene description object;
a display unit which displays the synthesized scene; and
a selecting section which analyzes the scene description information, compares the information with a predetermined condition, and selects a frame meeting the predetermined condition as a representative frame.
23. The method according to claim 11 , wherein the predetermined condition is a condition indicating that when a specific object is closed up, a frame corresponding to a maximum area of the object is set as a representative frame.
24. The apparatus according to claim 22 , wherein the predetermined condition is a condition indicating that when a specific object is closed up, a frame corresponding to a maximum area of the object is set as a representative frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/328,054 US20060109908A1 (en) | 1998-07-31 | 2006-01-10 | Method of retrieving video picture and apparatus therefor |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP21740898A JP2000050258A (en) | 1998-07-31 | 1998-07-31 | Video retrieval method and video retrieval device |
JP10-217408 | 1998-07-31 | ||
US09/363,881 US7020192B1 (en) | 1998-07-31 | 1999-07-30 | Method of retrieving video picture and apparatus therefor |
US11/328,054 US20060109908A1 (en) | 1998-07-31 | 2006-01-10 | Method of retrieving video picture and apparatus therefor |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/363,881 Division US7020192B1 (en) | 1998-07-31 | 1999-07-30 | Method of retrieving video picture and apparatus therefor |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060109908A1 true US20060109908A1 (en) | 2006-05-25 |
Family
ID=16703740
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/363,881 Expired - Fee Related US7020192B1 (en) | 1998-07-31 | 1999-07-30 | Method of retrieving video picture and apparatus therefor |
US11/328,054 Abandoned US20060109908A1 (en) | 1998-07-31 | 2006-01-10 | Method of retrieving video picture and apparatus therefor |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/363,881 Expired - Fee Related US7020192B1 (en) | 1998-07-31 | 1999-07-30 | Method of retrieving video picture and apparatus therefor |
Country Status (3)
Country | Link |
---|---|
US (2) | US7020192B1 (en) |
EP (1) | EP0977135A3 (en) |
JP (1) | JP2000050258A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809248A (en) * | 2015-05-18 | 2015-07-29 | 成都索贝数码科技股份有限公司 | Video fingerprint extraction and retrieval method |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2391677B (en) * | 1999-07-05 | 2004-05-12 | Mitsubishi Electric Inf Tech | Method and apparatus for representing and searching for an object in an image |
CN1178512C (en) * | 2000-05-15 | 2004-12-01 | 松下电器产业株式会社 | Image decoding method and device and image decoding program recording medium |
JP2001333389A (en) * | 2000-05-17 | 2001-11-30 | Mitsubishi Electric Research Laboratories Inc | Video reproduction system and method for processing video signal |
US6724933B1 (en) * | 2000-07-28 | 2004-04-20 | Microsoft Corporation | Media segmentation system and related methods |
US6774908B2 (en) * | 2000-10-03 | 2004-08-10 | Creative Frontier Inc. | System and method for tracking an object in a video and linking information thereto |
JP3636983B2 (en) | 2000-10-23 | 2005-04-06 | 日本放送協会 | Encoder |
US20030098869A1 (en) * | 2001-11-09 | 2003-05-29 | Arnold Glenn Christopher | Real time interactive video system |
KR100682911B1 (en) * | 2004-12-31 | 2007-02-15 | 삼성전자주식회사 | Method and apparatus for MPEG-4 encoding/decoding |
US8341152B1 (en) | 2006-09-12 | 2012-12-25 | Creatier Interactive Llc | System and method for enabling objects within video to be searched on the internet or intranet |
JP5013840B2 (en) * | 2006-12-12 | 2012-08-29 | ヤフー株式会社 | Information providing apparatus, information providing method, and computer program |
JP2008278467A (en) * | 2007-03-30 | 2008-11-13 | Sanyo Electric Co Ltd | Image processing apparatus, and image processing method |
CA2689065C (en) * | 2007-05-30 | 2017-08-29 | Creatier Interactive, Llc | Method and system for enabling advertising and transaction within user generated video content |
CA2820461A1 (en) * | 2010-12-10 | 2012-06-14 | Delta Vidyo, Inc. | Video stream presentation system and protocol |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5978034A (en) * | 1997-02-20 | 1999-11-02 | Sony Corporation | Moving picture encoding method and apparatus, moving picture decoding method and apparatus and recording medium |
US5978048A (en) * | 1997-09-25 | 1999-11-02 | Daewoo Electronics Co., Ltd. | Method and apparatus for encoding a motion vector based on the number of valid reference motion vectors |
US6011872A (en) * | 1996-11-08 | 2000-01-04 | Sharp Laboratories Of America, Inc. | Method of generalized content-scalable shape representation and coding |
US6049567A (en) * | 1997-10-14 | 2000-04-11 | Daewoo Electronics Co., Ltd. | Mode coding method in a binary shape encoding |
US6055330A (en) * | 1996-10-09 | 2000-04-25 | The Trustees Of Columbia University In The City Of New York | Methods and apparatus for performing digital image and video segmentation and compression using 3-D depth information |
US6301303B1 (en) * | 1997-03-20 | 2001-10-09 | Hyundai Electronics Industries Co. Ltd. | Method and apparatus for predictively coding shape information of video signal |
US6377309B1 (en) * | 1999-01-13 | 2002-04-23 | Canon Kabushiki Kaisha | Image processing apparatus and method for reproducing at least an image from a digital data sequence |
US6381277B1 (en) * | 1997-12-12 | 2002-04-30 | Hyundai Electronics Ind. Co., Ltd. | Shaped information coding device for interlaced scanning video and method therefor |
US6389168B2 (en) * | 1998-10-13 | 2002-05-14 | Hewlett Packard Co | Object-based parsing and indexing of compressed video streams |
US6408025B1 (en) * | 1997-01-31 | 2002-06-18 | Siemens Aktiengesellschaft | Method and configuration for coding and decoding digitized pictures |
US6665318B1 (en) * | 1998-05-15 | 2003-12-16 | Hitachi, Ltd. | Stream decoder |
US6711379B1 (en) * | 1998-05-28 | 2004-03-23 | Kabushiki Kaisha Toshiba | Digital broadcasting system and terminal therefor |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6055335A (en) | 1994-09-14 | 2000-04-25 | Kabushiki Kaisha Toshiba | Method and apparatus for image representation and/or reorientation |
US6335985B1 (en) | 1998-01-07 | 2002-01-01 | Kabushiki Kaisha Toshiba | Object extraction apparatus |
1998
- 1998-07-31 JP JP21740898A patent/JP2000050258A/en active Pending
1999
- 1999-07-30 US US09/363,881 patent/US7020192B1/en not_active Expired - Fee Related
- 1999-07-30 EP EP19990306075 patent/EP0977135A3/en not_active Withdrawn
2006
- 2006-01-10 US US11/328,054 patent/US20060109908A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
EP0977135A3 (en) | 2003-11-26 |
EP0977135A2 (en) | 2000-02-02 |
US7020192B1 (en) | 2006-03-28 |
JP2000050258A (en) | 2000-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20060109908A1 (en) | Method of retrieving video picture and apparatus therefor | |
US9420309B2 (en) | Generalized scalability for video coder based on video objects | |
US6483874B1 (en) | Efficient motion estimation for an arbitrarily-shaped object | |
US6400768B1 (en) | Picture encoding apparatus, picture encoding method, picture decoding apparatus, picture decoding method and presentation medium | |
US6466624B1 (en) | Video decoder with bit stream based enhancements | |
US6404814B1 (en) | Transcoding method and transcoder for transcoding a predictively-coded object-based picture signal to a predictively-coded block-based picture signal | |
US6259828B1 (en) | Sprite-based video coding system with automatic segmentation integrated into coding and sprite building processes | |
KR101177663B1 (en) | Method and system for digital decoding 3d stereoscopic video images | |
US6507618B1 (en) | Compressed video signal including independently coded regions | |
US6993201B1 (en) | Generalized scalability for video coder based on video objects | |
US7084877B1 (en) | Global motion estimation for sprite generation | |
JP2000023193A (en) | Method and device for picture encoding, method and device for picture decoding and provision medium | |
JP2000023194A (en) | Method and device for picture encoding, method and device for picture decoding and provision medium | |
US8571101B2 (en) | Method and system for encoding a video signal, encoded video signal, method and system for decoding a video signal | |
US11915390B2 (en) | Image processing device and method | |
Kim et al. | Efficient patch merging for atlas construction in 3DoF+ video coding | |
Teixeira et al. | Video compression: The MPEG standards |
JP3439146B2 (en) | Method and apparatus for extracting color difference signal shape information for interlaced scanning video | |
Li et al. | MPEG Video Coding: MPEG-1, 2, 4, and 7 | |
Puri et al. | Performance evaluation of the MPEG-4 visual coding standard | |
Guaragnella et al. | Object oriented motion estimation by sliced-block matching algorithm | |
JP2011193486A (en) | System and method for stereoscopic 3d video image digital decoding | |
O’Connor et al. | Current Developments in MPEG-4 Video | |
Hidalgo | On the Synergy between Indexing and Compression Representations for Video Sequences | |
Cooray | Semi-automatic video object segmentation for multimedia applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |