US20070024756A1 - System and method for defocus difference matting - Google Patents

Publication number
US20070024756A1
US20070024756A1 US11/193,742 US19374205A US2007024756A1 US 20070024756 A1 US20070024756 A1 US 20070024756A1 US 19374205 A US19374205 A US 19374205A US 2007024756 A1 US2007024756 A1 US 2007024756A1
Authority
US
United States
Prior art keywords
aperture
scene
image
images
narrow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US11/193,742
Other versions
US7408591B2 (en)
Inventor
Wojciech Matusik
Morgan McGuire
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Research Laboratories Inc
Original Assignee
Mitsubishi Electric Research Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Research Laboratories Inc
Priority to US11/193,742 (patent US7408591B2)
Assigned to MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC. Assignment of assignors interest (see document for details). Assignors: MATUSIK, WOJCIECH
Assigned to MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC. Assignment of assignors interest (see document for details). Assignors: MCGUIRE, MORGAN
Priority to JP2006188917A (patent JP2007043686A)
Publication of US20070024756A1
Application granted
Publication of US7408591B2
Expired - Fee Related
Adjusted expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272: Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • H04N5/275: Generation of keying signals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence

Definitions

  • FIGS. 3C and 3D show another embodiment with multiple aperture settings possible.
  • This aperture includes polarizers 321 , 323 , 325 , and polarization rotators 322 and 324 .
  • Two through holes 331-332 are shown to provide three aperture settings. It should be understood that any number of fast switching aperture settings can be provided in this manner.
  • the sizes of the apertures decrease in a direction of the light field passing through the aperture.
  • FIG. 3E shows a configuration where the through holes are slits 341 - 342 spaced apart at varying distances. This can be used to separate low (DC) and high (AC) frequency components in a light field.
  • FIG. 3F shows an embodiment where the one or more through holes 351 are offset from the center (optical axis).
  • FIG. 3G shows an embodiment where the through hole 461 is a torus.
  • FIG. 5 shows the steps of the basic method.
  • a known background pattern can be used to guarantee that |B − B_F| is substantially larger than zero, see below.
  • alpha values can be interpolated from the neighboring pixels.
  • known scattered data interpolation methods e.g., push-pull as described by Gortler et al., “The lumigraph,” Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press, pp. 43-54, 1996, incorporated herein by reference.
  • the background can be illuminated using a projector, or the background can include a known pattern so that the expression used to determine α is well-conditioned.
  • the denominator |B − B_F| determines the precision of α.
  • the background can have values from B_min to B_max.
  • the image B_F is a low-pass version of the background.
  • Optimal values for B are B_min and B_max.
  • the value of the denominator is always 0.5 × (B_max − B_min).
  • B_min = 0
  • the denominator is reduced to 0.5 × B_max and one bit of alpha precision is lost, e.g., if B_max is an 8 bit value, then the maximum precision of alpha is 7 bits.
  • the minimum frequency of the pattern is k ⁇ 0.5 pixels. If the large depth-of-field pixels are aligned with the pattern, then the maximum frequency of the pattern is one pixel. In cases of higher frequencies, different pattern values are averaged. The use of color improves the conditioning of the problem when the pixels are misaligned with pattern transitions. It is desired to shift the pattern for different colors such that the value of the denominator is always large for at least one color.
  • a pattern in one dimension for the color red 401 is shifted ahead by 1/4 of the pattern period with respect to the pattern 402 for the blue color and the pattern 403 for the green color.
  • repeated vertical bars 410 of the colors red, white, green/blue, and black are used to produce the desired high frequency background pattern.
  • the bars are about 2-3 millimeters wide. It should be understood that the pattern can easily be printed on wallpaper for covering an entire sound stage.
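
The bar sequence described above (red, white, green/blue, black) is what results when a red square wave is combined with a green/blue square wave delayed by a quarter period. The following is a minimal sketch of such a pattern, not the patented implementation; the function name `bar_pattern`, the parameter `bar_px`, and the choice of full-intensity 8-bit values are assumptions made for illustration.

```python
import numpy as np

def bar_pattern(width_px, height_px, bar_px):
    """Return an (H, W, 3) uint8 RGB image of repeating vertical bars.

    One period is four bars wide. Red is a square wave that is high for
    the first half of the period; green and blue share the same square
    wave delayed by one bar, i.e. a quarter period.
    (bar_px and the 0/255 levels are illustrative assumptions.)
    """
    x = np.arange(width_px)
    phase = (x // bar_px) % 4                    # bar index within a period
    red = phase < 2                              # high for bars 0 and 1
    green_blue = (phase == 1) | (phase == 2)     # high for bars 1 and 2
    row = np.stack([red, green_blue, green_blue], axis=-1).astype(np.uint8) * 255
    return np.tile(row[None, :, :], (height_px, 1, 1))

# Two-pixel-wide bars: red, white, green/blue, black, then repeating.
pattern = bar_pattern(width_px=16, height_px=2, bar_px=2)
```

With this layout, at least one color channel changes value at every bar boundary.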

Abstract

A method extracts an alpha matte from a video acquired of a scene. A set of pinhole or narrow aperture images I_P is acquired of the scene with a camera aperture set to a relatively large depth-of-field. The scene includes a background B and a foreground F. A corresponding set of wide aperture images I_F is acquired of the scene with the camera aperture set to a relatively small depth-of-field. The respective pinhole and wide aperture images are combined to extract an alpha matte according to
$$\alpha = 1 + \frac{I_F - I_P}{B - B_F}.$$
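
The formula in the abstract can be evaluated per pixel. The sketch below is one possible reading, assuming single-channel floating-point images of equal shape; the function name `defocus_difference_alpha`, the threshold `eps`, the NaN marking of ill-conditioned pixels, and the clamping to [0, 1] are illustrative assumptions and are not taken from the patent text.

```python
import numpy as np

def defocus_difference_alpha(i_p, i_f, b, b_f, eps=1e-3):
    """Evaluate alpha = 1 + (I_F - I_P) / (B - B_F) per pixel.

    Pixels where |B - B_F| < eps are returned as NaN, because the
    quotient is ill-conditioned there (eps is an assumed threshold).
    Valid results are clamped to [0, 1] (also an assumption).
    """
    i_p, i_f, b, b_f = (np.asarray(a, dtype=np.float64)
                        for a in (i_p, i_f, b, b_f))
    denom = b - b_f
    valid = np.abs(denom) >= eps
    alpha = np.full(denom.shape, np.nan)
    # Divide only where the denominator is usable; other entries stay NaN.
    np.divide(i_f - i_p, denom, out=alpha, where=valid)
    alpha[valid] = np.clip(1.0 + alpha[valid], 0.0, 1.0)
    return alpha
```

Pixels returned as NaN would need to be filled in by some other means, for example from neighboring pixels.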

Description

    FIELD OF THE INVENTION
  • This invention relates generally to image editing, and more particularly to matting.
  • BACKGROUND OF THE INVENTION
  • Matting and compositing are frequently used in image and video editing, 3D photography, and film production. Matting separates a foreground region from an input image by estimating a color F and an opacity α for each pixel in the image. Compositing uses the matte to blend the extracted foreground with a novel background to produce an output image representing a novel scene. The opacity α measures a ‘coverage’ of the foreground region, due to either partial spatial coverage or partial temporal coverage, i.e., motion blur. The set of all opacity values α is called the alpha matte, the alpha channel, or simply a matte.
  • The matting problem can be formulated as follows. An image of a foreground against an opaque black background in a scene is αF. An image of the background without the foreground is B. An alpha image, where each pixel represents a partial coverage of that pixel by the foreground, is α. The image α is essentially an image of the foreground object ‘painted’ white, evenly lit, and held against the opaque background. The scale and resolution of the foreground and background images can differ due to perspective foreshortening.
  • The notions of an alpha matte, pre-multiplied alpha, and the algebra of composition have been formalized by Porter et al., “Compositing digital images,” in Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press, pp. 253-259, 1984. They showed that for a pinhole (narrow aperture) camera, the image αF in front of the background image B can be expressed 501 (see FIG. 5) by a linear interpolation:
    $$I_P = \alpha F + (1 - \alpha) B \qquad (1)$$
    where I_P is a pinhole (narrow aperture) image, αF is the pre-multiplied image of the foreground against an opaque background, and B is the image of the opaque background in the absence of the foreground.
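
As an illustration of Equation (1), the following sketch composites a foreground over a background with NumPy. It assumes floating-point RGB images of shape (H, W, 3), an alpha image of shape (H, W), and a foreground F that is not yet pre-multiplied; the function name `composite` is hypothetical.

```python
import numpy as np

def composite(alpha, foreground, background):
    """Equation (1): I_P = alpha * F + (1 - alpha) * B, per pixel and channel."""
    # Add a trailing axis so alpha broadcasts over the color channels.
    alpha = np.asarray(alpha, dtype=np.float64)[..., None]
    foreground = np.asarray(foreground, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    return alpha * foreground + (1.0 - alpha) * background

# A 1x2 image: pure red foreground over pure blue background.
alpha = np.array([[0.0, 0.5]])
red = np.tile([1.0, 0.0, 0.0], (1, 2, 1))
blue = np.tile([0.0, 0.0, 1.0], (1, 2, 1))
image = composite(alpha, red, blue)
```

Where alpha is 0 the background shows through unchanged, and where alpha is 0.5 the two colors are mixed equally.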
  • Matting is the inverse problem of solving for the unknown values of the variables (α, F_r, F_g, F_b, B_r, B_g, B_b), given the composite image pixel values (I_{P,r}, I_{P,g}, I_{P,b}), where r, g, and b are color channels. The ‘P’ subscript denotes that Equation (1) holds for a pinhole camera, i.e., where the entire scene is in focus. One can approximate a pinhole camera with a very narrow aperture. Blue screen matting is easier to solve because the background color B is known.
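
Written out per color channel, Equation (1) makes the count of equations and unknowns explicit. The subscript notation I_{P,r} for the red channel of I_P is an assumed convention.

```latex
\begin{aligned}
I_{P,r} &= \alpha F_r + (1 - \alpha) B_r \\
I_{P,g} &= \alpha F_g + (1 - \alpha) B_g \\
I_{P,b} &= \alpha F_b + (1 - \alpha) B_b
\end{aligned}
```

Each pixel therefore supplies three equations for seven unknowns. When B is known, as in blue screen matting, four unknowns remain against the same three equations.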
  • Matting is described generally by Smith et al., “Blue screen matting,” Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM Press, pp. 259-268; and U.S. Pat. No. 4,100,569, “Comprehensive electronic compositing system,” issued to Vlahos on Jul. 11, 1978.
  • Conventional matting requires a background with known, constant color, which is referred to as blue screen matting. If a digital camera is used, then a green matte is preferred. Blue screen matting is the predominant technique in the film and broadcast industry. For example, broadcast studios use blue screen matting for presenting weather reports. The background is a blue screen, and the foreground region includes the weatherman standing in front of the blue screen. The foreground is extracted, and then superimposed onto a weather map so that it appears that the weatherman is actually standing in front of the map. However, blue screen matting is costly and not readily available to casual users. Even production studios would prefer a lower-cost and less intrusive alternative.
  • Ideally, one would like to extract a high-quality matte from an image or video with an arbitrary, i.e., unknown, background. This process is known as natural image matting. Recently, there has been substantial progress in this area, Ruzon et al., “Alpha estimation in natural images,” CVPR, vol. 1, pp. 18-25, 2000; Hillman et al., “Alpha channel estimation in high resolution images and image sequences,” Proceedings of IEEE CVPR 2001, IEEE Computer Society, vol. 1, pp. 1063-1068, 2001; Chuang et al., “A Bayesian approach to digital matting,” Proceedings of IEEE CVPR 2001, IEEE Computer Society, vol. 2, pp. 264-271, 2001; Chuang et al., “Video matting of complex scenes,” ACM Trans. on Graphics 21, 3, pp. 243-248, July, 2002; and Sun et al., “Poisson matting,” ACM Trans. on Graphics, August 2004. Unfortunately, all of those methods require substantial manual intervention, which becomes prohibitive for long image sequences and for non-professional users. The difficulty arises because matting from a single image is fundamentally under-constrained.
  • It is desired to perform matting using non-intrusive techniques. That is, the scene does not need to be modified. It is also desired to perform the matting automatically. Furthermore, it is desired to provide matting for ‘rich’ natural images, i.e., images with a lot of fine, detailed structure.
  • Most natural image matting methods require manually defined trimaps to determine the distribution of color in the foreground and background regions. A trimap segments an image into background, foreground and unknown pixels. Using the trimaps, those methods estimate likely values of the foreground and background colors of unknown pixels, and use the colors to solve the matting Equation (1).
  • Bayesian matting, and its extension to image sequences, produces the best results in many applications. However, those methods require manually defined trimaps for key frames. This is tedious for long image sequences. It is desired to provide a method that does not require user intervention, and that can operate in real-time as an image sequence is acquired.
  • The prior art estimation of the color distributions works only when the foreground and background are sufficiently different in a neighborhood of an unknown pixel. Therefore, it is desired to provide a method that can extract a matte where the foreground and background pixels have substantially similar color distributions.
  • The Poisson matting of Sun et al. solves a Poisson equation for the matte by assuming that the foreground and background are slowly varying. Their method interacts closely with the user by beginning from a manually constructed trimap. They also provide ‘painting’ tools to correct errors in the matte.
  • An unassisted, natural video matting system is described by Zitnick et al., “High-quality video view interpolation using a layered representation,” ACM Trans. on Graphics 23, 3, pp. 600-608, 2004. They acquire videos with a horizontal row of eight cameras spaced over about two meters. They measure depth discrepancies from stereo disparity using sophisticated region processing, and then construct a trimap from the depth discrepancies. The actual matting is determined by the Bayesian matting of Chuang et al. However, that method has the view dependent problems that are unavoidable with stereo cameras, e.g., reflections, specular highlights, and occlusions. It is desired to avoid view dependent problems.
  • Difference matting, also known as background subtraction, solves for α and the alpha-multiplied foreground αF, given background and trimap images, Qian et al., “Video background replacement without a blue screen,” Proceedings of ICIP, vol. 4, pp. 143-146, 1999. However, difference matting has limited discrimination at the borders of the foreground.
  • Another method uses back lighting to determine the matte. Back lighting is a common segmentation method used in many computer vision systems. Back lighting has also been used in image-based rendering systems, Debevec et al., “A lighting reproduction approach to live action compositing,” ACM Transactions on Graphics 21, 3, pp. 547-556, 2002. That method has two drawbacks. First, active illumination is required, and second, incorrect results may be produced near object boundaries because some objects become highly reflective near grazing angles of the light.
  • Scene reconstruction is described by Favaro et al., “Seeing beyond occlusions (and other marvels of a finite lens aperture),” Proc. of the IEEE Intl. Conf. on Computer Vision and Pattern Recognition, p. 579, 2003. That method uses defocused images and gradient descent minimization of a sum-squared error. The method solves for coarse depth and a binary alpha.
  • Another method uses a depth-from-focus system to recover overlapping objects with fractional alphas, Schechner et al., “Separation of transparent layers using focus,” International Journal of Computer Vision, pp. 25-39, 2000. They position a motorized CCD axially behind a lens to acquire images with slightly varying points of focus. Depth is recovered by selecting the image plane location that has the best focused image. That method is limited to static scenes.
  • Another method uses three video streams acquired by three cameras with different depth-of-field and focus and that share the same center of projection to extract mattes for scenes with unconstrained, dynamic backgrounds, McGuire et al., “Defocus Video Matting,” ACM Transactions on Graphics 24, 3, 2005; and U.S. patent application Ser. No. 11/092,376, filed by McGuire et al. on Mar. 29, 2005, “System and Method for Image Matting.”
  • SUMMARY OF THE INVENTION
  • Matting is a process for extracting a high-quality alpha matte and foreground from an image or a video. Conventional techniques require either a known background, e.g., a blue screen, or extensive manual interaction, e.g., manually specified foreground and background regions. Matting is generally under-constrained, because not enough information is obtained when the images are acquired.
  • One embodiment of the invention provides a system and method for extracting a matte automatically from a video. The video includes sets of pinhole (narrow aperture) images and wide aperture images (frames) that are produced either in parallel or in a time-interleaved manner.
  • The parallel sets of images can be acquired with a camera having two optical systems that have a common center of projection. One optical system has a large depth-of-field to acquire the pinhole images, while the other optical system has a small depth-of-field to acquire the wide aperture images.
  • A single camera can acquire the time-interleaved images using a fast switching aperture. The aperture includes polarizing elements that can rapidly switch between different aperture sizes. As an advantage, the aperture size is manipulated using optical techniques. Thus, the aperture does not require any moving parts, and can be switched at rates far exceeding mechanical apertures.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a method for extracting a matte from a video according to an embodiment of the invention;
  • FIG. 2 is a block diagram of a method for extracting a matte from a video according to an embodiment of the invention;
  • FIG. 3A is an exploded diagram of a camera aperture according to an embodiment of the invention;
  • FIG. 3B is a side view of the camera aperture of FIG. 3A;
  • FIG. 3C is an exploded diagram of a camera aperture according to an embodiment of the invention;
  • FIG. 3D is a side view of the camera aperture of FIG. 3C;
  • FIG. 3E is an exploded view of a camera aperture in the form of slits;
  • FIG. 3F is a view of an aperture offset from the optical axis;
  • FIG. 3G is a view of a camera aperture in the form of a torus;
  • FIG. 4 is a diagram of a high frequency background pattern according to an embodiment of the invention; and
  • FIG. 5 is a block diagram of a method for extracting a matte according to one embodiment of the invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • System Structure
  • FIG. 1 shows a system 100 and method 500 for automatically extracting a matte 141 from a video 110 acquired of a scene 120 according to an embodiment of our invention. The scene 120 includes a background (B) 121 and a foreground (F) 122. The scene can be a natural, real-world scene illuminated only by ambient light.
  • The video 110 is acquired by a camera 101 including a pinhole (narrow aperture) optical system 102 and a foreground optical system 103. The optical systems 102-103 have a single center of projection on an optical axis 160, and use a beam splitter 151. The optical systems are calibrated with respect to each other. The video 110 is provided to a processor 140 performing the method 500.
  • The video 110 includes sets of images 111-112 acquired in parallel in time. The set of images I_P 111 is acquired at a large depth-of-field 131, i.e., the images I_P 111 are acquired with a very narrow aperture focused at the foreground. The images I_P 111 can be approximated using a pinhole camera model, see Equation (1). A corresponding set of wide aperture images I_F 112 is acquired in parallel with a small depth-of-field 132 focused at the foreground.
  • FIG. 2 shows another embodiment of the invention. The camera 201 uses a single optical system, and the images I_P and I_F of the sets 111-112 are serially interleaved in time.
  • Because the images are interleaved in time, pairs of corresponding narrow aperture images I_P and wide aperture images I_F may not be registered when the scene 120 includes moving objects. In this case, a conventional optical flow process can be used to register the sets of images I_P and I_F.
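
A time-interleaved video has to be separated into corresponding narrow and wide aperture frames before any matte can be computed. The sketch below shows one way to do the pairing, assuming the frames strictly alternate; the function name `split_interleaved` and the flag `narrow_first` are hypothetical, and the optical flow registration mentioned above is not performed here.

```python
def split_interleaved(frames, narrow_first=True):
    """Pair up a time-interleaved sequence as (narrow, wide) tuples.

    Assumes strict alternation between narrow aperture and wide aperture
    frames. A trailing frame without a partner is dropped.
    """
    first, second = frames[0::2], frames[1::2]
    pairs = list(zip(first, second))
    if narrow_first:
        return pairs
    # The sequence started with a wide aperture frame: swap each pair.
    return [(narrow, wide) for wide, narrow in pairs]

# Placeholder strings stand in for image arrays.
pairs = split_interleaved(["P0", "F0", "P1", "F1", "P2"])
```

Each returned pair holds frames captured at slightly different times, so moving objects may still need to be registered between them.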
  • Camera Aperture
  • Therefore, as shown in FIG. 3A, the camera 201 uses a fast switching aperture 300. Most conventional camera apertures use a mechanical shutter. The speed at which the mechanical shutter can open and close is limited by the weight of the leaves of the aperture and the strengths of the springs and actuators driving the leaves of the aperture. Even if the aperture mechanism were made very strongly, and driven with high-energy actuators, the resulting large mechanical motions would induce vibrations in the camera assembly blurring the image.
  • Most conventional camera apertures are mechanical, and include moving parts. There are two major problems with such apertures. First, the apertures are relatively slow to switch to different depths of field, and second, the rapid movement of the parts causes vibration in the camera body, which adds noise to the images, particularly if the imager is a CCD type of device. Therefore, it is desired to provide a fast switching camera aperture that operates on optical, and not mechanical, principles.
  • The aperture 300 can switch size at a frame rate of the camera 201 or higher, e.g., at a rate up to about 10 kHz. The aperture mechanism 300 includes two polarizers 301 and 303 having a first diameter D. The second polarizer 303 has a pinhole or narrow aperture (through hole) 304 having a second diameter d. The polarizations of the polarizers 301 and 303 are rotated 90° with respect to each other, as shown. Therefore, light only passes through the pinhole 304.
  • A polarization rotator element 302, also having a diameter D, is disposed between the polarizers 301 and 303. The element 302 rotates the polarization of the light field passing through it by 90° when a voltage (V) is applied to the polarization rotator 302. For example, the element is a ferroelectric liquid crystal. Thus, when the voltage is applied to the element, the camera 201 has a large aperture diameter D because light passes through all three elements 301-303. Otherwise, absent the voltage, the light only passes through the pinhole 304 having an aperture diameter d. Commercial ferroelectric liquid crystal devices can switch from a zero-rotation state to a 90-degree rotation state in less than 10 microseconds with power inputs on the order of a few volts at a fraction of a milliamp. Other polarization rotators can also be used. For example, a Kerr cell can rotate polarization. Conventional nematic, supertwist liquid crystals can also rotate the polarization.
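  • The two states of the aperture 300 can be summarized in a small sketch. The function names are hypothetical, and the light ratio is an idealization that we assume for illustration: it scales light with aperture area and ignores polarizer losses, which are not quantified in the description.

```python
def effective_aperture_diameter(voltage_applied, D, d):
    """Diameter through which light reaches the imager.

    With the voltage applied, the rotator turns the polarization by 90 degrees,
    so light passes the crossed second polarizer over the full diameter D.
    Without the voltage, light passes only through the through hole of diameter d.
    """
    if not 0 < d < D:
        raise ValueError("expected 0 < d < D")
    return D if voltage_applied else d


def ideal_light_ratio(D, d):
    """Idealized ratio of light gathered in the wide versus the narrow state."""
    return (D / d) ** 2
```

  • For example, with D = 25 and d = 2.5 in the same units, the wide state gathers about 100 times the light of the narrow state under this idealization.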
  • FIG. 3B is an end view of the aperture shown in FIG. 3A.
  • FIGS. 3C and 3D show another embodiment with multiple aperture settings possible. This aperture includes polarizers 321, 323, 325, and polarization rotators 322 and 324. Two through holes 331-332 are shown to provide three aperture settings. It should be understood that any number of fast switching aperture settings can be provided in this manner. The sizes of the apertures decrease in a direction of the light field passing through the aperture.
  • FIG. 3E shows a configuration where the through holes are slits 341-342 spaced apart at varying distances. This can be used to separate low (DC) and high (AC) frequency components in a light field. FIG. 3F shows an embodiment where the one or more through holes 351 are offset from the center (optical axis). FIG. 3G shows an embodiment where the through hole 461 is a torus.
  • It should be noted that if multiple through holes are used, it is actually possible to move the aperture along the optical axis, effectively changing the focal plane. This is not possible with mechanical apertures.
  • It should also be noted that various combinations of different through holes with differences in size, shape and offset from the optical axis can be used.
  • Method Operation
  • When the images are acquired according to the embodiments of the invention, then the following expression 502 (see FIG. 5) approximates the wide aperture images IF 112:
    $I_F = \alpha F + (1 - \alpha) B_F$,   (2)
    where
    $B_F = B \otimes h_B$,   (3)
    and $h_B$ is a point spread function (PSF).
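  • To make Equation (3) concrete, the following sketch blurs a one-dimensional background with a normalized box PSF. The box shape, the periodic boundary handling and the name `box_blur_1d` are assumptions made for illustration; a real defocus PSF depends on the lens and aperture.

```python
def box_blur_1d(signal, k):
    """Convolve a periodic 1D signal with a normalized box PSF of width k."""
    n = len(signal)
    half = k // 2
    return [
        sum(signal[(i + j - half) % n] for j in range(k)) / k
        for i in range(n)
    ]
```

  • Applied to a background that alternates between 0 and 1 at every pixel, a box of width 2 yields 0.5 everywhere, i.e., the defocused background BF is a low-pass version of B.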
  • Given a known background, we can directly solve for α. Although we begin with a known background, we avoid many of the drawbacks of difference matting by using two sets of images: the pinhole images 111 and the wide aperture images 112.
  • From Equations (1) and (2), we obtain an expression 503 for α 141:
    $\alpha = C + (I_F - I_P)/(B - B_F)$,   (4)
    where C is a constant, e.g., 1.
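  • For reference, the step from Equations (1) and (2) to Equation (4) can be written out as follows. We take Equation (1) to be the pinhole model IP=αF+(1−α)B, as recited in claim 1; the result corresponds to C=1.

```latex
\begin{align*}
I_F - I_P &= \bigl[\alpha F + (1-\alpha) B_F\bigr] - \bigl[\alpha F + (1-\alpha) B\bigr] \\
          &= (1-\alpha)\,(B_F - B) \\
\Rightarrow\quad \alpha &= 1 + \frac{I_F - I_P}{B - B_F}
\end{align*}
```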
  • FIG. 5 shows the steps of the basic method.
  • To produce better results, we use:
    $\alpha = (B - B_F + I_F - I_P)/(B - B_F)$.   (5)
  • If color images are used, then B, BF, IP, and IF are vectors. Thus, the expression for α is
    $\alpha = \lVert B - B_F + I_F - I_P \rVert / \lVert B - B_F \rVert$,   (6)
    where ∥.∥ is a length operator for color vectors.
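  • A per-pixel sketch of Equations (5) and (6) is given below. The function names are hypothetical, pixels are plain floats or (R, G, B) tuples, and the length operator is taken to be the Euclidean norm, which is our assumption.

```python
import math


def alpha_gray(i_p, i_f, b, b_f):
    """Equation (5) for one grayscale pixel; undefined when b == b_f."""
    return (b - b_f + i_f - i_p) / (b - b_f)


def _length(v):
    """Euclidean length of a color vector (assumed form of the length operator)."""
    return math.sqrt(sum(c * c for c in v))


def alpha_color(i_p, i_f, b, b_f):
    """Equation (6) for one color pixel given as (R, G, B) tuples."""
    numerator = [bc - bfc + ifc - ipc
                 for ipc, ifc, bc, bfc in zip(i_p, i_f, b, b_f)]
    denominator = [bc - bfc for bc, bfc in zip(b, b_f)]
    return _length(numerator) / _length(denominator)
```

  • As a check, a synthetic grayscale pixel composed with α=0.25, F=0.8, B=1.0 and BF=0.5 gives IP=0.95 and IF=0.575, and `alpha_gray(0.95, 0.575, 1.0, 0.5)` recovers approximately 0.25.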
  • Given α we can determine αF using:
    $\alpha F = I_F + (\alpha - 1) B_F$   (7)
    or
    $\alpha F = I_P + (\alpha - 1) B$   (8)
    or
    $\alpha F = 0.5 \times (I_F + I_P + (\alpha - 1)(B_F + B))$.   (9)
    However, the α expression can be ill-conditioned when B=BF. Therefore, alpha values for these pixels cannot be determined reliably.
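  • Equations (7) to (9) can be sketched for a single grayscale pixel as follows; the function name is hypothetical and α is assumed to have been estimated already.

```python
def premultiplied_foreground(i_p, i_f, b, b_f, alpha):
    """Equation (9): the average of the estimates of Equations (7) and (8)."""
    from_wide = i_f + (alpha - 1.0) * b_f    # Equation (7)
    from_narrow = i_p + (alpha - 1.0) * b    # Equation (8)
    return 0.5 * (from_wide + from_narrow)
```

  • With IP=0.95, IF=0.575, B=1.0, BF=0.5 and α=0.25, both estimates agree and the result is approximately 0.2, i.e., αF for F=0.8. The sketch relies on a reliable α, so it inherits the conditioning issue when B=BF noted above.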
  • There are two possible solutions to this problem. A known background pattern can be used to guarantee that ∥B-BF∥ is substantially larger than zero, see below. Alternatively, alpha values can be interpolated from the neighboring pixels. One can either use a threshold for the denominator ∥B-BF∥ with the alpha values being interpolated from the neighbors when the denominator is less than a threshold, or one can use a confidence map for the value of the denominator. In this context, it is possible to use known scattered data interpolation methods, e.g., push-pull as described by Gortler et al., “The lumigraph,” Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press, pp. 43-54, 1996, incorporated herein by reference.
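  • The thresholding alternative can be sketched as below. This is a deliberately simple stand-in that repeatedly averages reliable 4-connected neighbors; it is not the push-pull method of Gortler et al., and the function name and threshold parameter are assumptions for illustration.

```python
def fill_unreliable_alpha(alpha, denom, threshold):
    """Replace alpha where |denom| < threshold by the mean of reliable neighbors.

    alpha and denom are 2D lists (rows of floats) of equal shape.
    """
    h, w = len(alpha), len(alpha[0])
    out = [row[:] for row in alpha]
    known = [[abs(denom[y][x]) >= threshold for x in range(w)] for y in range(h)]
    pending = sum(not k for row in known for k in row)
    while pending:
        updates = []
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                vals = [out[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and known[ny][nx]]
                if vals:
                    updates.append((y, x, sum(vals) / len(vals)))
        if not updates:  # no reliable pixel to propagate from
            break
        # Apply the pass at once so that fills do not depend on scan order.
        for y, x, v in updates:
            out[y][x] = v
            known[y][x] = True
        pending -= len(updates)
    return out
```

  • Unreliable regions are filled inward one pixel ring per pass. A confidence map, as mentioned above, would instead weight every pixel by the size of its denominator rather than applying a hard threshold.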
  • Matte Background Pattern
  • In many applications, it is necessary to guarantee perfect matting results. In this setting, any type of incorrect matting and compositing is simply not acceptable. In such applications, the background can be illuminated using a projector, or the background can include a known pattern so that the expression used to determine α is well-conditioned.
  • According to Equation (6), the denominator ∥B-BF∥ determines the precision of α. The background can have values from Bmin to Bmax. The image BF is a low-pass version of the background. An optimal low-pass (average) value of the background is:
    $B_F = 0.5 \times (B_{\min} + B_{\max})$.   (10)
  • Optimal values for B are Bmin and Bmax. Thus, the value of the denominator is always 0.5×(Bmax-Bmin). In the best case, when Bmin=0, the denominator is reduced to 0.5×Bmax and one bit of alpha precision is lost, e.g., if Bmax is an 8 bit value, then the maximum precision of alpha is 7 bits.
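  • The arithmetic above can be checked with a short sketch. The function names are hypothetical, and reading the precision of alpha as the base-2 logarithm of the denominator in code values is our interpretation of the one-bit loss described above.

```python
import math


def background_denominator(b, b_min, b_max):
    """|B - B_F| when B_F is the mid-level average of Equation (10)."""
    b_f = 0.5 * (b_min + b_max)
    return abs(b - b_f)


def approx_alpha_bits(b_min, b_max):
    """Rough alpha precision in bits: log2 of the denominator in code values."""
    return math.log2(0.5 * (b_max - b_min))
```

  • For an 8 bit background with Bmin=0 and Bmax=255, the denominator is 127.5 for both B=Bmin and B=Bmax, and `approx_alpha_bits(0, 255)` is just under 7.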
  • There are many background patterns that give these optimal results. Here, we describe an example background. If the PSF hB is rotationally symmetric, then the pattern can be specified in 1D. Next, we determine allowed frequencies of the pattern.
  • If the size of the PSF hB is k pixels, then the minimum frequency of the pattern is k×0.5 pixels. If the large depth-of-field pixels are aligned with the pattern, then the maximum frequency of the pattern is one pixel. In cases of higher frequencies, different pattern values are averaged. The use of color improves the conditioning of the problem when the pixels are misaligned with pattern transitions. It is desired to shift the pattern for different colors such that the value of the denominator is always large for at least one color.
  • For example, as shown in FIG. 4, a pattern in one dimension for the color red 401 is shifted by ¼ of the pattern period with respect to the pattern 402 for the blue color and the pattern 403 for the green color. When the above patterns are superimposed and printed or projected onto a white surface in 2D, repeated vertical bars 410 of the colors red, white, green/blue, and black result, giving the desired high frequency background pattern. In an actual implementation, the bars are about 2-3 millimeters wide. It should be understood that the pattern can easily be printed on wallpaper for covering an entire sound stage.
  • Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

Claims (19)

1. A method for extracting an alpha matte from a video acquired of a scene, comprising the steps of:
acquiring a set of narrow aperture images IP of a scene with a first camera aperture set to a relatively large depth-of-field and a narrow aperture, the scene including a background B and a foreground F, and in which each narrow aperture image IP is expressed as IP=αF+(1−α)B, where αF is an image in front of a background image B, and α is an alpha matte;
acquiring a corresponding set of wide aperture images IF of the scene with a second camera aperture set to a relatively small depth-of-field and a wide aperture, and in which each wide aperture image IF is expressed as IF=αF+(1−α)BF, where BF=B⊗hB, and hB is a point spread function; and
extracting an alpha matte according to α=C+(IF−IP)/(B−BF), where C is a constant.
2. The method of claim 1, in which α=(B−BF+IF−IP)/(B−BF).
3. The method of claim 1, in which C is equal to 1.
4. The method of claim 1, in which the narrow aperture images and the wide aperture images are color images, and the alpha matte is extracted according to α=(∥B−BF+IF−IP∥)/(∥B−BF∥), where ∥.∥ is a length operator for color vectors.
5. The method of claim 1, in which αF=IF+(α−1)BF.
6. The method of claim 1, in which αF=IP+(α−1)B.
7. The method of claim 1, in which αF=0.5×(IF+IP+(α−1)(BF+B)).
8. The method of claim 1, in which the background includes a known pattern when B=BF to ensure that ∥B−BF∥ is substantially larger than zero.
9. The method of claim 1, in which BF=0.5×(Bmin+Bmax), where Bmin and Bmax are minimum and maximum background values, respectively.
10. The method of claim 1, in which the narrow aperture images and the wide aperture images are acquired in parallel by two optical systems having a common center of projection.
11. The method of claim 1, in which the narrow aperture images are acquired using a pinhole optical system.
12. The method of claim 1, in which the narrow aperture images and the wide aperture images are acquired serially by a single optical system, and the narrow and wide aperture images are interleaved in time.
13. The method of claim 1, in which the scene is natural and illuminated by ambient light.
14. The method of claim 1, in which corresponding narrow and wide aperture images are registered according to an optical flow.
15. A method for extracting an alpha matte from a video acquired of a scene, comprising the steps of:
acquiring a set of narrow aperture images IP of a scene with a camera aperture set to a relatively large depth-of-field and a narrow aperture, the scene including a background B and a foreground F, and in which each narrow aperture image IP is expressed as IP=αF+(1−α)B, where αF is an image in front of a background image B, and α is an alpha matte;
acquiring a corresponding set of wide aperture images IF of the scene with the camera aperture set to a relatively small depth-of-field and a wide aperture, and in which each wide aperture image IF is expressed as IF=αF+(1−α)BF, where BF=B⊗hB, and hB is a point spread function; and
extracting an alpha matte according to α=C+(IF−IP)/(B−BF), where C is a constant.
16. The method of claim 15, in which the acquiring further comprises:
passing a light field emanating from the scene through a first polarizer with a first polarization;
passing the light field through a polarizing rotator while selectively applying a voltage to the polarizing rotator; and
passing the light field through a second polarizer with a polarization offset by 90° from the polarization of the first polarizer, the second polarizer having a through hole.
17. A system for extracting an alpha matte from a video acquired of a scene, comprising:
a first optical system configured to acquire a set of narrow aperture images IP of a scene with a first camera aperture set to a relatively large depth-of-field and a narrow aperture, the scene including a background B and a foreground F, and in which each narrow aperture image IP is expressed as IP=αF+(1−α)B, where αF is an image in front of a background image B, and α is an alpha matte;
a second optical system configured to acquire a corresponding set of wide aperture images IF of the scene with a second camera aperture set to a relatively small depth-of-field and a wide aperture, and in which each wide aperture image IF is expressed as IF=αF+(1−α)BF, where BF=B⊗hB, and hB is a point spread function; and
means for extracting an alpha matte according to α=C+(IF−IP)/(B−BF), where C is a constant.
18. A system for extracting an alpha matte from a video acquired of a scene, comprising:
an optical system configured to acquire a set of narrow aperture images IP of a scene with a camera aperture set to a relatively large depth-of-field, the scene including a background B and a foreground F, and in which each narrow aperture image IP is expressed as IP=αF+(1−α)B, where αF is an image in front of a background image B, and α is an alpha matte, and further configured to acquire a corresponding set of wide aperture images IF of the scene with the camera aperture set to a relatively small depth-of-field and a wide aperture, and in which each wide aperture image IF is expressed as IF=αF+(1−α)BF, where BF=B⊗hB, and hB is a point spread function; and
means for extracting an alpha matte according to α=C+(IF−IP)/(B−BF), where C is a constant.
19. The system of claim 18, in which the camera aperture comprises:
a first polarizer;
a second polarizer with a polarization offset by 90° from the first polarizer, the second polarizer having a through hole; and
a polarizing rotator disposed between the first polarizer and the second polarizer.
US11/193,742 2005-07-29 2005-07-29 System and method for defocus difference matting Expired - Fee Related US7408591B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/193,742 US7408591B2 (en) 2005-07-29 2005-07-29 System and method for defocus difference matting
JP2006188917A JP2007043686A (en) 2005-07-29 2006-07-10 System and method for extracting alpha matte from video acquired of certain scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/193,742 US7408591B2 (en) 2005-07-29 2005-07-29 System and method for defocus difference matting

Publications (2)

Publication Number Publication Date
US20070024756A1 true US20070024756A1 (en) 2007-02-01
US7408591B2 US7408591B2 (en) 2008-08-05

Family

ID=37693888

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/193,742 Expired - Fee Related US7408591B2 (en) 2005-07-29 2005-07-29 System and method for defocus difference matting

Country Status (2)

Country Link
US (1) US7408591B2 (en)
JP (1) JP2007043686A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090091711A1 (en) * 2004-08-18 2009-04-09 Ricardo Rivera Image Projection Kit and Method and System of Distributing Image Content For Use With The Same
US20100283842A1 (en) * 2007-04-19 2010-11-11 Dvp Technologies Ltd. Imaging system and method for use in monitoring a field of regard
CN104200470A (en) * 2014-08-29 2014-12-10 电子科技大学 Blue screen image-matting method
CN113129814A (en) * 2021-04-23 2021-07-16 浙江博采传媒有限公司 Color correction method and system applied to virtual production of LED (light-emitting diode) ring screen

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7609327B2 (en) * 2006-04-19 2009-10-27 Mitsubishi Electric Research Laboratories, Inc. Polarization difference matting using a screen configured to reflect polarized light
US7630541B2 (en) * 2006-05-30 2009-12-08 Microsoft Corporation Image-wide matting
KR20100051359A (en) * 2008-11-07 2010-05-17 삼성전자주식회사 Method and apparatus for generating of image data
CN103997687B (en) * 2013-02-20 2017-07-28 英特尔公司 For increasing the method and device of interaction feature to video
US9330718B2 (en) 2013-02-20 2016-05-03 Intel Corporation Techniques for adding interactive features to videos

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6496599B1 (en) * 1998-04-01 2002-12-17 Autodesk Canada Inc. Facilitating the compositing of video images
US6538396B1 (en) * 2001-09-24 2003-03-25 Ultimatte Corporation Automatic foreground lighting effects in a composited scene
US6571012B1 (en) * 1998-04-01 2003-05-27 Autodesk Canada Inc. Adjusting a softness region

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090091711A1 (en) * 2004-08-18 2009-04-09 Ricardo Rivera Image Projection Kit and Method and System of Distributing Image Content For Use With The Same
US8066384B2 (en) 2004-08-18 2011-11-29 Klip Collective, Inc. Image projection kit and method and system of distributing image content for use with the same
US8632192B2 (en) 2004-08-18 2014-01-21 Klip Collective, Inc. Image projection kit and method and system of distributing image content for use with the same
US9078029B2 (en) 2004-08-18 2015-07-07 Klip Collective, Inc. Image projection kit and method and system of distributing image content for use with the same
US9560307B2 (en) 2004-08-18 2017-01-31 Klip Collective, Inc. Image projection kit and method and system of distributing image content for use with the same
US10084998B2 (en) 2004-08-18 2018-09-25 Klip Collective, Inc. Image projection kit and method and system of distributing image content for use with the same
US10567718B2 (en) 2004-08-18 2020-02-18 Klip Collective, Inc. Image projection kit and method and system of distributing image content for use with the same
US10986319B2 (en) 2004-08-18 2021-04-20 Klip Collective, Inc. Method for projecting image content
US20100283842A1 (en) * 2007-04-19 2010-11-11 Dvp Technologies Ltd. Imaging system and method for use in monitoring a field of regard
US8937651B2 (en) * 2007-04-19 2015-01-20 Dvp Technologies Ltd. Imaging system and method for use in monitoring a field of regard
CN104200470A (en) * 2014-08-29 2014-12-10 电子科技大学 Blue screen image-matting method
CN113129814A (en) * 2021-04-23 2021-07-16 浙江博采传媒有限公司 Color correction method and system applied to virtual production of LED (light-emitting diode) ring screen

Also Published As

Publication number Publication date
JP2007043686A (en) 2007-02-15
US7408591B2 (en) 2008-08-05

Similar Documents

Publication Publication Date Title
US7367723B2 (en) Fast switching camera aperture
US7408591B2 (en) System and method for defocus difference matting
US7602990B2 (en) Matting using camera arrays
JP4309171B2 (en) Image processing method and image processing apparatus
US8363117B2 (en) Method and apparatus for photographing and projecting moving images
Deselaers et al. Pan, zoom, scan—time-coherent, trained automatic video cropping
US20070126918A1 (en) Cameras with multiple sensors
US9679369B2 (en) Depth key compositing for video and holographic projection
US7609327B2 (en) Polarization difference matting using a screen configured to reflect polarized light
CN109493283A (en) A kind of method that high dynamic range images ghost is eliminated
CN114651275B (en) Image stitching of full field of view reference images
Trottnow et al. The potential of light fields in media productions
US7463821B2 (en) Flat panel image to film transfer method and apparatus
Lancelle et al. Controlling motion blur in synthetic long time exposures
McGuire et al. Practical, Real-time Studio Matting using Dual Imagers.
Youm et al. High Dynamic Range Video through Fusion of Exposured-Controlled Frames.
Alzayer et al. DC2: Dual-Camera Defocus Control by Learning To Refocus
US11935285B1 (en) Real-time synthetic out of focus highlight rendering
Jeong et al. Digital panning shot generator from photographs
Li A hybrid camera system for low-light imaging
McGuire et al. Defocus difference matting
Liang et al. High-Quality Light Field Acquisition and Processing
Luong Painted Aperture for Portraits
JP2000152278A (en) Method, device, and system for segmenting object image and medium where program thereof is recorded
Langer et al. Capturing Non-Periodic Omnistereo Motion

Legal Events

Date Code Title Description
AS Assignment

Owner name: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., M

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MATUSIK, WOJCIECH;REEL/FRAME:016829/0708

Effective date: 20050729

AS Assignment

Owner name: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., M

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MCGUIRE, MORGAN;REEL/FRAME:017004/0904

Effective date: 20050907

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

SULP Surcharge for late payment

Year of fee payment: 7

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20200805