US20070024756A1 - System and method for defocus difference matting - Google Patents
- Publication number
- US20070024756A1 (application US 11/193,742)
- Authority
- US
- United States
- Prior art keywords
- aperture
- scene
- image
- images
- narrow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/272—Means for inserting a foreground image in a background image, i.e. inlay, outlay
- H04N5/275—Generation of keying signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Definitions
- FIGS. 3C and 3D show another embodiment with multiple aperture settings possible.
- This aperture includes polarizers 321, 323, 325, and polarization rotators 322 and 324.
- Two through holes 331-332 are shown to provide three aperture settings. It should be understood that any number of fast switching aperture settings can be provided in this manner.
- The sizes of the apertures decrease in a direction of the light field passing through the aperture.
- FIG. 5 shows the steps of the basic method.
- A known background pattern can be used to guarantee that ‖B − B_F‖ is substantially larger than zero, see below.
- Alpha values can be interpolated from the neighboring pixels using known scattered data interpolation methods, e.g., push-pull as described by Gortler et al., “The lumigraph,” Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM Press, pp. 43-54, 1996, incorporated herein by reference.
- The background can be illuminated using a projector, or the background can include a known pattern so that the expression used to determine α is well-conditioned.
- The denominator ‖B − B_F‖ determines the precision of α.
- The background can have values from B_min to B_max.
- The image B_F is a low-pass version of the background.
- Optimal values for B are B_min and B_max.
- With such a two-valued pattern, the value of the denominator is always 0.5 × (B_max − B_min).
- If B_min = 0, the denominator is reduced to 0.5 × B_max and one bit of alpha precision is lost, e.g., if B_max is an 8 bit value, then the maximum precision of alpha is 7 bits.
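The precision argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a two-valued background pattern whose defocused image B_F is the mean of B_min and B_max (an assumption about the blur, not a statement from the source). The function names are hypothetical.

```python
import math


def denominator(b_min: float, b_max: float) -> float:
    """Return |B - B_F| for a two-valued pattern.

    Assumption: the low-pass image B_F is the mean of B_min and B_max.
    """
    b_f = 0.5 * (b_min + b_max)
    return abs(b_max - b_f)


def alpha_precision_bits(b_min: float, b_max: float) -> float:
    """Approximate bits of alpha precision as log2 of the denominator range."""
    return math.log2(denominator(b_min, b_max))


# 8 bit background with B_min = 0: the denominator is half of B_max,
# so roughly one of the eight bits is lost.
print(denominator(0, 255))
print(round(alpha_precision_bits(0, 255)))
```

Raising B_min above zero shrinks the denominator further, which is why the extreme values are described as optimal.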
- The minimum frequency of the pattern is k × 0.5 pixels. If the large depth-of-field pixels are aligned with the pattern, then the maximum frequency of the pattern is one pixel. In cases of higher frequencies, different pattern values are averaged. The use of color improves the conditioning of the problem when the pixels are misaligned with pattern transitions. It is desired to shift the pattern for different colors such that the value of the denominator is always large for at least one color.
- A pattern in one dimension for the color red 401 is shifted by 1/4 of the pattern period with respect to the pattern 402 for the blue color and the pattern 403 for the green color.
- Repeated vertical bars 410 of the colors red, white, green/blue, and black are produced to form the desired high frequency background pattern.
- The bars are about 2-3 millimeters wide. It should be understood that the pattern can easily be printed on wallpaper for covering an entire sound stage.
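The bar sequence described above follows from shifting one square wave by a quarter period. The sketch below is an illustration only: it assumes ideal square waves, a period given in pixels that is a multiple of four, and a particular shift direction. The names `pattern_pixel` and `pattern_row` are hypothetical.

```python
def pattern_pixel(x: int, period: int, b_min: int = 0, b_max: int = 255):
    """Return the (r, g, b) value of pattern column x.

    Green and blue share one square wave; red uses the same wave shifted by
    a quarter period. The shift direction is an assumption.
    """
    if period % 4 != 0:
        raise ValueError("period must be a multiple of 4 pixels")

    def square(pos: int) -> int:
        # High for the first half of each period, low for the second half.
        return b_max if (pos % period) < period // 2 else b_min

    green_blue = square(x)
    red = square(x + period // 4)
    return (red, green_blue, green_blue)


def pattern_row(width: int, period: int):
    """Build one row of the pattern; every row is identical (vertical bars)."""
    return [pattern_pixel(x, period) for x in range(width)]


# One period, one pixel per bar: white, green/blue, black, red.
print(pattern_row(4, 4))
```

Read cyclically, the four bars are red, white, green/blue, and black, matching the order given in the text.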
Abstract
α = 1 + (I_F − I_P) / (B − B_F).
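The expression above can be evaluated per pixel once the background B and its defocused version B_F are known. The following is a minimal sketch for single-channel pixel values. The threshold `eps` and the clamping to [0, 1] are assumptions added for numerical safety and do not come from the source.

```python
from typing import Optional


def alpha_from_defocus(i_f: float, i_p: float, b: float, b_f: float,
                       eps: float = 1e-3) -> Optional[float]:
    """Evaluate alpha = 1 + (I_F - I_P) / (B - B_F) for one pixel.

    Returns None when |B - B_F| is below eps (hypothetical threshold),
    i.e. when the expression is ill-conditioned and alpha would have to be
    filled in by other means, such as interpolation from neighbors.
    """
    denom = b - b_f
    if abs(denom) < eps:
        return None
    alpha = 1.0 + (i_f - i_p) / denom
    # Clamp to the valid opacity range (an added safeguard).
    return min(1.0, max(0.0, alpha))
```

As a check, take α = 0.25, F = 0.8, B = 1.0 and B_F = 0.5, and assume the wide aperture image follows the same compositing model with the blurred background. Then I_P = 0.95 and I_F = 0.575, and the function recovers 0.25.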
Description
- This invention relates generally to image editing, and more particularly to matting.
- Matting and compositing are frequently used in image and video editing, 3D photography, and film production. Matting separates a foreground region from an input image by estimating a color F and an opacity α for each pixel in the image. Compositing uses the matte to blend the extracted foreground with a novel background to produce an output image representing a novel scene. The opacity α measures a ‘coverage’ of the foreground region, due to either partial spatial coverage or partial temporal coverage, i.e., motion blur. The set of all opacity values α is called the alpha matte, the alpha channel, or simply a matte.
- The matting problem can be formulated as follows. An image of a foreground against an opaque black background in a scene is αF. An image of the background without the foreground is B. An alpha image, where each pixel represents a partial coverage of that pixel by the foreground, is α. The image α is essentially an image of the foreground object ‘painted’ white, evenly lit, and held against the opaque background. The scale and resolution of the foreground and background images can differ due to perspective foreshortening.
- The notions of an alpha matte, pre-multiplied alpha, and the algebra of composition have been formalized by Porter et al., “Compositing digital images,” in Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press, pp. 253-259, 1984. They showed that for a pinhole (narrow aperture) camera, the image αF in front of the background image B can be expressed 501 (see FIG. 5) by a linear interpolation:
I_P = αF + (1 − α)B (1)
where I_P is a pinhole (narrow aperture) image, αF is the pre-multiplied image of the foreground against an opaque background, and B is the image of the opaque background in the absence of the foreground.
- Matting is the inverse problem of solving for the unknown values of the variables (α, F_r, F_g, F_b, B_r, B_g, B_b), given the composite image pixel values (I_Pr, I_Pg, I_Pb), where r, g, and b are color channels. The ‘P’ subscript denotes that Equation (1) holds for a pinhole camera, i.e., where the entire scene is in focus. One can approximate a pinhole camera with a very narrow aperture. Blue screen matting is easier to solve because the background color B is known.
- Matting is described generally by Smith et al., “Blue screen matting,” Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM Press, pp. 259-268; and U.S. Pat. No. 4,100,569, “Comprehensive electronic compositing system,” issued to Vlahos on Jul. 11, 1978.
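Equation (1) can be sketched for a single RGB pixel as follows. This is an illustration only: the function name `composite` and the use of floating point values in [0, 1] are assumptions.

```python
def composite(alpha: float, foreground, background):
    """Apply Equation (1) per color channel: I_P = alpha * F + (1 - alpha) * B."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


# Forward direction (compositing): all seven values are known.
pixel = composite(0.5, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(pixel)

# Inverse direction (matting): only the three values in `pixel` are observed,
# while alpha, F (3 channels) and B (3 channels) are unknown.
UNKNOWNS = 1 + 3 + 3
EQUATIONS = 3
print(UNKNOWNS - EQUATIONS)
```

Counting unknowns against equations shows why the inverse problem needs extra information, such as a known background color in blue screen matting.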
- Conventional matting requires a background with known, constant color, which is referred to as blue screen matting. If a digital camera is used, then a green matte is preferred. Blue screen matting is the predominant technique in the film and broadcast industry. For example, broadcast studios use blue matting for presenting weather reports. The background is a blue screen, and the foreground region includes the weatherman standing in front of the blue screen. The foreground is extracted, and then superimposed onto a weather map so that it appears that the weatherman is actually standing in front of the map. However, blue screen matting is costly and not readily available to casual users. Even production studios would prefer a lower-cost and less intrusive alternative.
- Ideally, one would like to extract a high-quality matte from an image or video with an arbitrary, i.e., unknown, background. This process is known as natural image matting. Recently, there has been substantial progress in this area, Ruzon et al., “Alpha estimation in natural images,” CVPR, vol. 1, pp. 18-25, 2000; Hillman et al., “Alpha channel estimation in high resolution images and image sequences,” Proceedings of IEEE CVPR 2001, IEEE Computer Society, vol. 1, pp. 1063-1068, 2001; Chuang et al., “A Bayesian approach to digital matting,” Proceedings of IEEE CVPR 2001, IEEE Computer Society, vol. 2, pp. 264-271, 2001; Chuang et al., “Video matting of complex scenes,” ACM Trans. on Graphics 21, 3, pp. 243-248, July, 2002; and Sun et al., “Poisson matting,” ACM Trans. on Graphics, August 2004. Unfortunately, all of those methods require substantial manual intervention, which becomes prohibitive for long image sequences and for non-professional users. The difficulty arises because matting from a single image is fundamentally under-constrained.
- It is desired to perform matting using non-intrusive techniques. That is, the scene does not need to be modified. It is also desired to perform the matting automatically. Furthermore, it is desired to provide matting for ‘rich’ natural images, i.e., images with a lot of fine, detailed structure.
- Most natural image matting methods require manually defined trimaps to determine the distribution of color in the foreground and background regions. A trimap segments an image into background, foreground and unknown pixels. Using the trimaps, those methods estimate likely values of the foreground and background colors of unknown pixels, and use the colors to solve the matting Equation (1).
- Bayesian matting, and its extension to image sequences, produces the best results in many applications. However, those methods require manually defined trimaps for key frames. This is tedious for long image sequences. It is desired to provide a method that does not require user intervention, and that can operate in real-time as an image sequence is acquired.
- The prior art estimation of the color distributions works only when the foreground and background are sufficiently different in a neighborhood of an unknown pixel. Therefore, it is desired to provide a method that can extract a matte where the foreground and background pixels have substantially similar color distributions.
- The Poisson matting of Sun et al. solves a Poisson equation for the matte by assuming that the foreground and background are slowly varying. Their method interacts closely with the user by beginning from a manually constructed trimap. They also provide ‘painting’ tools to correct errors in the matte.
- An unassisted, natural video matting system is described by Zitnick et al., “High-quality video view interpolation using a layered representation,” ACM Trans. on Graphics 23, 3, pp. 600-608, 2004. They acquire videos with a horizontal row of eight cameras spaced over about two meters. They measure depth discrepancies from stereo disparity using sophisticated region processing, and then construct a trimap from the depth discrepancies. The actual matting is determined by the Bayesian matting of Chuang et al. However, that method has the view dependent problems that are unavoidable with stereo cameras, e.g., reflections, specular highlights, and occlusions. It is desired to avoid view dependent problems.
- Difference matting, also known as background subtraction, solves for α and the alpha-multiplied foreground αF, given background and trimap images, Qian et al., “Video background replacement without a blue screen,” Proceedings of ICIP, vol. 4, 143-146, 1999. However, difference matting has limited discrimination at the borders of the foreground.
- Another method uses back lighting to determine the matte. Back lighting is a common segmentation method used in many computer vision systems. Back lighting has also been used in image-based rendering systems, Debevec et al., “A lighting reproduction approach to live action compositing,” ACM Transactions on Graphics 21, 3, pp. 547-556, 2002. That method has two drawbacks. First, active illumination is required, and second, incorrect results may be produced near object boundaries because some objects become highly reflective near grazing angles of the light.
- Scene reconstruction is described by Favaro et al., “Seeing beyond occlusions (and other marvels of a finite lens aperture),” Proc. of the IEEE Intl. Conf. on Computer Vision and Pattern Recognition, p. 579, 2003. That method uses defocused images and gradient descent minimization of a sum-squared error. The method solves for coarse depth and a binary alpha.
- Another method uses a depth-from-focus system to recover overlapping objects with fractional alphas, Schechner et al., “Separation of transparent layers using focus,” International Journal of Computer Vision, pp. 25-39, 2000. They position a motorized CCD axially behind a lens to acquire images with slightly varying points of focus. Depth is recovered by selecting the image plane location that has the best focused image. That method is limited to static scenes.
- Another method uses three video streams acquired by three cameras with different depth-of-field and focus and that share the same center of projection to extract mattes for scenes with unconstrained, dynamic backgrounds, McGuire et al., “Defocus Video Matting,” ACM Transactions on Graphics 24, 3, 2005; and U.S. patent application Ser. No. 11/092,376, filed by McGuire et al. on Mar. 29, 2005, “System and Method for Image Matting.”
- Matting is a process for extracting a high-quality alpha matte and foreground from an image or a video. Conventional techniques require either a known background, e.g., a blue screen, or extensive manual interaction, e.g., manually specified foreground and background regions. Matting is generally under-constrained, because not enough information is obtained when the images are acquired.
- One embodiment of the invention provides a system and method for extracting a matte automatically from a video. The video includes sets of pinhole (narrow aperture) images and wide aperture images (frames) that are produced either in parallel or in a time-interleaved manner.
- The parallel sets of images can be acquired with a camera having two optical systems that have a common center of projection. One optical system has a large depth-of-field to acquire the pinhole images, while the other optical system has a small depth-of-field to acquire the wide aperture images.
- A single camera can acquire the time-interleaved images using a fast switching aperture. The aperture includes polarizing elements that can rapidly switch between different aperture sizes. As an advantage, the aperture size is manipulated using optical techniques. Thus, the aperture does not require any moving parts, and can be switched at rates far exceeding mechanical apertures.
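The time-interleaved acquisition described above implies a pairing step before any matte can be computed. A minimal sketch, assuming that even-indexed frames are narrow aperture images and odd-indexed frames are wide aperture images (the ordering and the function name are hypothetical):

```python
def split_interleaved(frames):
    """Pair each narrow aperture frame with the wide aperture frame after it.

    Assumption: frames alternate narrow, wide, narrow, wide, ...
    A trailing frame without a partner is dropped.
    """
    narrow = frames[0::2]
    wide = frames[1::2]
    return list(zip(narrow, wide))


pairs = split_interleaved(["P0", "F0", "P1", "F1", "P2"])
print(pairs)
```

For scenes with motion, the two frames of a pair are taken at different times and would still need to be registered.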
- FIG. 1 is a block diagram of a method for extracting a matte from a video according to an embodiment of the invention;
- FIG. 2 is a block diagram of a method for extracting a matte from a video according to an embodiment of the invention;
- FIG. 3A is an exploded diagram of a camera aperture according to an embodiment of the invention;
- FIG. 3B is a side view of the camera aperture of FIG. 3A;
- FIG. 3C is an exploded diagram of a camera aperture according to an embodiment of the invention;
- FIG. 3D is a side view of the camera aperture of FIG. 3C;
- FIG. 3E is an exploded view of a camera aperture in the form of slits;
- FIG. 3F is a view of an aperture offset from the optical axis;
- FIG. 3G is a view of a camera aperture in the form of a torus;
- FIG. 4 is a diagram of a high frequency background pattern according to an embodiment of the invention; and
- FIG. 5 is a block diagram of a method for extracting a matte according to one embodiment of the invention.
System Structure
- FIG. 1 shows a system 100 and method 500 for automatically extracting a matte 141 from a video 110 acquired of a scene 120 according to an embodiment of our invention. The scene 120 includes a background (B) 121 and a foreground (F) 122. The scene can be a natural, real-world scene illuminated only by ambient light.
- The video 110 is acquired by a camera 101 including a pinhole (narrow aperture) optical system 102 and a foreground optical system 103. The optical systems 102-103 have a single center of projection on an optical axis 160, and use a beam splitter 151. The optical systems are calibrated with respect to each other. The video 110 is provided to a processor 140 performing the method 500.
- The video 110 includes sets of images 111-112 acquired in parallel in time. The set of images I_P 111 is acquired at a large depth-of-field 131, i.e., the images I_P 111 are acquired with a very narrow aperture focused at the foreground. The images I_P 111 can be approximated using a pinhole camera model, see Equation (1). A corresponding set of wide aperture images I_F 112 is acquired in parallel with a small depth-of-field 132 focused at the foreground.
- FIG. 2 shows another embodiment of the invention. The camera 201 uses a single optical system, and the images I_P and I_F of the sets 111-112 are serially interleaved in time.
- Because the images are interleaved in time, pairs of corresponding narrow aperture images I_P and wide aperture images I_F may not be registered when the scene 120 includes moving objects. In this case, a conventional optical flow process can be used to register the sets of images I_P and I_F.
Camera Aperture
- Therefore, as shown in
FIG. 3A , the camera 201 uses afast switching aperture 300. Most conventional camera apertures use a mechanical shutter. The speed at which the mechanical shutter can open and close is limited by the weight of the leaves of the aperture and the strengths of the springs and actuators driving the leaves of the aperture. Even if the aperture mechanism were made very strongly, and driven with high-energy actuators, the resulting large mechanical motions would induce vibrations in the camera assembly blurring the image. - Most conventional camera apertures are mechanical, and include moving parts. There are two major problems with such apertures. First, the apertures are relatively slow to switch to different depths of field, and second, the rapid movement of the parts causes vibration in the camera body, which adds noise to the images, particularly if the imager is a CCD type of device. Therefore, it is desired to provide a fast switching camera aperture that operates on optical, and not mechanical, principals.
- The aperture 300 can switch size at a frame rate of the camera 201 or higher, e.g., at a rate up to about 10 kHz. The aperture mechanism 300 includes two polarizers 301 and 303 having a first diameter D. The second polarizer 303 has a pinhole or narrow aperture (through hole) 304 having a second diameter d. The polarizations of the polarizers 301 and 303 are rotated 90° with respect to each other, as shown. Therefore, light only passes through the pinhole 304.
- A polarization rotator element 302, also having a diameter D, is disposed between the polarizers 301 and 303. The element 302 rotates the polarization of the light field passing through it by 90° when a voltage (V) is applied to the polarization rotator 302. For example, the element is a ferroelectric liquid crystal. Thus, when the voltage is applied to the element, the camera 201 has a large aperture diameter D because light passes through all three elements 301-303. Otherwise, absent the voltage, the light only passes through the pinhole 304 having an aperture diameter d. Commercial ferroelectric liquid crystal devices can switch from a zero-rotation state to a 90-degree rotation state in less than 10 microseconds with power inputs on the order of a few volts at a fraction of a milliamp. Other polarization rotators can also be used. For example, a Kerr cell can rotate polarization. Conventional nematic, supertwist liquid crystals can also rotate the polarization.
- FIG. 3B is an end view of the aperture shown in FIG. 3A.
- FIGS. 3C and 3D show another embodiment with multiple aperture settings possible. This aperture includes polarizers and polarization rotators.
- FIG. 3E shows a configuration where the through holes are slits 341-342 spaced apart at varying distances. This can be used to separate low (DC) and high (AC) frequency components in a light field. FIG. 3F shows an embodiment where the one or more through holes 351 are offset from the center (optical axis). FIG. 3G shows an embodiment where the through hole 461 is a torus.
- It should be noted that if multiple through holes are used, it is actually possible to move the aperture along the optical axis, effectively changing the focal plane. This is not possible with mechanical apertures.
- It should also be noted that various combinations of different through holes with differences in size, shape and offset from the optical axis can be used.
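The two-state behavior of the aperture of FIG. 3A can be illustrated with a small idealized model. The following Python sketch is not part of the disclosure. It assumes perfect polarizers, a rotation of exactly 0° or 90°, and no absorption losses, and it applies Malus's law to the region outside the through hole. The function names and the diameters are hypothetical.

```python
import math


def analyzer_transmission(rotation_deg: float) -> float:
    """Ideal fraction of polarized light passed by the second polarizer.

    The second polarizer is crossed (90 degrees) with respect to the first,
    so light rotated by `rotation_deg` meets it at 90 - rotation_deg degrees.
    Malus's law gives the transmitted fraction as cos^2 of that angle.
    """
    angle = math.radians(90.0 - rotation_deg)
    return math.cos(angle) ** 2


def open_area(D: float, d: float, voltage_on: bool) -> float:
    """Effective light-passing area of the aperture (units of D squared)."""
    pinhole = math.pi * (d / 2.0) ** 2
    annulus = math.pi * (D / 2.0) ** 2 - pinhole
    # Assumption: the rotator turns the polarization by exactly 90 degrees
    # when the voltage is applied and by 0 degrees otherwise.
    rotation = 90.0 if voltage_on else 0.0
    # The through hole contains no polarizer material, so it always passes light.
    return pinhole + analyzer_transmission(rotation) * annulus


D, d = 20.0, 2.0  # hypothetical diameters in millimetres
narrow = open_area(D, d, voltage_on=False)
wide = open_area(D, d, voltage_on=True)
print(f"light gain when switched to the wide state: {wide / narrow:.1f}x")
```

In this idealized model the gain equals (D/d)², which is why the wide aperture images need a correspondingly shorter exposure or lower gain than the pinhole images.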
- Method Operation
- When the images are acquired according to the embodiments of the invention, then the following expression 502 (see FIG. 5) approximates the wide aperture images IF 112:
$$I_F = \alpha F + (1 - \alpha) B_F, \tag{2}$$
where
$$B_F = B \otimes h_B, \tag{3}$$
and hB is a point spread function (PSF).
- Given a known background, we can directly solve for α. Although we begin with a known background, we avoid many of the drawbacks of difference matting by using two sets of images: the pinhole images 111 and the wide aperture images 112.
- From Equations (1) and (2), we obtain an expression 503 for α 141:
$$\alpha = C + \frac{I_F - I_P}{B - B_F}, \tag{4}$$
where C is a constant, e.g., 1.
- FIG. 5 shows the steps of the basic method.
- To produce better results, we use:
$$\alpha = \frac{B - B_F + I_F - I_P}{B - B_F}. \tag{5}$$
- If color images are used, then B, BF, IP, and IF are vectors. Thus, the expression for α is
$$\alpha = \frac{\lVert B - B_F + I_F - I_P \rVert}{\lVert B - B_F \rVert}, \tag{6}$$
where ∥·∥ is a length operator for color vectors.
- Given α, we can determine αF using:
$$\alpha F = I_F + (\alpha - 1) B_F \tag{7}$$
or
$$\alpha F = I_P + (\alpha - 1) B \tag{8}$$
or
$$\alpha F = 0.5 \times \left(I_F + I_P + (\alpha - 1)(B_F + B)\right). \tag{9}$$
However, the α expression can be ill-conditioned when B=BF. Therefore, alpha values for these pixels cannot be determined reliably.
- There are two possible solutions to this problem. A known background pattern can be used to guarantee that ∥B−BF∥ is substantially larger than zero, see below. Alternatively, alpha values can be interpolated from the neighboring pixels. One can either use a threshold for the denominator ∥B−BF∥, with the alpha values being interpolated from the neighbors when the denominator is less than the threshold, or one can use a confidence map for the value of the denominator. In this context, it is possible to use known scattered data interpolation methods, e.g., push-pull as described by Gortler et al., “The lumigraph,” Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press, pp. 43-54, 1996, incorporated herein by reference.
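As an illustration, Equations (6) and (9) and the threshold test on the denominator can be sketched per pixel in Python. This is a minimal sketch under stated assumptions, not the patented implementation: pixels are RGB tuples with values in [0, 1], the threshold value is arbitrary, and a simple average of the nearest valid neighbors along a scanline stands in for the push-pull interpolation cited above. All function names are hypothetical.

```python
import math


def norm(v):
    """Length of a color vector."""
    return math.sqrt(sum(c * c for c in v))


def alpha_from_pair(i_p, i_f, b, b_f, min_denominator=1e-2):
    """Equation (6) for one pixel; returns None when ill-conditioned."""
    diff_b = [x - y for x, y in zip(b, b_f)]                     # B - B_F
    numer = [db + f - p for db, f, p in zip(diff_b, i_f, i_p)]   # B - B_F + I_F - I_P
    denom = norm(diff_b)
    if denom < min_denominator:      # assumed threshold on ||B - B_F||
        return None
    return min(1.0, max(0.0, norm(numer) / denom))


def premultiplied_foreground(alpha, i_p, i_f, b, b_f):
    """Equation (9): average of the two estimates of alpha * F."""
    return [0.5 * (f + p + (alpha - 1.0) * (bf + bb))
            for f, p, bf, bb in zip(i_f, i_p, b_f, b)]


def fill_unreliable(alphas):
    """Replace None entries with the mean of the nearest valid neighbors."""
    filled = list(alphas)
    for i, a in enumerate(alphas):
        if a is None:
            left = next((alphas[j] for j in range(i - 1, -1, -1)
                         if alphas[j] is not None), None)
            right = next((alphas[j] for j in range(i + 1, len(alphas))
                          if alphas[j] is not None), None)
            valid = [v for v in (left, right) if v is not None]
            filled[i] = sum(valid) / len(valid) if valid else None
    return filled


def composite(alpha, fg, bg):
    """Matting model: alpha * F + (1 - alpha) * background (synthetic data only)."""
    return [alpha * f + (1.0 - alpha) * g for f, g in zip(fg, bg)]


# Synthetic pixel with made-up foreground and background values.
F, B, B_F, true_alpha = (0.8, 0.2, 0.1), (1.0, 0.0, 1.0), (0.5, 0.5, 0.5), 0.3
I_P = composite(true_alpha, F, B)      # pinhole observation
I_F = composite(true_alpha, F, B_F)    # wide aperture observation
alpha = alpha_from_pair(I_P, I_F, B, B_F)
print(alpha, premultiplied_foreground(alpha, I_P, I_F, B, B_F))
```

On this synthetic pixel the recovered alpha matches the value used to build the observations, up to floating-point rounding. A pixel whose background equals its blurred background returns `None` and is filled from its neighbors.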
- Matte Background Pattern
- In many applications, it is necessary to guarantee perfect matting results. In this setting, any type of incorrect matting and compositing is simply not acceptable. In such applications, the background can be illuminated using a projector, or the background can include a known pattern so that the expression used to determine α is well-conditioned.
- According to Equation (6), the denominator ∥B−BF∥ determines the precision of α. The background can have values from Bmin to Bmax. The image BF is a low-pass version of the background. An optimal low-pass (average) value of the background is:
$$B_F = 0.5 \times (B_{\min} + B_{\max}). \tag{10}$$
- Optimal values for B are Bmin and Bmax. Thus, the value of the denominator is always 0.5×(Bmax−Bmin). In the best case, when Bmin=0, the denominator is reduced to 0.5×Bmax and one bit of alpha precision is lost, e.g., if Bmax is an 8 bit value, then the maximum precision of alpha is 7 bits.
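The arithmetic behind the one-bit loss can be checked with a few lines of Python. This is only a worked example: the function names are hypothetical, and counting precision as the base-2 logarithm of half the number of background levels is an interpretation assumed here, not a formula given in the text.

```python
import math


def low_pass_value(b_min, b_max):
    """Equation (10): optimal average value of the blurred background."""
    return 0.5 * (b_min + b_max)


def denominator(b, b_min, b_max):
    """|B - B_F| for a single-channel background value B."""
    return abs(b - low_pass_value(b_min, b_max))


def alpha_precision_bits(b_min, b_max):
    """Assumed measure: bits of the background range minus the one bit lost."""
    levels = b_max - b_min + 1
    return math.log2(levels) - 1


b_min, b_max = 0, 255  # 8-bit background
print(denominator(b_min, b_min, b_max), denominator(b_max, b_min, b_max))
print(alpha_precision_bits(b_min, b_max))
```

Both extreme background values give the same denominator, 0.5×(Bmax−Bmin), and the 8-bit case yields 7 bits under the assumed measure.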
- There are many background patterns that give these optimal results. Here, we describe an example background. If the PSF hB is rotationally symmetric, then the pattern can be specified in 1D. Next, we determine allowed frequencies of the pattern.
- If the size of the PSF hB is k pixels, then the minimum frequency of the pattern is k×0.5 pixels. If the large depth-of-field pixels are aligned with the pattern, then the maximum frequency of the pattern is one pixel. In cases of higher frequencies, different pattern values are averaged. The use of color improves the conditioning of the problem when the pixels are misaligned with pattern transitions. It is desired to shift the pattern for different colors such that the value of the denominator is always large for at least one color.
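A one-dimensional version of such a pattern can be simulated to check the conditioning. The sketch below is illustrative only. It assumes a square-wave pattern with bars of k/2 pixels, pixels aligned with the pattern, a circular box filter as a stand-in for the PSF hB, and a quarter-period shift of the red channel. All names and sizes are hypothetical.

```python
def square_wave(x, period, shift=0):
    """1 for the first half of each period, 0 for the second half."""
    return 1.0 if (x + shift) % period < period // 2 else 0.0


def box_blur(signal, k):
    """Circular box filter of width k, standing in for the PSF."""
    n = len(signal)
    start = -(k // 2)
    return [sum(signal[(i + start + j) % n] for j in range(k)) / k
            for i in range(n)]


def bar_name(r, g, b):
    """Name the color produced by superimposing the three channels."""
    names = {(1.0, 1.0, 1.0): "white", (0.0, 1.0, 1.0): "green/blue",
             (0.0, 0.0, 0.0): "black", (1.0, 0.0, 0.0): "red"}
    return names[(r, g, b)]


k = 8                # hypothetical PSF width in pixels
period = k           # two bars of k/2 pixels each
width = 4 * period   # a few periods of the pattern

green = [square_wave(x, period) for x in range(width)]
blue = list(green)
red = [square_wave(x, period, shift=period // 4) for x in range(width)]

# |B - B_F| per pixel for one channel; the blur averages one full period.
denominators = [abs(b - bf) for b, bf in zip(green, box_blur(green, k))]
# Color of each quarter-period bar within the first period.
bars = [bar_name(r, g, b) for r, g, b in zip(red, green, blue)][:period:period // 4]
print(min(denominators), bars)
```

With the blur spanning exactly one period, the blurred background is the mid-level value everywhere, so the denominator stays at half the pattern range at every aligned pixel. The shifted red channel yields four distinct bar colors per period.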
- For example, as shown in FIG. 4, a pattern in one dimension for the color red 401 is shifted forward by ¼ of the pattern period with respect to the pattern 402 for the blue color and the pattern 403 for the green color. When the above patterns are superimposed and printed or projected onto a white surface in 2D, repeated vertical bars 410 of the colors red, white, green/blue, and black result, yielding the desired high frequency background pattern. In an actual implementation, the bars are about 2-3 millimeters wide. It should be understood that the pattern can easily be printed on wallpaper for covering an entire sound stage.
- Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
Claims (19)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/193,742 US7408591B2 (en) | 2005-07-29 | 2005-07-29 | System and method for defocus difference matting |
JP2006188917A JP2007043686A (en) | 2005-07-29 | 2006-07-10 | System and method for extracting alpha matte from video acquired of certain scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/193,742 US7408591B2 (en) | 2005-07-29 | 2005-07-29 | System and method for defocus difference matting |
Publications (2)
Publication Number | Publication Date |
---|---|
US20070024756A1 | 2007-02-01 |
US7408591B2 | 2008-08-05 |
Family ID=37693888
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US11/193,742 | Expired - Fee Related | US7408591B2 (en) | 2005-07-29 | 2005-07-29 | System and method for defocus difference matting |
Country Status (2)
Country | Link |
---|---|
US (1) | US7408591B2 (en) |
JP (1) | JP2007043686A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090091711A1 (en) * | 2004-08-18 | 2009-04-09 | Ricardo Rivera | Image Projection Kit and Method and System of Distributing Image Content For Use With The Same |
US20100283842A1 (en) * | 2007-04-19 | 2010-11-11 | Dvp Technologies Ltd. | Imaging system and method for use in monitoring a field of regard |
CN104200470A (en) * | 2014-08-29 | 2014-12-10 | 电子科技大学 | Blue screen image-matting method |
CN113129814A (en) * | 2021-04-23 | 2021-07-16 | 浙江博采传媒有限公司 | Color correction method and system applied to virtual production of LED (light-emitting diode) ring screen |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7609327B2 (en) * | 2006-04-19 | 2009-10-27 | Mitsubishi Electric Research Laboratories, Inc. | Polarization difference matting using a screen configured to reflect polarized light |
US7630541B2 (en) * | 2006-05-30 | 2009-12-08 | Microsoft Corporation | Image-wide matting |
KR20100051359A (en) * | 2008-11-07 | 2010-05-17 | 삼성전자주식회사 | Method and apparatus for generating of image data |
CN103997687B (en) * | 2013-02-20 | 2017-07-28 | 英特尔公司 | For increasing the method and device of interaction feature to video |
US9330718B2 (en) | 2013-02-20 | 2016-05-03 | Intel Corporation | Techniques for adding interactive features to videos |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6496599B1 (en) * | 1998-04-01 | 2002-12-17 | Autodesk Canada Inc. | Facilitating the compositing of video images |
US6538396B1 (en) * | 2001-09-24 | 2003-03-25 | Ultimatte Corporation | Automatic foreground lighting effects in a composited scene |
US6571012B1 (en) * | 1998-04-01 | 2003-05-27 | Autodesk Canada Inc. | Adjusting a softness region |
2005-07-29: US application US11/193,742, published as US7408591B2 (not active; Expired - Fee Related)
2006-07-10: JP application JP2006188917A, published as JP2007043686A (active; Pending)
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090091711A1 (en) * | 2004-08-18 | 2009-04-09 | Ricardo Rivera | Image Projection Kit and Method and System of Distributing Image Content For Use With The Same |
US8066384B2 (en) | 2004-08-18 | 2011-11-29 | Klip Collective, Inc. | Image projection kit and method and system of distributing image content for use with the same |
US8632192B2 (en) | 2004-08-18 | 2014-01-21 | Klip Collective, Inc. | Image projection kit and method and system of distributing image content for use with the same |
US9078029B2 (en) | 2004-08-18 | 2015-07-07 | Klip Collective, Inc. | Image projection kit and method and system of distributing image content for use with the same |
US9560307B2 (en) | 2004-08-18 | 2017-01-31 | Klip Collective, Inc. | Image projection kit and method and system of distributing image content for use with the same |
US10084998B2 (en) | 2004-08-18 | 2018-09-25 | Klip Collective, Inc. | Image projection kit and method and system of distributing image content for use with the same |
US10567718B2 (en) | 2004-08-18 | 2020-02-18 | Klip Collective, Inc. | Image projection kit and method and system of distributing image content for use with the same |
US10986319B2 (en) | 2004-08-18 | 2021-04-20 | Klip Collective, Inc. | Method for projecting image content |
US20100283842A1 (en) * | 2007-04-19 | 2010-11-11 | Dvp Technologies Ltd. | Imaging system and method for use in monitoring a field of regard |
US8937651B2 (en) * | 2007-04-19 | 2015-01-20 | Dvp Technologies Ltd. | Imaging system and method for use in monitoring a field of regard |
CN104200470A (en) * | 2014-08-29 | 2014-12-10 | 电子科技大学 | Blue screen image-matting method |
CN113129814A (en) * | 2021-04-23 | 2021-07-16 | 浙江博采传媒有限公司 | Color correction method and system applied to virtual production of LED (light-emitting diode) ring screen |
Also Published As
Publication number | Publication date |
---|---|
JP2007043686A (en) | 2007-02-15 |
US7408591B2 (en) | 2008-08-05 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., M; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MATUSIK, WOJCIECH;REEL/FRAME:016829/0708; Effective date: 20050729 |
AS | Assignment | Owner name: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., M; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MCGUIRE, MORGAN;REEL/FRAME:017004/0904; Effective date: 20050907 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FPAY | Fee payment | Year of fee payment: 4 |
FPAY | Fee payment | Year of fee payment: 8 |
SULP | Surcharge for late payment | Year of fee payment: 7 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20200805 |