Final Project: Vertigo Shot and High Dynamic Range Imaging

Brandon Huang, Fall 2017

================================================================================

Vertigo Shot Overview

The vertigo shot, or "dolly zoom," is a well-known visual effect used in many older films. It keeps the size of the subject roughly constant while drastically changing the field of view (FOV) of the image. The FOV is changed using a zoom lens, and the resulting change in subject size is compensated by moving the camera closer to or farther from the subject. Because the subject is not moving, the foreground and/or background appear to rapidly balloon or shrink in size, creating a creepy sensation of movement.
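As a quick sanity check on the geometry (my own illustrative sketch, not part of the capture process): under a simple pinhole model, the subject's image size is roughly proportional to focal length divided by subject distance, so keeping the subject size constant means moving the camera in proportion to the focal length.

    # Sketch: camera distances that keep the subject the same size while
    # zooming. Assumes a simple pinhole model where image size is
    # proportional to focal_length / distance; the focal lengths and
    # starting distance are made-up example values.
    def dolly_zoom_distances(focal_lengths_mm, start_distance_m):
        f0 = focal_lengths_mm[0]
        return [start_distance_m * f / f0 for f in focal_lengths_mm]

    print(dolly_zoom_distances([18, 35, 55, 85], 2.0))
    # -> [2.0, 3.89, 6.11, 9.44] (meters, rounded)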

Results: Hallway

In these images, the stationary subject is not the door but the wall that frames the scene. The brightness level changes because zooming in reduces the incoming light; I tried auto-sensing the brightness levels, but the ceiling lights threw that off, so I just accepted the slight brightness gradient.


Animated gif to make it a little more obvious...

Results: Friend at the table

Inspired by some of the dolly zoom examples I saw online, in which the background warps heavily behind a person who serves as the subject, I tried an example with a friend. The results were OK, but I think there was too much background context (tables and benches), which provided perceptual anchors and reduced the disorienting effect of the dolly zoom.


And of course, a gif to bring it to life...

Videos

In the examples I viewed online, the shots from movies looked really good, so I surmised that the vertigo shot works best when viewed smoothly in motion. Indeed, just scrolling through the images I included above, it is very hard to see any dolly zoom effect. For this reason, I animated the gifs with a pause followed by a very fast zoom, which I found best emphasizes the perceived effect. I then took this one step further by imitating the examples and shooting an actual video.
A video of a hallway zoom similar to the images above.
Some of the examples I saw limited depth of field using a large aperture to accentuate the effect. I tried that here, although I was hampered by bad auto-focusing.
================================================================================

HDR Imaging Overview

The real world contains scenes with an extremely high dynamic range, i.e., range of irradiance values (light power per unit area reaching the sensor). However, cameras are limited by hardware constraints and can only capture a very small part of this dynamic range in one image through a fixed-duration exposure. Cameras map exposure values (irradiance * exposure time) to pixel values between 0 and 255; this mapping is known as the "response function". If we could figure out the inverse of this function, we could map all our pixel values back to irradiance values, which are independent of exposure time. Luckily, using an algorithm described in Debevec and Malik 1997, if we take several pictures of the same scene at different levels of exposure, we can merge them together to derive a good estimate of the response function, and then solve for the irradiance values incident on the camera (up to a constant scale factor). These irradiance values are a far richer source of information than any individual capture, and can be used to produce an image in which much more of the real world's dynamic range is discernible.

The (inverse) response function

The response function maps real-valued, positive exposure values into the discrete range [0, 255]. The inverse can thus be represented by exactly 256 values. Concretely, we want to find the values of g in the following equations:

Z_ij = f(E_i · ∆t_j)   (E_i is the irradiance at pixel location i; Z_ij is the pixel value at location i in exposure j, which uses exposure time ∆t_j)
ln(f^-1(Z_ij)) = g(Z_ij) = ln(E_i) + ln(∆t_j)   (taking logarithms makes the equation linear in the unknowns, which is useful for least squares)
An example of the inverse response function. Notice the range 0-255 of the x-axis.
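For completeness, the full objective minimized in Debevec and Malik (stated here from the paper; w(z) is a hat-shaped weighting function that de-emphasizes extreme pixel values, and λ weights a smoothness penalty on the curvature of g) is:

O = Σ_ij { w(Z_ij) · [g(Z_ij) − ln(E_i) − ln(∆t_j)] }² + λ · Σ_z [w(z) · g''(z)]²,   where g''(z) = g(z−1) − 2·g(z) + g(z+1)

One additional equation pins g at mid-gray to 0, since g is otherwise only determined up to an additive constant.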

Tone mapping operators

After recovering an HDR image, we have values that span several orders of magnitude. How, then, to display this on an LDR display? There are many ways to map the values to a linear range between 0 and 1 (or 0 and 255); these are called tone mapping operators. There are global operators, which apply the same transformation across the entire irradiance map, and local operators, whose effect is signal-dependent.
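For concreteness, both of the global operators used in the results below can be written in a couple of lines of numpy (a minimal sketch; the scale constant is illustrative and is tuned by hand in practice):

    import numpy as np

    # Two global tone mapping operators applied to an irradiance map E
    # (an H x W x 3 array of positive floats). Outputs lie in [0, 1].
    def x_over_one_plus_x(E, scale=1.0):
        X = scale * E                 # per-image brightness tuning
        return X / (1.0 + X)          # compresses [0, inf) into [0, 1)

    def log2_rescale(E, eps=1e-8):
        L = np.log2(E + eps)          # compress orders of magnitude
        L -= L.min()
        return L / L.max()            # linear rescale to [0, 1]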

Procedure

My procedure is roughly as follows:
  1. Randomly sample many locations from the image set. This number can be far smaller than the actual number of pixels in the image; if N is the number of sampled locations and P is the number of photographs captured, we need N·(P−1) > Zmax − Zmin = 255 to hold for the least squares formulation to be sufficiently overdetermined. To be safe, I used around 200 samples. However, we would like these pixels to have a good distribution over the image's values, so I also check that the number of unique pixel values in the sample is at least 50% of the number of unique values in the original image set. If it is not, I throw out the random locations and resample until the 50% condition is met. This seems sufficient for the least squares to work well.
  2. After acquiring the N·P pixel values from the P photographs, set up a least squares problem to jointly solve for g(z), the values of the inverse response function, and E_i, the irradiance values at the sampled locations (a sketch follows this list). We throw away these E_i after solving, because we intend to compute E_i for all pixels afterward.
  3. After recovering the inverse response function, we remap every pixel of every image to an irradiance value and average across all the exposures for noise reduction. This gives us an irradiance map.
  4. We pick a tone mapping operator and apply it across the irradiance map to produce a final image that can be displayed.
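A minimal numpy sketch of steps 2 and 3 under the Debevec and Malik formulation (the function and variable names are my own illustration: Z is an N x P array of sampled pixel values for one color channel, and log_dt holds the P log exposure times):

    import numpy as np

    def gsolve(Z, log_dt, lam=100.0):
        # Jointly solve for the 256 values of g and the N log irradiances
        # from an N x P array Z of pixel samples (Debevec & Malik 1997).
        n = 256
        N, P = Z.shape
        w = np.minimum(np.arange(n), n - 1 - np.arange(n))  # hat weights
        A = np.zeros((N * P + n - 1, n + N))
        b = np.zeros(A.shape[0])
        k = 0
        for i in range(N):                # data-fitting equations
            for j in range(P):
                z = Z[i, j]
                A[k, z] = w[z]
                A[k, n + i] = -w[z]
                b[k] = w[z] * log_dt[j]
                k += 1
        A[k, n // 2] = 1.0                # fix g(128) = 0 to set the scale
        k += 1
        for z in range(1, n - 1):         # smoothness penalty on g''
            A[k, z - 1] = lam * w[z]
            A[k, z] = -2 * lam * w[z]
            A[k, z + 1] = lam * w[z]
            k += 1
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        return x[:n]                      # g; x[n:] holds the ln(E_i)

    def recover_log_irradiance(images, g, log_dt):
        # Step 3: weighted average of g(Z) - ln(dt) over all P exposures.
        w = np.minimum(np.arange(256), 255 - np.arange(256)).astype(float)
        num = sum(w[im] * (g[im] - dt) for im, dt in zip(images, log_dt))
        den = sum(w[im] for im in images)
        return num / np.maximum(den, 1e-8)

In practice this runs once per color channel, and exponentiating the recovered log irradiance gives the irradiance map that the tone mapping operators consume.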

Results

Basic example: Stanford Memorial Church

A set of images to combine. This scene is suitable because it has extremely bright and dark areas, so different exposures reveal different parts of the overall detail, and the scene has a high dynamic range to recover.

HDR

A linear mapping of the bottom 0.1% of the dynamic range to [0,255]. Not very balanced and leaves many sections dark; we need something that can deal with orders of magnitude.
The operator X/(1+X) provides surprisingly good results. I tuned the irradiance values by a multiplicative factor to get this image.
Application of log-base2 followed by a rescale to [0,255]. Overemphasizes the blue channel, but shows the dark areas more clearly.

Bilateral filtering on two-scale decomposition

So far, these results are not bad. However, we can improve on them using an operator based on a bilateral filter. Essentially, this involves splitting the log irradiance map into low and high frequencies (similar to the earlier hybrid images project), reducing the contrast of the low frequencies (which compresses the dynamic range while preserving detail), and recombining. After recombining the frequencies, we get an image with significantly improved visibility at the extreme values (near 0 or 255). We then apply one of the global operators to push the range into [0, 255]; a code sketch follows the figures below.
Low and high frequency components of log(intensity), separated by the bilateral filter. The low frequency component will have its contrast reduced. Intensity is the average of the three color radiance maps.
After applying the bilateral filter and removing some of the contrast, we can see much more of the dark areas in the upper right corner and left side; we also reduce the overwhelming brightness of the sunlight coming in the circular window.
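A minimal sketch of this two-scale pipeline, in the spirit of Durand and Dorsey's bilateral-filter tone mapping, using OpenCV's bilateral filter (the sigma values and contrast target are illustrative guesses, not tuned values):

    import cv2
    import numpy as np

    def bilateral_tonemap(E, target_contrast=5.0, eps=1e-8):
        # E is an H x W x 3 irradiance map of positive floats.
        intensity = E.mean(axis=2) + eps        # average of the channels
        log_i = np.log(intensity).astype(np.float32)
        base = cv2.bilateralFilter(log_i, -1, 0.4, 15)  # low frequencies
        detail = log_i - base                   # high frequencies
        scale = np.log(target_contrast) / (base.max() - base.min())
        out_log = base * scale + detail         # compress base, keep detail
        out_i = np.exp(out_log - base.max() * scale)  # anchor whites at 1
        color = E / intensity[..., None]        # per-channel color ratios
        return np.clip(color * out_i[..., None], 0.0, 1.0)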

New example: Sunset in Berkeley

I captured a set of LDR exposures from my apartment roof in Berkeley during sunset. This scene is a great opportunity for an HDR shot because the high radiance of the sky and the low radiance of the city below cannot fit into a single LDR image. In fact, the dynamic range was so high that none of the global operators performed well, and only bilateral filtering produced a good output. The response function I recovered wasn't entirely smooth/monotonic, so the HDR images exhibit some oscillation artifacts visible in the sky (what should have been a monotonic change in brightness became up-and-down).
A set of images to combine. The top and bottom halves of the image each look good in different exposures, but never both at once.

HDR

Linear mapping provides nice color of the city, but little of the sky.
The operator X/(1+X) fails here as the dynamic range is just too large for a global tone operator.
The log-base2 operator washes out some of the image colors.
The bilateral filtering operator performs best in this case, as it removes the massive top-bottom contrast without eliminating much detail.

Another example: Rooftop patio

I captured this set of exposures at the same time, but of a scene with a smaller dynamic range.
A set of images to combine. These are almost good in isolation but could use a little bit of dynamic range compression.

HDR

Linear mapping is way too harsh here.
The operator X/(1+X) works fine here.
The log-base2 operator washes out the colors.
The bilateral filtering operator performs similarly to the X/(1+X) operator, but with slightly more saturation.