The vertigo shot, or "dolly zoom," is a well-known visual effect used in many older films. It involves keeping the apparent size of a subject roughly constant while drastically changing the field of view (FOV) of the image. The FOV is changed using a zoom lens, and the resulting size difference is compensated by moving the camera closer to or farther from the subject. Because the subject itself is not moving, the foreground and/or background appear to balloon or shrink rapidly, creating a creepy sensation of movement.
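Under a simple pinhole model, the compensation amounts to keeping d * tan(FOV/2) constant, where d is the camera-to-subject distance: the fraction of the frame a subject fills is its width divided by twice that product. A minimal sketch of the geometry (the function name and parameters are mine, for illustration only):

```python
import math

def dolly_distance(subject_width, frame_fill, fov_deg):
    """Distance (same units as subject_width) at which a subject spans
    `frame_fill` (0..1) of the frame at horizontal FOV `fov_deg`.
    From the pinhole model: fill = W / (2 * d * tan(fov / 2))."""
    return subject_width / (2 * frame_fill * math.tan(math.radians(fov_deg) / 2))

# Keeping a 1 m wide subject at half the frame width:
wide = dolly_distance(1.0, 0.5, 60)  # wide FOV -> camera must be close
tele = dolly_distance(1.0, 0.5, 20)  # narrow FOV -> camera must be far
```

Sliding the camera between these two distances while zooming between the two focal lengths is exactly the dolly zoom: the subject stays the same size while everything at other depths rescales.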
Results: Hallway
In these images, the stationary subject is not the door, but actually the wall which frames the scene. The brightness level changes because zooming in limits the incoming light; I tried auto-sensing the brightness levels but the ceiling lights messed that up so I just went with the slight brightness gradient.
Animated gif to make it a little more obvious...
Results: Friend at the table
Inspired by some of the dolly zoom examples I saw online, in which the background warps heavily behind a person as the subject, I tried an example with a friend. The results were OK; however, I think there was too much background information (tables and benches), which gave perceptual context and reduced the disorienting effect of the dolly zoom.
And of course, a gif to bring it to life...
Videos
Looking at examples online, I noticed that the shots from actual movies looked much better, so I surmised that the vertigo shot works best when viewed smoothly in motion. For instance, just scrolling through the images I included, it is very hard to see any dolly-zoom effect. For this reason, I animated the gifs with a pause followed by a very fast zoom; I found this optimal for emphasizing the perceived effect. I decided to take this one step further by imitating the examples and shooting an actual video.
================================================================================
HDR Imaging Overview
The real world contains scenes with an extremely high dynamic range, which is the range of radiance values (energy per unit area). However, cameras are limited by hardware constraints and can only capture a very small part of this dynamic range in one image through a fixed-duration exposure. Cameras map exposure values (irradiance * exposure time) to pixel values between 0 and 255; this mapping is known as the "response function". If we could just figure out the inverse of this function, we could map all our pixel values back to irradiance values, which are independent of exposure times. Luckily, using an algorithm described in Debevec and Malik 1997, if we take several pictures of the same scene at different levels of exposure, we can merge them together to derive a good estimate of the response function, and then solve for the absolute irradiance values incident on the camera. These irradiance values are a far richer source of information than any individual image capture, and can be used to produce an image where much more of the real world's dynamic range is discernible.
The (inverse) response function
The response function maps real-valued, positive exposures into the range [0, 255]. The inverse can thus be represented by exactly 256 values. Concretely, we'd like to solve for the values of g in the following equations:
Z_ij = f(E_i * ∆t_j) (E_i is the irradiance at pixel location i; Z_ij is the pixel value at location i in exposure j, taken with exposure time ∆t_j)
g(Z_ij) = ln(f^-1(Z_ij)) = ln(E_i) + ln(∆t_j) (taking logarithms makes the equation linear in the unknowns g and ln(E_i), which is what least squares needs)
Tone mapping operators
After recovering an HDR image, we have values that span several orders of magnitude. How, then, to display this on an LDR display? There exist many ways to map the values to a linear range between 0 and 1 (or 0 and 255); these are called tone mapping operators. There are global operators, which apply the same transformation across the entire irradiance map, and local operators, whose effects are signal-dependent.
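Two common global operators can be written in a few lines. This is a sketch, not necessarily the operators used below; the function name and the normalization choices are mine:

```python
import numpy as np

def tonemap_global(irradiance, operator="reinhard"):
    """Apply a global tone mapping operator to an HDR irradiance map
    (assumed strictly positive). Returns values in [0, 1]."""
    E = np.asarray(irradiance, dtype=float)
    if operator == "reinhard":
        return E / (1.0 + E)                    # simple Reinhard: L / (1 + L)
    elif operator == "log":
        return np.log1p(E) / np.log1p(E.max())  # logarithmic compression
    raise ValueError(f"unknown operator: {operator}")
```

Both are monotonic, so they preserve the ordering of brightness values while squeezing a huge input range into a displayable one.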
Procedure
My procedure is roughly as follows:
Randomly sample many locations from the image set. This number can be far smaller than the actual number of pixels in the image; if N is the number of sampled locations and P is the number of photographs captured, we need N*(P-1) > Zmax - Zmin = 255 to hold for the least-squares system to be sufficiently constrained. To be safe, I used around 200 samples. However, we would like these pixels to have a good distribution of values over the image. Thus I also check that the number of unique pixel values in the sample is at least 50% of the number of unique values in the original image set; if it is not, I throw out the random locations and resample until the 50% condition is met. In practice this seems sufficient for the least squares to work well.
After acquiring N*P pixel values from the P photographs, set up a least squares problem to jointly solve for g(z), the values of the inverse response function, and E_i, the irradiance values of the sampled locations. We will throw out the values of E_i after solving, because we intend to compute E_i for all pixels afterward.
After recovering the inverse response function, we remap every pixel of every image to an irradiance value and average across all the images for noise reduction. This gives us an irradiance map.
We pick a tone mapping operator and apply it across the irradiance map to produce a final image that can be displayed.
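The sampling-and-solve steps above can be sketched roughly as follows. This is a minimal NumPy version in the spirit of the gsolve routine published in Debevec and Malik 1997, not my exact implementation; the hat weighting, the smoothness weight lam, and the g(128) = 0 anchor are the paper's standard choices, while the +1 in the weight is a small stabilizer I added for this sketch:

```python
import numpy as np

def gsolve(Z, log_dt, lam=10.0):
    """Solve for the log inverse response g(z), z in 0..255, from sampled
    pixel values Z (N locations x P exposures) and the log exposure times
    log_dt (length P). Returns (g, logE) where logE are the recovered
    log irradiances at the sampled locations."""
    Z = np.asarray(Z, dtype=int)
    n = 256
    N, P = Z.shape

    def weight(z):  # hat function: trust mid-range values most
        return min(z, 255 - z) + 1

    A = np.zeros((N * P + n - 1, n + N))
    b = np.zeros(A.shape[0])
    k = 0
    for i in range(N):           # one data equation per observation:
        for j in range(P):       # g(Z_ij) - ln(E_i) = ln(dt_j)
            wij = weight(Z[i, j])
            A[k, Z[i, j]] = wij
            A[k, n + i] = -wij
            b[k] = wij * log_dt[j]
            k += 1
    A[k, 128] = 1.0              # fix the arbitrary scale: g(128) = 0
    k += 1
    for z in range(1, n - 1):    # smoothness: second differences of g ~ 0
        wz = lam * weight(z)
        A[k, z - 1], A[k, z], A[k, z + 1] = wz, -2 * wz, wz
        k += 1
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:n], x[n:]
```

Note that the recovered g is only well constrained at pixel values that actually occur in the samples; the smoothness term fills in the gaps.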
Results
Basic example: Stanford Memorial Church
HDR
Bilateral filtering on two-scale decomposition
So far, these results are not bad. However, we can improve them using an operator based on a bilateral filter. Essentially, this involves splitting the log irradiance map into a low-frequency base layer and a high-frequency detail layer (similar to the earlier hybrid images project), reducing the contrast of the base layer (which compresses the dynamic range while preserving detail), and recombining the two. The recombined image has significantly improved visibility in regions that would otherwise be crushed to black or blown out to white. We then apply one of the global operators to push the range into [0, 255].
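A brute-force sketch of the two-scale idea, following the general bilateral tone-mapping recipe of Durand and Dorsey. The filter parameters, the contrast target, and the tiny O(r^2)-per-pixel bilateral filter are illustrative only (suitable for small images), not the settings I used:

```python
import numpy as np

def bilateral(img, sigma_s=2.0, sigma_r=0.4, radius=4):
    """Brute-force bilateral filter: Gaussian in space and in intensity."""
    H, W = img.shape
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))
    pad = np.pad(img, radius, mode="edge")
    for y in range(H):
        for x in range(W):
            patch = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            w = spatial * np.exp(-(patch - img[y, x])**2 / (2 * sigma_r**2))
            out[y, x] = (w * patch).sum() / w.sum()
    return out

def two_scale_tonemap(E, contrast=5.0):
    """Compress the bilateral-filtered base layer of log irradiance,
    keep the detail layer untouched, recombine, normalize to [0, 1]."""
    logE = np.log(E)
    base = bilateral(logE)           # low frequencies, edges preserved
    detail = logE - base             # high frequencies (texture, edges)
    scale = np.log(contrast) / (base.max() - base.min())
    out = np.exp(base * scale + detail)
    return out / out.max()
```

Because the bilateral filter is edge-preserving, large brightness discontinuities land in the base layer and get compressed, while fine texture in the detail layer survives at full contrast; a plain Gaussian blur here would produce halos around edges.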
New example: Sunset in Berkeley
I captured a set of LDR exposures from my apartment roof in Berkeley during sunset. This is a great opportunity for an HDR shot because the high radiance of the sky and the low radiance of the city below cannot both fit into a normal LDR image. In fact, the dynamic range was so high that none of the global operators performed well; only bilateral filtering produced a good output. The response function I recovered wasn't entirely smooth/monotonic, so the HDR images exhibit some oscillation artifacts visible in the sky: what should be a monotonic change in brightness instead goes up and down.
HDR
Another example: Rooftop patio
I captured this set of exposures at the same time, but with a scene of smaller dynamic range.