Marc Levoy
List of publications
This web page contains links to all my papers back to 1990, and selected ones
beyond that. The list is sorted by topic, and then in reverse chronological
order within each topic. A complete list may be found in my CV. For some of the older papers, PDFs have been
created from optical scans of the original publications. The entries for some
papers include links to software, data, other papers, or historical notes about
the paper. The visualization at right below was created from the words on this
page (with minor editing) using http://www.wordle.net.
 
 
Computational photography
 
(papers on light fields are farther down on this page)
 
 
 
Removing Reflections from RAW Photos
Eric Kee, Adam Pikielny,
Kevin Blackburn-Matzen,
Marc Levoy,
To appear in
Proc. CVPR 2025.
(preprint available in
arXiv:2404.14414)
This technology was first demoed
at an Adobe MAX
Sneak Peek in October 2023 ("Project SeeThrough").
See also this
Adobe blog.
 The technology is available in Adobe Camera Raw, Adobe
Lightroom, and on iPhone in the
Project Indigo camera app.
 
We describe a system to remove real-world reflections from images for consumer
photography. Our system operates on linear (RAW) photos, with the (optional)
addition of a contextual photo looking in the opposite direction, e.g., using the
selfie camera on a mobile device, which helps disambiguate what should be
considered the reflection. The system is trained using synthetic mixtures of
real-world RAW images, which are combined using a reflection simulation that is
photometrically and geometrically accurate. Our system consists of a base model
that accepts the captured photo and optional contextual photo as input, and runs
at 256p, followed by an up-sampling model that transforms output 256p images to
full resolution. The system can produce images for review at 1K in 4.5 to 6.5
seconds on a MacBook or iPhone 14 Pro. We test on RAW photos that were captured
in the field and embody typical consumer photographs.
 
 
 
 
Handheld Mobile Photography in Very Low Light
Orly Liba, Kiran Murthy, Yun-Ta Tsai, Tim Brooks, Tianfan Xue, Nikhil Karnad,
Qiurui He, Jonathan T. Barron, Dillon Sharlet, Ryan Geiss, Samuel W. Hasinoff,
Yael Pritch,
Marc Levoy,
ACM Transactions on Graphics 38(6)
(Proc. 
SIGGRAPH Asia 2019)
This paper describes the technology in Night Sight on Google Pixel 3.
Main and supplemental material in a single document here on
Arxiv.
Click here for an
earlier
article
in the Google Research blog.
For Pixel 4 and astrophotography, see this more recent
article.
 
Taking photographs in low light using a mobile phone is challenging and rarely
produces pleasing results. Aside from the physical limits imposed by read noise
and photon shot noise, these cameras are typically handheld, have small
apertures and sensors, use mass-produced analog electronics that cannot easily
be cooled, and are commonly used to photograph subjects that move, like
children and pets. In this paper we describe a system for capturing clean,
sharp, colorful photographs in light as low as 0.3 lux, where human vision
becomes monochromatic and indistinct. To permit handheld photography without
flash illumination, we capture, align, and combine multiple frames. Our system
employs "motion metering", which uses an estimate of motion magnitudes (whether
due to handshake or moving objects) to identify the number of frames and the
per-frame exposure times that together minimize both noise and motion blur in a
captured burst. We combine these frames using robust alignment and merging
techniques that are specialized for high-noise imagery. To ensure accurate
colors in such low light, we employ a learning-based auto white balancing
algorithm. To prevent the photographs from looking like they were shot in
daylight, we use tone mapping techniques inspired by illusionistic painting:
increasing contrast, crushing shadows to black, and surrounding the scene with
darkness. All of these processes are performed using the limited computational
resources of a mobile device. Our system can be used by novice photographers to
produce shareable pictures in a few seconds based on a single shutter press,
even in environments so dim that humans cannot see clearly.
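
As a concrete (and heavily simplified) illustration of the motion-metering idea, the Python sketch below picks a frame count and per-frame exposure time from an estimated motion magnitude. The function, its constants, and the linear blur model are hypothetical, not the shipped Night Sight logic.

# Hypothetical sketch of "motion metering": given an estimate of scene/hand
# motion (pixels per second) and a total exposure budget, choose a per-frame
# exposure time and frame count that trade off motion blur against noise.
def motion_meter(motion_px_per_s, total_exposure_s, max_frames=15,
                 max_blur_px=1.5):
    # Longest per-frame exposure that keeps blur below max_blur_px.
    t_frame = min(total_exposure_s, max_blur_px / max(motion_px_per_s, 1e-6))
    # Use as many frames as needed to reach the exposure budget,
    # capped by what the burst pipeline can merge.
    n_frames = min(max_frames, max(1, round(total_exposure_s / t_frame)))
    return n_frames, t_frame

# Example: fast-moving subject vs. a nearly static scene.
print(motion_meter(motion_px_per_s=60.0, total_exposure_s=3.0))  # many short frames
print(motion_meter(motion_px_per_s=0.5,  total_exposure_s=3.0))  # one long frame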
 
 
 
 
Handheld Multi-Frame Super-Resolution
Bartlomiej Wronski, Ignacio Garcia-Dorado, Manfred Ernst, Damien Kelly,
	Michael Krainin, Chia-Kai Liang,
Marc Levoy,
Peyman Milanfar, 
ACM Transactions on Graphics,
(Proc. 
SIGGRAPH 2019)
Click here for an earlier related article in the
Google Research Blog.
See also these interviews by
DP Review (or
video) and
CNET.
 
Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits
their spatial resolution; smaller apertures, which limits their light gathering
ability; and smaller pixels, which reduces their signal-to-noise ratio. The
use of color filter arrays (CFAs) requires demosaicing, which further degrades
resolution. In this paper, we supplant the use of traditional demosaicing in
single-frame and burst photography pipelines with a multi-frame
super-resolution algorithm that creates a complete RGB image directly from a
burst of CFA raw images. We harness natural hand tremor, typical in handheld
photography, to acquire a burst of raw frames with small offsets. These frames
are then aligned and merged to form a single image with red, green, and blue
values at every pixel site. This approach, which includes no explicit
demosaicing step, serves to both increase image resolution and boost signal to
noise ratio. Our algorithm is robust to challenging scene conditions: local
motion, occlusion, or scene changes. It runs at 100 milliseconds per
12-megapixel RAW input burst frame on mass-produced mobile
phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature,
as well as the default merge method in Night Sight mode (whether zooming or
not) on Google’s flagship phone.
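
The sketch below illustrates the general flavor of merging raw frames captured at sub-pixel offsets into a full-RGB grid; it is a toy nearest-pixel accumulation, not the paper's robust kernel-regression merge, and the helper names are invented for illustration.

import numpy as np

# Toy illustration only: scatter Bayer samples from several frames, each with
# a known sub-pixel offset from hand tremor, into per-channel accumulators on
# a common full-RGB grid.
def merge_bayer_frames(frames, offsets, height, width):
    acc = np.zeros((height, width, 3))
    wgt = np.zeros((height, width, 3))
    for raw, (dy, dx) in zip(frames, offsets):
        for y in range(raw.shape[0]):
            for x in range(raw.shape[1]):
                c = bayer_channel(y, x)            # 0=R, 1=G, 2=B
                ty, tx = int(round(y + dy)), int(round(x + dx))
                if 0 <= ty < height and 0 <= tx < width:
                    acc[ty, tx, c] += raw[y, x]
                    wgt[ty, tx, c] += 1.0
    return acc / np.maximum(wgt, 1e-6)             # holes remain where no sample landed

def bayer_channel(y, x):
    return [[0, 1], [1, 2]][y % 2][x % 2]          # RGGB mosaic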
 
 
 
 
Synthetic Depth-of-Field with a Single-Camera Mobile Phone
Neal Wadhwa, Rahul Garg, David E. Jacobs, Bryan E. Feldman, Nori Kanazawa,
Robert Carroll, Yair Movshovitz-Attias, Jonathan T. Barron, Yael Pritch,
Marc Levoy
ACM Transactions on Graphics 37(4),
(Proc. 
SIGGRAPH 2018)
Click here for an earlier article in the
Google Research Blog.
And here for articles on
learning-based depth
(published in 
Proc. ICCV 2019)
and
dual-pixel + dual-camera
(published in 
Proc. ECCV 2020).
Click here for an
API for reading
dual-pixels from Pixel phones.
 
Shallow depth-of-field is commonly used by photographers to isolate a subject
from a distracting background. However, standard cell phone cameras cannot
produce such images optically, as their short focal lengths and small apertures
capture nearly all-in-focus images. We present a system to computationally
synthesize shallow depth-of-field images with a single mobile camera and a
single button press. If the image is of a person, we use a person segmentation
network to separate the person and their accessories from the background. If
available, we also use dense dual-pixel auto-focus hardware, effectively a
2-sample light field with an approximately 1 millimeter baseline, to compute a
dense depth map. These two signals are combined and used to render a defocused
image. Our system can process a 5.4 megapixel image in 4 seconds on a mobile
phone, is fully automatic, and is robust enough to be used by non-experts. The
modular nature of our system allows it to degrade naturally in the absence of a
dual-pixel sensor or a human subject.
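
A minimal sketch of the compositing idea, assuming a depth map normalized to [0, 1] and a binary subject mask; the depth-to-blur mapping and the blur model here are illustrative, not the paper's renderer.

import numpy as np
from scipy.ndimage import gaussian_filter

# Toy defocus composite: blur the image with a strength that grows with
# distance from the focus depth, then keep the segmented subject sharp.
def synthetic_dof(image, depth, subject_mask, focus_depth, max_sigma=8.0):
    out = np.zeros_like(image, dtype=float)
    sigmas = np.clip(np.abs(depth - focus_depth) * max_sigma, 0, max_sigma)
    for s in np.unique(np.round(sigmas)):          # a few quantized blur levels
        layer = gaussian_filter(image.astype(float), sigma=(s, s, 0))
        out = np.where((np.round(sigmas) == s)[..., None], layer, out)
    return np.where(subject_mask[..., None], image, out)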
 
 
 
 
 
Burst photography for high dynamic range and low-light imaging on mobile cameras,
Samuel W. Hasinoff,
Dillon Sharlet,
Ryan Geiss,
Andrew Adams,
Jonathan T. Barron,
Florian Kainz,
Jiawen Chen,
Marc Levoy
Proc. SIGGRAPH Asia 2016.
Click here for
Supplemental material,
and for an
archive of burst photography data
and a
blog about it.
And here for a blog about
Live HDR+,
the real-time, learning-based approximation of HDR+ used to make
Pixel 4's viewfinder WYSIWYG relative to the final HDR+ photo.
And a 2021 blog about adding
bracketing to HDR+.
 
Cell phone cameras have small apertures, which limits the number of photons
they can gather, leading to noisy images in low light. They also have small
sensor pixels, which limits the number of electrons each pixel can store,
leading to limited dynamic range. We describe a computational photography
pipeline that captures, aligns, and merges a burst of frames to reduce noise
and increase dynamic range. Our solution differs from previous HDR systems in
several ways. First, we do not use bracketed exposures. Instead, we capture
frames of constant exposure, which makes alignment more robust, and we set this
exposure low enough to avoid blowing out highlights. The resulting merged image
has clean shadows and high bit depth, allowing us to apply standard HDR tone
mapping methods. Second, we begin from Bayer raw frames rather than the
demosaicked RGB (or YUV) frames produced by hardware Image Signal Processors
(ISPs) common on mobile platforms. This gives us more bits per pixel and allows
us to circumvent the ISP's unwanted tone mapping and spatial denoising. Third,
we use a novel FFT-based alignment algorithm and a hybrid 2D/3D Wiener filter
to denoise and merge the frames in a burst. Our implementation is built atop
Android's Camera2 API, which provides per-frame camera control and access to
raw imagery, and is written in the Halide domain-specific language (DSL). It
runs in 4 seconds on device (for a 12 Mpix image), requires no user
intervention, and ships on several mass-produced cell phones.
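
As a rough illustration of the frequency-domain merge, the sketch below blends one aligned tile into a reference tile with a Wiener-style weight that backs off where the frames disagree more than noise would explain. The two-frame averaging and the noise model are simplifications, not the shipped method.

import numpy as np

# Sketch of a Wiener-style frequency-domain merge of one aligned tile into a
# reference tile; constants and the noise model are illustrative.
def merge_tile(ref_tile, alt_tile, noise_variance):
    R = np.fft.fft2(ref_tile)
    A = np.fft.fft2(alt_tile)
    D = R - A
    # Pull the alternate frame toward the reference where they disagree more
    # than noise alone would explain (robustness to misalignment).
    shrink = np.abs(D) ** 2 / (np.abs(D) ** 2 + noise_variance)
    merged = A + shrink * D
    return np.real(np.fft.ifft2(0.5 * (R + merged)))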
 
 
 
 
Simulating the Visual Experience of Very Bright and Very Dark Scenes,
 David E. Jacobs,
 
Orazio Gallo,
 Emily A. Cooper,
 Kari Pulli,
 
Marc Levoy
ACM Transactions on Graphics 34(3),
April 2015.
 
The human visual system can operate in a wide range of illumination levels, due
to several adaptation processes working in concert. For the most part, these
adaptation mechanisms are transparent, leaving the observer unaware of his or
her absolute adaptation state. At extreme illumination levels, however, some of
these mechanisms produce perceivable secondary effects, or epiphenomena. In
bright light, these include bleaching afterimages and adaptation afterimages,
while in dark conditions these include desaturation, loss of acuity, mesopic
hue shift, and the Purkinje effect. In this work we examine whether displaying
these effects explicitly can be used to extend the apparent dynamic range of a
conventional computer display. We present phenomenological models for each
effect, we describe efficient computer graphics methods for rendering our
models, and we propose a gaze-adaptive display that injects the effects into
imagery on a standard computer monitor. Finally, we report the results of
psychophysical experiments, which reveal that while mesopic epiphenomena are a
strong cue that a stimulus is very dark, afterimages have little impact on
perception that a stimulus is very bright.
 
 
 
 
HDR+: Low Light and High Dynamic Range photography in the Google Camera App
Marc Levoy
Google Research blog,
October 27, 2014.
See also this
SIGGRAPH Asia 2016 paper.
 
As anybody who has tried to use a smartphone to photograph a dimly lit scene
knows, the resulting pictures are often blurry or full of random variations in
brightness from pixel to pixel, known as image noise. Equally frustrating are
smartphone photographs of scenes where there is a large range of brightness
levels, such as a family photo backlit by a bright sky. In high dynamic range
(HDR) situations like this, photographs will either come out with an
overexposed sky (turning it white) or an underexposed family (turning them into
silhouettes). HDR+ is a feature in the Google Camera app for Nexus 5 and Nexus
6 that uses computational photography to help you take better pictures in these
common situations. When you press the shutter button, HDR+ actually captures a
rapid burst of pictures, then quickly combines them into one. This improves
results in both low-light and high dynamic range situations. [In this article]
we delve into each case and describe how HDR+ works to produce a better
picture.
 
 
 
 
Gyro-Based Multi-Image Deconvolution for Removing Handshake Blur,
 Sung Hee Park,
 
Marc Levoy
Proc. CVPR 2014
Click here for the associated tech report on
handling moving objects and over-exposed regions.
 
Image deblurring to remove blur caused by camera shake has been intensively
studied. Nevertheless, most methods are brittle and computationally
expensive. In this paper we analyze multi-image approaches, which capture and
combine multiple frames in order to make deblurring more robust and
tractable. In particular, we compare the performance of two approaches:
align-and-average and multi-image deconvolution. Our deconvolution is
non-blind, using a blur model obtained from real camera motion as measured by a
gyroscope. We show that in most situations such deconvolution outperforms
align-and-average. We also show, perhaps surprisingly, that deconvolution does
not benefit from increasing exposure time beyond a certain threshold. To
demonstrate the effectiveness and efficiency of our method, we apply it to
still-resolution imagery of natural scenes captured using a mobile camera with
flexible camera control and an attached gyroscope.
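
A minimal sketch of multi-image non-blind deconvolution in the Wiener sense, assuming the per-frame blur kernels (derived elsewhere from the gyroscope trace) are already rasterized to image size and centered at the origin; the SNR constant is illustrative.

import numpy as np

# Joint Wiener estimate from several frames, each with a known blur PSF.
def multi_image_wiener(frames, kernels, snr=100.0):
    # kernels: PSFs padded to the image size and centered at pixel (0, 0)
    num = 0.0
    den = 1.0 / snr
    for img, psf in zip(frames, kernels):
        H = np.fft.fft2(psf)
        num = num + np.conj(H) * np.fft.fft2(img)
        den = den + np.abs(H) ** 2
    return np.real(np.fft.ifft2(num / den))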
 
 
 
 
WYSIWYG Computational Photography via Viewfinder Editing,
 Jongmin Baek,
 
Dawid Pająk,
 
Kihwan Kim,
 
Kari Pulli,
 
Marc Levoy
ACM Transactions on Graphics 
(Proc. 
SIGGRAPH Asia 2013)
 
Digital cameras with electronic viewfinders provide a relatively faithful
depiction of the final image, providing a WYSIWYG experience. If, however, the
image is created from a burst of differently captured images, or non-linear
interactive edits significantly alter the final outcome, then the photographer
cannot directly see the results, but instead must imagine the post-processing
effects. This paper explores the notion of viewfinder editing, which makes the
viewfinder more accurately reflect the final image the user intends to
create. We allow the user to alter the local or global appearance (tone, color,
saturation, or focus) via stroke-based input, and propagate the edits
spatiotemporally. The system then delivers a real-time visualization of these
modifications to the user, and drives the camera control routines to select
better capture parameters.
 
 
 
 
Applications of Multi-Bucket Sensors to Computational Photography,
 Gordon Wan,
 
Mark Horowitz,
 
Marc Levoy
Stanford Computer Graphics Laboratory Technical Report 2012-2
Later appeared in
IEEE JSSC, Vol. 47, No. 4, April 2012.
 
Many computational photography techniques take the form, "Capture a burst of
images varying camera setting X (exposure, gain, focus, lighting), then align
and combine them to produce a single photograph exhibiting better Y (dynamic
range, signal-to-noise, depth of field)." Unfortunately, these techniques may
fail on moving scenes because the images are captured sequentially, so objects
are in different positions in each image, and robust local alignment is
difficult to achieve. To overcome this limitation, we propose using
multi-bucket sensors, which allow the images to be captured in
time-slice-interleaved fashion. This interleaving produces images with nearly
identical positions for moving objects, making alignment unnecessary. To test
our proposal, we have designed and fabricated a 4-bucket, VGA-resolution CMOS
image sensor, and we have applied it to high dynamic range (HDR)
photography. Our sensor permits 4 different exposures to be captured at once
with no motion difference between the exposures. Also, since our protocol
employs non-destructive analog addition of time slices, it requires less total
capture time than capturing a burst of images, thereby reducing total motion
blur. Finally, we apply our multi-bucket sensor to several other computational
photography applications, including flash/no-flash, multi-flash, and flash
matting.
 
 
 
 
Focal stack compositing for depth of field control,
 David E. Jacobs,
 
Jongmin Baek,
 
Marc Levoy
Stanford Computer Graphics Laboratory Technical Report 2012-1
 
Many cameras provide insufficient control over depth of field. Some have a
fixed aperture; others have a variable aperture that is either too small or too
large to produce the desired amount of blur. To overcome this limitation, one
can capture a focal stack, which is a collection of images each focused at a
different depth, then combine these slices to form a single composite that
exhibits the desired depth of field. In this paper, we present a theory of
focal stack compositing, and algorithms for computing images with extended
depth of field, shallower depth of field than the lens aperture naturally
provides, or even freeform (non-physical) depth of field. We show that while
these composites are subject to halo artifacts, there is a principled
methodology for avoiding these artifacts - by feathering a slice selection map
according to certain rules before computing the composite image.
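
The toy composite below conveys the feathering idea: pick the sharpest slice per pixel, then blur the selection weights before blending. The sharpness measure and feather radius are placeholders, not the paper's rules.

import numpy as np
from scipy.ndimage import gaussian_filter, laplace

# Toy focal-stack composite with a feathered slice-selection map.
def composite_focal_stack(slices, feather_sigma=5.0):
    stack = np.stack([s.astype(float) for s in slices])        # (k, h, w)
    sharpness = np.stack([np.abs(laplace(s)) for s in stack])  # local contrast
    weights = (sharpness == sharpness.max(axis=0, keepdims=True)).astype(float)
    weights = np.stack([gaussian_filter(w, feather_sigma) for w in weights])
    weights /= weights.sum(axis=0, keepdims=True) + 1e-8
    return (weights * stack).sum(axis=0)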
 
 
 
 
Halide: decoupling algorithms from schedules for high performance image processing,
 Jonathan Ragan-Kelley,
 
Andrew Adams,
 Connelly Barnes, Dillon Sharlet,
 
Sylvain Paris,
 
Marc Levoy,
 
Saman Amarasinghe,
 
Fredo Durand
CACM, January 2018, with an introductory
technical perspective by Manuel Chakravarty.
See also original paper in SIGGRAPH 2012 (entry below this one).
 
Writing high-performance code on modern machines requires not just locally
optimizing inner loops, but globally reorganizing computations to exploit
parallelism and locality - doing things like tiling and blocking whole pipelines
to fit in cache. This is especially true for image processing pipelines, where
individual stages do much too little work to amortize the cost of loading and
storing results to and from off-chip memory. As a result, the performance
difference between a naive implementation of a pipeline and one globally
optimized for parallelism and locality is often an order of magnitude. However,
using existing programming tools, writing high-performance image processing
code requires sacrificing simplicity, portability, and modularity. We argue
that this is because traditional programming models conflate what computations
define the algorithm, with decisions about storage and the order of
computation, which we call the schedule. We propose a new programming language
for image processing pipelines, called Halide, that separates the algorithm
from its schedule. Programmers can change the schedule to express many possible
organizations of a single algorithm. The Halide compiler automatically
synthesizes a globally combined loop nest for an entire algorithm, given a
schedule. Halide models a space of schedules which is
expressive enough to describe organizations that match or outperform
state-of-the-art hand-written implementations of many computational photography
and computer vision algorithms. Its model is simple enough to do so often in
only a few lines of code, and small changes generate efficient implementations
for x86 and ARM multicores, GPUs, and specialized image processors, all from a
single algorithm. Halide has been public and open source for over four years,
during which it has been used by hundreds of programmers to deploy code to tens
of thousands of servers and hundreds of millions of phones, processing billions
of images every day.
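
To convey the algorithm/schedule separation without Halide itself, the plain-Python sketch below computes the same box blur under two loop organizations, a naive traversal and a tiled one; in Halide the second would be expressed as a schedule on an unchanged algorithm.

import numpy as np

# Same 3x3 box-blur "algorithm", two different "schedules" (loop organizations).
def blur_naive(img):
    out = np.zeros_like(img, dtype=float)
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            out[y, x] = img[y-1:y+2, x-1:x+2].mean()
    return out

def blur_tiled(img, tile=64):
    out = np.zeros_like(img, dtype=float)
    for ty in range(1, img.shape[0] - 1, tile):        # cache-sized tiles
        for tx in range(1, img.shape[1] - 1, tile):
            for y in range(ty, min(ty + tile, img.shape[0] - 1)):
                for x in range(tx, min(tx + tile, img.shape[1] - 1)):
                    out[y, x] = img[y-1:y+2, x-1:x+2].mean()
    return out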
 
 
 
 
 
CMOS Image Sensors With Multi-Bucket Pixels for Computational Photography,
 Gordon Wan,
 Xiangli Li,
 Gennadiy Agranov,
 
Marc Levoy,
 
Mark Horowitz
IEEE Journal of Solid-State Circuits,
Vol. 47, No. 4, April, 2012, pp. 1031-1042.
 
This paper presents new image sensors with multi-bucket pixels that enable
time-multiplexed exposure, an alternative imaging approach. This approach deals
nicely with scene motion, and greatly improves high dynamic range imaging,
structured light illumination, motion corrected photography, etc. To implement
an in-pixel memory or a bucket, the new image sensors incorporate the virtual
phase CCD concept into a standard 4-transistor CMOS imager pixel. This design
allows us to create a multi-bucket pixel which is compact, scalable, and
supports true correlated double sampling to cancel kTC noise. Two image
sensors with dual and quad-bucket pixels have been designed and fabricated. The
dual-bucket sensor consists of an array of 5.0 μm pixels in 0.11 μm CMOS
technology, while the quad-bucket sensor comprises an array of 5.6 μm pixels
in 0.13 μm CMOS technology. Some computational photography applications were
implemented using
the two sensors to demonstrate their values in eliminating artifacts that
currently plague computational photography.
 
 
 
 
Digital Video Stabilization and Rolling Shutter Correction using Gyroscopes,
 Alexandre Karpenko,
 
David E. Jacobs,
 
Jongmin Baek,
 
Marc Levoy
Stanford Computer Science Tech Report CSTR 2011-03,
September, 2011.
Click here for the
source code.
 
In this paper we present a robust, real-time video stabilization and rolling
shutter correction technique based on commodity gyroscopes. First, we develop a
unified algorithm for modeling camera motion and rolling shutter warping. We
then present a novel framework for automatically calibrating the gyroscope and
camera outputs from a single video capture. This calibration allows us to use
only gyroscope data to effectively correct rolling shutter warping and to
stabilize the video. Using our algorithm, we show results for videos featuring
large moving foreground objects, parallax, and low-illumination. We also
compare our method with commercial image-based stabilization algorithms. We
find that our solution is more robust and computationally inexpensive. Finally,
we implement our algorithm directly on a mobile phone. We demonstrate that by
using the phone's inbuilt gyroscope and GPU, we can remove camera shake and
rolling shutter artifacts in real-time.
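
A simplified sketch of the per-row correction, assuming camera intrinsics K and a gyro-derived rotation lookup are available; real implementations warp on the GPU with interpolation rather than looping per pixel as here.

import numpy as np

# Rolling-shutter correction: each row is exposed at a slightly different
# time, so each row gets its own rotation (integrated from gyro samples)
# and is warped by K * R^T * K^-1.
def rectify_rows(image, K, rotation_at_time, t_frame_start, t_readout_per_row):
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    Kinv = np.linalg.inv(K)
    for row in range(h):
        R = rotation_at_time(t_frame_start + row * t_readout_per_row)  # 3x3
        H = K @ R.T @ Kinv                     # undo the rotation for this row
        for col in range(w):
            p = H @ np.array([col, row, 1.0])
            x, y = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= x < w and 0 <= y < h:
                out[row, col] = image[y, x]
    return out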
 
 
 
 
Experimental Platforms for Computational Photography
 Marc Levoy
IEEE Computer Graphics and Applications,
Vol. 30, No. 5, September/October, 2010, pp. 81-87.
If you're looking for our SIGGRAPH 2010 paper on the Frankencamera,
it's the next paper on this web page.
 
Although interest in computational photography has steadily increased among
graphics and vision researchers, few of these techniques have found their way
into commercial cameras. In this article I offer several possible
explanations, including barriers to entry that arise from the current structure
of the photography industry, and an incompleteness and lack of robustness in
current computational photography techniques. To begin addressing these
problems, my laboratory has designed an open architecture for programmable
cameras (called Frankencamera), an API (called FCam) with bindings for C++, and
two reference implementations: a Nokia N900 smartphone with a modified software
stack and a custom camera called the Frankencamera F2. Our short-term goal is
to standardize this architecture and distribute our reference platforms to
researchers and students worldwide. Our long-term goal is to help create an
open-source camera community, leading eventually to commercial cameras that
accept plugins and apps. I discuss the steps that might be needed to bootstrap
this community, including scaling up the world's educational programs in
photographic technology. Finally, I talk about some of the future research
challenges in computational photography.
 
 
 
 
The Frankencamera: An Experimental Platform for Computational Photography
 Andrew Adams,
 
	Eino-Ville (Eddy) Talvala,
 
Sung Hee Park,
 
David E. Jacobs,
 
Boris Ajdin,
 
	Natasha Gelfand,
 
Jennifer Dolson,
 
Daniel Vaquero,
 
Jongmin Baek,
 
Marius Tico,
 
	Henrik P.A. Lensch,
 
Wojciech Matusik,
 
Kari Pulli,
 
Mark Horowitz,
 
Marc Levoy
Proc. 
SIGGRAPH 2010.
Reprinted in 
CACM, November 2012, with an introductory
technical perspective by Richard Szeliski.
If you're looking for our release of the FCam API for the camera on the Nokia
N900 smartphone, click 
here.
 
Although there has been much interest in computational photography within the
research and photography communities, progress has been hampered by the lack of
a portable, programmable camera with sufficient image quality and computing
power. To address this problem, we have designed and implemented an open
architecture and API for such cameras: the Frankencamera. It consists of a base
hardware specification, a software stack based on Linux, and an API for
C++. Our architecture permits control and synchronization of the sensor and
image processing pipeline at the microsecond time scale, as well as the ability
to incorporate and synchronize external hardware like lenses and flashes. This
paper specifies our architecture and API, and it describes two reference
implementations we have built. Using these implementations we demonstrate six
computational photography applications: HDR viewfinding and capture, low-light
viewfinding and capture, automated acquisition of extended dynamic range
panoramas, foveal imaging, IMU-based hand shake detection, and
rephotography. Our goal is to standardize the architecture and distribute
Frankencameras to researchers and students, as a step towards creating a
community of photographer-programmers who develop algorithms, applications, and
hardware for computational cameras.
 
 
 
 
Gaussian KD-Trees for Fast High-Dimensional Filtering
 Andrew Adams,
 Natasha Gelfand,
 
Jennifer Dolson,
 
Marc Levoy
ACM Transactions on Graphics 28(3),
Proc. SIGGRAPH 2009
A follow-on paper, which filters in high-D using a
permutohedral lattice, was runner-up for 
best paper
at Eurographics 2010.
 
We propose a method for accelerating a broad class of non-linear filters that
includes the bilateral, non-local means, and other related filters. These
filters can all be expressed in a similar way: First, assign each value to be
filtered a position in some vector space. Then, replace every value with a
weighted linear combination of all values, with weights determined by a
Gaussian function of distance between the positions. If the values are pixel
colors and the positions are (x, y) coordinates, this describes a Gaussian
blur. If the positions are instead (x, y, r, g, b) coordinates in a
five-dimensional space-color volume, this describes a bilateral filter. If we
instead set the positions to local patches of color around the associated
pixel, this describes non-local means. We describe a Monte-Carlo kd-tree
sampling algorithm that efficiently computes any filter that can be expressed
in this way, along with a GPU implementation of this technique. We use this
algorithm to implement an accelerated bilateral filter that respects full 3D
color distance; accelerated non-local means on single images, volumes, and
unaligned bursts of images for denoising; and a fast adaptation of non-local
means to geometry. If we have n values to filter, and each is assigned a
position in a d-dimensional space, then our space complexity is O(dn) and our
time complexity is O(dn log n), whereas existing methods are typically either
exponential in d or quadratic in n.
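
For reference, the brute-force version of the filters being accelerated can be written in a few lines; this O(n^2) form is exactly what the kd-tree sampling avoids.

import numpy as np

# Each value v_i has a position p_i in a d-dimensional space; the output is a
# Gaussian-weighted average over all values. With positions (x, y, r, g, b)
# this is a bilateral filter; with patch positions it is non-local means.
def gaussian_position_filter(values, positions, sigma):
    positions = np.asarray(positions, dtype=float)               # (n, d)
    values = np.asarray(values, dtype=float).reshape(len(positions), -1)
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))                         # (n, n)
    return (w @ values) / w.sum(axis=1, keepdims=True)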
 
 
 
 
Spatially Adaptive Photographic Flash
 Rolf Adelsberger, Remo Ziegler,
 
Marc Levoy,
 Markus Gross
Technical Report 612, ETH Zurich, Institute of Visual Computing,
December 2008.
 
Using photographic flash for candid shots often results in an unevenly lit
scene, in which objects in the back appear dark. We describe a spatially
adaptive photographic flash system, in which the intensity of illumination
varies depending on the depth and reflectivity of features in the scene. We
adapt to changes in depth using a single-shot method, and to changes in
reflectivity using a multi-shot method. The single-shot method requires only
a depth image, whereas the multi-shot method requires at least one color image
in addition to the depth data. To reduce noise in our depth images, we present
a novel filter that takes into account the amplitude-dependent noise
distribution of observed depth values. To demonstrate our ideas, we have built
a prototype consisting of a depth camera, a flash light, an LCD and a
lens. By attenuating the flash using the LCD, a variety of illumination
effects can be achieved.
 
 
 
 
Veiling Glare in High Dynamic Range Imaging
 Eino-Ville (Eddy)
 Talvala,
 
Andrew Adams,
 
Mark Horowitz,
 
Marc Levoy
ACM Transactions on Graphics 26(3),
Proc. SIGGRAPH 2007
 
The ability of a camera to record a high dynamic range image, whether by taking
one snapshot or a sequence, is limited by the presence of veiling glare - the
tendency of bright objects in the scene to reduce the contrast everywhere
within the field of view. Veiling glare is a global illumination effect that
arises from multiple scattering of light inside the camera's optics, body, and
sensor. By measuring separately the direct and indirect components of the
intra-camera light transport, one can increase the maximum dynamic range a
particular camera is capable of recording. In this paper, we quantify the
presence of veiling glare and related optical artifacts for several types of
digital cameras, and we describe two methods for removing them: deconvolution
by a measured glare spread function, and a novel direct-indirect separation of
the lens transport using a structured occlusion mask. By physically blocking
the light that contributes to veiling glare, we attain significantly higher
signal to noise ratios than with deconvolution. Finally, we demonstrate our
separation method for several combinations of cameras and realistic scenes.
 
 
 
 
Interactive Design of Multi-Perspective Images
for Visualizing Urban Landscapes
Augusto Román,
Gaurav Garg,
Marc Levoy
Proc. Visualization 2004
This project was the genesis of Google's StreetView;
see this
historical note for details.
In a follow-on paper in
EGSR 2006, Augusto Román and Hendrik Lensch describe
an automatic way to compute these multi-perspective panoramas.
 
Multi-perspective images are a useful representation of extended, roughly
planar scenes such as landscapes or city blocks. However, constructing
effective multi-perspective images is something of an art. In this paper, we
describe an interactive system for creating multi-perspective images composed
of serially blended cross-slits mosaics. Beginning with a sideways-looking
video of the scene as might be captured from a moving vehicle, we allow the
user to interactively specify a set of cross-slits cameras, possibly with gaps
between them. In each camera, one of the slits is defined to be the camera
path, which is typically horizontal, and the user is left to choose the second
slit, which is typically vertical. The system then generates intermediate views
between these cameras using a novel interpolation scheme, thereby producing a
multi-perspective image with no seams. The user can also choose the picture
surface in space onto which viewing rays are projected, thereby establishing a
parameterization for the image. We show how the choice of this surface can be
used to create interesting visual effects. We demonstrate our system by
constructing multi-perspective images that summarize city blocks, including
corners, blocks with deep plazas and other challenging urban situations.
 
 
 
Computational microscopy
 
 
 
Neuronal Dynamics Regulating Brain and Behavioral State Transitions
 Aaron S. Andalman, Vanessa M. Burns, Matthew Lovett-Barron,
 Michael Broxton, Ben Poole, Samuel J. Yang, Logan Grosenick,
 Talia N. Lerner, Ritchie Chen, Tyler Benster, Philippe Mourrain,
 
Marc Levoy,
 Kanaka Rajan,
 
Karl Deisseroth
Cell 177, May 2, 2019, pp. 970-985.
https://doi.org/10.1016/j.cell.2019.02.037.
 
Prolonged behavioral challenges can cause animals to switch from active to
passive coping strategies to manage effort-expenditure during stress; such
normally adaptive behavioral state transitions can become maladaptive in
psychiatric disorders such as depression. The underlying neuronal dynamics and
brainwide interactions important for passive coping have remained
unclear. Here, we develop a paradigm to study these behavioral state
transitions at cellular-resolution across the entire vertebrate brain. Using
brainwide imaging in zebrafish, we observed that the transition to passive
coping is manifested by progressive activation of neurons in the ventral
(lateral) habenula. Activation of these ventral-habenula neurons suppressed
downstream neurons in the serotonergic raphe nucleus and caused behavioral
passivity, whereas inhibition of these neurons prevented passivity. Data-driven
recurrent neural network modeling pointed to altered intra-habenula
interactions as a contributory mechanism. These results demonstrate ongoing
encoding of experience features in the habenula, which guides recruitment of
downstream networks and imposes a passive coping behavioral strategy.
 
 
 
 
Identification of cellular-activity dynamics across large tissue volumes in the
mammalian brain
 Logan Grosenick,
 
Michael Broxton,
 Christina K. Kim, Conor Liston,
 Ben Poole, Samuel Yang, Aaron Andalman, Edward Scharff, Noy Cohen,
 Ofer Yizhar, Charu Ramakrishnan, Surya Ganguli, Patrick Suppes,
 
Marc Levoy,
 
Karl Deisseroth
bioRxiv, May 1, 2017,
http://dx.doi.org/10.1101/132688.
 
Tracking the coordinated activity of cellular events through volumes of intact
tissue is a major challenge in biology that has inspired significant
technological innovation. Yet scanless measurement of the high-speed activity
of individual neurons across three dimensions in scattering mammalian tissue
remains an open problem. Here we develop and validate a computational imaging
approach (SWIFT) that integrates high-dimensional, structured statistics with
light field microscopy to allow the synchronous acquisition of single-neuron
resolution activity throughout intact tissue volumes as fast as a camera can
capture images (currently up to 100 Hz at full camera resolution), attaining
rates needed to keep pace with emerging fast calcium and voltage sensors. We
demonstrate that this large field-of-view, single-snapshot volume acquisition
method, which requires only a simple and inexpensive modification to a
standard fluorescence microscope, enables scanless capture of coordinated
activity patterns throughout mammalian neural volumes. Further, the volumetric
nature of SWIFT also allows fast in vivo imaging, motion correction, and cell
identification throughout curved subcortical structures like the dorsal
hippocampus, where cellular-resolution dynamics spanning hippocampal subfields
can be simultaneously observed during a virtual context learning task in a
behaving animal. SWIFT’s ability to rapidly and easily record from volumes of
many cells across layers opens the door to widespread identification of
dynamical motifs and timing dependencies among coordinated cell assemblies
during adaptive, modulated, or maladaptive physiological processes in neural
systems.
 
 
 
 
Enhancing the performance of the light field microscope using wavefront coding
 Noy Cohen, Samuel Yang, Aaron Andalman,
 
Michael Broxton,
 Logan Grosenick,
 
Karl Deisseroth,
 
Mark Horowitz,
 
Marc Levoy
Optics Express, Vol. 22, Issue 20 (2014).
 
Light field microscopy has been proposed as a new high-speed volumetric
computational imaging method that enables reconstruction of 3-D volumes from
captured projections of the 4-D light field. Recently, a detailed physical
optics model of the light field microscope has been derived, which led to the
development of a deconvolution algorithm that reconstructs 3-D volumes with
high spatial resolution. However, the spatial resolution of the reconstructions
has been shown to be non-uniform across depth, with some z planes showing high
resolution and others, particularly at the center of the imaged volume, showing
very low resolution. In this paper, we enhance the performance of the light
field microscope using wavefront coding techniques. By including phase masks in
the optical path of the microscope we are able to address this non-uniform
resolution limitation. We have also found that superior control over the
performance of the light field microscope can be achieved by using two phase
masks rather than one, placed at the objective's back focal plane and at the
microscope's native image plane. We present an extended optical model for our
wavefront coded light field microscope and develop a performance metric based
on Fisher information, which we use to choose adequate phase masks
parameters. We validate our approach using both simulated data and experimental
resolution measurements of a USAF 1951 resolution target; and demonstrate the
utility for biological applications with in vivo volumetric calcium imaging of
larval zebrafish brain.
 
 
 
 
Wave Optics Theory and 3-D Deconvolution for the Light Field Microscope
 Michael Broxton,
 Logan Grosenick, Samuel Yang, Noy Cohen, Aaron Andalman,
 
Karl Deisseroth,
 
Marc Levoy
Optics Express, Vol. 21, Issue 21, pp. 25418-25439 (2013).
 
Light field microscopy is a new technique for high-speed volumetric imaging of
weakly scattering or fluorescent specimens. It employs an array of microlenses
to trade off spatial resolution against angular resolution, thereby allowing a
4-D light field to be captured using a single photographic exposure without the
need for scanning. The recorded light field can then be used to computationally
reconstruct a full volume. In this paper, we present an optical model for light
field microscopy based on wave optics, instead of previously reported ray
optics models. We also present a 3-D deconvolution method for light field
microscopy that is able to reconstruct volumes at higher spatial resolution,
and with better optical sectioning, than previously reported. To accomplish
this, we take advantage of the dense spatio-angular sampling provided by a
microlens array at axial positions away from the native object plane. This
dense sampling permits us to decode aliasing present in the light field to
reconstruct high-frequency information. We formulate our method as an inverse
problem for reconstructing the 3-D volume, which we solve using a
GPU-accelerated iterative algorithm. Theoretical limits on the depth-dependent
lateral resolution of the reconstructed volumes are derived. We show that these
limits are in good agreement with experimental results on a standard USAF 1951
resolution target. Finally, we present 3-D reconstructions of pollen grains
that demonstrate the improvements in fidelity made possible by our method.
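
The sketch below shows a generic Richardson-Lucy-style iteration for this kind of linear inverse problem, with the forward projection and its adjoint supplied as callables; it is a conceptual stand-in, not the paper's GPU implementation or its wave-optics PSF.

import numpy as np

# Reconstruct a volume from a captured light field under a linear forward
# model: forward(v) projects a volume to a sensor image, backward(img)
# applies its adjoint. Both operators are assumed to be supplied.
def deconvolve_volume(lightfield_img, forward, backward, n_iters=30):
    volume = np.ones(backward(lightfield_img).shape)     # flat initial guess
    for _ in range(n_iters):
        predicted = forward(volume) + 1e-9
        ratio = lightfield_img / predicted               # data-fit ratio
        volume *= backward(ratio) / (backward(np.ones_like(ratio)) + 1e-9)
    return volume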
 
 
 
 
Recording and controlling the 4D light field in a microscope
 Marc Levoy,
 
Zhengyun Zhang,
 Ian McDowall
Journal of Microscopy, Volume 235, Part 2, 2009, pp. 144-162.
Cover article.
 
By inserting a microlens array at the intermediate image plane of an optical
microscope, one can record 4D light fields of biological specimens in a single
snapshot. Unlike a conventional photograph, light fields permit manipulation
of viewpoint and focus after the snapshot has been taken, subject to the
resolution of the camera and the diffraction limit of the optical system. By
inserting a second microlens array and video projector into the microscope's
illumination path, one can control the incident light field falling on the
specimen in a similar way. In this paper we describe a prototype system we
have built that implements these ideas, and we demonstrate two applications for
it: simulating exotic microscope illumination modalities and correcting for
optical aberrations digitally.
 
 
 
 
Light Field Microscopy
 Marc Levoy,
 
Ren Ng,
 
Andrew Adams,
 Matthew Footer,
 
Mark Horowitz
ACM Transactions on Graphics 25(3),
Proc. SIGGRAPH 2006
An additional technical memo containing optical recipes and an
extension to microscopes with infinity-corrected optics.
 
By inserting a microlens array into the optical train of a conventional
microscope, one can capture light fields of biological specimens in a single
photograph. Although diffraction places a limit on the product of spatial and
angular resolution in these light fields, we can nevertheless produce useful
perspective views and focal stacks from them. Since microscopes are inherently
orthographic devices, perspective views represent a new way to look at
microscopic specimens. The ability to create focal stacks from a single
photograph allows moving or light-sensitive specimens to be recorded. Applying
3D deconvolution to these focal stacks, we can produce a set of cross sections,
which can be visualized using volume rendering. In this paper, we demonstrate
a prototype light field microscope (LFM), analyze its optical performance, and
show perspective views, focal stacks, and reconstructed volumes for a variety
of biological specimens. We also show that synthetic focusing followed by 3D
deconvolution is equivalent to applying limited-angle tomography directly to
the 4D light field.
 
 
 
Light fields
 
(papers on camera arrays are farther down)
 
 
 
Unstructured Light Fields
 Abe Davis,
 
Fredo Durand,
 
Marc Levoy
Computer Graphics Forum (Proc. Eurographics),
Volume 31, Number 2, 2012.
 
We present a system for interactively acquiring and rendering light fields
using a hand-held commodity camera. The main challenge we address is assisting
a user in achieving good coverage of the 4D domain despite the challenges of
hand-held acquisition. We define coverage by bounding reprojection error
between viewpoints, which accounts for all 4 dimensions of the light field. We
use this criterion together with a recent Simultaneous Localization and
Mapping technique to compute a coverage map on the space of viewpoints. We
provide users with real-time feedback and direct them toward under-sampled
parts of the light field. Our system is lightweight and has allowed us to
capture hundreds of light fields. We further present a new rendering algorithm
that is tailored to the unstructured yet dense data we capture. Our method can
achieve piecewise-bicubic reconstruction using a triangulation of the captured
viewpoints and subdivision rules applied to reconstruction weights.
 
 
 
 
Wigner Distributions and How They Relate to the Light Field
 Zhengyun Zhang,
 
Marc Levoy
IEEE International Conference on Computational Photography (ICCP) 2009
Best Paper award
 
In wave optics, the Wigner distribution and its Fourier dual, the ambiguity
function, are important tools in optical system simulation and analysis. The
light field fulfills a similar role in the computer graphics community. In this
paper, we establish that the light field as it is used in computer graphics is
equivalent to a smoothed Wigner distribution and that these are equivalent to
the raw Wigner distribution under a geometric optics approximation. Using this
insight, we then explore two recent contributions: Fourier slice photography in
computer graphics and wavefront coding in optics, and we examine the similarity
between explanations of them using Wigner distributions and explanations of
them using light fields. Understanding this long-suspected equivalence may lead
to additional insights and the productive exchange of ideas between the two
fields.
 
 
 
 
Flexible Multimodal Camera Using a Light Field Architecture
 Roarke Horstmeyer, Gary Euliss, Ravindra Athale,
 
Marc Levoy
IEEE International Conference on Computational Photography (ICCP) 2009
 
We present a modified conventional camera that is able to collect multimodal
images in a single exposure. Utilizing a light field architecture in
conjunction with multiple filters placed in the pupil plane of a main lens, we
are able to digitally reconstruct synthetic images containing specific
spectral, polarimetric, and other optically filtered data. The ease with which
these filters can be exchanged and reconfigured provides a high degree of
flexibility in the type of information that can be collected with each
image. This paper explores the various tradeoffs involved in implementing a
pinhole array in parallel with a pupil-plane filter array to measure
multi-dimensional optical data from a scene. It also examines the design space
of a pupil-plane filter array layout. Images are shown from different
multimodal filter layouts, and techniques to maximize resolution and minimize
error in the synthetic images are proposed.
 
 
 
 
Combining Confocal Imaging and Descattering
 Christian Fuchs, Michael Heinz,
 
Marc Levoy,
 
Hendrik P.A. Lensch
Eurographics Symposium on Rendering (EGSR) 2008
 
In translucent objects, light paths are affected by multiple scattering, which
pollutes any observation. Confocal imaging reduces the influence of such
global illumination effects by carefully focusing illumination and viewing rays
from a large aperture to a specific location within the object volume. The
selected light paths still contain some global scattering contributions,
though. Descattering based on high frequency illumination serves the same
purpose. It removes the global component from observed light paths. We
demonstrate that confocal imaging and descattering are orthogonal and propose a
novel descattering protocol that analyzes the light transport in a neighborhood
of light transport paths. In combination with confocal imaging, our
descattering method achieves optical sectioning in translucent media with
higher contrast and better resolution.
 
 
 
 
General Linear Cameras with Finite Aperture
 Andrew Adams and
 
Marc Levoy
Eurographics Symposium on Rendering (EGSR) 2007
 
A pinhole camera selects a two-dimensional set of rays from the
four-dimensional light field. Pinhole cameras are a type of general linear
camera, defined as planar 2D slices of the 4D light field. Cameras with finite
apertures can be considered as the summation of a collection of pinhole
cameras. In the limit they evaluate a two-dimensional integral of the
four-dimensional light field. Hence a general linear camera with finite
aperture factors the 4D light field into two integrated dimensions and two
imaged dimensions. We present a simple framework for representing these slices
and integral projections, based on certain eigenspaces in a two-plane
parameterization of the light field. Our framework allows for easy analysis of
focus and perspective, and it demonstrates their dual nature. Using our
framework, we present analogous taxonomies of perspective and focus, placing
within them the familiar perspective, orthographic, cross-slit, and bilinear
cameras; astigmatic and anastigmatic focus; and several other varieties of
perspective and focus.
 
 
 
 
Light Fields and Computational Imaging
 Marc Levoy
IEEE Computer, August 2006
Includes links to the other four feature articles in that issue,
which was devoted to computational photography
 
A survey of the theory and practice of light field imaging, emphasizing the
devices researchers in computer graphics and computer vision have built to
capture light fields photographically and the techniques they have developed to
compute novel images from them.
 
 
 
 
Symmetric Photography: Exploiting Data-sparseness in Reflectance Fields
 Gaurav Garg,
 
Eino-Ville (Eddy)
 Talvala,
 
Marc Levoy,
 
Hendrik P.A. Lensch
Proc. 2006 Eurographics Symposium on Rendering
 
We present a novel technique called symmetric photography to capture real world
reflectance fields. The technique models the 8D reflectance field as a
transport matrix between the 4D incident light field and the 4D exitant light
field. It is a challenging task to acquire this transport matrix due to its
large size. Fortunately, the transport matrix is symmetric and often
data-sparse. Symmetry enables us to measure the light transport from two sides
simultaneously, from the illumination directions and the view
directions. Data-sparseness refers to the fact that sub-blocks of the matrix
can be well approximated using low-rank representations. We introduce the use
of hierarchical tensors as the underlying data structure to capture this
data-sparseness, specifically through local rank-1 factorizations of the
transport matrix. Besides providing an efficient representation for storage, it
enables fast acquisition of the approximated transport matrix and fast
rendering of images from the captured matrix. Our prototype acquisition system
consists of an array of mirrors and a pair of coaxial projector and camera. We
demonstrate the effectiveness of our system with scenes rendered from
reflectance fields that were captured by our system. In these renderings we can
change the viewpoint as well as relight using arbitrary incident light fields.
 
 
 
 
Dual Photography
 Pradeep Sen,
 
Billy Chen,
 
Gaurav Garg,
 
Steve Marschner,
 
Mark Horowitz,
 
Marc Levoy,
 
Hendrik Lensch
ACM Transactions on Graphics 24(3),
Proc. SIGGRAPH 2005
 
We present a novel photographic technique called dual photography, which
exploits Helmholtz reciprocity to interchange the lights and cameras in a
scene. With a video projector providing structured illumination, reciprocity
permits us to generate pictures from the viewpoint of the projector, even
though no camera was present at that location. The technique is completely
image-based, requiring no knowledge of scene geometry or surface properties,
and by its nature automatically includes all transport paths, including
shadows, interreflections and caustics. In its simplest form, the technique can
be used to take photographs without a camera; we demonstrate this by capturing
a photograph using a projector and a photo-resistor. If the photo-resistor is
replaced by a camera, we can produce a 4D dataset that allows for relighting
with 2D incident illumination. Using an array of cameras we can produce a 6D
slice of the 8D reflectance field that allows for relighting with arbitrary
light fields. Since an array of cameras can operate in parallel without
interference, whereas an array of light sources cannot, dual photography is
fundamentally a more efficient way to capture such a 6D dataset than a system
based on multiple projectors and one camera. As an example, we show how dual
photography can be used to capture and relight scenes.
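
The reciprocity at the heart of the method reduces, once the transport matrix is known, to a transpose; the toy snippet below shows that relationship (measuring T efficiently is the hard part the paper addresses).

import numpy as np

# T maps the projector's light (p projector pixels) to the camera's image
# (c camera pixels); T transposed maps "virtual illumination" at the camera
# to an image seen from the projector's point of view.
def primal_image(T, projector_pattern):
    return T @ projector_pattern             # (c,) image seen by the camera

def dual_image(T, virtual_camera_pattern):
    return T.T @ virtual_camera_pattern      # (p,) image seen from the projector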
 
 
 
 
Light Field Photography with a Hand-Held Plenoptic Camera
 Ren Ng,
 
Marc Levoy,
 Mathieu Brédif, Gene Duval,
 
Mark Horowitz,
 
Pat Hanrahan
Stanford University Computer Science Tech Report CSTR 2005-02,
April 2005
The refocusing performance of this camera is analyzed in Ren Ng's SIGGRAPH 2005
paper, Fourier Slice Photography.
Ren's PhD dissertation, "Digital Light Field Photography," won the 2006
ACM Doctoral Dissertation Award.
See also his startup company,
Lytro.
 
This paper presents a camera that samples the 4D light field on its sensor in a
single photographic exposure. This is achieved by inserting a microlens array
between the sensor and main lens, creating a plenoptic camera. Each microlens
measures not just the total amount of light deposited at that location, but how
much light arrives along each ray. By re-sorting the measured rays of light to
where they would have terminated in slightly different, synthetic cameras, we
can compute sharp photographs focused at different depths. We show that a
linear increase in the resolution of images under each microlens results in a
linear increase in the sharpness of the refocused photographs. This property
allows us to extend the depth of field of the camera without reducing the
aperture, enabling shorter exposures and lower image noise. Especially in the
macrophotography regime, we demonstrate that we can also compute synthetic
photographs from a range of different viewpoints. These capabilities argue for
a different strategy in designing photographic imaging systems.
To the photographer, the plenoptic camera operates exactly like an ordinary
hand-held camera. We have used our prototype to take hundreds of light field
photographs, and we present examples of portraits, high-speed action and macro
close-ups.
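
Refocusing a captured light field amounts to shifting each sub-aperture view in proportion to its position in the aperture and summing, as in this toy sketch (the array layout and the slope parameter alpha are illustrative).

import numpy as np
from scipy.ndimage import shift as nd_shift

# Shift-and-add refocusing of a light field stored as sub-aperture views
# L[u, v, y, x]: each view is translated by alpha times its (u, v) offset
# from the aperture center, then all views are averaged.
def refocus(subaperture_views, alpha):
    nu, nv = subaperture_views.shape[:2]
    cu, cv = (nu - 1) / 2.0, (nv - 1) / 2.0
    acc = np.zeros(subaperture_views.shape[2:], dtype=float)
    for u in range(nu):
        for v in range(nv):
            dy, dx = alpha * (u - cu), alpha * (v - cv)
            acc += nd_shift(subaperture_views[u, v], (dy, dx), order=1)
    return acc / (nu * nv)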
 
 
 
 
Interactive Deformation of Light Fields
 Billy Chen,
 
Eyal Ofek,
 
Harry Shum,
 
Marc Levoy
Proc. Symposium on Interactive 3D Graphics and Games (I3D) 2005
 
We present a software pipeline that enables an animator to deform light fields.
The pipeline can be used to deform complex objects, such as furry toys, while
maintaining photo-realistic quality. Our pipeline consists of three stages.
First, we split the light field into sub-light fields. To facilitate splitting
of complex objects, we employ a novel technique based on projected light
patterns. Second, we deform each sub-light field. To do this, we provide the
animator with controls similar to volumetric free-form deformation. Third, we
recombine and render each sub-light field. Our rendering technique properly
handles visibility changes due to occlusion among sub-light fields. To ensure
consistent illumination of objects after they have been deformed, our light
fields are captured with the light source fixed to the camera, rather than
being fixed to the object. We demonstrate our deformation pipeline using
synthetic and photographically acquired light fields. Potential applications
include animation, interior design, and interactive gaming.
 
 
 
 
Synthetic aperture confocal imaging
Marc Levoy,
Billy Chen,
Vaibhav Vaish,
Mark Horowitz,
Ian McDowall, Mark Bolas
ACM Transactions on Graphics 23(3),
Proc. SIGGRAPH 2004
About the
relationship between confocal imaging and
separation of direct and global reflections in 3D scenes.
An additional test
of 
underwater confocal imaging performed in a large
water tank at the Woods Hole Oceanographic Institution.
 
Confocal microscopy is a family of imaging techniques that employ focused
patterned illumination and synchronized imaging to create cross-sectional views
of 3D biological specimens. In this paper, we adapt confocal imaging to
large-scale scenes by replacing the optical apertures used in microscopy with
arrays of real or virtual video projectors and cameras. Our prototype
implementation uses a video projector, a camera, and an array of mirrors.
Using this implementation, we explore confocal imaging of partially occluded
environments, such as foliage, and weakly scattering environments, such as
murky water. We demonstrate the ability to selectively image any plane in a
partially occluded environment, and to see further through murky water than is
otherwise possible. By thresholding the confocal images, we extract mattes
that can be used to selectively illuminate any plane in the scene.
 
 
 
 
Light Field Rendering
Marc Levoy and 
Pat Hanrahan
Proc. SIGGRAPH 1996
About the 
similarity between this paper and the Lumigraph paper.
Download our LightPack software.
Or check out our archive of light fields, which includes
the first demonstration of
synthetic aperture focusing 
(a.k.a. digital refocusing) of a light field.
Included in 
Seminal Graphics Papers, Volume 2.
 
A number of techniques have been proposed for flying through scenes by
redisplaying previously rendered or digitized views. Techniques have also been
proposed for interpolating between views by warping input images, using
depth information or correspondences between multiple images. In this paper,
we describe a simple and robust method for generating new views from arbitrary
camera positions without depth information or feature matching, simply by
combining and resampling the available images. The key to this technique lies
in interpreting the input images as 2D slices of a 4D function - the light
field. This function completely characterizes the flow of light through
unobstructed space in a static scene with fixed illumination.
We describe a sampled representation for light fields that allows for both
efficient creation and display of inward and outward looking views. We have
created light fields from large arrays of both rendered and digitized images.
The latter are acquired using a video camera mounted on a computer-controlled
gantry. Once a light field has been created, new views may be constructed in
real time by extracting slices in appropriate directions. Since the success of
the method depends on having a high sample rate, we describe a compression
system that is able to compress the light fields we have generated by more than
a factor of 100:1 with very little loss of fidelity. We also address the
issues of antialiasing during creation, and resampling during slice extraction.
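
In the two-plane parameterization, rendering a new view reduces to looking up rays by where they intersect the two planes; the sketch below uses nearest-neighbor sampling, whereas the paper uses quadrilinear interpolation with prefiltering.

import numpy as np

# A ray is parameterized by its crossing of the camera (u, v) plane and the
# focal (s, t) plane; a new view is formed by sampling L(u, v, s, t).
def sample_ray(L, u, v, s, t):
    ui = np.clip(int(round(u)), 0, L.shape[0] - 1)
    vi = np.clip(int(round(v)), 0, L.shape[1] - 1)
    si = np.clip(int(round(s)), 0, L.shape[2] - 1)
    ti = np.clip(int(round(t)), 0, L.shape[3] - 1)
    return L[ui, vi, si, ti]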
 
 
 
Camera arrays
 
 
 
Reconstructing Occluded Surfaces
using Synthetic Apertures: Stereo, Focus and Robust Measures
 
 Vaibhav Vaish,
 
Richard Szeliski,
 
C.L. Zitnick,
 
Sing Bing Kang,
 
Marc Levoy
Proc. CVPR 2006.
 
 
Most algorithms for 3D reconstruction from images use cost functions based on
SSD, which assume that the surfaces being reconstructed are visible to all
cameras. This makes it difficult to reconstruct objects which are partially
occluded. Recently, researchers working with large camera arrays have shown it
is possible to see through occlusions using a technique called synthetic
aperture focusing. This suggests that we can design alternative cost functions
that are robust to occlusions using synthetic apertures. Our paper explores
this design space. We compare classical shape from stereo with shape from
synthetic aperture focus. We also describe two variants of multi-view stereo
based on color medians and entropy that increase robustness to occlusions. We
present an experimental comparison of these cost functions on complex light
fields, measuring their accuracy against the amount of occlusion.
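 
To give a flavor of the cost functions being compared, here is a hedged sketch
(not the paper's code): colors is assumed to be an (N, 3) array holding the
samples that the N cameras reproject onto one candidate surface point, and
intensities are assumed to lie in [0, 1].
 
    import numpy as np

    def ssd_cost(colors):
        # Classical photo-consistency: squared deviation from the mean color.
        return float(np.sum((colors - colors.mean(axis=0)) ** 2))

    def median_cost(colors):
        # Robust variant: deviation from the per-channel median, which
        # down-weights cameras whose view of the point is blocked.
        return float(np.sum(np.abs(colors - np.median(colors, axis=0))))

    def entropy_cost(intensities, bins=16):
        # Robust variant: entropy of the intensity histogram across cameras;
        # an unoccluded, in-focus point yields a peaked, low-entropy histogram.
        hist, _ = np.histogram(intensities, bins=bins, range=(0.0, 1.0))
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))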
 
 
 
 
High Performance Imaging Using Large Camera Arrays
 Bennett Wilburn,
 
Neel Joshi,
 
Vaibhav Vaish,
 
Eino-Ville (Eddy)
 Talvala, Emilio Antunez,
 
Adam Barth,
 
Andrew Adams,
 
Marc Levoy,
 
Mark Horowitz
ACM Transactions on Graphics 24(3),
Proc. SIGGRAPH 2005
 
The advent of inexpensive digital image sensors, and the ability to create
photographs that combine information from a number of sensed images, is
changing the way we think about photography. In this paper, we describe a
unique array of 100 custom video cameras that we have built, and we summarize
our experiences using this array in a range of imaging applications. Our goal
was to explore the capabilities of a system that would be inexpensive to
produce in the future. With this in mind, we used simple cameras, lenses, and
mountings, and we assumed that processing large numbers of images would
eventually be easy and cheap. The applications we have explored include
approximating a conventional single center of projection video camera with high
performance along one or more axes, such as resolution, dynamic range, frame
rate, and/or large aperture, and using multiple cameras to approximate a video
camera with a large synthetic aperture. This permits us to capture a video
light field, to which we can apply spatiotemporal view interpolation algorithms
in order to digitally simulate time dilation and camera motion. It also permits
us to create video sequences using custom non-uniform synthetic apertures.
 
 
 
 
Synthetic Aperture Focusing using a
Shear-Warp Factorization of the Viewing Transform
 Vaibhav Vaish,
 
Gaurav Garg,
 
Eino-Ville (Eddy)
 Talvala, Emilio Antunez,
 
Bennett Wilburn,
 
Mark Horowitz,
 
Marc Levoy
Proc. Workshop on Advanced 3D Imaging for Safety and Security
(A3DISS) 2005
(in conjunction with CVPR 2005)
 
Synthetic aperture focusing consists of warping and adding together the images
in a 4D light field so that objects lying on a specified surface are aligned
and thus in focus, while objects lying off this surface are misaligned and
hence blurred. This provides the ability to see through partial occluders such
as foliage and crowds, making it a potentially powerful tool for
surveillance. If the cameras lie on a plane, it has been previously shown that
after an initial homography, one can move the focus through a family of planes
that are parallel to the camera plane by merely shifting and adding the
images. In this paper, we analyze the warps required for tilted focal planes
and arbitrary camera configurations. We characterize the warps using a new
rank-1 constraint that lets us focus on any plane, without having to perform a
metric calibration of the cameras. We also show that there are camera
configurations and families of tilted focal planes for which the warps can be
factorized into an initial homography followed by shifts. This homography
factorization permits these tilted focal planes to be synthesized as
efficiently as frontoparallel planes. Being able to vary the focus by simply
shifting and adding images is relatively simple to implement in hardware and
facilitates a real-time implementation. We demonstrate this using an array of
30 video-resolution cameras; initial homographies and shifts are performed on
per-camera FPGAs, and additions and a final warp are performed on 3 PCs.
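 
The shift-and-add step for frontoparallel planes is simple enough to sketch in
a few lines of Python. Integer shifts and wrap-around borders are
simplifications of this sketch, not the paper's FPGA/PC pipeline, and the
variable names are mine.
 
    import numpy as np

    def shift_and_add(images, offsets, alpha):
        # images:  list of HxW views, already warped by their initial homographies
        #          so that one reference plane is aligned across cameras.
        # offsets: per-camera parallax vectors (dx, dy) for a unit change of plane.
        # alpha:   scalar selecting which frontoparallel plane is brought into focus.
        acc = np.zeros_like(images[0], dtype=np.float64)
        for img, (dx, dy) in zip(images, offsets):
            shift = (int(round(alpha * dy)), int(round(alpha * dx)))
            acc += np.roll(img, shift, axis=(0, 1))   # np.roll wraps at the borders
        return acc / len(images)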
 
 
 
 
High Speed Video Using a Dense Camera Array
Bennett Wilburn,
Neel Joshi,
Vaibhav Vaish,
Marc Levoy,
Mark Horowitz
Proc. CVPR 2004
 
We demonstrate a system for capturing multi-thousand frame-per-second
(fps) video using a dense array of cheap 30fps CMOS image sensors. A
benefit of using a camera array to capture high speed video is that we
can scale to higher speeds by simply adding more cameras. Even at
extremely high frame rates, our array architecture supports continuous
streaming to disk from all of the cameras. This allows us to record
unpredictable events, in which nothing occurs before the event of
interest that could be used to trigger the beginning of recording.
Synthesizing one high speed video sequence using images from an array
of cameras requires methods to calibrate and correct those cameras'
varying radiometric and geometric properties. We assume that our scene
is either relatively planar or is very far away from the camera and
that the images can therefore be aligned using projective
transforms. We analyze the errors from this assumption and present
methods to make them less visually objectionable. We also present a
method to automatically color match our sensors. Finally, we
demonstrate how to compensate for spatial and temporal distortions
caused by the electronic rolling shutter, a common feature of low-end
CMOS sensors.
 
 
 
 
Using Plane + Parallax for Calibrating Dense Camera Arrays
Vaibhav Vaish,
Bennett Wilburn,
Neel Joshi,
Marc Levoy
Proc. CVPR 2004
 
A light field consists of images of a scene taken from different viewpoints.
Light fields are used in computer graphics for image-based rendering and
synthetic aperture photography, and in vision for recovering shape. In this
paper, we describe a simple procedure to calibrate camera arrays used to
capture light fields using a plane + parallax framework. Specifically, for the
case when the cameras lie on a plane, we show (i) how to estimate camera
positions up to an affine ambiguity, and (ii) how to reproject light field
images onto a family of planes using only knowledge of planar parallax for one
point in the scene. While planar parallax does not completely describe the
geometry of the light field, it is adequate for the first two applications
which, it turns out, do not depend on having a metric calibration of the light
field. Experiments on acquired light fields indicate that our method yields
better results than full metric calibration.
 
 
 
Polygon meshes
 
 
 
Geometrically Stable Sampling for the ICP Algorithm
Natasha Gelfand,
Leslie Ikemoto,
Szymon Rusinkiewicz,
and 
Marc Levoy
Proc. 3DIM 2003
 
The Iterative Closest Point (ICP) algorithm is a widely used method for
aligning three-dimensional point sets. The quality of alignment obtained by
this algorithm depends heavily on choosing good pairs of corresponding points
in the two datasets. If too many points are chosen from featureless regions of
the data, the algorithm converges slowly, finds the wrong pose, or even
diverges, especially in the presence of noise or miscalibration in the input
data. In this paper, we describe a method for detecting uncertainty in pose,
and we propose a point selection strategy for ICP that minimizes this
uncertainty by choosing samples that constrain potentially unstable
transformations.
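 
A minimal sketch of the underlying stability analysis (the variable names and
the greedy selection note are my assumptions, not the paper's exact
formulation): accumulate a 6x6 matrix from point-normal pairs whose small
eigenvalues correspond to poorly constrained rotations and translations, then
prefer samples that raise those eigenvalues.
 
    import numpy as np

    def stability_matrix(points, normals):
        # Each row of points/normals is a surface sample and its unit normal.
        # Small eigenvalues of C indicate transformations the samples fail to constrain.
        C = np.zeros((6, 6))
        for p, n in zip(points, normals):
            g = np.concatenate([np.cross(p, n), n])   # torque and force components
            C += np.outer(g, g)
        return C

    # A selection strategy in this spirit would greedily add points that increase
    # the smallest eigenvalue of C, e.g. monitored via np.linalg.eigvalsh(C).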
 
 
 
 
A Hierarchical Method for Aligning Warped Meshes
Leslie Ikemoto,
Natasha Gelfand,
and 
Marc Levoy
Proc. 3DIM 2003
 
Current alignment algorithms for registering range data captured from a 3D
scanner assume that the range data depicts identical geometry taken from
different views. However, in the presence of scanner calibration errors, the
data will be slightly warped. These warps often cause current alignment
algorithms to converge slowly, find the wrong alignment, or even diverge. In
this paper, we present a method for aligning warped range data represented by
polygon meshes. Our strategy can be characterized as a coarse-to-fine
hierarchical approach, where we assume that since the warp is global, we can
compensate for it by treating each mesh as a collection of smaller piecewise
rigid sections, which can translate and rotate with respect to each other. We
split the meshes subject to several constraints, in order to ensure that the
resulting sections converge reliably.
 
 
 
 
Filling holes in complex surfaces using volumetric diffusion
James Davis, 
Steve Marschner, Matt Garr, and 
Marc Levoy
First International Symposium on 3D Data Processing, Visualization, Transmission, June, 2002.
Download our Volfill software.
 
We address the problem of building watertight 3D models from surfaces that
contain holes - for example, sets of range scans that observe most but not all
of a surface. We specifically address situations in which the holes are too
geometrically and topologically complex to fill using triangulation
algorithms. Our solution begins by constructing a signed distance function,
the zero set of which defines the surface. Initially, this function is
defined only in the vicinity of observed surfaces. We then apply a diffusion
process to extend this function through the volume until its zero set bridges
whatever holes may be present. If additional information is available, such
as known-empty regions of space inferred from the lines of sight to a 3D
scanner, it can be incorporated into the diffusion process. Our algorithm is
simple to implement, is guaranteed to produce manifold non-interpenetrating
surfaces, and is efficient to run on large datasets because computation is
limited to areas near holes.
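 
A toy version of the diffusion step, assuming the signed distance volume and a
mask of voxels near observed surfaces are already in hand; the array names and
the plain 6-neighbor averaging are simplifications of this sketch, not the
paper's operator.
 
    import numpy as np

    def diffuse_distance(d, known, iterations=100):
        # d:     3D signed distance volume (float), defined near observed surfaces.
        # known: 3D bool mask of voxels whose measured values must be preserved.
        measured = d.copy()
        for _ in range(iterations):
            # one diffusion step: 6-neighbor average via shifted copies of the volume
            avg = sum(np.roll(d, s, axis=a) for a in range(3) for s in (-1, 1)) / 6.0
            d = np.where(known, measured, avg)
        return d   # the zero set of the result bridges the holes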
 
 
 
 
Efficient Variants of the ICP Algorithm
Szymon Rusinkiewicz and 
Marc Levoy
Proc. 3DIM 2001
 
The ICP (Iterative Closest Point) algorithm is widely used for geometric
alignment of three-dimensional models when an initial estimate of the relative
pose is known. Many variants of ICP have been proposed, affecting all phases
of the algorithm from the selection and matching of points to the minimization
strategy. We enumerate and classify many of these variants, and evaluate their
effect on the speed with which the correct alignment is reached. In order to
improve convergence for nearly-flat meshes with small features, such as
inscribed surfaces, we introduce a new variant based on uniform sampling of the
space of normals. We conclude by proposing a combination of ICP variants
optimized for high speed. We demonstrate an implementation that is able to
align two range images in a few tens of milliseconds, assuming a good initial
guess. This capability has potential application to real-time 3D model
acquisition and model-based tracking.
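 
A rough sketch of the normal-space sampling variant mentioned above (the bucket
layout and counts are assumptions of this illustration): bucket points by the
direction of their normals and draw roughly equal numbers from each bucket, so
that small features still contribute constraints.
 
    import numpy as np

    def normal_space_sample(points, normals, n_buckets=8, per_bucket=50, seed=0):
        # Bucket the unit normals on a coarse grid over their spherical angles,
        # then sample each non-empty bucket equally.
        rng = np.random.default_rng(seed)
        theta = np.arccos(np.clip(normals[:, 2], -1.0, 1.0))       # polar angle
        phi = np.arctan2(normals[:, 1], normals[:, 0]) + np.pi     # azimuth in [0, 2*pi]
        bucket = (np.floor(theta / np.pi * n_buckets) * n_buckets +
                  np.floor(phi / (2 * np.pi) * n_buckets)).astype(int)
        chosen = []
        for b in np.unique(bucket):
            idx = np.flatnonzero(bucket == b)
            chosen.extend(rng.choice(idx, min(per_bucket, len(idx)), replace=False))
        return points[np.array(chosen, dtype=int)]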
 
 
 
 
Fitting Smooth Surfaces to Dense Polygon Meshes
Venkat Krishnamurthy and 
Marc Levoy
Proc. SIGGRAPH 1996
Winner of a 2001
Technical Academy Award.
 
Recent progress in acquiring shape from range data permits the acquisition of
seamless million-polygon meshes from physical models. In this paper, we
present an algorithm and system for converting dense irregular polygon meshes
of arbitrary topology into tensor product B-spline surface patches with
accompanying displacement maps. This choice of representation yields a coarse
but efficient model suitable for animation and a fine but more expensive model
suitable for rendering. The first step in our process consists of
interactively painting patch boundaries over a rendering of the mesh. In many
applications, interactive placement of patch boundaries is considered part of
the creative process and is not amenable to automation. The next step is
gridded resampling of each bounded section of the mesh. Our resampling
algorithm lays a grid of springs across the polygon mesh, then iterates between
relaxing this grid and subdividing it. This grid provides a parameterization
for the mesh section, which is initially unparameterized. Finally, we fit a
tensor product B-spline surface to the grid. We also output a displacement map
for each mesh section, which represents the error between our fitted surface
and the spring grid. These displacement maps are images; hence this
representation facilitates the use of image processing operators for
manipulating the geometric detail of an object. They are also compatible with
modern photo-realistic rendering systems. Our resampling and fitting steps are
fast enough to surface a million-polygon mesh in under 10 minutes - important
for an interactive system.
 
 
 
 
A Volumetric Method for Building Complex Models from Range Images
Brian Curless and 
Marc Levoy
Proc. SIGGRAPH 1996
Download the VripPack library.
Or check out our archive of scanned 3D models.
Included in 
Seminal Graphics Papers, Volume 2.
 
A number of techniques have been developed for reconstructing surfaces
by integrating groups of aligned range images. A desirable set of
properties for such algorithms includes: incremental updating,
representation of directional uncertainty, the ability to fill gaps in
the reconstruction, and robustness in the presence of outliers. Prior
algorithms possess subsets of these properties. In this paper, we
present a volumetric method for integrating range images that
possesses all of these properties.
Our volumetric representation consists of a cumulative weighted signed
distance function. Working with one range image at a time, we first
scan-convert it to a distance function, then combine this with the
data already acquired using a simple additive scheme. To achieve
space efficiency, we employ a run-length encoding of the volume. To
achieve time efficiency, we resample the range image to align with the
voxel grid and traverse the range and voxel scanlines synchronously.
We generate the final manifold by extracting an isosurface from the
volumetric grid. We show that under certain assumptions, this
isosurface is optimal in the least squares sense. To fill gaps in the
model, we tessellate over the boundaries between regions seen to be
empty and regions never observed.
Using this method, we are able to integrate a large number of range
images (as many as 70) yielding seamless, high-detail models of up to
2.6 million triangles.
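 
The per-scan update of the cumulative weighted signed distance can be sketched
in a few lines. This is a simplification: the array names are mine, and the
paper's run-length encoding, resampling, and synchronized scanline traversal
are omitted.
 
    import numpy as np

    def merge_range_image(D, W, d_new, w_new):
        # D, W:   running weighted-average distance and weight volumes.
        # d_new:  signed distances to the new scan (NaN where the scan sees nothing).
        # w_new:  per-voxel confidence weights for the new scan.
        valid = ~np.isnan(d_new)
        W_out = W + np.where(valid, w_new, 0.0)
        D_out = np.where(valid,
                         (D * W + np.nan_to_num(d_new) * w_new) / np.maximum(W_out, 1e-12),
                         D)
        return D_out, W_out   # the merged surface is the isosurface D_out == 0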
 
 
 
 
Zippered Polygon Meshes from Range Images
Greg Turk and 
Marc Levoy
Proc. SIGGRAPH 1994
Download the ZipPack library.
Or check out our archive of scanned 3D models.
Read the history of the
Stanford Bunny.
 
Range imaging offers an inexpensive and accurate means for digitizing the shape
of three-dimensional objects. Because most objects self occlude, no single
range image suffices to describe the entire object. We present a method for
combining a collection of range images into a single polygonal mesh that
completely describes an object to the extent that it is visible from the
outside. The steps in our method are: 1) align the meshes with each other
using a modified iterated closest-point algorithm, 2) zipper together adjacent
meshes to form a continuous surface that correctly captures the topology of the
object, and 3) compute local weighted averages of surface positions on all
meshes to form a consensus surface geometry. Our system differs from previous
approaches in that it is incremental; scans are acquired and combined one at a
time. This approach allows us to acquire and combine large numbers of scans
with minimal storage overhead. Our largest models contain up to 360,000
triangles. All the steps needed to digitize an object that requires up to 10
range scans can be performed using our system with five minutes of user
interaction and a few hours of compute time. We show two models created using
our method with range data from a commercial rangefinder that employs laser
stripe technology.
 
 
 
3D scanning
 
 
 
Real-Time 3D Model Acquisition
Szymon Rusinkiewicz, 
Olaf Hall-Holt, and 
Marc Levoy
ACM Transactions on Graphics 21(3),
Proc. SIGGRAPH 2002
 
The digitization of the 3D shape of real objects is a rapidly expanding
field, with applications in entertainment, design, and archaeology. We
propose a new 3D model acquisition system that permits the user to rotate
an object by hand and see a continuously-updated model as the object is
scanned. This tight feedback loop allows the user to find and fill holes
in the model in real time, and determine when the object has been
completely covered. Our system is based on a 60 Hz structured-light
rangefinder, a real-time variant of ICP (iterative closest points) for
alignment, and point-based merging and rendering algorithms. We
demonstrate the ability of our prototype to scan objects faster and with
greater ease than conventional model acquisition pipelines.
 
 
 
 
An Assessment of Laser Range Measurement of Marble Surfaces
Guy Godin, J.-Angelo Beraldin, Marc Rioux, 
Marc Levoy, Luc Cournoyer, and Francois Blais
Fifth Conference on optical 3-D measurement techniques, 2001.
 
An important application of laser range sensing is found in the 3D scanning and
modelling of heritage collections, and of sculptures in particular. Since a
significant proportion of the statues in the world's museums is composed of
marble, the optical properties of this material under laser range sensing need
to be understood. Marble's translucency and heterogeneous structure produce
significant bias and increased noise in the geometric measurements. Experiments
on a sample of Carrara Statuario marble highlight the relationship between the
laser spot diameter and the estimated noise levels in the geometric
measurements. A bias in the depth measurement is also observed. These phenomena
are believed to result from scattering on the surface of small crystals at or
near the surface.
 
 
 
 
Better Optical Triangulation Through Spacetime Analysis
Brian Curless and 
Marc Levoy
Proc. ICCV 1995
 
Optical triangulation range scanners are finding wide usage in
industrial inspection, metrology, medicine, and computer graphics.
The standard methods for extracting range data from structured light
reflecting off of an object are accurate only for planar surfaces of
uniform reflectance illuminated by an incoherent source. Using these
methods, curved surfaces, discontinuous surfaces, and surfaces of
varying reflectance cause systematic distortions of the range data.
Coherent light sources such as lasers introduce speckle artifacts that
further degrade the data. We present a new ranging method based on
analyzing the time evolution of the structured light reflections.
Using our spacetime analysis, we can correct for each of these
artifacts, thereby attaining significantly higher accuracy using
existing technology. We present results that demonstrate the validity
of our method using a commercial laser stripe triangulation scanner.
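 
One ingredient of a spacetime approach can be sketched directly: for each
pixel, estimate the sub-frame time at which the sweeping stripe peaks by
fitting a parabola through the brightest frame and its two neighbors. This is
only an illustrative fragment (the names and the parabola fit are my choices),
not the paper's full analysis or its correction of triangulation bias.
 
    import numpy as np

    def spacetime_peak(frames):
        # frames: (T, H, W) stack of images as the laser stripe sweeps; T >= 3.
        # Returns an (H, W) array of fractional peak times.
        frames = np.asarray(frames, dtype=np.float64)
        T = frames.shape[0]
        t0 = np.clip(np.argmax(frames, axis=0), 1, T - 2)    # brightest frame per pixel
        rows, cols = np.indices(t0.shape)
        y0 = frames[t0 - 1, rows, cols]
        y1 = frames[t0,     rows, cols]
        y2 = frames[t0 + 1, rows, cols]
        denom = y0 - 2.0 * y1 + y2
        delta = np.where(np.abs(denom) > 1e-9, 0.5 * (y0 - y2) / denom, 0.0)
        return t0 + delta   # sub-frame estimate of when the stripe crossed each pixel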
 
 
 
Cultural heritage
 
 
 
Fragments of the City: Stanford's Digital Forma Urbis Romae Project
David Koller,
Jennifer Trimble, Tina Najbjerg,
Natasha Gelfand,
Marc Levoy
Proc. Third Williams Symposium on Classical Architecture,
Journal of Roman Archaeology supplement,
2006.
Here's a web site about the project.
Or check out our online database of map fragments.
 
In this article, we summarize the Stanford Digital Forma Urbis Project work
since it began in 1999 and discuss its implications for representing and
imaging Rome. First, we digitized the shape and surface of every known fragment
of the Severan Marble Plan using laser range scanners and digital color
cameras; the raw data collected consists of 8 billion polygons and 6 thousand
color images, occupying 40 gigabytes. These range and color data have been
assembled into a set of 3D computer models and high-resolution photographs -
one for each of the 1,186 marble fragments. Second, this data has served in the
development of fragment matching algorithms; to date, these have resulted in
over a dozen highly probable, new matches. Third, we have gathered the
Project's 3D models and color photographs into a relational database and
supported them with archaeological documentation and an up-to-date scholarly
apparatus for each fragment. This database is intended to be a public,
web-based, research and study tool for scholars, students and interested
members of the general public alike; as of this writing, 400 of the surviving
fragments are publicly available, and the full database is scheduled for
release in 2005. Fourth, these digital and archaeological data, and their
availability in a hypertext format, have the potential to broaden the scope and
type of research done on this ancient map by facilitating a range of
typological, representational and urbanistic analyses of the map, some of which
are proposed here. In these several ways, we hope that this Project will
contribute to new ways of imaging Rome.
 
 
 
 
Computer-aided Reconstruction and New Matches in the Forma Urbis Romae
David Koller
and 
Marc Levoy
Proc. Formae Urbis Romae - Nuove Scoperte,
Bullettino Della Commissione Archeologica Comunale di Roma,
2006.
See the extra links listed in the next paper up.
 
In this paper, we describe our efforts to apply computer-aided reconstruction
algorithms to find new matches and positionings among the fragments of the
Forma Urbis Romae. First, we review the attributes of the fragments that may be
useful clues for automated reconstruction. Then, we describe several different
specific methods that we have developed which make use of geometric computation
capabilities and digital fragment representations to suggest new matches. These
methods are illustrated with a number of new proposed fragment joins and
placements that have been generated from our computer-aided reconstruction
process.
 
 
 
 
Unwrapping and Visualizing Cuneiform Tablets
Sean Anderson and 
Marc Levoy
IEEE Computer Graphics and Applications,
Vol. 22, No. 6, November/December, 2002, pp. 82-88.
 
Thousands of historically revealing cuneiform clay tablets, which were
inscribed in Mesopotamia millennia ago, still exist today. Visualizing
cuneiform writing is important when deciphering what is written on the
tablets. It is also important when reproducing the tablets in papers and
books. Unfortunately, scholars have found photographs to be an inadequate
visualization tool, for two reasons. First, the text wraps around the
sides of some tablets, so a single viewpoint is insufficient. Second, a
raking light will illuminate some textual features, but will leave others
shadowed or invisible because they are either obscured by features on the
tablet or are nearly aligned with the lighting direction. We present
solutions to these problems by first creating a high-resolution 3D computer
model from laser range data, then unwrapping and flattening the inscriptions on
the model to a plane, allowing us to represent them as a scalar displacement
map, and finally, rendering this map non-photorealistically using accessibility
and curvature coloring. The output of this semi-automatic process enables
all of a tablet's text to be perceived in a single concise image. Our
technique can also be applied to other types of inscribed surfaces, including
bas-reliefs.
 
 
 
 
The Digital Michelangelo Project: 3D scanning of large statues,
Marc Levoy, 
Kari Pulli, 
Brian Curless, 
Szymon Rusinkiewicz, 
David Koller, 
Lucas Pereira,
Matt Ginzton,
Sean Anderson, 
James Davis, 
Jeremy Ginsberg, 
Jonathan Shade, and Duane Fulk
Proc. SIGGRAPH 2000
Other papers on this project were published in
3DIM '99,
Eurographics '99,
EVA '99,
and as a chapter in the book
Exploring David,
Giunti Press, March 2004.
See also this
web page about the book.
Included in 
Seminal Graphics Papers, Volume 2.
Here's a web site about the project.
Download our Scanalyze
or ScanView software.
Or check out our archive of 3D models from the project.
 
We describe a hardware and software system for digitizing the shape and color
of large fragile objects under non-laboratory conditions. Our system employs
laser triangulation rangefinders, laser time-of-flight rangefinders, digital
still cameras, and a suite of software for acquiring, aligning, merging, and
viewing scanned data. As a demonstration of this system, we digitized 10
statues by Michelangelo, including the well-known figure of David, two building
interiors, and all 1,163 extant fragments of the Forma Urbis Romae, a giant
marble map of ancient Rome. Our largest single dataset is of the David - 2
billion polygons and 7,000 color images. In this paper, we discuss the
challenges we faced in building this system, the solutions we employed, and the
lessons we learned. We focus in particular on the unusual design of our laser
triangulation scanner and on the algorithms and software we developed for
handling very large scanned models.
 
 
 
 
Digitizing the Forma Urbis Romae
Marc Levoy
Siggraph Digital Campfire on Computers and Archeology, April, 2000.
Here's a web site about the project.
Check out our database
of the map fragments.
 
Recent improvements in laser rangefinder technology, together with algorithms
for combining multiple range and color images, allow us to reliably and
accurately digitize the external shape and surface characteristics of many
physical objects. Examples include machine parts, design models, toys, and
artistic and cultural artifacts. As an application of this technology, I and a
team of 30 faculty, staff, and students from Stanford University and the
University of Washington spent the 1998-99 academic year in Italy scanning the
sculptures and architecture of Michelangelo. During our year abroad, we also
became involved in several side projects. One of these was the digitization of
the Forma Urbis Romae...
 
 
 
Texture synthesis
 
 
 
Order-Independent Texture Synthesis
Li-Yi Wei and 
Marc Levoy
Technical Report TR-2002-01, Computer Science Department, Stanford University, April, 2002
 
Search-based texture synthesis algorithms are sensitive to the order in which
texture samples are generated; different synthesis orders yield different
textures. Unfortunately, most polygon rasterizers and ray tracers do not
guarantee the order with which surfaces are sampled. To circumvent this
problem, textures are synthesized beforehand at some maximum resolution and
rendered using texture mapping.
We describe a search-based texture synthesis algorithm in which samples can be
generated in arbitrary order, yet the resulting texture remains identical. The
key to our algorithm is a pyramidal representation in which each texture sample
depends only on a fixed number of neighboring samples at each level of the
pyramid. The bottom (coarsest) level of the pyramid consists of a noise image,
which is small and predetermined. When a sample is requested by the renderer,
all samples on which it depends are generated at once. Using this approach,
samples can be generated in any order. To make the algorithm efficient, we
propose storing texture samples and their dependents in a pyramidal cache.
Although the first few samples are expensive to generate, there is substantial
reuse, so subsequent samples cost less. Fortunately, most rendering algorithms
exhibit good coherence, so cache reuse is high.
 
 
 
 
Texture Synthesis over Arbitrary Manifold Surfaces
Li-Yi Wei and 
Marc Levoy
Proc. SIGGRAPH 2001
 
Algorithms exist for synthesizing a wide variety of textures over rectangular
domains. However, it remains difficult to synthesize general textures over
arbitrary manifold surfaces. In this paper, we present a solution to this
problem for surfaces defined by dense polygon meshes. Our solution extends Wei
and Levoy's texture synthesis method by generalizing their definition of search
neighborhoods. For each mesh vertex, we establish a local parameterization
surrounding the vertex, use this parameterization to create a small rectangular
neighborhood with the vertex at its center, and search a sample texture for
similar neighborhoods. Our algorithm requires as input only a sample texture
and a target model. Notably, it does not require specification of a global
tangent vector field; it computes one as it goes - either randomly or via a
relaxation process. Despite this, the synthesized texture contains no
discontinuities, exhibits low distortion, and is perceived to be similar to the
sample texture. We demonstrate that our solution is robust and is applicable to
a wide range of textures.
 
 
 
 
Fast Texture Synthesis using Tree-structured Vector Quantization
Li-Yi Wei and 
Marc Levoy
Proc. SIGGRAPH 2000
 
Texture synthesis is important for many applications in computer graphics,
vision, and image processing. However, it remains difficult to design an
algorithm that is both efficient and capable of generating high quality
results. In this paper, we present an efficient algorithm for realistic texture
synthesis. The algorithm is easy to use and requires only a sample texture as
input. It generates textures with perceived quality equal to or better than
those produced by previous techniques, but runs two orders of magnitude
faster. This permits us to apply texture synthesis to problems where it has
traditionally been considered impractical. In particular, we have applied it to
constrained synthesis for image editing and temporal texture generation. Our
algorithm is derived from Markov Random Field texture models and generates
textures through a deterministic searching process. We accelerate this
synthesis process using tree-structured vector quantization.
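 
The core per-pixel search can be written down in a few lines, though without
the multiresolution pyramid or the TSVQ acceleration that make the paper's
method fast. This brute-force, single-resolution sketch (the variable names and
the causal neighborhood shape are my assumptions) is impractically slow and is
for exposition only.
 
    import numpy as np

    def synthesize(sample, out_shape, half=2, seed=0):
        # Greedy single-resolution synthesis: each output pixel copies the sample
        # pixel whose already-synthesized (causal) neighborhood matches best.
        rng = np.random.default_rng(seed)
        H, W = out_shape
        h, w = sample.shape
        out = sample[rng.integers(0, h, (H, W)), rng.integers(0, w, (H, W))]  # noise init
        for y in range(half, H - half):
            for x in range(half, W - half):
                nb = out[y - half:y + 1, x - half:x + half + 1].astype(float)
                nb[-1, half:] = 0.0                     # mask pixels not yet synthesized
                best_val, best_d = out[y, x], np.inf
                for sy in range(half, h - half):
                    for sx in range(half, w - half):
                        cand = sample[sy - half:sy + 1, sx - half:sx + half + 1].astype(float)
                        cand[-1, half:] = 0.0
                        d = np.sum((nb - cand) ** 2)
                        if d < best_d:
                            best_val, best_d = sample[sy, sx], d
                out[y, x] = best_val
        return out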
 
 
 
Image synthesis
 
 
 
A Practical Model for Subsurface Light Transport
Henrik Wann Jensen, 
Steve Marschner, 
Marc Levoy, and 
Pat Hanrahan
Proc. SIGGRAPH 2001
Winner of a 2004
Technical Academy Award. Although I helped initiate this research, I
contributed less to the final paper than Henrik, Steve, and Pat, so I asked the
academy not to cite me in the award.
Included in 
Seminal Graphics Papers, Volume 2.
 
This paper introduces a simple model for subsurface light transport in
translucent materials. The model enables efficient simulation of effects that
BRDF models cannot capture, such as color bleeding within materials and
diffusion of light across shadow boundaries. The technique is efficient even
for anisotropic, highly scattering media that are expensive to simulate using
existing methods. The model combines an exact solution for single scattering
with a dipole point source diffusion approximation for multiple scattering. We
also have designed a new, rapid image-based measurement technique for
determining the optical properties of translucent materials. We validate the
model by comparing predicted and measured values and show how the technique can
be used to recover the optical properties of a variety of materials, including
milk, marble, and skin. Finally, we describe sampling techniques that allow
the model to be used within a conventional ray tracer.
 
 
 
 
Synthetic Texturing Using Digital Filters
Eliot Feibush,
Marc Levoy,
Robert Cook
Proc. SIGGRAPH 1980
 
Aliasing artifacts are eliminated from computer generated images of textured
polygons by equivalently filtering both the texture and the edges of the
polygons. Different filters can be easily compared because the weighting
functions that define the shape of the filters are pre-computed and stored in
lookup tables. A polygon subdivision algorithm removes the hidden surfaces so
that the polygons are rendered sequentially to minimize accessing the texture
definition files. An implementation of the texture rendering procedure is
described.
 
 
 
Volume rendering and medical imaging
 
 
 
 
Application of Zernike polynomials towards accelerated adaptive focusing
of transcranial high intensity focused ultrasound,
Elena A. Kaye, Yoni Hertzberg, Michael Marx, Beat Werner, Gil Navon,
Marc Levoy,
and Kim Butts Pauly,
Journal of Medical Physics,
Vol. 39, No. 6254 (2012).
 
Purpose: To study the phase aberrations produced by human skulls during
transcranial magnetic resonance imaging guided focused ultrasound surgery
(MRgFUS), to demonstrate the potential of Zernike polynomials (ZPs) to
accelerate the adaptive focusing process, and to investigate the benefits of
using phase corrections obtained in previous studies to provide the initial
guess for correction of a new data set.
Conclusions: The application of ZPs to phase aberration correction was
shown to be beneficial for adaptive focusing of transcranial ultrasound. The
skull-based phase aberrations were found to be well approximated by the number
of ZP modes representing only a fraction of the number of elements in the
hemispherical transducer. Implementing the initial phase aberration estimate
together with the Zernike-based algorithm can improve the robustness and
potentially greatly increase the viability of MR-ARFI-based focusing for a
clinical transcranial MRgFUS therapy.
 
 
 
 
Feature-Based Volume Metamorphosis
Apostolos Lerios, 
Chase D. Garfinkle, and 
Marc Levoy
Proc. SIGGRAPH 1995
 
Image metamorphosis, or image 
morphing, is a popular
technique for creating a smooth transition between two images. For
synthetic images, transforming and rendering the underlying
three-dimensional (3D) models has a number of advantages over morphing
between two pre-rendered images. In this paper we consider 3D
metamorphosis applied to volume-based representations of objects. We
discuss the issues which arise in volume morphing and present a method
for creating morphs.
Our morphing method has two components: first a warping of the two
input volumes, then a blending of the resulting warped volumes. The
warping component, an extension of Beier and Neely's image warping
technique to 3D, is feature-based and allows fine user control, thus
ensuring realistic looking intermediate objects. In addition, our
warping method is amenable to an efficient approximation which gives a
50 times speedup and is computable to arbitrary accuracy. Also, our
technique corrects the ghosting problem present in Beier and Neely's
technique. The second component of the morphing process, blending, is
also under user control; this guarantees smooth transitions in the
renderings.
 
 
 
 
Fast Volume Rendering Using a Shear-Warp Factorization of the Viewing Transformation
Philippe Lacroute and 
Marc Levoy
Proc. SIGGRAPH 1994
Download the VolPack library.
Or check out our archive of volume data.
 
Several existing volume rendering algorithms operate by factoring the viewing
transformation into a 3D shear parallel to the data slices, a projection to
form an intermediate but distorted image, and a 2D warp to form an undistorted
final image. We extend this class of algorithms in three ways. First, we
describe a new object-order rendering algorithm based on the factorization that
is significantly faster than published algorithms without loss of image
quality. The algorithm achieves its speed by exploiting coherence in the volume
data and the intermediate image. The shear-warp factorization permits us to
traverse both the volume and the intermediate image data structures in
synchrony during rendering, using both types of coherence to reduce work. Our
implementation running on an SGI Indigo workstation renders a 256^3 voxel
medical data set in one second. Our second extension is a derivation of the
factorization for perspective viewing transformations, and we show how our
rendering algorithm can support this extension. Third, we introduce a data
structure for encoding spatial coherence in unclassified volumes (i.e. scalar
fields with no precomputed opacity). When combined with our shear-warp
rendering algorithm this data structure allows us to classify and render a
256^3 voxel volume in three seconds. Our algorithms employ run-length
encoding, min-max pyramids, and multi-dimensional summed area tables. The
method extends readily to support mixed volumes and geometry.
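 
The heart of the factorization - composite each volume slice into a sheared
intermediate image, then fix the distortion with a final 2D warp - can be
sketched as follows. This is a deliberately simplified illustration (grayscale
volume, integer per-slice shifts, final warp omitted, names of my choosing),
not the optimized algorithm of the paper.
 
    import numpy as np

    def shear_warp_composite(colors, alphas, shear):
        # colors, alphas: pre-classified volume of shape (Z, Y, X), slice 0 nearest
        #                 the viewer.  shear: (sx, sy) image-space offset per unit Z.
        Z, Y, X = alphas.shape
        pad = int(np.ceil(max(abs(shear[0]), abs(shear[1])) * Z)) + 1
        img = np.zeros((Y + 2 * pad, X + 2 * pad))
        acc_a = np.zeros_like(img)
        for z in range(Z):                              # front-to-back over the slices
            oy = pad + int(round(shear[1] * z))
            ox = pad + int(round(shear[0] * z))
            c = np.zeros_like(img); a = np.zeros_like(img)
            c[oy:oy + Y, ox:ox + X] = colors[z]
            a[oy:oy + Y, ox:ox + X] = alphas[z]
            img += (1.0 - acc_a) * a * c                # front-to-back "over" compositing
            acc_a += (1.0 - acc_a) * a
        return img    # the distorted intermediate image; a 2D warp would finish the job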
 
 
 
 
Frequency Domain Volume Rendering
Takashi Totsuka and 
Marc Levoy
Proc. SIGGRAPH 1993
 
The Fourier projection-slice theorem allows projections of volume data to be
generated in O(n^2 log n) time for a volume of size n^3. The method operates
by extracting and inverse Fourier transforming 2D slices from a 3D frequency
domain representation of the volume. Unfortunately, these projections do not
exhibit the occlusion that is characteristic of conventional volume renderings.
We present a new frequency domain volume rendering algorithm that replaces much
of the missing depth and shape cues by performing shading calculations in the
frequency domain during slice extraction. In particular, we demonstrate
frequency domain methods for computing linear or nonlinear depth cueing and
directional diffuse reflection. The resulting images can be generated an order
of magnitude faster than volume renderings and may be more useful for many
applications.
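 
The theorem itself is easy to demonstrate with an FFT library; the frequency
domain shading terms described in the paper are not included in this sketch.
 
    import numpy as np

    def fourier_projection(volume):
        # Projection of an n x n x n volume along its third axis via the
        # projection-slice theorem: 3D FFT, extract the central slice
        # perpendicular to that axis, inverse 2D FFT.
        F = np.fft.fftshift(np.fft.fftn(volume))
        n = volume.shape[0]
        slice_2d = F[:, :, n // 2]                   # the kz = 0 plane
        return np.real(np.fft.ifftn(np.fft.ifftshift(slice_2d)))

    # Sanity check: for a random volume v, fourier_projection(v) matches
    # v.sum(axis=2) up to floating-point error.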
 
 
 
 
Volume Rendering using the Fourier Projection-Slice Theorem
Marc Levoy
Proc. Graphics Interface 1992
 
The Fourier projection-slice theorem states that the inverse transform of a
slice extracted from the frequency domain representation of a volume yields a
projection of the volume in a direction perpendicular to the slice. This
theorem allows the generation of attenuation-only renderings of volume data in
O(n^2 log n) time for a volume of size n^3. In this paper, we show how more
realistic renderings can be generated using a class of shading models whose
terms are Fourier projections. Models are derived for rendering depth cueing
by linear attenuation of variable energy emitters and for rendering directional
shading by Lambertian reflection with hemispherical illumination. While the
resulting images do not exhibit the occlusion that is characteristic of
conventional volume rendering, they provide sufficient depth and shape cues to
give a strong illusion that occlusion exists.
 
 
 
 
A Hybrid Ray Tracer for Rendering Polygon and Volume Data
Marc Levoy
IEEE Computer Graphics and Applications, Vol. 10, No. 2, March, 1990, pp. 33-40.
 
Volume rendering is a technique for visualizing sampled functions of three
spatial dimensions by computing 2D projections of a colored semi-transparent
volume. This paper addresses the problem of extending volume rendering to
handle polygonally defined objects. The solution proposed is a hybrid ray
tracing algorithm. Rays are simultaneously cast through a set of polygons and
a volume data array, samples of each are drawn at equally spaced intervals
along the rays, and the resulting colors and opacities are composited together
in depth-sorted order. To avoid aliasing of polygonal edges at modest
computational expense, a form of selective supersampling is employed. To avoid
errors in visibility at polygon-volume intersections, volume samples lying
immediately in front of and behind polygons are given special treatment. The
cost, image quality, and versatility of the algorithm are evaluated using data
from 3D medical imaging applications.
 
 
 
 
Volume Rendering by Adaptive Refinement
Marc Levoy
The Visual Computer, Vol. 6, No. 1, February, 1990, pp. 2-7.
 
Volume rendering is a technique for visualizing sampled scalar functions of
three spatial dimensions by computing 2D projections of a colored
semi-transparent gel. This paper presents a volume rendering algorithm in
which image quality is adaptively refined over time. An initial image is
generated by casting a small number of rays into the data, less than one ray
per pixel, and interpolating between the resulting colors. Subsequent images
are generated by alternately casting more rays and interpolating. The
usefulness of these rays is maximized by distributing them according to
measures of local image complexity. Examples from two applications are given:
molecular graphics and medical imaging.
 
 
 
 
Efficient Ray Tracing of Volume Data
Marc Levoy
ACM Transactions on Graphics, Vol. 9, No. 3, July, 1990, pp. 245-261.
 
Volume rendering is a family of techniques for visualizing sampled scalar or
vector fields of three spatial dimensions without fitting geometric primitives
to the data. A subset of these techniques generate images by computing 2D
projections of a colored semi-transparent volume, where the color and opacity
at each point is derived from the data using local operators. Since all voxels
participate in the generation of each image, rendering time grows linearly with
the size of the dataset. This paper presents a front-to-back image-order volume
rendering algorithm and discusses two techniques for improving its
performance. The first technique employs a pyramid of binary volumes to encode
spatial coherence present in the data, and the second technique uses an opacity
threshold to adaptively terminate ray tracing. Although the actual time saved
depends on the data, speedups of an order of magnitude have been observed for
datasets of useful size and complexity. Examples from two applications are
given: medical imaging and molecular graphics.
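 
The adaptive termination idea is compact enough to show directly; here is a
hedged sketch of one ray's compositing loop, with names of my choosing.
 
    def composite_ray(samples, opacity_threshold=0.95):
        # samples: (color, opacity) pairs ordered front to back along one ray.
        color_acc, alpha_acc = 0.0, 0.0
        for c, a in samples:
            color_acc += (1.0 - alpha_acc) * a * c
            alpha_acc += (1.0 - alpha_acc) * a
            if alpha_acc >= opacity_threshold:        # adaptive ray termination
                break                                 # the rest of the ray is occluded
        return color_acc, alpha_acc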
 
 
 
 
Volume Rendering in Radiation Treatment Planning
Marc Levoy, Henry Fuchs, Stephen M. Pizer, Julian Rosenman,
Edward L. Chaney, George W. Sherouse, 
Victoria Interrante, and Jeffrey Kiel
First Conference on Visualization in Biomedical Computing, IEEE, May, 1990
 
Successful treatment planning in radiation therapy depends in part on
understanding the spatial relationship between patient anatomy and the
distribution of radiation dose. We present several visualizations based on
volume rendering that offer potential solutions to this problem. The
visualizations employ region boundary surfaces to display anatomy, polygonal
meshes to display treatment beams, and isovalue contour surfaces to display
dose. To improve perception of spatial relationships, we use metallic shading,
surface and solid texturing, synthetic fog, shadows, and other artistic
devices. Also outlined is a method based on 3D mip maps for efficiently
generating perspective volume renderings and beam's-eye views. To evaluate the
efficacy of these visualizations, we are building a radiotherapy planning
system based on a Cray YMP and the Pixel-Planes 5 raster display engine. The
system will allow interactive manipulation of beam geometry, dosimetry,
shading, and viewing parameters, and will generate volume renderings of anatomy
and dose in real time.
 
 
 
 
Display of Surfaces from Volume Data
Marc Levoy
PhD Dissertation,
Tech Report TR89-022,
University of North Carolina at Chapel Hill,
May, 1989.
 
Volume rendering
is a technique for visualizing sampled scalar fields of three spatial
dimensions without fitting geometric primitives to the data. A color and a
partial transparency are computed for each data sample, and images are formed
by blending together contributions made by samples projecting to the same pixel
on the picture plane. Quantization and aliasing artifacts are reduced by
avoiding thresholding during data classification and by carefully resampling
the data during projection. This thesis presents an image-order volume
rendering algorithm, demonstrates that it generates images of comparable
quality to existing object-order algorithms, and offers several improvements.
In particular, methods are presented for displaying isovalue contour surfaces
and region boundary surfaces, for rendering mixtures of analytically defined
geometry and sampled fields, and for adding shadows and textures. Three
techniques for reducing rendering cost are also presented: hierarchical spatial
enumeration, adaptive termination of ray tracing, and adaptive image sampling.
Case studies from two applications are given: medical imaging and molecular
graphics.
 
 
 
 
Display of Surfaces from Volume Data
Marc Levoy
IEEE Computer Graphics and Applications, Vol. 8, No. 3, May, 1988
About the 
error in this paper.
Received the
Test of Time Award
from IEEE CG&A in December 2022.
 
The application of volume rendering techniques to the display of surfaces from
sampled scalar functions of three spatial dimensions is explored. Fitting of
geometric primitives to the sampled data is not required. Images are formed by
directly shading each sample and projecting it onto the picture plane. Surface
shading calculations are performed at every voxel with local gradient vectors
serving as surface normals. In a separate step, surface classification
operators are applied to obtain a partial opacity for every voxel. Operators
that detect isovalue contour surfaces and region boundary surfaces are
presented. Independence of shading and classification calculations insures an
undistorted visualization of 3-D shape. Non-binary classification operators
insure that small or poorly defined features are not lost. The resulting
colors and opacities are composited from back to front along viewing rays to
form an image. The technique is simple and fast, yet displays surfaces
exhibiting smooth silhouettes and few other aliasing artifacts. The use of
selective blurring and super-sampling to further improve image quality is also
described. Examples from two applications are given: molecular graphics and
medical imaging.
 
 
 
Point-based rendering
 
 
 
Streaming QSplat: A Viewer for Networked Visualization of Large, Dense Models
Szymon Rusinkiewicz and 
Marc Levoy
Proc. 2001 Symposium on Interactive 3D Graphics
 
Steady growth in the speeds of network links and graphics accelerator cards
has brought increasing interest in streaming transmission of three-dimensional
data sets. We demonstrate how streaming visualization can be made practical
for data sets containing hundreds of millions of samples. Our system is based
on QSplat, a multiresolution rendering system for dense polygon meshes that
employs a bounding sphere hierarchy data structure and splat rendering. We
show how to incorporate view-dependent progressive transmission into QSplat, by
having the client request visible portions of the model in order from
coarse to fine resolution. In addition, we investigate interaction techniques
for improving the effectiveness of streaming data visualization. In
particular, we explore color-coding streamed data by resolution, examine the
order in which data should be transmitted in order to minimize visual
distraction, and propose tools for giving the user fine control over download
order.
 
 
 
 
QSplat: A Multiresolution Point Rendering System for Large Meshes
Szymon Rusinkiewicz and 
Marc Levoy
Proc. SIGGRAPH 2000
Download our QSplat software.
Or check out our archive of QSplat models.
 
Advances in 3D scanning technologies have enabled the practical creation of
meshes with hundreds of millions of polygons. Traditional algorithms for
display, simplification, and progressive transmission of meshes are impractical
for data sets of this size. We describe a system for representing and
progressively displaying these meshes that combines a multiresolution hierarchy
based on bounding spheres with a rendering system based on points. A single
data structure is used for view frustum culling, backface culling,
level-of-detail selection, and rendering. The representation is compact and
can be computed quickly, making it suitable for large data sets. Our
implementation, written for use in a large-scale 3D digitization project,
launches quickly, maintains a user-settable interactive frame rate regardless
of object complexity or camera position, yields reasonable image quality during
motion, and refines progressively when idle to a high final image quality. We
have demonstrated the system on scanned models containing hundreds of millions
of samples.
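 
In spirit, the traversal looks like the Python sketch below. The Node fields
follow the description above, while the camera object, its two query methods,
and the draw_splat callback are placeholders I am assuming for illustration,
not part of the actual system.
 
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Node:
        center: Tuple[float, float, float]
        radius: float
        normal: Tuple[float, float, float]
        color: Tuple[float, float, float]
        children: List["Node"] = field(default_factory=list)

    def render_node(node, camera, threshold_px, draw_splat):
        # Cull spheres outside the view frustum (or facing away), splat nodes whose
        # projection is small enough (or which are leaves), otherwise recurse.
        if not camera.sphere_visible(node.center, node.radius):
            return
        if camera.projected_size(node.center, node.radius) < threshold_px or not node.children:
            draw_splat(node.center, node.radius, node.normal, node.color)
            return
        for child in node.children:
            render_node(child, camera, threshold_px, draw_splat)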
 
 
 
 
The Use of Points as a Display Primitive
Marc Levoy and 
Turner Whitted
UNC-Chapel Hill Computer Science Technical Report #85-022, January, 1985
Here is a book chapter I wrote outlining the
early history of point-based graphics, and discussing the pros and cons of
using points as a display primitive. The chapter appears in revised form in
Markus Gross's
Point-Based Graphics,
Morgan Kaufmann, 2007.
The key idea of point-based rendering is the representation of geometry as a
dense disconnected collection of splats or disks, rather than as a connected mesh
of polygons. Although seldom used now in computer graphics, this idea has
enjoyed a renaissance in computer vision as a 3D reconstruction technique,
beginning with
3D Gaussian Splatting
for Real-Time Radiance Field Rendering. Interestingly, the key difficulty of
point-based rendering remains, as described in my
historical overview: ensuring that the splats used to represent an object
overlap sufficiently to occlude the background, leaving no gaps.
 
As the visual complexity of computer generated scenes continues to increase,
the use of classical modeling primitives as display primitives becomes less
appealing. Customization of display algorithms, the conflict between object
order and image order rendering and the reduced usefulness of object coherence
in the presence of extreme complexity are all contributing factors. This
paper proposes to decouple the modeling geometry from the rendering process by
introducing the notion of points as a universal meta-primitive. We first
demonstrate that a discrete array of points arbitrarily displaced in space
using a tabular array of perturbations can be rendered as a continuous
three-dimensional surface. This solves the long-standing problem of producing
correct silhouette edges for bump mapped textures. We then demonstrate that a
wide class of geometrically defined objects, including both flat and curved
surfaces, can be converted into points. The conversion can proceed in object
order, facilitating the display of procedurally defined objects. The
rendering algorithm is simple and requires no coherence in order to be
efficient. It will also be shown that the points may be rendered in random
order, leading to several interesting and unexpected applications of the
technique.
 
 
 
Systems and architectures
 
 
 
Protected Interactive 3D Graphics Via Remote Rendering
David Koller,
Michael Turitzin,
Marc Levoy,
Marco Tarini,
Giuseppe Croccia,
Paolo Cignoni,
Roberto Scopigno
ACM Transactions on Graphics 23(3),
Proc. SIGGRAPH 2004
A shortened version of this paper was the
cover article in the June 2005 issue
of Communications of the ACM (CACM).
Download our ScanView software.
 
Valuable 3D graphical models, such as high-resolution digital scans of cultural
heritage objects, may require protection to prevent piracy or misuse, while
still allowing for interactive display and manipulation by a widespread
audience. We have investigated techniques for protecting 3D graphics content,
and we have developed a remote rendering system suitable for sharing archives
of 3D models while protecting the 3D geometry from unauthorized extraction. The
system consists of a 3D viewer client that includes low-resolution versions of
the 3D models, and a rendering server that renders and returns images of
high-resolution models according to client requests. The server implements a
number of defenses to guard against 3D reconstruction attacks, such as
monitoring and limiting request streams, and slightly perturbing and distorting
the rendered images. We consider several possible types of reconstruction
attacks on such a rendering server, and we examine how these attacks can be
defended against without excessively compromising the interactive experience
for non-malicious users.
 
 
 
 
Polygon-Assisted JPEG and MPEG Compression of Synthetic Images
Marc Levoy
Proc. SIGGRAPH 1995
 
Recent advances in realtime image compression and decompression hardware make
it possible for a high-performance graphics engine to operate as a rendering
server in a networked environment. If the client is a low-end workstation or
set-top box, then the rendering task can be split across the two devices. In
this paper, we explore one strategy for doing this.
For each frame, the server generates a high-quality rendering and a low-quality
rendering, subtracts the two, and sends the difference in compressed form. The
client generates a matching low quality rendering, adds the decompressed
difference image, and displays the composite. Within this paradigm, there is
wide latitude to choose what constitutes a high-quality versus low-quality
rendering. We have experimented with textured versus untextured surfaces, fine
versus coarse tessellation of curved surfaces, Phong versus Gouraud
interpolated shading, and antialiased versus nonantialiased edges.
In all cases, our polygon-assisted compression looks subjectively better for a
fixed network bandwidth than compressing and sending the high-quality
rendering. We describe a software simulation that uses JPEG and MPEG-1
compression, and we show results for a variety of scenes.
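 
The split itself is simple to express. In this sketch, compress and decompress
stand in for whatever codec is used (the paper uses JPEG and MPEG-1), and 8-bit
frames are assumed; the function names are mine.
 
    import numpy as np

    def server_encode(high, low, compress):
        # Server: render both qualities, send only the compressed difference.
        return compress(high.astype(np.int16) - low.astype(np.int16))

    def client_decode(low, payload, decompress):
        # Client: re-render the cheap frame locally and add the decoded difference.
        result = low.astype(np.int16) + decompress(payload)
        return np.clip(result, 0, 255).astype(np.uint8)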
 
 
 
 
Parallel Visualization Algorithms: Performance and Architectural Implications
Jaswinder Pal Singh, Anoop Gupta, and 
Marc Levoy
IEEE Computer, Vol. 27, No. 7, July 1994
 
Several recent algorithms have substantially sped up complex and time-consuming
visualization tasks. In particular, novel algorithms for radiosity computation
[1] and volume rendering [2][3] have demonstrated performance far superior to
earlier methods. Despite these advances, visualization of complex scenes or
data sets remains computationally expensive. Rendering a 256-by-256-by-256
voxel volume data set takes about 5 seconds per frame on a 100 MHz Silicon
Graphics Indigo workstation using the ray-casting algorithm in [2], and about a
second per frame using a new shear-warp algorithm [3]. These times are much
larger than the 0.03 seconds per frame required for real-time rendering or the
0.1 seconds per frame required for interactive rendering. Realistic radiosity
and ray tracing computations are much more time-consuming...
 
 
 
 
Volume Rendering on Scalable Shared-Memory MIMD Architectures
Jason Nieh and 
Marc Levoy
Proc. 1992 Workshop on Volume Visualization
 
Volume rendering is a useful visualization technique for understanding the
large amounts of data generated in a variety of scientific disciplines.
Routine use of this technique is currently limited by its computational
expense. We have designed a parallel volume rendering algorithm for MIMD
architectures based on ray tracing and a novel task queue image partitioning
technique. The combination of ray tracing and MIMD architectures allows us to
employ algorithmic optimizations such as hierarchical opacity enumeration,
early ray termination, and adaptive image sampling. The use of task queue
image partitioning makes these optimizations efficient in a parallel framework.
We have implemented our algorithm on the Stanford DASH Multiprocessor, a
scalable shared-memory MIMD machine. Its single address-space and coherent
caches provide programming ease and good performance for our algorithm. With
only a few days of programming effort, we have obtained nearly linear speedups
and near real-time frame update rates on a 48 processor machine. Since DASH is
constructed from Silicon Graphics multiprocessors, our code runs on any Silicon
Graphics workstation without modification.
 
 
 
User interfaces
 
 
 
3D Painting on Scanned Surfaces
Maneesh Agrawala, 
Andrew Beers, and 
Marc Levoy
Proc. 1995 Symposium on Interactive 3D Graphics
 
We present an intuitive interface for painting on unparameterized
three-dimensional polygon meshes using a 6D Polhemus space tracker as an input
device. Given a physical object we first acquire its surface geometry using a
Cyberware scanner. We then treat the sensor of the space tracker as a
paintbrush. As we move the sensor over the surface of the physical object we
color the corresponding locations on the scanned mesh. The physical object
provides a natural force-feedback guide for painting on the mesh, making it
intuitive and easy to accurately place color on the mesh.
 
 
 
 
Spreadsheets for Images
Marc Levoy
Proc. SIGGRAPH 1994
 
We describe a data visualization system based on spreadsheets. Cells in our
spreadsheet contain graphical objects such as images, volumes, or movies.
Cells may also contain widgets such as buttons, sliders, or curve editors.
Objects are displayed in miniature inside each cell. Formulas for cells are
written in a general-purpose programming language (Tcl) augmented with
operators for array manipulation, image processing, and rendering.
Compared to flow chart visualization systems, spreadsheets are more expressive,
more scalable, and easier to program. Compared to conventional numerical
spreadsheets, spreadsheets for images pose several unique design problems:
larger formulas, longer computation times, and more complicated intercell
dependencies. In response to these problems, we have extended the spreadsheet
paradigm in three ways: formulas can display their results anywhere in the
spreadsheet, cells can be selectively disabled, and multiple cells can be
edited at once. We discuss these extensions and their implications, and we
also point out some unexpected uses for our spreadsheets: as a visual database
browser, as a graphical user interface builder, as a smart clipboard for the
desktop, and as a presentation tool.
 
 
 
 
Gaze-Directed Volume Rendering
Marc Levoy and 
Ross Whitaker
Proc. 1990 Symposium on Interactive 3D Graphics
 
We direct our gaze at an object by rotating our eyes or head until the object's
projection falls on the fovea, a small region of enhanced spatial acuity near
the center of the retina. In this paper, we explore methods for incorporating
gaze direction into rendering algorithms. This approach permits generation of
images exhibiting continuously varying resolution, and allows these images to
be displayed on conventional television monitors. Specifically, we describe a
ray tracer for volume data in which the number of rays cast per unit area on
the image plane and the number of samples drawn per unit length along each ray
are functions of local retinal acuity. We also describe an implementation
using 2D and 3D mip maps, an eye tracker, and the Pixel-Planes 5 massively
parallel raster display system. Pending completion of Pixel-Planes 5 in the
spring of 1990, we have written a simulator on a Stellar graphics
supercomputer. Preliminary results indicate that while users are aware of the
variable-resolution structure of the image, the high-resolution sweet spot
follows their gaze well and promises to be useful in practice.
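 
The key quantity is how sampling density falls off with angular distance from the gaze point. As a rough illustration (the falloff constants are assumptions, not the paper's values), one can derive a relative acuity per pixel and map it to a mip-map level:

    # Illustrative sketch: sampling rate as a function of angular distance from
    # the gaze point, used to pick a coarser mip level away from the fovea.
    import math

    FOVEA_DEG = 2.0        # angular radius of the full-resolution region (assumption)
    FALLOFF = 0.1          # how quickly acuity drops per degree (assumption)

    def relative_acuity(ecc_deg):
        """1.0 inside the fovea, decaying smoothly outside it."""
        if ecc_deg <= FOVEA_DEG:
            return 1.0
        return 1.0 / (1.0 + FALLOFF * (ecc_deg - FOVEA_DEG))

    def mip_level(ecc_deg):
        """Coarser (higher) mip level where acuity is lower."""
        return max(0, int(round(-math.log2(relative_acuity(ecc_deg)))))

    for ecc in (0, 5, 10, 20, 40):
        print(ecc, "deg:", round(relative_acuity(ecc), 2), "acuity, mip", mip_level(ecc))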
 
 
 
Cartoon animation
 
 
 
 
Merging and Transformation of Raster Images for Cartoon Animation
Bruce A. Wallace
Proc. SIGGRAPH 1981
About the role
of this paper in the history of digital compositing.
About the invention of
two-background
matte extraction.
 
The task of assembling drawings and backgrounds together for each frame of an
animated sequence has always been a tedious undertaking using conventional
animation camera stands, and has contributed to the high cost of animation
production. In addition, the physical limitations that these camera stands
place on the manipulation of the individual artwork levels restricts the total
image-making possibilities afforded by traditional cartoon animation.
Documents containing all frame assembly information must also be maintained.
This paper presents several computer methods for assisting in the production of
cartoon animation, both to reduce expense and to improve the overall quality.
Merging is the process of combining levels of artwork into a final composite
frame using digital computer graphics. The term "level" refers to a single
painted drawing (cel) or background. A method for the simulation of any
hypothetical animation camera set-up is introduced. A technique is presented
for reducing the total number of merges by retaining merged groups consisting
of individual levels which do not change over successive frames. Lastly, a
sequence-editing system, which controls precise definition of an animated
sequence, is described. Also discussed is the actual method for merging any
two adjacent levels and several computational and storage optimizations to
speed the process.
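 
In modern terms, merging two adjacent levels is an "over" composite of a cel and its matte onto the level beneath it. A minimal sketch of that idea, including caching a merged group of levels that do not change across frames so only the moving level is recomposited (the level ordering and toy images are assumptions):

    # Illustrative sketch: "over" compositing of painted levels (cels) onto a
    # background, back to front, with a cached merge of levels that do not change.
    import numpy as np

    def over(fg_rgb, fg_a, bg_rgb, bg_a):
        """Composite a foreground level over a background level."""
        out_a = fg_a + bg_a * (1.0 - fg_a)
        out_rgb = fg_rgb * fg_a[..., None] + bg_rgb * (bg_a * (1.0 - fg_a))[..., None]
        safe = np.maximum(out_a, 1e-6)[..., None]
        return out_rgb / safe, out_a

    H, W = 64, 64
    background = (np.ones((H, W, 3)), np.ones((H, W)))        # opaque background
    static_cels = [(np.random.rand(H, W, 3), np.random.rand(H, W)) for _ in range(3)]
    moving_cel = (np.random.rand(H, W, 3), np.random.rand(H, W))

    # Merge the levels that are constant over a run of frames once...
    merged_rgb, merged_a = background
    for rgb, a in static_cels:
        merged_rgb, merged_a = over(rgb, a, merged_rgb, merged_a)

    # ...then each frame composites only the changing level over the cached group.
    frame_rgb, frame_a = over(*moving_cel, merged_rgb, merged_a)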
 
 
 
 
Area Flooding Algorithms
Marc Levoy
SIGGRAPH 1981 Two-Dimensional Computer Animation course notes.
 
This paper describes the area flooder (equivalent to the paint bucket in Adobe
Photoshop) used in the Hanna-Barbera Productions Computer-Assisted Animation
System, which was in production from the mid-1980s until 1996. Descriptions
are included in the paper of both a hard-edged flooder (stop when the color
changes) and a soft-edged flooder (stop when the color gradient exceeds a given
threshold). At the time this was the fastest area flooder known, at least to
the small community of programmers who worked in this area. That speed was due
partly to the algorithm itself (so it remains a fast algorithm) and partly to
its having been coded directly in machine language. Here is a
longer description
of this project. The version linked here was optically scanned from the
SIGGRAPH 1982 course notes. It is the same as the 1981 version except for
corrections made by hand on the manuscript in 1982.
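 
As a rough modern illustration of a hard-edged flooder (a Python sketch, not the machine-language original), a scanline fill processes whole horizontal runs rather than single pixels, which is much of where this family of algorithms gets its speed:

    # Illustrative sketch: hard-edged scanline flood fill ("paint bucket").
    # Fills the connected region of pixels matching the seed's color.
    def flood_fill(image, seed_x, seed_y, new_color):
        h, w = len(image), len(image[0])
        old_color = image[seed_y][seed_x]
        if old_color == new_color:
            return
        stack = [(seed_x, seed_y)]
        while stack:
            x, y = stack.pop()
            if image[y][x] != old_color:
                continue
            # expand to the full horizontal run containing (x, y)
            x0 = x
            while x0 > 0 and image[y][x0 - 1] == old_color:
                x0 -= 1
            x1 = x
            while x1 < w - 1 and image[y][x1 + 1] == old_color:
                x1 += 1
            for xi in range(x0, x1 + 1):
                image[y][xi] = new_color
                # seed the rows above and below wherever they still match
                if y > 0 and image[y - 1][xi] == old_color:
                    stack.append((xi, y - 1))
                if y < h - 1 and image[y + 1][xi] == old_color:
                    stack.append((xi, y + 1))

    img = [[0] * 8 for _ in range(8)]
    img[3][3] = img[3][4] = 1          # a small barrier in the region
    flood_fill(img, 0, 0, 2)           # fill everything connected to the corner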
 
 
 
 
A Color Animation System Based on the Multiplane Technique
Marc Levoy
Proc. SIGGRAPH 1977
 
This paper describes an animation package currently under development at the
Cornell Program of Computer Graphics. The basic algorithm employed is linear or
non-linear interpolation between successive pairs of key frames. These key
frames are composed of artwork input by the animator on a graphic tablet and
displayed on either a black and white vector scope or a color halftone CRT. The
initial working environment is two-dimensional, and the individual images are
combined using a multiplane cel animation technique to produce depth and motion
illusions. Real-time film previewing, utilizing an on-the-fly interpolation
algorithm, provides the artist with instant playback of animated sequences.
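 
The in-betweening step amounts to interpolating corresponding points of successive key drawings. A minimal sketch of that idea (not the 1977 system's code), assuming the two key drawings supply the same number of corresponding points:

    # Illustrative sketch: in-betweening by interpolating corresponding points
    # of two key frames; t runs from 0 (first key) to 1 (second key).
    def ease(t):
        """Simple slow-in / slow-out curve; linear interpolation uses t itself."""
        return t * t * (3.0 - 2.0 * t)

    def inbetween(key_a, key_b, t, linear=True):
        s = t if linear else ease(t)
        return [(ax + s * (bx - ax), ay + s * (by - ay))
                for (ax, ay), (bx, by) in zip(key_a, key_b)]

    key1 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]   # a key drawing as a point list
    key2 = [(0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]   # the next key drawing
    frames = [inbetween(key1, key2, i / 4.0) for i in range(5)]  # 5 frames inclusive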
 
 
© 1994-2007
Marc Levoy 
Last update:
July 14, 2025 12:24:17 AM