We introduce adaptive view planning to multi-view synthesis, aiming to improve both occlusion revelation and 3D consistency for single-view 3D reconstruction. Instead of generating an unordered set of views independently or simultaneously, we generate a sequence of views, leveraging temporal consistency to enhance 3D coherence. Most importantly, our view sequence is not drawn from a pre-determined camera setup. Instead, we compute an adaptive camera trajectory (ACT), specifically, an orbit of camera views, which maximizes the visibility of occluded regions of the 3D object to be reconstructed. Once the best orbit is found, we feed it to a video diffusion model to generate novel views around the orbit ...
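To make the orbit search concrete, here is a minimal sketch of scoring candidate camera orbits by how many occluded surface samples they reveal; `score_orbit`, the front-facing visibility proxy, and the random stand-in data are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def score_orbit(elevation, pts, normals, occluded, n_views=16, radius=2.5):
    """Score one candidate camera orbit by the fraction of occluded surface
    samples that face at least one camera on the orbit (a crude front-facing
    proxy for visibility; the paper's objective may differ)."""
    az = np.linspace(0.0, 2 * np.pi, n_views, endpoint=False)
    cams = np.stack([radius * np.cos(az) * np.cos(elevation),
                     radius * np.sin(az) * np.cos(elevation),
                     np.full(n_views, radius * np.sin(elevation))], axis=1)
    p, n = pts[occluded], normals[occluded]        # score occluded regions only
    view = cams[None, :, :] - p[:, None, :]        # (P, V, 3) point-to-camera
    view /= np.linalg.norm(view, axis=-1, keepdims=True)
    facing = (view * n[:, None, :]).sum(-1) > 0.1  # roughly front-facing
    return facing.any(axis=1).mean()

# pick the most occlusion-revealing orbit among candidate elevations
pts = np.random.randn(500, 3)                      # stand-in surface samples
normals = pts / np.linalg.norm(pts, axis=1, keepdims=True)
occluded = np.random.rand(500) > 0.5
best = max(np.linspace(-0.6, 1.2, 10),
           key=lambda e: score_orbit(e, pts, normals, occluded))
```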
We introduce HIT, a novel hierarchical neural field representation for 3D shapes that learns general hierarchies in a coarse-to-fine manner across different shape categories in an unsupervised setting. Our key contribution is a hierarchical transformer (HIT), where each level learns parent–child relationships of the tree hierarchy using a compressed codebook. This codebook enables the network to automatically identify common substructures across potentially diverse shape categories. Unlike previous works that constrain the task to a fixed hierarchical structure (e.g., binary), we impose no such restriction ...
We introduce MultiCOIN, a video inbetweening framework that allows multi-modal controls, including depth transition and layering, motion trajectories, text prompts, and target regions for movement localization, while achieving a balance between flexibility, ease of use, and precision for fine-grained video interpolation. To achieve this, we adopt the Diffusion Transformer (DiT) architecture as our video generative model, due to its proven capability to generate high-quality long videos. To ensure compatibility between DiT and our multi-modal controls, we map all motion controls into a common sparse and user-friendly point-based representation as the video/noise input. Further, we separate content controls and motion controls into two branches to encode the required features ...
We pose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening to interpolate two single-view images. In contrast to video/4D generation from only text or a single image, our interpolative task can leverage more precise motion control to better constrain the generation. Given two monocular RGB images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D, without making assumptions on the object category, motion type, length, or complexity. To handle such arbitrary and diverse motions, we utilize a foundational video interpolation model for motion prediction and employ a hierarchical approach through keyframes to address large frame-to-frame motion gaps that could otherwise lead to ambiguous interpretations ...
We introduce a 3D detailizer, a neural model which can instantaneously (in <1s) transform a coarse 3D shape proxy into a high-quality asset with detailed geometry and texture as guided by an input text prompt. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. The coarse 3D proxy, which can be easily varied and adjusted (e.g., via user editing), provides structure control over the final shape. Importantly, our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes, with varied structures, whose local details all share a consistent style and appearance.
We introduce ASIA (Adaptive 3D Segmentation using few Image Annotations), a novel framework that enables segmentation of possibly non-semantic and non-text describable "parts" in 3D. Our segmentation is controllable through a few user-annotated in-the-wild images, which are easier to collect than multi-view images, less demanding to annotate than 3D models, and more precise than potentially ambiguous text descriptions. Our method leverages the rich priors of text-to-image diffusion models, such as Stable Diffusion, to transfer segmentations from image space to 3D, even when the annotated and target objects differ significantly in geometry or structure. During training, we optimize a text token for each segment and fine-tune our model with a novel cross-view part correspondence loss.
We present an open-vocabulary and zero-shot method for arbitrary referring expression segmentation (RES), targeting more general input expressions than those handled by prior works. Specifically, our inputs encompass both object- and part-level labels as well as implicit references pointing to properties or qualities of object/part function, design, style, material, etc. Our model, coined RESAnything, leverages Chain-of-Thoughts (CoT) reasoning, where the key idea is attribute prompting. We generate detailed descriptions of object/part attributes including shape, color, and location for potential segment proposals through systematic prompting of a large language model (LLM), where the proposals are produced by a foundational image segmentation model.
We introduce a novel representation for learning and generating Computer-Aided Design (CAD) models in the form of boundary representations (B-Reps). Our representation unifies the continuous geometric properties of B-Rep primitives in different orders (e.g., surfaces and curves) and their discrete topological relations in a holistic latent (HoLa) space. This is based on the simple observation that the topological connection between two surfaces is intrinsically tied to the geometry of their intersecting curve. Such a prior allows us to reformulate topology learning in B-Reps as a geometric reconstruction problem in Euclidean space. Specifically, we eliminate the presence of curves, vertices, and all the topological connections in the latent space by learning to distinguish and derive curve geometries from a pair of surface primitives via a neural intersection network ...
We introduce Masked Anchored SpHerical Distances (MASH), a novel multi-view and parametrized representation of 3D shapes. Inspired by multi-view geometry and motivated by the importance of perceptual shape understanding for learning 3D shapes, MASH represents a 3D shape as a collection of observable local surface patches, each defined by a spherical distance function emanating from an anchor point. We further leverage the compactness of spherical harmonics to encode the MASH functions, combined with a generalized view cone with a parameterized base that masks the spatial extent of the spherical function to attain locality. We develop a differentiable optimization algorithm capable of converting any point cloud into a MASH representation accurately approximating ground-truth surfaces with arbitrary geometry and topology ...
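The anchored spherical distance idea can be sketched as follows: each patch stores spherical-harmonic coefficients of a distance function around an anchor, and a view cone masks its angular support for locality. The truncated complex-SH expansion with a real-part readout and the hard cone test below are simplifying assumptions for illustration.

```python
import numpy as np
from scipy.special import sph_harm

def mash_patch_distance(theta, phi, coeffs, cone_axis, cone_half_angle, degree=3):
    """Evaluate one MASH patch at a query direction (theta: azimuth, phi: polar
    angle): a spherical-harmonic distance emanating from the anchor, masked to
    a view cone. Not the official parameterization, just a sketch."""
    d = np.array([np.sin(phi) * np.cos(theta),
                  np.sin(phi) * np.sin(theta),
                  np.cos(phi)])                      # unit query direction
    if np.dot(d, cone_axis) < np.cos(cone_half_angle):
        return None                                  # masked: outside the cone
    dist, i = 0.0, 0
    for n in range(degree + 1):                      # truncated SH expansion
        for m in range(-n, n + 1):
            dist += (coeffs[i] * sph_harm(m, n, theta, phi)).real
            i += 1
    return dist       # surface sample = anchor + dist * d
```

A degree-3 expansion uses (3+1)^2 = 16 coefficients per patch, which is where the compactness of the encoding comes from.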
We introduce ArcPro, a novel learning framework built on architectural programs to recover structured 3D abstractions from highly sparse and low-quality point clouds. Specifically, we design a domain-specific language (DSL) to hierarchically represent building structures as a program, which can be efficiently converted into a mesh. We bridge feedforward and inverse procedural modeling by using a feedforward process for training data synthesis, allowing the network to make reverse predictions. We train an encoder-decoder on the points-program pairs to establish a mapping from unstructured point clouds to architectural programs, where a 3D convolutional encoder extracts point cloud features and a transformer decoder autoregressively predicts the programs in a tokenized form ...
We present ATOP (Articulate That Object Part), a novel few-shot method based on motion personalization to articulate a static 3D object with respect to a part and its motion as prescribed in a text prompt. In our work, the text input allows us to tap into the power of modern-day diffusion models to generate plausible motion samples for the right object category and part. In turn, the input 3D object provides image prompting to personalize the generated video to that very object we wish to articulate. Our method starts with a few-shot finetuning for category-specific motion generation, a key first step to compensate for the lack of articulation awareness by current diffusion models. This is followed by motion video personalization that is realized by multi-view rendered images of the target 3D object. At last, we transfer the personalized video motion to the target 3D object via differentiable rendering to optimize part motion parameters by an SDS loss.
We propose GALA, a novel representation of 3D shapes that (i) excels at capturing and reproducing complex geometry and surface details, (ii) is computationally efficient, and (iii) lends itself to 3D generative modelling with modern, diffusion-based schemes. The key idea of GALA is to exploit both the global sparsity of surfaces within a 3D volume and their local surface properties ...
We introduce camera ray matching (CRAYM) into the joint optimization of camera poses and neural fields from multi-view images. The optimized field, referred to as a feature volume, can be "probed" by the camera rays for novel view synthesis (NVS) and 3D geometry reconstruction. One key reason for matching camera rays, instead of pixels as in prior works, is that the camera rays can be parameterized by the feature volume to carry both geometric and photometric information. Multi-view consistencies involving the camera rays and scene rendering can be naturally integrated into the joint optimization and network training, to impose physically meaningful constraints to improve the final quality of both the geometric reconstruction and photorealistic rendering. We demonstrate the effectiveness of CRAYM for both NVS and geometry reconstruction, over dense- or sparse-view settings, with qualitative and quantitative comparisons to state-of-the-art alternatives.
We present a differentiable rendering framework to learn structured 3D abstractions in the form of primitive assemblies from sparse RGB images capturing a 3D object. By leveraging differentiable volume rendering, our method does not require 3D supervision. Architecturally, our network follows the general pipeline of an image-conditioned neural radiance field (NeRF) exemplified by pixelNeRF for color prediction. As our core contribution, we introduce differential primitive assembly (DPA) into NeRF to output a 3D occupancy field in place of density prediction, where the predicted occupancies serve as opacity values for volume rendering. Our network, coined DPA-Net, produces a union of convexes ...
We introduce the first active learning (AL) model for high-accuracy instance segmentation of moveable parts from RGB images of real indoor scenes. Specifically, our goal is to obtain fully validated segmentation results by humans while minimizing manual effort. To this end, we employ a transformer that utilizes a masked-attention mechanism to supervise the active segmentation. To better tailor the network to moveable parts, we introduce a coarse-to-fine AL approach which first uses an object-aware masked attention and then a pose-aware one, leveraging the hierarchical nature of the problem and a correlation between moveable parts and object poses and interaction directions.
We present a 3D modeling method which enables end-users to refine or detailize 3D shapes using machine learning, expanding the capabilities of AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced with a simple box extrusion tool or via generative modeling), a user can directly "paint" desired target styles representing compelling geometric details, from input exemplar shapes, over different regions of the coarse shape. These regions are then up-sampled into high-resolution geometries which adhere to the painted styles.
We introduce a novel approach for single-image mesh texturing, which employs diffusion models with judicious conditioning to seamlessly transfer an object's texture from a single RGB image to a given 3D mesh object. We do not assume that the two objects belong to the same category, and even if they do, there can be significant discrepancies in their geometry and part proportions. Our method aims to rectify the discrepancies by respecting both shape semantics and edge features in the inputs to produce clean and sharp mesh texturization. Leveraging a pre-trained Stable Diffusion generator, our method is capable of transferring textures in the absence of a direct guide from the single-view image.
We present an unsupervised 3D shape co-segmentation method which learns a set of deformable part templates from a shape collection. To accommodate structural variations in the collection, our network composes each shape by a selected subset of template parts which are affine-transformed. To maximize the expressive power of the part templates, we introduce a per-part deformation network to enable the modeling of diverse parts with substantial geometry variations, while imposing constraints on the deformation capacity to ensure fidelity to the originally represented parts.
We introduce a novel method for acquiring boundary representations (B-Reps) of 3D CAD models which involves a two-step process: it first applies a spatial partitioning, referred to as the "split", followed by a "fit" operation to derive a single primitive within each partition. Specifically, our partitioning aims to produce the classical Voronoi diagram of the set of ground-truth (GT) B-Rep primitives.
We introduce a new approach based on a coupled representation and a neural volume optimization to implicitly perform 3D shape editing in latent space. This work has three innovations. First, we design the coupled neural shape (CNS) representation for supporting 3D shape editing. This representation includes a latent code, which captures high-level global semantics of the shape, and a 3D neural feature volume, which provides a spatial context to associate with the local shape changes given by the editing. Second, ...
We introduce multi-slice reasoning, a new notion for single-view 3D reconstruction which challenges the current and prevailing belief that multi-view synthesis is the most natural conduit between single-view and 3D. Our key observation is that object slicing is more advantageous than altering views to reveal occluded structures. Specifically, slicing is more occlusion-revealing since it can peel through any occluders without obstruction. In the limit, i.e., with infinitely many slices, it is guaranteed to unveil all hidden object parts.
We present BRICS, a bi-level feature representation for image collections, which consists of a key code space on top of a feature grid space. Specifically, our representation is learned by an autoencoder to encode images into continuous key codes, which are used to retrieve features from groups of multi-resolution feature grids. Our key codes and feature grids are jointly trained continuously with well-defined gradient flows, leading to high usage rates of the feature grids and improved generative modeling compared to discrete Vector Quantization (VQ). Differently from existing continuous representations such as KL-regularized latent codes, our key codes are strictly bounded in scale and variance.
The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and are missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only a few views, our method infers the interior planes that are observable in the input images.
We introduce deformable interaction analogy (DINA) as a means to generate close interactions between two 3D objects. Given a single demo interaction between an anchor object (e.g., a hand) and a source object (e.g., a mug grasped by the hand), our goal is to generate many analogous 3D interactions between the same anchor object and various new target objects (e.g. a toy airplane), where the anchor object is allowed to be rigid or deformable.
We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. Our key idea is to regularize the reconstruction, which is severely ill-posed with significant gaps between the sparse views, by learning a set of neural templates that act as surface priors. Our method, coined DiViNet, operates in two stages. The first stage learns the templates, in the form of 3D Gaussian functions, across different scenes, without 3D supervision. In the reconstruction stage, our predicted templates serve as anchors to help "stitch" the surfaces over sparse regions.
We present D2CSG, a neural model composed of two dual and complementary network branches, with dropouts, for unsupervised learning of compact constructive solid geometry (CSG) representations of 3D CAD shapes. Our network is trained to reconstruct a 3D shape by a fixed-order assembly of quadric primitives, with both branches producing a union of primitive intersections or inverses. A key difference between D2CSG and all prior neural CSG models is its dedicated residual branch to assemble the potentially complex shape complement, which is subtracted from an overall shape modeled by the cover branch. With the shape complements, our network is provably general, while the weight dropout further improves compactness of the CSG tree by removing redundant primitives.
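A rough sketch of the fixed-order evaluation described above, using hard (boolean) occupancies for readability; the quadric parameterization and set encoding are assumptions, and the actual network uses soft, differentiable counterparts.

```python
import numpy as np

def quadric_inside(points, q):
    """Hard inside test for one quadric x^T Q x + b^T x + c <= 0 (an assumed
    parameterization; the network predicts soft, differentiable versions)."""
    Q, b, c = q
    return np.einsum('ni,ij,nj->n', points, Q, points) + points @ b + c <= 0.0

def d2csg_occupancy(points, cover_prims, res_prims, cover_sets, res_sets):
    """Fixed-order CSG assembly sketch: each branch forms a union of primitive
    intersections; the residual branch models the shape complement, which is
    subtracted from the cover branch."""
    def branch(prims, sets):
        ins = np.stack([quadric_inside(points, q) for q in prims])  # (#prims, N)
        inter = np.stack([ins[list(s)].all(axis=0) for s in sets])  # intersections
        return inter.any(axis=0)                                    # union
    return branch(cover_prims, cover_sets) & ~branch(res_prims, res_sets)
```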
We present ShaDDR, an example-based deep generative neural network which produces a high-resolution textured 3D shape through geometry detailization and conditional texture generation applied to an input coarse voxel shape. Trained on a small set of detailed and textured exemplar shapes, our method learns to detailize the geometry via multi-resolution voxel upsampling and generate textures on voxel surfaces via differentiable rendering against exemplar texture images from a few views. The generation is realtime, taking less than 1 second to produce a 3D model with voxel resolutions up to 512^3. The generated shape preserves the overall structure of the input coarse voxel model, while the style of the generated geometric details and textures can be manipulated through learned latent codes.
This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space. Many recent methods have been developed to encode 3D shapes into a learned latent shape space to enable generative design and modeling. Yet, existing methods lack effective exploration mechanisms, despite the rich information encoded in the learned space. To this end, we propose to leverage CLIP, a powerful pre-trained vision-language model, to aid the shape-space exploration. Our idea is threefold. First, we couple the CLIP and shape spaces by generating paired CLIP and shape codes through sketch images and training a mapper network to connect the two spaces. Second, to explore the space around a given shape, we formulate a co-optimization strategy to search for the CLIP code that better matches the geometry of the shape. Third, we design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape.
We introduce an active 3D reconstruction method which integrates visual perception, robot-object interaction, and 3D scanning to recover both the exterior and interior geometries of a target 3D object. Unlike other works in active vision which focus on optimizing camera viewpoints to better investigate the environment, the primary feature of our reconstruction is an analysis of the interactability of various parts of the target object and the ensuing part manipulation by a robot to enable scanning of occluded regions. As a result, an understanding of part articulations of the target object is obtained on top of complete geometry acquisition. Our method operates fully automatically by a Fetch robot with built-in RGBD sensors. It iterates between interaction analysis and interaction-driven reconstruction, scanning and reconstructing detected moveable parts one at a time, where both the articulated part detection and mesh reconstruction are carried out by neural networks.
We present a complete learning framework to solve the real-world transport-and-packing (TAP) problem in 3D. It constitutes a full solution pipeline, from partial observations of the input objects via RGBD sensing and recognition, through robotic motion planning, to final box placement that arrives at a compact packing in a target container. The technical core of our method is a neural network for TAP, trained via reinforcement learning (RL), to solve the NP-hard combinatorial optimization problem. Our network simultaneously selects an object to pack and determines the final packing location, based on a judicious encoding of the continuously evolving states of partially observed source objects and available spaces in the target container, using separate encoders, both equipped with attention mechanisms.
We present the first active learning tool for fine-grained 3D part labeling, a problem which challenges even the most advanced deep learning (DL) methods due to the significant structural variations among the small and intricate parts. For the same reason, the necessary data annotation effort is tremendous, motivating approaches to minimize human involvement. Our labeling tool iteratively verifies or modifies part labels predicted by a deep neural network, with human feedback continually improving the network prediction. To effectively reduce human efforts, we develop two novel features in our tool, hierarchical and symmetry-aware active labeling. Our human-in-the-loop approach, coined HAL3D, achieves 100% accuracy (barring human errors) on any test set with pre-defined hierarchical part labels, with 80% time-saving over manual effort.
We introduce a novel method to automatically generate an artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable. To address an assortment of challenges with our task at hand, including conflicting goals (artistic stylization vs. legibility), lack of ground truth, and an immense search space, our approach utilizes large language models to bridge texts and visual images for stylization and builds an unsupervised generative model with a diffusion model backbone. Specifically, we employ the denoising generator in the Latent Diffusion Model (LDM), with the key addition of a CNN-based discriminator to adapt the input style onto the input text. The discriminator uses rasterized images of a given letter/word font as real samples and outputs of the denoising generator as fake samples ...
We introduce anchored radial observations (ARO), a novel shape encoding for learning neural field representation of shapes that is category-agnostic and generalizable amid significant shape variations. The main idea behind our work is to reason about shapes through partial observations from a set of viewpoints, called anchors. We develop a general and unified shape representation by employing a fixed set of anchors, via Fibonacci sampling, and designing a coordinate-based deep neural network to predict the occupancy value of a query point in space. Differently from prior neural implicit models, which use a global shape feature, our shape encoder operates on contextual, query-specific features ...
This report surveys advances in deep learning-based modeling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes, various indoor scene datasets available for research in the aforementioned areas, and discuss notable works employing machine learning models for such scene modeling tasks based on these representations. Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With respect to analysis, we focus on four basic scene understanding tasks: 3D object detection, 3D scene segmentation, 3D scene reconstruction, and 3D scene similarity. For synthesis, we mainly discuss neural scene synthesis works, though also highlighting model-driven methods that allow for human-centric, progressive scene synthesis ...
We introduce NIFT, Neural Interaction Field and Template, a descriptive and robust interaction representation of object manipulations to facilitate imitation learning. Given a few object manipulation demos, NIFT guides the generation of the interaction imitation for a new object instance by matching the Neural Interaction Template (NIT) extracted from the demos to the Neural Interaction Field (NIF) defined for the new object. Specifically, the NIF is a neural field which encodes the relationship between each spatial point and a given object, where the relative position is defined by a spherical distance function rather than occupancies or signed distances, which are commonly adopted by conventional neural fields but less informative ...
We introduce an end-to-end learning framework for image-to-image composition, aiming to seamlessly compose an object represented as a cropped patch from an object image into a background scene image. As our approach emphasizes more on semantic and structural coherence of the composed images, rather than their pixel-level RGB accuracies, we tailor the input and output of our network with structure-aware features and design our network losses accordingly, with ground truth established in a self-supervised setting through the object cropping. Specifically, our network takes the semantic layout features from the input scene image, features encoded from the edges and silhouette in the input object patch, as well as a latent code as inputs, and generates a 2D spatial affine transform defining the translation and scaling of the object patch.
We present a novel attention-based mechanism for learning enhanced point features for tasks such as point cloud classification and segmentation. Our key message is that if the right attention point is selected, then "one point is all you need" — not a sequence as in a recurrent model and not a pre-selected set as in all prior works. Moreover, the location of the attention point should be learned, from data and specific to the task at hand. Our mechanism is characterized by a new and simple convolution, which combines the feature at an input point with the feature at its associated attention point. We call such a point a directional attention point (DAP) ...
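As a sketch of such a convolution, the module below soft-selects an attention point per input point and fuses the two features with a shared linear layer; the soft selection and layer sizes are illustrative assumptions (the paper's DAP mechanism learns a single directional attention point per input point).

```python
import torch
import torch.nn as nn

class DAPConv(nn.Module):
    """Sketch of a directional-attention-point convolution: combine the feature
    at each input point with the feature at one attention point, here chosen
    by soft attention over all points for differentiability."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.combine = nn.Linear(2 * dim, dim)   # the "simple convolution"

    def forward(self, feats):                    # feats: (B, N, dim)
        attn = torch.softmax(self.q(feats) @ self.k(feats).transpose(1, 2)
                             / feats.shape[-1] ** 0.5, dim=-1)  # (B, N, N)
        dap_feat = attn @ feats                  # feature at the attention point
        return torch.relu(self.combine(torch.cat([feats, dap_feat], dim=-1)))
```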
Polygonal meshes are ubiquitous, but have only played a relatively minor role in the deep learning revolution. State-of-the-art neural generative models for 3D shapes learn implicit functions and generate meshes via expensive iso-surfacing. We overcome these challenges by employing a classical spatial data structure from graphics, Binary Space Partitioning (BSP), to facilitate 3D learning. The core operation of BSP involves recursive subdivision of 3D space to obtain convex sets. By exploiting this property, we devise BSP-Net, a network that learns to represent a 3D shape via convex decomposition without supervision. The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built over a set of planes, where the planes and convexes are both defined by learned network weights.
We present a novel method for single-view 3D reconstruction of textured meshes, with a focus to address the primary challenge surrounding texture inference and transfer. Our key observation is that learning textured reconstruction in a structure-aware and globally consistent manner is effective in handling the severe ill-posedness of the texturing problem and significant variations in object pose and texture details. Specifically, we perform structured mesh reconstruction, via a retrieval-and-assembly approach, to produce a set of genus-zero parts parameterized by deformable boxes and endowed with semantic information. For texturing, we first transfer visible colors from the input image onto the unified UV texture space of the deformable boxes. Then we combine a learned transformer model for per-part texture completion with a global consistency loss to optimize inter-part texture consistency. Our texture completion model operates in a VQ-VAE embedding space and is trained end-to-end, with the transformer training enhanced with retrieved texture instances to improve texture completion performance amid significant occlusion.
We introduce the first learning-based reconstructability predictor to improve view and path planning for large-scale 3D urban scene acquisition using unmanned drones. In contrast to previous heuristic approaches, our method learns a model that explicitly predicts how well a 3D urban scene will be reconstructed from a set of viewpoints. To make such a model trainable and simultaneously applicable to drone path planning, we simulate the proxy-based 3D scene reconstruction during training to set up the prediction. Specifically, the neural network we design is trained to predict the scene reconstructability as a function of the proxy geometry, a set of viewpoints, and optionally a series of scene images acquired in flight ...
We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC). Like traditional DC, it produces exactly one vertex per grid cell and one quad for each grid edge intersection, a natural and efficient structure for reproducing sharp features. However, rather than computing vertex locations and edge crossings with hand-crafted functions that depend directly on difficult-to-obtain surface gradients, NDC uses a neural network to predict them. As a result, NDC can be trained to produce meshes from signed or unsigned distance fields, binary voxel grids, or point clouds (with or without normals); and it can produce open surfaces in cases where the input represents a sheet or partial surface.
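The one-vertex-per-cell, one-quad-per-crossed-edge structure can be assembled with a little index bookkeeping once the network outputs are given; the sketch below handles interior x-axis edges only (y and z are analogous) and assumes the predicted per-cell vertices and per-edge crossing flags as inputs.

```python
import numpy as np

def ndc_quads(vertices, x_cross):
    """Assemble quads for interior grid edges along the x axis, as in classical
    dual contouring. vertices: (X, Y, Z, 3), one predicted vertex per cell;
    x_cross: (X, Y, Z) bool, True where the x-edge at (i, j, k) crosses the
    surface. Each crossed edge is shared by four neighboring cells, whose
    predicted vertices form one quad."""
    quads = []
    X, Y, Z = x_cross.shape
    for i in range(X):
        for j in range(1, Y):
            for k in range(1, Z):
                if x_cross[i, j, k]:
                    quads.append([vertices[i, j - 1, k - 1], vertices[i, j, k - 1],
                                  vertices[i, j, k],         vertices[i, j - 1, k]])
    return np.array(quads)
```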
We introduce CAPRI-Net, a neural network for learning compact and interpretable implicit representations of 3D computer-aided design (CAD) models, in the form of adaptive primitive assemblies. Our network takes an input 3D shape, provided as a point cloud or a voxel grid, and reconstructs it by a compact assembly of quadric surface primitives via constructive solid geometry (CSG) operations. The network is self-supervised with a reconstruction loss, leading to faithful 3D reconstructions with sharp edges and plausible CSG trees, without any ground-truth shape assemblies.
We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains. Our model is built on autoencoding implicit fields, rather than point clouds, which represent the state of the art. Furthermore, our translation network is trained to perform the task over a latent grid representation which combines the merits of both latent-space processing and position awareness, to not only enable drastic shape transforms but also well preserve spatial features and fine local details for natural shape translations.
We introduce RIM-Net, a neural network which learns recursive implicit fields for unsupervised inference of hierarchical shape structures. Our network recursively decomposes an input 3D shape into two parts, resulting in a binary tree hierarchy. Each level of the tree corresponds to an assembly of shape parts, represented as implicit functions, to reconstruct the input shape. At each node of the tree, simultaneous feature decoding and shape decomposition are carried out by their respective feature and part decoders, with weight sharing across the same hierarchy level ...
We introduce a modeling tool which can evolve a set of 3D objects in a functionality-aware manner. Our goal is for the evolution to generate large and diverse sets of plausible 3D objects for data augmentation, constrained modeling, as well as open-ended exploration to possibly inspire new designs. Starting with an initial population of 3D objects belonging to one or more functional categories, we evolve the shapes through part re-combination to produce generations of hybrids or crossbreeds between parents from the heterogeneous shape collection ...
We introduce Neural Marching Cubes (NMC), a data-driven approach for extracting a triangle mesh from a discretized implicit field. We re-cast MC from a deep learning perspective, by designing tessellation templates more apt at preserving geometric features, and learning the vertex positions and mesh topologies from training meshes, to account for contextual information from nearby cubes. We develop a compact per-cube parameterization to represent the output triangle mesh, while being compatible with neural processing, so that a simple 3D convolutional network can be employed for the training. We evaluate our neural MC approach by quantitative and qualitative comparisons to all well-known MC variants, demonstrating its superiority in faithful reconstruction of sharp features and mesh topology.
We introduce RaidaR, a rich annotated image dataset of rainy street scenes, to support autonomous driving research. The new dataset contains the largest number of rainy images (58,542) to date, 5,000 of which provide semantic segmentations and 3,658 provide object instance segmentations. The RaidaR images cover a wide range of realistic rain-induced artifacts, including fog, droplets, and road reflections, which can effectively augment existing street scene datasets to improve data-driven machine perception during rainy weather.
We introduce TM-NET, a novel deep generative model for synthesizing textured meshes in a part-aware manner. Once trained, the network can generate novel textured meshes from scratch or predict textures for a given 3D mesh, without image guidance. Plausible and diverse textures can be generated for the same mesh part, while texture compatibility between parts in the same shape is achieved via conditional generation. Specifically, our method produces texture maps for individual shape parts, each as a deformable box, leading to a natural UV map with minimal distortion. The network separately embeds part geometry (via a PartVAE) and part texture (via a TextureVAE) into their respective latent spaces ...
We introduce a path-oriented drone trajectory planning algorithm, which performs continuous image acquisition along an aerial path, aiming to optimize both the scene reconstruction quality and path quality. Specifically, our method takes as input a rough 3D scene proxy and produces a drone trajectory and image capturing setup, which efficiently yields a high-quality reconstruction of the 3D scene based on three optimization objectives: one to maximize the amount of 3D scene information that can be acquired along the entirety of the trajectory, another to optimize the scene capturing efficiency by maximizing the scene information that can be acquired per unit length along the aerial path, and the last to minimize the total turning angles along the aerial path, so as to reduce the number of sharp turns.
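The three objectives can be evaluated for a candidate trajectory roughly as below; `info_gain` stands in for the proxy-based scene-information estimate and is an assumption, not the paper's actual measure.

```python
import numpy as np

def path_objectives(waypoints, info_gain):
    """Evaluate the three competing objectives for a candidate aerial path.
    waypoints: (N, 3) camera positions along the trajectory;
    info_gain(p, q): assumed scene-information estimate for segment p -> q."""
    segs = np.diff(waypoints, axis=0)
    lengths = np.linalg.norm(segs, axis=1)
    total_info = sum(info_gain(waypoints[i], waypoints[i + 1])
                     for i in range(len(waypoints) - 1))
    info_per_length = total_info / max(lengths.sum(), 1e-9)  # capture efficiency
    # total turning angle between consecutive segments (penalizes sharp turns)
    cos = (segs[:-1] * segs[1:]).sum(1) / (lengths[:-1] * lengths[1:] + 1e-9)
    turning = np.arccos(np.clip(cos, -1.0, 1.0)).sum()
    return total_info, info_per_length, turning
```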
We introduce 3D-FRONT (3D Furnished Rooms with layOuts and semaNTics), a new, large-scale, and comprehensive repository of synthetic indoor scenes highlighted by professionally designed layouts and a large number of rooms populated by high-quality textured 3D models with style compatibility. From layout semantics down to texture details of individual objects, our dataset is freely available to the academic community and beyond. Currently, 3D-FRONT contains 18,797 rooms diversely furnished by 3D objects, far surpassing all publicly available scene datasets. In addition, the 7,302 furniture objects all come with high-quality textures ...
We present MRGAN, a multi-rooted adversarial network which generates part-disentangled 3D point-cloud shapes without part-based shape supervision. The network fuses multiple branches of tree-structured graph convolution layers which produce point clouds, with learnable constant inputs at the tree roots. Each branch learns to grow a different shape part, offering control over the shape generation at the part level. Our network encourages disentangled generation of semantic parts via two key ingredients: a root-mixing training strategy which helps decorrelate the different branches to facilitate disentanglement, and a set of loss terms designed with part disentanglement and shape semantics in mind.
We present the first single-view 3D reconstruction network aimed at recovering geometric details from an input image which encompass both topological shape structures and surface features. Our key idea is to train the network to learn a detail disentangled reconstruction consisting of two functions, one implicit field representing the coarse 3D shape and the other capturing the details. Given an input image, our network, coined D2IM-Net, encodes it into global and local features which are respectively fed into two decoders. The base decoder uses the global features to reconstruct a coarse implicit field, while the detail decoder reconstructs, from the local features, two displacement maps, defined over the front and back sides of the captured object. The final 3D reconstruction is a fusion between the base shape and the displacement maps, with three losses enforcing the recovery of coarse shape, overall structure, and surface details via a novel Laplacian term.
We introduce a deep generative network for 3D shape detailization, akin to stylization with the style being geometric details. We address the challenge of creating large varieties of high-resolution and detailed 3D geometry from a small set of exemplars by treating the problem as that of geometric detail transfer. Given a low-resolution coarse voxel shape, our network refines it, via voxel upsampling, into a higher-resolution shape enriched with geometric details. The output shape preserves the overall structure (or content) of the input, while its detail generation is conditioned on an input "style code" corresponding to a detailed exemplar.
We present a deep neural network to predict structural similarity between 2D layouts by leveraging Graph Matching Networks (GMN). Our network, coined LayoutGMN, learns the layout metric via neural graph matching, using an attention-based GMN designed under a triplet network setting. To train our network, we utilize weak labels obtained by pixel-wise Intersection-over-Union (IoUs) to define the triplet loss. Importantly, LayoutGMN is built with a structural bias which can effectively compensate for the lack of structure awareness in IoUs.
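A minimal sketch of the IoU-weakly-labeled triplet setup: whichever candidate layout has higher pixel-wise IoU with the anchor is treated as the positive. `gmn` is a stand-in for the attention-based graph matching network, assumed to return a scalar similarity tensor.

```python
import torch.nn.functional as F

def layout_triplet_loss(gmn, anchor, layout_a, layout_b, iou_a, iou_b, margin=0.3):
    """Triplet loss with IoU weak labels (sketch): the candidate with higher
    IoU to the anchor acts as the positive, the other as the negative."""
    pos, neg = (layout_a, layout_b) if iou_a >= iou_b else (layout_b, layout_a)
    s_pos = gmn(anchor, pos)      # graph-matching similarity, higher = closer
    s_neg = gmn(anchor, neg)
    return F.relu(margin - (s_pos - s_neg))   # push s_pos above s_neg by margin
```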
This paper presents Roof-GAN, a novel generative adversarial network that generates structured geometry of residential roof structures as a set of roof primitives and their relationships. Given the number of primitives, the generator produces a structured roof model as a graph, which consists of 1) primitive geometry as raster images at each node, encoding facet segmentation and angles; 2) inter-primitive collinear/coplanar relationships at each edge; and 3) primitive geometry in a vector format at each node, generated by a novel differentiable vectorizer while enforcing the relationships.
State-of-the-art image-to-image translation methods tend to struggle in an imbalanced domain setting, where one image domain lacks richness and diversity. We introduce a new unsupervised translation network, BalaGAN, specifically designed to tackle the domain imbalance problem. We leverage the latent modalities of the richer domain to turn the image-to-image translation problem, between two imbalanced domains, into a balanced, multi-class, and conditional translation problem, more resembling the style transfer setting. Specifically, we analyze the source domain and learn a decomposition of it into a set of latent modes or classes, without any supervision. This leaves us with a multitude of balanced cross-domain translation tasks, between all pairs of classes, including the target domain. During inference, the trained network takes as input a source image, as well as a reference or style image from one of the modes as a condition, and produces an image which resembles the source on the pixel-wise level, but shares the same mode as the reference.
We introduce an end-to-end learnable technique to robustly identify feature edges in 3D point cloud data. We represent these edges as a collection of parametric curves (i.e., lines, circles, and B-splines). Accordingly, our deep neural network, coined PIE-NET, is trained for parametric inference of edges. The network is trained on the ABC dataset and relies on a "region proposal" architecture, where a first module proposes an over-complete collection of edge and corner points, and a second module ranks each proposal to decide whether it should be considered.
We introduce COALESCE, the first data-driven framework for component-based shape assembly which employs deep learning to synthesize part connections. To handle geometric and topological mismatches between parts, we remove the mismatched portions via erosion, and rely on a joint synthesis step, which is learned from data, to fill the gap and arrive at a natural and plausible part joint. Given a set of input parts extracted from different objects, COALESCE automatically aligns them and synthesizes plausible joints to connect the parts into a coherent 3D object represented by a mesh. The joint synthesis network, designed to focus on joint regions, reconstructs the surface between the parts by predicting an implicit shape representation that agrees with existing parts, while generating a smooth and topologically meaningful connection.
We introduce carvable volume decomposition for efficient 3-axis CNC machining of 3D freeform objects, where our goal is to develop a fully automatic method to jointly optimize setup and path planning. We formulate our joint optimization as a volume decomposition problem which prioritizes minimizing the number of setup directions while striving for a minimum number of continuously carvable volumes, where a 3D volume is continuously carvable, or simply carvable, if it can be carved with the machine cutter traversing a single continuous path. Geometrically, carvability combines visibility and monotonicity and presents a new shape property that has not been studied before.
We introduce the transport-and-pack (TAP) problem, a frequently encountered instance of real-world packing, and develop a neural optimization solution based on reinforcement learning. Given an initial spatial configuration of boxes, we seek an efficient method to iteratively transport and pack the boxes compactly into a target container. Due to obstruction and accessibility constraints, our problem has to add a new search dimension, i.e., finding an optimal transport sequence, to the already immense search space for packing alone. Using a learning-based approach, a trained network can learn and encode solution patterns to guide the solution of new problem instances instead of executing an expensive online search.
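The obstruction/accessibility constraint can be illustrated with a simple action mask over axis-aligned boxes; this test (a box is blocked if any remaining box overlaps it in plan view and sits above it) is a simplification of the paper's precedence analysis.

```python
import numpy as np

def accessible(boxes, packed):
    """Accessibility mask for axis-aligned boxes (x, y, z, w, d, h): a box can
    be transported only if no other remaining box rests above it (a simplified
    obstruction test; the paper encodes richer precedence constraints)."""
    remaining = [b for i, b in enumerate(boxes) if i not in packed]
    mask = []
    for i, b in enumerate(boxes):
        if i in packed:
            mask.append(False)
            continue
        bx, by, bz, bw, bd, bh = b
        blocked = any(o is not b and o[2] >= bz + bh and       # o sits higher
                      bx < o[0] + o[3] and o[0] < bx + bw and  # xy overlap
                      by < o[1] + o[4] and o[1] < by + bd
                      for o in remaining)
        mask.append(not blocked)
    return np.array(mask)   # feeds the RL policy as an action mask
```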
We introduce BSD-GAN, a novel multi-branch and scale-disentangled training method which enables unconditional Generative Adversarial Networks (GANs) to learn image representations at multiple scales, benefiting a wide range of generation and editing tasks. The key feature of BSD-GAN is that it is trained in multiple branches, progressively covering both the breadth and depth of the network, as resolutions of the training images increase to reveal finer-scale features. Specifically, each noise vector, as input to the generator network of BSD-GAN, is deliberately split into several sub-vectors, each corresponding to, and trained to learn, image representations at a particular scale. During training, we progressively "de-freeze" the sub-vectors, one at a time, as a new set of higher-resolution images is employed for training and more network layers are added.
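A minimal sketch of the scale-disentangled latent: the noise vector is a concatenation of per-scale sub-vectors, and only the currently de-frozen ones are sampled; the dimensions and the zero placeholder for frozen sub-vectors are illustrative assumptions.

```python
import torch

def sample_latent(batch, sub_dims, active_branches):
    """BSD-GAN-style latent sampling sketch: sample the first `active_branches`
    sub-vectors ("de-frozen" scales); keep later ones at a fixed constant
    until their training stage begins."""
    parts = []
    for s, dim in enumerate(sub_dims):
        if s < active_branches:
            parts.append(torch.randn(batch, dim))   # trainable at scale s
        else:
            parts.append(torch.zeros(batch, dim))   # frozen placeholder
    return torch.cat(parts, dim=1)

z = sample_latent(16, sub_dims=[32, 32, 64], active_branches=2)  # stage 2 of 3
```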
We introduce GANhopper, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops. Instead of executing translation directly, we steer the translation by requiring the network to produce in-between images which resemble weighted hybrids between images from the two input domains. Our network is trained on unpaired images from the two domains only, without any in-between images. All hops are produced using a single generator along each direction. In addition to the standard cycle-consistency and adversarial losses, we introduce a new hybrid discriminator, which is trained to classify the intermediate images produced by the generator as weighted hybrids, with weights based on a predetermined hop count.
We introduce a differential visual similarity metric to train deep neural networks for 3D reconstruction, aimed at improving reconstruction quality. The metric compares two 3D shapes by measuring distances between multi-view images differentiably rendered from the shapes. Importantly, the image-space distance is also differentiable and measures visual similarity, rather than pixel-wise distortion. Specifically, the similarity is defined by mean-squared errors over HardNet features computed from probabilistic keypoint maps of the compared images. Our differential visual shape similarity metric can be easily plugged into various 3D reconstruction networks, replacing their distortion-based losses, such as Chamfer or Earth Mover distances, so as to optimize the network weights to produce reconstructions with better structural fidelity and visual quality.
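Schematically, the loss looks like the sketch below; `render_fn` and `feat_net` stand in for the differentiable renderer and the HardNet-style feature extractor over probabilistic keypoint maps, both of which are assumptions here.

```python
import torch.nn.functional as F

def visual_similarity_loss(render_fn, shape_pred, shape_gt, views, feat_net):
    """Sketch of a differential visual similarity loss: differentiably render
    both shapes from shared viewpoints and compare deep features instead of
    pixel-wise distortion."""
    loss = 0.0
    for v in views:
        img_p = render_fn(shape_pred, v)   # differentiable renderings
        img_g = render_fn(shape_gt, v)
        loss = loss + F.mse_loss(feat_net(img_p), feat_net(img_g))
    return loss / len(views)
```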
We introduce the first neural optimization framework to solve a classical instance of the tiling problem. Namely, we seek a non-periodic tiling of an arbitrary 2D shape using one or more types of tiles: the tiles maximally fill the shape's interior without overlaps or holes. To start, we reformulate tiling as a graph problem by modeling candidate tile locations in the target shape as graph nodes and connectivity between tile locations as edges. We build a graph convolutional neural network, coined TilinGNN, to progressively propagate and aggregate features over graph edges and predict tile placements. Our network is self-supervised and trained by maximizing the tiling coverage on target shapes, while avoiding overlaps and holes between the tiles. After training, TilinGNN has a running time that is roughly linear to the number of candidate tile locations, significantly outperforming traditional combinatorial search.
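The self-supervised objective can be sketched directly from per-placement probabilities: reward expected coverage and penalize expected overlaps between colliding candidates. The exact terms (and any explicit hole penalty) in the paper may differ; this is a minimal stand-in.

```python
def tiling_loss(probs, tile_area, overlap_pairs, target_area):
    """Self-supervised tiling objective sketch. probs: per-candidate placement
    probabilities from the network; tile_area: per-candidate tile areas;
    overlap_pairs: (K, 2) long tensor of candidate pairs whose tiles collide
    (precomputed from geometry); target_area: area of the shape to fill."""
    coverage = (probs * tile_area).sum() / target_area   # maximize
    i, j = overlap_pairs[:, 0], overlap_pairs[:, 1]
    overlap = (probs[i] * probs[j]).sum()                # minimize
    return -coverage + overlap
```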
We introduce a learning framework for automated floorplan generation which combines generative modeling using deep neural networks and user-in-the-loop designs to enable human users to provide sparse design constraints. Such constraints are represented by a layout graph. The core component of our learning framework is a deep neural network, Graph2Plan, which is trained on RPLAN, a large-scale dataset consisting of 80K annotated, human-designed floorplans. The network converts a layout graph, along with a building boundary, into a floorplan that fulfills both the layout and boundary constraints.
Polygonal meshes are ubiquitous in the digital 3D domain. Leading methods for learning generative models of shapes rely on implicit functions, and generate meshes only after expensive iso-surfacing routines. To overcome these challenges, we are inspired by a classical spatial data structure from computer graphics, Binary Space Partitioning (BSP), to facilitate 3D learning. The core ingredient of BSP is an operation for recursive subdivision of space to obtain convex sets. By exploiting this property, we devise BSP-Net, a network that learns to represent a 3D shape via convex decomposition. Importantly, BSP-Net is unsupervised since no convex shape decompositions are needed for training. The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built on a set of planes. The convexes inferred by BSP-Net can be easily extracted to form a polygon mesh, without any need for iso-surfacing. The generated meshes are compact (i.e., low-poly) and well suited to represent sharp geometry; they are guaranteed to be watertight and can be easily parameterized.
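At inference time, the plane-to-convex-to-shape arithmetic reduces to a few matrix operations; the sketch below uses hard binary plane-to-convex memberships, whereas training relies on continuous relaxations of these tests.

```python
import numpy as np

def bsp_occupancy(points, planes, memberships):
    """BSP-style convex decomposition sketch: a point is inside a convex iff it
    is on the non-positive side of every plane assigned to that convex, and
    inside the shape iff it is inside at least one convex.
    points: (N, 3); planes: (P, 4) rows (a, b, c, d) of ax+by+cz+d;
    memberships: (C, P) binary plane-to-convex assignment."""
    homo = np.hstack([points, np.ones((len(points), 1))])   # (N, 4)
    dists = np.maximum(homo @ planes.T, 0.0)                # keep violations only
    per_convex = dists @ memberships.T                      # (N, C) summed violation
    inside_convex = per_convex == 0.0    # zero violation -> inside that convex
    return inside_convex.any(axis=1)     # union of convexes
```

Because each convex is an explicit intersection of half-spaces, the polygon mesh can be read off from the plane equations directly, which is why no iso-surfacing is needed.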
We introduce AdaSeg, a deep neural network architecture for adaptive co-segmentation of a set of 3D shapes represented as point clouds. Differently from the familiar single-instance segmentation problem, co-segmentation is intrinsically contextual: how a shape is segmented can vary depending on the set it is in. Hence, our network features an adaptive learning module to produce a consistent shape segmentation which adapts to a set.
We introduce PQ-NET, a deep neural network which represents and generates 3D shapes via sequential part assembly. The input to our network is a 3D shape segmented into parts, where each part is first encoded into a feature representation using a part autoencoder. The core component of PQ-NET is a sequence-to-sequence or Seq2Seq autoencoder which encodes a sequence of part features into a latent vector of fixed size, and the decoder reconstructs the 3D shape, one part at a time, resulting in a sequential assembly. The latent space formed by the Seq2Seq encoder encodes both part structure and fine part geometry. The decoder can be adapted to perform several generative tasks including shape autoencoding, interpolation, novel shape generation, and single-view 3D reconstruction, where the generated shapes are all composed of meaningful parts.
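A bare-bones sketch of the Seq2Seq core, with GRUs standing in for the recurrent modules; the part autoencoder, stop prediction, and teacher forcing are omitted, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class Seq2SeqParts(nn.Module):
    """Sketch of a sequential part assembly autoencoder: encode a sequence of
    per-part feature vectors into one fixed-size latent, then decode the parts
    back one at a time."""
    def __init__(self, part_dim=128, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(part_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(part_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, part_dim)

    def forward(self, part_seq):             # (B, T, part_dim)
        _, h = self.encoder(part_seq)        # h: fixed-size shape latent
        dec_in = torch.zeros_like(part_seq)  # start tokens; shift inputs in practice
        dec_out, _ = self.decoder(dec_in, h)
        return self.out(dec_out)             # reconstructed part features, in order
```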
3D models of objects and scenes are critical to many academic disciplines and industrial applications. Of particular interest is the emerging opportunity for 3D graphics to serve artificial intelligence: computer vision systems can benefit from synthetically-generated training data rendered from virtual 3D scenes, and robots can be trained to navigate in and interact with real-world environments by first acquiring skills in simulated ones. One of the most promising ways to achieve this is by learning and applying generative models of 3D content: computer programs that can synthesize new 3D shapes and scenes. To allow users to edit and manipulate the synthesized 3D content to achieve their goals, the generative model should also be structure-aware: it should express 3D shapes and scenes using abstractions that allow manipulation of their high-level structure. This state-of-the-art report surveys historical work and recent progress on learning structure-aware generative models of 3D shapes and scenes.
We introduce LOGAN, a deep neural network aimed at learning general-purpose shape transforms from unpaired domains. The network is trained on two sets of shapes, e.g., tables and chairs, while there is neither a pairing between shapes from the domains as supervision nor any point-wise correspondence between any shapes. Once trained, LOGAN takes a shape from one domain and transforms it into the other. Our network consists of an autoencoder to encode shapes from the two input domains into a common latent space, where the latent codes concatenate multi-scale shape features, resulting in an overcomplete representation. The translator is based on a latent generative adversarial network (GAN), where an adversarial loss enforces cross-domain translation while a feature preservation loss ensures that the right shape features are preserved for a natural shape transform.
We introduce SDM-NET, a deep generative neural network which produces structured deformable meshes. Specifically, the network is trained to generate a spatial arrangement of closed, deformable mesh parts, which respect the global part structure of a shape collection, e.g., chairs, airplanes, etc. Our key observation is that while the overall structure of a 3D shape can be complex, the shape can usually be decomposed into a set of parts, each homeomorphic to a box, and the finer-scale geometry of the part can be recovered by deforming the box. The architecture of SDM-NET is that of a two-level variational autoencoder (VAE). At the part level, a PartVAE learns a deformable model of part geometries. At the structural level, we train a Structured Parts VAE (SP-VAE), which jointly learns the part structure of a shape collection and the part geometries, ensuring a coherence between global shape structure and surface details.
We introduce a method to automatically compute LEGO Technic models from user input sketches, optionally with motion annotations. The generated models resemble the input sketches with coherently-connected bricks and simple layouts, while respecting the intended symmetry and mechanical properties expressed in the inputs. This complex computational assembly problem involves an immense search space, and a much richer brick set and connection mechanisms than regular LEGO. To address it, we first comprehensively model the brick properties and connection mechanisms, then formulate the construction requirements into an objective function, accounting for faithfulness to input sketch, model simplicity, and structural integrity. Next, we model the problem as a sketch cover, where we iteratively refine a random initial layout to cover the input sketch, while guided by the objective. At last, we provide a working system to analyze the balance, stress, and assemblability of the generated model.
We introduce RPM-Net, a deep learning-based approach which simultaneously infers movable parts and hallucinates their motions from a single, un-segmented, and possibly partial, 3D point cloud shape. RPM-Net is a novel Recurrent Neural Network (RNN), composed of an encoder-decoder pair with interleaved Long Short-Term Memory (LSTM) components, which together predict a temporal sequence of point-wise displacements for the input shape. At the same time, the displacements allow the network to learn movable parts, resulting in a motion-based shape segmentation. Recursive applications of RPM-Net on the obtained parts can predict finer-level part motions, resulting in a hierarchical object segmentation. Furthermore, we develop a separate network to estimate part mobilities, e.g., per-part motion parameters, from the segmented motion sequence.
We treat shape co-segmentation as a representation learning problem and introduce BAE-NET, a branched autoencoder network, for the task. The unsupervised BAE-NET is trained with all shapes in an input collection using a shape reconstruction loss, without ground-truth segmentations. Specifically, the network takes an input shape and encodes it using a convolutional neural network, whereas the decoder concatenates the resulting feature code with a point coordinate and outputs a value indicating whether the point is inside/outside the shape. Importantly, the decoder is branched: each branch learns a compact representation for one commonly recurring part of the shape collection, e.g., airplane wings. By complementing the shape reconstruction loss with a label loss, BAE-NET is easily tuned for one-shot learning.
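The branched decoder is simple to sketch: every branch emits an implicit value per query point, the max over branches reconstructs the shape, and the argmax branch index yields the emergent, unsupervised part label. Layer widths and branch count below are assumptions.

```python
import torch
import torch.nn as nn

class BranchedDecoder(nn.Module):
    """Branched implicit decoder sketch: each branch scores one recurring part;
    shape occupancy is the max over branches, and the winning branch index
    gives a co-segmentation label without any segmentation supervision."""
    def __init__(self, code_dim=128, branches=8, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim + 3, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, branches), nn.Sigmoid())

    def forward(self, code, points):          # code: (B, D), points: (B, N, 3)
        x = torch.cat([code[:, None].expand(-1, points.shape[1], -1), points], -1)
        per_branch = self.net(x)               # (B, N, branches)
        occupancy, part = per_branch.max(dim=-1)
        return occupancy, part                 # reconstruction + part labels
```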
Data-driven generative modeling has made remarkable progress by leveraging the power of deep neural networks. A recurring challenge is how to sample a rich variety of data from the entire target distribution, rather than only from the distribution of the training data. In other words, we would like the generative model to go beyond the observed training samples and learn to also generate "unseen" data. In our work, we present a generative neural network for shapes that is based on a part-based prior, where the key idea is for the network to synthesize shapes by varying both the shape parts and their compositions.
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not ...
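In code, such a decoder is just an MLP classifier over (shape code, point) pairs, with the mesh recovered by iso-surfacing a sampled field. This minimal sketch assumes a trained code and decoder, so that the field actually crosses the 0.5 level; layer widths are assumptions.

```python
import torch
import torch.nn as nn
from skimage.measure import marching_cubes

class ImplicitDecoder(nn.Module):
    """Minimal implicit field decoder in the spirit described above: a binary
    classifier mapping (shape code, 3D point) to an inside probability."""
    def __init__(self, code_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim + 3, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, code, points):          # code: (B, D), points: (B, N, 3)
        x = torch.cat([code[:, None].expand(-1, points.shape[1], -1), points], -1)
        return self.net(x).squeeze(-1)        # per-point inside probability

def extract_mesh(decoder, code, res=64):
    """Sample the field on a grid and extract the 0.5 iso-surface."""
    grid = torch.stack(torch.meshgrid(*[torch.linspace(-1, 1, res)] * 3,
                                      indexing='ij'), -1).reshape(1, -1, 3)
    field = decoder(code, grid).reshape(res, res, res).detach().numpy()
    return marching_cubes(field, level=0.5)   # verts, faces, normals, values
```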
We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently. Our key observation is that indoor scene structures are inherently hierarchical. Hence, our network is not convolutional; it is a recursive neural network or RvNN. Using a dataset of annotated scene hierarchies, we train a variational recursive autoencoder, or RvNN-VAE, which performs scene object grouping during its encoding phase and scene generation during decoding.
We introduce the use of qualitative analysis and active learning to photo album construction. Given a heterogeneous collection of photos, we organize them into a hierarchical categorization tree (C-tree) based on qualitative analysis using quartets instead of relying on conventional, quantitative image similarity metrics. The main motivation is that in a heterogeneous collection, quantitative distances may become unreliable between dissimilar data, and there is unlikely to be a single metric that applies well to all data.
We present a novel method to produce discernible image mosaics, with relatively large image tiles replaced by images drawn from a database, to resemble a target image. Since visual edges strongly support content perception, we compose our mosaic via edge-aware photo retrieval to best preserve visual edges in the target image. Moreover, unlike most previous works which apply a pre-determined partition to an input image, our image mosaics are composed by adaptive tiles, whose sizes are determined based on the available images and an objective of maximizing resemblance to the target.
We introduce a data-driven method to generate a large number of plausible, closely interacting 3D human pose-pairs, for a given motion category, e.g., wrestling or salsa dance. Since close interactions are difficult to acquire using 3D sensors, our approach utilizes abundant existing video data which cover many human activities. Instead of treating the data generation problem as one of reconstruction, we present a solution based on Markov Chain Monte Carlo (MCMC) sampling. Given a motion category and a set of video frames depicting the motion with the 2D pose-pair in each frame annotated, we start the sampling with one or a few seed 3D pose-pairs which are manually created based on the target motion category. The initial set is then augmented by MCMC sampling around the seeds, via the Metropolis-Hastings algorithm and guided by a probability density function (PDF) that is defined by two terms to bias the sampling towards 3D pose-pairs that are physically valid and plausible for the motion category.
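As a rough illustration of the sampling step, here is a generic Metropolis-Hastings loop; `log_pdf` (standing in for the two-term validity-and-plausibility density) and the `perturb` proposal over pose-pairs are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def metropolis_hastings(seed, log_pdf, perturb, n_steps=10000, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    x, logp = seed, log_pdf(seed)
    samples = []
    for _ in range(n_steps):
        x_new = perturb(x, rng)                      # symmetric proposal assumed
        logp_new = log_pdf(x_new)
        if np.log(rng.random()) < logp_new - logp:   # accept w.p. min(1, p'/p)
            x, logp = x_new, logp_new
        samples.append(x)
    return samples
```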
We introduce SCORES, a recursive neural network for shape composition. Our network takes as input sets of parts from two or more source 3D shapes and a rough initial placement of the parts. It outputs an optimized part structure for the composed shape, leading to high-quality geometry construction. A unique feature of our composition network is that it is not merely learning how to connect parts. Our goal is to produce a coherent and plausible 3D shape, despite large incompatibilities among the input parts. The network may significantly alter the geometry and structure of the input parts and synthesize a novel shape structure based on the inputs, while adding or removing parts to minimize a structure plausibility loss.
We present a fully automatic method that finds a small number of machine fabricable wires with minimal overlap to reproduce a wire sculpture design as a 3D shape abstraction. Importantly, we consider non-planar wires, which can be fabricated by a wire bending machine, to enable efficient construction of complex 3D sculptures that cannot be achieved by previous works. We call our wires Eulerian wires, since they are as Eulerian as possible with small overlap to form the target design together.
We study a new and elegant instance of geometric dissection of 2D shapes: reversible hinged dissection, which corresponds to a dual transform between two shapes where one of them can be dissected in its interior and then inverted inside-out, with hinges on the shape boundary, to reproduce the other shape, and vice versa. We call such a transform reversible inside-out transform or RIOT. Since it is rare for two shapes to possess even a rough RIOT, let alone an exact one, we develop both a RIOT construction algorithm and a quick filtering mechanism to pick, from a shape collection, potential shape pairs that are likely to possess the transform. Our construction algorithm is fully automatic. It computes an approximate RIOT between two given input 2D shapes, whose boundaries can undergo slight deformations, while the filtering scheme picks good inputs for the construction.
We introduce a novel framework for using natural language to generate and edit 3D indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned from large annotated 3D scene databases. The advantage of natural language editing interfaces is strongest when performing semantic operations at the sub-scene level, acting on groups of objects. We learn how to manipulate these sub-scenes by analyzing existing 3D scenes. We perform edits by first parsing a natural language command from the user and transforming it into a semantic scene graph that is used to retrieve corresponding sub-scenes from the databases that match the command. We then augment this retrieved sub-scene by incorporating other objects that may be implied by the scene context. Finally, a new 3D scene is synthesized by aligning the augmented sub-scene with the user’s current scene, where new objects are spliced into the environment, possibly triggering appropriate adjustments to the existing scene arrangement.
We introduce a computational solution for cost-efficient 3D fabrication using universal building blocks. Our key idea is to employ a set of universal blocks, which can be massively prefabricated at a low cost, to quickly assemble and constitute a significant internal core of the target object, so that only the residual volume needs to be 3D printed online. We further improve the fabrication efficiency by decomposing the residual volume into a small number of printing-friendly pyramidal pieces.
We contribute the first large-scale dataset of scene sketches, SketchyScene, with the goal of advancing research on sketch understanding at both the object and scene level. The dataset is created through a novel and carefully designed crowdsourcing pipeline, enabling users to efficiently generate large quantities of realistic and diverse scene sketches. SketchyScene contains more than 29,000 scene-level sketches, 7,000+ pairs of scene templates and photos, and 11,000+ object sketches. All objects in the scene sketches have ground-truth semantic and instance masks. The dataset is also highly scalable and extensible, easily allowing augmentation and/or changes to scene composition. We demonstrate the potential impact of SketchyScene by training new computational models for semantic segmentation of scene sketches and showing how the new dataset enables several applications, including image retrieval, sketch colorization, editing, and captioning. We will release the complete crowdsourced dataset to the community.
We introduce P2P-NET, a general-purpose deep neural network which learns geometric transformations between point-based shape representations from two domains, e.g., meso-skeletons and surfaces, partial and complete scans, etc. The architecture of the P2P-NET is that of a bi-directional point displacement network, which transforms a source point set to a prediction of the target point set with the same cardinality, and vice versa, by applying point-wise displacement vectors learned from data. P2P-NET is trained on paired shapes from the source and target domains, but without relying on point-to-point correspondences between the source and target point sets. The training loss combines two uni-directional geometric losses, each enforcing a shape-wise similarity between the predicted and the target point sets, and a cross-regularization term to encourage consistency between displacement vectors going in opposite directions.
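The loss structure lends itself to a short sketch. Below, `f_fwd` and `f_bwd` are hypothetical displacement predictors for the two directions; the Chamfer distance supplies the correspondence-free shape-wise similarity, and the round-trip term is one simple instantiation of the cross-regularization idea, not necessarily the paper's exact formula.

```python
import numpy as np

def chamfer(P, Q):
    # Symmetric Chamfer distance between point sets P (N, 3) and Q (M, 3).
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def p2p_loss(src, tgt, f_fwd, f_bwd, lam=0.1):
    pred_tgt = src + f_fwd(src)          # source -> target prediction
    pred_src = tgt + f_bwd(tgt)          # target -> source prediction
    loss = chamfer(pred_tgt, tgt) + chamfer(pred_src, src)
    # One simple cross term: displacing forward then backward should
    # approximately return each source point to where it started.
    round_trip = pred_tgt + f_bwd(pred_tgt)
    return loss + lam * np.mean(np.sum((round_trip - src) ** 2, axis=-1))
```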
Humans can predict the functionality of an object even without any surroundings, since their knowledge and experience would allow them to "hallucinate" the interaction or usage scenarios involving the object. We develop predictive and generative deep convolutional neural networks to replicate this feat. Our networks are trained on a database of scene contexts, called interaction contexts, each consisting of a central object and one or more surrounding objects, that represent object functionalities. Given a 3D object in isolation, our functional similarity network (fSIM-NET), a variation of the triplet network, is trained to predict the functionality of the object by inferring functionality-revealing interaction contexts involving the object. fSIM-NET is complemented by a generative network (iGEN-NET) and a segmentation network (iSEG-NET). iGEN-NET takes a single voxelized 3D object and synthesizes a voxelized surround, i.e., the interaction context which visually demonstrates the object's functionalities. iSEG-NET separates the interacting objects into different groups according to their interaction types.
We present an automatic algorithm for subtractive manufacturing of freeform 3D objects
using high-speed CNC machining. Our method decomposes the input object's surface into a small number
of patches each of which is fully accessible and machinable by the CNC machine, in continuous fashion,
under a fixed drill-object setup configuration. This is achieved by covering the input surface using a
minimum number of accessible regions and then extracting a set of machinable patches from each accessible
region. For each patch obtained, we compute a continuous, space-filling, and iso-scallop tool path,
in the form of connected Fermat spirals, which conforms to the patch boundary. Furthermore, we develop a
novel method to control the spacing of Fermat spirals based on directional surface curvature and adapt the
heat method to obtain iso-scallop carving.
We present a semi-supervised co-analysis method for learning 3D shape styles from projected feature lines, achieving style patch localization with only weak supervision. Given a collection of 3D shapes spanning multiple object categories and styles, we perform style co-analysis over projected feature lines of each 3D shape and then backproject the learned style features onto the 3D shapes.
Shape dissimilarity is a fundamental problem with many applications such as shape exploration, retrieval, and classification. Given a collection of shapes, all existing methods develop a consistent global metric to compare and organize shapes. The global nature of the involved shape descriptors implies that overall shape appearance is compared. These methods work well to distinguish shapes from different categories, but often fail for fine-grained classes within the same category. In this paper, we develop a dissimilarity metric for fine-grained classes by fusing together multiple distinctive metrics for different classes. The fused metric measures the dissimilarities among inter-class shapes by observing their unique traits.
We introduce a method for learning a model for the mobility of parts in 3D objects. Our method allows us not only to understand the dynamic functionalities of one or more parts in a 3D object, but also to apply the mobility functions to static 3D models. Specifically, the learned part mobility model can predict mobilities for parts of a 3D object given in the form of a single static snapshot reflecting the spatial configuration of the object parts in 3D space, and transfer the mobility from relevant units in the training data ...
We introduce a deep learning approach for grouping discrete patterns common in graphical designs. Our approach is based on a convolutional neural network architecture that learns a grouping measure defined over a pair of pattern elements. Motivated by perceptual grouping principles, the key feature of our network is the encoding of element shape, context, symmetries, and structural arrangements. These element properties are all jointly considered and appropriately weighted in our grouping measure ...
Conditional Generative Adversarial Networks (GANs) for cross-domain image-to-image translation have made much progress recently. Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN. However, human labeling is expensive, even impractical, and large quantities of data may not always be available. Inspired by dual learning from natural language translation, we develop a novel dual-GAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In our architecture, the primal GAN learns to translate images from domain U to those in domain V, while the dual GAN learns to invert the task. The closed loop made by the primal and dual tasks allows images from either domain to be translated and then reconstructed. Hence a loss function that accounts for the reconstruction error of images can be used to train the translators.
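The closed loop can be sketched in a few lines; `G_uv` and `G_vu` are the two translators (placeholders here, assuming PyTorch modules), and the adversarial terms are omitted to highlight the reconstruction loss alone.

```python
import torch

def reconstruction_loss(G_uv, G_vu, u, v):
    # Translate each image to the other domain and back again.
    u_rec = G_vu(G_uv(u))
    v_rec = G_uv(G_vu(v))
    # The loop-closure error trains both translators without paired labels.
    return (u_rec - u).abs().mean() + (v_rec - v).abs().mean()
```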
We introduce a shape modeling tool, ExquiMo, which is guided by the idea of improving the creativity of 3D shape designs through collaboration. Inspired by the game of Exquisite Corpse, our tool allocates distinct parts of a shape to multiple players who model the assigned parts in a sequence. Our approach is motivated by the understanding that effective surprise leads to creative outcomes. Hence, to maintain the surprise factor of the output, we conceal the previously modeled parts from the most recent player. Part designs from individual players are fused together to produce an often unexpected, hence creative, end result ...
Many approaches to shape comparison and recognition start by establishing a shape correspondence.
We "turn the table" and show that quality shape correspondences can be obtained by performing
many shape recognition tasks. What is more, the method we develop computes a
fine-grained, topology-varying part correspondence between two 3D shapes
where the core evaluation mechanism only recognizes shapes globally. This is made
possible by casting the part correspondence problem in a deformation-driven framework
and relying on a data-driven "deformation energy" which rates visual similarity
between deformed shapes and models from a shape repository. Our basic premise is that
if a correspondence between two chairs (or airplanes, bicycles, etc.) is correct, then a
reasonable deformation between the two chairs anchored on the correspondence ought
to produce plausible, "chair-like" in-between shapes.
We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures. Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures the hierarchical structures of varying complexity despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. Finally, our structure synthesis framework is augmented by a second trained module that produces fine-grained part geometry, conditioned on global and local structural context, leading to a full generative pipeline for 3D shapes.
We introduce a method for co-locating style-defining elements over a set of 3D shapes. Our goal is to translate high-level style descriptions, such as "Ming" or "European" for furniture models, into explicit and localized regions over the geometric models that characterize each style. For each style, the set of style-defining elements is defined as the union of all the elements that are able to discriminate the style. Another property of the style-defining elements is that they are frequently-occurring, reflecting shape characteristics that appear across multiple shapes of the same style ...
We introduce a framework for action-driven evolution of 3D indoor scenes, where the goal is to simulate how scenes are altered by human actions, and specifically, by object placements necessitated by the actions. To this end, we develop an action model with each type of action combining information about one or more human poses, one or more object categories, and spatial configurations of object-object and object-human relations for the action. Importantly, all these pieces of information are learned from annotated photos.
We propose an interactive system that aims at lifting a 2D sketch
into a 3D sketch with the help of existing models in shape collections.
The key idea is to exploit part structure for shape retrieval and
sketch reconstruction. We adopt sketch-based shape retrieval and
develop a novel matching algorithm which considers structure in
addition to traditional shape features.
We present a data-driven method for synthesizing 3D indoor scenes by inserting objects
progressively into an initial, possibly empty, scene. Instead of relying on a few hundred
hand-crafted 3D scenes, we take advantage of existing large-scale annotated RGB-D datasets,
in particular, the SUN RGB-D database consisting of 10,000+ depth images of real scenes, to
form the prior knowledge for our synthesis task. Our object insertion scheme follows a
co-occurrence model and an arrangement model, both learned from the SUN dataset.
We introduce a co-analysis method which learns a functionality model for an object category, e.g.,
strollers or backpacks. Like previous works on functionality, we analyze object-to-object interactions
and intra-object properties and relations. Differently from previous works, our model goes beyond
providing a functionality-oriented descriptor for a single object; it prototypes the functionality of
a category of 3D objects by co-analyzing typical interactions involving objects from the category.
A calligram is an arrangement of words or letters that creates a visual image, and a
compact calligram fits one word into a 2D shape. We introduce a fully automatic method
for the generation of legible compact calligrams which provides a balance between
conveying the input shape, legibility, and aesthetics.
We develop a new kind of "space-filling" curves, connected Fermat spirals, and show their
compelling properties as a tool path fill pattern for layered fabrication. Unlike classical
space-filling curves such as the Peano or Hilbert curves, which constantly wind and bind to
preserve locality, connected Fermat spirals are formed mostly by long, low-curvature paths.
This geometric property, along with continuity, influences the quality and efficiency of
layered fabrication.
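For intuition on the geometry, a plain (unconnected) Fermat spiral filling a disk can be generated as below; its two arms, r = a*sqrt(theta), interleave into one long, low-curvature path whose endpoints both lie near the boundary. Handling arbitrary regions and connecting spirals across them, as the paper does, is substantially more involved.

```python
import numpy as np

def fermat_spiral(a=0.05, turns=10, n=2000):
    theta = np.linspace(0.0, 2.0 * np.pi * turns, n)
    r = a * np.sqrt(theta)
    arm1 = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
    # Second arm: same spiral rotated by pi; reversing it makes the path run
    # boundary -> center along arm 2, then center -> boundary along arm 1.
    arm2 = np.stack([r * np.cos(theta + np.pi), r * np.sin(theta + np.pi)], axis=1)
    return np.concatenate([arm2[::-1], arm1], axis=0)
```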
We introduce a novel approach to measure similarity between two 3D shapes based on sparse
reconstruction of shape descriptors. The main feature of our approach is its applicability
to handle incomplete shapes. We characterize the shapes by learning a sparse dictionary
from their local descriptors. The similarity between two shapes A and B is
defined by the error incurred when reconstructing B's descriptor set using the
basis signals from A's dictionary.
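A hedged sketch of this computation using scikit-learn: learn a sparse dictionary on shape A's local descriptors, then measure how poorly B's descriptors are reconstructed from it. The descriptor choice, dictionary size, and sparsity weight are placeholders.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def dissimilarity(desc_A, desc_B, n_atoms=64, alpha=1.0):
    # Learn A's dictionary, then sparsely code B's descriptors with it.
    dico = DictionaryLearning(n_components=n_atoms, alpha=alpha).fit(desc_A)
    codes = sparse_encode(desc_B, dico.components_, alpha=alpha)
    residual = desc_B - codes @ dico.components_
    # Mean reconstruction error of B's descriptors under A's dictionary.
    return np.mean(np.linalg.norm(residual, axis=1))
```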
An intriguing and recurring question in many branches of computer science is whether machines can be creative, like humans.
In this exploratory paper, we examine the problem from a computer graphics, and more specifically, geometric modeling,
perspective. We focus our discussions on the weaker but still intriguing question: "Can machines assist or inspire humans in
a creative endeavor for the generation of geometric forms?"
We present a deformation-driven approach to topology-varying 3D shape correspondence. In this paradigm,
the best correspondence between two shapes is the one that results in a minimal-energy, possibly
topology-varying, deformation that transforms one shape to conform to the other while respecting the
correspondence. Our deformation model allows both geometric and topological operations such as part split,
duplication, and merging ...
We pose the decompose-and-pack or DAP problem, which tightly combines shape decomposition and packing.
While in general, DAP seeks to decompose an input shape into a small number of parts which can be efficiently
packed, our focus is geared towards 3D printing. The goal is to optimally decompose-and-pack a 3D object
into a printing volume to minimize support material, build time, and assembly cost. We present Dapper,
a global optimization algorithm for the DAP problem which can be applied to both powder- and FDM-based 3D printing.
Decomposing a complex shape into geometrically simple primitives is a fundamental problem in geometry processing.
We are interested in a shape decomposition problem where the simple primitives sought are generalized cylinders.
We introduce a quantitative measure of cylindricity for a shape part and develop a cylindricity-driven optimization
algorithm, with a global objective function, for generalized cylinder decomposition.
We introduce a contextual descriptor which aims to provide a geometric description of the functionality of a 3D
object in the context of a given scene. Differently from previous works, we do not regard functionality as an
abstract label or represent it implicitly through an agent. Our descriptor, called interaction context or ICON
for short, explicitly represents the geometry of object-to-object interactions. Our approach to object functionality
analysis is based on the key premise that functionality should mainly be derived from interactions between objects and not objects in isolation.
We introduce the foldabilization problem for space-saving furniture design. Namely, given a 3D object representing
a piece of furniture, the goal is to apply a minimum amount of modification to the object so that it can be folded
to save space; the object is thus foldabilized. We focus on one instance of the problem where folding is with
respect to a prescribed folding direction and allowed object modifications include hinge insertion and part shrinking.
We develop an automatic algorithm for foldabilization by formulating and solving a nested optimization problem ...
In this paper, we are interested in the problem of 3D shape retrieval where the query
shape is incomplete with moderate to significant portions of the original shape missing. The key idea of our method
is to learn basis local descriptors for each shape in the database via sparse dictionary learning,
and to apply them in sparsely coding the local descriptors of an incomplete query ...
We cover techniques designed for compaction of
shape representations or shape configurations. The goal of compaction is to reduce
storage space, a fundamental problem in many application domains.
Compaction of shape representations focuses on reducing
the memory space allocated for storing the shape geometry data digitally, whilst shape compaction
techniques in the physical domain reduce the physical space occupied by
shape configurations ...
A Sampler of Useful Computational Tools for Applied Geometry, Computer Graphics, and Image Processing shows how to use a collection of mathematical techniques to solve important problems in applied mathematics and computer science areas. The book discusses fundamental tools in analytical geometry ...
Enhancing the self-symmetry of a shape is of fundamental aesthetic virtue. In this paper, we are interested in recovering the aesthetics of intrinsic reflection symmetries, where an asymmetric shape is symmetrized while keeping its general pose and perceived dynamics. The key challenge to intrinsic symmetrization is that the input shape has only approximate reflection symmetries, possibly far from perfect. The main premise of our work is that curve skeletons provide a concise and effective shape abstraction for analyzing approximate intrinsic symmetries as well as symmetrization. By measuring intrinsic distances over a curve skeleton for symmetry analysis, symmetrizing the skeleton, and then propagating the symmetrization from skeleton to shape, our approach to shape symmetrization is skeleton-intrinsic ...
We present a distillation algorithm which operates on a large, unstructured, and noisy collection of internet images
returned from an online object query. We introduce the notion of a distilled set, which is a clean, coherent, and
structured subset of inlier images. In addition, the object of interest is properly segmented out throughout the
distilled set. Our approach is unsupervised, built on a novel clustering scheme, and solves the distillation and
object segmentation problems simultaneously. In essence, instead of distilling the collection of images, we distill
a collection of loosely cutout foreground "shapes", which may or may not contain the queried object. Our key
observation, which motivated our clustering scheme, is that outlier shapes are expected to be random in nature,
whereas, inlier shapes, which do tightly enclose the object of interest, tend to be well supported by similar shapes
captured in similar views ...
We introduce indirect shape analysis, or ISA, where a given shape is analyzed not based on geometric or topological features
computed directly from the shape itself, but by studying how external agents interact with the shape. The potential benefits
of ISA are two-fold. First, agent-object interactions often reveal an object’s function, which plays a key role in shape
understanding. Second, compared to direct shape analysis, ISA, which utilizes pre-selected agents, is less affected by
imperfections of, or inconsistencies between, the geometry or topology of the analyzed shapes. We employ digital human models
as the external agents and develop a prototype ISA scheme for 3D shape classification and retrieval ...
A shape is pyramidal if it has a flat base with the remaining boundary forming a height function over
the base. Pyramidal shapes are optimal for molding, casting, and layered 3D printing. We introduce an
algorithm for approximate pyramidal shape decomposition. The general exact pyramidal decomposition
problem is NP-hard. We turn this problem into an NP-complete Exact Cover Problem which admits a practical solution
... Our solution is equally applicable to 2D or 3D shapes, to shapes with polygonal or smooth boundaries, with or without holes ...
We present an interactive technique for surface reconstruction from incomplete and sparse
scans of 3D objects possessing sharp features ...
We factor 3D editing by the user into two "orthogonal" interactions acting on skeletal and
profile curves of the underlying shape, controlling its topology and geometric features,
respectively. For surface completion, we introduce a novel skeleton-driven morph-to-fit,
or morfit, scheme which reconstructs the shape as an ensemble of generalized cylinders.
Morfit is a hybrid operator which optimally interpolates between adjacent curve profiles
(the "morph") and snaps the surface to input points (the "fit") ...
We introduce an algorithm for generating novel 3D models via topology-varying
shape blending. Given a source and a target shape, our method blends them topologically
and geometrically, producing continuous series of in-betweens as new shape creations. The blending
operations are defined on a spatio-structural graph composed of medial curves and sheets. Such a shape abstraction is
structure-oriented, part-aware, and facilitates topology manipulations. Fundamental topological operations
including split and merge are realized by allowing one-to-many correspondences between
the source and the target ...
We introduce focal points for characterizing, comparing, and organizing collections of complex and
heterogeneous data and apply the concepts and algorithms developed to collections of 3D indoor scenes.
We represent each scene by a graph of its constituent objects and define focal points as representative
substructures in a scene collection. To organize a heterogeneous scene collection, we cluster the scenes
based on a set of extracted focal points: scenes in a cluster are closely connected when viewed from the
perspective of the representative focal points of that cluster ... The problem of focal point extraction is
intermixed with the problem of clustering groups of scenes based on their representative focal points.
We present a co-analysis algorithm ...
We introduce the use of sparse representation for edit propagation of
high-resolution images or video. Previous approaches for edit propagation typically
employ a global optimization over the whole set of image pixels, incurring a
prohibitively high memory and time consumption for high-resolution
images. Rather than propagating an edit pixel by pixel, we follow the principle of
sparse representation to obtain a compact set of representative samples (or features)
and perform edit propagation on the samples instead ...
We introduce spectral Global Intrinsic Symmetry Invariant Functions (GISIFs), a class
of GISIFs obtained via eigendecomposition of the Laplace-Beltrami operator on compact
Riemannian manifolds. We discretize the spectral GISIFs for 2D manifolds approximated
either by triangle meshes or point clouds. In contrast to GISIFs obtained from
geodesic distances, our spectral GISIFs are robust to local topological changes.
Additionally, for symmetry analysis our spectral GISIFs can be viewed as generalizations
of the classical Heat Kernel Signatures (HKSs) and Wave Kernel Signatures (WKSs) and, as such, represent
a more expressive and versatile class of functions ...
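Our reading of the construction, sketched below under strong simplifications: eigenvalues of a discrete Laplacian are grouped by (near-)multiplicity, and the squared eigenfunctions within each group are summed, which cancels the sign and rotation ambiguity inside an eigenspace. A kNN-graph Laplacian here is a crude stand-in for a proper Laplace-Beltrami discretization.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian

def spectral_gisifs(adjacency, k=30, tol=1e-5):
    L = laplacian(adjacency, normed=False)
    L = L.toarray() if hasattr(L, "toarray") else L
    vals, vecs = eigh(L)                  # ascending eigenvalues
    vals, vecs = vals[:k], vecs[:, :k]
    gisifs = []
    i = 1                                 # skip the constant eigenfunction
    while i < k:
        j = i
        while j + 1 < k and abs(vals[j + 1] - vals[i]) < tol:
            j += 1                        # extend the group of repeated eigenvalues
        gisifs.append((vecs[:, i:j + 1] ** 2).sum(axis=1))
        i = j + 1
    return gisifs  # one symmetry-invariant function per eigenvalue group
```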
We introduce projective analysis for semantic segmentation and labeling of 3D shapes.
The analysis treats an input 3D shape as a collection of 2D projections, labels each
projection by transferring knowledge from existing labeled images, and back-projects
and fuses the labelings on the 3D shape ...
Projective analysis simplifies the processing task by working in a lower-dimensional
space, circumvents the requirement of having complete and well-modeled 3D shapes,
and addresses the data challenge for 3D shape analysis by leveraging the massive
image data.
The four metrics adopted by the well-known Princeton Segmentation
Benchmark have been extensively applied to evaluate mesh segmentation
algorithms. However, comparison to only a single ground-truth is problematic
since one object may have multiple semantic segmentations. We propose two novel
metrics to support comparison with multiple ground-truth mesh segmentations,
which we call Similarity Hamming Distance (SHD) and Adaptive Entropy Increment (AEI) ...
We present an algorithm for hierarchical and layered analysis of irregular facades,
seeking a high-level understanding of facade structures. By introducing layering into
the analysis, we no longer view a facade as a flat structure, but allow it to be
structurally separated into depth layers, enabling more compact and natural
interpretations of building facades. Computationally, we perform a symmetry-driven
search for an optimal hierarchical decomposition defined by split and layering
operations applied to an input facade. The objective is symmetry maximization ...
We introduce an unsupervised co-hierarchical analysis of a set of shapes,
aimed at discovering their hierarchical part structures and revealing relations
between geometrically dissimilar yet functionally equivalent shape parts across
the set. The core problem is that of representative co-selection. For each shape
in the set, one representative hierarchy (tree) is selected from among many possible
interpretations of the hierarchical structure of the shape. Collectively, the
selected tree representatives maximize the within-cluster structural similarity among them.
We present a method for organizing a heterogeneous collection of 3D shapes
for overview and exploration. Instead of relying on quantitative distances,
which may become unreliable between dissimilar shapes, we introduce a qualitative
analysis which utilizes multiple distance measures but only in cases where the
measures can be reliably compared. Our analysis is based on the notion of quartets,
each defined by two pairs of shapes, where the shapes in each pair are close to
each other, but far apart from the shapes in the other pair.
We introduce L1-medial skeleton as a curve skeleton representation for 3D point
cloud data. The L1-median is well-known as a robust global center of an arbitrary
set of points. We make the key observation that adapting L1-medians locally to
a point set representing a 3D shape gives rise to a one-dimensional structure, which can
be seen as a localized center of the shape ...
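The local building block, the L1-median of a neighborhood, can be computed with Weiszfeld iterations as sketched below; the full method additionally uses a repulsion term (omitted here) that organizes the local medians into a 1D branch structure.

```python
import numpy as np

def l1_median(points, n_iter=50, eps=1e-8):
    x = points.mean(axis=0)                       # initialize at the centroid
    for _ in range(n_iter):
        d = np.linalg.norm(points - x, axis=1)
        w = 1.0 / np.maximum(d, eps)              # inverse-distance weights
        x = (points * w[:, None]).sum(axis=0) / w.sum()
    return x
```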
In this survey paper, we organize, summarize, and present the key concepts and methodological approaches towards efficient structure-aware shape processing. We discuss common models of structure, their implementation in terms of mathematical formalism and algorithms, and explain the key principles in the context of a number of state-of-the-art approaches. Further, we attempt to list the key open problems and challenges, both at the technical and at the conceptual level, to make it easier for new researchers to better explore and contribute to this topic.
We present a skeleton-based algorithm for intrinsic symmetry detection
on imperfect 3D point cloud data. The data imperfections such as noise
and incompleteness make it difficult to reliably compute geodesic distances ...
We introduce bilateral map, a local shape descriptor whose
region of interest is defined by two feature points. Compared
to the classical descriptor definition using single points, the
bilateral approach exploits the use of a second point to place more
constraints on the selection of the spatial context for feature
analysis. This leads to a descriptor where the shape of the region of
interest is anisotropic and adapts to the context of the two points,
making it more refined for shape analysis, in particular, partial matching.
We pose the open question "how to extract
styles from geometric shapes?" and address one instance of the problem. Specifically, we present an unsupervised
algorithm for identifying curve styles in a set of shapes ...
We propose a resampling approach to
process a noisy and possibly outlier-ridden point set in an edge-aware manner.
Our key idea is to first resample away from the edges so that reliable normals can be
computed at the samples, and then based on reliable data, we
progressively resample the point set while approaching the edge singularities ...
We present an algorithm for multi-scale partial intrinsic symmetry detection over 2D
and 3D shapes, where the scale of a symmetric region is defined by
intrinsic distances between symmetric points over the region. To identify prominent
symmetric regions which overlap and vary in form and scale, we decouple scale extraction
and symmetry extraction by performing two levels of clustering. First, significant
symmetry scales are identified by clustering sample point pairs from an input shape ...
We introduce the geometric problem of stackabilization: how to geometrically modify a
3D object so that it is more amenable to stacking. Given a 3D object and a stacking direction,
we define a measure of stackability, which is derived from the gap between the lower
and upper envelopes of the object in a stacking configuration along the stacking
direction. The main challenge in stackabilization lies in the desire to modify the object's
geometry only subtly so that the intended functionality and aesthetic appearance of the
original object are not significantly affected ...
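One plausible reading of this measure for heightfield-like objects, under our own simplifying assumptions rather than the paper's general setup: with upper and lower envelopes u and l sampled on a grid, the minimal vertical spacing between stacked copies is max(u - l), to be compared against the naive bounding-box spacing.

```python
import numpy as np

def stackability(upper, lower):
    # Copy shifted up by d avoids interpenetration iff lower + d >= upper
    # everywhere, so the minimal spacing between copies is max(upper - lower).
    offset = np.max(upper - lower)
    height = np.max(upper) - np.min(lower)   # naive bounding spacing
    return height / offset                   # > 1 means stacking saves space
```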
We present an automatic shape composition method to fuse two shape parts which may
not overlap and possibly contain sharp features, a scenario often encountered when
modeling man-made objects. At the core of our method is a novel field-guided
approach to automatically align two input parts in a feature-conforming manner.
The key to our field-guided shape registration is a natural continuation of
one part into the ambient field as a means to introduce an overlap with
the distant part, which then allows a surface-to-field registration ...
We consider the use of a semi-supervised learning method where the user actively
assists in the co-analysis by iteratively providing input that progressively
constrains the system. We introduce a novel constrained clustering method based
on a spring system which embeds elements to better respect their inter-distances
in feature space together with the user given set of constraints. We also present an
active learning method that suggests to the user where his input is
likely to be the most effective in refining the results.
We introduce a new type of meshes called 5-6-7 meshes, analyze their properties, and
present a 5-6-7 remeshing algorithm. A 5-6-7 mesh is a closed triangle mesh where
each vertex has valence 5, 6, or 7. We prove that it is always possible to convert
an arbitrary mesh into a 5-6-7 mesh. We present a remeshing algorithm which converts
a closed triangle mesh with arbitrary genus into a 5-6-7 mesh which a) closely
approximates the original mesh geometrically, e.g., in terms of feature preservation,
and b) has a comparable vertex count as the original mesh.
We formulate the skeletonization problem via mean curvature flow (MCF). While the classical application
of MCF is surface fairing, we take advantage of its area-minimizing characteristic to drive the curvature
flow towards the extreme so as to collapse the input mesh geometry and obtain a skeletal structure. By
analyzing the differential characteristics of the flow, we reveal that MCF locally increases shape
anisotropy. This justifies the use of curvature motion for skeleton computation, and leads to the
generation of what we call "mean curvature skeletons" ...
We introduce set evolution as a means for creative 3D shape modeling, where an
initial population of 3D models is evolved to produce generations of novel shapes.
Part of the evolving set is presented to a user as a shape gallery to offer modeling suggestions.
User preferences define the fitness for the evolution so
that over time, the shape population will mainly consist of individuals with good
fitness. However, to inspire the user's creativity, we must also keep the evolving
set diverse. Hence the evolution is "fit and diverse" ...
A 5-6-7 mesh is a closed triangle mesh where each vertex has valence 5,
6, or 7. An intriguing question is whether it is always possible to convert an arbitrary mesh into a 5-6-7 mesh. In this
paper, we answer the question in the positive. We present a 5-6-7 remeshing algorithm which converts any
closed triangle mesh with arbitrary genus into a 5-6-7 mesh which a) closely approximates the original mesh
geometrically, e.g., in terms of feature preservation, and b) has a comparable vertex count as the original
mesh.
Empirical Mode Decomposition (EMD) is a powerful tool for the analysis of non-stationary and
nonlinear signals, and has drawn a great deal of attention in various areas. In this paper,
we generalize the classical EMD from Euclidean space to surfaces represented as triangular meshes.
Inspired by the EMD, we also make a first step in using the extremal envelope method for
feature-preserving smoothing.
We propose a simple and efficient method that helps create model
variations by applying non-uniform stretching on 3D models with
organic geometric details. The method replicates the geometric
details and synthesizes extensions by adopting texture synthesis
techniques on surface details.
We introduce an algorithm for unsupervised co-segmentation of a set of
shapes so as to reveal the semantic shape parts and establish
their correspondence across the set.
Our algorithm exploits a key enabling feature of the input set, namely,
dissimilar parts may be "linked" through third parties present in the
set ...
We present an algorithm for interactive structure-preserving retargeting
of irregular 3D architecture models, offering the modeler an easy-to-use
tool to quickly generate a variety of 3D models that resemble an input piece in its structural
style ...
Objects with many concavities are difficult to acquire using laser scanners.
The resulting point scan typically suffers from large amounts of missing data.
We introduce weak volumetric priors which assume that the
volume of a shape varies smoothly and that each point cloud sample is
visible from outside the shape. Specifically, the union of view-rays
given by the scanner implicitly carves the exterior volume, while
volumetric smoothness regularizes the internal volume.
We introduce an algorithm for 3D object modeling where the user draws creative inspiration from
an object captured in a single photograph. Our method leverages the rich source of photographs
for creative 3D modeling. However, with only a photo as a guide, creating a 3D model from
scratch is a daunting task. We support the modeling process by utilizing an available set of 3D
candidate models. Specifically, the user creates a digital 3D model as a geometric variation
from a 3D candidate.
We present an algorithm to compute the silhouette set of a point cloud. Previous
methods extract point set silhouettes by thresholding point normals, which can
lead to simultaneous over- and under-detection of silhouettes. We argue that
additional information such as surface curvature is necessary to resolve these
issues. To this end, we develop a local reconstruction scheme using Gabriel and
intrinsic Delaunay criteria and define point set silhouettes
based on the notion of a silhouette generating set ...
We introduce symmetry hierarchy of man-made objects, a high-level structural
representation of a 3D model providing a symmetry-induced, hierarchical
organization of the model's constituent parts. We show that symmetry hierarchy
naturally implies a hierarchical segmentation that is more meaningful than those
produced by local geometric considerations. We also develop an application of
symmetry hierarchies for structural shape editing.
We stipulate that under challenging scenarios, shape correspondence by
humans involves recognition of the shape parts where prior knowledge on the
parts would play a more dominant role than geometric similarity.
We introduce an approach to part correspondence which incorporates prior
knowledge and combines the knowledge with content-driven analysis based on
geometric similarity between the matched shapes ...
We review methods that are designed to compute correspondences between
geometric shapes represented by triangle meshes, contours, or point sets.
This survey is motivated in part by some recent developments in space-time
registration, where one seeks to correspond non-rigid and time-varying
surfaces, and semantic shape analysis, which underlines a recent trend to
incorporate shape understanding into the analysis pipeline ...
We present an algorithm for computing families
of geodesic curves over an open mesh patch to partition
the patch into strip-like segments. Specifically, the
segments can be well approximated using strips obtained
by trimming long, rectangular pieces of material possessing
a prescribed width. We call this width-bounded
geodesic strip tiling of a curved surface, a problem with
practical applications such as the surfacing of curved
roofs.
We perform co-analysis of a set of man-made 3D objects to allow the
creation of novel instances derived from the set. We analyze the objects
at the part level and treat the anisotropic part scales as a shape
style. The co-analysis then allows style transfer to synthesize new
objects. The key to co-analysis is part correspondence, where a major
challenge is the handling of large style variations and diverse geometric
content in the shape set. We propose style-content separation as
a means to address this challenge ...
We present cone carving, a novel space carving technique towards
topologically correct surface reconstruction from an incomplete scanned
point cloud. The technique utilizes the point samples not only for local
surface position estimation but also to obtain global visibility
information under the assumption that each acquired point is visible from
a point lying outside the shape. This enables associating each point
with a generalized cone, called the visibility cone, that carves a
portion of the outside ambient space of the shape from the inside out.
In this paper, we perform active laser scanning of real world vegetation
and present an automatic approach that robustly reconstructs skeletal
structures of trees, from which full geometry can be generated. The
core of our method is a series of global optimizations that
fit skeletal structures to the often sparse, incomplete,
and noisy point data. A significant benefit of our approach is its
ability to reconstruct multiple overlapping trees simultaneously
without segmentation.
We introduce an interactive tool which enables a user to quickly
assemble an architectural model directly over a 3D point cloud
acquired from large-scale scanning of an urban scene. The user loosely
defines and manipulates simple building blocks, which we call
SmartBoxes, over the point samples. These boxes quickly snap to
their proper locations to conform to common architectural
structures. The key idea is that the building blocks are smart ...
We address the problem of finding analogies between parts of 3D objects.
By partitioning an object into meaningful parts and finding analogous
parts in other objects, not necessarily of the same type, based on a
contextual signature, many analysis and modeling tasks could be enhanced
...
We present an algorithm for curve skeleton
extraction via Laplacian-based contraction. Our
algorithm can be applied to surfaces with boundaries,
polygon soups, and point clouds. We develop
a contraction operation that is designed to work
on generalized discrete geometry data, particularly
point clouds, via local Delaunay triangulation and
topological thinning ...
We provide the first comprehensive survey on spectral mesh processing.
Spectral methods for mesh processing and analysis rely on eigenvalues,
eigenvectors, or eigenspace projections derived from appropriately defined
mesh operators to carry out desired tasks ...
Supraspinatus muscle disorders are frequent and debilitating, resulting in
pain and a limited range of shoulder motion. The gold standard for
diagnosis involves an invasive surgical procedure ... we present a method
to classify 3D shapes of the muscle into the relevant pathology groups,
based on MRIs. The method learns the Fourier coefficients that best
distinguish the different classes ...
We present a review of the correspondence problem targeted towards the
computer graphics audience. This survey is motivated by recent developments
such as advances in the correspondence of non-rigid or isometric shapes and
methods that extract semantic information from the shapes ...
We introduce the notion of consensus skeletons for non-rigid space-time registration
of a deforming shape. Instead of basing the registration on point features, which are
local and sensitive to noise, we adopt the curve skeleton of the shape as a global and
descriptive feature for the task. Our method uses no template and only assumes
that the skeletal structure of the captured shape remains largely consistent over time
...
While many 3D objects around us exhibit various forms of global
symmetries, prominent intrinsic symmetries which exist only on parts of an
object are also well recognized ... In this paper, we introduce algorithms
to extract and utilize partial intrinsic reflectional symmetries (PIRS) of
a 3D shape ...
We consolidate an unorganized point cloud with noise, outliers,
non-uniformities, and interference between close-by surface sheets as a
preprocess to surface generation ... First, we present a weighted locally
optimal projection operator ... Next, we introduce an iterative framework
for robust normal estimation, ...
We explore the use of salient curves in synthesizing natural-looking,
shape-revealing textures on surfaces. Our synthesis is guided by two
principles: matching the direction of the texture patterns to those of the
salient curves, and aligning the prominent feature lines in the texture to
the salient curves exactly ...
We undertake a study of the local properties of 2-Gabriel meshes. We show
that, under mild constraints on the dihedral angles, such meshes are
Delaunay meshes. The analysis is done by means of the Delaunay edge
flipping algorithm and it reveals the details of the distinction between
these two mesh structures ...
We present an algorithm for curve skeleton extraction from imperfect point
clouds where large portions of the data may be missing. Our construction
is primarily based on a novel notion of generalized rotational symmetry
axis (ROSA) of a point set with normals, via a variational formulation ...
We propose a method for fast updating of harmonic fields defined on
polygonal meshes, enabling real-time insertion and deletion of
constraints. Our approach utilizes the penalty method to enforce
constraints in harmonic field computation. It maintains the symmetry
of the Laplacian system ...
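A hedged sketch of the penalty formulation on a graph Laplacian (scipy sparse assumed, as a stand-in for the paper's mesh Laplacian): each constraint contributes one large diagonal entry and one right-hand-side entry, so inserting or deleting a constraint changes the system by a single rank-1 update, which is what makes fast updating possible.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def harmonic_field(L, constraints, penalty=1e6):
    # L: sparse Laplacian (n, n); constraints: {vertex_index: prescribed_value}
    n = L.shape[0]
    idx = np.fromiter(constraints.keys(), dtype=int)
    vals = np.fromiter(constraints.values(), dtype=float)
    P = sp.coo_matrix((np.full(len(idx), penalty), (idx, idx)), shape=(n, n))
    b = np.zeros(n)
    b[idx] = penalty * vals
    # Solve (L + P) x = P c; large penalty approximately enforces x = c at idx.
    return spsolve((L + P).tocsc(), b)
```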
We present a face recognition method based on sparse representation for
recognizing 3D face meshes under expressions using low-level geometric
features ... To handle facial expressions, we design a feature pooling and
ranking scheme to collect various types of low-level geometric features
and rank them ...
We define quality differential coordinates (QDC) for per-vertex
encoding of the quality of a tetrahedral mesh. Our formulation allows the
incorporation of element quality metrics into QDC construction to penalize
badly shaped and inverted tetrahedra ...
The notion of parts in a shape plays an important role in many geometry
problems. At the same time, many such problems utilize a surface metric to
assist shape analysis and understanding. The main contribution of our work
is to bring together these two fundamental concepts ...
We introduce a novel class of distance fields for a given surface defined
by its tangent planes. At each point in space, we assign a scalar value
which is a weighted sum of distances to these tangent planes. We use four
applications to illustrate the benefit of using the resulting TDF scalar
field: view point selection, ...
We develop adaptive sampling criteria which guarantee a topologically
faithful mesh and demonstrate an improvement and simplification over
earlier results, albeit restricted to 2D surfaces. These sampling
criteria are based on the strong convexity radius and the injectivity
radius ...
We present an automatic feature correspondence algorithm capable of
handling large, non-rigid shape variations, as well as partial matching ...
The search is deformation-driven, prioritized by a self-distortion energy
measured on meshes deformed according to a given correspondence ...
We look at a particular instance of the convex decomposition problem
which arises from real-world game development. Given a collection of
polyhedral surfaces (possibly with boundaries, holes, and complex interior
structures) that model the scene geometry in a game environment, we wish
to find a small set of convex hulls ...
We formulate contour correspondence as a Quadratic Assignment Problem
(QAP), incorporating proximity information. By maintaining the
neighborhood relation between points this way, we show that better
matching results are obtained in practice. We propose the first Ant Colony
Optimization (ACO) algorithm ...
We present algorithms to produce Delaunay meshes from arbitrary triangle
meshes by edge flipping and geometry-preserving refinement and prove their
correctness. In particular we show that edge flipping serves to reduce
mesh surface area, and that a poorly sampled input mesh may yield
unflippable edges necessitating refinement ...
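The flip criterion itself is compact enough to state in code: an interior edge is locally Delaunay when the two angles opposite it sum to at most pi. Below is a sketch of the test only (the refinement logic and correctness proofs are the paper's contribution).

```python
import numpy as np

def angle_at(a, b, c):
    # Angle at vertex a of triangle (a, b, c).
    u, v = b - a, c - a
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def needs_flip(p, q, r, s):
    # Edge (p, q) shared by triangles (p, q, r) and (q, p, s); the edge
    # violates the local Delaunay condition iff the opposite angles exceed pi.
    return angle_at(r, p, q) + angle_at(s, p, q) > np.pi
```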
Spectral methods for mesh processing and analysis rely on the eigenvalues,
eigenvectors, or eigenspace projections derived from appropriately defined
mesh operators to carry out desired tasks. This state-of-the-art report
aims to provide a comprehensive survey on the spectral approach ...
We propose a mesh segmentation algorithm where at each step, a sub-mesh
embedded in 3D is first spectrally projected into the plane with a contour
extracted from the planar embedding. Transforming the shape analysis
problem to the 2D domain facilitates our segmentability analysis and
sampling tasks ...
We investigate the use of multiple intrinsic geometric attributes,
including angles, geodesic distances, and curvatures, for 3D face
recognition ... As invariance to facial expressions holds the key to
improving recognition performance, we propose to train for the
component-wise weights ...
We define a Delaunay mesh to be a manifold triangle mesh whose edges form
an intrinsic Delaunay triangulation or iDT of its vertices ... We show
that meshes constructed from a smooth surface by taking an iDT or a
restricted Delaunay triangulation, do not in general yield a Delaunay
mesh ...
We present an algorithm for finding a meaningful correspondence between
two triangle meshes, which is designed to handle general non-rigid
transformations. Our algorithm operates on embeddings of the two shapes in
the spectral domain so as to normalize them with respect to uniform
scaling and rigid-body transformation.
We present an approach for robust shape retrieval from databases
containing articulated 3D models. Each shape is represented by the
eigenvectors of an appropriately defined affinity matrix, forming a
spectral embedding which achieves normalization against rigid-body
transformations, shape articulation ...
We propose an algorithm for guaranteed nonobtuse remeshing and nonobtuse
mesh decimation. Our strategy for the remeshing problem is to first
convert an input mesh, using a modified Marching Cubes algorithm, into a
rough approximate mesh that is guaranteed to be nonobtuse. We then apply
iterative "deform-to-fit" ...
We present an efficient silhouette extractor for triangle meshes under
perspective projection in the Hough space. The more favorable point
distribution in Hough space allows us to obtain significant performance
gains over the traditional dual-space based techniques ...
We present a spectral approach for robust shape retrieval from databases
containing articulated 3D shapes. We show absolute
improvement in retrieval performance when conventional shape descriptors are
used in the spectral domain on the McGill database of articulated 3D
shapes. We also propose a simple eigenvalue-based descriptor ...
In this paper, we treat optimal mesh layout generation as a problem of
preserving graph distances and propose to use the subdominant eigenvector
of a kernel (affinity) matrix for sequencing ...
We apply the Nyström method, a sub-sampling and reconstruction technique, to
speed up spectral mesh processing. We first relate this method to Kernel
Principal Component Analysis (KPCA). This enables us to derive a novel
measure in the form of a matrix trace, based solely on sampled data, to
quantify the quality of Nystrom approximation ...
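A sketch of the standard Nyström extension underlying the speed-up, assuming `affinity_fn` computes affinities between two point lists (a placeholder for the mesh operator in question): eigendecompose only the sampled block and extrapolate the eigenvectors to all points through the cross-affinities.

```python
import numpy as np

def nystrom(affinity_fn, points, sample_idx):
    S = points[sample_idx]
    W = affinity_fn(S, S)                 # (m, m) sampled block
    C = affinity_fn(points, S)            # (n, m) cross affinities
    lam, U = np.linalg.eigh(W)
    keep = lam > 1e-10                    # guard against tiny eigenvalues
    # Approximate eigenvectors of the full (n, n) affinity matrix.
    return C @ U[:, keep] / lam[keep], lam[keep]
```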
We present an algorithm for finding a meaningful correspondence between
two 3D shapes given as triangle meshes. Our algorithm operates on
embeddings of the two shapes in the spectral domain so as to normalize
them with respect to uniform scaling, rigid-body transformation and shape
bending ...
We present a novel approach for discretely optimizing contours on the
surface of a triangle mesh. This is achieved through the use of a minimum
ratio cycle (MRC) algorithm, where we compute a contour having the minimal
ratio between a novel contour energy term and the length of the
contour ...
Facial expression, which changes face geometry, usually has an adverse
effect on the performance of a face recognition system. On the other
hand, face geometry is a useful cue for recognition. Taking these into
account, we utilize the idea of separating geometry and texture
information in a face image ...
SIGGRAPH/TOG: 58; ICCV/CVPR/ECCV/NeurIPS: 17; SGP: 7; Eurographics: 8; EGSTAR: 4; CGF: 19.