vasgaowei/BEV-Perception


Awesome Bird's Eye View Perception

This is a repository for Bird's Eye View (BEV) perception, covering 3D object detection, BEV segmentation, online mapping, and occupancy prediction.
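As background for the papers collected below, the operation shared by many camera-based BEV methods (popularized by Lift-Splat-Shoot) is lifting per-pixel image features into a ground-plane grid via a predicted depth distribution. The following NumPy sketch is purely illustrative: the function name, grid size, depth range, and toy intrinsics are assumptions for demonstration, not taken from any specific paper in this list.

```python
import numpy as np

def lift_splat(feats, depth_prob, K, bev_shape=(40, 40), cell=0.5):
    """Toy lift-splat: pool depth-weighted image features into a BEV grid.

    feats:      (H, W, C) per-pixel image features
    depth_prob: (H, W, D) softmax weights over D candidate depths per pixel
    K:          (3, 3) pinhole camera intrinsics
    """
    H, W, C = feats.shape
    D = depth_prob.shape[-1]
    depths = np.linspace(1.0, 20.0, D)                 # candidate depths in meters

    # Back-project every pixel at every candidate depth: x = d * K^-1 [u, v, 1]^T
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1)
    pts = rays[:, None, :] * depths[:, None]           # (3, D, H*W) 3D points

    # Weight each pixel's features by its depth distribution ("lift")
    w = depth_prob.reshape(-1, D).T[..., None]         # (D, H*W, 1)
    f = feats.reshape(-1, C)[None] * w                 # (D, H*W, C)

    # Sum-pool the weighted features into ground-plane cells ("splat")
    bev = np.zeros((*bev_shape, C))
    ix = (pts[0] / cell + bev_shape[0] // 2).astype(int).ravel()  # lateral index
    iz = (pts[2] / cell).astype(int).ravel()                       # forward index
    keep = (ix >= 0) & (ix < bev_shape[0]) & (iz >= 0) & (iz < bev_shape[1])
    np.add.at(bev, (ix[keep], iz[keep]), f.reshape(-1, C)[keep])
    return bev
```

Real detectors replace the toy splat with efficient voxel pooling on GPU and feed the resulting BEV tensor to a detection or segmentation head; the papers below differ mainly in how this view transformation and its depth estimate are learned.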

News

- 2023-05-09: Initial version with recent papers and projects.
- 2023-05-12: Added papers on 3D object detection.
- 2023-05-14: Added papers on BEV segmentation, HD-map construction, occupancy prediction, and motion planning.

Contents

Papers

Survey

  • Vision-Centric BEV Perception: A Survey (Arxiv 2022) [Paper] [Github]
  • Delving into the Devils of Bird’s-eye-view Perception: A Review, Evaluation and Recipe (Arxiv 2022) [Paper] [Github]

3D Object Detection

Radar Lidar

  • RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection System (Arxiv 2023) [Paper]
  • Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection (CVPR 2023) [Paper] [Github]
  • MaskBEV: Joint Object Detection and Footprint Completion for Bird’s-eye View 3D Point Clouds (IROS 2023) [Paper] [Github]
  • LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion (Arxiv 2023) [Paper]
  • HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection (Arxiv 2024) [Paper] [Github]

Radar Camera

  • CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer (Arxiv 2022) [Paper]
  • RadSegNet: A Reliable Approach to Radar Camera Fusion (Arxiv 2022) [paper]
  • Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection (IEEE TIV 2023) [Paper]
  • CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception (ICLRW 2023) [Paper]
  • RC-BEVFusion: A Plug-In Module for Radar-Camera Bird’s Eye View Feature Fusion (Arxiv 2023) [Paper]
  • RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection (CVPR 2024) [Paper] [Github]
  • UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection (Arxiv 2024) [paper]
  • SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection (Arxiv 2024) [Paper] [Github]
  • RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection (AAAI 2025) [Paper] [Github]

Lidar Camera

  • Semantic bevfusion: rethink lidar-camera fusion in unified bird’s-eye view representation for 3d object detection (Arxiv 2022) [Paper]
  • Sparse Dense Fusion for 3D Object Detection (Arxiv 2023) [Paper]
  • EA-BEV: Edge-aware Bird' s-Eye-View Projector for 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection (CVPR 2023) [paper] [Github]
  • FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration (Arxiv 2023) [Paper]
  • Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection (Arxiv 2023) [paper]
  • SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection (ICCV 2023) [Paper] [Github]
  • 3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion (Arxiv 2023) [Paper]
  • FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion (Arxiv 2023) [Paper]
  • Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers (Arxiv 2023) [Paper]
  • PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection (Arxiv 2024) [Paper]
  • Learned Multimodal Compression for Autonomous Driving (IEEE MMSP 2024) [Paper]
  • Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement (Arxiv 2024) [Paper]
  • SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection (Arxiv 2024) [Paper]
  • Timealign: A Multi-Modal Object Detection Method For Time Misalignment Fusing In Autonomous Driving (Arxiv 2024) [paper]
  • Semantic-Supervised Spatial-Temporal Fusion for LiDAR-based 3D Object Detection (ICRA 2025) [Paper]

Lidar

  • MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term Motion-Guided Temporal Attention for 3D Object Detection (AAAI 2023) [Paper] [Github]
  • PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection (Arxiv 2023) [Paper]
  • V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection (Arxiv 2023) [Paper]
  • SEED: A Simple and Effective 3D DETR in Point Clouds (ECCV 2024) [Paper] [Github]

Monocular

  • Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles (IROS 2019) [Paper] [Project Page]
  • Orthographic Feature Transform for Monocular 3D Object Detection (BMVC 2019) [Paper] [Github]
  • BEV-MODNet: Monocular Camera-based Bird's Eye View Moving Object Detection for Autonomous Driving (ITSC 2021) [Paper] [Project Page]
  • Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021) [Paper] [Github]
  • PersDet: Monocular 3D Detection in Perspective Bird’s-Eye-View (Arxiv 2022) [Paper]
  • Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving (CVPR 2022) [Paper]
  • Monocular 3D Object Detection with Depth from Motion (ECCV 2022) [paper] [Github]
  • MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection (ICCV 2023) [Paper] [Github]
  • S3-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer for Monocular 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings (Arxiv 2023) [Paper]
  • YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection (Arxiv 2023) [Paper]
  • UniMODE: Unified Monocular 3D Object Detection (CVPR 2024) [Paper]
  • Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method (Arxiv 2024) [Paper]
  • MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors (Arxiv 2024) [Paper]

Multiple Camera

  • Object DGCNN: 3D Object Detection using Dynamic Graphs (NIPS 2021) [Paper] [Github]
  • BEVDet: High-Performance Multi-Camera 3D Object Detection in Bird-Eye-View (Arxiv 2022) [Paper] [Github]
  • DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries (CORL 2021) [Paper] [Github]
  • BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework (NeurIPS 2022) [Paper] [Github]
  • Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022) [Paper] [Github]
  • Polar Parametrization for Vision-based Surround-View 3D Detection (arxiv 2022) [Paper] [Github]
  • SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving (Arxiv 2022) [Paper] [Github]
  • BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection (Arxiv 2022) [Paper] [Github]
  • BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo (Arxiv 2022) [Paper] [Github]
  • MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones (Arxiv 2022) [Paper] [Github]
  • Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object Detection (Arxiv 2022) [Paper]
  • DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention (Arxiv 2022) [Paper]
  • Multi-Camera Calibration Free BEV Representation for 3D Object Detection (Arxiv 2022) [Paper]
  • SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection (IROS 2023) [Paper]
  • BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks (Arxiv 2022) [Paper]
  • STS: Surround-view Temporal Stereo for Multi-view 3D Detection (Arxiv 2022) [Paper]
  • BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection (Arxiv 2022) [Paper]
  • AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection (IJCAI 2022) [Paper]
  • Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection (ACM MM 2022) [paper] [Github]
  • ORA3D: Overlap Region Aware Multi-view 3D Object Detection (BMVC 2022) [Paper] [Project Page]
  • AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection (ECCV 2022) [Paper] [Github]
  • CenterFormer: Center-based Transformer for 3D Object Detection (ECCV 2022) [paper] [Github]
  • SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention (ECCV 2022) [Paper] [Github]
  • Position Embedding Transformation for Multi-View 3D Object Detection (ECCV 2022) [Paper] [Github]
  • BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection (AAAI 2023) [Paper] [Github]
  • PolarFormer: Multi-camera 3D Object Detection with Polar Transformers (AAAI 2023) [Paper] [Github]
  • A Simple Baseline for Multi-Camera 3D Object Detection (AAAI 2023) [Paper] [Github]
  • Cross Modal Transformer via Coordinates Encoding for 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion (Arxiv 2023) [Paper] [Github]
  • BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo (Arxiv 2023) [Paper]
  • BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap (Arxiv 2023) [Paper] [Github]
  • DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking (Arxiv 2023) [Paper] [Github]
  • Geometric-aware Pretraining for Vision-centric 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception (Arxiv 2023) [Paper]
  • OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection (Arxiv 2023) [Paper]
  • Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction (ICCV 2023) [Paper] [Github]
  • VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection (Arxiv 2023) [Paper]
  • Object as Query: Equipping Any 2D Object Detector with 3D Detection Ability (Arxiv 2023) [Paper]
  • VoxelFormer: Bird’s-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning (Arxiv 2023) [Paper] [Github]
  • CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection (ICRA 2023) [Paper] [Github]
  • SOLOFusion: Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection (ICLR 2023) [paper] [Github]
  • BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection (ICLR 2023) [Paper] [Github]
  • UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View (CVPR 2023) [Paper] [Github]
  • Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving (CVPR 2023) [Paper]
  • Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection (CVPR 2023) [Paper] [Github]
  • Aedet: Azimuth-invariant multi-view 3d object detection (CVPR 2023) [Paper] [Github] [Project]
  • BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection (CVPR 2023) [Paper] [Github]
  • CAPE: Camera View Position Embedding for Multi-View 3D Object Detection (CVPR 2023) [Paper] [Github]
  • FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection (CVPR 2023) [Paper] [Github]
  • Sparse4D v2: Recurrent Temporal Fusion with Sparse Model (Arxiv 2023) [Paper] [Github]
  • DA-BEV: Depth Aware BEV Transformer for 3D Object Detection (Arxiv 2023) [Paper]
  • BEV-IO: Enhancing Bird’s-Eye-View 3D Detection with Instance Occupancy (Arxiv 2023) [Paper]
  • OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection (Arxiv) [Paper]
  • SA-BEV: Generating Semantic-Aware Bird’s-Eye-View Feature for Multi-view 3D Object Detection (ICCV 2023) [Paper] [Github]
  • Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images (Arxiv 2023) [paper]
  • DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting (Arxiv 2023) [Paper]
  • Far3D: Expanding the Horizon for Surround-view 3D Object Detection (Arxiv 2023) [Paper]
  • HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird’s Eye View (Arxiv 2023) [paper]
  • Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection (ICCV 2023) [Paper] [Github]
  • 3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers (ICCV 2023) [Paper] [Github] [Github]
  • FB-BEV: BEV Representation from Forward-Backward View Transformations (ICCV 2023) [paper] [Github]
  • QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection (ICCV 2023) [Paper]
  • SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos (ICCV 2023) [Paper] [Github]
  • NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection (ICCV 2023) [paper] [Github]
  • DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023) [paper]
  • BEVHeight++: Toward Robust Visual Centric 3D Object Detection (Arxiv 2023) [paper]
  • UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities (Arxiv 2023) [Paper]
  • Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving (Arxiv 2023) [Paper]
  • Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection (ICCV 2023) [Paper] [Github] [Project]
  • CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion (Arxiv 2023) [paper]
  • DynamicBEV: Leveraging Dynamic Queries and Temporal Context for 3D Object Detection (Arxiv 2023) [paper]
  • Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing (Arxiv 2023) [Paper]
  • Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection (NeurIPS 2023) (Arxiv 2023) [Paper] [Github]
  • M&M3D: Multi-Dataset Training and Efficient Network for Multi-view 3D Object Detection (Arxiv 2023) [Paper]
  • Sparse4D v3: Advancing End-to-End 3D Detection and Tracking (Arxiv 2023) [Paper] [Github]
  • BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection (Arxiv 2023) [paper]
  • Towards Efficient 3D Object Detection in Bird’s-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [Paper]
  • Residual Graph Convolutional Network for Bird's-Eye-View Semantic Segmentation (Arxiv 2023) [Paper]
  • Diffusion-Based Particle-DETR for BEV Perception (Arxiv 2023) [paper]
  • M-BEV: Masked BEV Perception for Robust Autonomous Driving (Arxiv 2023) [Paper]
  • Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps (Arxiv 2023) [Paper]
  • WidthFormer: Toward Efficient Transformer-based BEV View Transformation (Arxiv 2023) [Paper] [Github]
  • UniVision: A Unified Framework for Vision-Centric 3D Perception (Arxiv 2024) [Paper]
  • DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception (Arxiv 2024) [Paper]
  • Towards Scenario Generalization for Vision-based Roadside 3D Object Detection (Arxiv 2024) [Paper] [Github]
  • CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow (CVPR 2024) [Paper]
  • GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection (Arxiv 2024) [paper]
  • Lifting Multi-View Detection and Tracking to the Bird's Eye View (Arxiv 2024) [paper] [Github]
  • DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection (Arxiv 2024) [Paper]
  • BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection (CVPR 2024) [Paper] [Github]
  • OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection (ECCV 2024) [Paper] [Github]
  • FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection (ECCV 2024) [Paper]
  • PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View (Arxiv 2024) [Paper]
  • GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection (Arxiv 2024) [Paper]
  • Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression (ECCV 2024) [Paper] [Github]
  • MambaBEV: An efficient 3D detection model with Mamba2 (Arxiv 2024) [Paper]
  • ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object Detection (Arxiv 2024) [Paper]
  • Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting (Arxiv 2024) [paper]
  • HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection (Arxiv 2024) [paper]
  • TiGDistill-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning Distillation (Arxiv 2024) [paper] [Github]
  • DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation (Arxiv 2025) [Paper]

BEV Segmentation

Lidar Camera

  • PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (Arxiv 2023) [Paper] [Github]
  • X-Align: Cross-Modal Cross-View Alignment for Bird’s-Eye-View Segmentation (WACV 2023) [Paper]
  • BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation (ICRA 2023) [Paper] [Github] [Project]
  • UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving (Arxiv 2023) [Paper]
  • BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation (Arxiv 2023) [paper]
  • Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding (Arxiv 2023) [paper]
  • LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation (CVPR 2023) [Paper] [Github]
  • BEV-Guided Multi-Modality Fusion for Driving Perception (CVPR 2023) [Paper] [Github]
  • FusionFormer: A Multi-sensory Fusion in Bird’s-Eye-View and Temporal Consistent Transformer for 3D Object Detection (Arxiv 2023) [Paper]
  • UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation (ICCV 2023) [Paper] [Github]
  • BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird’s Eye View Map Construction (Arxiv 2023) [Paper]
  • BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation (Arxiv 2024) [paper]
  • OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation (Arxiv 2024) [Paper]
  • BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment (IROS 2024) [Paper] [Project]
  • PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation (AAAI 2025) [Paper]

Lidar

  • LidarMultiNet: Unifying LiDAR Semantic Segmentation, 3D Object Detection, and Panoptic Segmentation in a Single Multi-task Network (Arxiv 2022) [paper]
  • SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation (Arxiv 2023) [Paper]
  • BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds (3DV 2023) [Paper] [Github]

Monocular

  • Learning to Look around Objects for Top-View Representations of Outdoor Scenes (ECCV 2018) [paper]
  • A Parametric Top-View Representation of Complex Road Scenes (CVPR 2019) [Paper]
  • Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks (ICRA 2019 IEEE RA-L 2019) [Paper] [Github]
  • Short-Term Prediction and Multi-Camera Fusion on Semantic Grids (ICCVW 2019) [paper]
  • Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks (CVPR 2020) [Paper] [Github]
  • MonoLayout : Amodal scene layout from a single image (WACV 2020) [Paper] [Github]
  • Bird’s Eye View Segmentation Using Lifted 2D Semantic Features (BMVC 2021) [Paper]
  • Enabling Spatio-temporal aggregation in Birds-Eye-View Vehicle Estimation (ICRA 2021) [Paper] [mp4]
  • Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation (CVPR 2021) [Paper] [Github]
  • ViT BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation (IEEE IJCNN 2022) [paper]
  • Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images (IEEE RA-L 2022) [Paper] [Github] [Project]
  • Understanding Bird's-Eye View of Road Semantics using an Onboard Camera (ICRA 2022) [Paper] [Github]
  • "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping (CVPR 2022) [Paper]
  • Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts (CVPR 2022) [Paper]
  • Translating Images into Maps (ICRA 2022) [Paper] [Github]
  • GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation (ECCV 2022) [Paper]
  • SBEVNet: End-to-End Deep Stereo Layout Estimation (WACV 2022) [Paper]
  • BEVSegFormer: Bird’s Eye View Semantic Segmentation From Arbitrary Camera Rigs (WACV 2023) [Paper]
  • DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception (Arxiv 2023) [Paper] [Github]
  • HFT: Lifting Perspective Representations via Hybrid Feature Transformation (ICRA 2023) [Paper] [Github]
  • SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images (Arxiv 2023) [Paper]
  • Calibration-free BEV Representation for Infrastructure Perception (Arxiv 2023) [Paper]
  • Semi-Supervised Learning for Visual Bird’s Eye View Semantic Segmentation (Arxiv 2023) [Paper]
  • DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception (Arxiv 2023) [Paper] [Github] [Project]
  • CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity (Arxiv 2023) [Paper]
  • SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects (CVPR 2024) [Paper] [Github]
  • DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning (Arxiv 2024) [Paper] [Github]
  • Improved Single Camera BEV Perception Using Multi-Camera Training (ITSC 2024) [Paper]
  • Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation (Arxiv 2024) [Paper]
  • Geo-ConvGRU: Geographically Masked Convolutional Gated Recurrent Unit for Bird-Eye View Segmentation (Arxiv 2024) [Paper]

Multiple Camera

  • A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View (IEEE ITSC 2020)[Paper] [Github]
  • Cross-view Semantic Segmentation for Sensing Surroundings (IROS 2020 IEEE RA-L 2020) [Paper] [Github] [Project]
  • Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020) [Paper] [Github] [Project]
  • Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022) [Paper] [Github]
  • Scene Representation in Bird’s-Eye View from Surrounding Cameras with Transformers (CVPRW 2022) [Paper]
  • M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation (Arxiv 2022) [Paper] [Project]
  • BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (Arxiv 2022) [Paper] [Github]
  • Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer (Arxiv 2022) [Paper] [Github]
  • A Simple Baseline for BEV Perception Without LiDAR (Arxiv 2022) [Paper] [Github] [Project Page]
  • UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View (ICCV 2023) [Paper] [Github]
  • LaRa: Latents and Rays for Multi-Camera Bird’s-Eye-View Semantic Segmentation (CORL 2022) [Paper] [Github]
  • CoBEVT: Cooperative Bird’s Eye View Semantic Segmentation with Sparse Transformers (CORL 2022) [Paper] [Github]
  • Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation (CORL 2022) [Paper] [Github]
  • BEVFormer: a Cutting-edge Baseline for Camera-based Detection (ECCV 2022) [Paper] [Github]
  • JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes (ECCV 2022) [Paper] [Github]
  • Learning Ego 3D Representation as Ray Tracing (ECCV 2022) [Paper] [Github]
  • Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception (NIPS 2022 Workshop) [Paper] or [Paper] [Github]
  • Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline (Arxiv 2023) [Paper] [Github]
  • BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision (CVPR 2023) [Paper]
  • MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models (CVPR 2023) [Paper]
  • Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception (ICCV 2023) [Paper] [Github]
  • MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation (ICCV 2023) [paper] [Github] [Project]
  • One Training for Multiple Deployments: Polar-based Adaptive BEV Perception for Autonomous Driving (Arxiv 2023) [Paper]
  • RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions (Arxiv 2023) [paper] [Github] [Project]
  • X-Align++: cross-modal cross-view alignment for Bird's-eye-view segmentation (Arxiv 2023) [Paper]
  • PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View (Arxiv 2023) [Paper]
  • Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird’s-Eye View (ICCV 2023) [Paper]
  • Towards Viewpoint Robustness in Bird’s Eye View Segmentation (ICCV 2023) [Paper] [Project]
  • PointBeV: A Sparse Approach to BeV Predictions (Arxiv 2023) [paper] [Github]
  • DualBEV: CNN is All You Need in View Transformation (Arxiv 2024) [Paper]
  • MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning (Arxiv 2024) [paper]
  • HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras (Arxiv 2024) [Paper] [Github]
  • Improving Bird's Eye View Semantic Segmentation by Task Decomposition (CVPR 2024) [Paper] [Github]
  • SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation (CVPR 2024) [Paper] [Github]
  • RoadBEV: Road Surface Reconstruction in Bird's Eye View (Arxiv 2024) [Paper] [Github]
  • TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation (Arxiv 2024) [Paper]
  • DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model (Arxiv 2024) [Paper]
  • Bird's-Eye View to Street-View: A Survey (Arxiv 2024) [Paper]
  • LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping (Arxiv 2024) [Paper]
  • Navigation Instruction Generation with BEV Perception and Large Language Models (ECCV 2024) [paper] [Github]
  • GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation (Arxiv 2024) [Paper]
  • MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation (ACM MM 2024) [paper]
  • Robust Bird’s Eye View Segmentation by Adapting DINOv2 (ECCV 2024 Workshop) [Paper]
  • Unveiling the Black Box: Independent Functional Module Evaluation for Bird’s-Eye-View Perception Model (Arxiv 2024) [Paper]
  • RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View (Arxiv 2024) [Paper]
  • OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping (ACCV 2024) [Paper] [Github]
  • ROAD-Waymo: Action Awareness at Scale for Autonomous Driving (NeurIPS 2024) [Paper] [Github]
  • Fast and Efficient Transformer-based Method for Bird’s Eye View Instance Prediction (IEEE ITSC 2024) [Paper] [Github]
  • Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation (WACV 2025) [paper]
  • Revisiting Birds Eye View Perception Models with Frozen Foundation Models: DINOv2 and Metric3Dv2 (Arxiv 2025) [Paper]
  • SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird's-Eye-View Segmentation (Arxiv 2025) [Paper]
  • BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance (Arxiv 2025) [Paper]
  • Dur360BEV: A Real-world 360-degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving (Arxiv 2025) [Paper]
  • TS-CGNet: Temporal-Spatial Fusion Meets Centerline-Guided Diffusion for BEV Mapping (Arxiv 2025) [Paper] [Github]
  • BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation (Arxiv 2025) [Paper]
  • HierDAMap: Towards Universal Domain Adaptive BEV Mapping via Hierarchical Perspective Priors (Arxiv 2025) [Paper]
  • MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations (ICLR 2025) [Paper]

Perception Prediction Planning

Monocular

  • Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning (WACV 2021) [Paper]
  • HOPE: Hierarchical Spatial-temporal Network for Occupancy Flow Prediction (CVPRW 2022) [paper]

Multiple Camera

  • FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras (ICCV 2021) [Paper] [Github] [Project]
  • NEAT: Neural Attention Fields for End-to-End Autonomous Driving (ICCV 2021) [Paper] [Github]
  • ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning (ECCV 2022) [Paper] [Github]
  • StretchBEV: Stretching Future Instance Prediction Spatially and Temporally (ECCV 2022) [Paper] [Github] [Project]
  • TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving (CVPR 2023) [Paper] [Github]
  • Planning-oriented Autonomous Driving (CVPR 2023, Occupancy Prediction) [paper] [Github] [Project]
  • Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving (CVPR 2023) [Paper] [Github]
  • ReasonNet: End-to-End Driving with Temporal and Global Reasoning (CVPR 2023) [Paper]
  • LiDAR-BEVMTN: Real-Time LiDAR Bird’s-Eye View Multi-Task Perception Network for Autonomous Driving (Arxiv 2023) [paper]
  • FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving (Arxiv 2023) [Paper]
  • VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning (Arxiv 2024) [Paper] [Project]
  • SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving (Arxiv 2024) [Paper]
  • SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation (Arxiv 2024) [paper] [Github]
  • DUALAD: Disentangling the Dynamic and Static World for End-to-End Driving (CVPR 2024) [Paper]
  • Solving Motion Planning Tasks with a Scalable Generative Model (ECCV 2024) [Paper] [Github]

Mapping

Lidar

  • Hierarchical Recurrent Attention Networks for Structured Online Maps (CVPR 2018) [Paper]

Lidar Camera

  • End-to-End Deep Structured Models for Drawing Crosswalks (ECCV 2018) [Paper]
  • Probabilistic Semantic Mapping for Urban Autonomous Driving Applications (IROS 2020) [Paper] [Github]
  • Convolutional Recurrent Network for Road Boundary Extraction (CVPR 2019) [Paper]
  • Lane Graph Estimation for Scene Understanding in Urban Driving (IEEE RAL 2021) [Paper]
  • M^2-3DLaneNet: Multi-Modal 3D Lane Detection (Arxiv 2022) [paper] [Github]
  • HDMapNet: An Online HD Map Construction and Evaluation Framework (ICRA 2022) [paper] [Github] [Project]
  • SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation (Arxiv 2023) [paper] [Github]
  • VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene (Arxiv 2023) [Paper]
  • THMA: Tencent HD Map AI System for Creating HD Map Annotations (AAAI 2023) [paper]

Monocular

  • RoadTracer: Automatic Extraction of Road Networks from Aerial Images (CVPR 2018) [Paper] [Github]
  • DAGMapper: Learning to Map by Discovering Lane Topology (ICCV 2019) [paper]
  • End-to-end Lane Detection through Differentiable Least-Squares Fitting (ICCVW 2019) [paper]
  • VecRoad: Point-based Iterative Graph Exploration for Road Graphs Extraction (CVPR 2020) [Paper] [Github] [Project]
  • Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding (ECCV 2020) [paper] [Github]
  • iCurb: Imitation Learning-based Detection of Road Curbs using Aerial Images for Autonomous Driving (ICRA 2021 IEEE RA-L) [paper] [Github] [Project]
  • HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps (CVPR 2021) [paper]
  • Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images (ICCV 2021) [Paper] [Github]
  • RNGDet: Road Network Graph Detection by Transformer in Aerial Images (IEEE TGRS 2022) [Paper] [Project]
  • RNGDet++: Road Network Graph Detection by Transformer with Instance Segmentation and Multi-scale Features Enhancement (IEEE RA-L 2022) [Paper] [Github] [Project]
  • SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving (ICRA 2022) [paper] [Github]
  • Laneformer: Object-aware Row-Column Transformers for Lane Detection (AAAI 2022) [Paper]
  • Lane-Level Street Map Extraction from Aerial Imagery (WACV 2022) [Paper] [Github]
  • Reconstruct from Top View: A 3D Lane Detection Approach based on Geometry Structure Prior (CVPRW 2022) [paper]
  • PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images (CVPR 2022) [Paper] [Github]
  • Topology Preserving Local Road Network Estimation from Single Onboard Camera Image (CVPR 2022) [Paper] [Github]
  • TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction (ECCV 2022) [Paper]
  • CLiNet: Joint Detection of Road Network Centerlines in 2D and 3D (IEEE IV 2023) [Paper]
  • Polygonizer: An auto-regressive building delineator (ICLRW 2023) [Paper]
  • CurveFormer: 3D Lane Detection by Curve Propagation with Curve Queries and Attention (ICRA 2023) [Paper]
  • Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection (CVPR 2023) [paper] [Github]
  • Learning and Aggregating Lane Graphs for Urban Automated Driving (Arxiv 2023) [paper]
  • Online Lane Graph Extraction from Onboard Video (Arxiv 2023) [paper] [Github]
  • Video Killed the HD-Map: Predicting Driving Behavior Directly From Drone Images (Arxiv 2023) [Paper]
  • Prior Based Online Lane Graph Extraction from Single Onboard Camera Image (Arxiv 2023) [Paper]
  • Online Monocular Lane Mapping Using Catmull-Rom Spline (Arxiv 2023) [Paper] [Github]
  • Improving Online Lane Graph Extraction by Object-Lane Clustering (ICCV 2023) [Paper]
  • LATR: 3D Lane Detection from Monocular Images with Transformer (ICCV 2023) [Paper] [Github]
  • Patched Line Segment Learning for Vector Road Mapping (Arxiv 2023) [paper]
  • Sparse Point Guided 3D Lane Detection (ICCV 2023) [Paper] [Github]
  • Recursive Video Lane Detection (ICCV 2023) [Paper] [Github]
  • Occlusion-Aware 2D and 3D Centerline Detection for Urban Driving via Automatic Label Generation (Arxiv 2023) [Paper]
  • Building Lane-Level Maps from Aerial Images (Arxiv 2023) [paper]
  • LaneCPP: Continuous 3D Lane Detection using Physical Priors (CVPR 2024) [Paper]
  • DeepAerialMapper: Deep Learning-based Semi-automatic HD Map Creation for Highly Automated Vehicles (Arxiv 2024) [Paper] [Github]
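Several of the entries above fit lane geometry with parametric curves, for instance the online lane-mapping work that uses Catmull-Rom splines. As a minimal illustrative sketch (not any listed paper's implementation, and all function names here are made up for illustration), a uniform Catmull-Rom segment interpolates its two middle control points, so chained segments yield a smooth, densely sampled lane polyline:

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate one uniform Catmull-Rom segment at t in [0, 1].

    The curve passes through p1 (t=0) and p2 (t=1); p0 and p3 only
    shape the tangents, so consecutive segments join smoothly.
    """
    p0, p1, p2, p3 = map(np.asarray, (p0, p1, p2, p3))
    t2, t3 = t * t, t * t * t
    return 0.5 * ((2.0 * p1)
                  + (-p0 + p2) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3)

def sample_lane(control_points, samples_per_segment=10):
    """Densely sample a lane curve through the given 2D control points."""
    pts = [np.asarray(p, dtype=float) for p in control_points]
    # Duplicate the endpoints so every interior segment has four controls.
    pts = [pts[0]] + pts + [pts[-1]]
    out = []
    for i in range(len(pts) - 3):
        for s in range(samples_per_segment):
            out.append(catmull_rom(pts[i], pts[i + 1], pts[i + 2], pts[i + 3],
                                   s / samples_per_segment))
    out.append(pts[-2])  # close the polyline with the last control point
    return np.array(out)
```

Because the spline interpolates its control points exactly, newly observed lane points can be appended as controls without refitting the whole curve, which is what makes this representation attractive for online mapping.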

Multiple Camera

  • PersFormer: a New Baseline for 3D Laneline Detection (ECCV 2022) [Paper] [Github]
  • Continuity-preserving Path-wise Modeling for Online Lane Graph Construction (Arxiv 2023) [paper] [Github]
  • VAD: Vectorized Scene Representation for Efficient Autonomous Driving (Arxiv 2023) [paper] [Github]
  • InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning (Arxiv 2023) [Paper]
  • VectorMapNet: End-to-end Vectorized HD Map Learning (Arxiv 2023) [Paper] [Github] [Project]
  • Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • Topology Reasoning for Driving Scenes (Arxiv 2023) [paper] [Github]
  • MV-Map: Offboard HD-Map Generation with Multi-view Consistency (Arxiv 2023) [paper] [Github]
  • CenterLineDet: Road Lane CenterLine Graph Detection With Vehicle-Mounted Sensors by Transformer for High-definition Map Creation (ICRA 2023) [paper] [Github]
  • MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction (ICLR 2023) [paper] [Github]
  • Neural Map Prior for Autonomous Driving (CVPR 2023) [Paper]
  • An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection (Arxiv 2023) [paper]
  • TopoMask: Instance-Mask-Based Formulation for the Road Topology Problem via Transformer-Based Architecture (Arxiv 2023) [Paper]
  • PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models (Arxiv 2023) [paper] [Github] [Project]
  • Online Map Vectorization for Autonomous Driving: A Rasterization Perspective (Arxiv 2023) [Paper]
  • NeMO: Neural Map Growing System for Spatiotemporal Fusion in Bird's-Eye-View and BDD-Map Benchmark (Arxiv 2023) [Paper]
  • MachMap: End-to-End Vectorized Solution for Compact HD-Map Construction (CVPR 2023 Workshop) [Paper]
  • Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction (Arxiv 2023) [paper]
  • End-to-End Vectorized HD-map Construction with Piecewise Bézier Curve (CVPR 2023) [Paper] [Github]
  • GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping (Arxiv 2023) [Paper]
  • MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction (Arxiv 2023) [Paper]
  • InsightMapper: A Closer Look at Inner-Instance Information for Vectorized High-Definition Mapping (Arxiv 2023) [Paper] [Project] [Github]
  • HD Map Generation from Noisy Multi-Route Vehicle Fleet Data on Highways with Expectation Maximization (Arxiv 2023) [Paper]
  • StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction (WACV 2024) [Paper] [Github]
  • PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction (ICCV 2023) [Paper]
  • Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach (ICCV 2023) [paper]
  • TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning (Arxiv 2023) [paper] [Github]
  • ScalableMap: Scalable Map Learning for Online Long-Range Vectorized HD Map Construction (CoRL 2023) [Paper] [Github]
  • Mind the map! Accounting for existing map information when estimating online HDMaps from sensor data (Arxiv 2023) [Paper]
  • Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps (Arxiv 2023) [Paper] [Github]
  • P-MapNet: Far-Seeing Map Constructor Enhanced by Both SDMap and HDMap Priors (ICLR 2024 submitted paper) [Openreview] [Paper]
  • Online Vectorized HD Map Construction using Geometry (Arxiv 2023) [paper] [Github]
  • LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • 3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching (Arxiv 2024) [Paper]
  • MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • Stream Query Denoising for Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • ADMap: Anti-disturbance framework for reconstructing online vectorized HD map (Arxiv 2024) [Paper]
  • PLCNet: Patch-wise Lane Correction Network for Automatic Lane Correction in High-definition Maps (Arxiv 2024) [Paper]
  • LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement (AAAI 2024) [paper]
  • VI-Map: Infrastructure-Assisted Real-Time HD Mapping for Autonomous Driving (Arxiv 2024) [Paper]
  • CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention (Arxiv 2024) [Paper]
  • Lane2Seq: Towards Unified Lane Detection via Sequence Generation (CVPR 2024) [Paper]
  • Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction (Arxiv 2024) [Paper] [Github]
  • MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping (Arxiv 2024) [paper] [Github]
  • Producing and Leveraging Online Map Uncertainty in Trajectory Prediction (CVPR 2024) [Paper] [Github]
  • MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction (CVPR 2024) [Paper] [Github]
  • HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction (CVPR 2024) [Paper]
  • SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations (Arxiv 2024) [Paper]
  • DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction (Arxiv 2024) [Paper]
  • Is Your HD Map Constructor Reliable under Sensor Corruptions? (Arxiv 2024) [Paper] [Github] [Project]
  • DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation (KDD 2024) [Paper]
  • LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention (ECCV 2024) [Paper] [Github]
  • BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight (Arxiv 2024) [Paper]
  • Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data (Arxiv 2024) [Paper]
  • MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation (ECCV 2024) [Paper]
  • Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks (Arxiv 2024) [Paper] [Github]
  • Generation of Training Data from HD Maps in the Lanelet2 Framework (Arxiv 2024) [Paper]
  • PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction (Arxiv 2024) [paper] [Github]
  • CAMAv2: A Vision-Centric Approach for Static Map Element Annotation (Arxiv 2024) [Paper]
  • HeightLane: BEV Heightmap guided 3D Lane Detection (Arxiv 2024) [paper]
  • PriorMapNet: Enhancing Online Vectorized HD Map Construction with Priors (Arxiv 2024) [Paper]
  • Local map Construction Methods with SD map: A Novel Survey (Arxiv 2024) [Paper]
  • Enhancing Vectorized Map Perception with Historical Rasterized Maps (ECCV 2024) [Paper] [Github]
  • GenMapping: Unleashing the Potential of Inverse Perspective Mapping for Robust Online HD Map Construction (Arxiv 2024) [Paper] [Github]
  • GlobalMapNet: An Online Framework for Vectorized Global HD Map Construction (Arxiv 2024) [paper]
  • MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction (Arxiv 2024) [Paper]
  • MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction (Arxiv 2024) [paper]
  • Exploring Semi-Supervised Learning for Online Mapping (Arxiv 2024) [Paper]
  • OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction (Arxiv 2024) [Paper] [Github] [Project]
  • HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning (WACV 2025) [Paper] [Github]
  • M3TR: Generalist HD Map Construction with Variable Map Priors (Arxiv 2024) [Paper] [Github]
  • TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior (Arxiv 2024) [Paper]
  • ImagineMap: Enhanced HD Map Construction with SD Maps (Arxiv 2024) [paper]
  • Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression (TPAMI 2025) [paper] [Github]
  • LDMapNet-U: An End-to-End System for City-Scale Lane-Level Map Updating (KDD 2025) [Paper]
  • MapGS: Generalizable Pretraining and Data Augmentation for Online Mapping via Novel View Synthesis (Arxiv 2025) [paper]
  • Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning (Arxiv 2025) [Paper]
  • Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation (Arxiv 2025) [Paper]
  • FastMap: Fast Queries Initialization Based Vectorized HD Map Reconstruction Framework (Arxiv 2025) [Paper] [Github]
  • Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction (Arxiv 2025) [Paper]
  • HisTrackMap: Global Vectorized High-Definition Map Construction via History Map Tracking (Arxiv 2025) [Paper]
  • AugMapNet: Improving Spatial Latent Structure via BEV Grid Augmentation for Enhanced Vectorized Online HD Map Construction (Arxiv 2025) [Paper]
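Many of the vectorized HD-map methods above (the MapTR family, StreamMapNet, and others) represent each map element, such as a lane divider, road boundary, or pedestrian crossing, as a fixed-length ordered point set that a transformer query can predict in one shot. A minimal sketch of that preprocessing step, with the function name chosen here for illustration, resamples an arbitrary polyline to N evenly spaced points by arc length:

```python
import numpy as np

def resample_polyline(points, num_points=20):
    """Resample a polyline to a fixed number of evenly spaced points.

    Fixing the point count lets every map element share one tensor
    shape, which is what set-prediction-style map decoders require.
    """
    pts = np.asarray(points, dtype=float)
    # Cumulative arc length along the polyline, starting at 0.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    # Target arc-length positions, evenly spaced from start to end.
    targets = np.linspace(0.0, cum[-1], num_points)
    out = np.empty((num_points, pts.shape[1]))
    for d in range(pts.shape[1]):
        out[:, d] = np.interp(targets, cum, pts[:, d])
    return out
```

With all elements normalized to the same length, a batch of map elements becomes a dense (num_elements, num_points, 2) tensor, and permutation- or direction-invariant losses can then be defined over these point sets.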

Lane Graph

Monocular

  • Lane Graph Estimation for Scene Understanding in Urban Driving (IEEE RAL 2021) [Paper]
  • AutoGraph: Predicting Lane Graphs from Traffic Observations (IEEE RAL 2023) [Paper]
  • Learning and Aggregating Lane Graphs for Urban Automated Driving (CVPR 2023) [Paper]
  • TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes (Arxiv 2024) [Paper]
  • Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors (Arxiv 2024) [Paper]
  • Learning Lane Graphs from Aerial Imagery Using Transformers (Arxiv 2024) [Paper]
  • TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem (Arxiv 2024) [Paper]
  • LMT-Net: Lane Model Transformer Network for Automated HD Mapping from Sparse Vehicle Observations (ITSC 2024) [Paper]
  • Behavioral Topology (BeTop): A Multi-Agent Behavior Formulation for Interactive Motion Prediction and Planning (NeurIPS 2024) [Paper] [Github]
  • SMART: Advancing Scalable Map Priors for Driving Topology Reasoning (Arxiv 2025) [Paper] [Project]

Tracking

  • Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer (Arxiv 2022) [Paper] [Github]
  • EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View (Arxiv 2023) [paper] [Github]
  • Traj-MAE: Masked Autoencoders for Trajectory Prediction (Arxiv 2023) [Paper]
  • Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes (Arxiv 2024) [Paper]
  • MapsTP: HD Map Images Based Multimodal Trajectory Prediction for Automated Vehicles (Arxiv 2024) [Paper]
  • Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures (Arxiv 2024) [Paper]
  • Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving (Arxiv 2024) [Paper]
  • VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions (Arxiv 2024) [Paper]

Locate

  • BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images (Arxiv 2022) [paper]
  • BEV-SLAM: Building a Globally-Consistent World Map Using Monocular Vision (IROS 2022) [Paper]
  • U-BEV: Height-aware Bird’s-Eye-View Segmentation and Neural Map-based Relocalization (Arxiv 2023) [Paper]
  • Monocular Localization with Semantics Map for Autonomous Vehicles (Arxiv 2024) [Paper]

Occupancy Prediction

  • Semantic Scene Completion from a Single Depth Image (CVPR 2017) [Paper]
  • Occupancy Networks: Learning 3D Reconstruction in Function Space (CVPR 2019) [Paper] [Github]
  • S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds (CoRL 2020) [Paper]
  • 3D Semantic Scene Completion: a Survey (IJCV 2021) [Paper]
  • Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data (Arxiv 2021) [Paper]
  • Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion (AAAI 2021) [Paper]
  • Anisotropic Convolutional Networks for 3D Semantic Scene Completion (CVPR 2020) [Paper]
  • Estimation of Appearance and Occupancy Information in Bird's Eye View from Surround Monocular Images (Arxiv 2022) [paper] [Project]
  • Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds (IROS 2021) [Paper] [Github]
  • Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review (Arxiv 2023) [paper]
  • LMSCNet: Lightweight Multiscale 3D Semantic Completion (3DV 2020) [Paper] [Github]
  • MonoScene: Monocular 3D Semantic Scene Completion (CVPR 2022) [Paper] [Github] [Project]
  • OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction (ICCV 2023) [Paper] [Github]
  • A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • OccDepth: A Depth-aware Method for 3D Semantic Occupancy Network (Arxiv 2023) [Paper] [Github]
  • OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception (Arxiv 2023) [paper] [Github]
  • Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving (Arxiv 2023) [Paper] [Github] [Project]
  • Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction (Arxiv 2023) [Paper] [Github]
  • StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion (Arxiv 2023) [paper] [Github]
  • Learning Occupancy for Monocular 3D Object Detection (Arxiv 2023) [Paper] [Github]
  • OVO: Open-Vocabulary Occupancy (Arxiv 2023) [Paper]
  • SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving (Arxiv 2023) [paper] [Github] [Project]
  • Scene as Occupancy (Arxiv 2023) [Paper] [Github]
  • Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data (Arxiv 2023) [Paper] [Github]
  • PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation (Arxiv 2023) [Paper] [Github]
  • UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering (Arxiv 2023) [paper]
  • SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving (NeurIPS 2023 D&B track) [paper]
  • StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks (ICRA 2023) [Paper] [Github] [Project]
  • Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction (CVPR 2023) [Paper] [Github]
  • VoxFormer: a Cutting-edge Baseline for 3D Semantic Occupancy Prediction (CVPR 2023) [paper] [Github]
  • Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting (CVPR 2023) [Paper] [Github] [Project]
  • SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion (IROS 2023) [Paper] [Github]
  • CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion (Arxiv 2023) [paper]
  • Symphonize 3D Semantic Scene Completion with Contextual Instance Queries (Arxiv 2023) [Paper] [Github]
  • Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders (Arxiv 2023) [paper]
  • UniWorld: Autonomous Driving Pre-training via World Models (Arxiv 2023) [Paper] [Github]
  • PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction (Arxiv 2023) [paper] [Github]
  • SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection (Arxiv 2023) [paper] [Github]
  • OccupancyDETR: Making Semantic Scene Completion as Straightforward as Object Detection (Arxiv 2023) [Paper] [Github]
  • PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion (Arxiv 2023) [Paper]
  • SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving (Arxiv 2023) [paper]
  • NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space (Arxiv 2023) [Github]
  • RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision (Arxiv 2023) [paper] [Github]
  • LiDAR-based 4D Occupancy Completion and Forecasting (Arxiv 2023) [Paper] [Github]
  • SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints (Arxiv 2023) [Paper]
  • SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction (Arxiv 2023) [Paper] [Github]
  • FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin (Arxiv 2023) [paper]
  • Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications (Arxiv 2023) [paper] [Github]
  • OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion (Arxiv 2023) [Paper]
  • A Simple Framework for 3D Occupancy Estimation in Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries (Arxiv 2023) [Paper]
  • COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction (Arxiv 2023) [Paper]
  • OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields (Arxiv 2023) [paper] [Github]
  • RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation (Arxiv 2023) [paper]
  • POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images (Arxiv 2024) [Paper] [Github]
  • S2TPVFormer: Spatio-Temporal Tri-Perspective View for temporally coherent 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper]
  • InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction (Arxiv 2024) [Paper]
  • V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication (Arxiv 2024) [Paper]
  • OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow (Arxiv 2024) [Paper]
  • OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction (Arxiv 2024) [Paper]
  • OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction (Arxiv 2024) [Paper]
  • FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View (ICRA 2024) [Paper]
  • OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction (Arxiv 2024) [paper]
  • PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (CVPR 2024) [Paper] [Github]
  • Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution (Arxiv 2024) [paper]
  • OccFiner: Offboard Occupancy Refinement with Hybrid Propagation (Arxiv 2024) [Paper]
  • MonoOcc: Digging into Monocular Semantic Occupancy Prediction (ICLR 2024) [Paper]
  • OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation (Arxiv 2024) [paper]
  • Urban Scene Diffusion through Semantic Occupancy Map (Arxiv 2024) [Paper]
  • Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]
  • SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction (CVPR 2024) [Paper]
  • Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation (CVPR 2024) [paper] [Github]
  • OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks (Arxiv 2023) [Paper]
  • OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving (Arxiv 2024) [paper]
  • ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers (Arxiv 2024) [paper]
  • A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective (Arxiv 2024) [Paper]
  • Vision-based 3D occupancy prediction in autonomous driving: a review and outlook (Arxiv 2024) [Paper]
  • GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision (Arxiv 2024) [Paper]
  • RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar (Arxiv 2024) [paper]
  • GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]
  • OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network (Arxiv 2024) [Paper] [Github]
  • PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving (3DV 2024) [paper]
  • UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Arxiv 2024) [paper]
  • Context and Geometry Aware Voxel Transformer for Semantic Scene Completion (Arxiv 2024) [Paper] [Github]
  • Occupancy as Set of Points (ECCV 2024) [Paper] [Github]
  • Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion (Arxiv 2024) [Paper]
  • Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction (Arxiv 2024) [Paper]
  • Monocular Occupancy Prediction for Scalable Indoor Scenes (ECCV 2024) [Paper] [Github]
  • LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering (Arxiv 2024) [Paper]
  • VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction (Arxiv 2024) [paper]
  • Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection (Arxiv 2024) [paper] [Github]
  • OccMamba: Semantic Occupancy Prediction with State Space Models (Arxiv 2024) [paper]
  • HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction (IEEE RAL 2024) [paper]
  • Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance (Arxiv 2024) [Paper]
  • MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering (Arxiv 2024) [paper] [Project] [Github]
  • GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting (Arxiv 2024) [paper] [Github]
  • AdaOcc: Adaptive-Resolution Occupancy Prediction (Arxiv 2024) [Paper]
  • Diffusion-Occ: 3D Point Cloud Completion via Occupancy Diffusion (Arxiv 2024) [Paper]
  • UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height (Arxiv 2024) [paper]
  • COCO-Occ: A Benchmark for Occluded Panoptic Segmentation and Image Understanding (Arxiv 2024) [Paper]
  • CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction (ECCV 2024) [Paper] [Github]
  • ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning (Arxiv 2024) [Paper]
  • DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models (Arxiv 2024) [Paper]
  • SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs (Arxiv 2024) [Paper]
  • OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity (Arxiv 2024) [Paper] [Github] [Project]
  • DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction (Arxiv 2024) [Paper] [Github]
  • OCC-MLLM: Empowering Multimodal Large Language Model For the Understanding of Occluded Objects (Arxiv 2024) [Paper]
  • OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction (Arxiv 2024) [paper]
  • Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting (Arxiv 2024) [Paper]
  • ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration (Arxiv 2024) [paper] [Github] [Project]
  • EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding (Arxiv 2024) [Paper] [Github]
  • PVP: Polar Representation Boost for 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper]
  • Fast Occupancy Network (Arxiv 2024) [Paper]
  • Lightweight Spatial Embedding for Vision-based 3D Occupancy Prediction (Arxiv 2024) [Paper]
  • doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation (Arxiv 2024) [paper] [Github]
  • LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba (AAAI 2025) [paper]
  • GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction (Arxiv 2024) [Paper] [Github]
  • ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction (AAAI 2025) [Paper] [Github]
  • OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation (Arxiv 2024) [paper]
  • ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder (AAAI 2025) [Paper]
  • MR-Occ: Efficient Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation (Arxiv 2024) [Paper]
  • Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion (AAAI 2025) [Paper]
  • Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception (Arxiv 2025) [Paper]
  • MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies (Arxiv 2025) [Paper] [Github]
  • OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting (Arxiv 2025) [Paper]
  • MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring (Arxiv 2025) [paper]
  • OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving (Arxiv 2025) [Paper]
  • LEAP: Enhancing Vision-Based Occupancy Networks with Lightweight Spatio-Temporal Correlation (Arxiv 2025) [Paper]
  • OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework (Arxiv 2025) [paper]
  • GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow (Arxiv 2025) [Paper]
  • H3O: Hyper-Efficient 3D Occupancy Prediction with Heterogeneous Supervision (ICRA 2025) [Paper]
  • TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting (Arxiv 2025) [Paper]
  • OCCUQ: Exploring Efficient Uncertainty Quantification for 3D Occupancy Prediction (Arxiv 2025) [Paper] [Github]
  • L2COcc: Lightweight Camera-Centric Semantic Scene Completion via Distillation of LiDAR Model (Arxiv 2025) [Paper] [Project] [Github]
  • 3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation (CVPR 2025) [Paper]
  • SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World (Arxiv 2025) [Paper]

Occupancy Challenge

  • FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation (CVPR 2023 3D Occupancy Prediction Challenge Workshop) [paper] [Github]
  • Separated RoadTopoFormer (Arxiv 2023) [Paper]
  • OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios (CVPR 2023 Workshop) [Paper] [Github]
  • AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction (CVPR 2024 Workshop) [Paper]
  • Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement (Arxiv 2024) [Paper]

Challenge

  • The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge [Paper]
  • MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report (CVPR 2024 Challenge) [Paper]

Dataset

  • Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark (CVPR 2023) [paper] [Github]
  • SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions (IV 2024) [Paper] [Project] [Github]
  • WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving (Arxiv 2024) [paper] [Project] [Github]
  • WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]

World Model

  • End-to-end Autonomous Driving: Challenges and Frontiers (Arxiv 2024) [Paper] [Github]
  • Talk2BEV: Language-enhanced Bird’s-eye View Maps for Autonomous Driving (ICRA 2024) [paper] [Github] [Project]
  • Language Prompt for Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • MotionLM: Multi-Agent Motion Forecasting as Language Modeling (Arxiv 2023) [paper]
  • GAIA-1: A Generative World Model for Autonomous Driving (Arxiv 2023) [paper]
  • DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving (Arxiv 2023) [paper]
  • Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • Learning to Drive Anywhere (CORL 2023) [Paper]
  • Language-Conditioned Path Planning (Arxiv 2023) [paper]
  • DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model (Arxiv 2023) [Paper] [Project]
  • GPT-Driver: Learning to Drive with GPT (Arxiv 2023) [Paper]
  • LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving (Arxiv 2023) [paper]
  • Towards End-to-End Embodied Decision Making via Multi-Modal Large Language Model: Explorations with GPT4-Vision and Beyond (Arxiv 2023) [Paper]
  • DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model (Arxiv 2023) [Paper]
  • UniPAD: A Universal Pre-training Paradigm for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm (Arxiv 2023) [Paper]
  • Uni3D: Exploring Unified 3D Representation at Scale (Arxiv 2023) [Paper] [Github]
  • Video Language Planning (Arxiv 2023) [paper] [Github]
  • RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models (Arxiv 2023) [Paper]
  • DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning (Arxiv 2023) [Paper] [Project]
  • Vision Language Models in Autonomous Driving and Intelligent Transportation Systems (Arxiv 2023) [Paper]
  • ADAPT: Action-aware Driving Caption Transformer (ICRA 2023) [Paper] [Github]
  • Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models (Arxiv 2023) [Paper] [Project]
  • Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion (Arxiv 2023) [Paper]
  • ADriver-I: A General World Model for Autonomous Driving (Arxiv 2023) [Paper]
  • HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving (Arxiv 2023) [Paper]
  • On the Road with GPT-4V(vision): Early Explorations of Visual-Language Model on Autonomous Driving (Arxiv 2023) [paper]
  • GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning (Arxiv 2023) [Paper]
  • Applications of Large Scale Foundation Models for Autonomous Driving (Arxiv 2023) [Paper]
  • Dolphins: Multimodal Language Model for Driving (Arxiv 2023) [Paper] [Project]
  • Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving (Arxiv 2023) [paper] [Github] [Project]
  • Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? (Arxiv 2023) [Paper] [Github]
  • NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations (Arxiv 2023) [paper] [Github]
  • DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes (Arxiv 2023) [Paper] [Project]
  • Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • Dialogue-based generation of self-driving simulation scenarios using Large Language Models (Arxiv 2023) [Paper] [Github]
  • Panacea: Panoramic and Controllable Video Generation for Autonomous Driving (Arxiv 2023) [paper] [Project] [Github]
  • LingoQA: Video Question Answering for Autonomous Driving (Arxiv 2023) [paper] [Github]
  • DriveLM: Driving with Graph Visual Question Answering (Arxiv 2023) [Paper] [Github]
  • LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding (Arxiv 2023) [Paper] [Project]
  • LMDrive: Closed-Loop End-to-End Driving with Large Language Models (Arxiv 2023) [Paper] [Github]
  • Visual Point Cloud Forecasting enables Scalable Autonomous Driving (Arxiv 2023) [Paper] [Github]
  • WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation (Arxiv 2023) [Paper] [Github]
  • Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models (Arxiv 2024) [Paper] [Github]
  • DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (Arxiv 2024) [Paper]
  • A Survey on Multimodal Large Language Models for Autonomous Driving (WACVW 2024) [Paper]
  • VLP: Vision Language Planning for Autonomous Driving (Arxiv 2023) [Paper]
  • Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities (Arxiv 2024) [Paper]
  • MapGPT: Map-Guided Prompting for Unified Vision-and-Language Navigation (Arxiv 2024) [Paper]
  • Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents (Arxiv 2024) [Paper]
  • DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (Arxiv 2024) [Paper] [Github]
  • GenAD: Generative End-to-End Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • Generalized Predictive Model for Autonomous Driving (CVPR 2024) [Paper]
  • AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving (Arxiv 2024) [paper]
  • DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving (Arxiv 2024) [Paper]
  • SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control (Arxiv 2024) [Paper] [Project]
  • DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation (Arxiv 2024) [Paper] [Project] [Github]
  • DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models (ICLR 2024) [Paper]
  • OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning (Arxiv 2024) [Paper]
  • GAD-Generative Learning for HD Map-Free Autonomous Driving (Arxiv 2024) [Paper]
  • Guiding Attention in End-to-End Driving Models (Arxiv 2024) [Paper]
  • Probing Multimodal LLMs as World Models for Driving (Arxiv 2024) [Paper]
  • Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models (Arxiv 2024) [Paper]
  • Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving (Arxiv 2024) [Paper]
  • Unified End-to-End V2X Cooperative Autonomous Driving (Arxiv 2024) [paper]
  • DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving (Arxiv 2024) [paper]
  • MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving (Arxiv 2024) [Paper]
  • MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes (Arxiv 2024) [Paper]
  • Language-Image Models with 3D Understanding (Arxiv 2024) [paper] [Project]
  • Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? (Arxiv 2024) [Paper]
  • GFlow: Recovering 4D World from Monocular Video (Arxiv 2024) [Paper] [Github]
  • Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability (Arxiv 2024) [Paper] [Github] [Project]
  • OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving (Arxiv 2024) [Paper] [Github] [Project]
  • DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences (Arxiv 2024) [Paper] [Github]
  • AD-H: Autonomous Driving with Hierarchical Agents (Arxiv 2024) [Paper]
  • Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • A Superalignment Framework in Autonomous Driving with Large Language Models (Arxiv 2024) [Paper]
  • Enhancing End-to-End Autonomous Driving with Latent World Model (Arxiv 2024) [Paper]
  • SimGen: Simulator-conditioned Driving Scene Generation (Arxiv 2024) [paper]
  • Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset (Arxiv 2024) [paper] [Project]
  • WonderWorld: Interactive 3D Scene Generation from a Single Image (Arxiv 2024) [Paper]
  • CarLLaVA: Vision language models for camera-only closed-loop driving (Arxiv 2024) [Paper]
  • End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation (Arxiv 2024) [paper]
  • BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space (Arxiv 2024) [Paper] [Github]
  • Exploring the Causality of End-to-End Autonomous Driving (Arxiv 2024) [paper] [Github]
  • SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving (Arxiv 2024) [Paper]
  • DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving (Arxiv 2024) [Paper]
  • Open 3D World in Autonomous Driving (Arxiv 2024) [Paper]
  • CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving (Arxiv 2024) [Paper]
  • Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving (Arxiv 2024) [Paper]
  • DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving (Arxiv 2024) [Paper]
  • OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving (Arxiv 2024) [Paper]
  • Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving (Arxiv 2024) [Paper]
  • ContextVLM: Zero-Shot and Few-Shot Context Understanding for Autonomous Driving using Vision Language Models (ITSC 2024) [Paper]
  • MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving (Arxiv 2024) [paper]
  • RenderWorld: World Model with Self-Supervised 3D Label (Arxiv 2024) [Paper]
  • Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving (Arxiv 2024) [Paper]
  • DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input (Arxiv 2024) [Paper] [Project] [Github]
  • METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance (Arxiv 2024) [paper]
  • Does End-to-End Autonomous Driving Really Need Perception Tasks? (Arxiv 2024) [Paper]
  • Learning to Drive via Asymmetric Self-Play (Arxiv 2024) [Paper]
  • Uncertainty-Guided Enhancement on Driving Perception System via Foundation Models (Arxiv 2024) [paper]
  • ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding (Arxiv) [Paper]
  • HE-Drive: Human-Like End-to-End Driving with Vision Language Models (Arxiv 2024) [Paper] [Project]
  • UniDrive: Towards Universal Driving Perception Across Camera Configurations (Arxiv 2024) [Paper] [Github]
  • DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation (Arxiv 2024) [paper] [Github] [Project]
  • DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model (NeurIPS 2024) [Paper] [Project] [Github]
  • EMMA: End-to-End Multimodal Model for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control (Arxiv 2024) [Paper] [Github] [Project]
  • DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving (Arxiv 2024) [paper] [Github]
  • VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving (Arxiv 2024) [Paper]
  • Language Driven Occupancy Prediction (Arxiv 2024) [paper] [Github]
  • Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention (Arxiv 2024) [Paper] [Project]
  • InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models (Arxiv 2024) [Paper] [Github]
  • Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model (Arxiv 2024) [paper] [Github]
  • UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving (Arxiv 2024) [paper]
  • Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • DriveMM: All-in-One Large Multimodal Model for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • Physical Informed Driving World Model (Arxiv 2024) [Paper] [Github]
  • GPD-1: Generative Pre-training for Driving (Arxiv 2024) [Paper] [Github]
  • Doe-1: Closed-Loop Autonomous Driving with Large World Model (Arxiv 2024) [Paper] [Github]
  • GaussianAD: Gaussian-Centric End-to-End Autonomous Driving (Arxiv 2024) [paper] [Github]
  • DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout (NeurIPS 2024) [paper]
  • Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models (Arxiv 2024) [Paper] [Github]
  • An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training (Arxiv 2024) [paper]
  • OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving (Arxiv 2024) [Paper]
  • AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving (Arxiv 2024) [Paper] [Github]
  • VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision (Arxiv 2024) [paper]
  • DriveGPT: Scaling Autoregressive Behavior Models for Driving (Arxiv 2024) [paper]
  • DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers (Arxiv 2024) [paper]
  • UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision (Arxiv 2024) [paper]
  • DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT (Arxiv 2024) [paper] [Github]
  • AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data (Arxiv 2025) [Paper] [Github]
  • DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests (Arxiv 2025) [Paper]
  • Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives (Arxiv 2025) [paper] [Github]
  • Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving (Arxiv 2025) [paper]
  • Distilling Multi-modal Large Language Models for Autonomous Driving (Arxiv 2025) [Paper]
  • A Survey of World Models for Autonomous Driving (Arxiv 2025) [Paper]
  • HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation (Arxiv 2025) [Paper]
  • SSF: Sparse Long-Range Scene Flow for Autonomous Driving (Arxiv 2025) [Paper]
  • V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models (Arxiv 2025) [Paper]
  • MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction (Arxiv 2025) [paper]
  • The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey (Arxiv 2025) [Paper]
  • Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning (Arxiv 2025) [Paper]
  • VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion (Arxiv 2025) [Paper]
  • VDT-Auto: End-to-end Autonomous Driving with VLM-Guided Diffusion Transformers (Arxiv 2025) [Paper]
  • FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering (Arxiv 2025) [Paper]
  • BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving (Arxiv 2025) [Paper]
  • GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving (Arxiv 2025) [Paper] [Github]
  • AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning (Arxiv 2025) [Paper] [Github]
  • CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting (Arxiv 2025) [Paper]
  • CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving (Arxiv 2025) [Paper] [Github]
  • HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder (Arxiv 2025) [paper]
  • DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving (Arxiv 2025) [Paper]
  • SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment (Arxiv 2025) [Paper]
  • Post-interactive Multimodal Trajectory Prediction for Autonomous Driving (Arxiv 2025) [Paper]
  • Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation (Arxiv 2025) [Paper]
  • DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding (Arxiv 2025) [Paper] [Github]
  • Unlock the Power of Unlabeled Data in Language Driving Model (Arxiv 2025) [Paper]
  • Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback (Arxiv 2025) [Paper]
  • DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models (Arxiv 2025) [Paper]
  • Active Learning from Scene Embeddings for End-to-End Autonomous Driving (Arxiv 2025) [Paper]
  • Centaur: Robust End-to-End Autonomous Driving with Test-Time Training (Arxiv 2025) [Paper]
  • InsightDrive: Insight Scene Representation for End-to-End Autonomous Driving (Arxiv 2025) [Paper]
  • Hydra-MDP++: Advancing End-to-End Driving via Expert-Guided Hydra-Distillation (Arxiv 2025) [Paper]
  • Tracking Meets Large Multimodal Models for Driving Scenario Understanding (Arxiv 2025) [Paper]
  • ChatBEV: A Visual Language Model that Understands BEV Maps (Arxiv 2025) [paper]
  • RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving (Arxiv 2025) [paper]
  • Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning (CVPR 2025) [Paper]
  • GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving (Arxiv 2025) [Paper]

Other

  • LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving (Arxiv 2025) [Paper] [Github]
  • X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios (Arxiv 2024) [Paper] [Github]
  • Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views (AAAI 2021) [Paper] [Github] [Project]
  • Trans4Map: Revisiting Holistic Bird’s-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers (WACV 2023) [Paper]
  • ViewBirdiformer: Learning to recover ground-plane crowd trajectories and ego-motion from a single ego-centric view (Arxiv 2022) [paper]
  • 360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View (Arxiv 2023) [Paper] [Github] [Project]
  • F2BEV: Bird's Eye View Generation from Surround-View Fisheye Camera Images for Automated Driving (Arxiv 2023) [Paper]
  • NVAutoNet: Fast and Accurate 360° 3D Visual Perception For Self Driving (Arxiv 2023) [Paper]
  • FedBEVT: Federated Learning Bird's Eye View Perception Transformer in Road Traffic Systems (Arxiv 2023) [Paper]
  • Aligning Bird-Eye View Representation of PointCloud Sequences using Scene Flow (IEEE IV 2023) [Paper] [Github]
  • MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird’s Eye View based Appearance and Motion Features (Arxiv 2023) [Paper]
  • WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models (Arxiv 2023) [Paper] [Github] [Project]
  • Leveraging BEV Representation for 360-degree Visual Place Recognition (Arxiv 2023) [Paper]
  • NMR: Neural Manifold Representation for Autonomous Driving (Arxiv 2023) [Paper]
  • V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer (ECCV 2022) [Paper] [Github]
  • DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection (CVPR 2022) [Paper] [Github]
  • Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task (CVPR 2022) [Paper] [Github] [Project]
  • A Motion and Accident Prediction Benchmark for V2X Autonomous Driving (Arxiv 2023) [Paper] [Project]
  • BEVBert: Multimodal Map Pre-training for Language-guided Navigation (ICCV 2023) [Paper]
  • V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting (Arxiv 2023) [Paper] [Github] [Project]
  • BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image (CVPR 2023) [paper] [Github]
  • BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird’s-Eye-View in Dynamic Scenarios (Arxiv 2023) [Paper]
  • Bird’s-Eye-View Scene Graph for Vision-Language Navigation (Arxiv 2023) [paper]
  • OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data (Arxiv 2023) [paper]
  • Hidden Biases of End-to-End Driving Models (ICCV 2023) [Paper] [Github](https://github.com/autonomousvision/carla_garage)
  • EgoVM: Achieving Precise Ego-Localization using Lightweight Vectorized Maps (Arxiv 2023) [Paper]
  • End-to-end Autonomous Driving: Challenges and Frontiers (Arxiv 2023) [paper] [Github]
  • BEVPlace: Learning LiDAR-based Place Recognition using Bird’s Eye View Images (ICCV 2023) [paper]
  • I2P-Rec: Recognizing Images on Large-scale Point Cloud Maps through Bird’s Eye View Projections (IROS 2023) [Paper]
  • Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving (Arxiv 2023) [Paper] [Project]
  • BEV-DG: Cross-Modal Learning under Bird’s-Eye View for Domain Generalization of 3D Semantic Segmentation (ICCV 2023) [paper]
  • MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models (ICCV 2023) [Paper] [Github] [Project]
  • Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding (ECCV 2020) [Paper] [Github]
  • Occ2Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions (ICCV 2023) [Paper]
  • QUEST: Query Stream for Vehicle-Infrastructure Cooperative Perception (Arxiv 2023) [paper]
  • Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction (Arxiv 2023) [Paper]
  • SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection (Arxiv 2023) [paper]
  • Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review (Arxiv 2023) [Paper]
  • BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving (Arxiv 2023) [paper]
  • BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (Arxiv 2023) [Paper]
  • Towards Vehicle-to-everything Autonomous Driving: A Survey on Collaborative Perception (Arxiv 2023) [paper]
  • PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds (Arxiv 2023) [paper]
  • BEVTrack: A Simple Baseline for 3D Single Object Tracking in Bird's-Eye-View (Arxiv 2023) [Paper] [Github]
  • BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation (Arxiv 2023) [Paper]
  • UC-NeRF: Neural Radiance Field for Under-Calibrated Multi-view Cameras in Autonomous Driving (Arxiv 2023) [paper] [Project] [Github]
  • All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes (Arxiv 2023) [paper]
  • BEVSeg2TP: Surround View Camera Bird’s-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction (Arxiv 2023) [Paper]
  • BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout (Arxiv 2023) [Paper]
  • EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI (Arxiv 2023) [Paper] [Github]
  • A Vision-Centric Approach for Static Map Element Annotation (Arxiv 2023) [paper]
  • C-BEV: Contrastive Bird’s Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation (Arxiv 2023) [paper]
  • Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals (Arxiv 2024) [Paper]
  • GeoDecoder: Empowering Multimodal Map Understanding (Arxiv 2024) [Paper]
  • Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird’s-Eye-View (Arxiv 2024) [Paper]
  • Text2Street: Controllable Text-to-image Generation for Street Views (Arxiv 2024) [paper]
  • Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps (Arxiv 2024) [Paper]
  • BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues (Arxiv 2024) [paper]
  • OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation (Arxiv 2024) [paper]
  • Bosch Street Dataset: A Multi-Modal Dataset with Imaging Radar for Automated Driving (Arxiv 2024) [paper]
  • Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion (Arxiv 2024) [Paper] [Github]
  • M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving (Arxiv 2024) [Paper]
  • MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors (Arxiv 2024) [Paper]
  • Window-to-Window BEV Representation Learning for Limited FoV Cross-View Geo-localization (Arxiv 2024) [Paper]
  • MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps (Arxiv 2024) [Paper]
  • Neural Semantic Map-Learning for Autonomous Vehicles (Arxiv 2024) [Paper]
  • AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction (Arxiv 2024) [Paper] [Project]
  • MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability (Arxiv 2024) [paper] [Github]
  • SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm (Arxiv 2024) [Paper] [Github]
  • UrbanWorld: An Urban World Model for 3D City Generation (Arxiv 2024) [Paper]
  • From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model (ICRA 2024) [Paper]
  • Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation (Arxiv 2024) [paper]
  • DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation (Arxiv 2024) [Paper] [Project]
  • BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving (Arxiv 2024) [Paper]
  • Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model (Arxiv 2024) [Paper]
  • RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models (Arxiv 2024) [Paper]
  • OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving (Arxiv 2024) [paper]
  • Hidden Biases of End-to-End Driving Datasets (Arxiv 2024) [paper] [Github]
  • Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization (Arxiv 2024) [Paper]
  • VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving (Arxiv 2024) [Paper] [Project] [Github]
  • OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving (Arxiv 2024) [Paper] [Project] [Github]
  • DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes (Arxiv 2024) [paper] [Project] [Github]
  • HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving (Arxiv 2024) [Paper]
  • Joint Perception and Prediction for Autonomous Driving: A Survey (Arxiv 2024) [Paper]
  • 3DLabelProp: Geometric-Driven Domain Generalization for LiDAR Semantic Segmentation in Autonomous Driving (Arxiv 2025) [Paper]
  • Range and Bird’s Eye View Fused Cross-Modal Visual Place Recognition (Arxiv 2025) [paper]
  • Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction (Arxiv 2025) [Paper]
  • BEVDiffLoc: End-to-End LiDAR Global Localization in BEV View based on Diffusion Model (Arxiv 2025) [Paper] [Github]
  • Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments (Arxiv 2025) [Paper] [Project] [Github]
  • RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation (Arxiv 2025) [paper]
