Limin Wang

Limin Wang (王利民)
Multimedia Computing Group
Department of Computer Science and Technology
Nanjing University
Office: CS Building 506
Email: lmwang.nju [at] gmail.com

About Me (CV)

I am a Professor in the Department of Computer Science and Technology, and am also affiliated with the State Key Laboratory for Novel Software Technology, Nanjing University.

Previously, I received the B.S. degree from Nanjing University in 2011, and the Ph.D. degree from The Chinese University of Hong Kong under the supervision of Prof. Xiaoou Tang in 2015. From 2015 to 2018, I was a Post-Doctoral Researcher with Prof. Luc Van Gool in the Computer Vision Laboratory (CVL) at ETH Zurich.

News

2026-05-04: Three papers are accepted by ICML.
2026-03-30: SparseBEV++ is accepted by TPAMI.
2026-03-30: CompViT is accepted by IJCV.
2026-03-30: MoG is accepted by SCIS.
2026-02-21: Eight papers are accepted by CVPR 2026.
2026-01-30: We open-source the Video-o3 project: a framework of interleaved clue seeking for long video multi-hop reasoning.
2026-01-27: Eight papers are accepted by ICLR 2026.
2025-12-15: We open-source several interesting projects: SAM2++, Sora2-mini, SteadyDancer.
2025-12-01: We release InternVideo-Next: a new Video Foundation Model for world modeling.
2025-09-18: Six papers are accepted by NeurIPS 2025.
2025-07-30: Two papers are accepted by T-PAMI.
2025-06-27: Five papers are accepted by ICCV 2025.
2025-05-01: Two papers are accepted by ICML 2025.
2025-04-19: Prof. Limin Wang is invited to be an Associate Editor of T-PAMI.
2025-03-28: JointFormer is accepted by T-PAMI.
2025-02-27: Six papers are accepted by CVPR 2025.
2025-01-23: Six papers are accepted by ICLR 2025.
2024-12-12: The extension of PDPP is accepted by T-PAMI.
2024-09-26: Four papers are accepted by NeurIPS 2024.
2024-07-01: InternVideo2 is accepted by ECCV 2024.
2024-07-01: Six papers are accepted by ECCV 2024 on video foundation model & image generation etc.
2024-06-01: Our MixFormer is selected as the Featured Article of TPAMI.
2024-04-12: The VLG work is accepted by IJCV.
2024-04-07: The extension of STMixer is accepted by T-PAMI.
2024-02-27: Ten papers are accepted by CVPR 2024 on video foundation model & video generation & benchmarks etc.
2024-01-15: Two papers (InternVid & SparseFormer) are accepted by ICLR 2024.
2024-01-01: The extension of MixFormer is accepted by T-PAMI.
2023-11-30: Our LogN is accepted by IJCV.
2023-11-01: Our CamLiFlow is accepted by T-PAMI.
2023-11-01: Our RefineTAD receives the Best Paper Honorable Mention Award of ACM MM 2023.
2023-10-25: Our Dynamic MDETR is accepted by T-PAMI.
2023-09-22: Our MixFormer V2 is accepted by NeurIPS 2023.
2023-09-02: One paper on crowded pose estimation is accepted by IJCV.
2023-07-21: Our survey paper on 3D human mesh recovery is accepted by T-PAMI.
2023-07-14: Our UMT Foundation Model is accepted by ICCV 2023.
2023-07-14: Our SportsMOT dataset is accepted by ICCV 2023.
2023-07-14: Ten papers are accepted by ICCV 2023 (Topics: Video foundation models, action detection and anticipation, multi-object tracking, (3D) object detection, new dataset.)
2023-07-13: We release the InternVid dataset for multi-modal video understanding and generation.
2023-06-25: We release the Grasp Anything project for embodied AI by leveraging vision foundation models.
2023-06-15: Prof. Limin Wang is invited to be an Editorial Board Member of IJCV.
2023-06-10: I am invited to give an ARP talk at VALSE 2023 (slide).
2023-05-25: We propose the MixFormer V2, a real-time object tracker. We have released the source code.
2023-05-19: Temporal Perceiver is accepted by T-PAMI. We have released the source code.
2023-05-10: We present the VideoChat system, by combining video foundation model and LLM.
2023-03-18: We propose the VideoMAE V2, training the first billion-level video transformer (source code).
2023-03-01: Five papers on video understanding and point cloud analysis are accepted by CVPR 2023.
2023-02-01: One paper is accepted by ICLR 2023 and one by AAAI 2023.
2022-11-01: The FineAction dataset is accepted by TIP.
2022-10-09: The extension of LIP is accepted by IJCV.
2022-09-15: VideoMAE and PointTAD are accepted by NeurIPS 2022.
2022-09-15: We present the BasicTAD, an end-to-end TAD baseline method. We have released the source code.
2022-08-10: One paper is accepted by ECCV 2022 and one paper (CDG) is accepted by IJCV.
2022-05-01: We are organizing the second DeeperAction Challenge at ECCV 2022, by introducing five new benchmarks on temporal action localization, multi-actor tracking, spatiotemporal action detection, part-level action parsing, and fine-grained video anomaly recognition.
2022-03-23: We present the VideoMAE, a self-supervised video transformer obtaining SOTA performance on the benchmarks of Kinetics, Something-Something, and AVA. We have released the source code and pre-trained models.
2022-03-02: We present the MixFormer, a compact and efficient object tracker, obtaining SOTA performance on several benchmarks. We have released the source code.
2022-03-02: We present the AdaMixer, a fast-converging query-based object detector, obtaining competitive performance on the MS COCO benchmark. We have released the source code.
2022-03-02: Seven papers on object detection, object tracking, action recognition etc. are accepted by CVPR 2022.
2021-07-25: Eight papers on video understanding are accepted by ICCV 2021: new dataset (MultiSports), backbone (TAM), sampling method (MGSampler), detection frameworks (RTD and TRACE). For more details, please refer to our papers.
2021-07-15: We release the MultiSports dataset for spatiotemporal action detection.
2021-07-15: Our team secures the first place at ACM MM Pre-training for Video Understanding Challenge for Track 2.
2021-06-15: Our team secures the first place at CVPR Kinetics Challenge for Self-Supervised Task.
2021-06-15: Our team secures the first place at CVPR PIC Challenge for Human-Centric Spatio-Temporal Video Grounding Task.
2021-06-01: We are organizing DeeperAction Challenge at ICCV 2021, by introducing three new benchmarks on temporal action localization, spatiotemporal action detection, and part-level action parsing.
2021-04-20: The extension of TRecgNet is accepted by IJCV.
2021-04-07: We propose a target transformer for accurate anchor-free tracking, termed as TREG (code).
2021-04-07: We present a transformer decoder for direct action proposal generation, termed as RTD-Net (code).
2021-03-01: Two papers on action recognition and point cloud segmentation are accepted by CVPR 2021.
2020-12-30: We propose a new video architecture using temporal difference, termed as TDN, and release the code.
2020-07-03: Three papers on action detection and segmentation are accepted by ECCV 2020.
2020-06-28: Our proposed DSN, a dynamic version of TSN for efficient action recognition, is accepted by TIP.
2020-05-14: We propose a temporal adaptive module for video recognition, termed as TAM, and release the code.
2020-04-16: The code of our published papers will be made available at Github: MCG-NJU.
2020-04-16: We propose a fully convolutional online tracking framework, termed as FCOT, and release the code.
2020-03-10: Our proposed temporal module TEA is accepted by CVPR 2020.
2020-01-20: We propose an efficient video representation learning framework, termed as CPD, and release the code.
2020-01-15: We present an anchor-free action tubelet detector, termed as MOC-Detector, and release the code.
2019-12-20: Our proposed V4D, a principled video-level representation learning framework, is accepted by ICLR 2020.
2019-11-21: Our proposed TEINet, an efficient video architecture for video recognition, is accepted by AAAI 2020.
2019-07-23: Our proposed LIP, a general alternative to average or max pooling, is accepted by ICCV 2019.
2019-03-15: Two papers are accepted by CVPR 2019: one for group activity recognition and one for RGB-D transfer learning.
2018-08-19: One paper is accepted by ECCV 2018 and one (TSN) by T-PAMI.
2018-04-01: I join Nanjing University as a faculty member at the Department of Computer Science and Technology.
2017-11-28: We released a recent work on video architecture design for spatiotemporal feature learning. [ arXiv ] [ Code ].
2017-09-08: We have released the TSN models learned on the Kinetics dataset. These models transfer well to the existing datasets for action recognition and detection [ Link ].
2017-09-01: One paper is accepted by ICCV 2017 and one (OS2E-CNN) by IJCV.
2017-07-18: I am invited to give a talk at the Workshop on Frontiers of Video Technology-2017 [ Slide ].
2017-03-28: I am co-organizing the CVPR 2017 workshop and challenge on Visual Understanding by Learning from Web Data. For more details, please see the workshop page and challenge page.
2017-02-28: Two papers are accepted by CVPR 2017.
2016-12-20: We release the code and models for SR-CNN paper [ Code ].
2016-10-05: We release the code and models for Places2 scene recognition challenge [ arXiv ] [ Code ].
2016-08-03: Code and models of Temporal Segment Networks are released [ arXiv ] [ Code ].
2016-07-15: One paper is accepted by ECCV 2016 and one by BMVC 2016.
2016-06-16: Our team secures the 1st place for untrimmed video classification at ActivityNet Challenge 2016 [ Result ].
Basically, our solution is based on our works of Temporal Segment Networks (TSN) and Trajectory-pooled Deep-convolutional Descriptors (TDD).
2016-03-01: Two papers are accepted by CVPR 2016.
2015-12-10: Our SIAT_MMLAB team secures the 2nd place for scene recognition at ILSVRC 2015 [ Result ].
2015-09-30: We rank 3rd for cultural event recognition on ChaLearn Looking at People challenge, at ICCV 2015.
2015-08-07: We release the Places205-VGGNet models [ Link ].
2015-07-22: Code of Trajectory-Pooled Deep-Convolutional Descriptors (TDD) is released [ Link ].
2015-07-15: Very deep two stream ConvNets are proposed for action recognition [ Link ].
2015-03-15: We are the 1st winner of both tracks for action recognition and cultural event recognition, on ChaLearn Looking at People Challenge at CVPR 2015.
2015-03-03: One paper is accepted by CVPR 2015, details coming soon.
2014-09-05: We rank 4th for action recognition and 2nd for action detection, on THUMOS'14 Challenge at ECCV 2014.
2014-06-16: Two papers are accepted by ECCV 2014.
2014-06-10: We are the 1st winner of both track 1 and track 2, and rank 4th for track 3, on ChaLearn Looking at People Challenge at ECCV 2014.
2014-05-20: A comprehensive study paper on action recognition [ Link ].
2014-05-16: New homepage on Github launched!

Selected Publications [ Full List ] [ Google Scholar ] [ Github: MCG-NJU ]

SparseBEV: A Fully Sparse Framework for Multi view 3D Object Detection
Y. Chen, H. Liu, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2026.
[ Paper ] [ Code ]

Deep Equilibrium Object Detection and Segmentation
S. Wang, Y. Teng, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[ Paper ] [ Code ]

CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
L. Chen, Z. Tong, Y. Song, G. Wu, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[ Paper ] [ Code ]

JointFormer: A Unified Framework with Joint Modeling for Video Object Segmentation
J. Zhang, Y. Cui, G. Wu, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[ Paper ] [ Code ]

PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
H. Wang, Y. Wu, S. Guo, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[ Paper ] [ Code ]

STMixer: A One-Stage Sparse Action Detector
T. Wu, M. Cao, Z. Gao, G. Wu, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
[ Paper ] [ Code ]

MixFormer: End-to-End Tracking with Iterative Mixed Attention
Y. Cui, C. Jiang, G. Wu, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
[ Paper ] [ Code ]

Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion
H. Liu, T. Lu, Y. Xu, J. Liu, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
[ Paper ] [ Code ]

Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
F. Shi, R. Gao, W. Huang, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
[ Paper ] [ Code ]

Recovering 3D Human Mesh from Monocular Images: A Survey
Y. Tian, H. Zhang, Y. Liu, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
[ Paper ] [ Code ]

Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
J. Tan, Y. Wang, G. Wu, L. Wang
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
[ Paper ] [ Code ]

CompViT: Real-Time Compressed Video Action Recognition with Asymmetric Transformer Networks
T. Wu, S. Chen, L. Mi, W. Wang, H. Dai, L. Wang
in International Journal of Computer Vision (IJCV), 2026.
[ Paper ] [ Code ]

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
G. Chen, Y. Huang, J. Xu, B. Pei, J. Wang, Z. Chen, Z. Li, T. Lu, L. Wang
in International Journal of Computer Vision (IJCV), 2026.
[ Paper ] [ Code ]

Progressive Visual Prompt Learning with Contrastive Feature Re-formation
C. Xu, Y. Zhu, H. Shen, B. Chen, Y. Liao, X. Chen, L. Wang
in International Journal of Computer Vision (IJCV), 2025.
[ Paper ] [ Code ]

VLG: General Video Recognition with Web Textual Knowledge
J. Lin, Z. Liu, W. Wang, W. Wu, L. Wang
in International Journal of Computer Vision (IJCV), 2024.
[ Paper ] [ Code ]

Logit Normalization for Long-tail Object Detection
L. Zhang, Y. Teng, L. Wang
in International Journal of Computer Vision (IJCV), 2024.
[ Paper ] [ Code ]

Dual Graph Networks for Pose Estimation in Crowded Scenes
J. Tu, G. Wu, L. Wang
in International Journal of Computer Vision (IJCV), 2024.
[ Paper ] [ Code ]

LIP: Local Importance-based Pooling
Z. Gao, L. Wang, G. Wu
in International Journal of Computer Vision (IJCV), 2023.
[ Paper ] [ Code ]

Cross-Domain Gated Learning for Domain Generalization
D. Du, J. Chen, Y. Li, K. Ma, G. Wu, Y. Zheng, L. Wang
in International Journal of Computer Vision (IJCV), 2022.
[ Paper ] [ Code ]

Cross-Modal Pyramid Translation for RGB-D Scene Recognition
D. Du, L. Wang, Z. Li, G. Wu
in International Journal of Computer Vision (IJCV), 2021.
[ Paper ] [ Code ]

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Y. Wang, K. Li, X. Li, J. Yu, Y. He, G. Chen, B. Pei, R. Zheng, J. Xu, Z. Wang, Y. Shi, T. Jiang, S. Li, H. Zhang, Y. Huang, Y. Qiao, Y. Wang, L. Wang
in European Conference on Computer Vision (ECCV), 2024.
[ Paper ] [ Code ]
SOTA performance on more than 60 video understanding tasks.

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
L. Wang, B. Huang, Z. Zhao, Z. Tong, Y. He, Y. Wang, Y. Wang, Y. Qiao
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[ Paper ] [ Code ]

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Z. Tong, Y. Song, J. Wang, L. Wang
in Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), 2022.
[ Paper ] [ Code ]

MixFormer: End-to-End Tracking with Iterative Mixed Attention
Y. Cui, C. Jiang, L. Wang, G. Wu
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[ Paper ] [ Code ]

AdaMixer: A Fast-Converging Query-Based Object Detector
Z. Gao, L. Wang, B. Han, S. Guo
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[ Paper ] [ Code ]

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Y. Li, L. Chen, R. He, Z. Wang, G. Wu, L. Wang
in IEEE International Conference on Computer Vision (ICCV), 2021.
[ Paper ] [ Data ] [ Code ] [ Challenge ]

TDN: Temporal Difference Networks for Efficient Action Recognition
L. Wang, Z. Tong, B. Ji, G. Wu
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[ Paper ] [ Code ]

Temporal Action Detection with Structured Segment Networks
Y. Zhao, Y. Xiong, L. Wang, Z. Wu, X. Tang, and D. Lin
in International Journal of Computer Vision (IJCV), 2020.
[ Paper ] [ Code ]

Actions as Moving Points
Y. Li, Z. Wang, L. Wang, G. Wu
in European Conference on Computer Vision (ECCV), 2020.
[ Paper ] [ Code ]

Temporal Segment Networks for Action Recognition in Videos
L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool
in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.
[ Paper ] [ Code ]

Transferring Deep Object and Scene Representations for Event Recognition in Still Images
L. Wang, Z. Wang, Y. Qiao, and L. Van Gool
in International Journal of Computer Vision (IJCV), 2018.
[ Paper ] [ Code ]
SOTA performance for event recognition on the ChaLearn LAP cultural event and WIDER datasets.

Appearance-and-Relation Networks for Video Classification
L. Wang, W. Li, W. Li, and L. Van Gool
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[ Paper ] [ Code ]
A new architecture for spatiotemporal feature learning.

UntrimmedNets for Weakly Supervised Action Recognition and Detection
L. Wang, Y. Xiong, D. Lin, and L. Van Gool
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[ Paper ] [ BibTex ] [ Code ]
An end-to-end architecture to learn from untrimmed videos.

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool
in European Conference on Computer Vision (ECCV), 2016.
[ Paper ] [ BibTex ] [ Poster ] [ Code ] [ Journal Version ]
Proposing a segmental architecture and obtaining the state-of-the-art performance on UCF101 and HMDB51.

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors
L. Wang, Y. Qiao, and X. Tang
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[ Paper ] [ BibTex ] [ Extended Abstract ] [ Poster ] [ Project Page ] [ Code ]
State-of-the-art performance: HMDB51: 65.9%, UCF101: 91.5%.

Contests

ActivityNet Large Scale Activity Recognition Challenge, 2016: Untrimmed Video Classification, Rank: 1/24.
ImageNet Large Scale Visual Recognition Challenge, 2015: Scene Recognition, Rank: 2/25.
ChaLearn Looking at People Challenge, 2015, Rank: 1/6.
THUMOS Action Recognition Challenge, 2015, Rank: 5/11.
ChaLearn Looking at People Challenge, 2014, Rank: 1/6, 4/17.
THUMOS Action Recognition Challenge, 2014, Rank: 4/14, 2/3.
ChaLearn Multi-Modal Gesture Recognition Challenge, 2013, Rank: 4/54.
THUMOS Action Recognition Challenge, 2013, Rank: 4/16.

Academic Service

Journal Reviewer

IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Image Processing

IEEE Transactions on Multimedia

IEEE Transactions on Circuits and Systems for Video Technology

Pattern Recognition

Pattern Recognition Letters

Image and Vision Computing

Computer Vision and Image Understanding

Conference Reviewer

IEEE Conference on Computer Vision and Pattern Recognition, 2017

IEEE International Conference on Automatic Face and Gesture Recognition, 2017

European Conference on Computer Vision, 2016

Asian Conference on Computer Vision, 2016

International Conference on Pattern Recognition, 2016

Friends

Wen Li (ETH), Jie Song (ETH), Sheng Guo (Malong), Weilin Huang (Malong), Bowen Zhang (USC), Zhe Wang (UCI), Wei Li (Google), Yuanjun Xiong (Amazon), Xiaojiang Peng (SIAT), Zhuowei Cai (Google), Xingxing Wang (NTU)

Last Updated on 15th June, 2023

Published with GitHub Pages