VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, and Yu Qiao
Nanjing University, Shanghai AI Lab, CAS
[2024.09.19] Checkpoints have been migrated to Hugging Face. You can obtain the weights from VideoMAEv2-hf (see the download sketch after this news list).
[2023.05.29] VideoMAE V2-g features for the THUMOS14 and FineAction datasets are now available in TAD.md.
[2023.05.11] Testing of our distilled models is now supported in MMAction2 (dev version)! See PR#2460.
[2023.05.11] The feature extraction script for TAD datasets has been released! See the instructions in TAD.md.
[2023.04.19] ViT-giant model weights have been released! You can get the download links from MODEL_ZOO.md.
[2023.04.18] Code and the distilled models (ViT-S & ViT-B) have been released!
[2023.04.03] ~~Code and models will be released soon.~~
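If you want to fetch the migrated weights programmatically, the snippet below is a minimal sketch using `huggingface_hub`. The repository id and file name are hypothetical placeholders; replace them with the checkpoint you pick from the VideoMAEv2-hf collection.

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo id and file name: replace them with the entry you
# choose from the VideoMAEv2-hf collection on Hugging Face.
repo_id = "OpenGVLab/VideoMAEv2-Base"      # placeholder example
filename = "vit_b_k710_dl_from_giant.pth"  # placeholder example

ckpt_path = hf_hub_download(repo_id=repo_id, filename=filename)
print("Checkpoint downloaded to:", ckpt_path)
```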
Model weights are provided in MODEL_ZOO.md, which also includes the distilled models summarized below (a minimal loading sketch follows the table).
Model | Dataset | Teacher Model | #Frames x Clips x Crops | K710 Top-1 (%) | K400 Top-1 (%) | K600 Top-1 (%) |
---|---|---|---|---|---|---|
ViT-small | K710 | vit_g_hybrid_pt_1200e_k710_ft | 16x5x3 | 77.6 | 83.7 | 83.1 |
ViT-base | K710 | vit_g_hybrid_pt_1200e_k710_ft | 16x5x3 | 81.5 | 86.6 | 85.9 |
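To sanity-check a downloaded distilled checkpoint before fine-tuning or evaluation, you can inspect its state dict with plain PyTorch. This is a minimal sketch; the file name is a hypothetical example, and the wrapping key (`module`/`model`) may differ between checkpoints.

```python
import torch

# Hypothetical local file name: point this at the distilled checkpoint you
# downloaded via MODEL_ZOO.md or the Hugging Face snippet above.
ckpt_path = "vit_b_k710_dl_from_giant.pth"

ckpt = torch.load(ckpt_path, map_location="cpu")

# Checkpoints are often wrapped under a 'module' or 'model' key; fall back
# to the raw dict if neither is present.
state_dict = ckpt.get("module", ckpt.get("model", ckpt))

print(f"{len(state_dict)} parameter tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(f"{name}: {tuple(tensor.shape)}")
```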
Please follow the instructions in INSTALL.md.
Please follow the instructions in DATASET.md for data preparation.
The pre-training instructions are in PRETRAIN.md.
The fine-tuning instructions are in FINETUNE.md.
If you find this repository useful, please use the following BibTeX entries for citation.
@InProceedings{wang2023videomaev2,
  author    = {Wang, Limin and Huang, Bingkun and Zhao, Zhiyu and Tong, Zhan and He, Yinan and Wang, Yi and Wang, Yali and Qiao, Yu},
  title     = {VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023},
  pages     = {14549-14560}
}

@misc{videomaev2,
  title         = {VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking},
  author        = {Limin Wang and Bingkun Huang and Zhiyu Zhao and Zhan Tong and Yinan He and Yi Wang and Yali Wang and Yu Qiao},
  year          = {2023},
  eprint        = {2303.16727},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}