
[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures


OpenGVLab/Vision-RWKV


Vision-RWKV

The official implementation of "Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures".

News 🚀🚀🚀

  • 2025-02-18: A new version of the CUDA code has been added in the cuda_new folder, removing the hard-coded T_MAX.
  • 2025-02-11: 🎊🎊 Vision-RWKV is accepted by ICLR 2025!
  • 2024-04-14: We now support RWKV6 in the classification task, with higher performance!
  • 2024-03-04: We release the code and models of Vision-RWKV.
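Some background on the T_MAX entry: in the original RWKV CUDA kernels, T_MAX is typically fixed when the extension is compiled and bounds the sequence length T that the kernel accepts. For a vision backbone, T is the number of patch tokens, so it grows with the input resolution. The sketch below illustrates that relationship under stated assumptions: 16×16 non-overlapping patches, no class token, and a hypothetical bound of 8192. The helper names are illustrative and do not come from this repository.

```python
from __future__ import annotations

PATCH_SIZE = 16  # assumption: ViT-style 16x16 non-overlapping patches


def num_patch_tokens(height: int, width: int, patch_size: int = PATCH_SIZE) -> int:
    """Sequence length T for one image (hypothetical helper).

    Uses floor division, matching a stride-16 patch convolution with no padding.
    """
    return (height // patch_size) * (width // patch_size)


def fits_t_max(height: int, width: int, t_max: int) -> bool:
    """True if a kernel compiled with a fixed T_MAX could accept this image."""
    return num_patch_tokens(height, width) <= t_max


if __name__ == "__main__":
    T_MAX = 8192  # hypothetical compile-time bound, for illustration only
    for h, w in [(224, 224), (384, 384), (1024, 1024), (2048, 2048)]:
        t = num_patch_tokens(h, w)
        print(f"{h}x{w}: T={t}, fits T_MAX={T_MAX}: {fits_t_max(h, w, T_MAX)}")
```

Under these assumptions a 2048×2048 input yields 16,384 tokens and exceeds the hypothetical bound; a kernel without a hard-coded T_MAX avoids having to recompile for larger inputs.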

Highlights

  • High-Resolution Efficiency: Processes high-resolution images smoothly with a global receptive field.
  • Scalability: Pre-trained on large-scale datasets and scales up stably.
  • Superior Performance: Achieves better performance than ViTs on classification tasks. On dense prediction tasks, surpasses window-based ViTs and is comparable to global-attention ViTs, with lower FLOPs and higher speed.
  • Efficient Alternative: Can serve as an alternative backbone to ViT across a comprehensive range of vision tasks.
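The high-resolution efficiency claim rests on how token mixing scales. The token count T grows quadratically with image side length; full global self-attention costs on the order of T² per layer, while an RWKV-style linear recurrence costs on the order of T. The toy model below makes that arithmetic concrete. It assumes square images and 16-pixel patches and ignores channel width, constant factors, and the FFN, so it is not a FLOPs estimate for VRWKV or ViT.

```python
from __future__ import annotations


def tokens(side: int, patch: int = 16) -> int:
    # Number of patch tokens for a square image (16-pixel patches assumed).
    return (side // patch) ** 2


def relative_mixing_cost(side: int, base_side: int = 224) -> tuple[float, float]:
    """Token-mixing cost relative to base_side under a toy model.

    Returns (attention ratio ~ T**2, linear-recurrence ratio ~ T).
    Constants, channel width and FFN cost are deliberately ignored.
    """
    t, t0 = tokens(side), tokens(base_side)
    return (t * t) / (t0 * t0), t / t0


if __name__ == "__main__":
    for side in (224, 448, 896, 1792):
        attn, linear = relative_mixing_cost(side)
        print(f"{side}px: attention x{attn:.0f}, linear x{linear:.0f}")
```

Going from 224 px to 1792 px multiplies the token count by 64; in this simplified model, global attention's mixing cost grows 4096× while the linear recurrence grows 64×.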

Overview


Schedule

  • Support RWKV6 as VRWKV6
  • Release VRWKV-L
  • Release VRWKV-T/S/B

Model Zoo

Pretrained Models

| Model | Resolution | Pretrain | Download |
|---|---|---|---|
| VRWKV-L | 192 | ImageNet-22K | ckpt |

Image Classification (ImageNet-1K)

| Model | Resolution | #Param | #FLOPs | Top-1 Acc (%) | Download |
|---|---|---|---|---|---|
| VRWKV-T | 224 | 6.2M | 1.2G | 75.1 | ckpt / cfg |
| VRWKV-S | 224 | 23.8M | 4.6G | 80.1 | ckpt / cfg |
| VRWKV-B | 224 | 93.7M | 18.2G | 82.0 | ckpt / cfg |
| VRWKV-L | 384 | 334.9M | 189.5G | 86.0 | ckpt / cfg |
| VRWKV6-T | 224 | 7.6M | 1.6G | 76.6 | ckpt / cfg |
| VRWKV6-S | 224 | 27.7M | 5.6G | 81.1 | ckpt / cfg |
| VRWKV6-B | 224 | 104.9M | 20.9G | 82.6 | ckpt / cfg |

  • VRWKV-L is pretrained on ImageNet-22K and then fine-tuned on ImageNet-1K.
  • We train VRWKV-L with the InternImage codebase for higher training speed.
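To make the RWKV6 gain from the news section concrete, the ImageNet-1K rows can be compared size by size. The snippet below copies the table values into a dictionary (the data layout and the `rwkv6_delta` name are illustrative, not part of the repository) and reports the change in Top-1 accuracy, parameters, and FLOPs from each VRWKV model to its VRWKV6 counterpart.

```python
from __future__ import annotations

# Values copied from the ImageNet-1K table: (params in M, FLOPs in G, Top-1 %)
IMAGENET_1K = {
    "VRWKV-T": (6.2, 1.2, 75.1),
    "VRWKV-S": (23.8, 4.6, 80.1),
    "VRWKV-B": (93.7, 18.2, 82.0),
    "VRWKV6-T": (7.6, 1.6, 76.6),
    "VRWKV6-S": (27.7, 5.6, 81.1),
    "VRWKV6-B": (104.9, 20.9, 82.6),
}


def rwkv6_delta(size: str) -> dict[str, float]:
    """Change from VRWKV-<size> to VRWKV6-<size>, rounded to one decimal."""
    p0, f0, a0 = IMAGENET_1K[f"VRWKV-{size}"]
    p1, f1, a1 = IMAGENET_1K[f"VRWKV6-{size}"]
    return {
        "top1": round(a1 - a0, 1),
        "params_M": round(p1 - p0, 1),
        "flops_G": round(f1 - f0, 1),
    }


if __name__ == "__main__":
    for size in ("T", "S", "B"):
        print(size, rwkv6_delta(size))
```

With the table's numbers, VRWKV6 improves Top-1 accuracy by 1.5 (T), 1.0 (S), and 0.6 (B) points, at the cost of 0.4G, 1.0G, and 2.7G more FLOPs respectively.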

Object Detection with Mask-RCNN head (COCO)

| Model | #Param | #FLOPs | Box AP | Mask AP | Download |
|---|---|---|---|---|---|
| VRWKV-T | 8.4M | 67.9G | 41.7 | 38.0 | ckpt / cfg |
| VRWKV-S | 29.3M | 189.9G | 44.8 | 40.2 | ckpt / cfg |
| VRWKV-B | 106.6M | 599.0G | 46.8 | 41.7 | ckpt / cfg |
| VRWKV-L | 351.9M | 1730.6G | 50.6 | 44.9 | ckpt / cfg |

  • We report the #Param and #FLOPs of the backbone in this table.

Semantic Segmentation with UperNet head (ADE20K)

| Model | #Param | #FLOPs | mIoU | Download |
|---|---|---|---|---|
| VRWKV-T | 8.4M | 16.6G | 43.3 | ckpt / cfg |
| VRWKV-S | 29.3M | 46.3G | 47.2 | ckpt / cfg |
| VRWKV-B | 106.6M | 146.0G | 49.2 | ckpt / cfg |
| VRWKV-L | 351.9M | 421.9G | 53.5 | ckpt / cfg |

  • We report the #Param and #FLOPs of the backbone in this table.

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{duan2024vrwkv,
 title={Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures},
 author={Duan, Yuchen and Wang, Weiyun and Chen, Zhe and Zhu, Xizhou and Lu, Lewei and Lu, Tong and Qiao, Yu and Li, Hongsheng and Dai, Jifeng and Wang, Wenhai},
 journal={arXiv preprint arXiv:2403.02308},
 year={2024}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Vision-RWKV is built with reference to the code of the following projects: RWKV, MMPretrain, MMDetection, MMSegmentation, ViT-Adapter, InternImage. Thanks for their awesome work!
