Training RoPE ViT #2557

Answered by rwightman
sinahmr asked this question in Q&A
Jul 25, 2025 · 1 comment · 8 replies

Hello,

I want to train a ViT model with RoPE instead of absolute positional embedding. I noticed the following item in the list of supported models in the README, but I couldn't find out how to access such a model: "ROPE-ViT - https://arxiv.org/abs/2403.13298"

There doesn't seem to be an option for it in vision_transformer.py, and I couldn't find a dedicated .py file for this method either.
Could you please guide me on how to train such a model?

Thank you very much.


Replies: 1 comment · 8 replies


Update: I just noticed that there is a use_rot_pos_emb option in eva.py, and models like vit_base_patch16_rope_mixed_ape_224 are defined in that file. So it seems my problem is resolved; sorry for the premature question.
Just to make sure (I'm unfortunately not familiar with EVA): is training such a model (vit_base_patch16_rope_mixed_ape_224) the correct way to train ROPE-ViT?
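For reference, here's a minimal sketch of instantiating that model via timm.create_model and running a dummy forward pass (assuming the name is registered in a recent timm release; the shapes below are just illustrative):

```python
import torch
import timm

# Build the eva.py-based RoPE ViT mentioned above without pretrained weights,
# i.e. the configuration you'd use for from-scratch training.
model = timm.create_model('vit_base_patch16_rope_mixed_ape_224', pretrained=False, num_classes=1000)

# Dummy forward pass to confirm the model builds and outputs class logits.
x = torch.randn(2, 3, 224, 224)
logits = model(x)
print(logits.shape)  # expected: torch.Size([2, 1000])
```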


@sinahmr yes, currently all of the ViT models with ROPE embeddings in timm are based on the EVA model (eva.py). It was the first to include ROPE embeddings and has been extended to support several variants. It's essentially a ViT with ROPE support (plus an optional abs pos embed) and an optional SwiGLU MLP. The comments in the file have references for the model sources, papers, etc. There are also several timm definitions that I trained with registers, like:

vit_base_patch16_rope_reg1_gap_256.sbb_in1k
vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k
vit_medium_patch16_rope_reg1_gap_256.sbb_in1k
vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k

All of the models are a bit different, even at the same 'size' range. So you'll want to dig into the details a bit and figure out what fits your goals... You'll also want to decide whether to use pretrained weights or train from scratch. The EVA and PE models were pretrained on huge datasets; many of the others just on ImageNet-22k/12k or 1k.
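If it helps, a quick sketch of how one might discover and load these definitions (assuming a recent timm release and, for the pretrained path, network access; the model names are the ones listed above):

```python
import timm

# List the RoPE ViT variants registered in timm (names containing 'rope').
print(timm.list_models('*rope*', pretrained=True))

# Load one of the register-token variants above with its sbb_in1k weights...
model = timm.create_model('vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k', pretrained=True)

# ...or build the same architecture without weights for from-scratch training.
scratch = timm.create_model('vit_betwixt_patch16_rope_reg4_gap_256', pretrained=False)
```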

Answer selected by sinahmr

Thank you very much for the detailed answer!


What is the difference between EvaAttention and AttentionRope?


The beauty of open source, you can read the code...


Sorry, I phrased my question poorly. I read the code and I see that they are different, but I don't understand why EvaAttention wasn't implemented as a special case of AttentionRope. They look very similar to me.

Please correct me if I'm wrong: the conceptual differences seem to be

  1. qkv bias: In EvaAttention there is the option to handle qkv_bias as a separate addition. I have no idea what this achieves, though.
  2. qk bias: In EvaAttention there is no k bias.
  3. Attention mask: The attention mask is handled differently. AttentionRope uses maybe_add_mask and assumes attn_mask contains 0 / -inf values; EvaAttention assumes attn_mask is boolean and converts it to 0 and -inf (see the sketch below).

Everything else seems to be the same, ignoring variable/config names and default values.
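To make point 3 concrete, here's a small standalone sketch of the two mask conventions as described above (plain PyTorch, not the actual timm implementations):

```python
import torch

# Toy attention logits: (batch, queries, keys).
attn = torch.randn(1, 4, 4)

# Additive-float convention (as described for AttentionRope / maybe_add_mask):
# the mask already contains 0 for "keep" and -inf for "block", and is simply added.
float_mask = torch.zeros(1, 4, 4)
float_mask[..., -1] = float('-inf')          # block attention to the last key
out_additive = (attn + float_mask).softmax(dim=-1)

# Boolean convention (as described for EvaAttention): True means "keep";
# the mask is first converted to 0 / -inf, then applied the same way.
bool_mask = torch.tensor([[True, True, True, False]])           # (batch, keys)
converted = torch.zeros_like(attn).masked_fill(~bool_mask.unsqueeze(1), float('-inf'))
out_boolean = (attn + converted).softmax(dim=-1)

assert torch.allclose(out_additive, out_boolean)
```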
