KempnerInstitute/raptor
Block-Recurrent Dynamics in ViTs (Raptor)


Mozes Jacobs $^{\star1}$ Thomas Fel $^{\star1}$ Richard Hakim $^{\star1}$
Alessandra Brondetta $^{2}$ Demba Ba $^{1,3}$ T. Andy Keller $^{1}$

$^1$Kempner Institute, Harvard University $^2$Osnabrück University $^3$Harvard University


tl;dr: Our work introduces the Block-Recurrent Hypothesis (BRH), motivated by the observation that foundation models like DINOv2 can be rewritten using only two recurrent blocks while recovering 96% of the original accuracy. Building on this framework, we explore a Dynamical Interpretability approach: we interpret token evolution through layers as trajectories and show that they converge into class-dependent angular basins while late-stage updates collapse into low-rank attractors.

Ultimately, the study reveals that Vision Transformers seem to naturally converge toward compact, iterative programs instead of unique layer-by-layer transformations (indicating a lower algorithmic complexity / Kolmogorov complexity).
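To make the block-recurrent rewrite concrete, here is a minimal NumPy sketch of the idea. This is not the actual Raptor architecture: `block` is a toy residual update standing in for a full transformer block, and the iteration counts are illustrative (two shared blocks iterated 7 and 5 times, replacing 12 unique layers).

```python
import numpy as np

def block(x, W):
    # Toy residual update standing in for attention + MLP
    # (hypothetical stand-in, not the actual Raptor block).
    return x + np.tanh(x @ W)

def block_recurrent_forward(x, weights, repeats):
    # Block-recurrent forward pass: block k is applied repeats[k]
    # times in sequence, replacing one unique layer per depth step.
    for W, r in zip(weights, repeats):
        for _ in range(r):
            x = block(x, W)
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))  # 4 tokens, feature dim 16
weights = [rng.normal(scale=0.1, size=(16, 16)) for _ in range(2)]
# Two recurrent blocks covering a 12-layer depth budget: 7 + 5 iterations.
out = block_recurrent_forward(x, weights, repeats=[7, 5])
```

The point of the rewrite is that the parameter count is tied to the number of distinct blocks (here 2), while the effective depth is set by the iteration counts.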


Setup

Environment

To run the code, create a mamba (or conda) environment from the environment.yml file, then activate it:

mamba env create -f environment.yml
mamba activate raptor

Paths

Edit src/paths.py so that it contains the correct absolute paths to the datasets used in the experiments.
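For orientation, a hypothetical layout of src/paths.py might look like the following; the actual variable names in the repository may differ, so treat this only as a sketch of what to edit:

```python
# Hypothetical sketch of src/paths.py -- the real variable names may
# differ. Point each entry at your local dataset copies.
IMAGENET_DIR = "/absolute/path/to/imagenet-1k"
ADE20K_DIR = "/absolute/path/to/ade20k"
NYUD_DIR = "/absolute/path/to/nyu-depth-v2"
ACTIVATIONS_DIR = "/absolute/path/to/precomputed_dinov2_activations"
```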

Extracting DINOv2 Activations for ImageNet-1k

For ImageNet, we precompute the DINOv2 activations so that Raptor trains faster. We provide a script in the data directory to extract the activations from the ImageNet-1k dataset. The script takes around 5 hours on a single H100 GPU, and storing the activations requires substantial disk space.

cd data
python precompute_dinov2_act.py
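Conceptually, precomputing per-layer activations amounts to registering forward hooks on each block and saving their outputs. The sketch below demonstrates the hook mechanism on a toy stack of Linear layers standing in for DINOv2 blocks; the real script loads DINOv2 and writes the cached tensors to disk, and the names here are illustrative:

```python
import torch
import torch.nn as nn

# Toy 12-layer stack standing in for DINOv2's 12 transformer blocks.
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(12)])

cache = {}

def save_hook(idx):
    # Return a hook that stores the output of block `idx`.
    def hook(module, inputs, output):
        cache[idx] = output.detach()
    return hook

for i, layer in enumerate(model):
    layer.register_forward_hook(save_hook(i))

x = torch.randn(4, 16)  # 4 tokens, feature dim 16
model(x)  # one forward pass fills the cache with all 12 layer outputs
```

After the forward pass, `cache` holds one tensor per layer; the actual script would serialize these per image batch.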

Download Pretrained Classifiers

Download the DINOv2 linear heads from Meta's repository. These are used during training of Raptor.

cd src
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_reg4_linear_head.pth
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_reg4_linear_head.pth
cp dinov2_vitb14_reg4_linear_head.pth imagenet_probes/dinov2_vitb14_reg4_linear_head.pth
cp dinov2_vits14_reg4_linear_head.pth imagenet_probes/dinov2_vits14_reg4_linear_head.pth

Usage Example

Raptor training follows four main steps. Here, we show example usage for a 3-block Raptor:

  1. Determine max-cut segmentations. This has been done for you in src/000_max_cut_dinov2_base.ipynb.
  2. Train each block independently.
cd src
python trainer.py --teacher_force --mse --sigma 0 --lr 3e-4 --wandb --t_scale --swiglu --start_layer 0 --end_layer 7 --seed 100
python trainer.py --teacher_force --mse --sigma 0 --lr 3e-4 --wandb --t_scale --swiglu --start_layer 7 --end_layer 10 --seed 101
python trainer.py --teacher_force --weighted --sigma 0 --lr 3e-4 --wandb --t_scale --swiglu --start_layer 10 --end_layer 12 --seed 104
  3. Train the full model with the pretrained blocks.
cd src
BP1="final_weighted_False_autoregressive_False_distillation_False_teacher_True_mse_True_cosine_False_t_scale_True_swiglu_True_sigma_0.0_start_0_end_7_lr_0.0003_cls_weight_0.34_reg_weight_0.33_patch_weight_0.33_seed_100_step_312500.pt"
BP2="final_weighted_False_autoregressive_False_distillation_False_teacher_True_mse_True_cosine_False_t_scale_True_swiglu_True_sigma_0.0_start_7_end_10_lr_0.0003_cls_weight_0.34_reg_weight_0.33_patch_weight_0.33_seed_101_step_312500.pt"
BP3="final_weighted_True_autoregressive_False_distillation_False_teacher_True_mse_False_cosine_False_t_scale_True_swiglu_True_sigma_0.0_start_10_end_12_lr_0.0003_cls_weight_0.34_reg_weight_0.33_patch_weight_0.33_seed_104_step_312500.pt"
python trainer.py --raptor3 --autoreg --weighted --sigma 0 --lr 3e-4 --wandb --t_scale --swiglu --start_layer 0 --end_layer 12 --cls_weight 0.45 --reg_weight 0.10 --patch_weight 0.45 --bp1 $BP1 --bp2 $BP2 --bp3 $BP3 --seed 1101
  4. Train linear probes on the frozen pretrained checkpoints.
cd src/imagenet_probes
python train_probe.py --variant raptor3 --model_seed 1101 --seed 4005
cd src/ade20k_probes
python train_probe.py --variant raptor3 --model_seed 1101 --seed 5005
cd src/nyud_probes
python train_probe.py --variant raptor3 --model_seed 1101 --seed 6005

Reproducing Foundation Models Results (Section 3)

To reproduce the results for the foundation models section (Table 1 and Figure 7), do the following:

  1. Determine max-cut segmentations. This has been done for you in src/max_cut_dinov2_base.ipynb.
  2. Train each block independently.
cd src/runs
sbatch blocks.sh
  3. Train the full model with the pretrained blocks.
cd src/runs
sbatch 002_raptor2_pretrained.sh
sbatch 003_raptor3_pretrained.sh
sbatch 004_raptor4_pretrained.sh
  4. Train linear probes on the frozen pretrained checkpoints.
cd src/ade20k_probes
sbatch run_all.sh
cd src/imagenet_probes
sbatch run_all.sh
cd src/nyud_probes
sbatch run_all.sh
  5. Reproduce Table 1.
cd src
python aggregate_results.py
  6. Reproduce Figure 7: run the notebook src/imagenet_probes/101_eval_error_bars.ipynb.
