VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model


[Paper PDF] [Project Page] [Hugging Face] [Code License]

⭐ If our project helps you, please give us a star on GitHub to support us!

TODO

  • Partial training code
  • LIBERO evaluation code
  • LIBERO-Plus evaluation code
  • SimplerEnv evaluation code
  • Training codes for custom datasets

Environment Setup

git clone https://github.com/ginwind/VLA-JEPA
cd VLA-JEPA
# Create conda environment
conda create -n VLA_JEPA python=3.10 -y
conda activate VLA_JEPA
# Install requirements
pip install -r requirements.txt
# Install FlashAttention2
pip install flash-attn --no-build-isolation
# Install project
pip install -e .

This repository's code is based on starVLA.

Training

0️⃣ Pretrained Model Preparation

Download the Qwen3-VL-2B model and the V-JEPA2 encoder.
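
For example, both can be fetched with huggingface-cli; the repository IDs below are placeholders, so substitute the official Qwen3-VL-2B and V-JEPA2 model repositories:

# Placeholder repository IDs; replace with the official Qwen3-VL-2B and V-JEPA2 repos
huggingface-cli download <qwen3-vl-2b-repo-id> --local-dir ./pretrained/Qwen3-VL-2B
huggingface-cli download <vjepa2-encoder-repo-id> --local-dir ./pretrained/V-JEPA2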

1️⃣ Data Preparation

Download the following datasets:

2️⃣ Start Training

Depending on whether you are conducting pre-training or post-training, select the appropriate training script and YAML configuration file from the /scripts directory.

Ensure the following configurations are updated in the YAML file:

  • framework.qwenvl.basevlm and framework.vj2_model.base_encoder should be set to the paths of your respective checkpoints.
  • Update datasets.vla_data.data_root_dir, datasets.video_data.video_dir, and datasets.video_data.text_file to match the paths of your datasets.
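
For example, assuming the dotted keys above map to nested YAML sections, the relevant part of the configuration might look like this (all paths are placeholders):

framework:
  qwenvl:
    basevlm: /path/to/Qwen3-VL-2B
  vj2_model:
    base_encoder: /path/to/vjepa2-encoder
datasets:
  vla_data:
    data_root_dir: /path/to/vla_dataset
  video_data:
    video_dir: /path/to/videos
    text_file: /path/to/video_text_annotations.txt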

Once the configurations are updated, you can proceed to start the training process.
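
A hypothetical launch command, assuming a pre-training script named pretrain.sh under /scripts (pick the actual script and YAML that match your stage):

# Hypothetical script name; use the actual pre-training or post-training script from /scripts
bash scripts/pretrain.sh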

Evaluation

Download the model checkpoints from Hugging Face: https://huggingface.co/ginwind/VLA-JEPA
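
For example, with huggingface-cli (the local directory is a placeholder):

huggingface-cli download ginwind/VLA-JEPA --local-dir ./checkpoints/VLA-JEPA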

Environment: Install the required Python packages into your VLA-JEPA environment:

pip install tyro matplotlib mediapy websockets msgpack
pip install numpy==1.24.4

LIBERO

  • LIBERO setup: Prepare the LIBERO benchmark in a separate conda environment following the official LIBERO instructions: https://github.com/Lifelong-Robot-Learning/LIBERO

  • Configuration: In the downloaded checkpoint folder, update config.json and config.yaml to point the following fields to your local checkpoints:

    • framework.qwenvl.basevlm: path to the Qwen3-VL-2B checkpoint
    • framework.vj2_model.base_encoder: path to the V-JEPA encoder checkpoint
  • Evaluation script: In examples/LIBERO/eval_libero.sh, set the LIBERO_HOME environment variable (line 4) to your local LIBERO code path, the sim_python variable (line 9) to the Python executable of the LIBERO conda environment, and the your_ckpt variable (line 11) to the path of the downloaded LIBERO/checkpoints/VLA-JEPA-LIBERO.pt (see the sketch after the run command below).

  • Run evaluation: Launch the evaluation (the script runs the four task suites in parallel across 4 GPUs):

bash ./examples/LIBERO/eval_libero.sh
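
For reference, the variables edited in eval_libero.sh would look roughly like this (all paths are placeholders for your local setup):

export LIBERO_HOME=/path/to/LIBERO                         # line 4: local LIBERO code path
sim_python=/path/to/conda/envs/libero/bin/python           # line 9: Python of the LIBERO conda environment
your_ckpt=/path/to/LIBERO/checkpoints/VLA-JEPA-LIBERO.pt   # line 11: downloaded checkpoint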

LIBERO-Plus

  • LIBERO-Plus setup: Clone the LIBERO-Plus repository: https://github.com/sylvestf/LIBERO-plus. In ./examples/LIBERO-Plus/libero_plus_init.py, update line 121 to point to your LIBERO-Plus/libero/libero/benchmark/task_classification.json. Replace the original LIBERO-Plus/libero/libero/benchmark/__init__.py with the provided modified implementation (see ./examples/LIBERO-Plus/libero_plus_init.py) to enable evaluation over perturbation dimensions. Finally, follow the official LIBERO-Plus installation instructions and build the benchmark in a separate conda environment.

  • Configuration: In the downloaded checkpoint folder, update config.json and config.yaml to point the following fields to your local checkpoints:

    • framework.qwenvl.basevlm: path to the Qwen3-VL-2B checkpoint
    • framework.vj2_model.base_encoder: path to the V-JEPA encoder checkpoint
  • Evaluation script: In examples/LIBERO-Plus/eval_libero_plus.sh, set the LIBERO_HOME environment variable (line 4) to your local LIBERO-Plus code path, the sim_python variable (line 9) to the Python executable of the LIBERO-Plus conda environment, and the your_ckpt variable (line 11) to the path of the downloaded LIBERO/checkpoints/VLA-JEPA-LIBERO.pt.

  • Run evaluation: Launch the evaluation (the script runs the seven perturbation dimensions in parallel across 7 GPUs):

bash ./examples/LIBERO-Plus/eval_libero_plus.sh

Notes: Ensure each process has access to a GPU and verify that all checkpoint paths in the configuration files are correct before running the evaluation.

Acknowledgement

We extend our sincere gratitude to the starVLA project and the V-JEPA2 project for their invaluable open-source contributions.
