⭐ If our project helps you, please give us a star on GitHub to support us!
- Partial training code
- LIBERO evaluation code
- LIBERO-Plus evaluation code
- SimplerEnv evaluation code
- Training code for custom datasets
```bash
git clone https://github.com/ginwind/VLA-JEPA
cd VLA-JEPA

# Create conda environment
conda create -n VLA_JEPA python=3.10 -y
conda activate VLA_JEPA

# Install requirements
pip install -r requirements.txt

# Install FlashAttention2
pip install flash-attn --no-build-isolation

# Install project
pip install -e .
```
This repository's code is based on starVLA.
Download the Qwen3-VL-2B model and the V-JEPA2 encoder.
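For example, with the Hugging Face CLI (the repository IDs and local paths below are assumptions, not fixed requirements; substitute the exact checkpoints your configuration expects):

```bash
# Example only: repository IDs and destination paths are assumptions.
huggingface-cli download Qwen/Qwen3-VL-2B-Instruct --local-dir ./checkpoints/Qwen3-VL-2B
huggingface-cli download facebook/vjepa2-vitl-fpc64-256 --local-dir ./checkpoints/vjepa2-encoder
```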
Download the following datasets:
Depending on whether you are conducting pre-training or post-training, select the appropriate training script and YAML configuration file from the /scripts directory.
Ensure the following configurations are updated in the YAML file:
- `framework.qwenvl.basevlm` and `framework.vj2_model.base_encoder` should be set to the paths of your respective checkpoints.
- Update `datasets.vla_data.data_root_dir`, `datasets.video_data.video_dir`, and `datasets.video_data.text_file` to match the paths of your datasets (see the sketch below).
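As a rough sketch, the relevant YAML fields might look like the following (the exact nesting and file layout may differ from your configuration file; every path is a placeholder):

```yaml
# Sketch only: field names follow the list above; all paths are placeholders.
framework:
  qwenvl:
    basevlm: /path/to/Qwen3-VL-2B
  vj2_model:
    base_encoder: /path/to/vjepa2-encoder
datasets:
  vla_data:
    data_root_dir: /path/to/vla_dataset
  video_data:
    video_dir: /path/to/videos
    text_file: /path/to/video_text_file
```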
Once the configurations are updated, you can proceed to start the training process.
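For example (the script name below is a placeholder; use the one you selected from `/scripts`):

```bash
# Placeholder script name: substitute the pre-training or post-training script you chose.
bash scripts/your_training_script.sh
```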
Download the model checkpoints from Hugging Face: https://huggingface.co/ginwind/VLA-JEPA
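For example, with the Hugging Face CLI (the destination directory is a placeholder):

```bash
# Download the released checkpoints; the --local-dir path is a placeholder.
huggingface-cli download ginwind/VLA-JEPA --local-dir ./checkpoints/VLA-JEPA
```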
- Environment: Install the required Python packages into your VLA-JEPA environment:

  ```bash
  pip install tyro matplotlib mediapy websockets msgpack
  pip install numpy==1.24.4
  ```
- LIBERO setup: Prepare the LIBERO benchmark in a separate conda environment, following the official LIBERO instructions: https://github.com/Lifelong-Robot-Learning/LIBERO
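  A minimal install sketch (the environment name and Python version here are assumptions; defer to the official LIBERO README for the authoritative steps):

  ```bash
  # Sketch only: see the official LIBERO README for exact instructions.
  conda create -n libero python=3.8 -y
  conda activate libero
  git clone https://github.com/Lifelong-Robot-Learning/LIBERO
  cd LIBERO
  pip install -r requirements.txt
  pip install -e .
  ```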
- Configuration: In the downloaded checkpoint folder, update `config.json` and `config.yaml` so that the following fields point to your local checkpoints:
  - `framework.qwenvl.basevlm`: path to the Qwen3-VL-2B checkpoint
  - `framework.vj2_model.base_encoder`: path to the V-JEPA2 encoder checkpoint
- Evaluation script: Edit `examples/LIBERO/eval_libero.sh`: set the `LIBERO_HOME` environment variable (line 4) to your local LIBERO code path, set the `sim_python` variable (line 9) to the Python executable of the LIBERO conda environment, and set the `your_ckpt` variable (line 11) to the path of the downloaded `LIBERO/checkpoints/VLA-JEPA-LIBERO.pt`.
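  After editing, those three lines might look roughly like this (a sketch only; the exact assignment syntax in the script may differ, and all paths are placeholders):

  ```bash
  # Sketch only: all paths are placeholders; match the actual syntax used in eval_libero.sh.
  LIBERO_HOME=/path/to/LIBERO                                # line 4: local LIBERO code path
  sim_python=/path/to/conda/envs/libero/bin/python           # line 9: Python of the LIBERO env
  your_ckpt=/path/to/LIBERO/checkpoints/VLA-JEPA-LIBERO.pt   # line 11: downloaded checkpoint
  ```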
- Run evaluation: Launch the evaluation (the script runs the four task suites in parallel across 4 GPUs):

  ```bash
  bash ./examples/LIBERO/eval_libero.sh
  ```
- LIBERO-Plus setup: Clone the LIBERO-Plus repository: https://github.com/sylvestf/LIBERO-plus. In `./examples/LIBERO-Plus/libero_plus_init.py`, update line 121 to point to your `LIBERO-Plus/libero/libero/benchmark/task_classification.json`. Replace the original `LIBERO-Plus/libero/libero/benchmark/__init__.py` with the provided modified implementation (`./examples/LIBERO-Plus/libero_plus_init.py`) to enable evaluation over perturbation dimensions. Finally, follow the official LIBERO-Plus installation instructions and build the benchmark in a separate conda environment.
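  As a sketch, the replacement step might look like this (`LIBERO_PLUS_HOME` is a placeholder for the path of your LIBERO-Plus clone):

  ```bash
  # LIBERO_PLUS_HOME is a placeholder; point it at your LIBERO-Plus clone.
  cp ./examples/LIBERO-Plus/libero_plus_init.py \
     "$LIBERO_PLUS_HOME/libero/libero/benchmark/__init__.py"
  ```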
- Configuration: In the downloaded checkpoint folder, update `config.json` and `config.yaml` so that the following fields point to your local checkpoints:
  - `framework.qwenvl.basevlm`: path to the Qwen3-VL-2B checkpoint
  - `framework.vj2_model.base_encoder`: path to the V-JEPA2 encoder checkpoint
- Evaluation script: Edit `examples/LIBERO-Plus/eval_libero_plus.sh`: set the `LIBERO_HOME` environment variable (line 4) to your local LIBERO-Plus code path, set the `sim_python` variable (line 9) to the Python executable of the LIBERO-Plus conda environment, and set the `your_ckpt` variable (line 11) to the path of the downloaded `LIBERO/checkpoints/VLA-JEPA-LIBERO.pt`.
- Run evaluation: Launch the evaluation (the script runs the seven perturbation dimensions in parallel across 7 GPUs):

  ```bash
  bash ./examples/LIBERO-Plus/eval_libero_plus.sh
  ```
Notes: Ensure each process has access to a GPU and verify that all checkpoint paths in the configuration files are correct before running the evaluation.
We extend our sincere gratitude to the starVLA project and the V-JEPA2 project for their invaluable open-source contributions.