You can install dependencies by running the following commands:
conda create -n tgpo python=3.10
conda activate tgpo
pip install airports-py
git clone https://github.com/helldog-star/TGPO
git clone https://github.com/dottxt-ai/outlines
cd outlines
git checkout 0.0.46
pip install .
cd ../TGPO/luffy
pip install -r requirements.v2.txt
pip install -e .
cd verl
pip install -e .
cd ../..
pip install transformers==4.55.4
If you encounter issues when installing flash-attn, we recommend installing a prebuilt wheel from the flash-attn releases page. For example, we use the following version:
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
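A prebuilt wheel only installs cleanly when its filename tags match your environment. The sketch below is not part of the repository; it splits the wheel filename used above into its standard fields (name, version, Python tag, ABI tag, platform) and, if `python3` is on the PATH, compares the Python tag against the local interpreter. The variable names are illustrative.

```shell
# Wheel filename from the example above.
wheel="flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

# Wheel filenames follow name-version-pytag-abitag-platform.whl,
# so peel the fields off from the right with parameter expansion.
base="${wheel%.whl}"
platform="${base##*-}"; rest="${base%-*}"
abitag="${rest##*-}";   rest="${rest%-*}"
pytag="${rest##*-}";    rest="${rest%-*}"
version="${rest#*-}"
name="${rest%%-*}"

# The part after '+' is the local build label (CUDA / torch / C++ ABI).
build="${version#*+}"

echo "package:    $name"
echo "version:    ${version%%+*}"
echo "build:      $build"
echo "python tag: $pytag"
echo "abi tag:    $abitag"
echo "platform:   $platform"

# Optional: compare the wheel's Python tag with the local interpreter.
if command -v python3 >/dev/null 2>&1; then
  local_tag=$(python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])')
  if [ "$local_tag" = "$pytag" ]; then
    echo "Python tag matches the local interpreter ($local_tag)."
  else
    echo "warning: wheel targets $pytag but local Python is $local_tag."
  fi
fi
```

If the tags do not match (for example a different Python version, CUDA major version, or torch version), pick the wheel from the releases page whose filename matches your setup instead.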
This repository includes:
- luffy: Code for training on-policy, mixed-policy (using off-policy reasoning traces), or on-policy distillation models. Our main code changes are in luffy/verl/verl/mix_src.
- data: Data and code for training and evaluating LUFFY.
- exp_scripts: Example scripts for training models.
- eval_scripts: Evaluation scripts for math and out-of-distribution benchmarks.
- ExGRPO: Implementation and notes for ExGRPO, which leverages off-policy experience replay to further boost performance without external guidance.
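Our project is built on top of luffy. Thanks to the luffy authors for their open-source work!
Before running the data scripts, make sure CONDA_SH_PATH / CONDA_ENV_NAME / BASE_DIR in data/download.sh and data/my_prepare_train.sh are set for your environment: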
cd data
bash download.sh
bash my_prepare_train.sh
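Before running the two scripts, a quick check that the three variables are actually defined can save a failed run. The helper below is a hypothetical sketch, not part of the repository, and it assumes the scripts set these variables with plain `NAME=value` (or `export NAME=value`) assignments, which may not match how the scripts are actually written. Run it from the repository root.

```shell
# check_vars FILE: report which of the expected variables have no
# assignment line in FILE. Returns 0 if all are present, 1 otherwise.
# Assumption: variables are set as NAME=... or export NAME=... at line start.
check_vars() {
  file="$1"
  missing=0
  for var in CONDA_SH_PATH CONDA_ENV_NAME BASE_DIR; do
    if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?${var}=" "$file"; then
      echo "missing in $file: $var"
      missing=1
    fi
  done
  return "$missing"
}

# Check both data scripts if they exist at these paths.
for f in data/download.sh data/my_prepare_train.sh; do
  if [ -f "$f" ]; then
    check_vars "$f" || true
  fi
done
```

This only confirms that an assignment exists; it does not verify that the paths point at a real conda installation or data directory.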