✨Getting Started

Installation

You can install dependencies by running the following commands:

conda create -n tgpo python=3.10
conda activate tgpo
pip install airports-py
git clone https://github.com/helldog-star/TGPO
git clone https://github.com/dottxt-ai/outlines
cd outlines
git checkout 0.0.46
pip install .
cd ../TGPO/luffy
pip install -r requirements.v2.txt
pip install -e .
cd verl
pip install -e .
cd ../..
pip install transformers==4.55.4

If you run into problems installing flash-attn, we recommend installing a prebuilt wheel from the flash-attention releases page. For example, we use the following version:

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
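A prebuilt wheel only installs cleanly when its filename tags match the local environment. The sketch below is not part of the repository; it splits the wheel filename used above into its Python tag and build tag, then prints what the active environment reports so the two can be compared by eye. The variable names are illustrative, and the torch check only works once torch is installed.

```shell
# Wheel filename taken from the install command above.
whl="flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

# Third dash-separated field is the Python tag the wheel targets.
wheel_py_tag=$(printf '%s\n' "$whl" | cut -d- -f3)

# Text between "+" and the next "-" encodes CUDA, torch, and C++11 ABI.
wheel_build_tag=${whl#*+}
wheel_build_tag=${wheel_build_tag%%-*}

echo "wheel python tag: $wheel_py_tag"
echo "wheel build tag:  $wheel_build_tag"

# What the active environment provides (guarded so a missing tool only prints a note).
python -c 'import sys; print("local python tag: cp%d%d" % sys.version_info[:2])' \
  || echo "python is not on PATH"
python -c 'import torch; print("local torch:", torch.__version__, "cuda:", torch.version.cuda, "cxx11abi:", torch.compiled_with_cxx11_abi())' \
  || echo "torch is not importable in this environment"
```

If the local values differ from the wheel tags (for example a different torch minor version), a different wheel from the same releases page is likely the better choice.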

Repo Structure

This repository includes:

luffy: Code for training on-policy, mixed-policy (using off-policy reasoning traces), and on-policy distillation models. Our main code changes are in luffy/verl/verl/mix_src.
data: Data and code for training and evaluating LUFFY.
exp_scripts: Example scripts for training models.
eval_scripts: Evaluation scripts on math and out-of-distribution benchmarks.
ExGRPO: Implementation and notes for ExGRPO, which leverages off-policy experience replay to further boost performance without external guidance.

Our project is built on top of LUFFY. We thank the LUFFY authors for their open-source work!

🔧Usage

Model and Dataset Preparation

Before running, make sure CONDA_SH_PATH, CONDA_ENV_NAME, and BASE_DIR are set correctly in data/download.sh and data/my_prepare_train.sh. Nothing else needs to be changed:

cd data
bash download.sh
bash my_prepare_train.sh
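Before running the two scripts, it can help to confirm that each one actually mentions the three settings named above. The following is a minimal sketch, not part of the repository: `check_settings` is a hypothetical helper, and it only checks that each name appears somewhere in the file, not that its value is valid for your machine.

```shell
# Hypothetical helper: report which of the three settings a script never mentions.
check_settings() {
  script="$1"
  for name in CONDA_SH_PATH CONDA_ENV_NAME BASE_DIR; do
    grep -q "$name" "$script" || echo "$script: $name not found"
  done
}

# Intended to be run from inside data/, before download.sh and my_prepare_train.sh.
for s in download.sh my_prepare_train.sh; do
  if [ -f "$s" ]; then
    check_settings "$s"
  else
    echo "$s: file not found"
  fi
done
```

No output means every setting name was found in both scripts; the values themselves still need to be reviewed by hand.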
