
helldog-star/TGPO


✨Getting Started

Installation

You can install dependencies by running the following commands:

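# Create and activate a Python 3.10 environment
conda create -n tgpo python=3.10
conda activate tgpo
pip install airports-py
# Clone TGPO and outlines; outlines is installed from source at tag 0.0.46
git clone https://github.com/helldog-star/TGPO
git clone https://github.com/dottxt-ai/outlines
cd outlines
git checkout 0.0.46
pip install .
# Install luffy (with its requirements) and the bundled verl in editable mode
cd ../TGPO/luffy
pip install -r requirements.v2.txt
pip install -e .
cd verl
pip install -e .
cd ../..
# Install transformers 4.55.4 after the packages above
pip install transformers==4.55.4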
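After installation, a quick sanity check is to confirm that the pinned versions actually ended up in the environment, since later installs can upgrade or downgrade packages pulled in earlier. Below is a minimal sketch assuming bash and `pip list --format=freeze` (which prints one `name==version` line per package); the function name `check_pins` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: read "name==version" lines on stdin and check that
# every pin passed as an argument appears as an exact (case-insensitive) line.
check_pins() {
  local listing pin missing=0
  listing=$(cat)
  for pin in "$@"; do
    # -F: fixed string (dots are literal), -x: whole-line match, -i: ignore case
    if ! grep -qixF -- "$pin" <<<"$listing"; then
      echo "missing or wrong version: $pin" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (inside the activated tgpo environment):
# pip list --format=freeze | check_pins transformers==4.55.4
```

The function returns a non-zero status if any pin is missing, so it can gate a setup script.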

If you encounter issues when installing flash-attn, we recommend installing a prebuilt wheel from the flash-attn releases page. For example, we use this version:

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
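The wheel filename above encodes the flash-attn version, CUDA major version, PyTorch major.minor version, C++11 ABI flag, and Python tag. If your setup differs, the sketch below assembles a filename and download URL following the same pattern as the example; the function names are hypothetical, and not every combination is published, so check the releases page before downloading. The local values can be read from PyTorch, for instance `torch.__version__`, `torch.version.cuda`, and `torch._C._GLIBCXX_USE_CXX11_ABI`.

```shell
# Hypothetical helper: build a flash-attn wheel filename, assuming the naming
# pattern of the example wheel above:
#   flash_attn-<ver>+cu<cuda_major>torch<torch_mm>cxx11abi<TRUE|FALSE>-<py>-<py>-linux_x86_64.whl
flash_attn_wheel() {
  local ver="$1" cuda_major="$2" torch_mm="$3" abi="$4" py="$5"
  echo "flash_attn-${ver}+cu${cuda_major}torch${torch_mm}cxx11abi${abi}-${py}-${py}-linux_x86_64.whl"
}

# Build the matching GitHub release download URL (same arguments as above).
flash_attn_url() {
  local ver="$1"
  echo "https://github.com/Dao-AILab/flash-attention/releases/download/v${ver}/$(flash_attn_wheel "$@")"
}

# Example: reproduces the wheel used above (CUDA 12, torch 2.4, Python 3.10).
# wget "$(flash_attn_url 2.7.3 12 2.4 FALSE cp310)"
```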

Repo Structure

This repository includes:

  • luffy: Code for training models with on-policy RL, mixed-policy RL (using off-policy reasoning traces), or on-policy distillation. Our main code changes are in luffy/verl/verl/mix_src.
  • data: Data and code for training and evaluating LUFFY.
  • exp_scripts: Example scripts to train models.
  • eval_scripts: Evaluation scripts on math and out-of-distribution benchmarks.
  • ExGRPO: Implementation and notes for ExGRPO, which leverages off-policy experience replay to further boost performance without external guidance.

Our project is built on top of LUFFY. Thanks to the LUFFY authors for open-sourcing their work!


🔧Usage

Model and Dataset Preparation

Set CONDA_SH_PATH / CONDA_ENV_NAME / BASE_DIR in data/download.sh and data/my_prepare_train.sh to match your environment; nothing else needs to be configured.

cd data
bash download.sh
bash my_prepare_train.sh
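If the three variables are defined as plain `NAME=value` lines in both scripts (an assumption; open the scripts to confirm before relying on this), they can be set in one pass with `sed` from the repository root. The function names and the example values in the usage comment are hypothetical.

```shell
# Hypothetical helper: replace a line starting with NAME= by NAME="value".
# Assumes each variable is assigned on its own line; a .bak copy is kept.
# Values must not contain '|' or '&', which are special in the sed replacement.
set_script_var() {
  local file="$1" name="$2" value="$3"
  sed -i.bak "s|^${name}=.*|${name}=\"${value}\"|" "$file"
}

# Apply the same three settings to both data-preparation scripts.
configure_data_scripts() {
  local conda_sh="$1" env_name="$2" base_dir="$3" f
  for f in data/download.sh data/my_prepare_train.sh; do
    set_script_var "$f" CONDA_SH_PATH "$conda_sh"
    set_script_var "$f" CONDA_ENV_NAME "$env_name"
    set_script_var "$f" BASE_DIR "$base_dir"
  done
}

# Usage (example values only):
# configure_data_scripts "$HOME/miniconda3/etc/profile.d/conda.sh" tgpo "$PWD"
```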
