Unakar/Logic-RL

Logic-RL

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

News

[2025-03-20] We release ADORA: A Scalable Paradigm for Steering Learning Trajectories.

[2025-03-19] For stable length control, refer to https://github.com/lblankl/Short-RL

Main results

Benchmark

Model 2ppl 3ppl 4ppl 5ppl 6ppl 7ppl 8ppl
o3-mini-high 0.99 0.98 0.97 0.95 0.94 0.89 0.83
o1-2024-12-17 0.83 0.51 0.38 0.38 0.35 0.30 0.20
GPT-4o 0.68 0.57 0.49 0.32 0.23 0.21 0.11
Deepseek-Math-7b 0.35 0.21 0.08 0.06 0.02 0.00 0.00
Qwen2.5-7B-Instruct-1M 0.49 0.40 0.25 0.11 0.02 0.06 0.01
Qwen2.5-7B-Logic-RL (ours) 0.99 0.99 0.94 0.92 0.91 0.80 0.67
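
To compare models at a glance, the rows above can be collapsed into one average per model. The sketch below assumes each column is accuracy on puzzles with that many people (2 to 8) and that an unweighted mean is a fair summary; the names `RESULTS` and `mean_accuracy` are illustrative and not part of the repository.

```python
# Accuracy per puzzle size (2ppl .. 8ppl), copied from the table above.
RESULTS = {
    "o3-mini-high": [0.99, 0.98, 0.97, 0.95, 0.94, 0.89, 0.83],
    "o1-2024-12-17": [0.83, 0.51, 0.38, 0.38, 0.35, 0.30, 0.20],
    "GPT-4o": [0.68, 0.57, 0.49, 0.32, 0.23, 0.21, 0.11],
    "Deepseek-Math-7b": [0.35, 0.21, 0.08, 0.06, 0.02, 0.00, 0.00],
    "Qwen2.5-7B-Instruct-1M": [0.49, 0.40, 0.25, 0.11, 0.02, 0.06, 0.01],
    "Qwen2.5-7B-Logic-RL (ours)": [0.99, 0.99, 0.94, 0.92, 0.91, 0.80, 0.67],
}


def mean_accuracy(scores):
    """Unweighted mean over puzzle sizes (an assumption, not the paper's metric)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Print models from best to worst average accuracy.
    ranked = sorted(RESULTS.items(), key=lambda kv: mean_accuracy(kv[1]), reverse=True)
    for model, scores in ranked:
        print(f"{model:28s} {mean_accuracy(scores):.3f}")
```

Note that the harder sizes (7ppl, 8ppl) pull every average down, so the per-size columns remain the more informative view.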

Installation

conda create -n logic python=3.9
conda activate logic
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e . # For verl integration
pip install wandb IPython matplotlib
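
After installing, it can help to confirm that the pinned versions actually landed in the active environment. The following is a minimal sketch using only the standard library; the `EXPECTED` mapping and `check_environment` helper are hypothetical names, and the pins are simply the ones from the commands above.

```python
import importlib.metadata

# Package names come from the install commands above. Pinned versions are the
# ones this README specifies; packages installed without a pin map to None.
EXPECTED = {
    "torch": "2.4.0",
    "vllm": "0.6.3",
    "ray": None,
    "flash-attn": None,
    "wandb": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expectation)}."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            installed = None
        # A local build tag such as "2.4.0+cu121" still counts as a match.
        ok = installed is not None and (
            pinned is None or installed.split("+")[0] == pinned
        )
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "missing or version mismatch"
        print(f"{name:12s} {installed or '-':16s} {status}")
```

This only inspects installed package metadata; it does not import the packages or verify that CUDA is usable.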

Data Preparation

You can use the data in /data directly.

For your own data generation, here's a demo:

Base Model

python ./examples/data_preprocess/kk.py \
 --local_dir {processed_data_path} \
 --data_path {raw_data_path}

Instruct Model

python ./examples/data_preprocess/kk.py \
 --template_type=qwen-instruct \
 --local_dir {processed_data_path} \
 --data_path {raw_data_path}
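
The two invocations above differ only in the `--template_type` flag. If you script preprocessing for both model types, a small wrapper keeps them consistent. This is a sketch, not part of the repository: `build_preprocess_cmd` is a hypothetical helper, and the `out/...` and `raw/kk` paths are placeholders for your own directories.

```python
import shlex
import sys


def build_preprocess_cmd(local_dir, data_path, template_type=None):
    """Build the argv list for examples/data_preprocess/kk.py."""
    cmd = [sys.executable, "./examples/data_preprocess/kk.py"]
    if template_type is not None:
        # Only the instruct-model recipe above passes this flag.
        cmd.append(f"--template_type={template_type}")
    cmd += ["--local_dir", local_dir, "--data_path", data_path]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own processed and raw data locations.
    base = build_preprocess_cmd("out/base", "raw/kk")
    instruct = build_preprocess_cmd("out/instruct", "raw/kk", "qwen-instruct")
    print(shlex.join(base))
    print(shlex.join(instruct))
    # To execute one, call subprocess.run(instruct, check=True) from the repo root.
```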

Training Execution

conda activate logic
bash main_grpo.sh # runs on A100 80G GPUs

⚙️ Implementation Details

Component Location
Reward Modeling verl/utils/reward_score/kk.py
Data Preprocessing examples/data_preprocess/kk.py
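
For readers new to rule-based rewards, the idea can be illustrated with a toy scorer. This is not the code in verl/utils/reward_score/kk.py. It is a minimal sketch that assumes Knights-and-Knaves style puzzles (as the `kk` file name suggests), an assumed `<think>`/`<answer>` output format, and made-up reward values of -1, 0, and 1.

```python
import re

# Assumed output format: reasoning inside <think>...</think>, followed by the
# final answer inside <answer>...</answer>. The tags are illustrative only.
FORMAT_RE = re.compile(
    r"^\s*<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)
ROLE_RE = re.compile(r"(\w+) is a (knight|knave)", re.IGNORECASE)


def parse_roles(text):
    """Extract {name: role} pairs from phrases like 'Zoey is a knight'."""
    return {name.lower(): role.lower() for name, role in ROLE_RE.findall(text)}


def rule_based_reward(response, gold):
    """Hypothetical scoring: -1.0 bad format, 0.0 wrong answer, 1.0 correct."""
    match = FORMAT_RE.match(response)
    if match is None:
        return -1.0
    predicted = parse_roles(match.group(1))
    return 1.0 if predicted == gold else 0.0


if __name__ == "__main__":
    gold = {"zoey": "knight", "oliver": "knave"}
    response = (
        "<think>Zoey's claim is consistent only if she tells the truth.</think>"
        "<answer>Zoey is a knight. Oliver is a knave.</answer>"
    )
    print(rule_based_reward(response, gold))
```

The point of the design is that both checks are deterministic rules, so no learned reward model is needed; consult the file listed in the table for the scoring the project actually uses.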

Citation

@misc{xie2025logicrlunleashingllmreasoning,
 title={Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning}, 
 author={Tian Xie and Zitian Gao and Qingnan Ren and Haoming Luo and Yuqian Hong and Bryan Dai and Joey Zhou and Kai Qiu and Zhirong Wu and Chong Luo},
 year={2025},
 eprint={2502.14768},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2502.14768}, 
}

Acknowledgements


About

Reproduce R1 Zero on Logic Puzzle
