FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning

Paper | Hugging Face

🎉News

  • [May 23, 2025] 🎉 We release FastCuRL-1.5B-V3 and FastCuRL-1.5B-V2, both obtained by continuing training from the Preview version.
  • [March 17, 2025] We release FastCuRL-1.5B-Preview, a slow-thinking reasoning model that outperforms 📈 the previous SOTA DeepScaleR-1.5B-Preview with 🚀 only 50% of the training steps! We propose a curriculum RL framework with stage-wise context scaling that achieves efficient training and concise CoT reasoning, building on DeepSeek-R1-Distill-Qwen-1.5B, and we observe continuous performance improvement as training steps increase. To make our work easier to reproduce and to advance research progress, we open-source our code, model, and data.

✨Key Results

We report Pass@1 accuracy averaged over 16 samples for each problem.

| Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
|---|---|---|---|---|---|---|
| Qwen2.5-Math-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
| rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
| Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
| Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | 39.7 | 43.3 | 50.9 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
| Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
| DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 57.0 |
| FastCuRL-1.5B-Preview | 43.1 | 88.0 | 74.2 | 31.6 | 50.4 | 57.5 |
| FastCuRL-1.5B-V2 | 47.5 | 89.3 | 77.0 | 32.8 | 53.3 | 60.0 |
| FastCuRL-1.5B-V3 | 49.6 | 90.5 | 78.5 | 34.7 | 54.5 | 61.6 |

| Model | Training Steps | Training Stages | Number of GPUs Used in Each Stage |
|---|---|---|---|
| DeepScaleR-1.5B-Preview | ~1,750 | 3 | 8, 16, 32 |
| FastCuRL-1.5B-Preview | ~860 | 4 | 8, 8, 8, 8 |
| FastCuRL-1.5B-V2 | ~1,710 | 5 | 8, 8, 8, 8, 8 |
| FastCuRL-1.5B-V3 | ~2,620 | 5 | 8, 8, 8, 8, 8 |

For counting training steps, we normalize everything to a batch size of 128: for example, two steps with a batch size of 64 are counted as one step with a batch size of 128.
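
As a quick worked example of this convention, the tiny helper below (purely illustrative, not part of the repo) makes the conversion explicit:

# Illustrative only: convert steps taken at a smaller batch size into
# their batch-size-128 equivalents, per the counting convention above.
def normalized_steps(steps: int, batch_size: int, reference_batch: int = 128) -> float:
    return steps * batch_size / reference_batch

print(normalized_steps(320, 64))   # 160.0 -> 320 steps at batch size 64 count as 160 steps
print(normalized_steps(160, 128))  # 160.0 -> already at the reference batch size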

🎯Getting Started

Installation

# Create a Python 3.10 environment.
conda create -n rllm python=3.10 -y
conda activate rllm
# Install the RLLM dependencies.
cd rllm
pip install -e ./verl
pip install -e .
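
A quick way to confirm that both editable installs succeeded is a minimal import check (a sketch, not part of the repo), run from the rllm directory:

# Sanity check: verl should import from the editable install, and the GPUs should be visible.
import torch
import verl  # installed via `pip install -e ./verl`

print("verl imported from:", verl.__file__)
print("CUDA available:", torch.cuda.is_available(), "| GPUs:", torch.cuda.device_count())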

Training Data

Following DeepScaleR, our training dataset consists of 40,315 unique problem-answer pairs compiled from:

  • AIME problems (1984-2023)
  • AMC problems (before 2023)
  • Omni-MATH dataset
  • Still dataset

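A minimal sketch for inspecting the compiled training set, assuming it ships as a parquet file under ./fastcurl/data (the file name and column names below are assumptions; adjust them to the actual files in the repo):

import pandas as pd

# Hypothetical path; point this at the actual training parquet in ./fastcurl/data.
df = pd.read_parquet("./fastcurl/data/train/train.parquet")
print(len(df), "problem-answer pairs")   # expected: 40,315 unique pairs
print(df.columns.tolist())               # schema depends on how the data is packaged
print(df.iloc[0])                        # peek at one problem-answer record
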
(Figure: Entropy)

Training Scripts

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS
# Run 8K context length training, 160 steps
bash ./scripts/train/run_fastcurl_1.5b_8k_stage1.sh | tee -a fastcurl-1.5b-stage1.log
# Run 16K context length training, 590 steps
bash ./scripts/train/run_fastcurl_1.5b_16k_stage2.sh | tee -a fastcurl-1.5b-stage2.log
# Run 24K context length training, 230 steps
bash ./scripts/train/run_fastcurl_1.5b_24k_stage3.sh | tee -a fastcurl-1.5b-stage3.log
# Run 16K context length training, 580 steps
bash ./scripts/train/run_fastcurl_1.5b_16k_stage4.sh | tee -a fastcurl-1.5b-stage4.log
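
Since each stage is an independent shell script, the four stages can also be chained from Python. The driver below is only a convenience sketch: it assumes the environment variables above are already exported and that each stage script resumes from the previous stage's checkpoint as configured inside the scripts.

import subprocess

# Stage scripts and log files, mirroring the commands above.
stages = [
    ("./scripts/train/run_fastcurl_1.5b_8k_stage1.sh",  "fastcurl-1.5b-stage1.log"),  # 8K context, 160 steps
    ("./scripts/train/run_fastcurl_1.5b_16k_stage2.sh", "fastcurl-1.5b-stage2.log"),  # 16K context, 590 steps
    ("./scripts/train/run_fastcurl_1.5b_24k_stage3.sh", "fastcurl-1.5b-stage3.log"),  # 24K context, 230 steps
    ("./scripts/train/run_fastcurl_1.5b_16k_stage4.sh", "fastcurl-1.5b-stage4.log"),  # 16K context, 580 steps
]

for script, log_path in stages:
    # Append stage output to its log, stopping the pipeline if a stage fails.
    with open(log_path, "a") as log_file:
        subprocess.run(["bash", script], stdout=log_file, stderr=subprocess.STDOUT, check=True)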

Evaluate

python3 -m verl.trainer.main_generation \
 trainer.nnodes=1 \
 trainer.n_gpus_per_node=8 \
 data.path=./fastcurl/data/test/xxx.parquet \
 data.output_path=${OUTPUT_DIR}/xxx.parquet \
 data.n_samples=16 \
 data.batch_size=2048 \
 model.path=${MODEL_PATH} \
 rollout.temperature=0.6 \
 rollout.response_length=32768 \
 rollout.top_k=-1 \
 rollout.top_p=1 \
 rollout.gpu_memory_utilization=0.9 \
 rollout.tensor_model_parallel_size=1
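
Once generation finishes, Pass@1 averaged over the 16 samples per problem can be computed from the output parquet. The sketch below assumes one row per problem with a list of per-sample correctness flags in a "scores" column (e.g., produced by a scoring step); the actual schema may differ, so treat the path and column name as placeholders.

import os
import pandas as pd

# Placeholder file name, matching the command above.
output_path = os.path.join(os.environ.get("OUTPUT_DIR", "."), "xxx.parquet")
df = pd.read_parquet(output_path)

# Pass@1 averaged over samples = mean over problems of the fraction of correct samples.
pass_at_1 = df["scores"].apply(lambda flags: sum(flags) / len(flags)).mean()
print(f"Pass@1 averaged over samples: {pass_at_1:.3f}")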

🎈Citation

If you find our work helpful, please consider citing it.

@misc{fastcurl,
 title={FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models}, 
 author={Mingyang Song and Mao Zheng and Zheng Li and Wenjie Yang and Xuan Luo and Yue Pan and Feng Zhang},
 year={2025},
 eprint={2503.17287},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2503.17287}, 
}

🌻Acknowledgements

  • Our model is trained on top of DeepSeek-R1-Distill-Qwen-1.5B.
  • Our training experiments are powered by our heavily modified fork of verl.
  • We directly use DeepScaleR's code for our experiments, modifying only the parts related to naming conflicts to avoid confusion.
