BuaaLearn/Awesome-Slow-Reason-System

Awesome-Slow-Reason-System


Welcome to the repository for our survey paper, "The o1 Era: Recent Advances in Slow-Thinking Reasoning Systems". This repository provides resources and updates related to our research. For a detailed introduction, please refer to our survey paper.

📢 Latest News

📒 Table of Contents

- Part 1: O1 System Reproduction
- Part 2: Process Reward Models (PRM)
- Part 3: Reinforcement Learning for LLMs
- Part 4: MCTS/Tree Search
- Part 5: Self-Training/Self-Improvement
- Part 6: Special Actions/Reflection
- Part 7: Efficient System 2: System 1/System 2 Balance, Hidden CoT, and CoT Compression
- Part 8: Interpretability of Slow-Fast Systems
- Part 9: Multimodal/Agent-Related Slow-Fast Systems
- Part 10: LLM-Driven Slow Reasoning System (SRS) Benchmarks

Part 1: O1 System Reproduction

| Title | Venue | Date |
| --- | --- | --- |
| Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems | arXiv | 2024-12 |
| o1-Coder: An o1 Replication for Coding | arXiv | 2024-12 |
| Enhancing LLM Reasoning with Reward-guided Tree Search | arXiv | 2024-11 |
| Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | arXiv | 2024-11 |
| O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | arXiv | 2024-11 |
| O1 Replication Journey: A Strategic Progress Report -- Part 1 | arXiv | 2024-10 |

Part 2: Process Reward Models (PRM)

| Title | Venue | Date |
| --- | --- | --- |
| PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | arXiv | 2025-01 |
| ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding | arXiv | 2025-01 |
| The Lessons of Developing Process Reward Models in Mathematical Reasoning | arXiv | 2025-01 |
| ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark | arXiv | 2025-01 |
| AutoPSV: Automated Process-Supervised Verifier | NeurIPS | 2024-12 |
| ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | NeurIPS | 2024-12 |
| Free Process Rewards without Process Labels | arXiv | 2024-12 |
| Outcome-Refining Process Supervision for Code Generation | arXiv | 2024-12 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | ACL | 2024-08 |
| OVM: Outcome-supervised Value Models for Planning in Mathematical Reasoning | ACL Findings | 2024-08 |
| Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | arXiv | 2024-06 |
| Let's Verify Step by Step | ICLR | 2024-05 |
| Improve Mathematical Reasoning in Language Models by Automated Process Supervision | arXiv | 2023-06 |
| Making Large Language Models Better Reasoners with Step-Aware Verifier | arXiv | 2023-06 |
| Solving Math Word Problems with Process and Outcome-Based Feedback | arXiv | 2022-11 |
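
The PRM line of work above scores each intermediate reasoning step rather than only the final answer. Below is a minimal sketch of how per-step scores are typically aggregated and used for best-of-N reranking; it is illustrative, not any one paper's method, and `score_step` is a toy stand-in for a trained process reward model.

```python
# Illustrative PRM scoring sketch. A real PRM is a trained model that
# returns P(step is correct); `score_step` here is a toy stand-in.
from math import prod

def score_step(step: str) -> float:
    """Toy per-step scorer standing in for a learned PRM."""
    return 0.1 if "ERROR" in step else 0.9

def prm_score(steps, aggregate="min"):
    """Aggregate per-step rewards into one chain score.
    'min' (the weakest step) and 'prod' are common choices in the PRM literature."""
    scores = [score_step(s) for s in steps]
    return min(scores) if aggregate == "min" else prod(scores)

# Best-of-N reranking: sample several chains, keep the one the PRM prefers.
chains = [
    ["2 + 3 = 5", "5 * 4 = 20"],
    ["2 + 3 = 6 ERROR", "6 * 4 = 24"],
]
best = max(chains, key=prm_score)
```

The `min` aggregation reflects the intuition that a chain of thought is only as trustworthy as its weakest step.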

Part 3: Reinforcement Learning for LLMs

| Title | Venue | Date |
| --- | --- | --- |
| Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | arXiv | 2025-02 |
| Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | arXiv | 2025-01 |
| Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | arXiv | 2025-01 |
| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | arXiv | 2025-01 |
| Kimi k1.5: Scaling Reinforcement Learning with LLMs | arXiv | 2025-01 |
| Does RLHF Scale? Exploring the Impacts From Data, Model, and Method | arXiv | 2024-12 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | arXiv | 2024-12 |
| ReFT: Reasoning with Reinforced Fine-Tuning | ACL | 2024-08 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | arXiv | 2024-02 |

Part 4: MCTS/Tree Search

| Title | Venue | Date |
| --- | --- | --- |
| On the Convergence Rate of MCTS for the Optimal Value Estimation in Markov Decision Processes | IEEE TAC | 2025-01 |
| Search-o1: Agentic Search-Enhanced Large Reasoning Models | arXiv | 2025-01 |
| rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | arXiv | 2025-01 |
| ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | NeurIPS | 2024-12 |
| Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | arXiv | 2024-12 |
| HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | arXiv | 2024-12 |
| Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search | arXiv | 2024-12 |
| Proposing and Solving Olympiad Geometry with Guided Tree Search | arXiv | 2024-12 |
| SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models | arXiv | 2024-12 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | arXiv | 2024-12 |
| CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | arXiv | 2024-11 |
| GPT-Guided Monte Carlo Tree Search for Symbolic Regression in Financial Fraud Detection | arXiv | 2024-11 |
| MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree | arXiv | 2024-11 |
| Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | arXiv | 2024-11 |
| SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation | arXiv | 2024-11 |
| Don't Throw Away Your Value Model! Generating More Preferable Text with Value-Guided Monte-Carlo Tree Search Decoding | CoLM | 2024-10 |
| AFlow: Automating Agentic Workflow Generation | arXiv | 2024-10 |
| Interpretable Contrastive Monte Carlo Tree Search Reasoning | arXiv | 2024-10 |
| LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | arXiv | 2024-10 |
| Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | arXiv | 2024-10 |
| TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling | arXiv | 2024-10 |
| Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination | arXiv | 2024-10 |
| RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation | arXiv | 2024-09 |
| Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | arXiv | 2024-08 |
| LiteSearch: Efficacious Tree Search for LLM | arXiv | 2024-07 |
| Tree Search for Language Model Agents | arXiv | 2024-07 |
| Uncertainty-Guided Optimization on Large Language Model Search Trees | arXiv | 2024-07 |
| Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | arXiv | 2024-06 |
| Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | ICLR Workshop | 2024-05 |
| LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | ICLR Workshop | 2024-05 |
| AlphaMath Almost Zero: Process Supervision without Process | arXiv | 2024-05 |
| Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search | arXiv | 2024-05 |
| MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time | arXiv | 2024-05 |
| Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | arXiv | 2024-05 |
| Stream of Search (SoS): Learning to Search in Language | arXiv | 2024-04 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | arXiv | 2024-04 |
| Reasoning with Language Model is Planning with World Model | EMNLP | 2023-12 |
| Large Language Models as Commonsense Knowledge for Large-Scale Task Planning | NeurIPS | 2023-12 |
| Alphazero-like Tree-Search Can Guide Large Language Model Decoding and Training | NeurIPS Workshop | 2023-12 |
| Making PPO Even Better: Value-Guided Monte-Carlo Tree Search Decoding | arXiv | 2023-09 |
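
Most entries above instantiate the same MCTS skeleton: selection by UCT, expansion of candidate next steps, evaluation, and backpropagation. Below is a minimal, self-contained sketch on a toy "choose digits summing to a target" task; `expand` and `reward` are illustrative stand-ins for LLM step generation and a learned verifier, not any paper's actual implementation.

```python
# Illustrative MCTS skeleton for step-by-step reasoning search.
# Toy task: pick up to 3 digits in [0, 4] whose sum reaches a target of 7.
import math

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # partial path, e.g. a tuple of chosen steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def uct(node, c=1.4):
    # UCT: exploit the mean value, explore rarely visited children.
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_state, expand, reward, iters=200):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1) Selection: descend by UCT while every child has been visited.
        while node.children and all(ch.visits for ch in node.children):
            node = max(node.children, key=uct)
        # 2) Expansion: materialize candidate next steps.
        if not node.children:
            node.children = [Node(s, node) for s in expand(node.state)]
        if node.children:
            unvisited = [ch for ch in node.children if ch.visits == 0]
            node = unvisited[0] if unvisited else max(node.children, key=uct)
        # 3) Evaluation: score the state directly instead of a random rollout.
        r = reward(node.state)
        # 4) Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Commit to the most-visited first step, AlphaZero-style.
    return max(root.children, key=lambda ch: ch.visits).state

def expand(state):   # stand-in for sampling candidate next reasoning steps
    return [state + (d,) for d in range(5)] if len(state) < 3 else []

def reward(state):   # stand-in for a verifier: closeness to the target sum 7
    return 1.0 / (1.0 + abs(7 - sum(state)))

best_first_step = mcts((), expand, reward)
```

The papers above differ mainly in what replaces `reward` (a PRM, a value model, self-consistency) and in what counts as a "step" (a token, a sentence, a full solution attempt).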

Part 5: Self-Training/Self-Improvement

| Title | Venue | Date |
| --- | --- | --- |
| rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | arXiv | 2025-01 |
| ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | NeurIPS | 2024-12 |
| Recursive Introspection: Teaching Language Model Agents How to Self-Improve | NeurIPS | 2024-12 |
| B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoner | arXiv | 2024-12 |
| ReST-EM: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | TMLR | 2024-09 |
| ReFT: Reasoning with Reinforced Fine-Tuning | ACL | 2024-08 |
| Interactive Evolution: A Neural-Symbolic Self-Training Framework for Large Language Models | arXiv | 2024-06 |
| CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing | ICLR | 2024-05 |
| Enhancing Large Vision Language Models with Self-Training on Image Comprehension | arXiv | 2024-05 |
| Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | arXiv | 2024-03 |
| V-STaR: Training Verifiers for Self-Taught Reasoners | arXiv | 2024-02 |
| Self-Refine: Iterative Refinement with Self-Feedback | NeurIPS | 2023-12 |
| ReST: Reinforced Self-Training for Language Modeling | arXiv | 2023-08 |
| STaR: Bootstrapping Reasoning With Reasoning | NeurIPS | 2022-05 |
| Expert Iteration: Thinking Fast and Slow with Deep Learning and Tree Search | NeurIPS | 2017-12 |
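
Many of the self-training methods above (STaR, ReST, ReST-EM, V-STaR) share a rejection-sampling core: sample rationales, keep those that reach a verified answer, fine-tune on the survivors, repeat. Below is a toy sketch of one such round; `generate_rationales` is a hypothetical stand-in for sampling chains of thought from an LLM, not a real API.

```python
# Illustrative STaR-style rejection-sampling round, shared in spirit by
# several methods above but not taken from any one paper's code.
def generate_rationales(question: str, n: int = 4):
    """Hypothetical stand-in for n sampled chain-of-thought completions."""
    return [f"{question}: reasoning v{i} => answer {i % 2}" for i in range(n)]

def extract_answer(rationale: str) -> str:
    """Pull the final answer token out of a rationale string."""
    return rationale.rsplit("answer ", 1)[-1]

def star_round(dataset):
    """Keep only rationales whose final answer matches the gold label.
    A real system would then fine-tune the model on this set and repeat."""
    finetune_set = []
    for question, gold in dataset:
        for r in generate_rationales(question):
            if extract_answer(r) == gold:
                finetune_set.append((question, r))
    return finetune_set

data = [("Q1", "1"), ("Q2", "0")]
kept = star_round(data)
```

The methods above elaborate this loop: ReST-EM frames filtering as an EM step, V-STaR also trains a verifier on the rejected samples, and B-STaR monitors the exploration/exploitation balance across rounds.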

Part 6: Special Actions/Reflection

| Title | Venue | Date |
| --- | --- | --- |
| HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | arXiv | 2024-12 |
| AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning | arXiv | 2024-11 |
| LLaVA-o1: Let Vision Language Models Reason Step-by-Step | arXiv | 2024-11 |
| Vision-Language Models Can Self-Improve Reasoning via Reflection | arXiv | 2024-11 |
| Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | arXiv | 2024-08 |
| Reflection-Tuning: An Approach for Data Recycling | arXiv | 2023-10 |

Part 7: Efficient System 2: System 1/System 2 Balance, Hidden CoT, and CoT Compression

| Title | Venue | Date |
| --- | --- | --- |
| O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | arXiv | 2025-01 |
| Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking | arXiv | 2025-01 |
| DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models | EMNLP | 2024-12 |
| B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoner | arXiv | 2024-12 |
| Token-Budget-Aware LLM Reasoning | arXiv | 2024-12 |
| Training Large Language Models to Reason in a Continuous Latent Space | arXiv | 2024-12 |
| Guiding Language Model Reasoning with Planning Tokens | CoLM | 2024-10 |

Part 8: Interpretability of Slow-Fast Systems

| Title | Venue | Date |
| --- | --- | --- |
| Agents Thinking Fast and Slow: A Talker-Reasoner Architecture | NeurIPS Workshop | 2024-12 |
| What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | arXiv | 2024-10 |
| When a Language Model is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI o1 | arXiv | 2024-10 |
| The Impact of Reasoning Step Length on Large Language Models | ACL Findings | 2024-08 |
| Distilling System 2 into System 1 | arXiv | 2024-07 |
| System 2 Attention (is something you might need too) | arXiv | 2023-11 |

Part 9: Multimodal/Agent-Related Slow-Fast Systems

| Title | Venue | Date |
| --- | --- | --- |
| Visual Agents as Fast and Slow Thinkers | ICLR 2025 | 2025-01 |
| Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | arXiv | 2025-01 |
| Diving into Self-Evolving Training for Multimodal Reasoning | ICLR 2025 | 2024-12 |
| Scaling Inference-Time Search With Vision Value Model for Improved Visual Comprehension | arXiv | 2024-12 |
| Slow Perception: Let's Perceive Geometric Figures Step-by-Step | arXiv | 2024-12 |
| AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning | arXiv | 2024-11 |
| LLaVA-o1: Let Vision Language Models Reason Step-by-Step | arXiv | 2024-11 |
| Vision-Language Models Can Self-Improve Reasoning via Reflection | arXiv | 2024-11 |

Part 10: LLM-Driven Slow Reasoning System (SRS) Benchmarks

| Title | Venue | Date |
| --- | --- | --- |
| PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | arXiv | 2025-01 |
| MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs | NeurIPS | 2024-12 |
| Do NOT Think That Much for 2+3=? On the Overthinking of o1-like LLMs | arXiv | 2024-12 |
| A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | arXiv | 2024-09 |
