BuaaLearn/Awesome-Slow-Reason-System

Awesome-Slow-Reason-System


Welcome to the repository for our survey paper, "The o1 Era: Recent Advances in Slow-Thinking Reasoning Systems". This repository provides resources and updates related to our research. For a detailed introduction, please refer to our survey paper.

📢 Latest News

📒 Table of Contents

- Part 1: O1 System Reproduction
- Part 2: Process Reward Models (PRM)
- Part 3: Reinforcement Learning for LLMs
- Part 4: MCTS/Tree Search
- Part 5: Self-Training/Self-Improvement
- Part 6: Special Actions/Reflection
- Part 7: Efficient System 2: System 1/System 2 Balance, Hidden CoT, and CoT Compression
- Part 8: Interpretability of Slow-Fast Systems
- Part 9: Multimodal/Agent-Related Slow-Fast Systems
- Part 10: LLM-Driven Slow Reasoning System (SRS) Benchmarks

Part 1: O1 System Reproduction

| Title | Venue | Date |
| --- | --- | --- |
| Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems | arXiv | 2024-12 |
| o1-Coder: An o1 Replication for Coding | arXiv | 2024-12 |
| Enhancing LLM Reasoning with Reward-guided Tree Search | arXiv | 2024-11 |
| Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | arXiv | 2024-11 |
| O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | arXiv | 2024-11 |
| O1 Replication Journey: A Strategic Progress Report -- Part 1 | arXiv | 2024-10 |

Part 2: Process Reward Models (PRM)

| Title | Venue | Date |
| --- | --- | --- |
| PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | arXiv | 2025-01 |
| ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding | arXiv | 2025-01 |
| The Lessons of Developing Process Reward Models in Mathematical Reasoning | arXiv | 2025-01 |
| ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark | arXiv | 2025-01 |
| AutoPSV: Automated Process-Supervised Verifier | NeurIPS | 2024-12 |
| ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | NeurIPS | 2024-12 |
| Free Process Rewards without Process Labels | arXiv | 2024-12 |
| Outcome-Refining Process Supervision for Code Generation | arXiv | 2024-12 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | ACL | 2024-08 |
| OVM: Outcome-supervised Value Models for Planning in Mathematical Reasoning | ACL Findings | 2024-08 |
| Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | arXiv | 2024-06 |
| Let's Verify Step by Step | ICLR | 2024-05 |
| Improve Mathematical Reasoning in Language Models by Automated Process Supervision | arXiv | 2023-06 |
| Making Large Language Models Better Reasoners with Step-Aware Verifier | arXiv | 2023-06 |
| Solving Math Word Problems with Process and Outcome-Based Feedback | arXiv | 2022-11 |
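
The PRM line of work above scores each intermediate reasoning step rather than only the final answer. Below is a minimal sketch of how per-step scores are typically aggregated and used for best-of-N reranking; it is illustrative, not any one paper's method, and `score_step` is a toy stand-in for a trained process reward model.

```python
# Illustrative PRM scoring sketch. A real PRM is a trained model that
# returns P(step is correct); `score_step` here is a toy stand-in.
from math import prod

def score_step(step: str) -> float:
    """Toy per-step scorer standing in for a learned PRM."""
    return 0.1 if "ERROR" in step else 0.9

def prm_score(steps, aggregate="min"):
    """Aggregate per-step rewards into one chain score.
    'min' (the weakest step) and 'prod' are common choices in the PRM literature."""
    scores = [score_step(s) for s in steps]
    return min(scores) if aggregate == "min" else prod(scores)

# Best-of-N reranking: sample several chains, keep the one the PRM prefers.
chains = [
    ["2 + 3 = 5", "5 * 4 = 20"],
    ["2 + 3 = 6 ERROR", "6 * 4 = 24"],
]
best = max(chains, key=prm_score)
```

The `min` aggregation reflects the intuition that a chain of thought is only as trustworthy as its weakest step.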

Part 3: Reinforcement Learning for LLMs

| Title | Venue | Date |
| --- | --- | --- |
| Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | arXiv | 2025-02 |
| Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | arXiv | 2025-01 |
| Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | arXiv | 2025-01 |
| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | arXiv | 2025-01 |
| Kimi k1.5: Scaling Reinforcement Learning with LLMs | arXiv | 2025-01 |
| Does RLHF Scale? Exploring the Impacts From Data, Model, and Method | arXiv | 2024-12 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | arXiv | 2024-12 |
| ReFT: Reasoning with Reinforced Fine-Tuning | ACL | 2024-08 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | arXiv | 2024-02 |

Part 4: MCTS/Tree Search

| Title | Venue | Date |
| --- | --- | --- |
| On the Convergence Rate of MCTS for the Optimal Value Estimation in Markov Decision Processes | IEEE TAC | 2025-01 |
| Search-o1: Agentic Search-Enhanced Large Reasoning Models | arXiv | 2025-01 |
| rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | arXiv | 2025-01 |
| ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | NeurIPS | 2024-12 |
| Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | arXiv | 2024-12 |
| HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | arXiv | 2024-12 |
| Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search | arXiv | 2024-12 |
| Proposing and Solving Olympiad Geometry with Guided Tree Search | arXiv | 2024-12 |
| SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models | arXiv | 2024-12 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | arXiv | 2024-12 |
| CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | arXiv | 2024-11 |
| GPT-Guided Monte Carlo Tree Search for Symbolic Regression in Financial Fraud Detection | arXiv | 2024-11 |
| MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree | arXiv | 2024-11 |
| Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | arXiv | 2024-11 |
| SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation | arXiv | 2024-11 |
| Don't Throw Away Your Value Model! Generating More Preferable Text with Value-Guided Monte-Carlo Tree Search Decoding | CoLM | 2024-10 |
| AFlow: Automating Agentic Workflow Generation | arXiv | 2024-10 |
| Interpretable Contrastive Monte Carlo Tree Search Reasoning | arXiv | 2024-10 |
| LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | arXiv | 2024-10 |
| Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | arXiv | 2024-10 |
| TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling | arXiv | 2024-10 |
| Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination | arXiv | 2024-10 |
| RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation | arXiv | 2024-09 |
| Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | arXiv | 2024-08 |
| LiteSearch: Efficacious Tree Search for LLM | arXiv | 2024-07 |
| Tree Search for Language Model Agents | arXiv | 2024-07 |
| Uncertainty-Guided Optimization on Large Language Model Search Trees | arXiv | 2024-07 |
| Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | arXiv | 2024-06 |
| Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | ICLR Workshop | 2024-05 |
| LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | ICLR Workshop | 2024-05 |
| AlphaMath Almost Zero: Process Supervision without Process | arXiv | 2024-05 |
| Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search | arXiv | 2024-05 |
| MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time | arXiv | 2024-05 |
| Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | arXiv | 2024-05 |
| Stream of Search (SoS): Learning to Search in Language | arXiv | 2024-04 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | arXiv | 2024-04 |
| Reasoning with Language Model is Planning with World Model | EMNLP | 2023-12 |
| Large Language Models as Commonsense Knowledge for Large-Scale Task Planning | NeurIPS | 2023-12 |
| Alphazero-like Tree-Search Can Guide Large Language Model Decoding and Training | NeurIPS Workshop | 2023-12 |
| Making PPO Even Better: Value-Guided Monte-Carlo Tree Search Decoding | arXiv | 2023-09 |
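
Most entries above instantiate the same MCTS skeleton: selection by UCT, expansion of candidate next steps, evaluation, and backpropagation. Below is a minimal, self-contained sketch on a toy "choose digits summing to a target" task; `expand` and `reward` are illustrative stand-ins for LLM step generation and a learned verifier, not any paper's actual implementation.

```python
# Illustrative MCTS skeleton for step-by-step reasoning search.
# Toy task: pick up to 3 digits in [0, 4] whose sum reaches a target of 7.
import math

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # partial path, e.g. a tuple of chosen steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def uct(node, c=1.4):
    # UCT: exploit the mean value, explore rarely visited children.
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_state, expand, reward, iters=200):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1) Selection: descend by UCT while every child has been visited.
        while node.children and all(ch.visits for ch in node.children):
            node = max(node.children, key=uct)
        # 2) Expansion: materialize candidate next steps.
        if not node.children:
            node.children = [Node(s, node) for s in expand(node.state)]
        if node.children:
            unvisited = [ch for ch in node.children if ch.visits == 0]
            node = unvisited[0] if unvisited else max(node.children, key=uct)
        # 3) Evaluation: score the state directly instead of a random rollout.
        r = reward(node.state)
        # 4) Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Commit to the most-visited first step, AlphaZero-style.
    return max(root.children, key=lambda ch: ch.visits).state

def expand(state):   # stand-in for sampling candidate next reasoning steps
    return [state + (d,) for d in range(5)] if len(state) < 3 else []

def reward(state):   # stand-in for a verifier: closeness to the target sum 7
    return 1.0 / (1.0 + abs(7 - sum(state)))

best_first_step = mcts((), expand, reward)
```

The papers above differ mainly in what replaces `reward` (a PRM, a value model, self-consistency) and in what counts as a "step" (a token, a sentence, a full solution attempt).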

Part 5: Self-Training/Self-Improvement

| Title | Venue | Date |
| --- | --- | --- |
| rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | arXiv | 2025-01 |
| ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | NeurIPS | 2024-12 |
| Recursive Introspection: Teaching Language Model Agents How to Self-Improve | NeurIPS | 2024-12 |
| B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoner | arXiv | 2024-12 |
| ReST-EM: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | TMLR | 2024-09 |
| ReFT: Reasoning with Reinforced Fine-Tuning | ACL | 2024-08 |
| Interactive Evolution: A Neural-Symbolic Self-Training Framework for Large Language Models | arXiv | 2024-06 |
| CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing | ICLR | 2024-05 |
| Enhancing Large Vision Language Models with Self-Training on Image Comprehension | arXiv | 2024-05 |
| Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | arXiv | 2024-03 |
| V-STaR: Training Verifiers for Self-Taught Reasoners | arXiv | 2024-02 |
| Self-Refine: Iterative Refinement with Self-Feedback | NeurIPS | 2023-12 |
| ReST: Reinforced Self-Training for Language Modeling | arXiv | 2023-08 |
| STaR: Bootstrapping Reasoning With Reasoning | NeurIPS | 2022-05 |
| Expert Iteration: Thinking Fast and Slow with Deep Learning and Tree Search | NeurIPS | 2017-12 |
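
Many of the self-training methods above (STaR, ReST, ReST-EM, V-STaR) share a rejection-sampling core: sample rationales, keep those that reach a verified answer, fine-tune on the survivors, repeat. Below is a toy sketch of one such round; `generate_rationales` is a hypothetical stand-in for sampling chains of thought from an LLM, not a real API.

```python
# Illustrative STaR-style rejection-sampling round, shared in spirit by
# several methods above but not taken from any one paper's code.
def generate_rationales(question: str, n: int = 4):
    """Hypothetical stand-in for n sampled chain-of-thought completions."""
    return [f"{question}: reasoning v{i} => answer {i % 2}" for i in range(n)]

def extract_answer(rationale: str) -> str:
    """Pull the final answer token out of a rationale string."""
    return rationale.rsplit("answer ", 1)[-1]

def star_round(dataset):
    """Keep only rationales whose final answer matches the gold label.
    A real system would then fine-tune the model on this set and repeat."""
    finetune_set = []
    for question, gold in dataset:
        for r in generate_rationales(question):
            if extract_answer(r) == gold:
                finetune_set.append((question, r))
    return finetune_set

data = [("Q1", "1"), ("Q2", "0")]
kept = star_round(data)
```

The methods above elaborate this loop: ReST-EM frames filtering as an EM step, V-STaR also trains a verifier on the rejected samples, and B-STaR monitors the exploration/exploitation balance across rounds.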

Part 6: Special Actions/Reflection

| Title | Venue | Date |
| --- | --- | --- |
| HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | arXiv | 2024-12 |
| AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning | arXiv | 2024-11 |
| LLaVA-o1: Let Vision Language Models Reason Step-by-Step | arXiv | 2024-11 |
| Vision-Language Models Can Self-Improve Reasoning via Reflection | arXiv | 2024-11 |
| Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | arXiv | 2024-08 |
| Reflection-Tuning: An Approach for Data Recycling | arXiv | 2023-10 |

Part 7: Efficient System 2: System 1/System 2 Balance, Hidden CoT, and CoT Compression

| Title | Venue | Date |
| --- | --- | --- |
| O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | arXiv | 2025-01 |
| Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking | arXiv | 2025-01 |
| DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models | EMNLP | 2024-12 |
| B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoner | arXiv | 2024-12 |
| Token-Budget-Aware LLM Reasoning | arXiv | 2024-12 |
| Training Large Language Models to Reason in a Continuous Latent Space | arXiv | 2024-12 |
| Guiding Language Model Reasoning with Planning Tokens | CoLM | 2024-10 |

Part 8: Interpretability of Slow-Fast Systems

| Title | Venue | Date |
| --- | --- | --- |
| Agents Thinking Fast and Slow: A Talker-Reasoner Architecture | NeurIPS Workshop | 2024-12 |
| What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | arXiv | 2024-10 |
| When a Language Model is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI o1 | arXiv | 2024-10 |
| The Impact of Reasoning Step Length on Large Language Models | ACL Findings | 2024-08 |
| Distilling System 2 into System 1 | arXiv | 2024-07 |
| System 2 Attention (is something you might need too) | arXiv | 2023-11 |

Part 9: Multimodal/Agent-Related Slow-Fast Systems

| Title | Venue | Date |
| --- | --- | --- |
| Visual Agents as Fast and Slow Thinkers | ICLR 2025 | 2025-01 |
| Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | arXiv | 2025-01 |
| Diving into Self-Evolving Training for Multimodal Reasoning | ICLR 2025 | 2024-12 |
| Scaling Inference-Time Search With Vision Value Model for Improved Visual Comprehension | arXiv | 2024-12 |
| Slow Perception: Let's Perceive Geometric Figures Step-by-Step | arXiv | 2024-12 |
| AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning | arXiv | 2024-11 |
| LLaVA-o1: Let Vision Language Models Reason Step-by-Step | arXiv | 2024-11 |
| Vision-Language Models Can Self-Improve Reasoning via Reflection | arXiv | 2024-11 |

Part 10: LLM-Driven Slow Reasoning System (SRS) Benchmarks

| Title | Venue | Date |
| --- | --- | --- |
| PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | arXiv | 2025-01 |
| MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs | NeurIPS | 2024-12 |
| Do NOT Think That Much for 2+3=? On the Overthinking of o1-like LLMs | arXiv | 2024-12 |
| A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | arXiv | 2024-09 |
