reward-model

A comrephensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-inference stages.

reinforcement-learning post-training self-correction reward-learning large-language-models llm llms reward-models reward-model reward-modeling guided-decoding test-time-scaling

Updated Jun 13, 2025

tongjingqi / Awesome-Agent-RL

Star 52

A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research, and practical guides on defining and collecting rewards to build more intelligent and aligned AI agents.

agent awesome reinforcement-learning rl awesome-list llm reward-model agentic-ai rlvr agent-training

Updated Sep 1, 2025

InternLM / Spark

Star 24

An official implementation of "SPARK: Synergistic Policy And Reward Co-Evolving Framework"

self-improvement multi-modal large-language-models vision-language-model reward-model large-vision-language-models self-rewarding math-reasoning

Updated Oct 23, 2025
Python

yeyimilk / CrowdVLM-R1

Star 20

Proposed fuzzy reward model with GRPO to improve VLM's abilities in crowd counting task.

reinforcement-learning vlm crowdcounting llm reward-model r1-zero vlm-r1 multimodal-r1

Updated Apr 11, 2025
Python

rochitasundar / Generative-AI-with-Large-Language-Models

Star 16

This repository contains the lab work for Coursera course on "Generative AI with Large Language Models".

reinforcement-learning transformer kl-divergence proximal-policy-optimization large-language-models prompt-engineering flan-t5 instruction-finetuning low-rank-adaptation reward-model parameter-efficient-fine-tuning llm-evaluation

Updated Dec 1, 2023
Jupyter Notebook

NiuTrans / GRAM

Star 15

Code for ICML 2025 paper "GRAM: A Generative Foundation Reward Model for Reward Generalization"

generative generalization rlhf reward-model

Updated Sep 4, 2025
Python

RewardSDS

itaychachy / RewardSDS

Star 12

Official PyTorch Implementation for the "RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling" paper!

ai computer-vision 3d-generation mechine-learning reward-model

Updated Jun 10, 2025
Python

AlignRM / CheemsRM

Star 10

ACL'25: Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

reinforcement-learning large-language-model reward-model

Updated Jun 10, 2025
Python

opendilab / LightRFT

Star 8

LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Framework

reinforcement-learning multi-modal vlm rft llm reward-model llm-training grpo dapo

Updated Dec 28, 2025
Python

Junpliu / DocReward

Star 8

DocReward: A Document Reward Model for Structuring and Stylizing

reinforcement-learning structure style document reward-model

Updated Dec 25, 2025
Python

hlp-ai / miniChatGPT

Star 6

Mini ChatGPT

pytorch ppo sft gpt2 chatgpt instructgpt reward-model

Updated May 12, 2023
Python

taishan1994 / Reward-Model-Finetuning

Star 4

专门用于训练奖励模型的仓库。

reward-model qwen2

Updated Aug 7, 2024
Python

kaicheng001 / Awesome-R1

Star 2

A curated list of research papers, models, and resources related to R1-style reasoning models following DeepSeek-R1's breakthrough in January 2025.

awesome thinking r1 vlm lmm llm mllm reward-model reasoning-models deepseek-r1

Updated Jul 2, 2025

techandy42 / LLM_Reward_Model

Star 2

Developing a LLM response ranking reward model using HFRL except it's GPT-3.5 instead of human.

language-model reward-model hfrl

Updated Dec 28, 2023
Jupyter Notebook

m-serious / module-reward-models

Star 1

This project implements a multi-module reward model training system designed to further RL finetune agent performance across complex multi-turn tasks.

reinforcement-learning-agent reward-model

Updated Aug 16, 2025
Python

kantkrishan0206-crypto / AlignGPT

Star 1

"This project implements a mini LLM alignment pipeline using Reinforcement Learning from Human Feedback (RLHF). It includes training a reward model from human-annotated preference data, fine-tuning the language model via policy optimization, and performing ablation studies to evaluate robustness, fairness, and alignment trade-offs."

python nlp machine-learning deep-learning transformers pytorch alignment language-models tokenization ai-safety fine-tuning preference-learning ppo policy-optimization dpo human-feedback rlhf reward-model

Updated Oct 19, 2025
Jupyter Notebook

sebastianpinedaar / llumux

Star 1

Compose, train and test fast LLM routers

model-selection automl-pipeline large-language-models llms reward-model llm-training llm-inference llm-evaluation llm-pipeline llm-routing

Updated Oct 8, 2025
Python

Improve this page

Add a description, image, and links to the reward-model topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the reward-model topic, visit your repo's landing page and select "manage topics."

Learn more

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reward-model

Here are 28 public repositories matching this topic...

wendell0218 / Awesome-RL-for-Video-Generation

modelscope / OpenJudge

Westlake-AI / SemiReward

bobxwu / learning-from-rewards-llm-papers

tongjingqi / Awesome-Agent-RL

InternLM / Spark

yeyimilk / CrowdVLM-R1

rochitasundar / Generative-AI-with-Large-Language-Models

NiuTrans / GRAM

itaychachy / RewardSDS

AlignRM / CheemsRM

opendilab / LightRFT

Junpliu / DocReward

hlp-ai / miniChatGPT

taishan1994 / Reward-Model-Finetuning

kaicheng001 / Awesome-R1

techandy42 / LLM_Reward_Model

m-serious / module-reward-models

kantkrishan0206-crypto / AlignGPT

sebastianpinedaar / llumux

Improve this page

Add this topic to your repo