preference-optimization

🔬 Official implementation of ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection (ICLR 2026). Novel multimodal RL approach for interpretable and explainable content moderation.

multimodal-learning explainable-ai content-moderation vision-language-models preference-optimization grpo iclr-2026 hateful-meme-detection multimodal-rl

Updated Mar 1, 2026
Python

pilancilab / COALA

Star 6

Convex Optimization for Alignment and Preference Learning on a Single GPU

convex-optimization convex preference-learning llms preference-optimization

Updated May 28, 2026
Python

shaheennabi / open-posttraining-system

Star 4

Open-source research engineering project for building the end-to-end post-training stack for reasoning language models, including SFT, preference learning, RLHF/RLVR, evaluation, inference-time scaling, and scalable systems for frontier-level reasoning.

open-source evaluation inference text-generation benchmarks post-training rlhf reward-modeling preference-optimization supervised-fine-tuning inference-time-scaling open-post-training-system

Updated Jun 18, 2026
Jupyter Notebook

Yellow4Submarine7 / LLMDoctor

Star 3

🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment (AAAI 2026)

transformers pytorch alignment lora llm qwen preference-optimization aaai2026 test-time-alignment

Updated Jan 17, 2026
Python

jmagly / aiwg-training

Star 1

AIWG training-complete framework — corpus-to-dataset pipeline with SKILL.md agentic surface and optional Python runtime backend. Marketplace plugin for AIWG.

provenance synthetic-data training-data fine-tuning dpo model-cards decontamination dataset-curation sharegpt llm-training preference-optimization alpaca-format aiwg benchmark-contamination datasheets-for-datasets

Updated Apr 16, 2026
Python

martimfasantos / CustomPOs-for-SLMs

Star 1

Novel Preference Optimization Algorithms for state-of-the-art small LMs, enhancing performance in GenAI and NLP tasks

nlp evaluation preference-learning human-preferences llms gen-ai preference-optimization

Updated Jan 5, 2025
Python

YuanaHao / Awesome-Diffusion-RL

Star 1

A weekly updated awesome list of RL, RLHF, DPO, GRPO, reward models, and preference optimization for image and video diffusion generation.

reinforcement-learning image-generation awesome-list video-generation diffusion-models rlhf preference-optimization grpo

Updated May 19, 2026

runhaoli-creator / dmapo

Star 1

Direct multi-agent policy optimization — unified DPO/KTO/ORPO/SimPO framework.

multi-agent post-training dpo trl llm rlhf preference-optimization

Updated Apr 14, 2026
Python

Sonnet-Code / awesome-human-in-the-loop

Star 0

Curated list of papers, tools, datasets and resources for human-in-the-loop AI training — SFT, RLHF, preference optimization, red-teaming, evaluation.

awesome awesome-list model-evaluation human-in-the-loop red-teaming dpo ai-training llm rlhf preference-optimization

Updated Apr 21, 2026

miharcan / lora-preference-optimization-comparison

Star 0

Preference optimization framework for text classification (DPO/ORPO/KTO), with SFT, encoder, and XGBoost baselines plus unified run pipeline and reproducible outputs.

nlp machine-learning text-classification xgboost lora peft dpo kto tranformers qlora llm-fine-tuning orpo preference-optimization

Updated Apr 4, 2026
Python

Here are 25 public repositories matching this topic...

general-preference / general-preference-model

iBacklight / PipelineLLM

sahsaeedi / TPO

s-vco / s-vco

warlockee / oxRL

RUCKBReasoning / DPO_Text2SQL

Sreyan88 / Synthio

JIA-Lab-research / TGDPO

DtYXs / Pre-DPO

sahsaeedi / DCPO-T2I

JingbiaoMei / ExPO-HM

pilancilab / COALA

shaheennabi / open-posttraining-system

Yellow4Submarine7 / LLMDoctor

jmagly / aiwg-training

martimfasantos / CustomPOs-for-SLMs

YuanaHao / Awesome-Diffusion-RL

runhaoli-creator / dmapo

Sonnet-Code / awesome-human-in-the-loop

miharcan / lora-preference-optimization-comparison
