ma921/HolderDPO

HolderDPO

  • Robust DPO with provable redescending property.
  • Principled data valuation and cleaning method.
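
For intuition on the first bullet, the two losses from the Quick start snippet below can be written out together with their per-example derivatives with respect to the margin. This is a sketch transcribed from that snippet, not a statement quoted from the paper; the notation ($N$ pairs, margin $g_i$, $p_i$) is chosen here for illustration.

```latex
\begin{align*}
p_i &= \sigma(\beta\, g_i), \qquad g_i = \text{reward\_win}_i - \text{reward\_lose}_i \\
\mathcal{L}_{\mathrm{dpo}} &= -\frac{1}{N}\sum_{i=1}^{N} \log p_i \\
\mathcal{L}_{\mathrm{holder\_dpo}} &= -(1+\gamma)\,\frac{1}{N}\sum_{i=1}^{N} p_i^{\gamma}
  \;+\; \gamma\,\frac{1}{N}\sum_{i=1}^{N} p_i^{1+\gamma} \\[4pt]
\frac{\partial}{\partial g}\bigl(-\log p\bigr) &= -\beta\,(1-p) \\
\frac{\partial}{\partial g}\bigl(-(1+\gamma)\,p^{\gamma} + \gamma\,p^{1+\gamma}\bigr)
  &= -\gamma(1+\gamma)\,\beta\,p^{\gamma}(1-p)^{2}
\end{align*}
```

As $g \to -\infty$ (so $p \to 0$), the DPO derivative tends to $-\beta$, while the `holder_dpo` derivative tends to $0$ for $\gamma > 0$ because of the factor $p^{\gamma}$. Read informally, this is the sense in which strongly contradicted pairs stop influencing the update.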

Quick start

We will tidy up the code soon. If you already have a DPO codebase, simply replace its loss with ours as follows:

import torch
import torch.nn.functional as F

# Place inside the loss method of your DPO trainer.
# pi_logps  : policy log-probs, shape (B,)
# ref_logps : reference model log-probs, shape (B,)
# yw_idxs   : preferred completion indices, shape (T,)
# yl_idxs   : dispreferred completion indices, shape (T,)
# self.beta, self.gamma : loss coefficients (gamma is used only by holder_dpo)

# Split the log-probs into preferred / dispreferred completions.
pi_yw_logps = pi_logps[yw_idxs]
pi_yl_logps = pi_logps[yl_idxs]
ref_yw_logps = ref_logps[yw_idxs]
ref_yl_logps = ref_logps[yl_idxs]

# Log-ratio rewards and their margin.
reward_win = pi_yw_logps - ref_yw_logps
reward_lose = pi_yl_logps - ref_yl_logps
g_theta = reward_win - reward_lose

if self.method == "dpo":
    loss = -F.logsigmoid(self.beta * g_theta).mean()
elif self.method == "holder_dpo":
    # torch.sigmoid replaces the deprecated F.sigmoid.
    p = torch.sigmoid(self.beta * g_theta)
    loss = (
        -(1.0 + self.gamma) * p.pow(self.gamma).mean()
        + self.gamma * p.pow(self.gamma + 1).mean()
    )
else:
    raise ValueError(f"unknown method: {self.method}")
return loss
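
To see how the two branches differ numerically, here is a small dependency-free sketch (plain Python rather than PyTorch) that evaluates each per-example loss as a function of the margin g_theta and estimates its slope by central finite differences. The helper names and the values beta = 1.0, gamma = 2.0 are illustrative assumptions, not the authors' settings.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dpo_loss(g: float, beta: float) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * g))."""
    return -math.log(sigmoid(beta * g))


def holder_dpo_loss(g: float, beta: float, gamma: float) -> float:
    """Per-example loss from the holder_dpo branch of the snippet above."""
    p = sigmoid(beta * g)
    return -(1.0 + gamma) * p ** gamma + gamma * p ** (gamma + 1.0)


def slope(f, g: float, eps: float = 1e-4) -> float:
    """Absolute central finite-difference estimate of df/dg."""
    return abs(f(g + eps) - f(g - eps)) / (2.0 * eps)


beta, gamma = 1.0, 2.0  # illustrative values only (assumption)
# Very negative margin = the model strongly disagrees with the label.
margins = [-20.0, -5.0, 0.0, 5.0, 20.0]

dpo_slopes = [slope(lambda g: dpo_loss(g, beta), m) for m in margins]
holder_slopes = [slope(lambda g: holder_dpo_loss(g, beta, gamma), m) for m in margins]

for m, d, h in zip(margins, dpo_slopes, holder_slopes):
    print(f"g = {m:6.1f}   |dL_dpo/dg| = {d:.2e}   |dL_holder/dg| = {h:.2e}")
```

With these values, the DPO slope stays close to beta at the most negative margin, whereas the holder_dpo slope is largest near g = 0 and shrinks toward zero at both extremes.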

Cite as

Please cite this work as

@inproceedings{fujisawa2025scalable,
 title={Scalable Valuation of Human Feedback through Provably Robust Model Alignment},
 author={Fujisawa, Masahiro and Adachi, Masaki and Osborne, Michael A.},
 booktitle={Advances in Neural Information Processing Systems},
 doi={10.48550/arXiv.2505.17859},
 year={2025}
}
