torchtune.rlhf.loss.RSOLoss
class torchtune.rlhf.loss.RSOLoss(gamma)[source]

Statistical Rejection Sampling Optimization (RSO), or "hinge", loss module: https://arxiv.org/abs/2309.06657. Intuition from the paper:

    DPO is a logistic regression on human preference data, and SLiC (https://arxiv.org/abs/2305.10425) is almost equivalent to a support vector machine (SVM) with hinge loss. [RSO] improve[s] SLiC as the SVM counterpart of DPO.

Based on the implementation in HF’s TRL library: https://github.com/huggingface/trl/blob/4dce042a3863db1d375358e8c8092b874b02934b/trl/trainer/dpo_trainer.py#L1141

- Parameters:
  gamma (float) – Equivalent temperature parameter (from DPO) for the RSO loss.
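The core computation can be sketched as a hinge loss on the policy-to-reference log-ratio margin, in the spirit of TRL's hinge loss branch. This is a minimal illustrative sketch, not the torchtune API: the function name `rso_hinge_loss`, the per-sequence summed log-prob inputs, and the `gamma=0.1` default are assumptions for demonstration.

```python
import torch
import torch.nn.functional as F


def rso_hinge_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    reference_chosen_logps: torch.Tensor,
    reference_rejected_logps: torch.Tensor,
    gamma: float = 0.1,  # assumed default, analogous to DPO's temperature
) -> torch.Tensor:
    """Illustrative hinge-style preference loss on policy/reference log-ratios."""
    # Margin between chosen and rejected responses under the policy,
    # corrected by the same margin under the frozen reference model.
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    reference_logratios = reference_chosen_logps - reference_rejected_logps
    logits = policy_logratios - reference_logratios
    # SVM-style hinge: loss is zero once the scaled margin exceeds 1,
    # unlike DPO's logistic (sigmoid) loss, which never fully saturates.
    return F.relu(1 - gamma * logits)
```

For example, with `gamma=0.1` and a log-ratio margin of 1, the loss is `relu(1 - 0.1) = 0.9`; a margin of 10 or more drives the loss to exactly zero, which is the "hard margin" behavior that distinguishes this loss from DPO's.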