AdaptiveKLController
- class torchrl.data.AdaptiveKLController(*, init_kl_coef: float, target: float, horizon: int, model: nn.Module | None = None)
Adaptive KL Controller as described in Ziegler et al. “Fine-Tuning Language Models from Human Preferences”.
- Keyword Arguments:
init_kl_coef (float) – The starting value of the coefficient.
target (float) – The target KL value. When the observed KL is smaller, the coefficient is decreased, thereby relaxing the KL penalty in the training objective and allowing the model to stray further from the reference model. When the observed KL is greater than the target, the KL coefficient is increased, thereby pulling the model back towards the reference model.
horizon (int) – Scaling factor that controls how aggressively the coefficient is updated. A larger horizon yields smaller changes per update.
model (nn.Module, optional) – Wrapped model that needs to be controlled. Must have an attribute "kl_coef". If provided, the "kl_coef" attribute will be updated in-place.
Reference: Section 2.2, https://arxiv.org/pdf/1909.08593.pdf#page=2
Source: https://github.com/openai/lm-human-preferences/blob/master/lm_human_preferences/train_policy.py
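The update rule behind the target and horizon arguments can be sketched in plain Python. This is a minimal illustration modelled on the reference implementation linked above, not torchrl's own code: the function name `adaptive_kl_step`, the `n_steps` argument (the number of samples seen since the last update) and the clipping range of ±0.2 are assumptions taken from that reference.

```python
def adaptive_kl_step(coef: float, observed_kl: float, target: float,
                     horizon: int, n_steps: int) -> float:
    """Return the updated KL coefficient (hypothetical helper, illustration only)."""
    # Relative deviation of the observed KL from the target,
    # clipped to [-0.2, 0.2] so one noisy batch cannot move the coefficient too far.
    error = max(-0.2, min(0.2, observed_kl / target - 1.0))
    # Multiplicative update: a larger horizon shrinks the step.
    return coef * (1.0 + error * n_steps / horizon)


# Observed KL above the target: the coefficient grows.
raised = adaptive_kl_step(0.2, observed_kl=12.0, target=6.0, horizon=10_000, n_steps=64)
# Observed KL below the target: the coefficient shrinks.
lowered = adaptive_kl_step(0.2, observed_kl=3.0, target=6.0, horizon=10_000, n_steps=64)
```

Because the update is multiplicative and the error is clipped, the coefficient stays positive and changes by at most 0.2 * n_steps / horizon in relative terms per update.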