===============
torchtune.rlhf
===============

.. currentmodule:: torchtune.rlhf

Components and losses for RLHF algorithms like PPO and DPO.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    estimate_advantages
    get_rewards_ppo
    truncate_sequence_at_first_stop_token
    loss.PPOLoss
    loss.DPOLoss
    loss.RSOLoss