RewardNormalizer
- class torchrl.trainers.RewardNormalizer(decay: float = 0.999, scale: float = 1.0, eps: float | None = None, log_pbar: bool = False, reward_key=None)[source]
Reward normalizer hook.
- Parameters:
  - decay (float, optional) – exponential moving average decay parameter. Defaults to 0.999.
  - scale (float, optional) – the scaling factor used to multiply the reward once normalized. Defaults to 1.0.
  - eps (float, optional) – the epsilon jitter used to prevent numerical underflow. Defaults to torch.finfo(DEFAULT_DTYPE).eps, where DEFAULT_DTYPE=torch.get_default_dtype().
  - reward_key (str or tuple, optional) – the key where the reward can be found in the input batch. Defaults to ("next", "reward").
Examples
>>> reward_normalizer = RewardNormalizer()
>>> trainer.register_op("batch_process", reward_normalizer.update_reward_stats)
>>> trainer.register_op("process_optim_batch", reward_normalizer.normalize_reward)
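For intuition about what the two hooks do, here is a minimal standalone sketch of exponential-moving-average reward normalization. It is not the actual torchrl implementation (the internal bookkeeping may differ); it only illustrates how decay, scale, and eps are typically combined.

import torch

class EMARewardNormalizerSketch:
    # Hypothetical helper illustrating EMA reward normalization;
    # NOT the torchrl.trainers.RewardNormalizer implementation.
    def __init__(self, decay=0.999, scale=1.0, eps=None):
        self.decay = decay
        self.scale = scale
        self.eps = eps if eps is not None else torch.finfo(torch.get_default_dtype()).eps
        self.mean = 0.0
        self.var = 1.0

    def update_reward_stats(self, reward: torch.Tensor) -> None:
        # Blend the statistics of a batch of rewards into the running estimates (EMA update).
        self.mean = self.decay * self.mean + (1 - self.decay) * reward.mean().item()
        self.var = self.decay * self.var + (1 - self.decay) * reward.var().item()

    def normalize_reward(self, reward: torch.Tensor) -> torch.Tensor:
        # Center, rescale, and guard against division by ~0 with eps.
        return self.scale * (reward - self.mean) / (self.var ** 0.5 + self.eps)

In a training loop this would be used the same way as the hooks above: update the statistics on every collected batch, then normalize the rewards of the batch handed to the optimizer.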
- register(trainer: Trainer, name: str = 'reward_normalizer')[source]
Registers the hook in the trainer at a default location.
- Parameters:
trainer (Trainer) – the trainer where the hook must be registered.
name (str) – the name of the hook.
Note
To register the hook at a location other than the default, use
register_op().
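As a hedged illustration (assuming, as in the example above, that a trainer instance already exists, and that register() wires up the same two hook points shown there):

>>> reward_normalizer = RewardNormalizer(decay=0.99)
>>> # Default registration
>>> reward_normalizer.register(trainer, name="reward_normalizer")
>>> # Or choose the hook points explicitly with register_op()
>>> trainer.register_op("batch_process", reward_normalizer.update_reward_stats)
>>> trainer.register_op("process_optim_batch", reward_normalizer.normalize_reward)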