
MultiStepActorWrapper

class torchrl.modules.tensordict_module.MultiStepActorWrapper(*args, **kwargs)

A wrapper around a multi-action actor.

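This class allows macros (sequences of actions) to be executed in an environment. The actor's action entry (or entries) must have an additional time dimension to be consumed. This time dimension must come immediately after the last batch dimension of the input tensordict, i.e. at index tensordict.ndim.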
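The layout rule above can be illustrated with plain tuples. This is only a sketch of the shape arithmetic; `macro_action_shape` and `time_dim` are hypothetical helper names and are not part of torchrl.

```python
def macro_action_shape(batch_size, n_steps, action_shape):
    """Shape of an action entry that carries `n_steps` actions per batch element.

    The time dimension is inserted right after the batch dimensions.
    """
    return tuple(batch_size) + (n_steps,) + tuple(action_shape)


def time_dim(batch_size):
    """Index of the time dimension: the number of batch dimensions (tensordict.ndim)."""
    return len(tuple(batch_size))


# A batch of 5 environments, 6 actions per macro, 2 values per action.
print(macro_action_shape(batch_size=(5,), n_steps=6, action_shape=(2,)))  # (5, 6, 2)
print(time_dim((5,)))  # 1
```

With a batch size of (5, 100), the same rule gives (5, 100, 6, 2), which matches the shape of the `action_orig` entry in the rollout printed in the example on this page.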

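If the action keys are not provided, they are retrieved automatically from the actor using a simple heuristic: any nested key ending with the "action" string is selected.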
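A minimal sketch of such a heuristic is shown below. It assumes that keys are either plain strings or tuples of strings and that "ending with" is checked on the last element of the key; the exact rule used by torchrl may differ, and `guess_action_keys` is a hypothetical name.

```python
def guess_action_keys(out_keys):
    """Return the keys whose last element ends with "action".

    Assumption: a key is a str or a tuple of str (a nested key).
    """
    selected = []
    for key in out_keys:
        leaf = key if isinstance(key, str) else key[-1]
        if leaf.endswith("action"):
            selected.append(key)
    return selected


out_keys = ["obs_cat_reshape", "action", ("agents", "action"), "logits"]
print(guess_action_keys(out_keys))  # ['action', ('agents', 'action')]
```

When the heuristic would select the wrong entries, passing action_keys explicitly avoids the guesswork.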

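An "is_init" entry must also be present in the input tensordict. It is used to track when the current collection should be interrupted because a "done" state has been encountered. Unlike action_keys, this key must be unique.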
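The interplay between the cached actions and the "is_init" flag can be sketched in plain Python for a single, non-batched environment. This is not the torchrl implementation: `MacroDispatcher` and `plan_fn` are hypothetical names, and the re-planning rule (plan again on reset or when no action is left) is an assumption drawn from the description above.

```python
class MacroDispatcher:
    """Hypothetical, non-batched sketch of macro dispatching (not torchrl code)."""

    def __init__(self, plan_fn, n_steps):
        self.plan_fn = plan_fn  # maps an observation to a list of n_steps actions
        self.n_steps = n_steps
        self.pending = []       # actions left in the current macro

    def __call__(self, observation, is_init):
        # Plan again if the environment was just reset or the macro is used up.
        if is_init or not self.pending:
            plan = list(self.plan_fn(observation))
            if len(plan) != self.n_steps:
                raise ValueError("plan_fn must return exactly n_steps actions")
            self.pending = plan
        # Dispatch the next action of the macro.
        return self.pending.pop(0)


planned_from = []  # records the observations on which a new macro was planned


def plan_fn(obs):
    planned_from.append(obs)
    return [f"{obs}-a{i}" for i in range(3)]


policy = MacroDispatcher(plan_fn, n_steps=3)
actions = [
    policy("s0", is_init=True),
    policy("s1", is_init=False),
    policy("r0", is_init=True),   # reset mid-macro: the leftover action is dropped
    policy("r1", is_init=False),
    policy("r2", is_init=False),
    policy("r3", is_init=False),  # macro exhausted: plan again
]
print(actions)       # ['s0-a0', 's0-a1', 'r0-a0', 'r0-a1', 'r0-a2', 'r3-a0']
print(planned_from)  # ['s0', 'r0', 'r3']
```

In the sketch, the action planned for the third step of the first macro is never executed because the reset at "r0" interrupts it.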

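Parameters:
  • actor (TensorDictModuleBase) – An actor.

  • n_steps (int, optional) – The number of actions the actor outputs at once (the lookahead window). Defaults to None.

Keyword Arguments:
  • action_keys (list of NestedKeys, optional) – The action keys of the environment. They can be retrieved from env.action_keys. Defaults to all out_keys of the actor that end with the "action" string.

  • init_key (NestedKey, optional) – The key of the entry indicating when the environment has been reset. Defaults to "is_init", which is the out_key of the InitTracker transform.

  • keep_dim (bool, optional) – Whether to keep the time dimension of the macro during indexing. Defaults to False.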
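The effect of keep_dim can be pictured with nested lists standing in for tensors. The sketch assumes that keep_dim switches between slice-style indexing (time axis kept with length 1) and integer indexing (time axis dropped); `select_step` is a hypothetical helper and not a torchrl function.

```python
def select_step(macro, step, keep_dim=False):
    """Pick one step from a macro stored as a list of per-step actions."""
    if keep_dim:
        return macro[step:step + 1]  # slice: the time axis is kept, with length 1
    return macro[step]               # integer index: the time axis is dropped


# A macro of 3 steps, each action holding 2 values (shape [3, 2]).
macro = [[0.1, 0.9], [0.4, 0.6], [0.7, 0.3]]

print(select_step(macro, 1))                 # [0.4, 0.6]      -> shape [2]
print(select_step(macro, 1, keep_dim=True))  # [[0.4, 0.6]]    -> shape [1, 2]
```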

Examples

>>> import torch.nn
>>> from torchrl.modules.tensordict_module.actors import MultiStepActorWrapper, Actor
>>> from torchrl.envs import CatFrames, GymEnv, TransformedEnv, SerialEnv, InitTracker, Compose
>>> from tensordict.nn import TensorDictSequential as Seq, TensorDictModule as Mod
>>>
>>> time_steps = 6
>>> n_obs = 4
>>> n_action = 2
>>> batch = 5
>>>
>>> # Reshape the flat CatFrames output into a stack of frames
>>> def reshape_cat(data: torch.Tensor):
...     return data.unflatten(-1, (time_steps, n_obs))
>>> # an actor that reads `time_steps` frames and outputs one action per frame
>>> # (actions are conditioned on the observation of `time_steps` in the past)
>>> actor_base = Seq(
...     Mod(reshape_cat, in_keys=["obs_cat"], out_keys=["obs_cat_reshape"]),
...     Mod(torch.nn.Linear(n_obs, n_action), in_keys=["obs_cat_reshape"], out_keys=["action"])
... )
>>> # Wrap the actor to dispatch the actions
>>> actor = MultiStepActorWrapper(actor_base, n_steps=time_steps)
>>>
>>> env = TransformedEnv(
...     SerialEnv(batch, lambda: GymEnv("CartPole-v1")),
...     Compose(
...         InitTracker(),
...         CatFrames(N=time_steps, in_keys=["observation"], out_keys=["obs_cat"], dim=-1)
...     )
... )
>>>
>>> print(env.rollout(100, policy=actor, break_when_any_done=False))
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([5, 100, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        action_orig: Tensor(shape=torch.Size([5, 100, 6, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        counter: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.int32, is_shared=False),
        done: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        is_init: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                is_init: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                obs_cat: Tensor(shape=torch.Size([5, 100, 24]), device=cpu, dtype=torch.float32, is_shared=False),
                observation: Tensor(shape=torch.Size([5, 100, 4]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([5, 100]),
            device=cpu,
            is_shared=False),
        obs_cat: Tensor(shape=torch.Size([5, 100, 24]), device=cpu, dtype=torch.float32, is_shared=False),
        observation: Tensor(shape=torch.Size([5, 100, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: Tensor(shape=torch.Size([5, 100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([5, 100]),
    device=cpu,
    is_shared=False)

See also

torchrl.envs.MultiStepEnvWrapper is the EnvBase counterpart of this wrapper: it wraps an environment and unbinds the action, executing one element at a time.

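forward(tensordict: TensorDictBase) → TensorDictBase

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, you should call the Module instance afterwards instead of this function, since the former takes care of running the registered hooks while the latter silently ignores them.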

property init_key: NestedKey

The indicator of the initial step for a given element of the batch.
