TimeMaxPool

class torchrl.envs.transforms.TimeMaxPool(in_keys: Sequence[NestedKey] | None = None, out_keys: Sequence[NestedKey] | None = None, T: int = 1, reset_key: NestedKey | None = None)

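Takes the maximum value at each position over the last T observations.

For every tensor listed in in_keys, this transform computes the element-wise maximum over the last T time steps.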

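Parameters:

in_keys (sequence of NestedKey, optional) – input keys on which the max pool will be applied. Defaults to "observation" if left empty.
out_keys (sequence of NestedKey, optional) – output keys where the result will be written. Defaults to in_keys if left empty.
T (int, optional) – number of time steps over which to apply max pooling. Defaults to 1.
reset_key (NestedKey, optional) – the reset key to be used as the partial-reset indicator. Must be unique. If not provided, defaults to the only reset key of the parent environment (if it has exactly one); otherwise an exception is raised.

Examples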

>>> import torch
>>> from torchrl.envs import GymEnv, TransformedEnv
>>> from torchrl.envs.transforms import TimeMaxPool
>>> base_env = GymEnv("Pendulum-v1")
>>> env = TransformedEnv(base_env, TimeMaxPool(in_keys=["observation"], T=10))
>>> torch.manual_seed(0)
>>> env.set_seed(0)
>>> rollout = env.rollout(10)
>>> print(rollout["observation"])  # values should be increasing up until the 10th step
tensor([[ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0216,  0.0000],
        [ 0.0000,  0.1149,  0.0000],
        [ 0.0000,  0.1990,  0.0000],
        [ 0.0000,  0.2749,  0.0000],
        [ 0.0000,  0.3281,  0.0000],
        [-0.9290,  0.3702, -0.8978]])
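The example output starts with rows of zeros and only shows negative entries on the tenth row. That pattern is what a zero-filled window of length T would produce, since a zero in the window bounds the maximum from below until T real observations have arrived. The following is a minimal pure-Python sketch of that idea, not the TorchRL implementation: the helper name rolling_max, its fill argument, and the zero-initialised window are all assumptions made for illustration.

```python
from collections import deque


def rolling_max(observations, T, fill=0.0):
    """Element-wise max over the last T observations.

    Hypothetical helper for illustration only; the window is assumed
    to start filled with `fill` (zero by default).
    """
    dim = len(observations[0])
    # Fixed-length window: appending a new row drops the oldest one.
    window = deque([[fill] * dim for _ in range(T)], maxlen=T)
    pooled = []
    for obs in observations:
        window.append(list(obs))
        # zip(*window) groups the values found at the same position.
        pooled.append([max(column) for column in zip(*window)])
    return pooled


obs = [[-1.0, 0.5], [-2.0, 0.2], [-3.0, 0.9], [-4.0, 0.1]]
pooled = rolling_max(obs, T=2)
for row in pooled:
    print(row)
```

With T=2, the first output row keeps the 0.0 floor in its first column because one padded zero is still in the window; from the second row on, each entry is the larger of the two most recent observations at that position.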

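Note

TimeMaxPool currently supports only done signals located at the root of the tensordict. Nested done signals, such as those found in MARL settings, are not supported yet. If you need this feature, please open an issue on the TorchRL repository.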

forward(tensordict: TensorDictBase) → TensorDictBase

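Reads the input tensordict and applies the transform to the selected keys.

By default, this method:

calls _apply_transform() directly.
does not call _step() or _call().

This method is never called within env.step. It is, however, called within sample().

Note

forward also accepts regular keyword arguments, using dispatch to map the argument names to keys.

Examples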

>>> from tensordict import TensorDict, TensorDictBase
>>> from torchrl.envs.transforms import Transform
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         # Store the measured size, not the built-in `bytes` type.
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.

transform_observation_spec(observation_spec: TensorSpec) → TensorSpec

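Transforms the observation spec so that the resulting spec matches the transform mapping.

Parameters: observation_spec (TensorSpec) – spec before the transform
Returns: expected spec after the transform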
