BatchSizeTransform¶
- class torchrl.envs.transforms.BatchSizeTransform(*, batch_size: torch.Size | None = None, reshape_fn: Callable[[TensorDictBase], TensorDictBase] | None = None, reset_func: Callable[[TensorDictBase, TensorDictBase], TensorDictBase] | None = None, env_kwarg: bool = False)[原始碼]¶
用於修改環境批次大小的轉換。
此轉換有兩種不同的用法:它可以用於設定非批次鎖定(例如無狀態)環境的批次大小,以透過資料收集器進行資料收集。它還可以用於修改環境的批次大小(例如,squeeze、unsqueeze 或 reshape)。
此轉換會修改環境的批次大小以匹配提供的批次大小。它期望父環境的批次大小可以擴充套件到提供的批次大小。
- 關鍵字引數:
batch_size (torch.Size 或 等效值,可選) – 環境的新批次大小。與
reshape_fn互斥。reshape_fn (callable, optional) –
一個用於修改環境批次大小的可呼叫物件。與
batch_size互斥。注意
目前支援涉及
reshape、flatten、unflatten、squeeze和unsqueeze的轉換。如果需要其他重塑操作,請在 TorchRL GitHub 上提交功能請求。reset_func (callable, optional) – 一個生成重置 tensordict 的函式。簽名必須與
Callable[[TensorDictBase, TensorDictBase], TensorDictBase]匹配,其中第一個輸入引數是呼叫reset()時傳遞給環境的可選 tensordict,第二個引數是TransformedEnv.base_env.reset的輸出。如果env_kwarg=True,它還可以支援一個可選的env關鍵字引數。env_kwarg (bool, optional) – 如果為
True,則reset_func必須支援env關鍵字引數。預設為False。傳遞的 env 將是伴隨其轉換的 env。
示例
>>> # Changing the batch-size with a function >>> from torchrl.envs import GymEnv >>> base_env = GymEnv("CartPole-v1") >>> env = TransformedEnv(base_env, BatchSizeTransform(reshape_fn=lambda data: data.reshape(1, 1))) >>> env.rollout(4) >>> # Setting the shape of a stateless environment >>> class MyEnv(EnvBase): ... batch_locked = False ... def __init__(self): ... super().__init__() ... self.observation_spec = Composite(observation=Unbounded(3)) ... self.reward_spec = Unbounded(1) ... self.action_spec = Unbounded(1) ... ... def _reset(self, tensordict: TensorDictBase, **kwargs) -> TensorDictBase: ... tensordict_batch_size = tensordict.batch_size if tensordict is not None else torch.Size([]) ... result = self.observation_spec.rand(tensordict_batch_size) ... result.update(self.full_done_spec.zero(tensordict_batch_size)) ... return result ... ... def _step( ... self, ... tensordict: TensorDictBase, ... ) -> TensorDictBase: ... result = self.observation_spec.rand(tensordict.batch_size) ... result.update(self.full_done_spec.zero(tensordict.batch_size)) ... result.update(self.full_reward_spec.zero(tensordict.batch_size)) ... return result ... ... def _set_seed(self, seed: Optional[int]) -> None: ... pass ... >>> env = TransformedEnv(MyEnv(), BatchSizeTransform([5])) >>> assert env.batch_size == torch.Size([5]) >>> assert env.rollout(10).shape == torch.Size([5, 10])
reset_func可以建立一個具有所需批次大小的 tensordict,從而允許進行精細化的 reset 呼叫>>> def reset_func(tensordict, tensordict_reset, env): ... result = env.observation_spec.rand() ... result.update(env.full_done_spec.zero()) ... assert result.batch_size != torch.Size([]) ... return result >>> env = TransformedEnv(MyEnv(), BatchSizeTransform([5], reset_func=reset_func, env_kwarg=True)) >>> print(env.rollout(2)) TensorDict( fields={ action: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False), done: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False), next: TensorDict( fields={ done: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False), observation: Tensor(shape=torch.Size([5, 2, 3]), device=cpu, dtype=torch.float32, is_shared=False), reward: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False), terminated: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False)}, batch_size=torch.Size([5, 2]), device=None, is_shared=False), observation: Tensor(shape=torch.Size([5, 2, 3]), device=cpu, dtype=torch.float32, is_shared=False), terminated: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False)}, batch_size=torch.Size([5, 2]), device=None, is_shared=False)
此轉換可用於在資料收集器中部署非批次鎖定環境
>>> from torchrl.collectors import SyncDataCollector >>> collector = SyncDataCollector(env, lambda td: env.rand_action(td), frames_per_batch=10, total_frames=-1) >>> for data in collector: ... print(data) ... break TensorDict( fields={ action: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False), collector: TensorDict( fields={ traj_ids: Tensor(shape=torch.Size([5, 2]), device=cpu, dtype=torch.int64, is_shared=False)}, batch_size=torch.Size([5, 2]), device=None, is_shared=False), done: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False), next: TensorDict( fields={ done: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False), observation: Tensor(shape=torch.Size([5, 2, 3]), device=cpu, dtype=torch.float32, is_shared=False), reward: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False), terminated: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False)}, batch_size=torch.Size([5, 2]), device=None, is_shared=False), observation: Tensor(shape=torch.Size([5, 2, 3]), device=cpu, dtype=torch.float32, is_shared=False), terminated: Tensor(shape=torch.Size([5, 2, 1]), device=cpu, dtype=torch.bool, is_shared=False)}, batch_size=torch.Size([5, 2]), device=None, is_shared=False) >>> collector.shutdown()
- forward(next_tensordict: TensorDictBase) TensorDictBase¶
讀取輸入 tensordict,並對選定的鍵應用轉換。
預設情況下,此方法
直接呼叫
_apply_transform()。不呼叫
_step()或_call()。
此方法不會在任何時候在 env.step 中呼叫。但是,它會在
sample()中呼叫。注意
forward也可以使用dispatch將引數名稱轉換為鍵,並使用常規關鍵字引數。示例
>>> class TransformThatMeasuresBytes(Transform): ... '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.''' ... def __init__(self): ... super().__init__(in_keys=[], out_keys=["bytes"]) ... ... def forward(self, tensordict: TensorDictBase) -> TensorDictBase: ... bytes_in_td = tensordict.bytes() ... tensordict["bytes"] = bytes ... return tensordict >>> t = TransformThatMeasuresBytes() >>> env = env.append_transform(t) # works within envs >>> t(TensorDict(a=0)) # Works offline too.
- transform_input_spec(input_spec: Composite) Composite[原始碼]¶
轉換輸入規範,使結果規範與轉換對映匹配。
- 引數:
input_spec (TensorSpec) – 轉換前的規範
- 返回:
轉換後的預期規範