DdpgMlpActor¶
- class torchrl.modules.DdpgMlpActor(action_dim: int, mlp_net_kwargs: dict | None = None, device: DEVICE_TYPING | None = None)[source]¶
DDPG Actor class.
Presented in "CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING", https://arxiv.org/pdf/1509.02971.pdf
The DDPG Actor takes an observation vector as input and returns an action from it. It is trained to maximize the value returned by the DDPG Q-value network.
- Parameters:
action_dim (int) – length of the action vector.
mlp_net_kwargs (dict, optional) –
kwargs for the MLP. Defaults to
>>> {
...     'in_features': None,
...     'out_features': action_dim,
...     'depth': 2,
...     'num_cells': [400, 300],
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
... }
device (torch.device, optional) – device to create the module on.
Examples
>>> import torch
>>> from torchrl.modules import DdpgMlpActor
>>> actor = DdpgMlpActor(action_dim=4)
>>> print(actor)
DdpgMlpActor(
  (mlp): MLP(
    (0): LazyLinear(in_features=0, out_features=400, bias=True)
    (1): ELU(alpha=1.0)
    (2): Linear(in_features=400, out_features=300, bias=True)
    (3): ELU(alpha=1.0)
    (4): Linear(in_features=300, out_features=4, bias=True)
  )
)
>>> obs = torch.zeros(10, 6)
>>> action = actor(obs)
>>> print(action.shape)
torch.Size([10, 4])
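The training objective mentioned above (the actor is updated to maximize the Q-value) can be sketched with plain `torch.nn` modules. This is an illustrative assumption-laden example, not torchrl's own loss implementation: the actor below mirrors the default 400/300 ELU architecture, and `qnet` is a hypothetical stand-in Q-value network that scores concatenated (observation, action) pairs.

```python
import torch
from torch import nn

# Stand-in actor mirroring DdpgMlpActor's default architecture:
# two hidden layers of 400 and 300 units with ELU activations,
# mapping a 6-dim observation to a 4-dim action.
actor = nn.Sequential(
    nn.Linear(6, 400), nn.ELU(),
    nn.Linear(400, 300), nn.ELU(),
    nn.Linear(300, 4),
)

# Hypothetical Q-value network: scores a concatenated (obs, action)
# pair with a single scalar per sample. torchrl's own Q-network
# classes differ; this is only a sketch of the objective.
qnet = nn.Sequential(
    nn.Linear(6 + 4, 400), nn.ELU(),
    nn.Linear(400, 1),
)

obs = torch.randn(10, 6)
action = actor(obs)

# DDPG actor objective: maximize Q(s, pi(s)), i.e. minimize its
# negation. Gradients flow through the action back into the actor.
actor_loss = -qnet(torch.cat([obs, action], dim=-1)).mean()

optim = torch.optim.Adam(actor.parameters(), lr=1e-4)
optim.zero_grad()
actor_loss.backward()
optim.step()

print(action.shape)  # torch.Size([10, 4])
```

In practice the Q-network's parameters are frozen (or its optimizer simply not stepped) during the actor update, so only the policy moves toward higher-value actions.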