Beta distribution in PPO

Sourabh

2 Feb 2024


I want to constrain the actions produced by my PPO agent, and I am wondering whether I can use a Beta distribution for the policy so that the actions stay within a bounded range.

Here is the script for the networks I am using:

----------

% Define common input path
commonPath = [
    featureInputLayer(prod(obsInfo.Dimension),Name="comPathIn")
    fullyConnectedLayer(120)
    tanhLayer
    fullyConnectedLayer(1,Name="comPathOut")
    ];

% Define mean value path
meanPath = [
    fullyConnectedLayer(64,Name="meanPathIn")
    tanhLayer
    fullyConnectedLayer(64,Name="fc_2")
    tanhLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    leakyReluLayer(0.1,Name="meanPathOut")
    ];

% Define standard deviation path
sdevPath = [
    fullyConnectedLayer(64,Name="stdPathIn")
    tanhLayer
    fullyConnectedLayer(64)
    tanhLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    softmaxLayer(Name="stdPathOut")
    ];

% Add layers to layerGraph object
actorNet = layerGraph(commonPath);
actorNet = addLayers(actorNet,meanPath);
actorNet = addLayers(actorNet,sdevPath);

% Connect paths
actorNet = connectLayers(actorNet,"comPathOut","meanPathIn/in");
actorNet = connectLayers(actorNet,"comPathOut","stdPathIn/in");

% Convert to dlnetwork
actorNetwork = dlnetwork(actorNet);


Kautuk Raj, 15 Feb 2024

To use a Beta distribution for the action outputs in PPO, I think you would need to modify the network so that it outputs the two shape parameters of the Beta distribution (alpha and beta) instead of a mean and a standard deviation. Both parameters must be positive, so one would typically end each output path with an activation that guarantees positivity, such as softplus. A Beta sample lies between 0 and 1, so it can then be rescaled to the action limits.
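A rough sketch of that idea is below; treat it as an illustration under assumptions, not a drop-in solution. The sizes and limits (`obsDim`, `actDim`, `actLow`, `actHigh`) are placeholder values that are not from the original post, and the head names (`alphaOut`, `betaOut`) are made up for this example. It assumes `softplusLayer` from Reinforcement Learning Toolbox and `betarnd` from Statistics and Machine Learning Toolbox are available. As far as I know, the built-in PPO agent expects a Gaussian actor for continuous actions, so a Beta policy like this would most likely have to live in a custom training loop.

```matlab
% Sketch only: these values are placeholders, not taken from the question.
obsDim  = 4;     % hypothetical observation size
actDim  = 1;     % hypothetical action size
actLow  = -2;    % hypothetical lower action limit
actHigh =  2;    % hypothetical upper action limit

% Shared trunk
commonPath = [
    featureInputLayer(obsDim,Name="comPathIn")
    fullyConnectedLayer(120)
    tanhLayer(Name="comPathOut")
    ];

% One head per Beta parameter; softplus keeps each output positive
alphaPath = [
    fullyConnectedLayer(64,Name="alphaPathIn")
    tanhLayer
    fullyConnectedLayer(actDim)
    softplusLayer(Name="alphaOut")
    ];
betaPath = [
    fullyConnectedLayer(64,Name="betaPathIn")
    tanhLayer
    fullyConnectedLayer(actDim)
    softplusLayer(Name="betaOut")
    ];

net = layerGraph(commonPath);
net = addLayers(net,alphaPath);
net = addLayers(net,betaPath);
net = connectLayers(net,"comPathOut","alphaPathIn/in");
net = connectLayers(net,"comPathOut","betaPathIn/in");
net = dlnetwork(net);

% Forward pass for a single random observation ("CB" = channel x batch)
obs = dlarray(rand(obsDim,1,"single"),"CB");
[a,b] = predict(net,obs,Outputs=["alphaOut","betaOut"]);

% Optional choice: adding 1 keeps both parameters above 1,
% which makes the Beta density unimodal.
alphaParam = double(extractdata(a)) + 1;
betaParam  = double(extractdata(b)) + 1;

% Sample in (0,1), then rescale to the action limits
u      = betarnd(alphaParam,betaParam);
action = actLow + (actHigh - actLow).*u;
```

Because `u` always lies between 0 and 1, `action` cannot leave the range from `actLow` to `actHigh`, which is the confinement the question asks about. For the PPO loss you would also need the log-density of the sampled `u` under the Beta distribution, evaluated with both the old and the updated parameters.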
