Can I decide the RL agent's actions?

Sourabh on 2 Sep 2023
Commented: Sourabh on 28 Oct 2023
I am training a PPO agent, and the issue is that it keeps searching for a better value even after it gets close to a stable state.
What I mean is that I want my agent to keep applying the last action values as soon as the error reaches ≤ 0.05 (to prevent oscillations and offset near the set point, as shown in the shared image).
My question is: can I do this in MATLAB? I know it can be done in Python. Any help would be really appreciated :)
  3 Comments
Sourabh on 3 Sep 2023
Actually, I saw it in an IEEE paper, and when I asked the author, he told me he was using Python.
I don't have any code with me right now, but I feel there must be a way to decide my agent's actions.
Sourabh on 4 Sep 2023
Okay, I might have some code in a week or so.
But all I want is to limit the actions of my PPO agent so that it settles after some time, rather than behaving as shown in the attached image.


Answers (2)

Sam Chak on 4 Sep 2023
I believe that it has something to do with the StopTrainingCriteria and StopTrainingValue options of your rlTrainingOptions object. Is the condition "steady-state error ≤ 0.05" reflected in the training termination condition? Typically, the agent will continue to train until MaxEpisodes is reached when the stopping condition is not satisfied.
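maxepisodes = 6000;   % upper limit on the number of training episodes
maxsteps = 150;       % maximum number of steps in each episode
trainingOpts = rlTrainingOptions(...
    'MaxEpisodes', maxepisodes, ...
    'MaxStepsPerEpisode', maxsteps, ...
    'ScoreAveragingWindowLength', 5, ...      % episodes used to compute the average reward
    'Verbose', false, ...
    'Plots', 'training-progress', ...
    'StopTrainingCriteria', 'AverageReward', ... % stop once the average reward...
    'StopTrainingValue', 1500);                  % ...reaches 1500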
Also, please note that the final agent does not necessarily achieve the highest reward obtained during the training episodes. To keep the agents that meet the "steady-state error ≤ 0.05" condition, save them during training by specifying the SaveAgentCriteria and SaveAgentValue properties of the rlTrainingOptions object.
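A sketch of how those properties could be added to the options above. With 'SaveAgentCriteria' set to 'EpisodeReward', an agent is saved whenever its episode reward meets or exceeds 'SaveAgentValue'. The threshold 1400 and the folder name are hypothetical placeholders: the built-in criteria are reward- or step-based, so you would have to choose a reward value that corresponds to episodes in which the error stayed within 0.05 under your own reward design.

```matlab
maxepisodes = 6000;
maxsteps = 150;
trainingOpts = rlTrainingOptions(...
    'MaxEpisodes', maxepisodes, ...
    'MaxStepsPerEpisode', maxsteps, ...
    'ScoreAveragingWindowLength', 5, ...
    'Verbose', false, ...
    'Plots', 'training-progress', ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 1500, ...
    'SaveAgentCriteria', 'EpisodeReward', ... % save agents during training
    'SaveAgentValue', 1400, ...               % hypothetical reward threshold
    'SaveAgentDirectory', 'savedAgents');     % hypothetical folder for saved agents
```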
  2 Comments
Sourabh on 4 Sep 2023
Then why are the DDPG and TD3 agents working fine?
It has nothing to do with the stop-training criteria. I just want my agent's outputs to settle at the previous value as soon as the error reaches 0.05 during a training episode.
Sourabh on 28 Oct 2023



Emmanouil Tzorakoleftherakis on 25 Sep 2023
Edited: Emmanouil Tzorakoleftherakis on 25 Sep 2023
It seems the paper you saw uses some extra logic to implement the behavior you describe. You could do the same with an if statement in MATLAB.
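A minimal sketch of such an if statement in plain MATLAB. It assumes the tracking error and the agent's proposed action are available at every step, and it compares the magnitude of the error with the tolerance; the variable names and the error/action sequences are made-up values for illustration only. In a custom environment (for example one created with rlFunctionEnv), this kind of check would typically sit in the step function, just before the action is applied to the plant.

```matlab
% Hold the last applied action once the tracking error is within tolerance.
% All data below are hypothetical values chosen only to illustrate the logic.
tol = 0.05;                                  % error threshold from the question
errors  = [0.40 0.20 0.08 0.04 0.03 0.06];   % made-up tracking error per step
actions = [1.00 0.80 0.60 0.55 0.70 0.65];   % made-up actions proposed by the agent

applied = zeros(size(actions));              % actions actually sent to the plant
lastAction = actions(1);                     % assumption: start from the first proposal
for k = 1:numel(actions)
    if abs(errors(k)) <= tol                 % assumption: compare |error| with tol
        applied(k) = lastAction;             % near the set point: repeat last action
    else
        applied(k) = actions(k);             % otherwise use the agent's action
    end
    lastAction = applied(k);                 % remember what was applied this step
end
disp(applied)
```

Here steps 4 and 5 reuse 0.60 because the error is within 0.05, and at step 6 the error exceeds the tolerance again, so the agent's own action (0.65) is applied.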
  1 Comment
Sourabh on 26 Sep 2023
Do you mean in my script or in my environment?
Could you give an example?

