RL: Continuous action space, but within a desired range

14 views (last 30 days)
Wing Yin Ng on 5 Jan 2021
Edited: Francisco Serra on 13 Nov 2023
I am trying to use PPO for RL training with a continuous action space.
However, I want my actor's output to always stay within a certain range (e.g. only between 0 and 1). I tried mapping/clipping out-of-bound actions back into the range, but performance suffered. Are there other ways to tackle this situation? Thank you.

Answers (1)

Emmanouil Tzorakoleftherakis on 8 Jan 2021
Hello,
There are two ways to enforce this:
1) Use the upper and lower limits in rlNumericSpec when you create the action space.
2) Add a tanh layer followed by a scaling layer in the "mean" path of your actor network, as shown in this example. This way, you can scale the mean value to the desired range.
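To make the two options concrete, here is a minimal sketch for a scalar action bounded to [0, 1] (layer names and sizes are illustrative, not from the answer above):

```matlab
% 1) Bounded action spec: store the limits with the action space
actInfo = rlNumericSpec([1 1], LowerLimit=0, UpperLimit=1);

% 2) tanh + scaling in the mean path of the actor network:
%    tanh squashes the mean to [-1, 1], then y = Scale*x + Bias
%    maps [-1, 1] onto [0, 1]
meanPath = [
    fullyConnectedLayer(1, Name="fcMean")
    tanhLayer(Name="tanhMean")
    scalingLayer(Name="meanOut", Scale=0.5, Bias=0.5)];
```

For a general range [a, b], the same mapping is Scale=(b-a)/2 and Bias=(a+b)/2.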
  4 comments
Ammad Sadaqat on 10 Nov 2021
Edited: Ammad Sadaqat on 10 Nov 2021
Hello,
I tried using:
rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1)
But I am still facing the same issue: the action values are not bounded between 0 and 1, even though I also use a tanh layer followed by a scaling layer.
Another option would be to bound my actions in the Simulink environment, e.g. clamp all values less than 0 to 0 and all values greater than 1 to 1, leaving values already between 0 and 1 unchanged. But I think this would still be inefficient in terms of simulation time. I would really appreciate any suggestions for a possible solution.
Thanks in advance!
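For what it's worth, clamping in the environment is cheap: in Simulink it is a single Saturation block on the action signal, and in a MATLAB environment's step function it is one elementwise min/max, neither of which should noticeably affect simulation time. A minimal sketch (the variable name u is illustrative):

```matlab
% Clamp the agent's action to [0, 1] before it reaches the plant.
% Values already inside [0, 1] pass through unchanged.
u = min(max(u, 0), 1);
```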
Francisco Serra on 13 Nov 2023
Edited: Francisco Serra on 13 Nov 2023
Hello. I am trying to control a dynamical system p̈ = u, driving p to 0.
For that I am using an rlPPOAgent. I want the actions to be bounded by -10 and 10 (-10 < u < 10).
If the actor samples from a Gaussian distribution whose mean and standard deviation are given by the neural network, how can we ensure boundedness? The rlNumericSpec is only a way to store the limits, but does nothing in practical terms, right? I tried applying a tanh activation function to the meanPath of my actor to squash the values to [-1, 1], followed by a scaling layer to scale them to [-10, 10]:
meanPath = [
    fullyConnectedLayer(16, 'Name', 'meanPathIn')
    reluLayer('Name', 'relu5')
    fullyConnectedLayer(numAct, 'Name', 'fc6')
    tanhLayer(Name="tanhStd")
    scalingLayer(Name='meanPathOut', ...
        Scale=ainfo.UpperLimit)]; % --> this is where the rlNumericSpec defined above is used
The standard deviation path only has a ReLU layer to enforce non-negativity.
The way I see it, I am bounding the mean value to this interval, but the actual sampled action can still fall outside these bounds.
In one episode of my training process, I can see that the control input u isn't bounded!
Can somebody help?
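The behavior described above is expected: the tanh/scaling layers bound only the mean of the Gaussian, while the sample drawn from it is unbounded, so occasional out-of-range actions are normal. One common workaround, as a sketch, is to saturate the sampled action inside the environment before applying it to the plant (the limits here match the rlNumericSpec from the question; the variable names are illustrative):

```matlab
% Hypothetical fragment of the environment's step function:
% hard-clip the sampled action to [-10, 10] before integrating p_ddot = u.
uMax = 10;                           % matches the action spec limits
u = min(max(action, -uMax), uMax);   % saturate to [-uMax, uMax]
```

This keeps the plant input bounded regardless of what the stochastic actor samples, at the cost of a flat gradient outside the limits.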

Release: R2020b