RL: Continuous action space, but within a desired range

14 views (last 30 days)
Wing Yin Ng on 5 Jan 2021
Edited: Francisco Serra on 13 Nov 2023
I am trying to use PPO for RL training with a continuous action space.
However, I want my actor's output to always stay within a certain range (e.g. only between 0 and 1). I tried mapping/clipping out-of-bound actions back into the range, but performance suffered. Are there other ways to tackle this situation? Thank you.

Answers (1)

Emmanouil Tzorakoleftherakis on 8 Jan 2021
Hello,
There are two ways to enforce this:
1) Use the upper and lower limits in rlNumericSpec when you create the action space.
2) Add a tanh layer followed by a scaling layer in the "mean" path of your actor network, as shown in this example. This way, you can scale the mean value to the desired range.
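To make the two options concrete, here is a minimal sketch for a scalar action bounded to [0, 1] (layer names and sizes are illustrative, not from the answer above):

```matlab
% 1) Bounded action spec: store the limits with the action space
actInfo = rlNumericSpec([1 1], LowerLimit=0, UpperLimit=1);

% 2) tanh + scaling in the mean path of the actor network:
%    tanh squashes the mean to [-1, 1], then y = Scale*x + Bias
%    maps [-1, 1] onto [0, 1]
meanPath = [
    fullyConnectedLayer(1, Name="fcMean")
    tanhLayer(Name="tanhMean")
    scalingLayer(Name="meanOut", Scale=0.5, Bias=0.5)];
```

For a general range [a, b], the same mapping is Scale=(b-a)/2 and Bias=(a+b)/2.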
  4 comments
Ammad Sadaqat on 10 Nov 2021
Edited: Ammad Sadaqat on 10 Nov 2021
Hello,
I tried using:
rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1)
But I am still facing the same issue: the action values are not bounded between 0 and 1, even though I also use a tanh layer followed by a scaling layer.
Another option would be to bound my actions in the Simulink environment, e.g. clamp all values less than 0 to 0 and all values greater than 1 to 1, leaving values already between 0 and 1 unchanged. But I think this would still be inefficient in terms of simulation time. I would really appreciate any suggestions for a possible solution.
Thanks in advance!
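For what it's worth, clamping in the environment is cheap: in Simulink it is a single Saturation block on the action signal, and in a MATLAB environment's step function it is one elementwise min/max, neither of which should noticeably affect simulation time. A minimal sketch (the variable name u is illustrative):

```matlab
% Clamp the agent's action to [0, 1] before it reaches the plant.
% Values already inside [0, 1] pass through unchanged.
u = min(max(u, 0), 1);
```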
Francisco Serra on 13 Nov 2023
Edited: Francisco Serra on 13 Nov 2023
Hello. I am trying to control a dynamical system p̈ = u, driving p to 0.
For that I am using an rlPPOAgent. I want the actions to be bounded by -10 and 10 (-10 < u < 10).
If the actor samples from a Gaussian distribution whose mean and standard deviation are given by the neural network, how can we ensure boundedness? The rlNumericSpec is only a way to store the limits, but does nothing in practical terms, right? I tried applying a tanh activation function to the meanPath of my actor to squash the values to [-1, 1], followed by a scaling layer to scale them to [-10, 10]:
meanPath = [
    fullyConnectedLayer(16, 'Name', 'meanPathIn')
    reluLayer('Name', 'relu5')
    fullyConnectedLayer(numAct, 'Name', 'fc6')
    tanhLayer(Name="tanhStd")
    scalingLayer(Name='meanPathOut', ...
        Scale=ainfo.UpperLimit)]; % --> this is where the rlNumericSpec defined above is used
The standard deviation path only has a ReLU layer to enforce non-negativity.
The way I see it, I am bounding the mean value to this interval, but the actual sampled action can still fall outside these bounds.
In one episode of my training process, I can see that the control input u isn't bounded!
Can somebody help?
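The behavior described above is expected: the tanh/scaling layers bound only the mean of the Gaussian, while the sample drawn from it is unbounded, so occasional out-of-range actions are normal. One common workaround, as a sketch, is to saturate the sampled action inside the environment before applying it to the plant (the limits here match the rlNumericSpec from the question; the variable names are illustrative):

```matlab
% Hypothetical fragment of the environment's step function:
% hard-clip the sampled action to [-10, 10] before integrating p_ddot = u.
uMax = 10;                           % matches the action spec limits
u = min(max(action, -uMax), uMax);   % saturate to [-uMax, uMax]
```

This keeps the plant input bounded regardless of what the stochastic actor samples, at the cost of a flat gradient outside the limits.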

Release: R2020b