What should the values of the noise parameters (for the agent) be if my action range is from -5 to -0.5 in DDPG reinforcement learning and I want to explore the whole action range at each sample time? Also, is there any way to make the noise options (for the agent) independent of the sample time?

Accepted Answer

Drew Davis on 19 Jun 2019
Edited: Drew Davis on 19 Jun 2019

3 votes

Hi Surya
It is fairly common to set Variance*sqrt(SampleTime) to somewhere between 1% and 10% of your action range for Ornstein-Uhlenbeck (OU) action noise. Your action range is -0.5 - (-5) = 4.5, so the variance can be set between 4.5*0.01/sqrt(SampleTime) and 4.5*0.10/sqrt(SampleTime). The other important factor is the VarianceDecayRate, which dictates how fast the variance decays. You can calculate how many samples it takes for your variance to be halved with this simple formula:
halflife = log(0.5)/log(1-VarianceDecayRate)
It is critically important for your agent to explore while learning, so keeping the VarianceDecayRate small (or even zero) is a good idea. The other noise parameters can usually be left at their defaults.
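To make the numbers concrete, here is a small Python sketch (not MATLAB, and not toolbox code) that applies the 1% to 10% rule of thumb and the half-life formula above to the asker's range [-5, -0.5]. The helper names `ou_variance_range` and `half_life_steps` are hypothetical, and the 0.05 s sample time in the usage part is an assumed value for illustration only.

```python
import math

# Asker's action bounds (from the question): [-5, -0.5]
ACTION_MIN, ACTION_MAX = -5.0, -0.5

def ou_variance_range(action_min, action_max, sample_time,
                      low_frac=0.01, high_frac=0.10):
    """Return (low, high) for the OU 'Variance' parameter such that
    Variance*sqrt(sample_time) lies between low_frac and high_frac
    of the action range (hypothetical helper)."""
    action_range = action_max - action_min  # 4.5 for [-5, -0.5]
    scale = 1.0 / math.sqrt(sample_time)
    return action_range * low_frac * scale, action_range * high_frac * scale

def half_life_steps(decay_rate):
    """Number of agent steps until the variance halves, assuming it is
    multiplied by (1 - decay_rate) at every step (hypothetical helper)."""
    if decay_rate <= 0.0:
        return math.inf  # no decay: the variance never halves
    return math.log(0.5) / math.log(1.0 - decay_rate)

if __name__ == "__main__":
    ts = 0.05  # assumed sample time in seconds, for illustration only
    low, high = ou_variance_range(ACTION_MIN, ACTION_MAX, ts)
    print(f"Variance between {low:.4f} and {high:.4f}")
    print(f"Half-life at decay rate 1e-5: {half_life_steps(1e-5):.0f} steps")
```

Because the range is divided by sqrt(SampleTime), a shorter sample time yields a larger Variance value while Variance*sqrt(SampleTime) stays at the same fraction of the action range.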
You can check out this pendulum example, which does a pretty good job of exploring during training.
The noise options inherit their sample time from the agent, so you do not need to configure it. By default, the noise model is queried at the same rate as the agent.
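The sketch below illustrates why scaling Variance by 1/sqrt(SampleTime) makes the exploration magnitude independent of the sample time. It is a Python simulation, not toolbox code, and it assumes the discretized OU update x[k+1] = x[k] + MeanAttractionConstant*(Mean - x[k])*Ts + Variance*randn*sqrt(Ts), with Variance multiplied by (1 - VarianceDecayRate) after every step; check the noise-options documentation for your release before relying on that form. The function `simulate_ou_noise` and its argument names are hypothetical, and the mean-attraction value is illustrative.

```python
import numpy as np

def simulate_ou_noise(n_steps, sample_time, variance, mean=0.0,
                      mean_attraction=0.15, variance_decay_rate=0.0,
                      variance_min=0.0, x0=0.0, seed=0):
    """Simulate a discretized Ornstein-Uhlenbeck (OU) noise sequence.

    Assumed update rule (a sketch, not toolbox code):
        x[k+1] = x[k] + mean_attraction*(mean - x[k])*Ts + variance*randn*sqrt(Ts)
    After each step, 'variance' is multiplied by (1 - variance_decay_rate)
    and kept at or above variance_min.
    mean_attraction=0.15 is an illustrative value, not a claimed default.
    """
    rng = np.random.default_rng(seed)
    out = np.empty(n_steps)
    xk, var = x0, variance
    for k in range(n_steps):
        drift = mean_attraction * (mean - xk) * sample_time
        diffusion = var * rng.standard_normal() * np.sqrt(sample_time)
        xk = xk + drift + diffusion
        out[k] = xk
        var = max(var * (1.0 - variance_decay_rate), variance_min)
    return out

if __name__ == "__main__":
    action_range = 4.5  # -0.5 - (-5)
    for ts in (0.01, 0.1):
        # Choose Variance so that Variance*sqrt(Ts) is 10% of the action range
        variance = 0.10 * action_range / np.sqrt(ts)
        # mean_attraction=0 isolates the random part of each step
        x = simulate_ou_noise(20000, ts, variance, mean_attraction=0.0)
        steps = np.diff(np.concatenate(([0.0], x)))  # per-step increments
        print(f"Ts={ts}: per-step noise std ~ {steps.std():.3f}")
```

Both sample times should report a per-step standard deviation close to 0.45 (10% of 4.5), even though the Variance values differ by a factor of sqrt(10).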
Hope this helps
Drew

5 Comments

I realized I misread your range; I initially thought it was -0.5 to 0.5. I have edited the answer above.
Thank you, Drew, for your suggestions. I used a tanh layer at the end of the actor network and linearly mapped its output from the range [-1, 1] to [-5, -0.5], and it worked fine. For that model I used a variance of 0.15/sqrt(SampleTime) and a variance decay rate of 1e-6. Nevertheless, I will try your suggested method as well.
To avoid algebraic loops, I also added a one-time-step delay block; I hope it does not affect the model.
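The linear mapping described in this comment can be sketched in Python as follows. The helper `tanh_to_action` is hypothetical; it computes the scale and bias that send [-1, 1] onto [-5, -0.5]. Inside a MATLAB actor network, a `scalingLayer` with Scale 2.25 and Bias -2.75 after the tanh layer should produce the same mapping, though that is an assumption about how the commenter implemented it.

```python
import numpy as np

def tanh_to_action(u, action_min=-5.0, action_max=-0.5):
    """Linearly map actor outputs u in [-1, 1] (e.g. from a tanh layer)
    onto the action interval [action_min, action_max]."""
    scale = (action_max - action_min) / 2.0  # 2.25 for [-5, -0.5]
    bias = (action_max + action_min) / 2.0   # -2.75 for [-5, -0.5]
    return scale * np.asarray(u, dtype=float) + bias

print(tanh_to_action([-1.0, 0.0, 1.0]).tolist())  # [-5.0, -2.75, -0.5]
```

The endpoints -1 and 1 land exactly on the action limits, and the tanh midpoint 0 lands on the middle of the range, -2.75.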
Dear Drew, many thanks for the half-life formula.
Could you please point me to a source for this formula? I need to reference it.
Although my DDPG training is still being tuned, I think the formula may have helped me tune it.
You can derive this formula pretty easily:
decayfactor = 0.5 = (1 - decayrate)^(#steps)
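Written out step by step, with $d$ for VarianceDecayRate and $n$ for the number of steps, taking the natural log of both sides gives the half-life formula from the answer. For small $d$, the approximation $\ln(1-d) \approx -d$ yields a handy rule of thumb:

```latex
\begin{aligned}
0.5 &= (1 - d)^{n} \\
\ln 0.5 &= n \,\ln(1 - d) \\
n &= \frac{\ln 0.5}{\ln(1 - d)} \;\approx\; \frac{\ln 2}{d} \qquad (d \ll 1)
\end{aligned}
```

For example, with the decay rate of 1e-6 mentioned earlier in this thread, n is roughly 0.693/1e-6, or about 693,000 steps.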
Thank you Drew


More Answers (1)

Atikah Surriani on 30 Apr 2023

0 votes

Can I change the noise model of DDPG in MATLAB? For example, the original DDPG uses OU noise, while my study aims to replace it with Gaussian noise.

3 Comments

Hello Atikah,
Yes, you are right: the original DDPG paper used OU noise. However, the later TD3 paper states:
"Afterwards, we use an off-policy exploration strategy, adding Gaussian noise N (0, 0.1) to each action. Unlike the original implementation of DDPG, we used uncorrelated noise for exploration as we found noise drawn from the Ornstein-Uhlenbeck (Uhlenbeck & Ornstein, 1930) process offered no performance benefits."
So Gaussian and OU noise serve the same purpose, exploration; the TD3 authors found that the temporal correlation of OU noise brought no performance benefit over uncorrelated Gaussian noise.
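The practical difference between the two noise types is temporal correlation, which the following Python sketch makes visible by comparing the lag-1 autocorrelation of each sequence. It is an illustration only: the helper names are hypothetical, 0.1 is treated as a standard deviation (an assumption about how N(0, 0.1) is meant), and the OU parameters theta=0.15 and dt=1 are arbitrary illustrative values.

```python
import numpy as np

def gaussian_noise(n_steps, std=0.1, seed=0):
    """Uncorrelated Gaussian exploration noise: an independent draw each step.
    std=0.1 assumes N(0, 0.1) refers to a standard deviation."""
    rng = np.random.default_rng(seed)
    return std * rng.standard_normal(n_steps)

def ou_noise(n_steps, std=0.1, theta=0.15, dt=1.0, seed=0):
    """Zero-mean Ornstein-Uhlenbeck noise, Euler-discretized:
    x[k+1] = x[k] - theta * x[k] * dt + std * sqrt(dt) * randn
    theta and dt are illustrative values, not defaults of any library."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    xk = 0.0
    for k in range(n_steps):
        xk = xk - theta * xk * dt + std * np.sqrt(dt) * rng.standard_normal()
        x[k] = xk
    return x

def lag1_autocorr(x):
    """Sample correlation between consecutive noise values."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

if __name__ == "__main__":
    n = 20000
    print("Gaussian lag-1 autocorrelation:", round(lag1_autocorr(gaussian_noise(n)), 3))
    print("OU lag-1 autocorrelation:      ", round(lag1_autocorr(ou_noise(n)), 3))
```

Expect the Gaussian value near 0 and the OU value near 1 - theta*dt = 0.85: OU noise drifts smoothly across steps, while Gaussian noise is drawn fresh at every step.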
Thank you for the answer. So can we change the noise option of DDPG in MATLAB? For example, can we replace
rl.option.OrnsteinUhlenbeckActionNoise
with "rl.option.gaussianActionNoise" or "rl.option.anythingActionNoise", or make some other modification to the noise?
Thank you.


Categories

Reinforcement Learning Toolbox

Release

R2019a
