How can we analyse learning curves (of reinforcement learning agent training) to predict what is wrong with the design of our network/reward function?

Question

Nicolas CRETIN on 7 Jun 2024

Direct link to this question

https://kr.mathworks.com/matlabcentral/answers/2126451-how-can-we-analyse-learning-curves-of-reinforcement-learning-agent-training-to-predict-what-is-wro

Edited: Nicolas CRETIN on 24 Jun 2024

Hi everyone,

I'm trying to train a PPO (continuous) agent, and I have some general questions about reinforcement learning:

How can we analyse training curves to predict what is wrong with our design (agent type, network layer types and sizes, and training parameters)?

More specifically, I'm training a controller agent that outputs the four PID gains to be used (P, I, D and N). I'm trying to make it learn to follow the reference speed (image on the right). The image on the right also shows, in yellow, the instantaneous reward, which is set to 10 if the last five values of the output speed are close to the reference speed (less than 10 rpm of error).
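To make the reward description above concrete, here is a minimal Python sketch of it. It is not the actual implementation used in this project: the function and parameter names are hypothetical, and it assumes the reward is 0 whenever the condition is not met, which is not stated above.

```python
from collections import deque


def make_reward_fn(window=5, tolerance_rpm=10.0, bonus=10.0):
    """Return a step-reward function that keeps its own error history.

    Assumption: the reward is 0.0 when the condition is not met.
    """
    recent_errors = deque(maxlen=window)  # holds only the last `window` errors

    def reward(reference_rpm, measured_rpm):
        recent_errors.append(abs(reference_rpm - measured_rpm))
        window_full = len(recent_errors) == window
        all_close = all(e < tolerance_rpm for e in recent_errors)
        return bonus if (window_full and all_close) else 0.0

    return reward
```

With a reward of this shape, a single sample outside the 10 rpm band zeroes the reward for the next five steps, so the signal the agent sees can be quite sparse.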

I currently get this result:

In this picture, the system performs pretty well.

Can I conclude from these curves:

that my training will not converge?
that the number of learnables in my network is too small?
that I did not choose the right learning rate (and if so, is it too small or too large)?
that my reward function is inefficient?
anything else?
or that we cannot conclude anything?

I can provide any other information if needed (just tell me what you need)! The size of my observation vector is [13, 1].

Thanks a lot in advance for your help!

Best regards,

Nicolas


Answer 1

Shivansh on 20 Jun 2024

Direct link to this answer

https://kr.mathworks.com/matlabcentral/answers/2126451-how-can-we-analyse-learning-curves-of-reinforcement-learning-agent-training-to-predict-what-is-wro#answer_1474741

Hi Nicolas,

Analyzing the learning curves is indeed a good way to identify possible issues in the training of a reinforcement learning model.

The answers to your questions may also depend on other factors, but a few things can be inferred from these curves.

The training appears to be stuck at a low episode reward and may be converging to a suboptimal policy. One possible cause is that the model is too small, so you could try increasing its complexity by adding learnables.
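To get a feel for how the number of learnables scales with layer width, here is a minimal Python sketch that counts the weights and biases of a plain fully connected network. The input size of 13 and the 4 outputs come from the question; the hidden-layer widths are hypothetical, and a real continuous PPO actor usually has extra outputs (for example a standard deviation path), so the true count will differ.

```python
def count_learnables(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


# Hypothetical hidden widths; 13 observations in, 4 gains out.
small = count_learnables([13, 32, 32, 4])
large = count_learnables([13, 128, 128, 4])
print(small, large)  # 1636 18820
```

Widening the hidden layers from 32 to 128 units grows the count by more than a factor of ten, because the hidden-to-hidden weights scale with the square of the width.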

Selecting a learning rate is an iterative process: experiment with different values and compare the results. Your learning rate may be slightly too high, since the learning curve oscillates between different values.
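One way to make "oscillating" less of an eyeball judgment is to track a rolling mean and a rolling standard deviation of the episode rewards. The Python sketch below assumes you have exported the episode rewards as a plain list; the function name and the window length are hypothetical choices.

```python
import statistics


def rolling_stats(episode_rewards, window=20):
    """Return (means, stdevs) over a sliding window of episode rewards."""
    means, stdevs = [], []
    for end in range(window, len(episode_rewards) + 1):
        chunk = episode_rewards[end - window:end]
        means.append(statistics.fmean(chunk))    # trend of the curve
        stdevs.append(statistics.pstdev(chunk))  # spread around the trend
    return means, stdevs
```

As a rough guide only: a rolling mean that keeps rising while the spread shrinks is consistent with convergence, whereas a flat mean with a large, persistent spread is consistent with a learning rate that is too high or a noisy reward signal.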

You might also want to reconsider the structure of the reward function. It may not be communicating the desired outcome to the agent, which would explain the irregular improvement in the learning curve.
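As an illustration of what a more informative reward could look like, here is a minimal Python sketch of a dense reward that penalizes the tracking error at every step and keeps a bonus inside the tolerance band. This is one common alternative, not something taken from the original setup; the function name, the normalization, and the default values are assumptions.

```python
def shaped_reward(reference_rpm, measured_rpm, tolerance_rpm=10.0, bonus=10.0):
    """Dense reward: penalize the error every step, add a bonus when close."""
    error = abs(reference_rpm - measured_rpm)
    # Normalize by the reference so the penalty is roughly scale-free;
    # the max() guards against division by a near-zero reference.
    penalty = -error / max(abs(reference_rpm), 1.0)
    return penalty + (bonus if error < tolerance_rpm else 0.0)
```

Because the penalty varies smoothly with the error, the agent gets a signal about whether it is moving toward the reference even when it is still far outside the band.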

In short, experiment with different hyperparameters, learning rates, and model complexity, and analyze how each change affects the learning curve.

I hope it helps!

Comments: 1

Nicolas CRETIN on 24 Jun 2024

Edited: Nicolas CRETIN on 24 Jun 2024

Thank you very much for your answer Shivansh!

I'm beginning to believe that the most important issue is actually the reward function, as you said!
