Reinforcement Learning Episode Manager

Question

蔷蔷汪 on 13 January 2022

https://kr.mathworks.com/matlabcentral/answers/1628110-reinforcement-learning-episode-manager

Answered: Poorna on 21 November 2023

Why do Episode Q0 and Episode Reward coincide in some applications (Train DDPG Agent to Control Double Integrator System - MATLAB & Simulink - MathWorks 中国) but not in others (Train DDPG Agent for Path-Following Control - MATLAB & Simulink - MathWorks 中国) when using the DDPG algorithm?


Answer 1

Poorna on 21 November 2023

https://kr.mathworks.com/matlabcentral/answers/1628110-reinforcement-learning-episode-manager#answer_1357067

Hi 蔷蔷汪,

I understand that you want to know why the episode Q0 value and the episode reward align in some applications but not in others.

How closely the episode Q0 value tracks the episode reward depends on many factors, such as the complexity of the environment, the hyperparameters, the neural network architecture, and the exploration strategy.

In simpler applications, the critic network can learn an accurate estimate of the value at the initial observation because the environment has limited complexity. As a result, Q0 and the episode reward tend to align well.

In more complex environments, however, the critic's estimate at the initial observation is harder to learn, so Q0 may not match the episode reward closely. The intricacy and variability of the task make this estimate harder to fit.
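One more factor worth checking, sketched here under stated assumptions: according to the Episode Manager documentation, Episode Q0 is the critic's estimate of the discounted long-term reward from the initial observation, whereas Episode Reward is the undiscounted sum of the rewards collected during the episode. So even a perfectly trained critic would only track the episode reward closely when discounting changes little over the episode. The Python sketch below (not MATLAB, and not taken from either MathWorks example; the reward traces and the discount factor 0.99 are hypothetical) compares the two quantities for a short and a long episode.

```python
def episode_reward(rewards):
    """Undiscounted sum of rewards, analogous to the Episode Reward curve."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Discounted return G_0 = sum_t gamma**t * r_t.

    This is the quantity that a perfectly trained critic's Q0 would
    estimate for the initial observation (assumption: Q0 is defined as
    the discounted long-term reward, as in the Episode Manager docs).
    """
    g = 0.0
    # Accumulate backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward traces: a constant reward of -1 per step.
short_episode = [-1.0] * 10
long_episode = [-1.0] * 500
gamma = 0.99  # hypothetical discount factor

for name, rewards in [("short", short_episode), ("long", long_episode)]:
    print(
        f"{name}: episode reward = {episode_reward(rewards):.2f}, "
        f"discounted return = {discounted_return(rewards, gamma):.2f}"
    )
```

With these hypothetical traces, the two quantities are close for the 10-step episode but far apart for the 500-step one, because with a reward of magnitude 1 per step the discounted return can never exceed 1/(1 - gamma) = 100 in magnitude, while the undiscounted sum keeps growing with episode length.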

To improve the convergence and performance of the DDPG algorithm, fine-tune the hyperparameters, adjust the neural network architecture, and experiment with different exploration strategies. These changes can bring episode Q0 closer to the episode reward and improve both learning and the resulting policy.

Hope this helps!

Best regards,

Poorna.
