Can the agent learn a policy through the external action port of the RL Agent block so that it mimics the reference signal?

I created a DDPG agent that I want to learn from the output of an existing controller before training it further on its own. I feed the reference signal into the external action port and set "use external action" to 1 for training; while training, the agent's output matches the reference signal. But after training, when I set "use external action" to 0 for verification, the agent's output is not the same as the reference signal, and the difference is fairly large. Does the external action port work the way I expect? What should I do to realize this idea?
The figure below shows that, with "use external action" set to 0, the trained agent's output is the red curve and the reference signal is the green curve.
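For reference, the two-phase workflow described above can be sketched in MATLAB as follows. This is a minimal sketch, not the poster's actual code: the model name 'myModel', the block path, and the signal dimensions are assumptions, and the RL Agent block is assumed to have its "External action inputs" option enabled.

```matlab
% Sketch of the imitation-then-verification workflow (names are assumptions).
obsInfo = rlNumericSpec([1 1]);     % one observed measurement u
actInfo = rlNumericSpec([1 1]);     % one control action
env = rlSimulinkEnv('myModel', 'myModel/RL Agent', obsInfo, actInfo);

agent = rlDDPGAgent(obsInfo, actInfo);   % DDPG agent with default networks

% Phase 1: imitation. In the Simulink model, drive the 'use external action'
% port with 1 and feed the existing controller's output into the
% 'external action' port. The applied action is the external one, but the
% agent's networks are still updated from the collected experiences.
trainOpts = rlTrainingOptions('MaxEpisodes', 500, ...
    'StopTrainingCriteria', 'EpisodeCount', ...
    'StopTrainingValue', 500);
trainingStats = train(agent, env, trainOpts);

% Phase 2: set 'use external action' to 0 in the model, then either
% continue training or run sim(env, agent) to verify the learned policy.
```

How closely the verified output tracks the reference depends on how long Phase 1 ran and on the reward, which is the point raised in the answer below.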

Answer (1)

Emmanouil Tzorakoleftherakis, 25 September 2023
It seems the agent started learning how to imitate the existing controller but needs more time. What does the Episode Manager look like? What is your reward signal?
2 Comments
凡, 26 February 2024
So the idea is feasible, but it may just need more training episodes?
凡, 26 February 2024
This is the Episode Manager. My reward signal is -4*u^2 - du/dt, where u is an observed measurement, and my control goal is to drive u to 0. My project replaces a PID controller with an agent; in PID control, u is the input, so I want the agent to mimic the PID output at the start.


Category: Reinforcement Learning
Release: R2023a
