How does a varying PI parameter output by a Reinforcement Learning Agent help to tune a static PI controller?

1 view (last 30 days)
I read this example and I am confused by the methodology of this algorithm.
When I train the agent, the PI controller parameters, represented by a neural network, change every sample time (0.1 s in this example) to find the best setting. After training, we can deploy the parameters to our PI controller, with the parameters then fixed.
Why is this reasonable? For example, in the dynamic process, when the error is large the agent will tend to increase the parameters of the neural network, and when the response converges it will tend to reduce them. The agent can control the system well with a small cumulative error, but a PI controller with fixed parameters cannot perform like the agent.
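(For context, after training the example reads the fixed gains out of the trained actor roughly as sketched below. getActor and getLearnableParameters are Reinforcement Learning Toolbox functions, but the exact parameter indexing is my assumption and may differ from the example.)
% Sketch: recover the fixed PI gains from the trained TD3 agent.
% The indexing into parameters{1} is illustrative and depends on how
% the actor (the "PI layer") is defined in the example.
actor      = getActor(agent);                 % agent = trained TD3 agent
parameters = getLearnableParameters(actor);
Kp = abs(parameters{1}(1));
Ki = abs(parameters{1}(2));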
2 Comments
Sam Chak on 4 Jan 2024
"Why is this reasonable?"
Hi @轩, I'm not sure I understand. What is the "it" that you find unreasonable? Are you referring to the increase and decrease of the parameter values?
轩 on 5 Jan 2024
@Sam Chak Thank you for your comment. I am referring to the online parameter tuning done by the agent; I think it is unreasonable.


Accepted Answer

Sam Chak on 6 Jan 2024
Hi @轩
I believe it is scientifically logical that, when the difference between the actual level and the reference level (error) is substantial, the agent will tend to increase the control effort (resulting in a larger water flow to fill up the tank faster). Conversely, as the actual level approaches the reference level, the agent will tend to reduce the control effort (resulting in a smaller water flow to prevent overflow). This behavior is observed in both CST-tuned PI control and TD3-tuned PI control, as illustrated below.
In fact, the Control System Tuner (CST) performs an iterative tuning of the PI gains to meet tracking and stability requirements, while the TD3 agent undergoes an iterative learning process to update its policy, maximizing the expected cumulative reward in the given environment.
Kp_CST = 9.80199999804512;
Ki_CST = 1.00019996230706e-06;
Kp_TD3 = 8.0822;
Ki_TD3 = 0.3958;
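For a quick comparison of the two gain sets, you could close the loop around a simple linearized tank model and plot both step responses. The first-order plant below is only an assumed stand-in for the nonlinear Simulink watertank model, so treat this as a sketch rather than a reproduction of the example's plots.
% Compare the CST- and TD3-tuned PI gains on an assumed linearized
% tank model G(s) = b/(s + a); the coefficients are illustrative.
a = 0.05; b = 0.05;
G = tf(b, [1 a]);
C_CST = pid(Kp_CST, Ki_CST);
C_TD3 = pid(Kp_TD3, Ki_TD3);
step(feedback(C_CST*G, 1), feedback(C_TD3*G, 1), 100)
legend('CST-tuned PI', 'TD3-tuned PI')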
3 Comments
Luka on 18 Jan 2024
Hello everyone,
I understand what @轩 was trying to say, and I have the same question. In the given example, shouldn't the network parameters (the PI gains) be adjusted only after an episode of training has been completed, and not during the episode itself?
In reality the controller will work with fixed gains (which we want to determine using RL), so I think it would be correct to change these gains during training only after each individual episode is finished and the reward for that episode has been received.
轩 on 19 Jan 2024
@Luka Thank you for your comment. I have a new understanding since posting this question.
Copying it here one more time:
The actor in the agent changes its parameters by gradient ascent to find the maximum of the critic network's value estimate, rather than searching for the best action for each specific state. The latter, evaluating which action is best in a specific state, is what the critic is supposed to do. After training, the critic network has recorded how good or bad different actions are in different states, and the actor network is supposed to find the optimal parameter setting, that is, the most appropriate, best-compromise parameters across all conditions (operating points of the system).
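To make that concrete, here is a minimal, purely illustrative sketch of the kind of update the actor performs: gradient ascent of its parameters (the PI gains) on the critic's value estimate. The toy critic and all numbers below are my own assumptions, not the toolbox implementation.
% Toy deterministic policy-gradient step (illustrative only).
theta = [1; 0.1];              % actor parameters: [Kp; Ki]
obs   = [0.5; 2.0];            % observation: [error; integral of error]
alpha = 1e-3;                  % actor learning rate
% Stand-in differentiable critic Q(s,a); in the real agent this is a
% neural network trained from the reward signal.
Q    = @(s, a) -(a - 3).^2;    % pretends action a = 3 is best in this state
dQda = @(s, a) -2*(a - 3);     % derivative of the toy critic w.r.t. the action
a_ctrl   = theta' * obs;       % PI action: Kp*e + Ki*integral(e)
dadtheta = obs;                % gradient of the action w.r.t. [Kp; Ki]
% Gradient ascent on Q with respect to the actor parameters (chain rule),
% nudging the single fixed gain set toward values the critic scores well
% across the visited states.
theta = theta + alpha * dQda(obs, a_ctrl) * dadtheta;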


More Answers (0)