Reinforcement-learning-based control not working for a positioning system.

5 views (last 30 days)
Romina Zarrabi on 15 March 2024
Answered: TARUN on 23 April 2025
I've adapted the MATLAB water tank example (openExample('rl/GenerateRewardFunctionFromAModelVerificationBlockExample')) for my project by replacing the water tank model and reward function with those from my own system. Despite adjusting every parameter I could to match my system's requirements, the controller remains unresponsive. Specifically, altering the initial and final positions (h0 and hf) doesn't influence the system's behavior, and the system does not even approach the goal position. Could anyone shed some light on why these modifications are not affecting the controller's response as anticipated?
Thank you for any guidance or suggestions!
Attached are the Simulink model, the reward function, and the physical model; here is the code:
% Initial and final position
h0 = 0;
hf = 200;
% Simulation and sample times
Tf = 10;
Ts = 0.01;
open_system('PositionerStepInput');
numObs = 6;
numAct = 1;
oinfo = rlNumericSpec([numObs 1]); % Observation space specification remains unchanged
ainfo = rlNumericSpec([numAct 1], 'LowerLimit', 0, 'UpperLimit', 100); % Define action space with lower and upper limits
env = rlSimulinkEnv('PositionerStepInput','PositionerStepInput/RL Agent',oinfo,ainfo);
rng(100);
% Critic
cnet = [
    featureInputLayer(numObs,'Normalization','none','Name','State')
    fullyConnectedLayer(128,'Name','fc1')
    concatenationLayer(1,2,'Name','concat')
    reluLayer('Name','relu1')
    fullyConnectedLayer(128,'Name','fc3')
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','CriticOutput')];
actionPath = [
    featureInputLayer(numAct,'Normalization','none','Name','Action')
    fullyConnectedLayer(8,'Name','fc2')];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,oinfo,ainfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,oinfo,ainfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
% Actor
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','State')
    fullyConnectedLayer(128,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(128,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numAct,'Name','Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,oinfo,ainfo, ...
    'Observation',{'State'},'Action',{'Action'},actorOptions);
agentOpts = rlTD3AgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.99, ...
    "ExperienceBufferLength",1e6, ...
    "MiniBatchSize",256);
agentOpts.ExplorationModel.StandardDeviation = 0.5;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;
agentOpts.ExplorationModel.StandardDeviationMin = 0;
agent = rlTD3Agent(actor,[critic1,critic2],agentOpts);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',1500, ...
    'MaxStepsPerEpisode',ceil(Tf/Ts), ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',-5, ...
    'ScoreAveragingWindowLength',20);
doTraining = false;
if doTraining
    trainingStats = train(agent,env,trainOpts);
    save('myTrainedAgent.mat','agent')
else
    load('myTrainedAgent.mat','agent')
end

Answers (1)

TARUN on 23 April 2025
I understand that the modifications you have made to the water tank model are not affecting the controller's response.
There are a few possible causes:
1. The workspace variables h0 and hf must be linked to blocks in the Simulink model (see the sketch after this list). You can check the links as follows:
  • Open your Simulink model.
  • Check the source of the initial state, e.g., an Integrator block's initial condition parameter.
  • Make sure it is set to h0 rather than a hardcoded value.
  • Similarly, make sure the goal state used in the reward calculation is set to hf.
2. The reward seems to depend on the difference between H and xg (the goal). If xg is not updated from hf, the reward will not reflect your intended target, and if H (the system state) is not initialized to h0, the agent will always start from the same, possibly wrong, state.
3. Set doTraining = true and retrain after every change to the initial/final states or the reward; with doTraining = false, your script simply loads an agent that was trained against the old targets (see the simulation sketch at the end of this answer).
4. Additionally, log the values of h0, hf, H, and xg during simulation to confirm they match your expectations.
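For points 1, 2, and 4, here is a minimal sketch of what those checks could look like. The block paths ('PositionerStepInput/Integrator', 'PositionerStepInput/Goal') are placeholders for illustration; substitute the actual block paths from your model:
mdl = 'PositionerStepInput';
open_system(mdl);
% Point 1: the integrator's initial condition should be the variable 'h0'
icBlock = [mdl '/Integrator'];               % placeholder path -- use your block
disp(get_param(icBlock,'InitialCondition'))  % expect 'h0', not a hardcoded number
set_param(icBlock,'InitialCondition','h0');  % link it if it was hardcoded
% Point 2: the goal fed to the reward should be the variable 'hf'
goalBlock = [mdl '/Goal'];                   % placeholder path -- use your block
disp(get_param(goalBlock,'Value'))           % for a Constant block, expect 'hf'
set_param(goalBlock,'Value','hf');
% Point 4: print the workspace values so you can compare them with the
% signals logged during simulation (e.g., via Scope or To Workspace blocks)
fprintf('h0 = %g, hf = %g\n', h0, hf);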
Feel free to go through the original water tank example (openExample('rl/GenerateRewardFunctionFromAModelVerificationBlockExample')) to learn more about the water tank model.
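For point 3, once the links are fixed, a quick way to verify the behavior is to retrain and then run one simulation episode. This is only a sketch, reusing the env, agent, and trainOpts from your script:
doTraining = true;                       % retrain so the agent sees the new target
trainingStats = train(agent,env,trainOpts);
save('myTrainedAgent.mat','agent');
% Run one evaluation episode and inspect the logged observations
simOpts = rlSimulationOptions('MaxSteps',ceil(Tf/Ts));
experience = sim(env,agent,simOpts);
% experience.Observation holds the observation time series; check whether
% the position channel actually moves toward hf.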
