Reinforcement-learning-based control not working for a positioning system.

5 views (last 30 days)
Romina Zarrabi on 15 March 2024
Answered: TARUN on 23 April 2025
I've adapted the MATLAB water tank example (openExample('rl/GenerateRewardFunctionFromAModelVerificationBlockExample')) for my project by replacing the water tank model and reward function with those from my own system. Despite adjusting every parameter I could to match my system's requirements, the controller remains unresponsive. Specifically, altering the initial and final positions (h0 and hf) doesn't influence the system's behavior, and the system does not even approach the goal position. Could anyone shed some light on why these modifications are not affecting the controller's response as anticipated?
Thank you for any guidance or suggestions!
Attached are the Simulink model, the reward function, and the physical model; here is the code:
% Initial and final position
h0 = 0;
hf = 200;
% Simulation and sample times
Tf = 10;
Ts = 0.01;
open_system('PositionerStepInput');
numObs = 6;
numAct = 1;
oinfo = rlNumericSpec([numObs 1]); % Observation space specification remains unchanged
ainfo = rlNumericSpec([numAct 1], 'LowerLimit', 0, 'UpperLimit', 100); % Define action space with lower and upper limits
env = rlSimulinkEnv('PositionerStepInput','PositionerStepInput/RL Agent',oinfo,ainfo);
rng(100);
% Critic
cnet = [
    featureInputLayer(numObs,'Normalization','none','Name','State')
    fullyConnectedLayer(128,'Name','fc1')
    concatenationLayer(1,2,'Name','concat')
    reluLayer('Name','relu1')
    fullyConnectedLayer(128,'Name','fc3')
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','CriticOutput')];
actionPath = [
    featureInputLayer(numAct,'Normalization','none','Name','Action')
    fullyConnectedLayer(8,'Name','fc2')];
criticNetwork = layerGraph(cnet);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,oinfo,ainfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,oinfo,ainfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
% Actor
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','State')
    fullyConnectedLayer(128,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(128,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numAct,'Name','Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,oinfo,ainfo, ...
    'Observation',{'State'},'Action',{'Action'},actorOptions);
agentOpts = rlTD3AgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.99, ...
    "ExperienceBufferLength",1e6, ...
    "MiniBatchSize",256);
agentOpts.ExplorationModel.StandardDeviation = 0.5;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;
agentOpts.ExplorationModel.StandardDeviationMin = 0;
agent = rlTD3Agent(actor,[critic1,critic2],agentOpts);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',1500, ...
    'MaxStepsPerEpisode',ceil(Tf/Ts), ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',-5, ...
    'ScoreAveragingWindowLength',20);
doTraining = false;
if doTraining
    trainingStats = train(agent,env,trainOpts);
    save('myTrainedAgent.mat','agent')
else
    load('myTrainedAgent.mat','agent')
end

Answers (1)

TARUN on 23 April 2025
I understand that the modifications you have made to the water tank model are not affecting the controller's response.
There are a few possible causes:
1. The workspace variables h0 and hf must be linked to blocks in the Simulink model (see the sketch after this list). You can check the links as follows:
  • Open your Simulink model.
  • Check the source of the initial state, e.g., an Integrator block's initial condition parameter.
  • Make sure it is set to h0 rather than a hardcoded value.
  • Similarly, make sure the goal state used in the reward calculation is set to hf.
2. The reward seems to depend on the difference between H and xg (the goal). If xg is not updated from hf, the reward will not reflect your intended target, and if H (the system state) is not initialized to h0, the agent will always start from the same, possibly wrong, state.
3. Set doTraining = true and retrain after every change to the initial/final states or the reward; with doTraining = false, your script simply loads an agent that was trained against the old targets (see the simulation sketch at the end of this answer).
4. Additionally, log the values of h0, hf, H, and xg during simulation to confirm they match your expectations.
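For points 1, 2, and 4, here is a minimal sketch of what those checks could look like. The block paths ('PositionerStepInput/Integrator', 'PositionerStepInput/Goal') are placeholders for illustration; substitute the actual block paths from your model:
mdl = 'PositionerStepInput';
open_system(mdl);
% Point 1: the integrator's initial condition should be the variable 'h0'
icBlock = [mdl '/Integrator'];               % placeholder path -- use your block
disp(get_param(icBlock,'InitialCondition'))  % expect 'h0', not a hardcoded number
set_param(icBlock,'InitialCondition','h0');  % link it if it was hardcoded
% Point 2: the goal fed to the reward should be the variable 'hf'
goalBlock = [mdl '/Goal'];                   % placeholder path -- use your block
disp(get_param(goalBlock,'Value'))           % for a Constant block, expect 'hf'
set_param(goalBlock,'Value','hf');
% Point 4: print the workspace values so you can compare them with the
% signals logged during simulation (e.g., via Scope or To Workspace blocks)
fprintf('h0 = %g, hf = %g\n', h0, hf);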
Feel free to go through the original water tank example (openExample('rl/GenerateRewardFunctionFromAModelVerificationBlockExample')) to learn more about the water tank model.
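For point 3, once the links are fixed, a quick way to verify the behavior is to retrain and then run one simulation episode. This is only a sketch, reusing the env, agent, and trainOpts from your script:
doTraining = true;                       % retrain so the agent sees the new target
trainingStats = train(agent,env,trainOpts);
save('myTrainedAgent.mat','agent');
% Run one evaluation episode and inspect the logged observations
simOpts = rlSimulationOptions('MaxSteps',ceil(Tf/Ts));
experience = sim(env,agent,simOpts);
% experience.Observation holds the observation time series; check whether
% the position channel actually moves toward hf.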
