How do I get DQN to output the policy I want

Question

zhou wen, May 15, 2024


https://kr.mathworks.com/matlabcentral/answers/2119146-how-do-i-get-dqn-to-output-the-policy-i-want

Answered: praguna manvi, July 17, 2024

I'm solving a problem with DQN. This environment currently has 10 optional moves, 8 states, and 20 rounds per run. I want to keep my problem variables to a minimum. The optima


Answer 1

praguna manvi, July 17, 2024


https://kr.mathworks.com/matlabcentral/answers/2119146-how-do-i-get-dqn-to-output-the-policy-i-want#answer_1486612


Hi,

Here is sample code showing how you could train a DQN agent for the setup described above. To keep the example simple, it uses a random placeholder "step function" and "reset function":

% Problem dimensions taken from the question
numStates = 8;      % length of the observation vector
numActions = 10;    % number of discrete actions
numSteps = 20;      % rounds per episode
maxEpisodes = 500;  % assumed value; tune for your problem
% Define the observation and action spaces
obsInfo = rlNumericSpec([numStates 1]);
actInfo = rlFiniteSetSpec(1:numActions);
% Create the custom environment from the step and reset functions below
env = rlFunctionEnv(obsInfo, actInfo, @myStepFunction, @myResetFunction);
% Define the critic network: one Q-value output per action
statePath = [
    featureInputLayer(numStates, 'Normalization', 'none', 'Name', 'state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','fc3')];
criticNetwork = dlnetwork(statePath);
% Note: rlRepresentationOptions and rlQValueRepresentation are the older API.
% Since R2022a, rlOptimizerOptions and rlVectorQValueFunction are recommended instead.
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'state'},criticOpts);
% Define the DQN agent
agentOpts = rlDQNAgentOptions(...
    'SampleTime',1,...
    'DiscountFactor',0.99,...
    'ExperienceBufferLength',10000,...
    'MiniBatchSize',256);
agent = rlDQNAgent(critic,agentOpts);
% Train the agent; each episode runs for numSteps rounds
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxEpisodes,...
    'MaxStepsPerEpisode',numSteps,...
    'Verbose',false,...
    'Plots','training-progress');
trainingStats = train(agent,env,trainOpts);
% Define the step function
function [nextObs, reward, isDone, loggedSignals] = myStepFunction(action, loggedSignals)
    % Placeholder logic: replace with your own next-state and reward calculation
    nextObs = randi([1, 8], [8, 1]);
    reward = randi([-1, 1]);
    isDone = false;  % episodes end when MaxStepsPerEpisode is reached
end
% Define the reset function
function [initialObs, loggedSignals] = myResetFunction()
    % Placeholder logic: this example uses a random initial state
    initialObs = randi([1, 8], [8, 1]);
    loggedSignals = [];
end
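The random placeholders above give the agent nothing to learn, so the first step toward a useful policy is a step function whose reward reflects the goal. Below is a minimal sketch of what that could look like, assuming the aim is to keep 8 problem variables close to zero over 20 rounds. The function names `sketchStepFunction` and `sketchResetFunction`, the fields `State` and `StepCount`, and the dynamics and reward are all hypothetical and only illustrate how the second input and fourth output can carry state between steps.

```matlab
% Manual check of the sketch: reset, then step with a fixed action until done.
[obs, info] = sketchResetFunction();
isDone = false;
totalReward = 0;
while ~isDone
    [obs, reward, isDone, info] = sketchStepFunction(3, info);
    totalReward = totalReward + reward;
end
fprintf('Episode ended after %d steps, total reward %g\n', ...
    info.StepCount, totalReward);

function [initialObs, info] = sketchResetFunction()
    % Assumed start: all 8 variables at zero, step counter at zero.
    info.State = zeros(8, 1);
    info.StepCount = 0;
    initialObs = info.State;
end

function [nextObs, reward, isDone, info] = sketchStepFunction(action, info)
    % Hypothetical dynamics: actions 1..10 shift every variable by
    % (action - 5)/10, so action 5 leaves the state unchanged.
    info.State = info.State + (action - 5) / 10;
    info.StepCount = info.StepCount + 1;
    nextObs = info.State;
    % Hypothetical reward: penalize the total magnitude of the variables.
    reward = -sum(abs(info.State));
    % End the episode after 20 rounds, as described in the question.
    isDone = info.StepCount >= 20;
end
```

These two functions have the same signatures as `myStepFunction` and `myResetFunction`, so their handles could be passed to `rlFunctionEnv` in the same way once the dynamics and reward match the real problem.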

For a detailed example, please refer to this documentation on training a custom PG agent:

https://www.mathworks.com/help/reinforcement-learning/ug/create-custom-pg-agent.html
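To see which policy a DQN agent has actually learned, one option is to query the critic directly and take the action with the largest Q-value for each observation of interest. The sketch below assumes a recent release of Reinforcement Learning Toolbox. It uses an untrained default agent as a stand-in so that it runs on its own; in practice `agent` would be the trained agent. The example observations are hypothetical.

```matlab
% Stand-in setup: same specs as the question, with a default (untrained) agent.
obsInfo = rlNumericSpec([8 1]);
actInfo = rlFiniteSetSpec(1:10);
agent = rlDQNAgent(obsInfo, actInfo);   % replace with your trained agent

critic = getCritic(agent);

% Hypothetical observations to inspect, one per column.
testObs = [zeros(8, 1), ones(8, 1), (1:8)'];

for k = 1:size(testObs, 2)
    obs = testObs(:, k);
    % For a discrete action space the critic returns one Q-value per action.
    qValues = getValue(critic, {obs});
    [~, bestIdx] = max(qValues);
    greedyAction = actInfo.Elements(bestIdx);
    fprintf('Observation %d -> greedy action %d\n', k, greedyAction);
end
```

If the greedy actions printed for a trained agent differ from the policy you expect, the reward definition in the step function is usually the first thing to revisit, since the Q-values only reflect the rewards the agent has seen.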
