Exporting my trained actor and critic NN agent from a MATLAB Reinforcement Learning environment to TensorFlow

Views: 23 (last 30 days)
I am trying to export my trained actor and critic networks from a MATLAB Reinforcement Learning environment to TensorFlow:
env = Nuc_Maint_Env_Proposal_220211_NPIC_MATLAB2022A;
initOpts = rlAgentInitializationOptions();
Obtain observation and action specifications.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
Create a PPO agent from the environment observation and action specifications. This agent uses default deep neural networks for its actor and critic.
agent = rlPPOAgent(obsInfo,actInfo);
% agent = rlACAgent(actor,critic,agentOpts);
To modify the deep neural networks within a reinforcement learning agent, you must first extract the actor and critic function approximators.
actor = getActor(agent);
critic = getCritic(agent);
Extract the deep neural networks from both the actor and critic function approximators.
actorNet = getModel(actor);
criticNet = getModel(critic);
exportNetworkToTensorFlow(actorNet,"actorNet")
exportNetworkToTensorFlow(criticNet,"criticNet")
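According to the exportNetworkToTensorFlow documentation, each call generates a Python package containing the model definition and weights. A minimal sketch of loading it on the Python side (run from the folder that contains the generated actorNet package):

import actorNet  # package generated by exportNetworkToTensorFlow above

# load_model() is generated inside the package; it rebuilds the layer
# graph and loads the exported weights
model_actorNet = actorNet.load_model()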
The problem is that when I import the models in Python using TensorFlow and step through the environment, my actor consistently outputs the same index position for the maximum probability. Even though the output values vary, the index of the maximum stays the same, which leads to the same decision output every step. This only happens in Python and not in MATLAB. Is there anything wrong with the way I am exporting my trained neural network?
Below is the Python code for collecting the state_log and action_log:
# Python function to collect the state_log and action_log by rolling
# out the exported actor in the environment
def eval():
    action_log = []
    state_log = []
    env = Nuc_Maint_Env_Proposal_220211_NPIC_MATLAB2022A()
    observation = env.reset()
    observation = tf.reshape(tf.convert_to_tensor(observation), (1, -1))
    done = False
    total_reward = 0
    num_steps = 720
    # discrete action set: the six possible action pairs
    actionelements = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]])
    for step in range(num_steps):
        action_logits = model_actorNet(observation)
        # argmax always selects the single most probable action
        action_index = tf.argmax(action_logits, axis=-1).numpy().item()
        action = actionelements[action_index]
        observation, reward, done, _ = env.step(action)
        observation = tf.reshape(tf.convert_to_tensor(observation), (1, -1))
        total_reward += reward
        action_log.append(action)
        state_log.append(observation)
        if done:
            break
    return np.array(state_log), np.array(action_log)
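A quick way to check the export itself is to compare both models on the same observation: in MATLAB, evaluate(actor,{obs}) returns the action probabilities, and the exported network should produce matching values (probabilities or logits, depending on the network's final layer). A minimal sketch, where the 4-element observation is a hypothetical placeholder sized to my real obsInfo:

import numpy as np
import actorNet  # package generated by exportNetworkToTensorFlow

model_actorNet = actorNet.load_model()

# hypothetical observation; replace with a real vector whose length
# matches the observation dimension from obsInfo in MATLAB
obs = np.array([[0.1, -0.2, 0.3, 0.0]], dtype=np.float32)

# should match the MATLAB output for the same observation up to
# floating-point tolerance
print(model_actorNet(obs).numpy())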
Any help would be great.

Answers (1)

Sanjana on 28 Aug 2023
Hi Mahsa,
I understand that you are facing an issue with using the exported “actor” and “critic” models from MATLAB in Python with TensorFlow.
As per the documentation, the code you provided for exporting the trained “actor” and “critic” models is correct.
The reason the “actor” consistently outputs the same index position is the use of the “tf.argmax” function, which is mostly used in classification tasks; it causes the “actor” to always choose the action with the highest probability, so a policy whose probabilities peak at the same action will make the same decision every step.
In the context of reinforcement learning, you can instead use the “tf.random.categorical” function, which is specifically designed for sampling from a categorical distribution; it allows the “actor” to explore different actions, even ones that are not the most probable.
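For example, a minimal sketch of the difference (the logits below are made up; if your exported network outputs probabilities rather than logits, apply “tf.math.log” first):

import tensorflow as tf

# example logits for the six discrete actions in the question
# (in the real code these come from model_actorNet(observation))
action_logits = tf.constant([[1.2, 0.3, -0.5, 0.8, 0.1, -1.0]])

# tf.argmax is deterministic: it always returns the index of the
# largest logit, so the chosen action never varies
greedy_index = tf.argmax(action_logits, axis=-1).numpy().item()

# tf.random.categorical samples one action index from the categorical
# distribution defined by the logits, so the action can differ per step
action_index = tf.random.categorical(action_logits, num_samples=1).numpy().item()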
Please refer to the TensorFlow documentation for “tf.random.categorical” for further information.
Hope this helps!
Regards,
Sanjana
