Exporting my trained actor and critic NN agent from a MATLAB Reinforcement Learning environment to TensorFlow

Views: 23 (last 30 days)
I am trying to export my trained actor and critic networks from a MATLAB Reinforcement Learning environment to TensorFlow:
env = Nuc_Maint_Env_Proposal_220211_NPIC_MATLAB2022A;
initOpts = rlAgentInitializationOptions();
Obtain observation and action specifications.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
Create a PPO agent from the environment observation and action specifications. This agent uses default deep neural networks for its actor and critic.
agent = rlPPOAgent(obsInfo,actInfo);
% agent = rlACAgent(actor,critic,agentOpts);
To modify the deep neural networks within a reinforcement learning agent, you must first extract the actor and critic function approximators.
actor = getActor(agent);
critic = getCritic(agent);
Extract the deep neural networks from both the actor and critic function approximators.
actorNet = getModel(actor);
criticNet = getModel(critic);
exportNetworkToTensorFlow(actorNet,"actorNet")
exportNetworkToTensorFlow(criticNet,"criticNet")
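According to the exportNetworkToTensorFlow documentation, each call generates a Python package containing the model definition and weights. A minimal sketch of loading it on the Python side (run from the folder that contains the generated actorNet package):

import actorNet  # package generated by exportNetworkToTensorFlow above

# load_model() is generated inside the package; it rebuilds the layer
# graph and loads the exported weights
model_actorNet = actorNet.load_model()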
The problem is that when I import the models in Python using TensorFlow and step through the environment, my actor consistently outputs the same index position for the maximum probability. Even though the output values vary, the index of the maximum stays the same, which leads to the same decision output every step. This only happens in Python and not in MATLAB. Is there anything wrong with the way I am exporting my trained neural network?
Below is the Python code for collecting the state_log and action_log:
# Python function to collect the state_log and action_log by rolling
# out the exported actor in the environment
def eval():
    action_log = []
    state_log = []
    env = Nuc_Maint_Env_Proposal_220211_NPIC_MATLAB2022A()
    observation = env.reset()
    observation = tf.reshape(tf.convert_to_tensor(observation), (1, -1))
    done = False
    total_reward = 0
    num_steps = 720
    # discrete action set: the six possible action pairs
    actionelements = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]])
    for step in range(num_steps):
        action_logits = model_actorNet(observation)
        # argmax always selects the single most probable action
        action_index = tf.argmax(action_logits, axis=-1).numpy().item()
        action = actionelements[action_index]
        observation, reward, done, _ = env.step(action)
        observation = tf.reshape(tf.convert_to_tensor(observation), (1, -1))
        total_reward += reward
        action_log.append(action)
        state_log.append(observation)
        if done:
            break
    return np.array(state_log), np.array(action_log)
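A quick way to check the export itself is to compare both models on the same observation: in MATLAB, evaluate(actor,{obs}) returns the action probabilities, and the exported network should produce matching values (probabilities or logits, depending on the network's final layer). A minimal sketch, where the 4-element observation is a hypothetical placeholder sized to my real obsInfo:

import numpy as np
import actorNet  # package generated by exportNetworkToTensorFlow

model_actorNet = actorNet.load_model()

# hypothetical observation; replace with a real vector whose length
# matches the observation dimension from obsInfo in MATLAB
obs = np.array([[0.1, -0.2, 0.3, 0.0]], dtype=np.float32)

# should match the MATLAB output for the same observation up to
# floating-point tolerance
print(model_actorNet(obs).numpy())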
Any help would be great.

Answers (1)

Sanjana on 28 Aug 2023
Hi Mahsa,
I understand that you are facing an issue with using the exported “actor” and “critic” models from MATLAB in Python with TensorFlow.
As per the documentation, the code you provided for exporting the trained “actor” and “critic” models is correct.
The reason the “actor” consistently outputs the same index position is the use of the “tf.argmax” function, which is mostly used in classification tasks; it causes the “actor” to always choose the action with the highest probability, so a policy whose probabilities peak at the same action will make the same decision every step.
In the context of reinforcement learning, you can instead use the “tf.random.categorical” function, which is specifically designed for sampling from a categorical distribution; it allows the “actor” to explore different actions, even ones that are not the most probable.
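For example, a minimal sketch of the difference (the logits below are made up; if your exported network outputs probabilities rather than logits, apply “tf.math.log” first):

import tensorflow as tf

# example logits for the six discrete actions in the question
# (in the real code these come from model_actorNet(observation))
action_logits = tf.constant([[1.2, 0.3, -0.5, 0.8, 0.1, -1.0]])

# tf.argmax is deterministic: it always returns the index of the
# largest logit, so the chosen action never varies
greedy_index = tf.argmax(action_logits, axis=-1).numpy().item()

# tf.random.categorical samples one action index from the categorical
# distribution defined by the logits, so the action can differ per step
action_index = tf.random.categorical(action_logits, num_samples=1).numpy().item()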
Please refer to the TensorFlow documentation for “tf.random.categorical” for further information.
Hope this helps!
Regards,
Sanjana
