How to input action in reinforcement learning template environment?

Question

Yang Chen on 7 Mar 2023


https://kr.mathworks.com/matlabcentral/answers/1924530-how-to-input-action-in-reinforcement-learning-template-environment

Commented: Emmanouil Tzorakoleftherakis on 9 Mar 2023

I have modified the template environment to fit my scenario. My current action consists of two vectors. The action configuration is as follows:

function this = EdgeEnvironment()
    % Initialize Observation settings
    ObservationInfo(1) = rlNumericSpec([1 10]);
    ObservationInfo(1).Name = 'schedule';
    ObservationInfo(1).Description = 'schedule';
    ObservationInfo(2) = rlNumericSpec([1 20]);
    ObservationInfo(2).Name = 'ppath';
    ObservationInfo(2).Description = 'ppath';
    ObservationInfo(3) = rlNumericSpec([1 1]);
    ObservationInfo(3).Name = 'completionTime';
    ObservationInfo(3).Description = 'completionTime';
    ObservationInfo(4) = rlNumericSpec([1 1]);
    ObservationInfo(4).Name = 'computeDuring';
    ObservationInfo(4).Description = 'computeDuring';

    % Initialize Action settings
    ActionInfo(1) = rlNumericSpec([1 10]);
    ActionInfo(1).Name = 'schedule';
    ActionInfo(2) = rlNumericSpec([1 20]);
    ActionInfo(2).Name = 'ppath';

    % The following line implements built-in functions of RL env
    this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
end

The step function is designed as follows:

function [Observation,Reward,IsDone,LoggedSignals] = step(this, Action)
    LoggedSignals = [];

    % distance
    node_distance = zeros(this.device_count, this.device_count);
    distance = getDistance(this, node_distance);
    % parameter list
    parameter_list = getstruct(this, distance);
    % the parameter list of device
    device_list = get_device_list(this);

    % Extract action
    [schedule_act, ppath_act] = get_act(Action);
    % schedule_act = Action{1,1};
    % ppath_act = Action{1,2};

    % Unpack state vector
    last_schedule = schedule_act;
    last_ppath = ppath_act;
    last_completionTime = this.State{1,3};
    last_computeDuring = this.State{1,4};
    last_stay_node_list = [];  % defined up front so the rejection branch does not use an undefined variable

    % Update system states
    [schedule, stay_node_list, completionTime] = ComScheduling(last_completionTime, ...
        last_schedule, last_ppath, device_list, parameter_list);
    [ppath, stay_node_list, completionTime, computeDuring] = PathPlanning(last_completionTime, ...
        last_ppath, schedule, stay_node_list, device_list, parameter_list);

    % Accept the new solution with probability prob; otherwise keep the last_* values
    prob = 1 / (1 + exp((completionTime - last_completionTime)/parameter_list.omega));
    dice = rand(1);
    if dice <= prob
        last_ppath = ppath;
        last_schedule = schedule;
        last_stay_node_list = stay_node_list;
        last_completionTime = completionTime;
        last_computeDuring = computeDuring;
        % NOTE: completionTime_iter is not defined in this function; it must exist before appending
        completionTime_iter(end + 1) = completionTime;
    else
        completionTime_iter(end + 1) = last_computeDuring;
    end

    ppath = last_ppath;
    schedule = last_schedule;
    stay_node_list = last_stay_node_list;
    completionTime = last_completionTime;
    computeDuring = last_computeDuring;

    Observation = {schedule, ppath, completionTime, computeDuring};
    this.State = Observation;

    % Check terminal condition (Observation is a cell array, so index it with braces)
    completionTime = Observation{3};
    computeDuring = Observation{4};
    IsDone = completionTime < this.completionTime_threshold || computeDuring < this.computeDuring_threshold;
    this.IsDone = IsDone;

    % Get reward
    Reward = -completionTime;
end
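The `prob` line in step is a logistic acceptance rule that resembles the Boltzmann-style acceptance used in simulated annealing. Writing $T_{\text{old}}$ for `last_completionTime`, $T_{\text{new}}$ for the `completionTime` returned by PathPlanning, and $\omega$ for `parameter_list.omega` (assumed positive), it reads:

$$
p = \frac{1}{1 + \exp\left(\dfrac{T_{\text{new}} - T_{\text{old}}}{\omega}\right)}
$$

A shorter new completion time ($T_{\text{new}} < T_{\text{old}}$) gives $p > 1/2$, and equal times give $p = 1/2$. As $\omega \to 0^{+}$ the rule approaches greedy acceptance ($p \to 1$ for improvements, $p \to 0$ otherwise); as $\omega$ grows, $p$ approaches $1/2$ regardless of the change.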

The action values are extracted by the following function:

function [schedule_act, ppath_act] = get_act(action)
    schedule_act = action{1,1};
    ppath_act = action{1,2};
end

When I run the validateEnvironment function, I get an error. I want to know how to fix it.


Accepted Answer

Emmanouil Tzorakoleftherakis on 7 Mar 2023


https://kr.mathworks.com/matlabcentral/answers/1924530-how-to-input-action-in-reinforcement-learning-template-environment#answer_1187560

The easiest thing you can do is add a breakpoint and display what the "action" variable is. It's obviously not a cell array, so you cannot access it with braces {} in the "get_act" function. That's why you are getting the error.
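Following this advice, a more defensive get_act can both handle the case where the action is not a cell array and report what actually arrived. This is a minimal sketch, not the answerer's code. It assumes that if the environment is redefined with one combined action channel (for example `rlNumericSpec([1 30])`), the first 10 entries are the schedule and the remaining 20 are the path; that layout is hypothetical.

```matlab
function [schedule_act, ppath_act] = get_act(action)
% Split an action into its schedule part (10 values) and ppath part (20 values).
% Handles a 1-by-2 cell array (two action channels) or one numeric vector
% (assumed layout for a single combined channel such as rlNumericSpec([1 30])).
if iscell(action)
    schedule_act = action{1};
    ppath_act = action{2};
elseif isnumeric(action) && numel(action) == 30
    action = reshape(action, 1, []);   % force a row vector
    schedule_act = action(1:10);       % assumed: first 10 entries
    ppath_act = action(11:30);         % assumed: remaining 20 entries
else
    % Report the class and size that arrived, as the breakpoint would show
    error('get_act:unexpectedAction', ...
        'Unexpected action of class %s with size %s.', ...
        class(action), mat2str(size(action)));
end
end
```

If validateEnvironment still fails, the error message from the last branch shows the class and size of the action it passed in, which is the same information a breakpoint would give.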

8 Comments (6 earlier comments not shown)

Yang Chen on 9 Mar 2023

It is about the size of my discrete action space. For example, with 3 elements my action space is {[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]}, i.e., every possible ordering of 1 to 3. When I increase the number of elements to 20, the size of the action space exceeds the system limit.
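The growth described in this comment can be checked directly. A minimal sketch: `perms` enumerates every ordering of 1..n, so a discrete action space built from those orderings (for instance as elements of an `rlFiniteSetSpec`, which is an assumption about how the space was defined) has n! actions. The memory estimate assumes each action is stored as 20 doubles.

```matlab
% Enumerate every ordering of 1..n as one discrete action (the n = 3 case from the comment)
n = 3;
P = perms(1:n);             % 6-by-3 matrix, one permutation per row
numActions = size(P, 1);    % equals factorial(n)

% For n = 20, only count the actions; do not call perms(1:20)
numActions20 = factorial(20);      % about 2.43e18 actions
bytes20 = numActions20 * 20 * 8;   % 20 doubles per action, 8 bytes each

fprintf('n = %d: %d actions\n', n, numActions);
fprintf('n = 20: %.3g actions, about %.3g bytes if enumerated\n', numActions20, bytes20);
```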

Emmanouil Tzorakoleftherakis on 9 Mar 2023

Thanks for clarifying. This is the curse of dimensionality; unfortunately, there is not much you can do about it other than switching to a continuous action space.
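One possible way to act on this suggestion (a common trick sometimes called random-key encoding; it is not something the answer prescribes) is to let the agent output n continuous scores and turn them into an ordering by sorting. This sketch assumes the action is defined as a continuous spec such as `rlNumericSpec([1 20], 'LowerLimit', 0, 'UpperLimit', 1)` and that the decoding happens inside step.

```matlab
% Random-key decoding: continuous scores -> permutation of 1..n.
% Small example: scores [0.7 0.1 0.4] sort to [0.1 0.4 0.7],
% which come from positions [2 3 1], so that is the decoded ordering.
[~, order3] = sort([0.7 0.1 0.4]);

% Full-size case: 20 continuous values, as a continuous action spec would
% deliver (assumed spec: rlNumericSpec([1 20], 'LowerLimit', 0, 'UpperLimit', 1)).
n = 20;
a = rand(1, n);          % stand-in for the agent's action in step
[~, order] = sort(a);    % indices of the ascending sort form a permutation of 1..n

disp(order3);
disp(order);
```

The action dimension then grows linearly with n instead of factorially; whether an agent learns good orderings this way is not guaranteed.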
