SAC RL agent does not explore properly (rlSACAgent)

Willemijn Remmerswaal

June 23, 2021

Hi,

I'm trying to create a SAC RL agent. The agent can set 8 separate continuous actions with the same upper and lower bound (-10 and 10).

During training I observe that the chosen actions are almost always at one of the two bounds, so they fluctuate between the minimum and the maximum. Only sporadically is a different value chosen for one of the actions.
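
A minimal sketch of how this pattern can arise, assuming the agent draws an unbounded Gaussian sample and then applies tanh and scaling to map it onto the bounds (the usual SAC construction). The values of `mu` and `sigma` are made up for illustration and are not taken from the trained agent. When the pre-squash mean is large in magnitude, most samples land at or near a bound:

```matlab
% Illustrative only: mu and sigma are hypothetical actor outputs.
rng(0);                                  % fix the seed for reproducibility
upper = 10;                              % action bound from the question (+/-10)
mu    = [6; -6; 5; -5; 7; -7; 0.2; 4];   % hypothetical pre-squash means
sigma = 0.5 * ones(8,1);                 % hypothetical standard deviations

raw    = mu + sigma .* randn(8,1);       % unbounded Gaussian sample
action = upper * tanh(raw);              % squash and scale to the bounds

% Fraction of actions that lie within 1% of a bound
saturated = mean(abs(action) > 0.99 * upper);
disp([raw, action]);
fprintf('Fraction saturated: %.2f\n', saturated);
```

If the logged actions look like this, inspecting the actor's mean output (before squashing) may show whether it has drifted to large magnitudes.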

I've found a similar question here, but the answer given did not solve the issue. (The range of the action space is already the same for all actions, and changing EntropyWeight made no difference.) I have also tried scaling the reward, as suggested in this article.
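
For reference, a sketch of where the entropy settings live, assuming the `rlSACAgentOptions` interface of Reinforcement Learning Toolbox. The numeric values are placeholders, not recommendations, and are not the values used in this question:

```matlab
% Assumes Reinforcement Learning Toolbox is installed. Values are placeholders.
agentOpts = rlSACAgentOptions;
agentOpts.EntropyWeightOptions.EntropyWeight = 1;     % initial entropy weight
agentOpts.EntropyWeightOptions.LearnRate     = 3e-4;  % learning rate of the entropy weight
agentOpts.EntropyWeightOptions.TargetEntropy = -8;    % placeholder, here -(number of actions)
```

Reward scaling interacts with the entropy weight, since the two terms are traded off against each other in the SAC objective, so it may help to vary them together rather than one at a time.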

Are there any other methods for solving this problem? Or could it be that I just need some patience and should train the agent for more episodes, so that the problem resolves itself?

Thanks in advance for any reply.

Kind regards,

Comments: 3

Willemijn Remmerswaal on June 24, 2021

Hi Emmanouil,

When I use the default initialized actor network, it looks as follows:

I've also tried to build a customized actor network, but I don't know how much sense it makes. It is shown below, and it showed the same behaviour.

nI = no_states;         % (101) number of inputs (states)
nA = no_actions;        % (8) number of actions (continuous)
nL1 = 128;              % width of the hidden layers
nL2 = 64;               % defined but not used below

% Shared trunk that processes the observation
statePath = [
    featureInputLayer(nI,'Normalization','none','Name','state')
    fullyConnectedLayer(nL1, 'Name','commonFC1')
    reluLayer('Name','CommonRelu')];
% Branch that outputs the mean of each action
meanPath = [
    fullyConnectedLayer(nL1,'Name','MeanFC1')
    reluLayer('Name','MeanRelu')
    fullyConnectedLayer(nA,'Name','Mean')
    ];
% Branch that outputs the standard deviation of each action
stdPath = [
    fullyConnectedLayer(nL1,'Name','StdFC1')
    reluLayer('Name','StdRelu')
    fullyConnectedLayer(nA,'Name','StdFC2')
    softplusLayer('Name','StandardDeviation')];     % softplus, because the standard deviation always needs to be positive
% Concatenate [mean; standard deviation] into a single output
concatPath = concatenationLayer(1,2,'Name','GaussianParameters');

% Assemble the layer graph and wire both branches to the shared trunk
actorNetwork = layerGraph(statePath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = addLayers(actorNetwork,concatPath);
actorNetwork = connectLayers(actorNetwork,'CommonRelu','MeanFC1/in');
actorNetwork = connectLayers(actorNetwork,'CommonRelu','StdFC1/in');
actorNetwork = connectLayers(actorNetwork,'Mean','GaussianParameters/in1');
actorNetwork = connectLayers(actorNetwork,'StandardDeviation','GaussianParameters/in2');

% Wrap the network as a stochastic actor (LearningRate is defined elsewhere)
actorOptions = rlRepresentationOptions('LearnRate',LearningRate);
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo,'Observation',{'state'},actorOptions);

Touleen Ibrahim on April 2, 2024

Hi, I see this question was posted a long time ago, but I faced the same problem and found the root cause, so I would like to share it, hoping it will help others.

When the input consists of two or more types of data, normalization of the components should be considered. Otherwise, the output of the actor neural network will be biased toward the components with larger values.
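
A minimal sketch of such a normalization, assuming per-component bounds of the observation are known in advance. The bounds, the example observation and the helper `normalizeObs` are hypothetical and only illustrate rescaling every component to [-1, 1]:

```matlab
% Hypothetical observation made of two kinds of data with very different ranges.
obsMin = [-1; -1;    0;    0];     % assumed lower bound of each component
obsMax = [ 1;  1; 5000; 5000];     % assumed upper bound of each component

% Rescale each component to [-1, 1] before it reaches the actor and critic.
normalizeObs = @(obs) 2 * (obs - obsMin) ./ (obsMax - obsMin) - 1;

obs  = [0.3; -0.8; 1200; 4700];    % example raw observation
nobs = normalizeObs(obs);          % every entry now lies in [-1, 1]
disp(nobs.');
```

The same scaling could be applied inside the environment's step and reset functions, so that the agent only ever sees normalized observations.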
