Reinforcement learning suddenly stops during the training process
My reinforcement learning training suddenly stops partway through, and the following error appears. Is there an effective way to solve this problem? I would be very grateful for your answer.
The error message: "The derivative of the state in Simulink is not finite, the simulation will stop, and there may be a singularity in the solution."
Accepted Answer
Sam Chak
1 November 2023
Hi @嘻嘻
The error message you've encountered in Simulink most commonly indicates a "division by zero" in the derivative of a state variable. As the denominator approaches 0, the quotient approaches either positive or negative infinity. Because the derivative value is not finite, it can lead to numerical instability in the simulation. When this happens, Simulink may stop the simulation to prevent incorrect results.
You need to review your model equations and block configurations in Simulink to identify the source of the issue. Look for any potential causes of division by zero, such as trigonometric terms like 1/cos(θ) or tan(θ), which blow up when θ = ±90°.
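As a minimal illustration (added here for clarity, not part of the original answer), the following MATLAB snippet shows how a 1/cos(θ) term grows without bound as θ approaches 90°, producing exactly the kind of non-finite derivative that stops the simulation:
% Sketch: evaluate 1/cos(theta) as theta approaches pi/2.
theta = [0; pi/4; pi/2 - 1e-3; pi/2 - 1e-6; pi/2 - 1e-9];
dxdt  = 1 ./ cos(theta);   % grows toward infinity near theta = pi/2
disp([theta, dxdt])        % the last rows show the blow-up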
9 Comments
Thank you very much for your answer and I will take your suggestion.
嘻嘻
2 November 2023
Moved: Walter Roberson
2 November 2023
I have checked the model and found no problem, but the error still occurs.
"The derivative of the state in Simulink is not finite" can also happen if a state goes to +infinity or -infinity or to NaN. This is not always due to division by zero: it can happen if a state grows without bound.
If ẋ = f(x) is the state derivative, can you output the state signal x to check whether it grows without bound or not? For example, take a second-order system such as ẍ + k·ẋ + x = 0. If the RL agent learns k in the negative direction, then the state x will grow without bound. So, you need to introduce some kind of penalty to discourage the negative learning behavior. This requires some prior knowledge about the nature of the system.
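To make this concrete, here is a small MATLAB sketch (an illustration assuming the second-order system above) showing how a negatively learned k drives the state unbounded:
% Sketch: x'' + k*x' + x = 0 in state-space form, with k < 0.
% The poles move into the right half-plane and x(t) diverges, which
% eventually trips the "derivative of state is not finite" error.
k = -0.5;                        % destabilizing gain (assumed value)
A = [0 1; -1 -k];                % states are [x; x']
[t, x] = ode45(@(t, x) A*x, [0 20], [1; 0]);
plot(t, x(:,1)), xlabel('t'), ylabel('x')   % x(t) grows without bound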
If the system is a total black box and a complicated high-order nonlinear system, then the learning process will be challenging.
You are right, my controlled object is indeed a black box model with three inputs and three outputs, and checking the state derivative is difficult.
My controlled object is a state-space model with a 7×7 state matrix.
@嘻嘻,
It seems like the RL agent may have entered an unstable region. Instead of working with a complete black box, I would suggest identifying the 7th-order linear system, if possible, using the frequency response method. This approach requires the System Identification Toolbox.
Once you have an identified nominal model, you can apply LQR to determine the 'nominal' stabilizing input gains. Using the gain matrix, you can 'guide' the RL agent to explore gains in the vicinity of these values, considering relative deviations as a percentage of the nominal values.
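As a hedged sketch of the identification step (u and y are assumed experiment records, not data from this thread), something like the following could produce the nominal matrices; the example below then substitutes a hypothetical A for demonstration:
% Sketch (requires System Identification Toolbox): estimate a 7th-order
% state-space model from measured 3-input/3-output data.
Ts   = 0.01;                 % sample time (assumed)
data = iddata(y, u, Ts);     % y, u: assumed experiment records
sys  = ssest(data, 7);       % identify a 7th-order model
[A, B, C, D] = ssdata(sys);  % extract the nominal state-space matrices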
A = magic(7) % hypothetical nominal state matrix identified
A = 7×7
30 39 48 1 10 19 28
38 47 7 9 18 27 29
46 6 8 17 26 35 37
5 14 16 25 34 36 45
13 15 24 33 42 44 4
21 23 32 41 43 3 12
22 31 40 49 2 11 20
B = [zeros(4, 3); eye(3)] % 3 inputs
B = 7×3
0 0 0
0 0 0
0 0 0
0 0 0
1 0 0
0 1 0
0 0 1
rk = rank(ctrb(A, B)); % rank of the controllability matrix
ToF = logical(rk == length(A)) % true (1) means the system is controllable
ToF = logical
1
% Then, you can apply LQR to find the stabilizing input gains
Q = eye(7); % <-- to be designed
R = eye(3); % <-- to be designed
K = lqr(A, B, Q, R) % stabilizing input gain matrix
K = 3×7
1.0e+03 *
-0.0738 0.7898 -0.2982 0.1553 0.1512 0.1311 0.0790
0.1511 1.6302 -0.4304 0.0588 0.1311 0.1946 0.1711
0.3558 0.8166 0.0559 0.0348 0.0790 0.1711 0.2302
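To connect this to the RL training, here is a sketch of the 'guide the agent' idea above (the names delta and K_rl are illustrative; delta stands in for whatever the agent actually outputs):
% Sketch: let the agent learn relative deviations around the nominal
% LQR gains instead of searching the whole gain space.
x     = randn(7, 1);                % example state vector
delta = 0.2*(2*rand(size(K)) - 1);  % stand-in for the agent's action (±20%)
K_rl  = K .* (1 + delta);           % perturbed gain matrix
u     = -K_rl * x;                  % control input applied to the plant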
Thank you very much. I have obtained the gain matrix K by following your method, but I don't know what the next step is. Could you tell me more about it?
Hi @嘻嘻
From an optimal control perspective, you can guide the search direction of the RL agent by building the known nominal values into the performance cost function, along with any constraints.
I'm still learning about RL, but my colleagues who work with RL have recommended using the 'generateRewardFunction()' command to design a custom cost function that fits your application. You can find an example at this link:
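For illustration only, here is a hand-written reward in the spirit of that suggestion (not the output of generateRewardFunction; the signature and weights are assumptions):
% Sketch of a custom reward: quadratic penalty on state and control,
% plus a large fixed penalty if the state diverges, to discourage the
% agent from destabilizing the plant.
function r = myReward(x, u)      % illustrative name and signature
    if any(~isfinite(x))         % state went to Inf/NaN
        r = -1e6;                % heavy fixed penalty
        return
    end
    r = -(x.'*x) - 0.1*(u.'*u);  % LQR-like quadratic cost
end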
More Answers (0)