Refer to 6.4 (Sarsa: On-Policy TD Control), Reinforcement learning: An introduction, RS Sutton, AG Barto , MIT press
In this demo, two different mazes have been solved by Reinforcement Learning technique, SARSA.
State-Action-Reward-State-Action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning.
SARSA, Updation of Action-Value Function:
Q(S{t}, A{t}) := Q(S{t}, A{t}) + α*[ R{t+1} + γ ∗ Q(S{t+1}, A{t+1}) − Q(S{t}, A{t}) ]
Learning rate (α)
The learning rate determines to what extent the newly acquired information will override the old information. A factor of 0 will make the agent not learn anything, while a factor of 1 would make the agent consider only the most recent information.
Discount factor (γ)
The discount factor determines the importance of future rewards. A factor of 0 will make the agent "opportunistic" by only considering current rewards, while a factor approaching 1 will make it strive for a long-term high reward. If the discount factor meets or exceeds 1, the Q values may diverge.
Note: Convergence is tested on particular examples, in general convergence is not sure for above demo.
인용 양식
Bhartendu (2024). SARSA Reinforcement Learning (https://www.mathworks.com/matlabcentral/fileexchange/63089-sarsa-reinforcement-learning), MATLAB Central File Exchange. 검색 날짜: .
MATLAB 릴리스 호환 정보
플랫폼 호환성
Windows macOS Linux카테고리
태그
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!버전 | 게시됨 | 릴리스 정보 | |
---|---|---|---|
1.0.0.0 |