Agents

Create and configure reinforcement learning agents

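A reinforcement learning agent receives observations and a reward from the environment and returns an action to the environment. During training, the agent continuously updates its parameters to improve its policy for the given environment.

Reinforcement Learning Toolbox™ provides built-in reinforcement learning agents that use several common algorithms, such as Q-learning, DQN, PG, AC, DDPG, TD3, SAC, and PPO. You can also implement your own custom agents.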
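As a minimal sketch of how a built-in agent can be created, the following example builds a DQN agent with default networks from observation and action specifications, then queries it for an action. The specification sizes and the hidden-unit count are illustrative assumptions, and the code assumes that both Reinforcement Learning Toolbox and Deep Learning Toolbox are installed.

```matlab
% Illustrative specifications (assumed, not from a particular environment):
% a 4-element continuous observation and two discrete actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Optional settings for the default networks created by the agent.
initOpts = rlAgentInitializationOptions('NumHiddenUnit', 64);

% Create a DQN agent with a default critic.
agent = rlDQNAgent(obsInfo, actInfo, initOpts);

% Ask the untrained agent for an action given a random observation.
% The observation is passed as a cell array, one cell per channel.
act = getAction(agent, {rand(obsInfo.Dimension)});
```

The other built-in agent functions listed on this page are created in a similar way, although the kind of action specification each one accepts depends on the algorithm.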

For an introduction to agents, see Reinforcement Learning Agents. For an introduction to policies, value functions, actors, and critics, see Create Actors, Critics, and Policy Objects.

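Apps

Design, train, and simulate reinforcement learning agents

Reinforcement Learning Agents

`rlQAgent`	Q-learning reinforcement learning agent
`rlSARSAAgent`	SARSA reinforcement learning agent
`rlLSPIAgent`	Least-squares policy iteration (LSPI) reinforcement learning agent (since R2025a)
`rlDQNAgent`	Deep Q-network (DQN) reinforcement learning agent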
`rlPGAgent`	Policy gradient (PG) reinforcement learning agent
`rlACAgent`	Actor-critic (AC) reinforcement learning agent
`rlPPOAgent`	Proximal policy optimization (PPO) reinforcement learning agent
`rlTRPOAgent`	Trust region policy optimization (TRPO) reinforcement learning agent (since R2021b)
`rlDDPGAgent`	Deep deterministic policy gradient (DDPG) reinforcement learning agent
`rlTD3Agent`	Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent
`rlSACAgent`	Soft actor-critic (SAC) reinforcement learning agent

`rlQAgentOptions`	Options for Q-learning agent
`rlSARSAAgentOptions`	Options for SARSA agent
`rlLSPIAgentOptions`	Options for LSPI agent (since R2025a)
`rlDQNAgentOptions`	Options for DQN agent
`rlPGAgentOptions`	Options for PG agent
`rlACAgentOptions`	Options for AC agent
`rlPPOAgentOptions`	Options for PPO agent
`rlTRPOAgentOptions`	Options for TRPO agent (since R2021b)
`rlDDPGAgentOptions`	Options for DDPG agent
`rlTD3AgentOptions`	Options for TD3 agent
`rlSACAgentOptions`	Options for SAC agent
`rlAgentInitializationOptions`	Options for initializing reinforcement learning agents
`rlConservativeQLearningOptions`	Regularizer options object to train DQN and SAC agents (since R2023a)
`rlBehaviorCloningRegularizerOptions`	Regularizer options object to train DDPG, TD3, and SAC agents (since R2023a)
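The options objects above hold the hyperparameters of the corresponding agents. The following sketch shows one way to configure a DQN options object; the specific values are illustrative assumptions rather than recommended settings.

```matlab
% Create an options object for a DQN agent using name-value arguments.
% All values here are placeholders chosen for illustration.
opts = rlDQNAgentOptions( ...
    'DiscountFactor', 0.99, ...
    'MiniBatchSize', 64, ...
    'ExperienceBufferLength', 1e5, ...
    'UseDoubleDQN', true);

% Nested option sets can be adjusted with dot notation after creation.
opts.EpsilonGreedyExploration.EpsilonDecay = 1e-4;
```

An options object is typically supplied as the last input argument when creating the agent.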

`rlMBPOAgent`	Model-based policy optimization (MBPO) reinforcement learning agent (since R2022a)
`rlMBPOAgentOptions`	Options for MBPO agent (since R2022a)

`getActor`	Extract actor from reinforcement learning agent
`getCritic`	Extract critic from reinforcement learning agent
`setActor`	Set actor of reinforcement learning agent
`setCritic`	Set critic of reinforcement learning agent
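A common pattern with these functions is to extract a function approximator, inspect or modify it, and write it back to the agent. The sketch below does this for the critic of a default DQN agent. The specifications are illustrative assumptions, and the example assumes Deep Learning Toolbox is available for the default network.

```matlab
% Illustrative specifications and a default DQN agent (assumed setup).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);
agent = rlDQNAgent(obsInfo, actInfo);

% Extract the critic and read its learnable parameters.
critic = getCritic(agent);
params = getLearnableParameters(critic);

% Parameters could be modified here; this sketch writes them back unchanged.
critic = setLearnableParameters(critic, params);

% setCritic returns the updated agent, so capture the output.
agent = setCritic(agent, critic);
```

For agents that have an actor, `getActor` and `setActor` follow the same extract, modify, and write-back pattern.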

`getAction`	Obtain action from agent, actor, or policy object given environment observations

`rlReplayMemory`	Replay memory experience buffer (since R2022a)
`rlPrioritizedReplayMemory`	Replay memory experience buffer with prioritized sampling (since R2022b)
`rlHindsightReplayMemory`	Hindsight replay memory experience buffer (since R2023a)
`rlHindsightPrioritizedReplayMemory`	Hindsight replay memory experience buffer with prioritized sampling (since R2023a)
`append`	Append experiences to replay memory buffer (since R2022a)
`sample`	Sample experiences from replay memory buffer (since R2022a)
`resize`	Resize replay memory experience buffer (since R2022b)
`allExperiences`	Return all experiences in replay memory buffer (since R2022b)
`validateExperience`	Validate experiences for replay memory (since R2023a)
`generateHindsightExperiences`	Generate hindsight experiences from hindsight experience replay buffer (since R2023a)
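The replay memory functions above can be combined as in the following sketch, which creates a buffer, appends a batch of randomly generated experiences, and samples a mini-batch. The specification sizes, buffer length, and random data are illustrative assumptions; in practice the experiences come from agent-environment interaction.

```matlab
% Illustrative specifications (assumed): 4-element observation, scalar action.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Create a replay memory buffer that holds up to 1000 experiences.
buffer = rlReplayMemory(obsInfo, actInfo, 1000);

% Build a structure array of 20 random experiences.
for k = 1:20
    expBatch(k).Observation = {rand(obsInfo.Dimension)};
    expBatch(k).Action = {rand(actInfo.Dimension)};
    expBatch(k).Reward = rand;
    expBatch(k).NextObservation = {rand(obsInfo.Dimension)};
    expBatch(k).IsDone = 0;
end

% Store the experiences, then draw a random mini-batch of 8.
append(buffer, expBatch);
miniBatch = sample(buffer, 8);
```

The `Length` property of the buffer reports how many experiences it currently stores, which is useful for checking that enough data is available before sampling.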

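`getActionInfo`	Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
`getObservationInfo`	Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer

`reset`	Reset environment, agent, experience buffer, or policy object (since R2022a)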

Reinforcement Learning Agents
You can create an agent using one of several standard reinforcement learning algorithms or define your own custom agent.
Create Agents Using Reinforcement Learning Designer
Interactively create or import agents for training using the Reinforcement Learning Designer app.

Q-Learning Agent
Q-learning agent description and algorithm.
SARSA Agent
SARSA agent description and algorithm.
LSPI Agent
LSPI agent description and algorithm.
Deep Q-Network (DQN) Agent
DQN agent description and algorithm.
REINFORCE Policy Gradient (PG) Agent
Vanilla policy gradient agent description and algorithm.
Actor-Critic (AC) Agent
Actor-critic agent description and algorithm.
Proximal Policy Optimization (PPO) Agent
PPO agent description and algorithm.
Trust Region Policy Optimization (TRPO) Agent
TRPO agent description and algorithm.
Deep Deterministic Policy Gradient (DDPG) Agent
DDPG agent description and algorithm.
Twin-Delayed Deep Deterministic (TD3) Policy Gradient Agent
TD3 agent description and algorithm.
Soft Actor-Critic (SAC) Agent
SAC agent description and algorithm.
Model-Based Policy Optimization (MBPO) Agent
A model-based (MBPO) reinforcement learning agent learns a model of its environment that it can use to generate additional experiences for training.

Create Custom Reinforcement Learning Agents
Create custom agents.
Create and Train Custom PG Agent
Create a custom PG agent and train it using the built-in train function.
Create and Train Custom LQR Agent
Create a custom agent that solves an LQR problem and train it using the built-in train function.