Training and Simulation

Train and simulate reinforcement learning agents

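During training, an agent continuously updates its parameters to learn the optimal policy for a given environment. During simulation, an agent receives observations and rewards from the environment and returns actions to the environment without updating its parameters.

Reinforcement Learning Toolbox™ provides functions for training agents and for validating the training results through simulation. For an introduction to training and simulating agents, see Train Reinforcement Learning Agents.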
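
As a rough illustration of the difference between training and simulation, the following sketch trains a default DQN agent and then simulates it. The predefined cart-pole environment, the choice of a default DQN agent, and every option value are assumptions picked for illustration, not recommended settings.

```matlab
% Minimal sketch: train a default DQN agent, then simulate it.
% Environment choice and all option values are illustrative assumptions.
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Default agent; the toolbox builds default networks from the specs.
agent = rlDQNAgent(obsInfo, actInfo);

% Training options (values chosen only to keep the example short).
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=200, ...
    MaxStepsPerEpisode=500, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=480, ...
    Plots="none", ...
    Verbose=true);

% Training: the agent parameters are updated.
trainResults = train(agent, env, trainOpts);

% Simulation: the agent acts in the environment without parameter updates.
simOpts = rlSimulationOptions(MaxSteps=500);
experience = sim(env, agent, simOpts);
totalReward = sum(experience.Reward.Data);
```

Because agents are handle objects, `train` updates `agent` in place, while `sim` leaves its learnable parameters unchanged.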

Apps

Reinforcement Learning Designer

Design, train, and simulate reinforcement learning agents

Functions

Train Agents

`train`	Train reinforcement learning agents within a specified environment
`rlTrainingOptions`	Options for training reinforcement learning agents
`rlMultiAgentTrainingOptions`	Options for training multiple reinforcement learning agents (Since R2022a)
`trainWithEvolutionStrategy`	Train DDPG, TD3 or SAC agent using an evolutionary strategy within a specified environment (Since R2023b)
`rlEvolutionStrategyTrainingOptions`	Options for training off-policy reinforcement learning agents using an evolutionary strategy (Since R2023b)
`show`	Visualize a training result object in a new Reinforcement Learning Training Monitor window (Since R2024a)

Train Agents Offline

`trainFromData`	Train off-policy reinforcement learning agent using existing data (Since R2023a)
`rlTrainingFromDataOptions`	Options to train reinforcement learning agents using existing data (Since R2023a)
`show`	Visualize a training result object in a new Reinforcement Learning Training Monitor window (Since R2024a)

Evaluate Agents During Training

`rlEvaluator`	Options for evaluating reinforcement learning agents during training (Since R2023b)
`rlCustomEvaluator`	Custom object for evaluating reinforcement learning agents during training (Since R2023b)

Log Data

`rlDataLogger`	Create either a file logger object or a monitor logger object to log training data (Since R2022b)
`rlDataViewer`	Open Reinforcement Learning Data Viewer tool (Since R2023a)
`FileLogger`	Log reinforcement learning training data to MAT files (Since R2022b)
`MonitorLogger`	Log reinforcement learning training data to monitor window (Since R2022b)
`trainingProgressMonitor`	Monitor and plot training progress for deep learning custom training loops (Since R2022b)
`setup`	Set up reinforcement learning environment or initialize data logger object (Since R2022a)
`store`	Store data in the internal memory of a (file or monitor) logger object (Since R2022b)
`write`	Transfer stored data from the internal logger memory to the logging target (Since R2022b)
`cleanup`	Clean up reinforcement learning environment or data logger object (Since R2022a)

Simulate Agents

`sim`	Simulate trained reinforcement learning agents within specified environment
`rlSimulationOptions`	Options for simulating a reinforcement learning agent within an environment

Experience Buffers

`rlReplayMemory`	Replay memory experience buffer (Since R2022a)
`rlPrioritizedReplayMemory`	Replay memory experience buffer with prioritized sampling (Since R2022b)
`rlHindsightReplayMemory`	Hindsight replay memory experience buffer (Since R2023a)
`rlHindsightPrioritizedReplayMemory`	Hindsight replay memory experience buffer with prioritized sampling (Since R2023a)
`append`	Append experiences to replay memory buffer (Since R2022a)
`sample`	Sample experiences from replay memory buffer (Since R2022a)
`resize`	Resize replay memory experience buffer (Since R2022b)
`allExperiences`	Return all experiences in replay memory buffer (Since R2022b)
`validateExperience`	Validate experiences for replay memory (Since R2023a)
`generateHindsightExperiences`	Generate hindsight experiences from hindsight experience replay buffer (Since R2023a)
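
The buffer functions listed above can be combined roughly as follows. This is only a sketch: the observation and action specifications, the buffer length, and the random experience data are placeholders invented for illustration.

```matlab
% Minimal sketch: create a replay memory, append experiences, sample a batch.
% The specifications, sizes, and random data below are illustrative assumptions.
obsInfo = rlNumericSpec([4 1]);        % hypothetical 4-element observation
actInfo = rlFiniteSetSpec([-1 1]);     % hypothetical two-valued discrete action

buffer = rlReplayMemory(obsInfo, actInfo, 1000);   % holds up to 1000 experiences

% Build a batch of 20 random experiences as a structure array.
for k = 1:20
    expBatch(k).Observation     = {rand(4,1)};
    expBatch(k).Action          = {1};
    expBatch(k).Reward          = rand;
    expBatch(k).NextObservation = {rand(4,1)};
    expBatch(k).IsDone          = 0;
end
append(buffer, expBatch);

% Draw a random mini-batch of 8 experiences.
miniBatch = sample(buffer, 8);

% Enlarge the buffer if more capacity is needed later.
resize(buffer, 2000);
```

Each experience wraps its observation and action in a cell array, so the same layout extends to environments with several observation channels.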

Custom Training

`rlOptimizer`	Creates an optimizer object for actors and critics (Since R2022a)
`runEpisode`	Simulate reinforcement learning environment against policy or agent (Since R2022a)
`syncParameters`	Modify the learnable parameters of one approximator toward the learnable parameters of another approximator (Since R2022a)
`update`	Update the state of an optimizer object and a set of learnable parameters using the gradient value (Since R2022a)
`evaluate`	Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
`setup`	Set up reinforcement learning environment or initialize data logger object (Since R2022a)
`cleanup`	Clean up reinforcement learning environment or data logger object (Since R2022a)
`Future`	Object that supports deferred outputs for reinforcement learning environment simulations running on workers (Since R2022a)
`fetchNext`	Retrieve next available unread outputs from reinforcement learning environment simulations running on workers (Since R2022a)
`fetchOutputs`	Retrieve results from all reinforcement learning environment simulations running on workers (Since R2022a)
`cancel`	Cancel unfinished reinforcement learning environment simulations on workers (Since R2022a)
`wait`	Wait for reinforcement learning environment simulations running on workers to finish (Since R2022a)
`dlfeval`	Evaluate deep learning model for custom training loops
`dlaccelerate`	Accelerate deep learning function
`AcceleratedFunction`	Accelerated deep learning function

Get and Set Parameters

`syncParameters`	Modify the learnable parameters of one approximator toward the learnable parameters of another approximator (Since R2022a)
`getLearnableParameters`	Obtain learnable parameter values from agent, function approximator, or policy object
`setLearnableParameters`	Set learnable parameter values of agent, function approximator, or policy object
`policyParameters`	Obtain structure of policy parameters to update policy during simulation or deployment (Since R2025a)
`updatePolicyParameters`	Update policy according to structure of policy parameters given as input argument (Since R2025a)

Blocks

RL Agent	Reinforcement learning agent
Policy	Reinforcement learning policy (Since R2022b)

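Topics

Training and Simulation Basics

Train Reinforcement Learning Agents
Find the optimal policy by training your agent within a specified environment.
Train Reinforcement Learning Agent in Basic Grid World
Train Q-learning and SARSA agents to solve a grid world in MATLAB®.
Train Reinforcement Learning Agent in MDP Environment
Train a reinforcement learning agent in a generic Markov decision process environment.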

Using the Reinforcement Learning Designer App

Specify Training Options in Reinforcement Learning Designer
Interactively specify options for training reinforcement learning agents using the Reinforcement Learning Designer app.
Specify Simulation Options in Reinforcement Learning Designer
Interactively specify options for simulating reinforcement learning agents using the Reinforcement Learning Designer app.
Design and Train Agent Using Reinforcement Learning Designer
Design and train a DQN agent for a cart-pole system using the Reinforcement Learning Designer app.
Tune Hyperparameters Using Reinforcement Learning Designer
Search the hyperparameter space using Reinforcement Learning Designer.

Create and Train Default Agents

Train DQN Agent to Balance Discrete Cart-Pole System
Train a DQN agent to balance a discrete action space cart-pole system modeled in MATLAB.
Train DQN Agent to Swing Up and Balance Pendulum
Train a DQN agent to swing up and balance a discrete action space pendulum modeled in Simulink®.
Train DDPG Agent to Swing Up and Balance Pendulum
Train a DDPG agent to balance a continuous action space pendulum modeled in Simulink.
Train DDPG Agent to Swing Up and Balance Cart-Pole System
Train a DDPG agent to swing up and balance a continuous action space cart-pole system modeled in Simscape™ Multibody™.
Train Default PPO Agent for Discrete Lander Vehicle
Train a default PPO agent to land a discrete action space flying vehicle.

Create and Train Agents Using Custom Approximators

Train LSPI Agent to Balance Discrete Cart-Pole
Train an LSPI agent to balance a discrete action space cart-pole system modeled in MATLAB.
Train PG Agent to Balance Discrete Cart-Pole System
Train a PG agent to balance a discrete action space cart-pole system modeled in MATLAB.
Train PG Agent with Custom Actor and Baseline Networks to Control Discrete Double Integrator
Train a PG agent with a custom actor and baseline networks to control a discrete action space double integrator system modeled in MATLAB.
Train DDPG Agent with Custom Networks Using Image Observation
Train a DDPG agent with custom networks using an image-based observation signal.
Train Soft Actor Critic Agent with Custom Networks for Discrete Lander Vehicle
Train a SAC agent to land a discrete action space flying vehicle.

Create and Train Agents for Custom Simulink Environments

Control Water Level in a Tank Using a DDPG Agent
Train a controller using reinforcement learning with a plant modeled in Simulink as the training environment.
Train DDPG Agent to Swing Up and Balance Pendulum with Bus Signal
Train a DDPG agent to balance a continuous action space pendulum Simulink model that contains observations in a bus signal.

Using Multiple Processes and GPUs

Train Agents Using Parallel Computing and GPUs
Accelerate agent training by running simulations in parallel on multiple cores, GPUs, clusters or cloud resources.
Train AC Agent to Balance Discrete Cart-Pole Using Parallel Computing
Train an AC agent to control a discrete action space cart-pole system using asynchronous parallel computing.
Train DQN Agent for Lane Keeping Assist Using Parallel Computing
Train a DQN agent for an automated driving application using parallel computing.

Advanced Training and Simulation Topics

Train PPO Agent with Curriculum Learning for a Lane Keeping Application
Train a PPO agent for a lane keeping assist task by gradually increasing task complexity.
Train DQN Agent Using Hindsight Experience Replay
Train a DQN agent in a navigation environment with sparse rewards.
Train Reinforcement Learning Agent Offline to Control Quanser QUBE Pendulum
Train a TD3 agent offline to control a Quanser QUBE pendulum.
Train Biped Robot to Walk Using Evolution Strategy-Reinforcement Learning Agents
Train a TD3 agent using an evolutionary strategy.
Create DQN Agent Using Deep Network Designer and Train Using Image Observations
Create a reinforcement learning agent using the Deep Network Designer app from the Deep Learning Toolbox™.
Transfer Learning: Fine-Tune DQN Agent for Pendulum Swing-Up from Earth to Mars
Use transfer learning to partially retrain a DQN agent to swing up and balance a pendulum under Mars gravity conditions.

Log Training Data and Tune Hyperparameters

Log Training Data to Disk
Log a variety of data to disk while training an agent.
Train Agent or Tune Environment Parameters Using Parameter Sweeping
Tune a DDPG agent using hyperparameter sweeping.
Tune Hyperparameters Using Bayesian Optimization
Tune reinforcement learning hyperparameters using Bayesian optimization.
Configure Exploration for Reinforcement Learning Agents
Use visualization to configure exploration in reinforcement learning agents.

Multi-Agent Training

Train Multiple Agents to Perform Collaborative Task
Train two continuous action space PPO agents to collaboratively move an object.
Train Multiple Agents for Area Coverage
Train three discrete action space PPO agents to explore a grid-world environment in a collaborative-competitive manner.
Train Multiple Agents for Path Following Control
Train a DQN and a DDPG agent to collaboratively perform adaptive cruise control and lane keeping assist to follow a path.

Develop Custom Agents and Training Algorithms

Train Reinforcement Learning Policy Using Custom Training Loop
Train a reinforcement learning policy using your own custom training loop.
Create and Train Custom PG Agent
Create a custom PG agent and train it using the built-in train function.
Create and Train Custom LQR Agent
Create a custom agent that solves an LQR problem and train it using the built-in train function.
Custom PPO Training Loop with Random Network Distillation
Use a custom training loop to train a custom PPO policy with random network distillation on a pendulum environment with sparse rewards.
Custom Training Loop with Simulink Action Noise
Use a custom training loop to train a continuous action space reinforcement learning policy in Simulink when action noise is generated within the model.
Custom DQN Training Loop with LSTM Network
Use a custom training loop to train a DQN agent with an LSTM network.

Train Model-Based Policy Optimization Agents

Train MBPO Agent to Balance Continuous Cart-Pole System
A model-based reinforcement learning agent learns a model of its environment that it can use to generate additional experiences for training.
Model-Based Reinforcement Learning Using Custom Training Loop
Create a model-based reinforcement learning agent using a custom training loop.