
Policies and Value Functions

Define policy and value function approximators, such as actors and critics

During training, most agents use an actor, a critic, or both. The actor learns the policy that selects the actions to take. The critic learns the value (or Q-value) function that estimates the value of that policy.
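As a concrete illustration, the following is a minimal sketch (not taken from this page) that creates a vector Q-value critic and a discrete categorical actor for a hypothetical environment with a four-element observation and three discrete actions; the specifications and network sizes are assumptions.

% Hypothetical observation and action specifications (assumptions, not from this page).
obsInfo = rlNumericSpec([4 1]);          % continuous 4-element observation
actInfo = rlFiniteSetSpec([-1 0 1]);     % three possible discrete actions

% Critic: vector Q-value function that outputs one Q-value per discrete action.
criticNet = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))];
criticNet = dlnetwork(criticNet);
critic = rlVectorQValueFunction(criticNet,obsInfo,actInfo);

% Actor: stochastic categorical actor; the final softmaxLayer makes the
% outputs a probability distribution over the discrete actions.
actorNet = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    softmaxLayer];
actorNet = dlnetwork(actorNet);
actor = rlDiscreteCategoricalActor(actorNet,obsInfo,actInfo);

% Query both approximators for a random observation.
qValues = getValue(critic,{rand(obsInfo.Dimension)});
action  = getAction(actor,{rand(obsInfo.Dimension)});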

Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, as well as policy objects for custom loops and deployment. Internally, an approximator object can use different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
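For example, the following minimal sketch (not from this page) extracts a greedy policy object from a default DQN agent, used here as a stand-in for a trained agent, and queries it for an action; the observation and action specifications are assumptions.

% Hypothetical observation and action specifications (assumptions, not from this page).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Create a DQN agent with default networks; in practice you would train it first.
agent = rlDQNAgent(obsInfo,actInfo);

% Extract the greedy (deterministic) policy object, suitable for custom
% evaluation loops or deployment.
policy = getGreedyPolicy(agent);

% Query the policy for an action given a random observation.
act = getAction(policy,{rand(obsInfo.Dimension)});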

For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.

Blocks

Policy - Reinforcement learning policy (Since R2022b)

Functions


rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents (Since R2022a)
rlQValueFunction - Q-value function approximator with a continuous or discrete action space for reinforcement learning agents (Since R2022a)
rlVectorQValueFunction - Vector Q-value function approximator with a hybrid or discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlHybridStochasticActor - Hybrid stochastic actor with a hybrid action space for reinforcement learning agents (Since R2024b)
getActor - Extract actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Extract critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getModel - Get approximation model from function approximator object
setModel - Set approximation model in function approximator object
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
syncParameters - Modify the learnable parameters of one approximator towards the learnable parameters of another approximator (Since R2022a)
rlNormalizer - Configure normalization for input of function approximator object (Since R2024a)
getNormalizer - Get normalizer from function approximator object (Since R2024a)
setNormalizer - Set normalizer in function approximator object (Since R2024a)
normalize - Normalize input data using method defined in normalizer object (Since R2024a)
rlOptimizerOptions - Optimization options for actors and critics (Since R2022a)
getGreedyPolicy - Extract greedy (deterministic) policy object from agent (Since R2022a)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent (Since R2023a)
rlOptimizer - Creates an optimizer object for actors and critics (Since R2022a)
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (Since R2022a)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops (Since R2022a)
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment (Since R2022a)
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops (Since R2022a)
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment (Since R2022a)
rlHybridStochasticActorPolicy - Policy object to generate hybrid stochastic actions for custom training loops and application deployment (Since R2024b)
policyParameters - Obtain structure of policy parameters to update policy during simulation or deployment (Since R2025a)
updatePolicyParameters - Update policy according to structure of policy parameters given as input argument (Since R2025a)
rlContinuousDeterministicTransitionFunction - Deterministic transition function approximator object for neural network-based environment (Since R2022a)
rlContinuousGaussianTransitionFunction - Stochastic Gaussian transition function approximator object for neural network-based environment (Since R2022a)
rlContinuousDeterministicRewardFunction - Deterministic reward function approximator object for neural network-based environment (Since R2022a)
rlContinuousGaussianRewardFunction - Stochastic Gaussian reward function approximator object for neural network-based environment (Since R2022a)
rlIsDoneFunction - Is-done function approximator object for neural network-based environment (Since R2022a)
getAction - Obtain action from agent, actor, or policy object given environment observations
getValue - Obtain estimated value from a critic given environment observations and actions
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
quadraticLayer - Quadratic layer
scalingLayer - Scaling layer (used in the sketch after this list)
softplusLayer - Softplus layer (used in the sketch after this list)
featureInputLayer - Feature input layer
reluLayer - Rectified linear unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
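The reinforcement-learning-specific layers above typically appear in the output paths of a continuous Gaussian actor network: tanhLayer and scalingLayer bound and rescale the action mean, while softplusLayer keeps the standard deviation positive. The following is a minimal sketch (not from this page) under assumed observation and action specifications.

% Hypothetical specifications: 4-element observation, one action bounded in [-2, 2].
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1],LowerLimit=-2,UpperLimit=2);

% Common input path.
commonPath = [
    featureInputLayer(prod(obsInfo.Dimension),Name="obsIn")
    fullyConnectedLayer(32)
    reluLayer(Name="comOut")];

% Mean path: tanhLayer bounds the output, scalingLayer maps it to the action range.
meanPath = [
    fullyConnectedLayer(prod(actInfo.Dimension),Name="meanIn")
    tanhLayer
    scalingLayer(Name="meanOut",Scale=actInfo.UpperLimit)];

% Standard-deviation path: softplusLayer keeps the output positive.
stdPath = [
    fullyConnectedLayer(prod(actInfo.Dimension),Name="stdIn")
    softplusLayer(Name="stdOut")];

% Assemble the network and create the stochastic Gaussian actor.
lgraph = layerGraph(commonPath);
lgraph = addLayers(lgraph,meanPath);
lgraph = addLayers(lgraph,stdPath);
lgraph = connectLayers(lgraph,"comOut","meanIn");
lgraph = connectLayers(lgraph,"comOut","stdIn");

actor = rlContinuousGaussianActor(dlnetwork(lgraph),obsInfo,actInfo, ...
    ObservationInputNames="obsIn", ...
    ActionMeanOutputNames="meanOut", ...
    ActionStandardDeviationOutputNames="stdOut");

% Sample a stochastic action for a random observation.
act = getAction(actor,{rand(obsInfo.Dimension)});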

Topics