
Policies and Value Functions

Define policy and value function approximators, such as actors and critics

During training, most agents use an actor, a critic, or both. The actor learns the policy that selects the actions to take. The critic learns the value (or Q-value) function that estimates the value of that policy.
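As a concrete illustration, the following is a minimal sketch (not taken from this page) that creates a vector Q-value critic and a discrete categorical actor for a hypothetical environment with a four-element observation and three discrete actions; the specifications and network sizes are assumptions.

% Hypothetical observation and action specifications (assumptions, not from this page).
obsInfo = rlNumericSpec([4 1]);          % continuous 4-element observation
actInfo = rlFiniteSetSpec([-1 0 1]);     % three possible discrete actions

% Critic: vector Q-value function that outputs one Q-value per discrete action.
criticNet = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))];
criticNet = dlnetwork(criticNet);
critic = rlVectorQValueFunction(criticNet,obsInfo,actInfo);

% Actor: stochastic categorical actor; the final softmaxLayer makes the
% outputs a probability distribution over the discrete actions.
actorNet = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    softmaxLayer];
actorNet = dlnetwork(actorNet);
actor = rlDiscreteCategoricalActor(actorNet,obsInfo,actInfo);

% Query both approximators for a random observation.
qValues = getValue(critic,{rand(obsInfo.Dimension)});
action  = getAction(actor,{rand(obsInfo.Dimension)});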

Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, as well as policy objects for custom loops and deployment. Internally, an approximator object can use different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
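For example, the following minimal sketch (not from this page) extracts a greedy policy object from a default DQN agent, used here as a stand-in for a trained agent, and queries it for an action; the observation and action specifications are assumptions.

% Hypothetical observation and action specifications (assumptions, not from this page).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Create a DQN agent with default networks; in practice you would train it first.
agent = rlDQNAgent(obsInfo,actInfo);

% Extract the greedy (deterministic) policy object, suitable for custom
% evaluation loops or deployment.
policy = getGreedyPolicy(agent);

% Query the policy for an action given a random observation.
act = getAction(policy,{rand(obsInfo.Dimension)});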

For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.

Blocks

Policy - Reinforcement learning policy (Since R2022b)

Functions


rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents (Since R2022a)
rlQValueFunction - Q-value function approximator with a continuous or discrete action space for reinforcement learning agents (Since R2022a)
rlVectorQValueFunction - Vector Q-value function approximator with a hybrid or discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlHybridStochasticActor - Hybrid stochastic actor with a hybrid action space for reinforcement learning agents (Since R2024b)
getActor - Extract actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Extract critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getModel - Get approximation model from function approximator object
setModel - Set approximation model in function approximator object
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
syncParameters - Modify the learnable parameters of one approximator towards the learnable parameters of another approximator (Since R2022a)
rlNormalizer - Configure normalization for input of function approximator object (Since R2024a)
getNormalizer - Get normalizer from function approximator object (Since R2024a)
setNormalizer - Set normalizer in function approximator object (Since R2024a)
normalize - Normalize input data using method defined in normalizer object (Since R2024a)
rlOptimizerOptions - Optimization options for actors and critics (Since R2022a)
getGreedyPolicy - Extract greedy (deterministic) policy object from agent (Since R2022a)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent (Since R2023a)
rlOptimizer - Creates an optimizer object for actors and critics (Since R2022a)
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (Since R2022a)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops (Since R2022a)
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment (Since R2022a)
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops (Since R2022a)
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment (Since R2022a)
rlHybridStochasticActorPolicy - Policy object to generate hybrid stochastic actions for custom training loops and application deployment (Since R2024b)
policyParameters - Obtain structure of policy parameters to update policy during simulation or deployment (Since R2025a)
updatePolicyParameters - Update policy according to structure of policy parameters given as input argument (Since R2025a)
rlContinuousDeterministicTransitionFunction - Deterministic transition function approximator object for neural network-based environment (Since R2022a)
rlContinuousGaussianTransitionFunction - Stochastic Gaussian transition function approximator object for neural network-based environment (Since R2022a)
rlContinuousDeterministicRewardFunction - Deterministic reward function approximator object for neural network-based environment (Since R2022a)
rlContinuousGaussianRewardFunction - Stochastic Gaussian reward function approximator object for neural network-based environment (Since R2022a)
rlIsDoneFunction - Is-done function approximator object for neural network-based environment (Since R2022a)
getAction - Obtain action from agent, actor, or policy object given environment observations
getValue - Obtain estimated value from a critic given environment observations and actions
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
quadraticLayer - Quadratic layer
scalingLayer - Scaling layer (used in the sketch after this list)
softplusLayer - Softplus layer (used in the sketch after this list)
featureInputLayer - Feature input layer
reluLayer - Rectified linear unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
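The reinforcement-learning-specific layers above typically appear in the output paths of a continuous Gaussian actor network: tanhLayer and scalingLayer bound and rescale the action mean, while softplusLayer keeps the standard deviation positive. The following is a minimal sketch (not from this page) under assumed observation and action specifications.

% Hypothetical specifications: 4-element observation, one action bounded in [-2, 2].
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1],LowerLimit=-2,UpperLimit=2);

% Common input path.
commonPath = [
    featureInputLayer(prod(obsInfo.Dimension),Name="obsIn")
    fullyConnectedLayer(32)
    reluLayer(Name="comOut")];

% Mean path: tanhLayer bounds the output, scalingLayer maps it to the action range.
meanPath = [
    fullyConnectedLayer(prod(actInfo.Dimension),Name="meanIn")
    tanhLayer
    scalingLayer(Name="meanOut",Scale=actInfo.UpperLimit)];

% Standard-deviation path: softplusLayer keeps the output positive.
stdPath = [
    fullyConnectedLayer(prod(actInfo.Dimension),Name="stdIn")
    softplusLayer(Name="stdOut")];

% Assemble the network and create the stochastic Gaussian actor.
lgraph = layerGraph(commonPath);
lgraph = addLayers(lgraph,meanPath);
lgraph = addLayers(lgraph,stdPath);
lgraph = connectLayers(lgraph,"comOut","meanIn");
lgraph = connectLayers(lgraph,"comOut","stdIn");

actor = rlContinuousGaussianActor(dlnetwork(lgraph),obsInfo,actInfo, ...
    ObservationInputNames="obsIn", ...
    ActionMeanOutputNames="meanOut", ...
    ActionStandardDeviationOutputNames="stdOut");

% Sample a stochastic action for a random observation.
act = getAction(actor,{rand(obsInfo.Dimension)});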

Topics