To choose an action, is it correct to compute the value of the successor state, or do we need to compute the values of the states along the entire path to the end state?

While selecting an action, the action whose Q(s,a) is maximum is chosen. Q(s,a) is the sum of the immediate reward and the discounted value of the next state.
When computing the best action from a state, do I need to keep computing (iterating) the values of successor states along the path until the end state, or is it enough to compute the value of the immediate successor state alone and choose the action that leads to the state with the maximum value?
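For concreteness, this is the relation I have in mind (just a sketch of my understanding, written assuming the same policy is followed afterwards):

Q(s, a) = r(s, a) + \gamma V(s'), \qquad V(s') = \max_{a'} Q(s', a')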

Accepted Answer

Emmanouil Tzorakoleftherakis on 6 Jul 2020
Hi Gowri,
The Q value of a state-action pair encodes all the information up to 'the end of the path', weighted by the discount factor (assuming you keep following the same policy).
So, assuming you have a critic that approximates the Q function reasonably well, you shouldn't need to check the Q values of successor states.
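For example, here is a minimal sketch of extracting the greedy action from a learned Q function stored as a table (the variable names Qtable, s, and bestAction are hypothetical and not tied to any particular toolbox):

% Qtable is a (numStates x numActions) array of learned Q values (assumed given).
% Greedy policy extraction: pick the action with the largest Q value for the
% current state s only -- no lookahead over successor states is needed.
[~, bestAction] = max(Qtable(s, :));   % argmax over the actions available in state s

The values of all later states are already folded into Q(s,a) through the discount factor, so a single argmax over the current row is enough.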
3 Comments
Emmanouil Tzorakoleftherakis on 6 Jul 2020
If the approximation of the Q function is relatively accurate (whether that's through a table, a neural network, a polynomial, or something else), then yes, looking at the Q value of the current state/action pair should be sufficient when you are trying to 'extract' the policy.
In fact, if you look at vanilla DQN, even during training the Bellman equation only looks one step ahead. I am not saying that n-step learning is not an option, but you certainly don't need all subsequent Q values.
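As a reference point, here is a minimal sketch of that one-step target in its tabular Q-learning form (the same one-step bootstrap structure that DQN uses; the variable names Qtable, sNext, gamma, alpha, etc. are hypothetical):

% One-step Bellman target: bootstrap from the best next action only.
% Assumed variables: Qtable (numStates x numActions), current state s, action a,
% reward r, next state sNext, flag isTerminal, discount gamma, learning rate alpha.
if isTerminal
    target = r;                                  % no bootstrapping at terminal states
else
    target = r + gamma * max(Qtable(sNext, :));  % look one step ahead, nothing further
end
Qtable(s, a) = Qtable(s, a) + alpha * (target - Qtable(s, a));   % move the estimate toward the target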


More Answers (0)
