By 이태호 in Study — Nov 23, 2023

RL-7. Function Approximation in RL

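This lecture covers how to predict values by approximating the value function when there are too many states to handle individually. Later lectures will cover off-policy learning, approximate dynamic programming, learning explicit policies, model-based RL, and more.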

Large-Scale Reinforcement Learning

Backgammon: $10^{20}$ states
Go: $10^{170}$ states
Helicopter: continuous state space
Robots: real world

These examples show that real problems can have an enormous number of states and very complex state spaces.

Value-Function Approximation

So far we have considered lookup tables, where every state has its own entry.

In large MDPs:

there are too many states to store in memory
learning the value of each state individually takes too long
individual states are often not fully observable
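As a rough back-of-the-envelope check, assume (purely for illustration) that each table entry is stored as one 8-byte floating-point number. A value table for Backgammon alone would then need:

$$
10^{20}\ \text{states} \times 8\ \text{bytes} = 8 \times 10^{20}\ \text{bytes} = 800\ \text{EB} \qquad (1\ \text{EB} = 10^{18}\ \text{bytes})
$$

and Go, at $10^{170}$ states, is far beyond any physical memory.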

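To address this, we estimate the value function with a parameterized function, $v_w(s) \approx v_\pi(s)$ (or $q_w(s, a) \approx q_\pi(s, a)$), and update its parameters $w$, e.g., with MC or TD learning.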

This lets the approximation generalize to states it has never seen.

Function classes

Tabular: a table with one entry per state
State Aggregation: states are grouped together, and each group shares one entry
Linear Function Approximation
- Values are a linear function of features: $v_w(s) = w^\top x(s)$
- Tabular and state aggregation are both special cases of linear function approximation (with one-hot features)
Differentiable Function Approximation
- $v_w(s)$ is a differentiable function of $w$ and may be non-linear (e.g., a neural network)
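To make the claim that tabular and state aggregation are special cases of linear function approximation concrete, here is a minimal sketch in plain Python. The helper names `linear_value`, `tabular_features`, and `aggregation_features` are hypothetical, and grouping consecutive states is just one illustrative way to aggregate.

```python
from typing import Callable, List

Vector = List[float]


def linear_value(w: Vector, x: Vector) -> float:
    """v_w(s) = w^T x(s): dot product of the weights and the feature vector."""
    assert len(w) == len(x), "weights and features must have the same length"
    return sum(wi * xi for wi, xi in zip(w, x))


def tabular_features(n_states: int) -> Callable[[int], Vector]:
    """One-hot feature per state: w^T x(s) just reads entry w[s], i.e. a lookup table."""
    def x(s: int) -> Vector:
        v = [0.0] * n_states
        v[s] = 1.0
        return v
    return x


def aggregation_features(n_states: int, group_size: int) -> Callable[[int], Vector]:
    """One-hot over groups of consecutive states: every state in a group shares one weight."""
    n_groups = (n_states + group_size - 1) // group_size  # ceiling division
    def x(s: int) -> Vector:
        v = [0.0] * n_groups
        v[s // group_size] = 1.0
        return v
    return x


if __name__ == "__main__":
    n = 6
    x_tab = tabular_features(n)
    w_tab = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]  # one weight per state: the table itself
    print(linear_value(w_tab, x_tab(3)))    # prints 1.5 (entry 3 of the table)

    x_agg = aggregation_features(n, group_size=3)  # groups {0, 1, 2} and {3, 4, 5}
    w_agg = [0.2, 0.8]                             # one weight per group
    # States 4 and 5 fall in the same group, so they get the same value.
    print(linear_value(w_agg, x_agg(4)), linear_value(w_agg, x_agg(5)))  # prints 0.8 0.8
```

With one-hot features each weight is exactly one table entry; with aggregated features fewer weights are shared across states, which is what lets a value learned for one state carry over to the others in its group.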

Classes of Function Approximation

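Tabular is a good approach, but it does not scale.
Linear is a well-founded approach, but it requires good features.
Non-linear is less interpretable, but it scales very easily and does not depend on hand-picking good features.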

Gradient-based algorithms

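We update the parameters by stochastic gradient descent on the squared error between the true value $v_\pi(S_t)$ and the estimate $v_w(S_t)$: $\Delta w = \alpha \left(v_\pi(S_t) - v_w(S_t)\right) \nabla_w v_w(S_t)$. Since $v_\pi(S_t)$ is unknown in practice, it is replaced by a target such as the Monte Carlo return $G_t$ or the TD target $R_{t+1} + \gamma v_w(S_{t+1})$.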
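The gradient update can be sketched for the linear case, where $\nabla_w v_w(s) = x(s)$. This is a minimal illustration under that assumption, not a full learning algorithm: `sgd_value_update` and `td0_target` are hypothetical helper names, and the `target` argument stands for whichever quantity is used in place of $v_\pi(S_t)$.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product w^T x(s)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sgd_value_update(w: Vector, x_s: Vector, target: float, alpha: float) -> Vector:
    """One gradient step on 1/2 * (target - v_w(s))^2 for linear v_w(s) = w^T x(s).

    For linear features grad_w v_w(s) = x(s), so
        w <- w + alpha * (target - w^T x(s)) * x(s).
    """
    error = target - dot(w, x_s)
    return [wi + alpha * error * xi for wi, xi in zip(w, x_s)]


def td0_target(reward: float, gamma: float, w: Vector, x_next: Vector, terminal: bool) -> float:
    """TD(0) target R_{t+1} + gamma * v_w(S_{t+1}), with value 0 after termination.

    The target is treated as a constant (no gradient flows through v_w(S_{t+1})),
    so TD with function approximation is a semi-gradient method.
    """
    return reward if terminal else reward + gamma * dot(w, x_next)


if __name__ == "__main__":
    # Repeatedly regress one feature vector toward a fixed target of 1.0,
    # standing in for v_pi(S_t); the estimate approaches 1.0.
    w = [0.0, 0.0]
    x_s = [1.0, 0.0]
    for _ in range(100):
        w = sgd_value_update(w, x_s, target=1.0, alpha=0.1)
    print(round(dot(w, x_s), 4))

    # One semi-gradient TD(0) step on a single transition s -> s' with reward 1.
    x_next = [0.0, 1.0]
    target = td0_target(reward=1.0, gamma=0.9, w=w, x_next=x_next, terminal=False)
    w = sgd_value_update(w, x_s, target=target, alpha=0.1)
    print(w)
```

Only the weights of active features move: in the example, `w[1]` stays at 0 because the second feature of `x_s` is 0.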
