
https://github.com/seohyunjun/RL_SAC/blob/main/README.md

 


 

* SAC (Soft Actor-Critic)

  • Designed to find a stable policy in every action space, both Continuous Action Space and Discrete Action Space
  • Goes one step beyond the earlier DDPG / TD3: it also looks at the action in the next state when choosing the next policy (the idea is to feed the policy only good nutrients)
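
The idea of also looking at the action in the next state can be sketched as the soft TD target used to train the critic. This is a minimal NumPy sketch under stated assumptions, not the code from the linked repository: the function name `soft_td_target`, its arguments, and the values of `gamma` and `alpha` are hypothetical, and the twin-critic minimum follows the commonly used SAC formulation rather than anything stated in this post.

```python
import numpy as np

def soft_td_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Soft Bellman target: r + gamma * (1 - done) * (min(Q1, Q2) - alpha * log pi).

    q1_next, q2_next: target-critic values Q(s', a'), where a' is sampled from
                      the current policy at the next state s' (hypothetical inputs).
    logp_next:        log pi(a' | s') for that sampled action.
    """
    # Clipped double-Q: take the smaller estimate to limit overestimation.
    min_q_next = np.minimum(q1_next, q2_next)
    # Entropy bonus: low-probability (exploratory) actions raise the target.
    soft_value_next = min_q_next - alpha * logp_next
    # Terminal transitions (done = 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * soft_value_next

# Toy batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.5])
done = np.array([0.0, 1.0])
target = soft_td_target(
    reward, done,
    q1_next=np.array([10.0, 3.0]),
    q2_next=np.array([9.0, 4.0]),
    logp_next=np.array([-1.0, -2.0]),
)
print(target)
```

The critic is then regressed toward `target`, so the quality of the next action under the current policy directly shapes the update.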

* Policy Iteration - approximator

  • Policy evaluation
    • The conventional reward-maximizing Q-function

 

  • Policy improvement 
    • KL divergence (Kullback-Leibler) measures the difference between two probability distributions; if a = b, D_kl(a||b) = 0
    • The new policy gives better results than the previous policy
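
The two policy improvement points can be checked numerically for a single state with a small discrete action set. The sketch below is illustrative only: `kl_divergence`, `soft_improved_policy`, `soft_value`, the Q-values, and the temperature `alpha` are hypothetical names and numbers, not taken from the repository. It assumes the improvement target is the distribution proportional to exp(Q / alpha), which is the minimizer of the KL term when the policy class is unrestricted.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions with strictly positive entries."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def soft_improved_policy(q_values, alpha=1.0):
    """Improvement target: pi_new(a) proportional to exp(Q(a) / alpha)."""
    logits = np.asarray(q_values, dtype=float) / alpha
    logits = logits - logits.max()      # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def soft_value(policy, q_values, alpha=1.0):
    """Entropy-regularized value of one state: E_pi[Q(a) - alpha * log pi(a)]."""
    policy = np.asarray(policy, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    return float(np.sum(policy * (q_values - alpha * np.log(policy))))

old_policy = np.array([0.25, 0.25, 0.5])   # hypothetical current policy over 3 actions
q_values = np.array([1.0, 2.0, 0.0])       # hypothetical Q(s, a) for one state

new_policy = soft_improved_policy(q_values)

print(kl_divergence(old_policy, old_policy))   # identical distributions
print(kl_divergence(old_policy, new_policy))   # old policy differs from the target
print(soft_value(old_policy, q_values), soft_value(new_policy, q_values))
```

In this toy case the KL divergence of a distribution with itself is zero, and the improved policy has a higher entropy-regularized value than the old one.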

 

 

Policy decision-making
score 38796
continuous control benchmarks.

 In the continuous action space MuJoCo benchmark results, SAC explores stably, unlike the other methods. DDPG is the clearest contrast: its shaded reward band is quite thick, which means its reward fluctuates from run to run and it fails to find a good policy reliably.

 

 

** SAC guarantees a better policy and updates based on good reward
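
This guarantee corresponds to the soft policy improvement step of soft policy iteration. As a sketch in the usual notation (the symbols $\alpha$ for the temperature, $\Pi$ for the policy class, and $Z$ for the normalizer are assumptions, since the post does not define them):

```latex
\begin{aligned}
\pi_{\text{new}} &= \arg\min_{\pi' \in \Pi}
  D_{\mathrm{KL}}\!\left( \pi'(\cdot \mid s_t) \,\middle\|\,
  \frac{\exp\!\big(\tfrac{1}{\alpha} Q^{\pi_{\text{old}}}(s_t, \cdot)\big)}
       {Z^{\pi_{\text{old}}}(s_t)} \right), \\
Q^{\pi_{\text{new}}}(s_t, a_t) &\ge Q^{\pi_{\text{old}}}(s_t, a_t)
  \quad \text{for all } (s_t, a_t).
\end{aligned}
```

Here $Q$ denotes the soft (entropy-regularized) Q-function, so "better" is measured with the entropy bonus included, not by raw reward alone.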
