[RL] A3C (Asynchronous Advantage Actor-Critic) Summary
👾 Deep Learning
Policy-Based: Value-based methods, which predict Q-values, depend on states and actions and are constrained to keep building up trajectories (state-action-reward sequences). Policy-based methods instead estimate the policy itself, and actor-critic variants learn a value estimate (such as the Q-value) alongside it. What we ultimately want is for the agent to find a strategy that takes it down the right path, and policy-based methods reflect that goal more directly. Advantages:
- Because the policy is learned directly, training is more stable (less sensitive to changes in the environment and to noise).
- A stochastic policy lets the agent balance exploration and exploitation while learning π* (the optimal policy).
- Continuous spa..
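A minimal sketch of the stochastic-policy idea above, written for illustration and not taken from the author's RL_A3C repository. It assumes a softmax policy over a small set of discrete actions and a one-step TD advantage r + γV(s') - V(s) supplied by a critic; the function names (`softmax`, `sample_action`, `td_advantage`, `policy_gradient_step`) and the learning rate are hypothetical choices. Sampling from π(a|s) instead of always taking the highest-scoring action is what keeps exploration alive, and weighting the log-probability gradient by the advantage shifts probability toward actions that did better than the critic expected.

```python
import math
import random


def softmax(logits):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Draw an action index from the stochastic policy pi(a|s)."""
    r = rng.random()
    cumulative = 0.0
    for a, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return a
    return len(probs) - 1  # guard against floating-point round-off


def td_advantage(reward, value, next_value, gamma=0.99, done=False):
    """One-step advantage estimate: A(s, a) ~ r + gamma * V(s') - V(s).

    The critic's V(s) acts as a baseline; V(s') is dropped at episode end.
    """
    target = reward + (0.0 if done else gamma * next_value)
    return target - value


def policy_gradient_step(logits, action, advantage, lr=0.1):
    """One gradient-ascent step on advantage * log pi(action).

    For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    """
    probs = softmax(logits)
    return [
        z + lr * advantage * ((1.0 if k == action else 0.0) - p)
        for k, (z, p) in enumerate(zip(logits, probs))
    ]


# Example: one interaction step with a 3-action policy (hypothetical numbers).
rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
action = sample_action(softmax(logits), rng)
adv = td_advantage(reward=1.0, value=0.5, next_value=0.8, gamma=0.9)
logits = policy_gradient_step(logits, action, adv)
print(action, [round(p, 3) for p in softmax(logits)])
```

Starting from a uniform policy, a positive advantage raises the probability of the sampled action and a negative one lowers it. In A3C proper, several workers would compute updates like this in parallel on their own copies of the environment and apply them asynchronously to shared parameters; this sketch covers only a single update.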
[RL] A3C (Asynchronous Advantage Actor-Critic)
👾 Deep Learning
GitHub - seohyunjun/RL_A3C: A3C (asynchronous advantage actor-critic): https://github.com/seohyunjun/RL_A3C
Done.
Posts tagged 'A3C'