[RL] Stable-baselines3 gym -> gymnasium
·
👾 Deep Learning
Looking at the lineage of RL, OpenAI and DeepMind have done nearly everything between them, both code and papers. But with interest now shifting from RL toward NLP and LLM models, the older OpenAI baselines repository and DeepMind's acme RL repository are no longer being updated. In the meantime, the foundation sponsoring gym changed, gym became gymnasium, and some of the return conventions changed. As a result, most code that is two to three years old is incompatible unless it runs against an old gym version of the package. Fortunately, stable-baselines recently switched its code over to gymnasium. With this package, most of the existing RL models such as PPO, HER, and DDPG can be used, and custom environments can also be built..
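The change in return conventions mentioned above can be sketched without installing either library. In legacy gym, `reset()` returns only the observation and `step()` returns `(obs, reward, done, info)`; in gymnasium, `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. The toy environment `OldStyleCounterEnv` and the wrapper `NewStyleAdapter` below are hypothetical names made up for illustration; they are not part of gym, gymnasium, or stable-baselines3.

```python
from typing import Any, Dict, Tuple


class OldStyleCounterEnv:
    """Hypothetical toy environment using the legacy gym return convention."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> int:
        # Legacy gym: reset() returns only the observation.
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        # Legacy gym: step() returns (obs, reward, done, info).
        self.t += 1
        done = self.t >= self.max_steps
        # Legacy time-limit cut-offs were flagged through this info key.
        info = {"TimeLimit.truncated": done}
        return self.t, float(action), done, info


class NewStyleAdapter:
    """Hypothetical wrapper that exposes gymnasium-style return tuples."""

    def __init__(self, env: OldStyleCounterEnv) -> None:
        self.env = env

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        # gymnasium: reset() returns (obs, info).
        return self.env.reset(), {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        obs, reward, done, info = self.env.step(action)
        # Split the single legacy `done` flag into terminated / truncated.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info


env = NewStyleAdapter(OldStyleCounterEnv(max_steps=3))
obs, info = env.reset()
total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(1)
    total_reward += reward
```

Old training loops that unpack four values from `step()` or a bare observation from `reset()` are the ones that break after the switch, which is why a library that already targets the new tuples saves a lot of patching.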
[RL] Soft Actor-Critic (a.k.a SAC)
·
👾 Deep Learning
https://github.com/seohyunjun/RL_SAC/blob/main/README.md
* SAC (Soft Actor-Critic): a method designed to find a stable policy in every action space, whether Continuous Action Space or Discrete Action Space. It goes one step beyond the existing DDPG / TD3 by also looking at the action in the next state when choosing the next policy (feeding it only the good nutrients). * Pol..
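One way to make "also looking at the action in the next state" concrete is the soft Q-target from the standard SAC formulation: the next action is sampled from the current policy, the smaller of two critic estimates is taken, and an entropy bonus is added. This is a minimal scalar sketch under those assumptions, not the code from the linked repository; the function name `soft_q_target` and the default values for `gamma` and `alpha` are chosen here for illustration.

```python
def soft_q_target(
    reward: float,
    done: bool,
    q1_next: float,
    q2_next: float,
    log_prob_next: float,
    gamma: float = 0.99,  # discount factor (assumed default)
    alpha: float = 0.2,   # entropy temperature (assumed default)
) -> float:
    """Soft Bellman target: r + gamma * (1 - done) * (min(Q1, Q2) - alpha * log pi).

    q1_next, q2_next: two critic estimates for the next state, evaluated at an
    action sampled from the current policy.
    log_prob_next: log-probability of that sampled action under the policy.
    """
    # Taking the minimum of the two critics limits overestimation;
    # subtracting alpha * log pi rewards high-entropy (exploratory) policies.
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - float(done)) * soft_value


example_target = soft_q_target(1.0, False, 2.0, 3.0, -1.0, gamma=0.5, alpha=0.5)
terminal_target = soft_q_target(1.0, True, 2.0, 3.0, -1.0, gamma=0.5, alpha=0.5)
```

In a real implementation these quantities would be batched tensors produced by the critic and actor networks; the scalar version only shows how the terms combine.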
[RL] A3C (Asynchronous Advantage Actor-Critic)
·
👾 Deep Learning
https://github.com/seohyunjun/RL_A3C
다했다
Posts tagged 'RL'