[RL] Stable-baselines3 gym -> gymnasium
·
👾 Deep Learning
Looking at the lineage of RL, OpenAI and DeepMind have done almost all of it between them, code and papers alike. Lately, though, attention has shifted from RL to NLP and LLM models, and the older OpenAI baselines repository and DeepMind's acme RL repository are no longer being updated. In the meantime the organization backing gym changed, gym became gymnasium, and some of the return conventions changed with it. As a result, most code that is two to three years old is incompatible unless you install an old version of the gym package. Fortunately, stable-baselines has recently moved its code over to gymnasium. With this package you can use most of the existing RL models such as PPO, HER, and DDPG, and you can also build custom environments..
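To make the "return conventions changed" point concrete, here is a minimal sketch with no dependencies. The legacy gym API returned only the observation from `reset()` and a 4-tuple `(obs, reward, done, info)` from `step()`, while gymnasium returns `(obs, info)` from `reset()` and a 5-tuple `(obs, reward, terminated, truncated, info)` from `step()`. The names `OldStyleCounterEnv`, `NewStyleAdapter`, and `rollout` are hypothetical and exist only for this illustration; this is not code from gymnasium or Stable-Baselines3.

```python
from typing import Any, Dict, Tuple


class OldStyleCounterEnv:
    """Hypothetical toy env that follows the legacy gym return conventions."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> int:
        # Legacy gym: reset() returns only the observation.
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        # Legacy gym: step() returns (obs, reward, done, info).
        self.t += 1
        done = self.t >= self.max_steps
        # Legacy convention: a time-limit cutoff is flagged inside info.
        info = {"TimeLimit.truncated": done}
        return self.t, 1.0, done, info


class NewStyleAdapter:
    """Hypothetical wrapper that exposes gymnasium-style return values."""

    def __init__(self, env: OldStyleCounterEnv) -> None:
        self.env = env

    def reset(self, seed=None) -> Tuple[int, Dict[str, Any]]:
        # gymnasium: reset() returns (obs, info). The seed is ignored here
        # because the toy env is deterministic.
        return self.env.reset(), {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        obs, reward, done, info = self.env.step(action)
        # gymnasium splits the single `done` flag into two flags.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info


def rollout(env: NewStyleAdapter) -> float:
    """Run one episode with a gymnasium-style loop and return the total reward."""
    obs, info = env.reset()
    total = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(0)
        total += reward
        if terminated or truncated:
            return total
```

Old training loops that unpack four values from `step()`, or treat the result of `reset()` as a bare observation, break for exactly this reason when pointed at a gymnasium environment.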
Posts tagged 'stable-baseline'