
 

 

https://analyticsindiamag.com/has-openai-surpassed-deepmind/

 

 In terms of RL lineage, OpenAI and DeepMind have done almost all of it, both the code and the papers. However, as attention has shifted from RL toward NLP and LLM models, the older OpenAI baselines repository and DeepMind's Acme RL repository are no longer being updated. Meanwhile, gym moved to a different sponsoring foundation and was reworked into Gymnasium, and some of its return values changed. As a result, most RL code that is two or three years old does not run unless you install an older gym version.
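The return-value change is concrete. In old gym, `reset()` returned only the observation and `step()` returned a 4-tuple `(obs, reward, done, info)`. In Gymnasium, `reset()` returns `(obs, info)` and `step()` returns a 5-tuple `(obs, reward, terminated, truncated, info)`, so old code that unpacks four values breaks. Below is a minimal, dependency-free sketch of a shim that converts old-style results to the new shape. It assumes the old-gym convention where the `TimeLimit` wrapper flags a step-limit cutoff with `info["TimeLimit.truncated"]`. The function names are hypothetical, and Gymnasium ships its own compatibility utilities, which are preferable in practice.

```python
from typing import Any, Dict, Tuple

# Old gym: step() -> (obs, reward, done, info)
OldStep = Tuple[Any, float, bool, Dict[str, Any]]
# Gymnasium: step() -> (obs, reward, terminated, truncated, info)
NewStep = Tuple[Any, float, bool, bool, Dict[str, Any]]


def old_step_to_new(result: OldStep) -> NewStep:
    """Hypothetical shim: split the old single `done` flag into terminated/truncated.

    Assumes the old-gym convention where the TimeLimit wrapper sets
    info["TimeLimit.truncated"] = True when a step limit cut the episode off.
    """
    obs, reward, done, info = result
    truncated = bool(info.get("TimeLimit.truncated", False)) and done
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info


def old_reset_to_new(obs: Any) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical shim: old reset() returned only obs; Gymnasium returns (obs, info)."""
    return obs, {}


if __name__ == "__main__":
    # A real terminal state: the task itself ended.
    print(old_step_to_new((0, 1.0, True, {})))
    # -> (0, 1.0, True, False, {})
    # A time-limit cutoff: the episode stopped, but the state was not terminal.
    print(old_step_to_new((5, 0.0, True, {"TimeLimit.truncated": True})))
    # -> (5, 0.0, False, True, {'TimeLimit.truncated': True})
```

Keeping the two flags separate matters for learning. A truncated episode was stopped by an external limit, so value-based methods should still bootstrap from the next state. A terminated episode reached a true end state and should not.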

 

 Fortunately, Stable-Baselines3 has recently switched its code over to Gymnasium. With this package you can use most of the established RL algorithms, such as PPO and DDPG, along with HER (which SB3 provides as a replay buffer for off-policy algorithms). You can also build your own custom environments.
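A custom environment mainly has to follow the Gymnasium `reset`/`step` contract. The sketch below is a toy, dependency-free environment that shows the shape of that contract, including the separate `terminated` and `truncated` flags. `CountdownEnv`, `sample_action`, and `rollout` are hypothetical names, not part of Gymnasium or Stable-Baselines3. To train on such an environment with Stable-Baselines3, the class would instead subclass `gymnasium.Env` and declare `observation_space` and `action_space` (for example with `gymnasium.spaces.Discrete` or `gymnasium.spaces.Box`). It could then be passed to an algorithm such as `stable_baselines3.PPO("MlpPolicy", env)`.

```python
import random
from typing import Any, Callable, Dict, Optional, Tuple


class CountdownEnv:
    """Hypothetical toy env following the Gymnasium reset/step contract.

    State: an integer counter starting at `start`.
    Actions: 0 = do nothing, 1 = decrement the counter.
    The episode terminates when the counter reaches 0 (reward +1) and is
    truncated after `max_steps` steps if it has not terminated by then.
    """

    def __init__(self, start: int = 3, max_steps: int = 10) -> None:
        self.start = start
        self.max_steps = max_steps
        self.counter = start
        self.steps = 0
        self.rng = random.Random()

    def reset(
        self, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[int, Dict[str, Any]]:
        # Gymnasium-style reset: accepts seed/options, returns (obs, info).
        if seed is not None:
            self.rng.seed(seed)
        self.counter = self.start
        self.steps = 0
        return self.counter, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # Gymnasium-style step: returns (obs, reward, terminated, truncated, info).
        if action == 1:
            self.counter -= 1
        self.steps += 1
        terminated = self.counter == 0
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self.counter, reward, terminated, truncated, {}

    def sample_action(self) -> int:
        # Stand-in for action_space.sample() on a real gymnasium.Env.
        return self.rng.randint(0, 1)


def rollout(
    env: CountdownEnv, policy: Callable[[int], int], seed: Optional[int] = None
) -> Tuple[float, bool, bool]:
    """Run one episode, looping until either flag is set (the Gymnasium loop idiom)."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
    return total, terminated, truncated


if __name__ == "__main__":
    env = CountdownEnv(start=3, max_steps=10)
    print(rollout(env, lambda obs: 1))  # always decrement -> (1.0, True, False)
    print(rollout(env, lambda obs: 0))  # never decrement -> (0.0, False, True)
```

The episode loop checks `terminated or truncated` rather than a single `done` flag. Old gym-era training scripts have to change this pattern when they move to Gymnasium.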

 

 

https://github.com/DLR-RM/stable-baselines3

 

GitHub - DLR-RM/stable-baselines3: PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

 

 baseline2์—์„œ๋Š” Tensorflow๋ฅผ ์ง€์›.

ํ˜„์žฌ Trend์— ๋งž์ถฐ PyTorch์™€ Test ํ™˜๊ฒฝ ๊ตฌ์ถ•

 

https://jmlr.org/papers/volume22/20-1364/20-1364.pdf 

๋ฐ˜์‘ํ˜•