[ASR, ] Deepspeech2
·
👾 Deep Learning
Model Info
End-to-end speech recognition model released by Baidu (China) in December 2015.
Mel spectrograms are computed from the audio data.
A Fourier transform taken over the whole signal cannot tell where in time each audio feature occurs.
Applying the STFT (short-time Fourier transform) runs the FT over short windows, so the position of each feature is preserved.
Human hearing is sensitive to differences at low frequencies and resolves high frequencies poorly.
Frequencies are therefore converted to the mel scale, which follows human perception: Mel(f) = 2595 * log10(1 + f / 700)
The mel features pass through a CNN and an RNN, and then CTC (Connectionist Temporal Classification) ..
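As a rough illustration of the mel-scale formula above, here is a minimal Python sketch. It assumes the logarithm is base 10 (the usual convention for the 2595 / 700 constants). The function names hz_to_mel, mel_to_hz and mel_center_frequencies are made up for this example and do not come from the Deep Speech 2 code.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(f_min: float, f_max: float, n_bands: int) -> list[float]:
    """Center frequencies (Hz) of n_bands bands spaced evenly on the mel scale.

    Hypothetical helper: it only shows how even mel spacing maps back to Hz,
    it does not build the filters themselves.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_bands)]


if __name__ == "__main__":
    for f in (100, 1000, 4000, 8000):
        print(f"{f:>5} Hz -> {hz_to_mel(f):8.1f} mel")
    print([round(c) for c in mel_center_frequencies(0.0, 8000.0, 8)])
```

Bands that are evenly spaced in mels end up closely packed at low frequencies and spread further apart at high frequencies, which matches the point that hearing resolves low frequencies better.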
다했다
List of posts tagged 'Deepspeech2'