[NVIDIA RIVA ASR] Installation Guide (feat.nvidia-riva-sdk)
·
👾 Deep Learning
Step-1 Creating a model with service-maker
NGC registration
1. Sign up for NGC
2. Register the API key with nvcr.io
   docker login nvcr.io
   # Username: $oauthtoken
   # Password: [ngc API KEY]
Creating the desired model (STT) with service-maker
1. git clone riva demo
2. ngc pull riva_quickstart
   ngc registry resource download-version "nvidia/riva/riva_quickstart:2.8.1"
3. riva network set
   docker network create riva-speech
4. Edit the config file
   asr_acoustic_model=citrinet_1024
5. 한..
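The config edit in the excerpt above (setting asr_acoustic_model=citrinet_1024) can also be scripted. The following is only a minimal sketch, assuming the config is a shell-style text file of key=value assignments. The helper name set_shell_var and the sample contents are hypothetical and are not part of Riva or its quickstart scripts.

```python
import re


def set_shell_var(config_text: str, key: str, value: str) -> str:
    """Return config_text with the first `key=...` line rewritten to `key=value`.

    If the key is not present, the assignment is appended on a new line.
    (Hypothetical helper for illustration; not a Riva API.)
    """
    pattern = re.compile(rf"^(\s*){re.escape(key)}=.*$", flags=re.MULTILINE)
    # A function is used as the replacement so that backslashes in `value`
    # are never interpreted as regex escapes.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{key}={value}", config_text, count=1
    )
    if count == 0:
        sep = "" if config_text.endswith("\n") or not config_text else "\n"
        new_text = f"{config_text}{sep}{key}={value}\n"
    return new_text


if __name__ == "__main__":
    # Made-up sample contents; a real config file will look different.
    sample = "some_other_setting=1\nasr_acoustic_model=placeholder\n"
    print(set_shell_var(sample, "asr_acoustic_model", "citrinet_1024"))
```

Working on the text rather than the file keeps the sketch independent of the config file's actual name and location, which the excerpt does not give.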
[ASR, ] Deepspeech2
·
👾 Deep Learning
Model Info
An end-to-end speech recognition model released by Baidu (China) (2015.12).
Mel spectrograms are applied to the audio data.
A plain Fourier Transform gives no way to tell where in time each audio feature occurs.
STFT (short-time Fourier transform) is applied instead: the FT is taken over narrow windows of the signal, so the position of each feature is preserved.
Humans are sensitive to low frequencies and perceive them well, but do not recognize high-frequency sounds well.
Frequencies are therefore converted to the mel scale, a unit that matches human perception: Mel(f) = 2595 * log10(1 + f / 700)
The mel features are passed through a CNN and an RNN, and then CTC (Connectionist Temporal Classification) is ..
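To make the mel-scale formula above concrete, here is a minimal sketch in plain Python. It assumes the logarithm is base 10, which is the common convention for the 2595 / 700 form of the formula. The function names hz_to_mel and mel_to_hz are chosen for illustration and do not come from the Deepspeech2 code.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    # The same 100 Hz step covers many more mels at low frequencies than at
    # high ones, mirroring the sensitivity to low frequencies described above.
    low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
    high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
    print(f"100 -> 200 Hz spans {low_step:.1f} mel")
    print(f"4000 -> 4100 Hz spans {high_step:.1f} mel")
```

Mel filterbanks are usually built by spacing band edges evenly in mels and mapping them back to Hz with the inverse function, which is why both directions are included.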