[Whisper] Robust Speech Recognition via Large-Scale Weak Supervision - (4)
·
👾 Deep Learning
https://bnmy6581.tistory.com/133 --(1) https://bnmy6581.tistory.com/134 --(2) https://bnmy6581.tistory.com/135 --(3) ..
[Whisper] Robust Speech Recognition via Large-Scale Weak Supervision - (3)
·
👾 Deep Learning
https://bnmy6581.tistory.com/133 --(1) https://bnmy6581.tistory.com/134 --(2) ..
[Whisper] Robust Speech Recognition via Large-Scale Weak Supervision - (2)
·
👾 Deep Learning
https://bnmy6581.tistory.com/133 --(1) https://arxiv.org/abs/2109.07740 Scaling Laws for Neural Machine Translation: We present an empirical study of scaling properties of encoder-decoder Transformer models used in neural machine translation (NMT). We show that cross-entropy loss as a function of model..
[Whisper] Robust Speech Recognition via Large-Scale Weak Supervision - (1)
·
👾 Deep Learning
[Whisper] Kspon Valid --- (2) CER
·
👾 Deep Learning
Robust Speech Recognition via Large-Scale Weak Supervision *As of 2023.1, the large model was changed to be identical to large-v2. The KsponSpeech data consists mainly of short-utterance audio. Whisper performs language identification on the start of the utterance using 99 language tokens. For very short utterances, however, Whisper can predict a different language, so the transcription itself goes wrong and the CER increases. Setting the language option to korean skips language identification and goes straight to transcription, which gave better results. The model size is ..
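For reference, a minimal sketch of how a character error rate (CER) like the one discussed in this post could be computed. This is an illustration, not the script used in the post: the function names `edit_distance` and `cer` are hypothetical, and stripping spaces before scoring is an assumption, since conventions for Korean CER vary.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters.

    Assumption: spaces are removed before scoring.
    """
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# A correct Korean transcript scores 0.0; an output in the wrong language
# (the failure mode described above for short utterances) scores 1.0 here.
print(cer("안녕하세요", "안녕하세요"))
print(cer("안녕하세요", "hello"))
```

Because the denominator is the reference length, a short utterance transcribed in the wrong language contributes an error rate near or above 1.0, which is consistent with the CER increase described above.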
[Whisper] Koreanspon Valid
·
👾 Deep Learning
https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=123 Validating Whisper on Korean speech (KsponSpeech dataset). I measured Whisper's performance on the Kspon eval_clean data. Model used: large (more experiments to follow). The first experiment used the large model and computed the CER without setting the language to Korean. The result was 0.42 ..
[Whisper] (1) - Abstract & Introduction
·
👾 Deep Learning
https://github.com/openai/whisper Paper Review Abstract & Introduction Training on 680,000 hours of multilingual data makes it possible to reach benchmark-level results in zero-shot transfer, without any fine-tuning. ..
[NVIDIA RIVA ASR] Installation Guide (feat.nvidia-riva-sdk)
·
👾 Deep Learning
Step-1 Build the model with Service-maker. NGC registration: 1. Sign up for ngc 2. Register the API key with nvcr.io docker login nvcr.io # Username: $oauthtoken # Password: [ngc API KEY] Build the desired model with service-maker (STT): 1. git clone riva demo 2. ngc pull riva_quickstart ngc registry resource download-version "nvidia/riva/riva_quickstart:2.8.1" 3. riva network set docker network create riva-speech 4. Edit the config file asr_acoustic_model=citrinet_1024 5. 한..
다했다
Posts in the '👾 Deep Learning' category (3 Page)