https://bnmy6581.tistory.com/133 --(1)

 

[Whisper] Robust Speech Recognition via Large-Scale Weak Supervision - (1)

 


 

https://arxiv.org/abs/2109.07740 

 

Scaling Laws for Neural Machine Translation

We present an empirical study of scaling properties of encoder-decoder Transformer models used in neural machine translation (NMT). We show that cross-entropy loss as a function of model size follows a certain scaling law. Specifically (i) We propose a for


 

๋ฐ˜์‘ํ˜•