[BERT] Why does BERT mask 15% of tokens?
🗣️ Natural Language Processing
"Should You Mask 15% in Masked Language Modeling?" https://arxiv.org/abs/2202.08005 Should You Mask 15% in Masked Language Modeling? Masked language models (MLMs) conventionally mask 15% of tokens due to the belief that more masking would leave insufficient context to learn good representations; this masking rate has been widely used, regardless of model sizes or masking strategies. In arxiv.org..