The Annotated Transformer (nlp.seas.harvard.edu/2018/04/01/attention.html#position-wise-feed-forward-networks)
The recent Transformer architecture from "Attention is All You Need" (NIPS 2017) has been instantly impactful as a new method for machine translation. It also offers a new general architecture for many NLP tasks. The paper itself is very clearly written...
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
import torch.nn as nn
import torch.nn.functional as F

class PositionwiseFeedForward(nn.Module):
    "Implements the FFN equation."

    def __init__(self, d_model, d_ff, dropout=0.1):
        super(PositionwiseFeedForward, self).__init__()
        # Torch linears include a bias `b` by default.
        self.w_1 = nn.Linear(d_model, d_ff)
        self.w_2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Linear -> ReLU -> dropout -> linear, applied at each position.
        return self.w_2(self.dropout(F.relu(self.w_1(x))))
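As a quick sanity check, a minimal sketch of using this module (d_model=512 and d_ff=2048 are the defaults from the paper; the batch and sequence sizes here are illustrative):

import torch

ffn = PositionwiseFeedForward(d_model=512, d_ff=2048, dropout=0.1)
x = torch.randn(2, 10, 512)  # (batch, sequence length, d_model)
out = ffn(x)
print(out.shape)  # torch.Size([2, 10, 512]) -- d_model is preserved

Note that the FFN maps d_model back to d_model, so its output can be fed straight into the next sublayer.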
In the Transformer network, the encoder and the decoder each have their own feed-forward network.
A linear transform is applied to x, followed by a ReLU activation, and then another linear transform is applied.
[Why ReLU is applied] (see the linked post and the short sketch below)
[ML / Linear Algebra] Linear Transformation in Neural Network (woogi-tech.tistory.com)
This post summarizes how linear transformations from linear algebra are used in neural networks. What is a linear transformation? In linear algebra, ...
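The nonlinearity is what keeps the two linear layers from collapsing into one: without ReLU, w_2(w_1(x)) is itself just a single linear map, so the FFN would add no expressive power. A minimal sketch demonstrating this (the layer sizes and names here are illustrative, not from the paper):

import torch
import torch.nn as nn

torch.manual_seed(0)
w1 = nn.Linear(4, 8, bias=False)
w2 = nn.Linear(8, 4, bias=False)
x = torch.randn(3, 4)

# Without an activation, the composition w2(w1(x)) equals one linear map
# whose weight matrix is the product W2 @ W1.
combined = nn.Linear(4, 4, bias=False)
combined.weight.data = w2.weight @ w1.weight
print(torch.allclose(w2(w1(x)), combined(x), atol=1e-6))  # True

# With ReLU in between, no single weight matrix can reproduce the output
# in general, which is exactly what the max(0, .) in the FFN equation buys.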