nlp.seas.harvard.edu/2018/04/01/attention.html#position-wise-feed-forward-networks
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
import torch.nn as nn
import torch.nn.functional as F

class PositionwiseFeedForward(nn.Module):
    "Implements the FFN equation."
    def __init__(self, d_model, d_ff, dropout=0.1):
        super(PositionwiseFeedForward, self).__init__()
        # Torch linears include a bias `b` by default.
        self.w_1 = nn.Linear(d_model, d_ff)   # d_model -> d_ff
        self.w_2 = nn.Linear(d_ff, d_model)   # d_ff -> d_model
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # linear -> ReLU -> dropout -> linear
        return self.w_2(self.dropout(F.relu(self.w_1(x))))
In the Transformer network, the encoder and the decoder each contain their own feed-forward network. A linear transform is applied to x, followed by a ReLU activation, and then a second linear transform is applied.
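As a quick sanity check, here is a minimal usage sketch. The d_model=512 and d_ff=2048 values follow the original Transformer paper; the batch and sequence sizes are arbitrary. Because the FFN is applied to each position independently, the output keeps the input shape.

import torch

ffn = PositionwiseFeedForward(d_model=512, d_ff=2048, dropout=0.1)
x = torch.randn(2, 10, 512)   # (batch, seq_len, d_model), illustrative sizes
out = ffn(x)
print(out.shape)              # torch.Size([2, 10, 512]) -- shape is preserved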
[Why ReLU is applied]