nlp.seas.harvard.edu/2018/04/01/attention.html#position-wise-feed-forward-networks
The Annotated Transformer
The recent Transformer architecture from “Attention is All You Need” @ NIPS 2017 has been instantly impactful as a new method for machine translation. It also offers a new general architecture for many NLP tasks. The paper itself is very clearly written …
FFN(x) = max(0, xW1 + b1)W2 + b2
import torch.nn as nn
import torch.nn.functional as F

class PositionwiseFeedForward(nn.Module):
    "Implements FFN equation."

    def __init__(self, d_model, d_ff, dropout=0.1):
        super(PositionwiseFeedForward, self).__init__()
        # Torch linears have a `b` (bias) by default.
        self.w_1 = nn.Linear(d_model, d_ff)
        self.w_2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.w_2(self.dropout(F.relu(self.w_1(x))))
In the Transformer network, the encoder and decoder each contain their own feed-forward network.
A linear transform is applied to x, followed by a ReLU activation, and then a second linear transform; a minimal usage sketch follows below.
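To make the shapes concrete, here is a minimal usage sketch of the module above. The batch size and sequence length are arbitrary illustration choices; d_model=512 and d_ff=2048 are the defaults from the paper.

import torch

# Assumed illustrative values: d_model=512, d_ff=2048 (the paper's defaults).
ffn = PositionwiseFeedForward(d_model=512, d_ff=2048, dropout=0.1)
x = torch.randn(32, 10, 512)   # (batch, seq_len, d_model)
out = ffn(x)                   # the same two linear layers are applied at every position
print(out.shape)               # torch.Size([32, 10, 512])

Because the FFN acts on the last dimension only, every position in the sequence is transformed independently with the same weights, which is why it is called "position-wise".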
[Why ReLU is applied]
[ML / Linear Algebra] Linear Transformation in Neural Network
This post summarizes how a linear transformation from linear algebra is used in a neural network. What is a Linear Transformation? In linear algebra, …
woogi-tech.tistory.com