
  input#1์„ ๊ธฐ์ค€์œผ๋กœ #2, #3์™€์˜ ๊ด€๊ณ„๋ฅผ score๋กœ ๋งŒ๋“ค๊ณ  output #1์„ ๋งŒ๋“ ๋‹ค. ๊ทธ๋ฆฌ๊ณ  #2์™€ #1, #3์™€์˜ score๋ฅผ ๊ตฌํ•˜๊ณ  ๋‹ค์Œ #์œผ๋กœ ๋„˜์–ด๊ฐ€๋ฉด์„œ score๋ฅผ ๊ตฌํ•œ๋‹ค. ์ด ์ ์ˆ˜ score๋ฅผ ๋ชจ์•„ attention map์„ ๋งŒ๋“ ๋‹ค. 

https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a

1. Illustrations

The illustrations are divided into the following steps:

  1. Prepare inputs
  2. Initialise weights
  3. Derive key, query and value
  4. Calculate attention scores for Input 1
  5. Calculate softmax
  6. Multiply scores with values
  7. Sum weighted values to get Output 1
  8. Repeat steps 4โ€“7 for Input 2 & Input 3


Step 1: Prepare inputs

Input 1: [1, 0, 1, 0]
Input 2: [0, 2, 0, 2]
Input 3: [1, 1, 1, 1] 

๋ฐ์ดํ„ฐ ์ž…๋ ฅ

Step 2: Initialise weights

 

 ๊ฐ ์ธํ’‹ ๊ฐ’์— ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ•œ ๋’ค ๊ฐ’์„ ๊ตฌํ•œ๋‹ค.

 ์‹ ๊ฒฝ๋ง(neural network)์€ ์—ฐ์†ํ™•๋ฅ  ๋ถ„ํฌ์ธ Gaussian, Xavier, He Kaming ๋ถ„ํฌ๋ฅผ ์ด์šฉํ•œ๋‹ค. ์ด ์ดˆ๊ธฐํ™”๋Š” training ์ „์— ํ•œ๋ฒˆ ์‚ฌ์šฉํ•œ๋‹ค.

key:           query:         value:
[[0, 0, 1],    [[1, 0, 1],    [[0, 2, 0],
 [1, 1, 0],     [1, 0, 0],     [0, 3, 0],
 [0, 1, 0],     [0, 0, 1],     [1, 0, 3],
 [1, 1, 0]]     [0, 1, 1]]     [1, 1, 0]]
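The inputs from Step 1 and the three weight matrices can be set up in plain Python (a minimal sketch using lists only; the variable names are my own, not from the article):

```python
# Inputs from Step 1: one 4-dimensional vector per input.
x = [[1, 0, 1, 0],
     [0, 2, 0, 2],
     [1, 1, 1, 1]]

# Weight matrices from Step 2. In practice these would be randomly
# initialised (Gaussian/Xavier/He) and then learned during training;
# here they are fixed so the arithmetic can be followed by hand.
w_key = [[0, 0, 1],
         [1, 1, 0],
         [0, 1, 0],
         [1, 1, 0]]
w_query = [[1, 0, 1],
           [1, 0, 0],
           [0, 0, 1],
           [0, 1, 1]]
w_value = [[0, 2, 0],
           [0, 3, 0],
           [1, 0, 3],
           [1, 1, 0]]
```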

 

 

Step 3: Derive key, query and value

๋‚ด์  ์—ฐ์‚ฐ์„

  Input #1 : [1, 0, 1, 0]
             
                  [0, 0, 1]

[1, 0, 1, 0] X [1, 1, 0] = [0, 1, 1]
                  [0, 1, 0] 
                  [1, 1, 0]

 input #2 :  [0, 2, 0, 2]

                 [0, 0, 1]

[0, 2, 0, 2] x [1, 1, 0] = [4, 4, 0] 
                  [0, 1, 0]
                  [1, 1, 0]

# key : 
                                      [0, 0, 1]
                     [1, 0, 1, 0]   [1, 1, 0]     [0, 1, 1]
                     [0, 2, 0, 2] x [0, 1, 0] =  [4, 4, 0]
                     [1, 1, 1, 1]    [1, 1, 0]     [2, 3, 1]

    Value : 
                            [0, 2, 0]
            [1, 0, 1, 0]   [0, 3, 0]     [1, 2, 3]
            [0, 2, 0, 2] x [1, 0, 3] =  [2, 8, 0]
            [1, 1, 1, 1]   [1, 1, 0]     [2, 6, 3]

query :
                              [1, 0, 1]
              [1, 0, 1, 0]   [1, 0, 0]    [1, 0, 2]

              [0, 2, 0, 2] x [0, 0, 1] = [2, 2, 2]
              [1, 1, 1, 1]   [0, 1, 1]    [2, 1, 3]
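The three matrix products above can be reproduced in plain Python (a sketch; `matmul` is my own helper, not from the article):

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p) using plain lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

x = [[1, 0, 1, 0], [0, 2, 0, 2], [1, 1, 1, 1]]
w_key   = [[0, 0, 1], [1, 1, 0], [0, 1, 0], [1, 1, 0]]
w_query = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
w_value = [[0, 2, 0], [0, 3, 0], [1, 0, 3], [1, 1, 0]]

keys    = matmul(x, w_key)     # [[0, 1, 1], [4, 4, 0], [2, 3, 1]]
queries = matmul(x, w_query)   # [[1, 0, 2], [2, 2, 2], [2, 1, 3]]
values  = matmul(x, w_value)   # [[1, 2, 3], [2, 8, 0], [2, 6, 3]]
```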

Step 4: Calculate attention scores for Input 1

 attention scores ๊ตฌํ•˜๊ธฐ, query(red)์™€ ๋ชจ๋“  keys(orange)์™€ ๋‚ด์ ํ•œ๋‹ค.

 query $\odot$ keys = attention scores

#1 attention scores:
# Input 1 query · key 1 = 2
# Input 1 query · key 2 = 4
# Input 1 query · key 3 = 4

                           [0, 4, 2]
               [1, 0, 2] x [1, 4, 3] = [2, 4, 4]
                           [1, 0, 1]

(the right-hand matrix is the keys transposed)
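The same dot products in plain Python (a sketch, reusing the key/query values derived in Step 3):

```python
# Input 1's query, and all three keys from Step 3.
query_1 = [1, 0, 2]
keys = [[0, 1, 1], [4, 4, 0], [2, 3, 1]]

# Attention scores: one dot product per key.
scores_1 = [sum(q * k for q, k in zip(query_1, key)) for key in keys]
# scores_1 == [2, 4, 4]
```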


Step 5: Calculate softmax

 attention scores ๊ฐ’์„ softmax๋กœ 0~1๋กœ ํ™•๋ฅ ๊ฐ’์œผ๋กœ ๋ณ€ํ˜•ํ•œ๋‹ค.

                softmax([2, 4, 4]) = [0.0, 0.5, 0.5]
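A minimal softmax sketch in plain Python (my own helper; subtracting the max is a standard trick for numerical stability):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

attn_1 = softmax([2, 4, 4])
# attn_1 ≈ [0.063, 0.468, 0.468]; the illustration rounds this to [0.0, 0.5, 0.5]
```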


Step 6: Multiply scores with values

 softmax([2, 4, 4]) = [0.0, 0.5, 0.5] softmax๋กœ ๋ณ€ํ™˜ํ•œ attention scores๊ฐ’์„ ๊ฐ€์ค‘์น˜(weighted values)๋กœ ์‚ฌ์šฉํ•ด ๊ฐ value์™€ ๊ณฑํ•œ๋‹ค.

              1: 0.0 * [1, 2, 3] = [0.0, 0.0, 0.0]
              2: 0.5 * [2, 8, 0] = [1.0, 4.0, 0.0]
              3: 0.5 * [2, 6, 3] = [1.0, 3.0, 1.5]
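The scaling step as a sketch, using the rounded weights from the illustration:

```python
# Rounded attention weights from Step 5, and the values from Step 3.
attn_1 = [0.0, 0.5, 0.5]
values = [[1, 2, 3], [2, 8, 0], [2, 6, 3]]

# Scale each value vector by its attention weight.
weighted_1 = [[a * v for v in value] for a, value in zip(attn_1, values)]
# weighted_1 == [[0.0, 0.0, 0.0], [1.0, 4.0, 0.0], [1.0, 3.0, 1.5]]
```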

Step 7: Sum weighted values to get Output 1

 ๊ฐ€์ค‘์น˜ ๊ฐ’ (weighted values (yellow))๊ณผ value ๊ฐ’์„ ๊ณฑํ•œ ๊ฐ’์„ ๋ชจ๋‘ ํ•ฉ์นœ๋‹ค.
                 [0.0, 0.0, 0.0]
              + [1.0, 4.0, 0.0]
              + [1.0, 3.0, 1.5]
              -----------------
              = [2.0, 7.0, 1.5] <- input #1์˜ output 

Step 8: Repeat for Input 2 & Input 3 

input #2์™€ input #3์—๋„ ๋™์ผ ์—ฐ์‚ฐ

## query์™€ key์˜ ๊ฐ™์€ dim์ด์–ด์•ผํ•œ๋‹ค. ๋‚ด์ ์„ ํ•ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ๊ทธ๋Ÿฌ๋‚˜ value์˜ dim์€ output์˜ ๋ชจ์–‘์— ๋งž์ถ”๋ฉด ๋œ๋‹ค.


[์ถœ์ฒ˜] towardsdatascience.com/illustrated-self-attention-2d627e33b20a

 

NLP์— ํ™œ์šฉ

 

 

๋ฐ˜์‘ํ˜•

'๐Ÿ‘พ Deep Learning' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[Transformer] Positional Encoding (3)  (0) 2021.02.20
[Transformer] Position-wise Feed-Forward Networks (2)  (0) 2021.02.20
VAE(Variational Autoencoder) (3) MNIST  (0) 2021.02.18
Tensorflow Initializer ์ดˆ๊ธฐํ™” ์ข…๋ฅ˜  (0) 2021.02.18
VAE(Variational Autoencoder) (2)  (0) 2021.02.18
๋‹คํ–ˆ๋‹ค