
 PyTorch is designed so that backpropagation can be implemented easily through its autograd mechanism.
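As a minimal sketch of what autograd does (a toy scalar example, separate from the network built below): a tensor created with requires_grad=True records the operations applied to it, and calling backward() fills in its .grad attribute.

import torch

a = torch.tensor(3.0, requires_grad=True)
b = a ** 2            # b = a^2, so db/da = 2a
b.backward()          # autograd computes the gradient of b with respect to a
print(a.grad)         # tensor(6.)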

import torch

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # assumed: DEVICE is set earlier in the post

BATCH_SIZE = 64     # number of samples fed in at once
INPUT_SIZE = 1000   # size of each input vector
HIDDEN_SIZE = 100   # size of the hidden layer
OUTPUT_SIZE = 10    # size of the output vector

 A BATCH_SIZE of 64 means 64 samples are fed into the model at once. INPUT_SIZE is the length of each input vector (the number of input features), HIDDEN_SIZE is the width of the hidden layer, and OUTPUT_SIZE is, literally, the size of the output vector.

x = torch.randn(BATCH_SIZE,
                INPUT_SIZE,
                device = DEVICE,
                dtype = torch.float,
                requires_grad = False)

y = torch.randn(BATCH_SIZE,
                OUTPUT_SIZE,
                device = DEVICE,
                dtype = torch.float,
                requires_grad = False)

w1 = torch.randn(INPUT_SIZE,
                HIDDEN_SIZE,
                device = DEVICE,
                dtype = torch.float,
                requires_grad = True)

w2 = torch.randn(HIDDEN_SIZE,
                OUTPUT_SIZE,
                device = DEVICE,
                dtype = torch.float,
                requires_grad = True)

 

x :
  torch.randn draws samples from a standard normal distribution (mean 0, variance 1). Here it creates 64 (BATCH_SIZE) vectors of length 1000 (INPUT_SIZE), i.e. dim = (64, 1000). requires_grad tells autograd whether to track gradients for this tensor; for the input x it is False, so no gradient is computed for it.

y :
  The target tensor. Its size is set to 10 (OUTPUT_SIZE) so the error between it and the model's output can be computed.

w1:
  Its first dimension matches the input size (1000), and the matrix product x.mm(w1) produces 100 hidden values per sample, so its shape is (1000, 100). requires_grad = True, so its gradient is computed and w1 is updated through backpropagation.

w2:
  Its shape must be compatible with the result of multiplying x and w1. That product has shape (64, 100), so w2 is given shape (100, 10) so that the final product yields the (64, 10) output. requires_grad = True, so its gradient is computed and w2 is updated through backpropagation. (The shapes and flags are verified in the quick check right after this block.)
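As a quick sanity check, the shapes and gradient flags of the tensors defined above can be printed directly (a minimal sketch; the expected output is shown in the comments):

print(x.shape)                      # torch.Size([64, 1000])
print(x.mm(w1).shape)               # torch.Size([64, 100])  hidden layer
print(x.mm(w1).mm(w2).shape)        # torch.Size([64, 10])   same shape as y
print(x.requires_grad, w1.requires_grad, w2.requires_grad)   # False True True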

learning_rate = 1e-6
for t in range(1,501):
    # forward pass: linear -> ReLU (clamp at 0) -> linear
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    # sum of squared errors between prediction and target
    loss = (y_pred-y).pow(2).sum()
    if t % 100 == 0:
        print('Iteration: ',t,'\t',"Loss: ",loss.item())
    # backward pass: autograd fills w1.grad and w2.grad
    loss.backward()

    # update the weights without recording the update in the autograd graph
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # reset the gradients so they do not accumulate into the next iteration
        w1.grad.zero_()
        w2.grad.zero_()

 

  learning_rate : the learning rate. Each weight is updated by subtracting its gradient multiplied by learning_rate.
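Written out, the weight update inside torch.no_grad() amounts to one step of plain gradient descent, where $L$ is the loss and $\eta$ = learning_rate; the partial derivatives are exactly the values that loss.backward() stores in w1.grad and w2.grad:

$$w_1 \leftarrow w_1 - \eta \frac{\partial L}{\partial w_1}, \qquad w_2 \leftarrow w_2 - \eta \frac{\partial L}{\partial w_2}, \qquad \eta = 10^{-6}$$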

  mm (matrix multiply) : multiplies the input x by w1 as matrices:  x × w1
 
  clamp :
  Applies a non-linearity. clamp bounds each element between min and max; with min=0 (as used in the loop above) it zeroes out negative values, which is exactly the ReLU activation.

$$y_i = \begin{cases} \text{min} & \text{if } x_i < \text{min} \\ x_i & \text{if } \text{min} \leq x_i \leq \text{max} \\ \text{max} & \text{if } x_i > \text{max} \end{cases}$$
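A standalone toy example of clamp(min=0) acting as ReLU:

import torch

h = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(h.clamp(min=0))    # [0., 0., 0., 1.5] -- negatives are clipped to 0
print(torch.relu(h))     # identical result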

Reference on gradient descent: https://blog.clairvoyantsoft.com/the-ascent-of-gradient-descent-23356390836f

  w1.grad.zero_ : resets the stored gradient to zero so that the next iteration starts from a clean slate; without this, the gradients from successive backward() calls would keep accumulating.

Iteration:  100 	 Loss:  673.0462646484375
Iteration:  200 	 Loss:  8.727155685424805
Iteration:  300 	 Loss:  0.18558651208877563
Iteration:  400 	 Loss:  0.004666611552238464
Iteration:  500 	 Loss:  0.00030295629403553903

 The important point is that even though the gradients are zeroed, the next call to the backward method computes fresh gradient values when backpropagation runs again, as the short sketch below illustrates.
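A minimal sketch of this accumulate / zero / recompute behaviour on a single scalar parameter (a toy example, independent of the network above):

import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
print(w.grad)        # tensor(3.)

(3 * w).backward()   # without zeroing, the new gradient is added on top
print(w.grad)        # tensor(6.)

w.grad.zero_()       # reset, exactly as in the training loop
(3 * w).backward()   # backward() recomputes a fresh gradient
print(w.grad)        # tensor(3.)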

 
๋ฐ˜์‘ํ˜•
Done!