Batch Normalization
·
👾 Deep Learning
Besides overfitting and vanishing gradients, neural networks also suffer from internal covariate shift: the input distribution seen by each layer keeps changing during training, which slows down learning. Batch Normalization addresses this by normalizing each layer's input distribution so that training proceeds faster. $BN(h;\gamma,\beta) = \beta + \gamma\frac{h-E(h)}{\sqrt{Var(h)+\epsilon}}$
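A minimal NumPy sketch of the formula above (the function name and the toy batch are illustrative, not from the post): normalize each feature over the batch using its mean and variance, then scale by gamma and shift by beta.

import numpy as np

def batch_norm(h, gamma, beta, eps=1e-5):
    # per-feature mean and variance over the batch dimension
    mean = h.mean(axis=0)
    var = h.var(axis=0)
    h_hat = (h - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return beta + gamma * h_hat               # learnable scale and shift

x = np.random.randn(32, 4) * 3.0 + 5.0        # toy batch: 32 samples, 4 features
y = batch_norm(x, gamma=1.0, beta=0.0)
print(y.mean(axis=0).round(3), y.var(axis=0).round(3))   # approximately 0 and 1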
GAN (Generative Adversarial Networks)
·
👾 Deep Learning
GAN: A GAN is a generative model in which two neural networks, a generator and a discriminator, learn by competing with each other. The generator is the model that produces fake data; its goal is to fool the discriminator. It takes random noise as input, produces fake data, and is trained so that the discriminator cannot tell its output apart from the real thing. The discriminator's goal is to detect the fakes the generator produces; it is trained on both the original data and the generator's output so that it can pick up even very small differences. With images, the generator produces a forgery of some painting and the discriminator judges whether it is a forgery. The generator tries to paint something convincing enough to fool the discriminator, while the discriminator tries to recognize that it is fake. Through this process, the generated output and the original gradually..
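The adversarial training described above is usually summarized (in the original GAN paper, not quoted in this excerpt) as a two-player minimax game over a value function $V(D,G)$:

$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))]$

The discriminator $D$ is trained to push $V$ up (classify real vs. fake correctly), while the generator $G$ is trained to push it down (make $D(G(z))$ approach 1).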
Backpropagation
·
👾 Deep Learning
wiki.hash.kr/index.php/%EC%A0%9C%ED%94%84%EB%A6%AC_%ED%9E%8C%ED%8A%BC Geoffrey Hinton - hash.kr: Geoffrey Hinton is a British-born cognitive psychologist and computer scientist who pioneered the field of artificial intelligence (AI), contributing to error backpropagation and deep learning research and to the Hinton diag.. (wiki.hash.kr) Hinton was convinced that using several perceptrons at once for a single target variable was the only way to move beyond the existing neural network, and pursued that line of research. He eventually showed that when many perceptrons are used together, nonlinear problems can be solved as well. When the same data is fed into two perceptrons placed in parallel, whatever the outputs of the two perceptrons are taken to be, the err..
ํŒŒ์ด์ฌ์˜ ๋‰ด๋Ÿฐ
·
👾 Deep Learning
When the perceptron is treated mathematically, its output is written as f(x): if input*weights + bias_weights > 0.5, is the prediction 1 or 0? -> 1 (predict = 1)

import numpy as np

example_input = [1, .2, .05, .1, .2]
example_weights = [.2, .12, .4, .6, .90]

input_vector = np.array(example_input)
weights = np.array(example_weights)
bias_weights = .2

activation_level = np.dot(input_vector, weights) + (bias_weights * 1)
activation_level  # 0.684

# threshold
threshold = 0.5
if a..
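A sketch of how the truncated threshold check presumably ends, reusing the variable names from the excerpt (the final comparison is my assumption, not the post's code): the perceptron fires only when the activation clears the threshold.

import numpy as np

example_input = [1, .2, .05, .1, .2]
example_weights = [.2, .12, .4, .6, .90]
input_vector = np.array(example_input)
weights = np.array(example_weights)
bias_weights = .2

# weighted sum of inputs plus the bias (bias input fixed at 1)
activation_level = np.dot(input_vector, weights) + (bias_weights * 1)   # 0.684

# step activation: output 1 only if the activation exceeds the threshold
threshold = 0.5
perceptron_output = 1 if activation_level >= threshold else 0
print(perceptron_output)   # 1, matching "predict = 1" above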
[Transformer] Model Summary
·
👾 Deep Learning
class MultiHeadAttention(tf.keras.layers.Layer):
    def __init__(self, **kargs):
        super(MultiHeadAttention, self).__init__()
        self.num_heads = kargs['num_heads']
        self.d_model = kargs['d_model']
        assert self.d_model % self.num_heads == 0
        self.depth = self.d_model // self.num_heads
        self.wq = tf.keras.layers.Dense(kargs['d_model'])
        self.wk = tf.keras.layers.Dense(kargs['d_model'])
        self.wv = tf.keras.layers..
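The cut-off constructor above is building one Dense projection each for Q, K and V. A minimal sketch of the two pieces that usually follow in this kind of Keras implementation, splitting into heads and scaled dot-product attention (standard pattern, not necessarily the post's exact code):

import tensorflow as tf

def split_heads(x, batch_size, num_heads, depth):
    # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)
    x = tf.reshape(x, (batch_size, -1, num_heads, depth))
    return tf.transpose(x, perm=[0, 2, 1, 3])

def scaled_dot_product_attention(q, k, v):
    # softmax(QK^T / sqrt(d_k)) V, computed independently per head
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    weights = tf.nn.softmax(matmul_qk / tf.math.sqrt(dk), axis=-1)
    return tf.matmul(weights, v)

q = tf.random.normal((2, 4, 8, 16))   # (batch, heads, seq_len, depth)
print(scaled_dot_product_attention(q, q, q).shape)   # (2, 4, 8, 16)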
Types of VAE (Variational Autoencoder)
·
👾 Deep Learning
Conditional VAE: A conditional VAE feeds not only the latent variables but also a label into the decoder, so that data can be generated for a specified label. Varying two latent variables (horizontal and vertical) for each handwritten-digit image shows that, even for the same digit, the handwriting in the generated image changes. A VAE is normally trained without supervision, but adding this supervised element lets you specify which data to reconstruct. β-VAE: The β-VAE is characterized by 'disentanglement' of the image, i.e. untangling what is entangled; it is an applied technique that separates image features within the latent space. For example, in a face image the first latent variable captures the eye shape and the second latent variable captures the face orientation. Using the latent variables, the eye shape..
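For reference (not quoted in the excerpt), the β-VAE trains with the usual VAE evidence lower bound but weights the KL term by a factor β; choosing β > 1 is what pressures the latent dimensions toward the disentangled representation described above:

$\mathcal{L}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \beta\, D_{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$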
[Transformer] Positional Encoding (3)
·
👾 Deep Learning
nlp.seas.harvard.edu/2018/04/01/attention.html#position-wise-feed-forward-networks The Annotated Transformer The recent Transformer architecture from “Attention is All You Need” @ NIPS 2017 has been instantly impactful as a new method for machine translation. It also offers a new general architecture for many NLP tasks. The paper itself is very clearly writte nlp.seas.harvard.edu class Positiona..
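The class cut off above is presumably the sinusoidal PositionalEncoding module from the Annotated Transformer. A small NumPy sketch of the encoding itself (an illustration of the standard formula, not the post's code):

import numpy as np

def positional_encoding(max_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(max_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

print(positional_encoding(50, 16).shape)   # (50, 16)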
[Transformer] Position-wise Feed-Forward Networks (2)
·
👾 Deep Learning
nlp.seas.harvard.edu/2018/04/01/attention.html#position-wise-feed-forward-networks The Annotated Transformer The recent Transformer architecture from “Attention is All You Need” @ NIPS 2017 has been instantly impactful as a new method for machine translation. It also offers a new general architecture for many NLP tasks. The paper itself is very clearly writte nlp.seas.harvard.edu FFN(x)=max( 0, ..
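The formula truncated above is the position-wise feed-forward network from "Attention Is All You Need"; written out in its standard form from the paper (since the excerpt is cut off), it is two linear transformations with a ReLU in between, applied identically at every position:

$\mathrm{FFN}(x) = \max(0,\; xW_1 + b_1)\,W_2 + b_2$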
All Done
List of posts in the '👾 Deep Learning' category (Page 7)