Adam
Adam: Adaptive moment estimation
Adam = RMSprop + Momentum
Momentum: instead of following every single step of plain gradient descent toward the minimum, it accumulates past gradients as a velocity, so it can effectively skip over steps (and small bumps) along the way.
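A minimal sketch of one Adam update step, assuming NumPy arrays and hypothetical names (adam_step, m, v, t are not from the original post). It shows the combination above: a Momentum-style moving average of gradients plus an RMSprop-style moving average of squared gradients, with bias correction.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum part: exponentially decaying average of past gradients
    m = beta1 * m + (1 - beta1) * grad
    # RMSprop part: exponentially decaying average of past squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction so the averages are not tiny during the first steps (t starts at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update: momentum direction, scaled per parameter by the RMS of gradients
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

The defaults used here (lr=0.001, beta1=0.9, beta2=0.999) match the common ones, e.g. the defaults of torch.optim.Adam.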
Stochastic gradient descent (SGD)
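For reference, a minimal sketch of an SGD update with momentum (hypothetical names, not from the original post). The velocity term is what the Momentum note above describes: the update keeps moving in the accumulated direction instead of reacting to every single gradient.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    # Accumulate past gradients into a velocity (plain SGD is momentum=0.0)
    velocity = momentum * velocity - lr * grad
    # Move the parameters along the accumulated direction
    param = param + velocity
    return param, velocity
```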
Adagrad
It makes big updates for infrequent parameters and small updates for frequent parameters. For this reason, it is well-suited for dealing with sparse data.
The main benefit of Adagrad is that we don’t need to tune the learning rate manually. Most implementations use a default value of 0.01 and leave it at that.
Disadvantage:
Its main weakness is that the learning rate is always decreasing: the sum of squared gradients in the denominator only ever grows, so the effective step size keeps decaying until learning practically stops (see the sketch below).
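To make both points concrete, here is a minimal sketch of an Adagrad update, assuming NumPy arrays and hypothetical names (adagrad_step, cache are not from the original post). The per-parameter cache is why rarely updated parameters get big steps and frequently updated ones get small steps, and also why the effective learning rate keeps decaying.

```python
import numpy as np

def adagrad_step(param, grad, cache, lr=0.01, eps=1e-8):
    # Per-parameter sum of squared gradients; it only grows over time
    cache = cache + grad ** 2
    # Parameters with a large cache (frequent updates) get smaller steps,
    # parameters with a small cache (infrequent updates) get larger steps
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```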
AdaDelta
It is an extension of AdaGrad that removes its decaying learning rate problem.
Another nice property of AdaDelta is that we don't even need to set a default learning rate (see the sketch below).
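A minimal sketch of an AdaDelta update, assuming NumPy arrays and hypothetical names (adadelta_step, avg_sq_grad, avg_sq_delta are not from the original post). The decaying averages replace Adagrad's ever-growing sum, so the step size no longer shrinks to zero, and the ratio of the two RMS terms takes the place of a hand-tuned learning rate.

```python
import numpy as np

def adadelta_step(param, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    # Decaying average of squared gradients (instead of Adagrad's growing sum)
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    # Step size comes from the ratio of RMS(past updates) to RMS(gradients),
    # so no explicit learning rate is needed
    delta = -np.sqrt(avg_sq_delta + eps) / np.sqrt(avg_sq_grad + eps) * grad
    # Decaying average of squared parameter updates
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta ** 2
    param = param + delta
    return param, avg_sq_grad, avg_sq_delta
```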