
Adam

Adam: Adaptive moment estimation

Adam = RMSprop + Momentum
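To make the "RMSprop + Momentum" combination concrete, here is a minimal, self-contained NumPy sketch of a single Adam update (the function name `adam_step` and the quadratic toy objective are illustrative, not from any particular library; the hyperparameter defaults are the commonly used ones):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad       # momentum part: first moment
    v = beta2 * v + (1 - beta2) * grad ** 2  # RMSprop part: second moment
    m_hat = m / (1 - beta1 ** t)             # bias correction for the
    v_hat = v / (1 - beta2 ** t)             # zero-initialized moments
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative use: minimize f(x) = x^2, whose gradient is 2x.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
```

The bias-correction terms matter early on: since `m` and `v` start at zero, the raw moving averages are biased toward zero for small `t`.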

Momentum: instead of inching toward the minimum one gradient step at a time, gradient descent with momentum accumulates a velocity from past gradients, letting it "skip" ahead and coast through shallow regions.
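A minimal sketch of that idea (plain Python; the function name and the quadratic toy objective are illustrative):

```python
def momentum_step(theta, grad, velocity, lr=0.01, mu=0.9):
    # velocity is a decaying sum of past gradients; mu controls the decay.
    velocity = mu * velocity - lr * grad
    return theta + velocity, velocity

# Illustrative use: minimize f(x) = x^2 (gradient 2x) starting from x = 5.
theta, velocity = 5.0, 0.0
for _ in range(100):
    theta, velocity = momentum_step(theta, 2 * theta, velocity)
```

Because the velocity keeps pointing downhill across consecutive steps, the update grows along consistent gradient directions and dampens oscillations.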

Stochastic gradient descent(SGD)

[Image not included. Image Credit: CS231n]
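For comparison, plain SGD uses only the current (mini-batch) gradient, with no history at all. A minimal sketch, with an illustrative quadratic objective:

```python
def sgd_step(theta, grad, lr=0.01):
    # Vanilla (stochastic) gradient descent: step against the gradient.
    return theta - lr * grad

# Illustrative use: minimize f(x) = x^2 (gradient 2x) starting from x = 5.
theta = 5.0
for _ in range(100):
    theta = sgd_step(theta, 2 * theta)
```

In practice `grad` would be computed on a random mini-batch, which is what makes the method "stochastic".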

Adagrad

It makes big updates for infrequent parameters and small updates for frequent parameters. For this reason, it is well-suited for dealing with sparse data.

The main benefit of Adagrad is that we donโ€™t need to tune the learning rate manually. Most implementations use a default value of 0.01 and leave it at that.
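The per-parameter scaling can be sketched as follows (NumPy; the function name is illustrative, and the 0.01 default matches the text above):

```python
import numpy as np

def adagrad_step(theta, grad, cache, lr=0.01, eps=1e-8):
    # cache accumulates squared gradients, so frequently updated parameters
    # get progressively smaller effective learning rates, while rarely
    # updated (sparse) parameters keep taking comparatively large steps.
    cache = cache + grad ** 2
    theta = theta - lr * grad / (np.sqrt(cache) + eps)
    return theta, cache
```

Note that `cache` only ever grows, so the effective step size shrinks monotonically, which is exactly the weakness discussed next.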

Disadvantage:

Its main weakness is that the learning rate only ever decreases: the squared gradients accumulate without bound, so the effective step size keeps decaying and updates eventually become vanishingly small.

 

AdaDelta

It is an extension of AdaGrad that removes its decaying-learning-rate problem by replacing the ever-growing sum of squared gradients with a decaying average.

Another advantage of AdaDelta is that we don't even need to set a default learning rate.
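A minimal sketch of one AdaDelta update (NumPy; function and argument names are illustrative, and `rho`/`eps` follow commonly used defaults). Note there is no learning-rate argument at all: the step size is derived from running averages of squared gradients and squared updates.

```python
import numpy as np

def adadelta_step(theta, grad, avg_sq_grad, avg_sq_dx, rho=0.95, eps=1e-6):
    # Decaying average of squared gradients (fixes Adagrad's ever-growing cache).
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    # Step scaled by the ratio RMS(previous updates) / RMS(gradients),
    # so no external learning rate is needed.
    dx = -np.sqrt(avg_sq_dx + eps) / np.sqrt(avg_sq_grad + eps) * grad
    avg_sq_dx = rho * avg_sq_dx + (1 - rho) * dx ** 2
    return theta + dx, avg_sq_grad, avg_sq_dx
```

Because both accumulators are decaying averages rather than sums, old gradients are gradually forgotten and the effective step size can grow again when the loss surface changes.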

๋ฐ˜์‘ํ˜•
๋‹คํ–ˆ๋‹ค