[main] GTX 1660 Super / Python 3.7.6 / TensorFlow 2.4.0 / CUDA 11.0 / cuDNN 8.0.5
Optimizer (Adam, SGD)
·
💾 Deep Learning
Adam: Adaptive Moment Estimation. Adam = RMSprop + Momentum. Momentum: rather than taking every small step of gradient descent on the way to the minimum, it skips over steps. Stochastic gradient descent (SGD). Adagrad: it makes big updates for infrequent parameters and small updates for frequent parameters. For this reason, it is well suited to dealing with sparse data. The main benefit of Adagrad is that we don't need to tune the learning..
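A minimal sketch of these optimizers with the Keras classes in TensorFlow 2.4 (the toy model and learning rates are arbitrary, chosen only to show where each optimizer plugs in):

import tensorflow as tf

# Adam = RMSprop-style per-parameter step sizes + Momentum-style velocity.
adam = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Plain SGD; adding momentum lets it coast past small bumps instead of
# taking every tiny step dictated by the raw gradient.
sgd = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)

# Adagrad: big updates for rarely-updated parameters, small updates for
# frequently-updated ones, which suits sparse data.
adagrad = tf.keras.optimizers.Adagrad(learning_rate=1e-2)

# Toy model just to show where the optimizer is attached.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=adam, loss="mse")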
OSError: [WinError 127] The specified procedure could not be found. Error loading \\torch\\lib\\*_ops_gpu.dll or one of its dependencies.
·
💾 Deep Learning
This error is resolved by downgrading the PyTorch version to 1.5.1 or lower. Installation instructions for each version: pytorch.org/get-started/previous-versions/
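As a quick sanity check after downgrading (the exact pip command for your CUDA version is on the previous-versions page linked above), you can confirm the installed version and GPU visibility:

import torch

# The import itself exercises the DLL loading that raised WinError 127;
# then confirm the version and that CUDA is usable.
print(torch.__version__)          # expect 1.5.1 or lower
print(torch.cuda.is_available())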
TFBertModel parameter
·
💾 Deep Learning
huggingface.co/transformers/model_doc/bert.html (BERT, transformers 4.3.0 documentation): past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) – Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_he.. vocab_size (int, optional, defaults to 3..
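To make these config parameters concrete, here is a minimal sketch (assuming transformers 4.3.0 with the TensorFlow backend; the model is randomly initialized and the token ids are dummy values, chosen only to inspect shapes):

import tensorflow as tf
from transformers import BertConfig, TFBertModel

# Default BertConfig: vocab_size=30522, hidden_size=768, 12 layers, 12 heads.
config = BertConfig()
model = TFBertModel(config)

# Dummy token ids for a batch of 2 sequences of length 8 (0 = padding).
input_ids = tf.constant([[101, 7592, 102, 0, 0, 0, 0, 0],
                         [101, 2088, 2003, 2307, 102, 0, 0, 0]])
attention_mask = tf.cast(input_ids != 0, tf.int32)

outputs = model(input_ids, attention_mask=attention_mask)
print(outputs.last_hidden_state.shape)  # (2, 8, 768) = (batch, seq, hidden)
print(outputs.pooler_output.shape)      # (2, 768)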