Source: huggingface.co/transformers/model_doc/bert.html
- vocab_size (int, optional, defaults to 30522) – Vocabulary size of the BERT model. Defines the number of different tokens that can be represented by the input_ids passed when calling BertModel or TFBertModel.
- hidden_size (int, optional, defaults to 768) – Dimensionality of the encoder layers and the pooler layer.
- num_hidden_layers (int, optional, defaults to 12) – Number of hidden layers in the Transformer encoder.
- num_attention_heads (int, optional, defaults to 12) – Number of attention heads for each attention layer in the Transformer encoder.
- intermediate_size (int, optional, defaults to 3072) – Dimensionality of the "intermediate" (often called feed-forward) layer in the Transformer encoder.
- hidden_act (str or Callable, optional, defaults to "gelu") – The non-linear activation function (function or string) in the encoder and pooler. If a string, "gelu", "relu", "silu", and "gelu_new" are supported.
- hidden_dropout_prob (float, optional, defaults to 0.1) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- attention_probs_dropout_prob (float, optional, defaults to 0.1) – The dropout ratio for the attention probabilities.
- max_position_embeddings (int, optional, defaults to 512) – The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512, 1024, or 2048).
- type_vocab_size (int, optional, defaults to 2) – The vocabulary size of the token_type_ids passed when calling BertModel or TFBertModel.
- initializer_range (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- layer_norm_eps (float, optional, defaults to 1e-12) – The epsilon used by the layer normalization layers.
- gradient_checkpointing (bool, optional, defaults to False) – If True, use gradient checkpointing to save memory at the expense of a slower backward pass.
- position_embedding_type (str, optional, defaults to "absolute") – Type of position embedding. Choose one of "absolute", "relative_key", or "relative_key_query". For standard positional embeddings use "absolute". For more information on "relative_key", please refer to Self-Attention with Relative Position Representations (Shaw et al.). For more information on "relative_key_query", please refer to Method 4 in Improve Transformer Models with Better Relative Position Embeddings (Huang et al.).
- use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/value attentions (not used by all models). Only relevant if config.is_decoder=True.
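To see how these options fit together, here is a minimal sketch (assuming the transformers library is installed; the values are simply the defaults listed above, not tuned recommendations) that builds a BertConfig and initializes a randomly weighted BertModel from it:

```python
from transformers import BertConfig, BertModel

# Build a configuration using the parameters documented above.
# These are the library defaults; change any of them to customize the architecture.
config = BertConfig(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    position_embedding_type="absolute",
    use_cache=True,
)

# Instantiate a model with randomly initialized weights from this configuration.
model = BertModel(config)
print(model.config)
```

The same config object works analogously with TFBertModel on the TensorFlow side. If you instead load pretrained weights with BertModel.from_pretrained("bert-base-uncased"), the matching configuration is downloaded automatically and these values are filled in for you.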