
huggingface.co/transformers/model_doc/bert.html

 

BERT – transformers 4.3.0 documentation

past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) – Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head). Contains pre-computed key and value hidden states of the attention blocks that can be used to speed up sequential decoding.
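A minimal sketch (not part of the original page) of what that return value looks like, assuming a recent transformers release and torch are installed. The model is randomly initialized from the default configuration, so no checkpoint download is needed, and it is configured as a decoder because past_key_values is only returned when config.is_decoder=True.

```python
import torch
from transformers import BertConfig, BertModel

# Decoder-style configuration: the cache is only returned when
# config.is_decoder=True and caching is enabled.
config = BertConfig(is_decoder=True, use_cache=True)
model = BertModel(config)  # randomly initialized, defaults as in the list below
model.eval()

input_ids = torch.randint(0, config.vocab_size, (1, 8))  # batch_size=1, sequence_length=8
with torch.no_grad():
    outputs = model(input_ids)

# One entry per layer; each entry holds a key and a value tensor of shape
# (batch_size, num_attention_heads, sequence_length, hidden_size // num_attention_heads).
print(len(outputs.past_key_values))   # 12, i.e. config.num_hidden_layers
key, value = outputs.past_key_values[0]
print(key.shape, value.shape)         # torch.Size([1, 12, 8, 64]) for both
```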


  • vocab_size (int, optional, defaults to 30522) – Vocabulary size of the BERT model. Defines the number of different tokens that can be represented by the input_ids passed when calling BertModel or TFBertModel.

  • hidden_size (int, optional, defaults to 768) – Dimensionality of the encoder layers and the pooler layer.

  • num_hidden_layers (int, optional, defaults to 12) – Number of hidden layers in the Transformer encoder.

  • num_attention_heads (int, optional, defaults to 12) – Number of attention heads for each attention layer in the Transformer encoder.

  • intermediate_size (int, optional, defaults to 3072) – Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.

  • hidden_act (str or Callable, optional, defaults to "gelu") – The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

  • hidden_dropout_prob (float, optional, defaults to 0.1) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

  • attention_probs_dropout_prob (float, optional, defaults to 0.1) – The dropout ratio for the attention probabilities.

  • max_position_embeddings (int, optional, defaults to 512) – The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512, 1024, or 2048).

  • type_vocab_size (int, optional, defaults to 2) – The vocabulary size of the token_type_ids passed when calling BertModel or TFBertModel (see the tokenizer sketch after this list).

  • initializer_range (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

  • layer_norm_eps (float, optional, defaults to 1e-12) – The epsilon used by the layer normalization layers.

  • gradient_checkpointing (bool, optional, defaults to False) – If True, use gradient checkpointing to save memory at the expense of a slower backward pass.

  • position_embedding_type (str, optional, defaults to "absolute") – Type of position embedding. Choose one of "absolute", "relative_key", or "relative_key_query". For positional embeddings use "absolute". For more information on "relative_key", please refer to Self-Attention with Relative Position Representations (Shaw et al.). For more information on "relative_key_query", please refer to Method 4 in Improve Transformer Models with Better Relative Position Embeddings (Huang et al.).

  • use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/value attention states (not used by all models). Only relevant if config.is_decoder=True; see the configuration sketch after this list.
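For reference, here is a minimal sketch (not part of the original documentation) that spells out most of the documented defaults explicitly when building a BertConfig; calling BertConfig() with no arguments produces an equivalent configuration, and the resulting model is randomly initialized rather than pretrained.

```python
from transformers import BertConfig, BertModel

# Every value below is the documented default, so BertConfig() with no
# arguments yields an equivalent configuration.
config = BertConfig(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    position_embedding_type="absolute",  # or "relative_key" / "relative_key_query"
    use_cache=True,
)

# Randomly initialized model built from this configuration (no pretrained weights).
model = BertModel(config)
print(config.hidden_size)                           # 768
print(sum(p.numel() for p in model.parameters()))   # roughly 110M parameters
```

In practice these values are usually filled in automatically by loading a pretrained checkpoint, e.g. BertModel.from_pretrained("bert-base-uncased").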

๋ฐ˜์‘ํ˜•
Done.