[Huggingface] Model Memory Calculator: How Much GPU Do You Need?
ยท
๐Ÿ—ฃ๏ธ Natural Language Processing
With llama3, gemma2, florence and the rest, it has been over a year since llama1 (2023.2.24) came out, and the popularity of open LLMs shows no sign of cooling off; if anything, it keeps growing. Training pipelines are getting easier and more robust, and model inference keeps improving in both resource usage and speed. So the natural questions are: which model fits my resources, and what is the largest model I can run? First, a quick word on the numbers 2B, 7B, 9B: they are the number of parameters the model was trained with. Put simply, they indicate how many distinct things the model can express. Where BERT-era models were measured in millions (3M, 5M), we have now moved to billions..
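The calculator does the bookkeeping for you, but the core arithmetic is simple: weight memory is parameter count times bytes per parameter for the chosen dtype. A minimal back-of-envelope sketch (not the Hugging Face calculator itself; the 1.2x overhead factor for activations and KV cache is an assumption):

# Rough rule of thumb: weights = params * bytes-per-dtype; the 1.2x
# overhead for activations / KV cache is an assumed fudge factor.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}

def estimate_inference_gb(num_params: float, dtype: str = "float16", overhead: float = 1.2) -> float:
    return num_params * BYTES_PER_PARAM[dtype] * overhead / 1024**3

print(f"{estimate_inference_gb(7e9, 'float16'):.1f} GB")  # a 7B model in fp16 -> ~15.6 GB

By this rule a 7B model in fp16 needs roughly a 16 GB card for inference, while the same model quantized to int4 fits comfortably in under 6 GB.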
Embedding Model API: Korean Token & Cost Comparison
ยท
๐Ÿ—ฃ๏ธ Natural Language Processing
“๋Œ€ํ•œ๋ฏผ๊ตญ ๋ฒ•๋ฅ  ์ „๋ฌธ”์„ ๊ฐ€์ง€๊ณ  OpenAI(ChatGPT), GOOGLE(Gemini), Antropic(Claude), Upstage(Solar)๋ฅผ ๋Œ€์ƒ์œผ๋กœ embedding ํ›„ token ์ˆ˜๋ฅผ ๋น„๊ตํ•˜๋Š” ์‹คํ—˜์„ ์ง„ํ–‰ Goal : API๋กœ ์ œ๊ณต๋˜๋Š” LLM ์ค‘ ์–ด๋–ค ๋ชจ๋ธ์ด ํ•œ๊ตญ์–ด token์„ ๊ฐ€์žฅ ์ ๊ฒŒ ์‚ฌ์šฉํ•˜๊ณ  ๋น„์šฉ ์ €๋ ดํ•œ์ง€ ๋น„๊ต Input Text(๋Œ€ํ•œ๋ฏผ๊ตญํ—Œ๋ฒ• ์ „๋ฌธ, text length=373) ์œ ๊ตฌํ•œ ์—ญ์‚ฌ์™€ ์ „ํ†ต์— ๋น›๋‚˜๋Š” ์šฐ๋ฆฌ๋“ค ๋Œ€ํ•œ๊ตญ๋ฏผ์€ ๊ธฐ๋ฏธ ์‚ผ์ผ์šด๋™์œผ๋กœ ๋Œ€ํ•œ๋ฏผ๊ตญ์„ ๊ฑด๋ฆฝํ•˜์—ฌ ์„ธ๊ณ„์— ์„ ํฌํ•œ ์œ„๋Œ€ํ•œ ๋…๋ฆฝ์ •์‹ ์„ ๊ณ„์Šนํ•˜์—ฌ ์ด์ œ ๋ฏผ์ฃผ๋…๋ฆฝ๊ตญ๊ฐ€๋ฅผ ์žฌ๊ฑดํ•จ์— ์žˆ์–ด์„œ ์ •์˜์ธ๋„์™€ ๋™ํฌ์• ๋กœ์จ ๋ฏผ์กฑ์˜ ๋‹จ๊ฒฐ์„ ๊ณต๊ณ ํžˆ ํ•˜๋ฉฐ ๋ชจ๋“  ์‚ฌํšŒ์  ํ์Šต์„ ํƒ€ํŒŒํ•˜๊ณ  ๋ฏผ์ฃผ์ฃผ์˜์ œ์ œ๋„๋ฅผ ์ˆ˜๋ฆฝํ•˜์—ฌ ์ •์น˜, ๊ฒฝ์ œ, ์‚ฌํšŒ, ๋ฌธํ™”์˜ ๋ชจ๋“  ์˜์—ญ์— ์žˆ์–ด..
[BERT] ์™œ BERT๋Š” 15%์˜ ๋น„์œจ๋กœ ๋ชจ๋ธ๋ง ํ–ˆ์„๊นŒ?
ยท
๐Ÿ—ฃ๏ธ Natural Language Processing
"Should You Mask 15% in Masked Language Modeling?" https://arxiv.org/abs/2202.08005 Should You Mask 15% in Masked Language Modeling? Masked language models (MLMs) conventionally mask 15% of tokens due to the belief that more masking would leave insufficient context to learn good representations; this masking rate has been widely used, regardless of model sizes or masking strategies. In arxiv.org..
[Gemini] ValueError: The `response.parts` quick accessor only works for a single candidate, but none were returned. Check the `response.prompt_feedback` to see if the prompt was blocked.
ยท
๐Ÿ—ฃ๏ธ Natural Language Processing
ValueError: The `response.parts` quick accessor only works for a single candidate, but none were returned. Check the `response.prompt_feedback` to see if the prompt was blocked. Cause: the prompt response was blocked because of a bad request, or a safety-settings error. Fill in all the parameters before sending the request. For now: max_tokens; candidate_count: the number of candidates to return; top_p: sort tokens by probability in descending order and sample from the smallest set whose cumulative probability exceeds p (글렀다_argmax : 0.7, 힘들다_a..
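A hedged sketch with the google.generativeai SDK: set the generation parameters explicitly (the SDK names the token limit max_output_tokens) and inspect prompt_feedback before touching response.parts. The model name and parameter values here are assumptions:

# Set generation parameters up front, then check for blocked prompts
# before reading the response text.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")  # model name is an assumption
response = model.generate_content(
    "your prompt here",
    generation_config=genai.types.GenerationConfig(
        candidate_count=1, max_output_tokens=512, top_p=0.9
    ),
)
if not response.candidates:
    print(response.prompt_feedback)  # shows the block reason / safety ratings
else:
    print(response.text)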
[Pinecone] llama-index with Pinecone
ยท
๐Ÿ—ฃ๏ธ Natural Language Processing
Llama-Index with Pinecone. This notebook shows how to use Pinecone and the llama-index (formerly GPT Index) library for semantic search. It is a llama-index example; in a future release it will also be available in the Pinecone examples repository. 1) Install packages !pip install -qU llama-index datasets pinecone-client openai transformers 2) Load the SQuAD dataset (Wikipedia context-title pairs) from datasets import load_dataset data = load_dataset('squad', split='train') d..
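The preview cuts off here, but a plausible next step (an assumption, not the notebook's verbatim code) is deduplicating the SQuAD contexts before indexing, since each Wikipedia passage is repeated across many questions:

# SQuAD repeats each context for many questions; keep only the unique
# passages per article title before building the vector index.
from datasets import load_dataset

data = load_dataset("squad", split="train")
docs = {}  # title -> set of unique context passages
for row in data:
    docs.setdefault(row["title"], set()).add(row["context"])
print(len(data), "rows ->", sum(len(v) for v in docs.values()), "unique passages")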
The Path to Achieve Ultra-Low Inference Latency With LLaMA 65B on PyTorch/XLA
ยท
๐Ÿ—ฃ๏ธ Natural Language Processing
BACKGROUND & STATE OF THE ART ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ(NLP) ์˜์—ญ์—์„œ ์–ธ์–ด ๋ชจ๋ธ์€ ๊ณผ๊ฑฐ ์ž…๋ ฅ ํ† ํฐ์˜ ์‹œํ€€์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ† ํฐ(์˜ˆ: ๋‹จ์–ด)์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ๋Œ€์šฉ๋Ÿ‰ ์–ธ์–ด ๋ชจ๋ธ(Large Language Models, LLMs)์€ ์ด ๊ณต๊ฐ„์—์„œ์˜ ์ตœ์‹  ๋”ฅ๋Ÿฌ๋‹ ํ˜์‹ ์œผ๋กœ, ์ธ๊ฐ„๊ณผ ์œ ์‚ฌํ•œ ๋ฐฉ์‹์œผ๋กœ ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ชจ๋ธ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์ž…๋ ฅ ํ† ํฐ์˜ ํฐ ์‹œํ€€์Šค์— ๋Œ€ํ•œ ์ฃผ์˜๋ฅผ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•ด transformer๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. LLaMA๋Š” 1์กฐ ๊ฐœ ์ด์ƒ์˜ ํ† ํฐ์œผ๋กœ ํ›ˆ๋ จ๋œ ๊ฐ•๋ ฅํ•œ ๊ธฐ๋ฐ˜ LLM์œผ๋กœ, Meta AI์—์„œ ์˜คํ”ˆ ์†Œ์Šค๋กœ ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค. LLaMA๋Š” GPT-3, Chinchilla, PaLM๊ณผ ๊ฐ™์€ ๋งŽ์€ ์ตœ๊ณ ์˜ ๋ชจ๋ธ๊ณผ ๊ฒฝ์Ÿ๋ ฅ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. LLaMA (13B)๋Š” GPT..
Textbooks Are All You Need
ยท
๐Ÿ—ฃ๏ธ Natural Language Processing
Textbooks Are All You Need Abstract We introduce phi-1, a new large language model that is far smaller than competing models. phi-1 is a 1.3B-parameter Transformer-based model trained for 4 days on 8 A100s, using "textbook quality" data from the web (6B tokens) and data generated with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains 50.6% pass@1 accuracy on HumanEval and 55.5% on MBPP. Also, compared with phi-1-base, the model before the finetuning stage on the coding-exercises dataset, and with the smaller 350M-parameter model trained with the same pipeline, phi-1-..
LLM Context ํ™•์žฅ ๋ถˆ๊ฐ€๋Šฅ์€ ์•„๋‹ˆ๋‹ค. (token size ๋Š˜๋ฆฌ๊ธฐ ์ •๋ฆฌ)
ยท
๐Ÿ—ฃ๏ธ Natural Language Processing
Extending Context is Hard (https://kaiokendev.github.io/context) Extending context is hard, but not impossible†. On the surface it should be an easy task. I was writing this post while researching how to fine-tune a pre-trained model for longer sequence lengths. In this case, the pre-trained model is LLaMA, and the pre-training sequence length is 2048. Simply fine-tuning the model on longer sequences never seemed to work, but I figured it should be possible, so I took a full stab at it. There is now a way to extend the context with 1 line of code, and a lot of attention is focused on it..
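The widely cited "1 line of code" is RoPE position interpolation (the kaiokendev / SuperHOT trick): scale down the position indices fed to the rotary embedding so a longer context is squeezed into the pre-trained 2048 range. A minimal sketch; the head dimension, base, and scale values are illustrative assumptions:

# Scale positions before computing RoPE angles; scale = pretrained_len / new_len.
import torch

def rope_angles(positions: torch.Tensor, dim: int = 128, base: float = 10000.0,
                scale: float = 0.25):
    # The "one line": interpolate positions (0.25 squeezes 8192 steps
    # into the 2048 range the model was pre-trained on).
    positions = positions.float() * scale
    # Standard RoPE frequencies: 1 / base^(2i/dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions, inv_freq)  # angles fed into sin/cos

angles = rope_angles(torch.arange(8192))  # shape (8192, 64)

Because the interpolated positions stay inside the range the model already knows, a short fine-tune is usually enough to recover quality at the longer length.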
๋‹คํ–ˆ๋‹ค
'๐Ÿ—ฃ๏ธ Natural Language Processing' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๊ธ€ ๋ชฉ๋ก