Embedding Model API: Korean Token & Cost Comparison
·
🗣️ Natural Language Processing
Using a "Republic of Korea legal text" (대한민국 법률 전문), this experiment embeds the same input with OpenAI (ChatGPT), Google (Gemini), Anthropic (Claude), and Upstage (Solar) and compares the resulting token counts.
Goal: among the LLMs offered through an API, compare which model uses the fewest tokens for Korean text and which is cheapest.
Input text (preamble of the Constitution of the Republic of Korea, text length=373): 유구한 역사와 전통에 빛나는 우리들 대한국민은 기미 삼일운동으로 대한민국을 건립하여 세계에 선포한 위대한 독립정신을 계승하여 이제 민주독립국가를 재건함에 있어서 정의인도와 동포애로써 민족의 단결을 공고히 하며 모든 사회적 폐습을 타파하고 민주주의제제도를 수립하여 정치, 경제, 사회, 문화의 모든 영역에 있어..
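The comparison described above comes down to two numbers per model: tokens per character and cost for the same input. The following is only a minimal sketch of that bookkeeping. The token counts and prices are placeholders invented for illustration (they are not measurements from this experiment and not real price lists), and `TokenUsage`, `tokens_per_char`, and `cheapest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TokenUsage:
    """Token count and unit price for one model on one fixed input text."""

    model: str
    tokens: int
    usd_per_million_tokens: float

    @property
    def cost_usd(self) -> float:
        # Cost of sending this many tokens at the given unit price.
        return self.tokens / 1_000_000 * self.usd_per_million_tokens


def tokens_per_char(usage: TokenUsage, text_length: int) -> float:
    """Tokens spent per input character; lower means a denser tokenizer."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return usage.tokens / text_length


def cheapest(usages: Sequence[TokenUsage]) -> TokenUsage:
    """Return the entry with the lowest total cost."""
    return min(usages, key=lambda u: u.cost_usd)


TEXT_LENGTH = 373  # character count of the input text stated in the post

# Placeholder values only: replace with token counts and prices you measured.
PLACEHOLDER_USAGES = [
    TokenUsage("model-a", tokens=400, usd_per_million_tokens=0.10),
    TokenUsage("model-b", tokens=250, usd_per_million_tokens=0.20),
]

if __name__ == "__main__":
    for usage in PLACEHOLDER_USAGES:
        ratio = tokens_per_char(usage, TEXT_LENGTH)
        print(f"{usage.model}: {ratio:.3f} tokens/char, ${usage.cost_usd:.8f}")
    print("cheapest:", cheapest(PLACEHOLDER_USAGES).model)
```

With these placeholder numbers the model with the fewest tokens is not the cheapest one, which is why the two questions in the goal are worth tracking separately.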
[BERT] Why did BERT use a 15% masking ratio?
·
🗣️ Natural Language Processing
"Should You Mask 15% in Masked Language Modeling?" https://arxiv.org/abs/2202.08005
Masked language models (MLMs) conventionally mask 15% of tokens due to the belief that more masking would leave insufficient context to learn good representations; this masking rate has been widely used, regardless of model sizes or masking strategies. In..
[Python] f-string trick (2)
·
🐍 Python
"Formatted String Literals."
1) Nested f-strings
As Python versions have advanced, many convenient features have been added. One of them is using a replacement field inside the format spec of an f-string, in effect an f-string inside an f-string. With this, the datetime formatting introduced earlier in f-string trick (1) can be used freely.
    from datetime import datetime

    now: datetime = datetime.now()
    date_spec: str = "%d.%m.%Y"
    date: str = now.strftime(date_spec)  # explicit equivalent of the nested spec below
    print(f"{now:{date_spec}}")  # e.g. '17.03.2024'
2) file path
When handling a file path as a string, escape ..
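The nested format spec described above can be checked with a fixed date so the output is reproducible. This is a small sketch; the variable names and the fixed timestamp are chosen here for illustration only.

```python
from datetime import datetime

# A fixed timestamp keeps the example deterministic (datetime.now() would vary).
fixed: datetime = datetime(2024, 3, 17, 9, 5)
date_spec: str = "%d.%m.%Y"

# The inner {date_spec} is evaluated first and becomes the format spec
# that is passed to datetime.__format__.
nested: str = f"{fixed:{date_spec}}"
explicit: str = fixed.strftime(date_spec)

# The same mechanism works for any format spec, e.g. a dynamic column width.
width: int = 12
padded: str = f"{'py':>{width}}"

if __name__ == "__main__":
    print(nested)        # 17.03.2024
    print(explicit)      # 17.03.2024
    print(repr(padded))  # '          py'
```

Because the spec is an ordinary string, it can be read from configuration or swapped at runtime without touching the f-string itself.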
[Network] 304 Not Modified
·
🏃 Routine
304 Not Modified
The client redirection response code 304 Not Modified indicates that the requested resource does not need to be retransmitted. It is an implicit redirection to a cached resource. It is returned when the request method is safe, such as a GET or HEAD request, or when the request is conditional and uses an If-None-Match or If-Modified-Since header. It can also occasionally occur when the server includes an unnecessary header or body, or when an https address is requested over http.
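The conditional-request rule above can be sketched as a small decision function. This is an illustrative simplification, not a full HTTP implementation: it only handles GET/HEAD with If-None-Match, ignores If-Modified-Since, and splits the header naively on commas. `status_for` and `_normalize` are hypothetical names.

```python
from typing import Optional


def _normalize(tag: str) -> str:
    """Strip whitespace and a weak-validator prefix (W/) from an ETag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def status_for(method: str, if_none_match: Optional[str], current_etag: str) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if method not in ("GET", "HEAD"):
        raise ValueError("this sketch only covers the safe methods GET and HEAD")
    if if_none_match is None:
        return 200  # unconditional request: send the full response
    if if_none_match.strip() == "*":
        return 304  # "*" matches any existing representation
    candidates = {_normalize(t) for t in if_none_match.split(",") if t.strip()}
    # If-None-Match uses weak comparison, so W/ prefixes are ignored.
    return 304 if _normalize(current_etag) in candidates else 200


if __name__ == "__main__":
    print(status_for("GET", None, '"v1"'))
    print(status_for("GET", '"v1"', '"v1"'))
    print(status_for("GET", '"v0"', '"v1"'))
```

A 304 response carries no body; the client reuses the copy it already has in its cache.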
[rsyslog] imklog: cannot open kernel log, ERROR (Syntax error, this crontab file will be ignored), Operation not permitted.
·
🧑‍💻 Develop
Let's look at the errors you run into when rsyslog, a log management program that offers more features than syslog, is built with Docker and used as root.
1) rsyslogd: imklog: cannot open kernel log (/proc/kmsg): Operation not permitted.
imklog (Kernel Log Input Module) is the module that records kernel-related logs. If you do not need it, edit /etc/rsyslog.conf:
    sed -i '/imklog/s/^/#/' /etc/rsyslog.conf
Alternatively, grant the container extended privileges when running it (docker run --privileged).
https://github.com/docker/for-win/issues/8649
/pr..
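For readers who prefer to see what the sed command does, here is a rough Python equivalent. It is a sketch operating on a string; `comment_out_matching` is a hypothetical helper and the sample configuration is an illustrative fragment, not a complete rsyslog.conf.

```python
def comment_out_matching(config_text: str, needle: str = "imklog") -> str:
    """Prefix every line containing `needle` with '#'.

    Mirrors sed '/imklog/s/^/#/': lines that are already commented
    but still contain the needle get another '#', exactly as sed would do.
    """
    lines = config_text.splitlines(keepends=True)
    return "".join("#" + line if needle in line else line for line in lines)


# Illustrative fragment only; a real /etc/rsyslog.conf has more entries.
SAMPLE_CONFIG = (
    'module(load="imuxsock")\n'
    'module(load="imklog")\n'
)

if __name__ == "__main__":
    print(comment_out_matching(SAMPLE_CONFIG), end="")
```

To apply it to a real file you would read the file, pass its contents through the function, and write the result back, which is what `sed -i` does in place.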
[Python] If you don't know this, you're a beginner
·
🐍 Python
Common mistakes made by people who are new to Python.
1) try ~ except statements
When using try/except, people often write the error message by hand, and write it oddly at that. Where Python's own error messages would do the job, replacing them with a message only the author understands leads to bigger mistakes.
[Worse]
    total: float = 0
    while True:
        user_input: str = input("Add: ")
        try:
            total += float(user_input)
        except:
            print('Please enter numbers only.')
        print(f"Current: {total}")
[Better]
    total: float = 0
    while True:
        user_input: str = input("Add: ")
        try:
            total += floa..
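The excerpt's [Better] version is cut off, so the following is not the author's code. It is one possible sketch of the idea stated above: catch the specific exception and surface Python's own error message instead of a hand-written one. `add_input` is a hypothetical helper name.

```python
def add_input(total: float, user_input: str) -> float:
    """Add a numeric string to the running total.

    Only ValueError is caught, so unrelated problems (for example
    KeyboardInterrupt) are not silently swallowed the way a bare
    `except:` would swallow them.
    """
    try:
        return total + float(user_input)
    except ValueError as exc:
        # Report the interpreter's own message rather than a custom one.
        print(f"Invalid input: {exc}")
        return total


if __name__ == "__main__":
    running = 0.0
    for raw in ["1.5", "abc", "2"]:
        running = add_input(running, raw)
        print(f"Current: {running}")
```

Keeping the conversion in a small function also makes the behavior easy to test without an interactive `input()` loop.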
다했다
B's