[Jupyter Notebook] Adjusting Jupyter Notebook cell width (cell script option) and pandas column width
·
🐍 Python
from IPython.display import display, HTML
display(HTML(""))
import pandas as pd
# Increase the displayed column width
pd.set_option('display.max_colwidth', 200)
[Data Crawling] Spongebob - 1
·
🐍 Python
spongebob.fandom.com/wiki/Encyclopedia_SpongeBobia
To extract SpongeBob's dialogue, a site containing the episode titles by season and their dialogue ..
[CNN] Convolutional Neural Network (feat. Learning Word Vectors for Sentiment Analysis)
·
🗣️ Natural Language Processing
ai.stanford.edu/~amaas/data/sentiment/
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics.
Classification using the original data from the Stanford University AI research group. import glob import os f..
[doc2vec] Estimating Document Similarity
·
🗣️ Natural Language Processing
doc2vec extends the concept of word2vec to whole sentences or documents. The idea of learning word vectors by predicting the next word from the preceding words can be extended to learning sentence, paragraph, or document vectors. doc2vec also supports incremental learning: new documents are fed into an already trained model to generate new document vectors. At inference time, the algorithm computes the new document vectors from the frozen word-vector matrix and its weights, then appends them to the document matrix.
Training document vectors
Generate document vectors with the doc2vec functions in the gensim package.
Number of CPU cores to use:
import multiprocessing
num_cores = multiprocessing.cpu_count()
gensim's doc2vec and corpus document vectors..
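Once document vectors exist, similarity between two documents is commonly estimated with cosine similarity. The sketch below uses only the standard library and is not the post's gensim code. The vectors, the document names, and the helpers `cosine_similarity` and `most_similar` are hypothetical, made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional document vectors (made-up numbers, not model output).
doc_vectors = {
    "doc_a": [0.9, 0.1, 0.0, 0.3],
    "doc_b": [0.8, 0.2, 0.1, 0.4],
    "doc_c": [-0.2, 0.9, 0.7, 0.0],
}


def most_similar(query_vec, vectors):
    """Rank stored documents by cosine similarity to query_vec, best first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in vectors.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Stand-in for a vector inferred for a new, unseen document.
new_doc_vec = [1.0, 0.0, 0.0, 0.2]
ranking = most_similar(new_doc_vec, doc_vectors)
print(ranking)
```

In practice the vectors would come from the trained model rather than being typed in by hand; the ranking step stays the same.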
[Word2vec] Visualizing Word Relationships
·
🗣️ Natural Language Processing
Load the pretrained Google News word2vec model through nlpia. It contains 3 million words.
import os
from nlpia.loaders import get_data
from gensim.models.word2vec import Word2VecKeyedVectors
wv = get_data('word2vec')
len(wv.vocab)  # 3000000
You can see that n-gram words are joined with the '_' character.
import pandas as pd
from tqdm import tqdm
vocab = pd.Series(wv.vocab)
vocab.iloc[1000000:1000006]
# Starwood_Hotels_HOT Vocab(count:2000000, index:1000000)
# ..
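To make the '_' convention concrete without downloading the model, here is a small sketch on a hand-written list of keys. Only `Starwood_Hotels_HOT` comes from the excerpt above; the other entries and the helper `split_phrase` are hypothetical.

```python
# Hypothetical handful of vocabulary keys; only "Starwood_Hotels_HOT"
# appears in the excerpt above, the rest are made up for illustration.
sample_vocab = ["Starwood_Hotels_HOT", "king", "New_York", "queen"]


def split_phrase(token):
    """Split an n-gram vocabulary key back into its component words."""
    return token.split("_")


# Keys containing '_' are multi-word (n-gram) entries.
phrases = [token for token in sample_vocab if "_" in token]

for token in phrases:
    print(token, "->", split_phrase(token))
```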
Word2vec vs. GloVe
·
🗣️ Natural Language Processing
word2vec revolutionized NLP, but it depends heavily on a neural network that must be trained with backpropagation. By applying SVD, a Stanford research team obtained the same two weight matrices that word2vec produces, and found cases in which backpropagation fails to converge. GloVe addresses this by directly optimizing the global word co-occurrence counts, that is, co-occurrences across the whole corpus. Whereas word2vec generally needs a large corpus to train well, GloVe can also be trained on smaller corpora.
Advantages of GloVe: training is faster; RAM and CPU efficiency is better; small corpora can be used; and, when trained on the same data, it gives more accurate results than word2vec.
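The "global word co-occurrence counts" that GloVe optimizes over can be illustrated with a short sketch. This is a simplified count over a toy corpus, assuming a symmetric window and no distance weighting; GloVe's actual training objective is not shown. The function `cooccurrence_counts`, the `window` parameter, and the corpus are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often each unordered word pair occurs within `window` tokens.

    Simplification: every pair inside the window counts as 1, regardless of
    how far apart the two words are.
    """
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only to the right; sorting the pair makes the count symmetric.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                pair = tuple(sorted((word, tokens[j])))
                counts[pair] += 1
    return counts


# Toy corpus, made up for illustration.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
counts = cooccurrence_counts(corpus, window=2)

for pair, n in counts.most_common(3):
    print(pair, n)
```

Because the counts are accumulated once over the whole corpus, they summarize global statistics, unlike word2vec's training, which updates weights one context window at a time.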
다했다
B's