Applying Korean Fonts in Matplotlib
·
🐍 Python
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

font_location = 'C:\\Windows\\Fonts\\H2SA1M.ttf'  # font file location
font_name = fm.FontProperties(fname=font_location).get_name()  # family name read from the font file
plt.rc('font', family=font_name)  # apply the font
Building a Chatbot (1)
·
🗣️ Natural Language Processing
์ž์—ฐ์–ด ์ฒ˜๋ฆฌ์˜ ํ™œ์šฉ๋นˆ๋„๊ฐ€ ๊ฐ€์žฅ ๋†’์€ ์ฑ—๋ด‡์„ ๋งŒ๋“ค์–ด ๋ณธ๋‹ค. ๋‹จ์ˆœํ•˜๊ฒŒ ๊ทœ์น™ ๊ธฐ๋ฐ˜์œผ๋กœ ์ œ์ž‘, ๋จธ์‹ ๋Ÿฌ๋‹ ์œ ์‚ฌ๋„ ํ™œ์šฉ ๋“ฑ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์ด ์žˆ์ง€๋งŒ ๋”ฅ๋Ÿฌ๋‹์„ ํ†ตํ•ด ์‹ค์Šต์„ ํ•œ๋‹ค. ๋”ฅ๋Ÿฌ๋‹์—์„œ๋„ Sequence to sequence ๋ชจ๋ธ์„ ํ™œ์šฉํ•ด ์ฑ—๋ด‡์„ ์ œ์ž‘ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•œ๋‹ค. Data : github.com/songys/Chatbot_data songys/Chatbot_data Chatbot_data_for_Korean. Contribute to songys/Chatbot_data development by creating an account on GitHub. github.com ( http://cafe116.daum.net/_c21_/home?grpid=1bld )์—์„œ ์ž์ฃผ ๋‚˜์˜ค๋Š” ์ด์•ผ๊ธฐ๋“ค์„ ์ฐธ๊ณ ํ•˜์—ฌ ์ œ์ž‘ ์ž๋ฃŒ๋ฅผ ์˜ค..
MaLSTM
·
🗣️ Natural Language Processing
############## MaLSTM model ############## Sentence similarity is computed with an LSTM-family model. The MaLSTM model was first introduced in a 2016 paper by Jonas Mueller at MIT. It is trained on text in sequence form and showed better performance than a conventional RNN on long-range learning. MaLSTM is short for Manhattan Distance + LSTM. Instead of cosine similarity, it uses the Manhattan distance (L1). The final-step values, $h_5^{a}$ of $LSTM_a$ and $h_4^{b}$ of $LSTM_b$, are used as the hidden state vectors. Each value reflects information about every word in the sentence, so it serves as a vector representing the whole sentence. ..
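The Manhattan-distance comparison described above can be sketched in a few lines. This is a minimal illustration, not the post's own code: the function name `manhattan_similarity` and the example vectors are hypothetical, and the `exp(-d)` mapping from distance to similarity follows the Mueller paper rather than anything shown in this excerpt.

```python
import math


def manhattan_similarity(h_a, h_b):
    """Return exp(-||h_a - h_b||_1), a similarity score in (0, 1]."""
    if len(h_a) != len(h_b):
        raise ValueError("hidden state vectors must have the same length")
    # L1 (Manhattan) distance between the two final hidden states
    l1_distance = sum(abs(a - b) for a, b in zip(h_a, h_b))
    # Identical vectors give 1.0; the score decays toward 0 as they diverge
    return math.exp(-l1_distance)


# Hypothetical final hidden states of the two LSTMs (made-up numbers)
h_a = [0.1, -0.3, 0.5]
h_b = [0.1, -0.1, 0.2]
print(manhattan_similarity(h_a, h_b))
```

Because the score is bounded between 0 and 1, it can be compared directly against a 0/1 "same meaning" label during training.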
[Kaggle] Naver Movie Review Classification (1)
·
🗣️ Natural Language Processing
www.kaggle.com/c/dfc615k/data (DFC615K: DFC615 Natural Language Processing Task 1)
NSMC: a binary-class dataset in which the star ratings attached to Naver movie reviews are converted to positive/negative labels.

# kaggle-nsmc
import os
import zipfile

def extractall(path, s_path, info=None, f_type=None):
    file_list = os.listdir(path)  # every entry in the source directory
    for file in file_list:
        try:
            # compare the extension exactly; `in "zip"` would also match "z" or "ip"
            if file.split('.')[1] == "zip":
                zipRef = zipfile.ZipFile(path + file, 'r')
                zipRef.extractall(s_path)
                #..
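Since the excerpt's helper is cut off, here is a self-contained sketch of the same idea: unpack every `.zip` archive in a directory. It is not a reconstruction of the truncated code. The name `extract_all_zips`, the return value, and the sample file names are assumptions made for illustration.

```python
import os
import tempfile
import zipfile


def extract_all_zips(src_dir, dest_dir):
    """Extract every .zip archive found directly in src_dir into dest_dir."""
    extracted = []
    for name in sorted(os.listdir(src_dir)):
        full_path = os.path.join(src_dir, name)
        # Only regular files whose name ends in .zip are treated as archives
        if os.path.isfile(full_path) and name.lower().endswith(".zip"):
            with zipfile.ZipFile(full_path, "r") as archive:
                archive.extractall(dest_dir)
            extracted.append(name)
    return extracted


# Usage with a throwaway archive (file names are made up)
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dest:
    with zipfile.ZipFile(os.path.join(src, "sample.zip"), "w") as archive:
        archive.writestr("ratings.txt", "id\tdocument\tlabel\n")
    print(extract_all_zips(src, dest))
    print(os.listdir(dest))
```

Using `os.path.join` avoids depending on a trailing slash in the directory path, and the `with` block closes each archive even if extraction fails.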
PCA, SVD Latent Semantic Analysis
·
🗣️ Natural Language Processing
=== PCA === Apply scikit-learn's PCA model to SMS messages.

import pandas as pd
from nlpia.data.loaders import get_data

sms = get_data("sms-spam")
sms.head()
index = ['sms{}{}'.format(i, '!'*j) for (i, j) in zip(range(len(sms)), sms.spam)]
sms.index = index

# Compute the TF-IDF vector of each message
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize.casual import casual_tokenize
tfidf = TfidfVectorizer(t..
[DL] GRU (gated recurrent unit)
·
👾 Deep Learning
Gated Recurrent Unit: a gate structure for recurrent neural networks that improves on the LSTM. It was first proposed in 2014 by Professor Kyunghyun Cho of New York University and six co-authors. The GRU has an update gate that merges the input gate and the forget gate. The memory cell has no output gate; instead it has a reset gate that selects which memory inherited from the past to keep. With these gates in operation, the GRU can carry long-term memory forward as the LSTM does. +: element-wise sum; ×: element-wise product; 1-: subtract the received value from 1; σ: sigmoid function; r: reset gate; z: update gate; h: new memory; x: input to the network layer at time t; h: output of the previous time step t-1. Each of the two gates has its own learnable parameters. There are also learnable parameters for the part that uses tanh as a further activation function. ..
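The gate legend above can be traced in a single GRU step. The sketch below uses a hidden size of 1, so the element-wise sum and product become ordinary arithmetic. The parameter names (`w_z`, `u_z`, `b_z` and so on) and the example weights are hypothetical. The final blend `(1 - z) * h_prev + z * h_new` is one common convention; some formulations swap the roles of `z` and `1 - z`, and the excerpt does not say which one its figure uses.

```python
import math


def sigmoid(value):
    """Logistic function, the σ in the legend."""
    return 1.0 / (1.0 + math.exp(-value))


def gru_step(x, h_prev, params):
    """One GRU step for scalar input x and scalar previous output h_prev."""
    # z: update gate, decides how much new memory replaces the old state
    z = sigmoid(params["w_z"] * x + params["u_z"] * h_prev + params["b_z"])
    # r: reset gate, selects how much of the inherited memory is used
    r = sigmoid(params["w_r"] * x + params["u_r"] * h_prev + params["b_r"])
    # New memory, computed with tanh from the input and the reset-scaled past
    h_new = math.tanh(params["w_h"] * x + params["u_h"] * (r * h_prev) + params["b_h"])
    # "1-" in the legend: blend old state and new memory with weights (1 - z) and z
    return (1.0 - z) * h_prev + z * h_new


# Hypothetical parameters and a short input sequence (made-up numbers)
params = {"w_z": 0.5, "u_z": 0.1, "b_z": 0.0,
          "w_r": 0.3, "u_r": 0.2, "b_r": 0.0,
          "w_h": 0.8, "u_h": 0.4, "b_h": 0.0}
h = 0.0
for x in [1.0, -0.5, 0.25]:
    h = gru_step(x, h, params)
    print(h)
```

The step has three groups of learnable parameters: one for each of the two gates and one for the tanh branch, which matches the description in the excerpt.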
다했다
B's