spongebob.fandom.com/wiki/Encyclopedia_SpongeBobia
To extract SpongeBob dialogue, we crawl the site that lists the episode titles for each season along with their transcripts.
import re
import pandas as pd
from urllib.request import urlopen
import glob
Extracting data with pd.read_html
o_site = 'https://spongebob.fandom.com/wiki/List_of_transcripts'
season1 = pd.read_html(o_site,header=0)[0]
season1.columns # '#', 'Title', 'Transcript'
Collect the titles and transcript addresses for seasons 1 through 13.
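# Download the page once; the first 13 tables are treated as one season each
tables = pd.read_html(o_site, header=0)
for i in range(13):
    total_site = []
    site = tables[i]
    for title in site.Title:
        temp = {'title': title,
                'addr': 'https://spongebob.fandom.com/wiki/{}/transcript'.format(re.sub(' ', '_', title))}
        total_site.append(temp)
    # Store each season as a global DataFrame named season1 ... season13
    globals()['season' + str(i + 1)] = pd.DataFrame(total_site)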
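The loop above builds each address by replacing spaces with underscores. Titles that contain characters such as `?` may also need percent-encoding, since an unescaped `?` would start a query string. The following is a minimal sketch of that idea using only the standard library. `transcript_url` and `BASE_URL` are hypothetical names introduced here, the title in the usage line is only an example input, and whether any title actually needs the extra encoding is an assumption that was not checked against the wiki.

```python
from urllib.parse import quote

# Same wiki root as the addresses built in the loop above
BASE_URL = "https://spongebob.fandom.com/wiki"


def transcript_url(title: str) -> str:
    """Build a transcript page URL from an episode title (hypothetical helper)."""
    # Same space-to-underscore rule as the original loop
    page = title.replace(" ", "_")
    # Percent-encode everything except unreserved characters, so "?" becomes "%3F"
    return "{}/{}/transcript".format(BASE_URL, quote(page, safe=""))


print(transcript_url("Help Wanted"))
```

For titles made only of letters, digits, and spaces, this should produce the same address as the original `re.sub` approach.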
Extract the title and content for each season, then save them as text files
from bs4 import BeautifulSoup
import requests
for season_no in range(13):
    season = globals()['season' + str(season_no + 1)]
    for title, addr in zip(season.title, season.addr):
        html = requests.get(addr).text
        soup = BeautifulSoup(html, 'html.parser')
        text = []
        # Collect the text of each top-level <ul> in the article body
        for ea in soup.select('.mw-parser-output > ul'):
            text.append(ea.text)
        # File name: spaces become underscores and '?' is dropped
        with open('{}.txt'.format(re.sub(r'\?', '', str(title).replace(' ', '_'))), 'w', encoding='utf-8') as f:
            for line in text:
                f.write(line)
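The saving step above removes only `?` from the title before using it as a file name. Other characters, such as `/` or `:`, can also be rejected by common file systems. Below is a rough sketch of a broader cleanup. `safe_filename` is a hypothetical helper, not part of the original code, and the set of characters it strips is an assumption based on names that Windows disallows.

```python
import re


def safe_filename(title: str) -> str:
    """Turn an episode title into a .txt file name (hypothetical helper)."""
    # Spaces become underscores, as in the original saving loop
    name = str(title).replace(" ", "_")
    # Drop characters that are commonly invalid in file names: \ / : * ? " < > |
    name = re.sub(r'[\\/:*?"<>|]', "", name)
    return name + ".txt"


print(safe_filename("A B?"))
```

The result could be passed to `open(...)` in place of the inline `'{}.txt'.format(...)` expression.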