
spongebob.fandom.com/wiki/Encyclopedia_SpongeBobia

 

Encyclopedia SpongeBobia

Encyclopedia SpongeBobia is the SpongeBob SquarePants encyclopedia that anyone can edit, and we need your help! We chronicle everything SpongeBob SquarePants, which is a show that follows SpongeBob, a little yellow sponge, whose adventures have captivated

spongebob.fandom.com

์Šคํฐ์ง€๋ฐฅ์˜ ๋Œ€์‚ฌ๋ฅผ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•ด Season๋ณ„ ํƒ€์ดํ‹€๊ณผ ๊ทธ ๋Œ€์‚ฌ๊ฐ€ ๋‹ด๊ธด ์‚ฌ์ดํŠธ๋ฅผ ํฌ๋กค๋งํ•œ๋‹ค.

import re
import pandas as pd
from urllib.request import urlopen
import glob

Extracting the data with pd.read_html

o_site = 'https://spongebob.fandom.com/wiki/List_of_transcripts'
season1 = pd.read_html(o_site,header=0)[0]
season1.columns # '#', 'Title', 'Transcript'

Store the titles and transcript addresses for every season up to season 13.

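# Fetch the page once; tables 0..12 are taken to be seasons 1-13.
tables = pd.read_html(o_site, header=0)
for i in range(13):
    total_site = []
    for title in tables[i].Title:
        # Wiki URLs use underscores in place of spaces.
        temp = {'title': title,
                'addr': 'https://spongebob.fandom.com/wiki/{}/transcript'.format(title.replace(' ', '_'))}
        total_site.append(temp)
    # Expose the result as season1 ... season13.
    globals()['season'+str(i+1)] = pd.DataFrame(total_site)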
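
The step above only replaces spaces with underscores, so a title containing a character such as `?` would produce an address that cuts off at the query string. Below is a minimal sketch of one way to handle that, assuming percent-encoding the title is acceptable to the wiki. The names `transcript_url` and `build_index` are hypothetical helpers, not part of the original code, and the sketch keeps the seasons in a plain dict instead of `globals()`.

```python
from urllib.parse import quote

# URL pattern taken from the loop above.
BASE = "https://spongebob.fandom.com/wiki/{}/transcript"


def transcript_url(title: str) -> str:
    """Hypothetical helper: build a transcript URL from an episode title."""
    # Spaces become underscores, then unsafe characters are percent-encoded.
    slug = quote(title.replace(" ", "_"))
    return BASE.format(slug)


def build_index(titles_by_season: dict) -> dict:
    """Hypothetical helper: map season number -> list of title/addr records."""
    return {
        season: [{"title": t, "addr": transcript_url(t)} for t in titles]
        for season, titles in titles_by_season.items()
    }


# Example with two season 1 titles supplied by hand.
index = build_index({1: ["Help Wanted", "Reef Blower"]})
print(index[1][0]["addr"])
```

Each list in the dict can be passed to `pd.DataFrame(...)` if a DataFrame per season is still wanted.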

Extract each season's titles and transcript text, then save them as text files

from bs4 import BeautifulSoup
import requests

for n in range(13):
    season = globals()['season'+str(n+1)]
    for title, addr in zip(season.title, season.addr):
        html = requests.get(addr).text
        soup = BeautifulSoup(html, 'html.parser')
        # Collect the text of every top-level <ul> in the article body.
        text = [ea.text for ea in soup.select('.mw-parser-output > ul')]
        # Build the file name: spaces become underscores, '?' is dropped.
        fname = '{}.txt'.format(re.sub(r'\?', '', str(title).replace(' ', '_')))
        with open(fname, 'w', encoding='utf-8') as f:
            for line in text:
                f.write(line)
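
The file name above only strips `?`, but other characters that can appear in a title (for example `/` or `:`) are also invalid in file names on some systems. A minimal sketch of a broader cleanup follows; `safe_filename` is a hypothetical helper, and the set of characters it removes is an assumption based on common Windows and POSIX restrictions.

```python
import re

# Characters assumed to be unsafe in file names on common platforms.
UNSAFE = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title: str, ext: str = ".txt") -> str:
    """Hypothetical helper: turn an episode title into a file name."""
    # Spaces become underscores first, then unsafe characters are removed.
    name = UNSAFE.sub("", str(title).replace(" ", "_"))
    return name + ext


print(safe_filename("Help Wanted"))
```

Calling `open(safe_filename(title), 'w', encoding='utf-8')` would then stand in for the inline `re.sub` expression.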
All done.