BeautifulSoup output to .txt file

情深已故 2021-01-14 11:41

I am trying to export my data as a .txt file

from bs4 import BeautifulSoup
import requests
import os

os.getcwd()
'/home/folder'
os.mkdir("Prob

        
2 Answers
  •  一整个雨季
    2021-01-14 12:05

    I was working on a web-scraping project, and this issue gave me tons of problems. I tried almost every solution out there that dealt with Python encoding (convert to UTF-8 using string.encode(), convert to ASCII, convert using the 'unicodedata' module, use .decode() and then .encode(), blood sacrifice to Tim Peters, etc.).

    None of the solutions worked all the time, which struck me as very un-Pythonic.

    So what I ended up using was the following:

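    html = bs.prettify()  # bs is your BeautifulSoup object
    with open("out.txt", "w") as out:
        # Write one character at a time so that a character the file's
        # encoding cannot represent is skipped instead of aborting the write.
        for ch in html:
            try:
                out.write(ch)
            except Exception:
                pass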
    

    It's not perfect, but it gave me the best results. When I opened it in a browser, it was able to parse the page properly almost every time.
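
A possible alternative to the per-character loop, sketched under the assumption that you are on Python 3 and that the failures come from open() falling back to a platform default encoding that cannot represent some characters: pass the encoding explicitly. The helper name save_text and the sample string are hypothetical; in a real script the text would come from bs.prettify() or bs.get_text().

```python
import os
import tempfile


def save_text(text, path, encoding="utf-8", errors="strict"):
    """Hypothetical helper: write text to path with an explicit encoding."""
    with open(path, "w", encoding=encoding, errors=errors) as out:
        out.write(text)


# Stand-in for `html = bs.prettify()`; it contains non-ASCII characters.
html = "<p>caf\u00e9 \u2603</p>\n"

# Write to a temporary directory so the sketch does not touch your project.
out_dir = tempfile.mkdtemp()

# UTF-8 can represent every Unicode character, so nothing is dropped.
utf8_path = os.path.join(out_dir, "out.txt")
save_text(html, utf8_path)

# Lossy variant, similar in spirit to the loop above: characters that the
# target encoding cannot represent are silently skipped.
ascii_path = os.path.join(out_dir, "out_ascii.txt")
save_text(html, ascii_path, encoding="ascii", errors="ignore")
```

If the file has to be plain ASCII, the errors="ignore" variant drops the unencodable characters in one call; otherwise UTF-8 keeps the page intact.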
