Question
I wrote a Python 3 script that runs a query on a search engine (DuckDuckGo), fetches the HTML source code, and writes it to a text file.
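import pycurl
from io import BytesIO

# Collect the response body in an in-memory buffer
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://duckduckgo.com/?q=test')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.FOLLOWLOCATION, True)  # follow HTTP redirects
c.perform()
c.close()

body = buffer.getvalue()  # raw bytes
# Write the bytes unchanged; str(body) would store the b'...' repr instead
with open("output.htm", "wb") as text_file:
    text_file.write(body)
print(body.decode('utf-8', errors='replace'))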
That part of the code works. However, when I open output.htm, which should contain the HTML source of the search results page, I don't see any results (only an input field with my search term in it). I would like to get the same HTML source code that I would get by running curl https://duckduckgo.com/?q=test in my terminal.
Answer 1:
DuckDuckGo's HTML pages use JavaScript to load the search results into the markup, so curl or PycURL cannot retrieve the same HTML content you would see in a browser: curl/PycURL merely fetches resources over the network and does not execute any JavaScript.
Instead of scraping the page, use the DuckDuckGo API (https://duckduckgo.com/api) to retrieve results from their servers.
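As a rough sketch of the API route, the snippet below assumes the API in question is DuckDuckGo's Instant Answer API at https://api.duckduckgo.com/, queried with the q, format=json and no_html parameters. The helper names build_api_url and fetch_instant_answer are made up for illustration, and the standard library's urllib is used in place of PycURL only to keep the example dependency-free. Note that this API returns instant answers (a heading, an abstract, related topics) rather than the full list of web results a browser shows.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint of the DuckDuckGo Instant Answer API
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_api_url(query):
    """Build the request URL for a query (hypothetical helper)."""
    params = {"q": query, "format": "json", "no_html": 1}
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_instant_answer(query, timeout=10):
    """Fetch the response and parse it as JSON (needs network access)."""
    request = Request(build_api_url(query),
                      headers={"User-Agent": "example-script"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a real HTTP request, so it is left commented out):
# data = fetch_instant_answer("test")
# print(data.get("Heading"))
# print(data.get("AbstractText"))
```

Because the response is already structured JSON, no JavaScript execution or HTML parsing is needed on the client side.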
Source: https://stackoverflow.com/questions/52550953/pycurl-javascript