Pycurl javascript

Posted by 烂漫一生 on 2019-12-24 11:05:58

Question


I created a python 3 script that allows me to search on a search engine (DuckDuckGo), get the HTML source code and write it in a textfile.

import pycurl
from io import BytesIO

# Collect the response body in an in-memory buffer.
buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://duckduckgo.com/?q=test')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.FOLLOWLOCATION, True)  # follow HTTP redirects
c.perform()
c.close()

body = buffer.getvalue()
# Write the raw bytes; str(body) would write the b'...' repr instead of the HTML.
with open("output.htm", "wb") as text_file:
    text_file.write(body)
print(body.decode('utf-8', errors='replace'))

That part of the code runs without errors. However, when I open the output.htm file containing the HTML source code of the search page, I don't see any search results (I only get an input box with my search term in it). I would like to get the same HTML source code that I would get by running curl https://duckduckgo.com/?q=test in my terminal.


Answer 1:


DuckDuckGo's HTML pages use JavaScript to load the search results into the markup, so curl or PycURL cannot return the same HTML content you'd see in a browser: curl/PycURL merely fetches the resource and does not execute any JavaScript.
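
One rough way to see this for yourself is to inspect the markup that was actually fetched. The sketch below is only an illustration, not a reliable detector: it counts script tags and links in an HTML string using the standard library parser. The names TagCounter and count_tags are made up for this example. A page that has scripts but almost no links where you expected results is probably filled in client-side.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count <script> and <a> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.scripts = 0
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        elif tag == "a":
            self.links += 1


def count_tags(html):
    """Return (number of <script> tags, number of <a> tags) found in html."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.scripts, counter.links
```

To try it on the saved page, read output.htm into a string and pass it to count_tags.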

Instead of scraping the page, use the DuckDuckGo API (https://duckduckgo.com/api) to query their servers for search results.
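
A minimal sketch of what that could look like with PycURL follows. It assumes the JSON endpoint at https://api.duckduckgo.com/ with the q, format and no_html parameters, and the response fields Heading, AbstractText and RelatedTopics; check the API page for the fields it currently returns. The helper names build_api_url, fetch and summarize are hypothetical.

```python
import json
from io import BytesIO
from urllib.parse import urlencode


def build_api_url(query):
    """Build a query URL for the JSON endpoint (assumed: api.duckduckgo.com)."""
    params = {"q": query, "format": "json", "no_html": 1}
    return "https://api.duckduckgo.com/?" + urlencode(params)


def fetch(url):
    """Fetch a URL with PycURL and return the raw response bytes."""
    import pycurl  # imported here so the other helpers work without PycURL

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, buffer)
    c.setopt(c.FOLLOWLOCATION, True)
    c.perform()
    c.close()
    return buffer.getvalue()


def summarize(raw):
    """Pick a few fields out of the JSON response (field names are assumptions)."""
    data = json.loads(raw.decode("utf-8"))
    return {
        "heading": data.get("Heading", ""),
        "abstract": data.get("AbstractText", ""),
        "related": [t["Text"] for t in data.get("RelatedTopics", []) if "Text" in t],
    }
```

Calling summarize(fetch(build_api_url("test"))) should then give a small dictionary instead of a JavaScript-driven page. Because the response is plain JSON, no browser or JavaScript engine is needed to read it.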



Source: https://stackoverflow.com/questions/52550953/pycurl-javascript
