Fetch a Wikipedia article with Python

余生分开走 2020-11-27 15:37

I am trying to fetch a Wikipedia article with Python's urllib:

import urllib.request  # Python 3; the original urllib.urlopen only exists in Python 2
f = urllib.request.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")

10 answers
  • 2020-11-27 16:19

    requests is awesome!

    Here is how you can get the HTML content with requests:

    import requests
    html = requests.get('http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes').text
    

    Done!

  • 2020-11-27 16:21

    This is not a solution to your specific problem, but you might find it interesting to use the mwclient library (http://botwiki.sno.cc/wiki/Python:Mwclient) instead. It would be much easier, especially since you get the article contents directly, which removes the need to parse the HTML.

    I have used it myself for two projects, and it works very well.
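A minimal sketch of the mwclient route, assuming a current mwclient release where `mwclient.Site(host, clients_useragent=...)`, `site.pages[title]`, `Page.exists` and `Page.text()` are available. The function name `fetch_article_text` and the User-Agent string are hypothetical. mwclient is imported inside the function only so the snippet can still be loaded where the package is not installed.

```python
def fetch_article_text(title, host="en.wikipedia.org"):
    """Return the wikitext of `title` via mwclient (needs mwclient and network access)."""
    import mwclient  # third-party package: pip install mwclient

    # Hypothetical User-Agent string identifying your tool to the wiki.
    site = mwclient.Site(host, clients_useragent="ExampleArticleFetcher/0.1 (you@example.org)")
    page = site.pages[title]
    if not page.exists:
        raise LookupError(f"No page titled {title!r} on {host}")
    # Raw wikitext of the current revision, not rendered HTML.
    return page.text()


if __name__ == "__main__":
    print(fetch_article_text("Albert Einstein")[:200])
```

Because `page.text()` returns wikitext rather than the rendered page, there is no HTML to parse.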

  • 2020-11-27 16:22

    Rather than trying to trick Wikipedia, you should consider using their high-level API (the MediaWiki web API).
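A minimal sketch of the API route using only the standard library, assuming the `action=parse` module of the MediaWiki API at `https://en.wikipedia.org/w/api.php` with `formatversion=2`, so that `wikitext` comes back as a plain string. The helper names and the User-Agent string are hypothetical; Wikimedia asks API clients to send a descriptive User-Agent, so replace it with your own tool name and contact.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; it should identify your tool and give a contact.
USER_AGENT = "ExampleArticleFetcher/0.1 (contact: you@example.org)"


def build_parse_url(title):
    """Build an action=parse URL that returns the page's wikitext as JSON."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # wikitext is a plain string in this format version
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_wikitext(payload):
    """Pull the wikitext out of a decoded action=parse response."""
    if "error" in payload:
        raise RuntimeError(payload["error"].get("info", "MediaWiki API error"))
    return payload["parse"]["wikitext"]


def fetch_wikitext(title):
    """Download and return the wikitext of one article (needs network access)."""
    request = urllib.request.Request(build_parse_url(title), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return extract_wikitext(payload)


if __name__ == "__main__":
    print(fetch_wikitext("Albert Einstein")[:200])
```

Requesting `prop=text` instead of `prop=wikitext` asks the same module for the rendered HTML of the article body.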

  • 2020-11-27 16:22

    The general solution I use for any site is to access the page using Firefox and, using an extension such as Firebug, record all details of the HTTP request including any cookies.

    In your program (in this case in Python) you should try to send an HTTP request that matches the one that worked from Firefox as closely as necessary. This often means setting the User-Agent, Referer and Cookie headers, but others may be needed.
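A minimal sketch of that approach with the standard library's `urllib.request`, the module the question uses. The header values are placeholders, not values from this answer: copy the ones your browser actually sent, and add a `Cookie` entry only if the site needs one. `build_request` and `fetch_html` are hypothetical helper names.

```python
import urllib.request

ARTICLE_URL = "http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes"

# Placeholder values: replace them with the headers recorded from your browser.
# Add a "Cookie" entry here if the site requires a session cookie.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
    "Referer": "http://en.wikipedia.org/",
}


def build_request(url, headers):
    """Wrap a URL in a Request that carries the given HTTP headers."""
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, headers):
    """Send the request and decode the body as text (needs network access)."""
    with urllib.request.urlopen(build_request(url, headers)) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    print(fetch_html(ARTICLE_URL, BROWSER_HEADERS)[:200])
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so the User-Agent is stored and looked up as `"User-agent"`.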
