HTTPError: HTTP Error 403: Forbidden

无人共我 2020-12-03 05:19

I'm writing a Python script for personal use, but it's not working for Wikipedia...

This works:

import urllib2, sys
from bs4 import BeautifulSoup

site =         


        
1 answer
  • 2020-12-03 05:45

    Add a User-Agent header to the request in your current code:

    Python 2.X

    import urllib2, sys
    from BeautifulSoup import BeautifulSoup
    
    site= "http://en.wikipedia.org/wiki/StackOverflow"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site,headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    print soup
    

    Python 3.X

    from bs4 import BeautifulSoup
    from urllib.request import Request, urlopen
    
    site= "http://en.wikipedia.org/wiki/StackOverflow"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = Request(site,headers=hdr)
    page = urlopen(req)
    # Name the parser explicitly; bs4 warns when it has to guess one.
    soup = BeautifulSoup(page, "html.parser")
    print(soup)
    

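    Python 3.X with Selenium (a real browser, so JavaScript is executed)

    from selenium import webdriver
    
    # PhantomJS support was removed in Selenium 4; Firefox is used here as a
    # stand-in and needs a local Firefox installation.
    browser = webdriver.Firefox()
    browser.get("http://en.wikipedia.org/wiki/StackOverflow")  # get() returns None
    assert "Stack Overflow - Wikipedia" in browser.title
    browser.quit()
    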
    

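    The modified version works because Wikipedia checks the User-Agent header. A request sent with urllib's default User-Agent is rejected with 403 Forbidden, while one that identifies itself like a popular browser is accepted.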
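
To see the difference without sending any request, here is a minimal sketch that compares the header urllib would send by default with the one set explicitly. The helper name `build_request` is made up for illustration, and "Mozilla/5.0" is just the value used in the snippets above.

```python
from urllib.request import Request, build_opener


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


# Constructing the Request does not open a connection.
req = build_request("http://en.wikipedia.org/wiki/StackOverflow")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))

# A fresh opener carries the library default, "Python-urllib/<version>".
default_agent = dict(build_opener().addheaders)["User-agent"]
print(default_agent)
```

Passing `req` to `urlopen` sends the explicit header in place of the default.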
