Web Scraping using Python giving HTTP Error 404: Not Found

野的像风 2021-02-11 04:10

I am brand new to Python and not very good at it yet. I am trying to scrape a website called Transfermarkt (I'm a big football fan), but it's giving me HTTP Error 404 w

1 answer
  • 2021-02-11 04:36

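    As Rup mentioned above, the server may be rejecting your request because of its User-Agent header. By default urllib identifies itself as Python-urllib/3.x, which some sites refuse to serve.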

    Try augmenting your code with the following:

    import urllib.request  # we are going to need to generate a Request object
    from bs4 import BeautifulSoup as soup
    
    my_url = "https://www.transfermarkt.com/chelsea-fc/leihspielerhistorie/verein/631/plus/1?saison_id=2018&leihe=ist"
    
    # here we define the headers for the request
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) Gecko/20100101 Firefox/63.0'}
    
    # this request object will integrate your URL and the headers defined above
    req = urllib.request.Request(url=my_url, headers=headers)
    
    # using urlopen in a with block closes the response automatically
    with urllib.request.urlopen(req) as response:
        page_html = response.read()
    

    After the code above you can continue your analysis. The Python docs have some useful pages on this topic:

    https://docs.python.org/3/library/urllib.request.html#examples

    https://docs.python.org/3/library/urllib.request.html
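
    One of the examples on the first page sets the header on an opener rather than on each Request. A minimal sketch of that approach, assuming you want the same Firefox string as above on every call (the constant name USER_AGENT is just for illustration):

```python
import urllib.request

# Assumption: reuse the Firefox User-Agent string from the snippet above
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) "
    "Gecko/20100101 Firefox/63.0"
)

# build_opener() returns an OpenerDirector; the headers listed in
# addheaders are added to every request sent through it
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", USER_AGENT)]

# After install_opener(), plain urllib.request.urlopen(url) calls
# go through this opener and carry the header as well
urllib.request.install_opener(opener)
```

    With this in place you should be able to call urllib.request.urlopen(my_url) directly, without building a Request object each time.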

    Mozilla's documentation has a load of user-agent strings to try:

    https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
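
    To check whether a given user-agent string actually changes the outcome, it can help to catch the error and look at the status code instead of letting the exception end the script. A rough sketch follows; the helper name fetch and its opener parameter are made up for this example and are not part of urllib:

```python
import urllib.error
import urllib.request

# Same browser-like User-Agent string as in the snippet above
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) "
    "Gecko/20100101 Firefox/63.0"
)


def fetch(url, user_agent=BROWSER_UA, opener=urllib.request.urlopen):
    """Return (status, body); body is None if the server sent an HTTP error.

    Hypothetical helper: `opener` exists only so the network call can be
    swapped out, and defaults to urllib.request.urlopen.
    """
    req = urllib.request.Request(url=url, headers={"User-Agent": user_agent})
    try:
        with opener(req) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # err.code holds the HTTP status code, e.g. 404
        return err.code, None
```

    Calling fetch(my_url) and then fetch(my_url, user_agent="...") with a different string lets you compare the returned status codes side by side. If every string still gives 404, the URL itself is worth double-checking.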
