Issue scraping with Beautiful Soup

心在旅途 2021-02-10 00:48

I've scraped other websites with this same technique before, but it doesn't seem to work on this one.

import urllib2
from BeautifulSoup import BeautifulSoup
2 Answers
  •  陌清茗 (original poster)
    2021-02-10 01:27

    "But I want to know why I get a GIF when I access the URL like that, while in my browser the website loads perfectly."

    Because the site's operators don't want it accessed from outside a web browser, so a request that doesn't look like it comes from a browser gets the GIF instead of the page. What you need to do is imitate a known browser by adding a User-agent header to the request. Here is a modified example that should work:

    >>> import urllib2
    >>> opener = urllib2.build_opener()
    >>> opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    >>> url = "http://www.weatheronline.co.uk/weather/maps/current?LANG=en&DATE=1354104000&CONT=euro&LAND=UK&KEY=UK&SORT=1&INT=06&TYP=sonne&ART=tabelle&RUBRIK=akt&R=310&CEL=C"
    >>> response = opener.open(url)
    >>> page = response.read()
    >>> from BeautifulSoup import BeautifulSoup
    >>> soup = BeautifulSoup(page)
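
The snippet above targets Python 2 (`urllib2` and the old `BeautifulSoup` package). For anyone on Python 3, here is a rough sketch of the same idea. The helper names `build_request` and `fetch_soup` are made up for illustration, and it assumes the `bs4` package is installed; whether `Mozilla/5.0` alone is still enough for a given site is something you would have to try.

```python
import urllib.request

# Same minimal browser-like value the answer above uses.
BROWSER_UA = "Mozilla/5.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header.

    Hypothetical helper, not part of urllib or Beautiful Soup.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_soup(url, timeout=10):
    """Fetch url with the spoofed header and parse the response.

    Assumes bs4 is installed; it is imported here so that
    build_request stays usable without it.
    """
    from bs4 import BeautifulSoup

    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        page = response.read()
    # "html.parser" is the parser bundled with Python; lxml would also work if installed.
    return BeautifulSoup(page, "html.parser")
```

Calling `fetch_soup(url)` with the URL from the example above would then play the role of the last few lines of the Python 2 session.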
    
