Why can't I scrape Amazon with BeautifulSoup?

谎友^ 2021-01-17 04:10

Here is my python code:

import urllib2
from bs4 import BeautifulSoup

page = urllib2.urlopen("http://www.amazon.com/")
soup = BeautifulSoup(page)
print soup


        
4 Answers
  • 2021-01-17 04:25

    Add a header, then it will work.

    from bs4 import BeautifulSoup
    import requests
    url = "http://www.amazon.com/"
    
    # add header
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, "lxml")
    print(soup)
    
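Once the page is fetched with a real User-Agent, the BeautifulSoup side works the same on any HTML. A minimal offline sketch (Python 3, using the stdlib `html.parser` backend so `lxml` is not required; the canned HTML is just a stand-in for the Amazon response):

```python
from bs4 import BeautifulSoup

# a small canned document stands in for the fetched page
html = "<html><head><title>Example</title></head><body><a href='/deal'>Deal</a></body></html>"

soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)       # -> Example
print(soup.find("a")["href"])  # -> /deal
```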
  • 2021-01-17 04:32

    I just ran into this and found that setting any user-agent will work. You don't need to lie about your user agent.

    # Ruby, using HTTParty
    response = HTTParty.get @url, headers: {'User-Agent' => 'Httparty'}
    
  • 2021-01-17 04:40

Add a header, and make sure it is actually passed to the request:

    import urllib2
    from bs4 import BeautifulSoup
    
    headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36'}
    
    # the headers must be attached to a Request object, otherwise they are never sent
    request = urllib2.Request("http://www.amazon.com/", headers=headers)
    page = urllib2.urlopen(request)
    soup = BeautifulSoup(page, "html.parser")
    print soup
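The `urllib2` module above is Python 2 only; in Python 3 the same fix lives in `urllib.request`. A minimal sketch that only builds the request and checks the header is attached (no network call, so nothing about Amazon's response is assumed):

```python
import urllib.request

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36'}

# headers given to the Request object travel with it; urlopen(url) alone sends only the default Python-urllib agent
request = urllib.request.Request("http://www.amazon.com/", headers=headers)

# urllib normalizes header names to Capitalized-lowercase form
print(request.get_header("User-agent"))
# then: page = urllib.request.urlopen(request); soup = BeautifulSoup(page, "html.parser")
```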
    
  • 2021-01-17 04:42

    You can try this:

    import urllib2
    from bs4 import BeautifulSoup
    
    page = urllib2.urlopen("http://www.amazon.com/")
    soup = BeautifulSoup(page, "html.parser")
    print soup
    

    In Python, arbitrary text is called a string, and it must be enclosed in quotes (" ").
