Scraper in Python gives “Access Denied”

青春惊慌失措 2020-12-19 12:35

I'm trying to write a scraper in Python to get some information from a page, such as the titles of the offers that appear on this page:
https://www.justdial.com/Panipat/Saree-Retai

3 Answers
  • 2020-12-19 12:42

    As mentioned in the comments, you need to specify an allowed User-Agent and pass it in the request headers:

    import requests

    def extract_source(url):
        # Identify as a regular browser; the User-Agent string is passed as a header.
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
        source = requests.get(url, headers=headers).text
        return source
    
  • 2020-12-19 12:59
    import requests

    def extract_source(url):
        # A short browser-like User-Agent is enough here.
        headers = {"User-Agent": "Mozilla/5.0"}
        source = requests.get(url, headers=headers).text
        return source
    

    Output:

    <title>Saree Retailers in Panipat - Best Deals online - Justdial</title>
    

    Add a User-Agent header to your request. Some sites do not respond to requests that do not have one.
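To see what the answers above are changing, the sketch below builds two GET requests without sending them and compares the User-Agent header each would carry. The helper name `prepare_get` and the constant `BROWSER_UA` are made up for this illustration; the calls to `requests.Session`, `requests.Request` and `Session.prepare_request` are standard `requests` API. It only inspects headers and does not contact the site.

```python
import requests

# Browser-like User-Agent string taken from the first answer.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) "
    "Gecko/20100101 Firefox/50.0"
)


def prepare_get(url, user_agent=None):
    """Build (but do not send) a GET request so its headers can be inspected.

    Hypothetical helper for illustration only.
    """
    session = requests.Session()
    if user_agent is not None:
        # Override the library default for every request made by this session.
        session.headers["User-Agent"] = user_agent
    # prepare_request merges the session's default headers into the request.
    return session.prepare_request(requests.Request("GET", url))


default_req = prepare_get("https://www.justdial.com/")
browser_req = prepare_get("https://www.justdial.com/", BROWSER_UA)

# Without an override, requests announces itself as "python-requests/<version>",
# which is presumably what the site refuses.
print(default_req.headers["User-Agent"])
# With the override, the request carries the browser-like string instead.
print(browser_req.headers["User-Agent"])
```

If a real request still comes back with "Access Denied", calling `raise_for_status()` on the response makes the failure explicit instead of silently returning the error page's HTML.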

  • 2020-12-19 13:08

    Try this:

    import bs4
    import requests

    def extract_source(url):
        # Send a browser-like User-Agent; some sites reject requests without one.
        agent = {"User-Agent": "Mozilla/5.0"}
        source = requests.get(url, headers=agent).text
        return source

    def extract_data(source):
        # Parse the HTML with the lxml parser and print every <title> element.
        soup = bs4.BeautifulSoup(source, 'lxml')
        names = soup.find_all('title')
        for i in names:
            print(i)

    extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))
    

    I passed 'lxml' explicitly as the parser to avoid a potential parse error.
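Naming 'lxml' only works if the lxml package is installed. A minimal sketch of one way to handle that, assuming it is acceptable to fall back to the parser bundled with Python: the helper `make_soup` is hypothetical, and the sample HTML is just the title from the output quoted in the answer above wrapped in a small page, so nothing is fetched.

```python
import bs4


def make_soup(source):
    """Parse with lxml when available, otherwise with Python's built-in parser.

    Hypothetical helper for illustration only.
    """
    try:
        return bs4.BeautifulSoup(source, "lxml")
    except bs4.FeatureNotFound:
        # Raised by Beautiful Soup when the requested parser is not installed.
        return bs4.BeautifulSoup(source, "html.parser")


# Sample page built from the <title> shown in the earlier output.
sample_html = (
    "<html><head>"
    "<title>Saree Retailers in Panipat - Best Deals online - Justdial</title>"
    "</head><body></body></html>"
)

soup = make_soup(sample_html)
title_text = soup.title.string
print(title_text)
```

Either parser gives the same result on well-formed markup like this; differences only show up on broken HTML.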
