Scraping an AJAX e-commerce site using Python

醉话见心 2021-01-15 00:52

I have a problem scraping an e-commerce site using BeautifulSoup. I did some Googling but I still can't solve the problem.

Please refer on the

2 Answers
  • 2021-01-15 01:30

    Welcome to StackOverflow! You can inspect where the AJAX request is being sent (for example, in the Network tab of your browser's developer tools) and replicate it.

    In this case the request goes to the API URL used in the code below. You can then use requests to perform a similar request. Note, however, that this API endpoint requires a valid User-Agent header. You can use a package like fake-useragent or just hardcode a string for the agent.

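    import requests
    
    # Option 1: generate a User-Agent string with the fake-useragent package
    # from fake_useragent import UserAgent
    # user_agent = UserAgent().chrome
    
    # Option 2: hardcode a User-Agent string
    user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1468.0 Safari/537.36'
    
    url = 'https://shopee.com.my/api/v2/search_items/?by=relevancy&keyword=h370m&limit=50&newest=0&order=desc&page_type=search'
    resp = requests.get(url, headers={
        'User-Agent': user_agent
    }, timeout=10)
    resp.raise_for_status()  # fail early on an HTTP error status
    data = resp.json()  # the endpoint responds with JSON, so no HTML parsing is needed
    products = data.get('items')  # None if the response has no 'items' key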
    
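
The query string in the URL above can also be assembled from a dictionary, which makes it easier to change the keyword. This is only a sketch: `build_search_url` is a hypothetical helper, the parameter names are copied from the URL in the answer, and what `newest` means is not stated in the answer, so it is passed through unchanged.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are copied from the URL in the answer above.
SEARCH_ENDPOINT = "https://shopee.com.my/api/v2/search_items/"


def build_search_url(keyword, limit=50, newest=0):
    """Return the search URL for a keyword (hypothetical helper)."""
    params = {
        "by": "relevancy",
        "keyword": keyword,
        "limit": limit,
        "newest": newest,  # assumption: meaning not confirmed by the answer
        "order": "desc",
        "page_type": "search",
    }
    # urlencode escapes the values and joins them as key=value pairs with "&"
    return SEARCH_ENDPOINT + "?" + urlencode(params)


print(build_search_url("h370m"))
```

The resulting string can be passed to `requests.get` in place of the hardcoded `url`, together with the same User-Agent header.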
  • 2021-01-15 01:42

    Welcome to StackOverflow! :)

    As an alternative, you can try Selenium, which drives a real browser.

    Example usage from the documentation:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    
    driver = webdriver.Firefox()
    driver.get("http://www.python.org")
    assert "Python" in driver.title
    # Locate the search box by its name attribute (Selenium 4 API)
    elem = driver.find_element(By.NAME, "q")
    elem.clear()
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    driver.close()
    

    When you use requests (or libraries like Scrapy), JavaScript is usually not executed, so content loaded via AJAX is missing from the response. As @dmitrybelyakov mentioned, you can replay these calls or imitate normal user interaction using Selenium.
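
To illustrate why a plain HTTP fetch misses AJAX-loaded content, here is a small sketch using only the standard library. The markup in `RAW_HTML` is hypothetical (it is not taken from any real site): it stands in for a page whose product container is empty until a script fills it in a browser. `ContainerText` and `container_text` are likewise illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical markup: an empty container plus a script that would fill it
# in a browser. A plain HTTP client receives exactly this and nothing more.
RAW_HTML = """
<html><body>
  <div id="products"></div>
  <script>loadProducts("/api/search?keyword=h370m");</script>
</body></html>
"""


class ContainerText(HTMLParser):
    """Collect the text found inside the element with a given id.

    Simplification: assumes every start tag inside the target has a
    matching end tag (no void elements such as <img> or <br>).
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def container_text(html, target_id):
    parser = ContainerText(target_id)
    parser.feed(html)
    return parser.chunks


# The script is never run by the parser, so the container stays empty.
print(container_text(RAW_HTML, "products"))
```

The parser sees the `<script>` tag only as text and never runs it, which is why the product data has to come either from the underlying API call or from a real browser driven by Selenium.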
