How can I scrape a webpage that uses JavaScript to load elements as you scroll?

爱一瞬间的悲伤 2021-01-29 02:16

My friend asked if I could write a web scraping script to collect Pokémon data from a specific website.

I've written the following code to render the JavaScript and…

1 Answer
  时光说笑 2021-01-29 03:17

    You don't need to render the JavaScript: the data is already present in the page source. See view-source:https://www.smogon.com/dex/ss/pokemon/, where it is embedded in a <script> tag as the JavaScript variable dexSettings.

    import json
    import re
    
    import requests
    
    response = requests.get('https://www.smogon.com/dex/ss/pokemon/')
    response.raise_for_status()  # fail early on HTTP errors
    
    # Take the JSON object assigned to the dexSettings JavaScript variable
    # out of the raw HTML. re.search returns the first match (or None).
    match = re.search(r'dexSettings = (\{.*\})', response.text)
    if match is None:
        raise ValueError('dexSettings not found; the page layout may have changed')
    
    # The regex only gives us a string; json.loads() turns it into Python dicts/lists.
    settings = json.loads(match.group(1))
    
    # The Pokémon list sits at injectRpcs[1][1]['items'] (this indexing
    # reflects the page's structure at the time of writing).
    pokemon = settings.get('injectRpcs', [])[1][1].get('items', [])
    
    for row in pokemon:
        print(row.get('name', ''))
        print(row.get('description', ''))
    
    

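    The regex above is greedy: if more JavaScript follows the object on the same line (for example `}; var other = {...}`), `(\{.*\})` captures up to the last `}` and json.loads() fails. A slightly more defensive sketch, assuming the variable is assigned a plain JSON object, lets json.JSONDecoder.raw_decode() decide where the object ends. The find_items() helper is hypothetical: it assumes each injectRpcs entry is a [name, payload] pair, as the [1][1] indexing suggests, and returns the first payload's 'items' list instead of hard-coding its position. The demo runs offline on a made-up HTML snippet, not real smogon.com data.

```python
import json
import re


def extract_js_object(html: str, var_name: str) -> dict:
    """Parse the JSON object assigned to `var_name` in an inline <script>.

    json.JSONDecoder.raw_decode stops at the end of the first complete JSON
    value, so trailing JavaScript on the same line is ignored rather than
    swallowed by a greedy regex.
    """
    match = re.search(rf'\b{re.escape(var_name)}\s*=\s*', html)
    if match is None:
        raise ValueError(f'{var_name} not found in page')
    # match.end() points at the first character after "= ", i.e. the "{".
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


def find_items(settings: dict) -> list:
    """Hypothetical helper: return the first 'items' list in injectRpcs.

    Assumes each injectRpcs entry is a [name, payload] pair; entries that
    don't fit that shape are skipped.
    """
    for entry in settings.get('injectRpcs', []):
        if isinstance(entry, list) and len(entry) > 1 and isinstance(entry[1], dict):
            items = entry[1].get('items')
            if isinstance(items, list):
                return items
    return []


# Offline demo on a made-up page snippet (not real smogon.com data).
sample_html = '''
<script>
  dexSettings = {"injectRpcs": [["example-rpc-0", {}], ["example-rpc-1", {"items": [{"name": "Bulbasaur", "description": "A seed Pokemon."}]}]]}; var other = {"x": 1};
</script>
'''

settings = extract_js_object(sample_html, 'dexSettings')
for row in find_items(settings):
    print(row.get('name', ''))
    print(row.get('description', ''))
```

    To run it against the live page, pass requests.get('https://www.smogon.com/dex/ss/pokemon/').text in place of sample_html.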
