Webscraping Blockchain data seemingly embedded in Javascript through Python, is this even the right approach?

怎甘沉沦 提交于 2021-01-29 20:11:13

问题


I'm referencing this url: https://tracker.icon.foundation/block/29562412

If you scroll down to "Transactions", it shows 2 transactions with separate links, that's essentially what I'm trying to grab. If I try a simple pd.read_csv(url) command, it clearly omits the data I'm looking for, so I thought it might be JavaScript based and tried the following code instead:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://tracker.icon.foundation/block/29562412')
r.html.links
r.html.absolute_links

and I get the result "set()" even though I was expecting the following:

['https://tracker.icon.foundation/transaction/0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632', 'https://tracker.icon.foundation/transaction/0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e']

Is JavaScript even the right approach? I tried BeautifulSoup instead and found no cigar on that end as well.


回答1:


You're right. This page is populated asynchronously using JavaScript, so BeautifulSoup and similar tools won't be able to see the specific content you're trying to scrape.

However, if you log your browser's network traffic, you can see some (XHR) HTTP GET requests being made to a REST API, which serves its results in JSON. This JSON happens to contain the information you're looking for. It actually makes several such requests to various API endpoints, but the one we're interested in is called txList (short for "transaction list" I'm guessing):

def main():

    import requests

    url = "https://tracker.icon.foundation/v3/block/txList"

    params = {
        "height": "29562412",
        "page": "1",
        "count": "10"
    }

    response = requests.get(url, params=params)
    response.raise_for_status()

    base_url = "https://tracker.icon.foundation/transaction/"

    for transaction in response.json()["data"]:
        print(base_url + transaction["txHash"])

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

https://tracker.icon.foundation/transaction/0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632
https://tracker.icon.foundation/transaction/0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e
>>> 


来源:https://stackoverflow.com/questions/65836519/webscraping-blockchain-data-seemingly-embedded-in-javascript-through-python-is

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!