问题
I'm referencing this url: https://tracker.icon.foundation/block/29562412
If you scroll down to "Transactions", it shows 2 transactions with separate links, that's essentially what I'm trying to grab. If I try a simple pd.read_csv(url) command, it clearly omits the data I'm looking for, so I thought it might be JavaScript based and tried the following code instead:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://tracker.icon.foundation/block/29562412')
r.html.links
r.html.absolute_links
and I get the result "set()" even though I was expecting the following:
['https://tracker.icon.foundation/transaction/0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632', 'https://tracker.icon.foundation/transaction/0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e']
Is JavaScript even the right approach? I tried BeautifulSoup instead and found no cigar on that end as well.
回答1:
You're right. This page is populated asynchronously using JavaScript, so BeautifulSoup and similar tools won't be able to see the specific content you're trying to scrape.
However, if you log your browser's network traffic, you can see some (XHR) HTTP GET requests being made to a REST API, which serves its results in JSON. This JSON happens to contain the information you're looking for. It actually makes several such requests to various API endpoints, but the one we're interested in is called txList
(short for "transaction list" I'm guessing):
def main():
import requests
url = "https://tracker.icon.foundation/v3/block/txList"
params = {
"height": "29562412",
"page": "1",
"count": "10"
}
response = requests.get(url, params=params)
response.raise_for_status()
base_url = "https://tracker.icon.foundation/transaction/"
for transaction in response.json()["data"]:
print(base_url + transaction["txHash"])
return 0
if __name__ == "__main__":
import sys
sys.exit(main())
Output:
https://tracker.icon.foundation/transaction/0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632
https://tracker.icon.foundation/transaction/0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e
>>>
来源:https://stackoverflow.com/questions/65836519/webscraping-blockchain-data-seemingly-embedded-in-javascript-through-python-is