I was trying to scrape the number of flights from this webpage: https://www.flightradar24.com/56.16,-49.51
The number is highlighted in the picture below:
So based on what @Andre has found out, I wrote this code:
import time

import requests


def get_count():
    url = "https://data-live.flightradar24.com/zones/fcgi/feed.js?bounds=59.09,52.64,-58.77,-47.71&faa=1&mlat=1&flarm=1&adsb=1&gnd=1&air=1&vehicles=1&estimated=1&maxage=7200&gliders=1&stats=1"
    # Send a browser-like User-Agent header; otherwise the server responds with HTTP 403
    r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    # Parse the JSON
    data = r.json()
    # Sum the per-source counts to get the total number of flights
    return sum(data["stats"]["total"].values())


while True:
    print(get_count())
    time.sleep(8)
The code should be self-explanatory: all it does is print the current flight count every 8 seconds :)
Note: The values are similar to the ones on the website, but not identical. That is because the Python script and the website are unlikely to send their requests at exactly the same time. If you want more accurate results, make a request more often, for example every 4 seconds.
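To make the interval easy to change, the loop can be pulled into a small helper. This is only a minimal sketch: `poll` and its parameters are names I made up for illustration, and `fetch` stands for any zero-argument function that returns the count, such as `get_count()` above. A failed request is recorded as `None` instead of stopping the loop.

```python
import time


def poll(fetch, interval_seconds, iterations, sleep=time.sleep):
    """Call fetch() `iterations` times, pausing `interval_seconds` between calls.

    `fetch` is any zero-argument callable that returns the current count.
    A failed call is stored as None so one error does not end the loop.
    """
    results = []
    for i in range(iterations):
        try:
            results.append(fetch())
        except Exception as exc:  # deliberately broad: this is only a sketch
            print(f"request failed: {exc}")
            results.append(None)
        # No need to wait after the final call
        if i < iterations - 1:
            sleep(interval_seconds)
    return results
```

For example, `poll(get_count, 4, 10)` would collect ten readings four seconds apart. The `sleep` argument exists only so the helper can be tried out without actually waiting.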
Use this code as you want, extend it or whatever. Hope this helps!
The problem with your approach is that the page first loads a view, then performs regular requests to refresh the page. If you look at the network tab in the developer console in Chrome (for example), you'll see the requests to https://data-live.flightradar24.com/zones/fcgi/feed.js?bounds=59.09,52.64,-58.77,-47.71&faa=1&mlat=1&flarm=1&adsb=1&gnd=1&air=1&vehicles=1&estimated=1&maxage=7200&gliders=1&stats=1
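If you want a different map area, the URL can be assembled from a bounding box instead of being copied by hand. This is a sketch under one assumption: the order of the `bounds` values (north, south, west, east) is only my guess from the example URL, where the first two values look like latitudes and the last two like longitudes. `build_feed_url` is a hypothetical helper name; the remaining query parameters are copied unchanged from the request above.

```python
from urllib.parse import urlencode

FEED_URL = "https://data-live.flightradar24.com/zones/fcgi/feed.js"


def build_feed_url(north, south, west, east):
    """Build a feed URL for a bounding box.

    Assumption: `bounds` is ordered north, south, west, east. This is
    inferred from the captured URL only, not from any documentation.
    """
    params = {
        "bounds": f"{north},{south},{west},{east}",
        # The flags below are copied as-is from the captured request
        "faa": 1,
        "mlat": 1,
        "flarm": 1,
        "adsb": 1,
        "gnd": 1,
        "air": 1,
        "vehicles": 1,
        "estimated": 1,
        "maxage": 7200,
        "gliders": 1,
        "stats": 1,
    }
    # safe="," keeps the commas in `bounds` unescaped, as in the captured URL
    return f"{FEED_URL}?{urlencode(params, safe=',')}"


print(build_feed_url(59.09, 52.64, -58.77, -47.71))
```

With the values shown, this reproduces the URL from the network tab.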
The response is regular JSON:
{
  "full_count": 11879,
  "version": 4,
  "afefdca": ["A86AB5", 56.4288, -56.0721, 233, 38000, 420, "0000", "T-F5M", "B763", "N641UA", 1473852497, "LHR", "ORD", "UA929", 0, 0, "UAL929", 0],
  ...
  "aff19d9": ["A12F78", 56.3235, -49.3597, 251, 36000, 436, "0000", "F-EST", "B752", "N176AA", 1473852497, "DUB", "JFK", "AA291", 0, 0, "AAL291", 0],
  "stats": {
    "total": {
      "ads-b": 8521,
      "mlat": 2045,
      "faa": 598,
      "flarm": 152,
      "estimated": 464
    },
    "visible": {
      "ads-b": 0,
      "mlat": 0,
      "faa": 6,
      "flarm": 0,
      "estimated": 3
    }
  }
}
I'm not sure if this API is protected in any way, but it seems like I can access it without any issues using curl.
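Once the response is parsed, the counts can be read straight out of the dictionary. Here is a rough sketch that works on an already-parsed response, so it needs no network access. `summarize_feed` is a name I made up, and treating every list-valued key as one flight entry is my reading of the sample above, not something the site documents. I have also not checked which of these numbers the website itself displays.

```python
def summarize_feed(feed):
    """Summarise a parsed feed.js response (a dict, e.g. from r.json())."""
    stats = feed.get("stats", {})
    # Assumption: flight entries are the keys whose value is a list, while
    # metadata such as "full_count", "version" and "stats" are ints or dicts
    flights = {key: value for key, value in feed.items() if isinstance(value, list)}
    return {
        "full_count": feed.get("full_count"),
        "total": sum(stats.get("total", {}).values()),
        "visible": sum(stats.get("visible", {}).values()),
        "flights_in_response": len(flights),
    }


# Shortened copy of the response shown above (one flight entry, truncated)
sample = {
    "full_count": 11879,
    "version": 4,
    "afefdca": ["A86AB5", 56.4288, -56.0721],
    "stats": {
        "total": {"ads-b": 8521, "mlat": 2045, "faa": 598, "flarm": 152, "estimated": 464},
        "visible": {"ads-b": 0, "mlat": 0, "faa": 6, "flarm": 0, "estimated": 3},
    },
}
print(summarize_feed(sample))
```

Note that in this sample the sum of the "total" block is not the same as "full_count", so it is worth comparing both against the number shown on the page before relying on either.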
You can use Selenium to scrape a webpage whose content is added dynamically by JavaScript.
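from bs4 import BeautifulSoup
from selenium import webdriver

# Note: PhantomJS support was removed in Selenium 4; on newer versions
# use a headless Chrome or Firefox driver instead
browser = webdriver.PhantomJS()
browser.get('https://www.flightradar24.com/56.16,-49.51/3')
# Parse the rendered page, including the content added by JavaScript
soup = BeautifulSoup(browser.page_source, "html.parser")
result = soup.find_all("span", {"id": "menuPlanesValue"})
for item in result:
    print(item.text)
browser.quit()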