Question
The following link contains the results of the Paris Marathon: http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon. I want to scrape these results, but the information sits inside a frame. I know the basics of scraping with rvest and RSelenium, but I am clueless about how to retrieve the data within such a frame. To give an idea, one of the things I tried was:
url = "http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon"
site = read_html(url)
ParisResults = site %>% html_node("iframe") %>% html_table()
ParisResults = as.data.frame(ParisResults)
Any help in solving this problem would be very welcome!
Answer 1:
The results are loaded via AJAX from the following URL:
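library(rvest)
url <- "http://www.aso.fr/massevents/resultats/ajax.php?v=1460995792&course=mar16&langue=us&version=3&action=search"
# Parse the AJAX response; html_table() returns a list with one data frame per matching table
table <- url %>%
  read_html(encoding = "UTF-8") %>%
  html_nodes(xpath = '//table[@class="footable"]') %>%
  html_table()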
PS: I don't know exactly what AJAX is, and I only know the basics of rvest.
EDIT: To answer the question in the comment: I don't have a lot of experience in web scraping. If you only use very basic techniques with rvest or xml, you have to understand the web site a little better, and every site has its own structure. For this one, here is how I did it:
As you can see, the page source shows no results because they are in an iframe. When inspecting the code, you can see the following after "RESULTS OF 2016 EDITION":
class="iframe-xdm iframe-resultats" data-href="http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=3"
You can now use this URL directly: http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=2
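The data-href lookup described above can also be done from R instead of by eye. This is only a minimal sketch: the inline HTML is a stand-in for the race page, the div tag is an assumption (only the class and data-href attributes are quoted from the site), and it presumes the attribute is present in the HTML that read_html() receives.

```r
library(rvest)

# Stand-in for the race page. Only the class and data-href attributes come
# from the real site; the <div> tag and surrounding markup are assumptions.
page <- read_html('<html><body>
  <div class="iframe-xdm iframe-resultats"
       data-href="http://www.aso.fr/massevents/resultats/index.php?langue=us&amp;course=mar16&amp;version=3"></div>
</body></html>')

# Select the placeholder element by class and read its data-href attribute.
frame_url <- page %>%
  html_node(".iframe-resultats") %>%
  html_attr("data-href")

print(frame_url)
```

Against the live site, page would come from read_html() on the race page URL instead of the inline string.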
But loading index.php directly still does not give you the results, because they are fetched by a separate request. To find that request, open Chrome developer tools > Network > XHR. When refreshing the page, you can see that the data is loaded from this URL (when you choose the Men category): http://www.aso.fr/massevents/resultats/ajax.php?course=mar16&langue=us&version=2&action=search&fields%5Bsex%5D=F&limiter=&order=
Now you can get the results!
And if you want the second page, and so on, you can click on the page number, then use the developer tools to see what request is sent!
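Once the Network tab shows which query parameters change between pages, the request URLs can be built programmatically. The sketch below uses base R only. The helper build_results_url() is made up for illustration, and the parameter name "page" is a hypothetical placeholder: the real name has to be read off the developer tools, since it is not given here.

```r
# Base URL of the AJAX endpoint quoted in the answer above.
base_url <- "http://www.aso.fr/massevents/resultats/ajax.php"

# Hypothetical helper: build a request URL from a named list of query
# parameters, percent-encoding names and values (so "fields[sex]" becomes
# "fields%5Bsex%5D", as in the URL seen in the Network tab).
build_results_url <- function(params, base = base_url) {
  enc <- function(x) utils::URLencode(as.character(x), reserved = TRUE)
  keys <- vapply(names(params), enc, character(1), USE.NAMES = FALSE)
  vals <- vapply(params, enc, character(1), USE.NAMES = FALSE)
  paste0(base, "?", paste0(keys, "=", vals, collapse = "&"))
}

# Parameters copied from the request observed in the answer above.
search_params <- list(
  course = "mar16", langue = "us", version = "2", action = "search",
  "fields[sex]" = "F", limiter = "", order = ""
)

first_page_url <- build_results_url(search_params)

# ASSUMPTION: "page" is a placeholder; replace it with whatever parameter
# the Network tab shows when a page number is clicked.
page_urls <- vapply(
  2:3,
  function(p) build_results_url(c(search_params, list(page = p))),
  character(1)
)

print(first_page_url)
print(page_urls)
```

Each resulting URL can then be passed through the same read_html() and html_table() steps used in the answer.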
Source: https://stackoverflow.com/questions/37389772/how-can-i-scrape-data-from-a-website-within-a-frame-using-r