How can I scrape data from a website within a frame using R?

小鲜肉 2021-01-25 01:40

The following link contains the results of the Paris Marathon: http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon. I want to scrape these results…

1 answer
  -上瘾入骨i
    2021-01-25 02:04

    The results are loaded via AJAX from the following URL:

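    library(rvest)  # read_html(), html_nodes(), html_table(); also re-exports %>%

    # AJAX endpoint that returns the results table as an HTML fragment
    url <- "http://www.aso.fr/massevents/resultats/ajax.php?v=1460995792&course=mar16&langue=us&version=3&action=search"
    table <- url %>%
      read_html(encoding = "UTF-8") %>%                     # download and parse the response
      html_nodes(xpath = '//table[@class="footable"]') %>%  # select the results table
      html_table()                                          # list of data frames, one per matched table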
    

    PS: I don't know exactly what AJAX is, and I only know the basics of rvest.
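To see what the rvest pipeline in the answer hands back, here is a minimal offline sketch. The HTML fragment and its "Rank"/"Name" columns are hypothetical stand-ins for the real AJAX response (only the `footable` class comes from the answer). The point is that `html_table()` returns a list with one data frame per matched table, so you still need to take the first element.

```r
library(rvest)

# Hypothetical stand-in for the AJAX response: a tiny fragment with a
# "footable" table. The real columns of the marathon results are not shown here.
fragment <- '<table class="footable">
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td>Runner A</td></tr>
  <tr><td>2</td><td>Runner B</td></tr>
</table>'

tables <- fragment %>%
  read_html() %>%                                    # literal HTML is parsed directly
  html_nodes(xpath = '//table[@class="footable"]') %>%
  html_table()                                       # list of data frames

results <- tables[[1]]  # the first (here: only) matched table as a data frame
```

Because the first row consists of `<th>` cells, `html_table()` uses it as the header, so `results` has the columns `Rank` and `Name`.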

    EDIT: To answer the question in the comments: I don't have much experience with web scraping. If you only use very basic techniques with rvest or XML, you need to understand the website a little better, and every site has its own structure. For this one, here is what I did:

    1. As you can see, the page source does not contain any results because they are inside an iframe. When you inspect the page, you can see the following after "RESULTS OF 2016 EDITION":

      class="iframe-xdm iframe-resultats" data-href="http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=3"

    2. Now you can use this URL directly: http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=2

    3. But you still can't get the results. You can then use Chrome developer tools > Network > XHR. When you refresh the page, you can see that the data is loaded from this URL (when you choose the Men category): http://www.aso.fr/massevents/resultats/ajax.php?course=mar16&langue=us&version=2&action=search&fields%5Bsex%5D=F&limiter=&order=

    4. Now you can get the results!

    5. If you want the second page, and so on, click the page number, then use the developer tools to see what happens!
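Step 1 (finding the iframe's `data-href`) can also be done in code. This is a minimal sketch, assuming the element with class `iframe-resultats` and its `data-href` attribute are present in the raw HTML the site serves; if they are only injected later by JavaScript, `read_html()` will not see them and the function returns `NA`. The name `iframe_results_url` is hypothetical.

```r
library(rvest)

# Hypothetical helper: return the data-href of the first element with class
# "iframe-resultats", i.e. the URL the results iframe loads (NA if absent).
iframe_results_url <- function(page) {
  html_attr(html_node(page, ".iframe-resultats"), "data-href")
}

# Usage (live request; see the assumption above about JavaScript):
# page <- read_html("http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon")
# iframe_results_url(page)
```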
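Steps 3 and 5 can be sketched as follows, under stated assumptions. `build_results_url()` and `scrape_all_pages()` are hypothetical helpers: the first percent-encodes the query parameters (so `fields[sex]` becomes `fields%5Bsex%5D`, as in the URL from step 3), and the second keeps requesting pages until one comes back empty. The `page` parameter in `fetch_aso_page()` is a guessed name, not something confirmed by the site; step 5 is exactly how you would find the real one.

```r
# Hypothetical helper: build "base?key=value&..." from a named list,
# percent-encoding keys and values (reserved = TRUE encodes "[" and "]").
build_results_url <- function(base, params) {
  keys <- vapply(names(params), URLencode, character(1), reserved = TRUE)
  vals <- vapply(as.character(unlist(params)), URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(keys, vals, sep = "=", collapse = "&"))
}

# Hypothetical pager: calls fetch_page(1), fetch_page(2), ... and stops at the
# first page that returns NULL or zero rows, or after max_pages requests.
scrape_all_pages <- function(fetch_page, max_pages = 100) {
  pages <- list()
  for (p in seq_len(max_pages)) {
    df <- fetch_page(p)
    if (is.null(df) || nrow(df) == 0) break  # no more results
    pages[[length(pages) + 1]] <- df
  }
  do.call(rbind, pages)  # one data frame with all pages (NULL if none)
}

# Hypothetical live fetcher. ASSUMPTION: "page" is a guessed parameter name;
# check in DevTools which parameter the site really sends for page 2, 3, ...
fetch_aso_page <- function(p) {
  url <- build_results_url(
    "http://www.aso.fr/massevents/resultats/ajax.php",
    list(course = "mar16", langue = "us", version = 2, action = "search",
         "fields[sex]" = "F", page = p)
  )
  doc <- rvest::read_html(url, encoding = "UTF-8")
  tables <- rvest::html_table(rvest::html_nodes(doc, xpath = '//table[@class="footable"]'))
  if (length(tables) == 0) NULL else tables[[1]]  # NULL signals "no more pages"
}

# results <- scrape_all_pages(fetch_aso_page)  # sends live HTTP requests
```

The `max_pages` cap guards against a server that keeps returning the last page instead of an empty one. Passing the fetcher in as a function also lets you test the paging loop without any network access.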
