Scrape “aspx” page with R

Question

Can someone help me, or suggest how to scrape the table from this URL: https://www.promet.si/portal/sl/stevci-prometa.aspx?

I followed instructions for the rvest, httr and html packages, but without any success on this particular site. Thank you.

Answer 1:

This ought to help get you started:

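library(wdman)
library(seleniumPipes)
library(rvest)
library(tidyverse)

# Start a local Selenium server (4567 is wdman's default port)
selServ <- selenium(port = 4567L, verbose = FALSE)
selServ$log() # confirm the server started and check the port

# Open a Chrome session against that server
remDr <- remoteDr(browserName = "chrome", port = 4567L)

remDr %>%
  go("https://www.promet.si/portal/sl/stevci-prometa.aspx")

# Wait for the page to finish loading in the browser
Sys.sleep(5)

# Rendered page source, returned as a parsed document
pg <- getPageSource(remDr)

# Pull the table out of its container div and convert it to a tibble
html_node(pg, xpath=".//div[@id='ctl00_mainContent_ctl00_StvContainer']/table") %>%
  html_table() %>%
  as_tibble() # tbl_df() is deprecated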
## # A tibble: 1,239 x 10
##    X1    X2            X3     X4                       X5     X6      X7     X8    X9     X10  
##    <lgl> <chr>         <chr>  <chr>                    <chr>  <chr>   <chr>  <chr> <chr>  <lgl>
##  1 NA    Lokacija      Cesta  Smer                     Pas    Števil… Hitro… Razm… Stanje NA   
##  2 NA    Ajdovščina    R2-444 vzhod - zahod            ""     60      64     81,7  Norma… NA   
##  3 NA    Ajdovščina    R2-444 zahod - vzhod            ""     12      62     371,6 Norma… NA   
##  4 NA    Ajdovščina 2  R2-444 Ajdovščina - Selo        ""     36      67     117,8 Norma… NA   
##  5 NA    Ajdovščina 2  R2-444 Ajdovščina - Selo        ""     12      60     787,1 Norma… NA   
##  6 NA    Ajdovščina AC HC-H4  Nova Gorica - Vipava     vozni  96      100    31,5  Norma… NA   
##  7 NA    Ajdovščina AC HC-H4  Nova Gorica - Vipava     prehi… 36      124    120,7 Norma… NA   
##  8 NA    Ankaran       R2-406 Križ. Moretini - Ankaran ""     96      59     29    Norma… NA   
##  9 NA    Ankaran       R2-406 Ankaran - Križ. Moretini ""     12      57     292,1 Norma… NA   
## 10 NA    Apače         R2-438 Trate - Gornja Radgona   ""     24      58     110,6 Norma… NA   
## # ... with 1,229 more rows
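
The tibble above still has the header text in its first row, two all-NA columns (X1 and X10), and decimal commas in the numeric columns. A minimal base R sketch of one way to tidy it follows. The helper name `clean_counter_table` is hypothetical, the sample data frame is a small stand-in built from values visible in the output, and the column name `gap` is a placeholder because the real header is truncated in the printout.

```r
# Hypothetical helper: promote the first row to column names, drop all-NA
# columns, and parse decimal-comma strings such as "81,7" as numbers.
clean_counter_table <- function(raw) {
  # Drop columns that contain only NA (X1 and X10 in the output above)
  keep <- vapply(raw, function(col) !all(is.na(col)), logical(1))
  out <- raw[, keep, drop = FALSE]

  # The first row holds the header text
  names(out) <- as.character(unlist(out[1, ]))
  out <- out[-1, , drop = FALSE]
  rownames(out) <- NULL

  # A column counts as numeric if every non-empty value looks like 12 or 12,3
  looks_numeric <- function(x) {
    x <- x[!is.na(x) & x != ""]
    length(x) > 0 && all(grepl("^-?[0-9]+(,[0-9]+)?$", x))
  }
  for (nm in names(out)) {
    if (looks_numeric(out[[nm]])) {
      out[[nm]] <- as.numeric(sub(",", ".", out[[nm]], fixed = TRUE))
    }
  }
  out
}

# Stand-in for the scraped table; "gap" is a placeholder header (assumption)
raw <- data.frame(
  X1  = NA,
  X2  = c("Lokacija", "Ajdovščina", "Ajdovščina AC"),
  X3  = c("Cesta", "R2-444", "HC-H4"),
  X5  = c("Pas", "", "vozni"),
  X8  = c("gap", "81,7", "31,5"),
  X10 = NA,
  stringsAsFactors = FALSE
)

cleaned <- clean_counter_table(raw)
str(cleaned)
```

Values that use a thousands separator would not match the pattern and would stay as character, so the pattern may need adjusting for the full table.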

Answer 2:

Here is a translation of the site's terms of use: "Right to use: All information and images contained on the website www.promet.si are subject to copyright protection and other forms of intellectual property protection. The documents published on these web pages may only be reproduced for non-commercial purposes, and they must also retain all the warnings of copyright or other rights. On every reproduction, the 'Traffic Information Center for State Roads' should be listed as a source."

I am not sure whether that means scraping for non-commercial purposes is allowed.

Anyway, thank you for the warning, @s_t, and special thanks to @hrbrmstr for the answer with the nice code.

Source: https://stackoverflow.com/questions/52855989/scrape-aspx-page-with-r

Tags

web-scraping

rvest

httr