Extracting HTML table into R

Submitted by 陌路散爱 on 2019-12-23 01:14:06

Question


I've been trying to extract a table from a webpage. The data is flight-track data from a live flight-tracking website (https://flightaware.com/live/flight/WJA1508/history/20150814/1720Z/CYYC/KSFO/tracklog).

I've tried the XML, RCurl and curl packages, but it didn't work. I believe that's most likely because I couldn't figure out how to get around SSL, or how to handle the rows that contain notes on the flight status (i.e., the first two rows from the top and the third row from the bottom of the table).

Does anyone know how to extract this table into R?


Answer 1:


As noted by @hrbrmstr in the comments above, this violates FlightAware's TOS, but what you do with your code is your business. :) This should get you most of the way there using the rvest package:

library(rvest)

u <- "https://flightaware.com/live/flight/WJA1508/history/20150814/1720Z/CYYC/KSFO/tracklog"

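html_read <- read_html(u)   # read_html() replaced the deprecated html()

##  The track log was the second <table> on the page when this was
##    written; adjust the index if the page layout has changed.
tbl <- html_table(
  html_nodes(html_read, "table"), 
  fill=TRUE, 
  header=FALSE, 
  trim=TRUE 
)[[2]]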

##  Keep rows from the first row of data (row 6) onward, then drop
##    columns that are entirely NA:
tbl_o <- tbl[6:nrow(tbl), ]
tbl_o <- tbl_o[,colSums(is.na(tbl_o))!=nrow(tbl_o)]

names(tbl_o) <- c(
  "Time", "Lat", "Lon", 
  "Course", "Direction", 
  "KTS", "MPH", "Alt", 
  "Rate", "Location"
)

str(tbl_o)

Which yields:

'data.frame':   292 obs. of  10 variables:
 $ Time     : chr  "Fri 01:41:34 PM" "Fri 01:48:59 PM" "Fri 01:49:14 PM" "Fri 01:50:05 PM" ...
 $ Lat      : chr  "51.0833" "51.1551" "51.1683" "51.2235" ...
 $ Lon      : chr  "-113.9667" "-114.0209" "-114.0209" "-114.0220" ...
 $ Course   : chr  "335°" "0°" "0°" "358°" ...
 $ Direction: chr  "Northwest" "North" "North" "North" ...
 $ KTS      : chr  "20" "201" "219" "149" ...
 $ MPH      : chr  "23" "231" "252" "171" ...
 $ Alt      : chr  "3,500" "4,900" "5,200" "6,800" ...
 $ Rate     : chr  "" "222" "1,727" "1,701" ...
 $ Location : chr  "Edmonton Center" "FlightAware ADS-B  (CYYC)" "FlightAware ADS-B  (CYYC)" "FlightAware ADS-B  (CEG2)" ...
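As the str() output shows, every column comes back as character: courses carry a degree sign, altitudes and climb rates use thousands separators, and some cells are blank. Below is a minimal base-R sketch of one way to turn those columns into numbers and to drop any leftover note row (the question mentions one near the bottom of the table), assuming a note row is any row whose Lat cell does not parse as a number. The names tbl_demo, to_num, num_cols and tbl_clean are hypothetical; tbl_demo is a hand-built stand-in using the first values printed above plus one made-up note row, so the sketch runs without fetching the page. In practice you would apply the same steps to tbl_o.

```r
# Hypothetical stand-in for `tbl_o`: the first two rows copy values from the
# str() output above; the third row is a made-up leftover note row.
tbl_demo <- data.frame(
  Time      = c("Fri 01:41:34 PM", "Fri 01:48:59 PM", "(hypothetical note)"),
  Lat       = c("51.0833", "51.1551", NA),
  Lon       = c("-113.9667", "-114.0209", NA),
  Course    = c("335\u00b0", "0\u00b0", NA),   # "\u00b0" is the degree sign
  Direction = c("Northwest", "North", NA),
  KTS       = c("20", "201", NA),
  MPH       = c("23", "231", NA),
  Alt       = c("3,500", "4,900", NA),
  Rate      = c("", "222", NA),
  Location  = c("Edmonton Center", "FlightAware ADS-B  (CYYC)", NA),
  stringsAsFactors = FALSE
)

# Keep only digits, the decimal point and a sign, then coerce to numeric.
# Blank cells (e.g. an empty Rate) and NA both end up as NA.
to_num <- function(x) {
  suppressWarnings(as.numeric(gsub("[^0-9.+-]", "", x)))
}

# Assumption: rows whose latitude is not numeric are note rows, so drop them.
tbl_clean <- tbl_demo[!is.na(to_num(tbl_demo$Lat)), ]

# Convert the measurement columns in place; Time, Direction and Location
# stay as character.
num_cols <- c("Lat", "Lon", "Course", "KTS", "MPH", "Alt", "Rate")
tbl_clean[num_cols] <- lapply(tbl_clean[num_cols], to_num)

str(tbl_clean)
```

Stripping every non-numeric character is deliberately blunt: it handles the degree sign and the commas with one regular expression. It would, however, also mangle any column that legitimately mixes text and numbers, which is why it is applied only to the listed columns.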


Source: https://stackoverflow.com/questions/32021051/extracting-html-table-into-r
