I am trying to extract the following information:
On the page
http://epl.squawka.com/stoke-city-vs-arsenal/01-03-2014/english-barclays-premier-league/matche
Peter, as the others indicated, you can do this with Selenium. I also like to use the excellent selectr package. The idea is to interact with the site only briefly, then do the rest elsewhere. squawkData should contain everything needed.
# RSelenium::startServer() # if needed
require(RSelenium)
remDr <- remoteDriver()
remDr$open()
remDr$setImplicitWaitTimeout(3000)
remDr$navigate("http://epl.squawka.com/stoke-city-vs-arsenal/01-03-2014/english-barclays-premier-league/matches")
squawkData <- remDr$executeScript("return new XMLSerializer().serializeToString(squawkaDp.xml);", list())
require(selectr)
require(XML) # xmlParse, xmlValue and xmlAttrs come from the XML package
example <- querySelectorAll(xmlParse(squawkData[[1]]), "crosses time_slice")
example[[1]]
<time_slice name="0 - 5" id="1">
  <event player_id="531" mins="4" secs="39" minsec="279" team="44" type="Failed">
    <start>73.1,87.1</start>
    <end>97.9,49.1</end>
  </event>
</time_slice>
DISCLAIMER: I am the author of the RSelenium package. Basic vignettes on its operation are available under the titles "RSelenium basics" and "RSelenium: Testing Shiny apps".
Further info can be accessed easily using selectr:
> xmlValue(querySelectorAll(xmlParse(squawkData[[1]]), "players #531 name")[[1]])
[1] "Charlie Adam"
> xmlValue(querySelectorAll(xmlParse(squawkData[[1]]), "game team#44 long_name")[[1]])
[1] "Stoke City"
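The selector pattern can also be tried without a browser session. The following is a minimal sketch on a hand-written XML string. toyXml and its layout (in particular the player element name and the squawka root) are assumptions modelled on the output shown above, not the real feed, and attribute selectors are used here in place of the #id shorthand.

```r
library(XML)      # xmlParse, xmlValue, xmlGetAttr
library(selectr)  # querySelectorAll

# Hypothetical miniature of the feed; the real document is much larger
# and its exact structure is assumed here.
toyXml <- '<squawka>
  <game><team id="44"><long_name>Stoke City</long_name></team></game>
  <players><player id="531"><name>Charlie Adam</name></player></players>
  <crosses>
    <time_slice name="0 - 5" id="1">
      <event player_id="531" mins="4" secs="39" minsec="279" team="44" type="Failed">
        <start>73.1,87.1</start>
        <end>97.9,49.1</end>
      </event>
    </time_slice>
  </crosses>
</squawka>'

doc <- xmlParse(toyXml, asText = TRUE)

# Descendant selector: every event below a time_slice below crosses.
events <- querySelectorAll(doc, "crosses time_slice event")

# Look up the player and team that the first event refers to.
playerId <- xmlGetAttr(events[[1]], "player_id")
teamId <- xmlGetAttr(events[[1]], "team")
playerName <- xmlValue(querySelectorAll(
  doc, sprintf("players player[id='%s'] name", playerId))[[1]])
teamName <- xmlValue(querySelectorAll(
  doc, sprintf("game team[id='%s'] long_name", teamId))[[1]])
```

Building the selector from the event's own attributes avoids hard-coding the ids, which may be convenient when resolving names for many events.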
UPDATE:
To process example into a data frame, you can do something like this:
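out <- lapply(example, function(x) {
  # handle each event in this time_slice; slices without events yield NULL
  if (length(x['event']) > 0) {
    res <- lapply(x['event'], function(y) {
      # the event's attributes become columns
      matchAttrs <- as.list(xmlAttrs(y))
      # the start and end child nodes hold "x,y" coordinate strings
      matchAttrs$start <- xmlValue(y['start']$start)
      matchAttrs$end <- xmlValue(y['end']$end)
      matchAttrs
    })
    return(do.call(rbind.data.frame, res))
  }
})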
> head(do.call(rbind, out))
        player_id mins secs minsec team   type     start       end
event         531    4   39    279   44 Failed 73.1,87.1 97.9,49.1
event5        311    6   33    393   31 Failed 92.3,13.1 93.0,31.0
event1        376    8   57    537   31 Failed  97.7,6.1 96.7,16.4
event6        311   13   50    830   31 Failed  99.5,0.5 94.9,42.6
event11       311   14   11    851   31 Failed  99.5,0.5 93.1,51.0
event7        311   17   41   1061   31 Failed 99.5,99.5 92.6,50.1
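The start and end columns are still "x,y" strings at this point. If numeric coordinates are needed, one possible follow-up is sketched below in base R. The crosses data frame is a hypothetical two-row sample with the same column layout as the output above, and splitCoords is an illustrative helper, not part of any package.

```r
# Hypothetical sample mirroring the columns of the result above.
crosses <- data.frame(
  player_id = c("531", "311"),
  start = c("73.1,87.1", "92.3,13.1"),
  end = c("97.9,49.1", "93.0,31.0"),
  stringsAsFactors = FALSE
)

# Split an "x,y" column into a two-column numeric matrix.
# as.character() guards against the column being a factor.
splitCoords <- function(column) {
  parts <- strsplit(as.character(column), ",", fixed = TRUE)
  matrix(as.numeric(unlist(parts)), ncol = 2, byrow = TRUE)
}

startXY <- splitCoords(crosses$start)
endXY <- splitCoords(crosses$end)

crosses$start_x <- startXY[, 1]
crosses$start_y <- startXY[, 2]
crosses$end_x <- endXY[, 1]
crosses$end_y <- endXY[, 2]
```

The sketch assumes every value contains exactly one comma; rows with missing or malformed coordinates would need to be handled separately.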