I've imported a CSV file into R using RStudio, and I am trying to plot points per game against minutes per game. However, the minutes per game is in the format mm:ss and I'm
Some tuning of the first solution:
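# Simulated input: 100,000 "mm:ss" strings
minPerGame <- paste(sample(1:89, 100000, TRUE), sample(0:59, 100000, TRUE), sep = ":")

# f1: split each string, then convert one element at a time
f1 <- function() {
  sapply(strsplit(minPerGame, ":"),
         function(x) {
           x <- as.numeric(x)
           x[1] + x[2] / 60
         })
}

# f2: convert everything at once, then multiply by the weights (1, 1/60)
f2 <- function() {
  w <- matrix(c(1, 1/60), ncol = 1)
  parts <- matrix(as.numeric(unlist(strsplit(minPerGame, ":"))), ncol = 2, byrow = TRUE)
  as.vector(parts %*% w)
}

system.time(f1())
system.time(f2())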
> system.time(f1())
   user  system elapsed
   0.88    0.00    0.86
> system.time(f2())
   user  system elapsed
   0.25    0.00    0.27
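Before relying on the faster version, it is worth confirming that both approaches return the same values. A small sketch on a three-element vector (the names `x`, `viaSapply` and `viaMatrix` are only illustrative):

```r
x <- c("4:30", "2:20", "34:10")

# Element-by-element conversion, as in f1
viaSapply <- sapply(strsplit(x, ":"), function(p) {
  p <- as.numeric(p)
  p[1] + p[2] / 60
})

# Single conversion plus matrix product, as in f2
parts <- matrix(as.numeric(unlist(strsplit(x, ":"))), ncol = 2, byrow = TRUE)
viaMatrix <- as.vector(parts %*% c(1, 1/60))

# TRUE if the two results agree up to floating-point tolerance
isTRUE(all.equal(viaSapply, viaMatrix))
```

The speed-up most likely comes from calling as.numeric() once on the whole vector instead of calling an R function for every element.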
Do you need to decimalise it? If you store the data in the correct format, for example as an object of class POSIXlt, one of R's date-time classes, R will treat the times correctly as numeric values. Here is an example of what I mean:
First we create some dummy data for illustration purposes:
set.seed(1)
DF <- data.frame(Times = seq(as.POSIXlt("10:00", format = "%M:%S"),
                             length.out = 100, by = 10),
                 Points = cumsum(rpois(100, lambda = 1)))
head(DF)
Ignore the fact that the values include dates. The date part has no effect on the plot, because all observations share the same date. Next we plot this using R's formula interface:
plot(Points ~ Times, data = DF, type = "o")
Which produces this:
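The same idea can be applied to mm:ss strings read from a file. A minimal sketch, assuming the column arrives as a character vector (the values below are made up) and that no entry has 60 or more minutes, since %M only covers 00 to 59:

```r
# Hypothetical mm:ss strings, as they might come out of the CSV
minPerGame <- c("4:30", "2:20", "34:10")

# Parse as date-times; a fixed time zone avoids daylight-saving surprises
times <- as.POSIXct(minPerGame, format = "%M:%S", tz = "UTC")

# If decimal minutes are wanted after all, measure from the start of the same day
origin <- as.POSIXct(trunc(times, units = "days"))
decimalMin <- as.numeric(difftime(times, origin, units = "mins"))
```

Here `times` can be used directly on the x-axis of plot(), while `decimalMin` holds the same information as plain numbers.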
Given that you start with a character vector, this is relatively easy:
minPerGame <- c("4:30", "2:20", "34:10")
sapply(strsplit(minPerGame, ":"),
       function(x) {
         x <- as.numeric(x)
         x[1] + x[2] / 60
       })
gives
[1] 4.500000 2.333333 34.166667
Make sure you read the file using read.csv() with the option as.is=TRUE. Otherwise you'll have to convert the column using as.character().
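Put together, reading and converting might look like the sketch below. The CSV content and the column names (`Player`, `MinPerGame`, `PtsPerGame`) are invented for illustration, and the data is passed through the `text` argument only so that the example runs without a file:

```r
# Hypothetical CSV content; in practice pass a file path instead of text =
csvText <- "Player,MinPerGame,PtsPerGame
A,4:30,2.1
B,2:20,0.8
C,34:10,18.4"

# as.is = TRUE keeps the mm:ss column as character rather than factor
stats <- read.csv(text = csvText, as.is = TRUE)

# Convert each "mm:ss" string to decimal minutes
parts <- strsplit(stats$MinPerGame, ":")
stats$MinDecimal <- vapply(parts, function(x) {
  x <- as.numeric(x)
  x[1] + x[2] / 60
}, numeric(1))
```

With the new column in place, plot(PtsPerGame ~ MinDecimal, data = stats) should give the scatter plot asked for in the question.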