Question
I have several different txt files with the same structure. Now I want to read them into R using fread, and then union them into a bigger dataset.
## First put all file names into a list
library(data.table)
all.files <- list.files(path = "C:/Users", pattern = "\\.txt$", full.names = TRUE)
## Read data using fread
readdata <- function(fn) {
  dt_temp <- fread(fn, sep = ",")
  keycols <- c("ID", "date")
  setkeyv(dt_temp, keycols) # Note the "v": setkeyv() takes the key columns as a character vector
  return(dt_temp)
}
# Then read every file and combine the results
mylist <- lapply(all.files, readdata)
mydata <- do.call('rbind',mylist)
The code works fine, but the speed is not satisfactory. Each txt file has 1M observations and 12 fields.
If I use fread to read a single file, it's fast. But with lapply the speed is extremely slow, and it clearly takes much more time than reading the files one by one. I wonder what went wrong here. Is there any way to improve the speed?
I tried llply from the plyr package, but there was not much speed gain.
Also, is there any syntax in data.table to achieve a vertical join, like rbind, or UNION in SQL?
Thanks.
Answer 1:
Use rbindlist(), which is designed to rbind a list of data.tables together:
mylist <- lapply(all.files, readdata)
mydata <- rbindlist( mylist )
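On the SQL part of the question, here is a minimal sketch of how rbindlist() relates to UNION ALL and UNION. The two small tables and their values are made up for illustration; funion() is data.table's set-union helper and is shown only as one possible alternative to calling unique() afterwards.

```r
library(data.table)

# Two small tables with the same structure (made-up data for illustration)
dt1 <- data.table(ID = c(1L, 2L), date = c("2014-01-01", "2014-01-02"), value = c(10, 20))
dt2 <- data.table(ID = c(2L, 3L), date = c("2014-01-02", "2014-01-03"), value = c(20, 30))
tables <- list(dt1, dt2)

# rbindlist() stacks all rows, duplicates included (comparable to UNION ALL)
stacked <- rbindlist(tables)

# do.call(rbind, ...) gives the same rows for these inputs
stacked_base <- do.call(rbind, tables)

# Dropping duplicate rows afterwards gives UNION-style semantics
deduped <- unique(stacked)

# funion() combines two tables and drops duplicate rows in one step
deduped_pair <- funion(dt1, dt2)
```

Here the row with ID 2 appears in both tables, so stacked keeps it twice while deduped and deduped_pair keep it once.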
And as @Roland says, do not set the key in each iteration of your function!
So in summary, this is best:
l <- lapply(all.files, fread, sep=",")
dt <- rbindlist( l )
setkey( dt , ID, date )
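To try the pattern without the original files, the following sketch writes three tiny comma-separated .txt files to a temporary folder and then reads, stacks, and keys them. The folder name, file names, and data are hypothetical stand-ins for the real 1M-row files; only the read, rbindlist, setkey sequence comes from the answer.

```r
library(data.table)

# Write three small comma-separated .txt files to a temporary folder
# (hypothetical data standing in for the real files)
tmp_dir <- file.path(tempdir(), "fread_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (i in 1:3) {
  part <- data.table(
    ID    = c(i, i + 10L),
    date  = c("2014-01-02", "2014-01-01"),
    value = c(1.5, 2.5) * i
  )
  fwrite(part, file.path(tmp_dir, paste0("part", i, ".txt")))
}

# full.names = TRUE returns paths that fread() can open from any working directory
files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# Read every file, stack the results once, then set the key once
combined <- rbindlist(lapply(files, fread, sep = ","))
setkey(combined, ID, date)
```

Setting the key sorts the table, so doing it once on the combined result avoids sorting each piece separately and then sorting again after stacking.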
Source: https://stackoverflow.com/questions/21156271/fast-reading-and-combining-several-files-using-data-table-with-fread