I've got a data frame dat of size 30000 x 50. I also have a separate list that contains pointers to groupings of rows from this data frame, e.g.,
rows <- list(c(
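For concreteness, a stand-in with the same shape looks something like this (the real row names and groupings differ; the values here are purely illustrative):

# Hypothetical setup matching the shapes described above -- the real
# contents of dat and rows are not shown here.
set.seed(42)
dat <- as.data.frame(matrix(rnorm(30000 * 50), nrow = 30000))
rownames(dat) <- paste0("r", seq_len(nrow(dat)))
# e.g. 1000 non-overlapping groups of 30 row names each
rows <- split(sample(rownames(dat)), rep(seq_len(1000), each = 30))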
One of the main issues is the matching of row names -- the default in [.data.frame is partial matching of row names, and you probably don't want that, so you're better off with match. To speed it up even further you can use fmatch from the fastmatch package if you want. This is a minor modification with some speedup:
# naive
> system.time(res1 <- lapply(rows,function(r) dat[r,]))
user system elapsed
69.207 5.545 74.787
# match
> rn <- rownames(dat)
> system.time(res1 <- lapply(rows,function(r) dat[match(r,rn),]))
user system elapsed
36.810 10.003 47.082
# fastmatch
> library(fastmatch)
> rn <- rownames(dat)
> system.time(res1 <- lapply(rows,function(r) dat[fmatch(r,rn),]))
user system elapsed
19.145 3.012 22.226
You can get a further speedup by not using [ at all (it is slow for data frames) but splitting the data frame instead (using split), provided your rows are non-overlapping and cover all rows (so that each row maps to exactly one entry in rows).
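A rough sketch of that approach, assuming the groups in rows are non-overlapping and together cover every row of dat (the grouping vector is derived from rows here just for illustration):

# Build one group id per row name, then split the data frame in a single pass.
# Assumes every row name of dat occurs in exactly one element of rows.
grp <- rep(seq_along(rows), lengths(rows))
names(grp) <- unlist(rows, use.names = FALSE)
res2 <- split(dat, grp[rownames(dat)])   # list of data frames, one per group

One split call replaces the whole lapply over [, which is where the savings come from.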
Depending on your actual data you may be better off with matrices, which have by far faster subsetting operators since they are native.
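For example, if all 50 columns happen to be numeric, a sketch of that would be (as.matrix only makes sense when the columns share a common type; fmatch used again for the lookup):

library(fastmatch)
m <- as.matrix(dat)     # sensible only if all columns have the same type
rn <- rownames(m)
res3 <- lapply(rows, function(r) m[fmatch(r, rn), , drop = FALSE])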