Using rbind() to combine multiple data frames into one larger data.frame within lapply()

淺唱寂寞╮ 提交于 2019-12-18 08:46:27

问题


I'm using R-Studio 0.99.491 and R version 3.2.3 (2015-12-10). I'm a relative newbie to R, and I'd appreciate some help. I'm doing a project where I'm trying to use the server logs on an old media server to identify which folders/files within the server are still being accessed and which aren't, so that my team knows which files to migrate. Each log is for a 24 hour period, and I have approximately a year's worth of logs, so in theory, I should be able to see all of the access over the past year.

My ideal output is to get a tree structure or plot that will show me the folders on our server that are being used. I've figured out how to read one log (one day) into R as a data.frame and then use the data.tree package in R to turn that into a tree. Now, I want to recursively go through all of the files in the directory, one by one, and add them to that original data.frame, before I create the tree. Here's my current code:

#Create the list of log files in the folder
files <- list.files(pattern = "*.log", full.names = TRUE, recursive = FALSE)
#Create a new data.frame to hold the aggregated log data
uridata <- data.frame()
#My function to go through each file, one by one, and add it to the 'uridata' df, above
lapply(files, function(x){
    uriraw <- read.table(x, skip = 3, header = TRUE, stringsAsFactors = FALSE)
    #print(nrow(uriraw)
    uridata <- rbind(uridata, uriraw)
    #print(nrow(uridata))
})

The problem is that, no matter what I try, the value of 'uridata' within the lapply loop seems to not be saved/passed outside of the lapply loop, but is somehow being overwritten each time the loop runs. So instead of getting one big data.frame, I just get the contents of the last 'uriraw' file. (That's why there are those two commented print commands inside the loop; I was testing how many lines there were in the data frames each time the loop ran.)

Can anyone clarify what I'm doing wrong? Again, I'd like one big data.frame at the end that combines the contents of each of the (currently seven) log files in the folder.


回答1:


do.call() is your friend.

big.list.of.data.frames <- lapply(files, function(x){
    read.table(x, skip = 3, header = TRUE, stringsAsFactors = FALSE)
})

or more concisely (but less-tinkerable):

big.list.of.data.frames <- lapply(files, read.table, 
                                  skip = 3,header = TRUE,
                                  stringsAsFactors = FALSE)

Then:

big.data.frame <- do.call(rbind,big.list.of.data.frames)

This is a recommended way to do things because "growing" a data frame dynamically in R is painful. Slow and memory-expensive, because a new frame gets built at each iteration.




回答2:


You can use map_df from purrr package instead of lapply, to directly have all results combined as a data frame.

map_df(files, read.table, skip = 3, header = TRUE, stringsAsFactors = FALSE)



回答3:


Another option is fread from data.table

library(data.table)
rbindlist(lapply(files, fread, skip=3))


来源:https://stackoverflow.com/questions/36902815/using-rbind-to-combine-multiple-data-frames-into-one-larger-data-frame-within

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!