Importing many files at the same time and adding ID indicator

问题

I have 91 files - .log format:

rajectory Log File

Rock type: 2 (0: Sphere, 1: Cuboid, 2: Rock)

Nr of Trajectories: 91
Trajectory-Mode: ON
Average Slope (Degrees): 28.05 / 51.99 / 64.83

Filename: test_tschamut_Pos1.xml

Z-offset: 1.32000
Rock Position X: 696621.38
Rock Position Y: 167730.02
Rock Position Z: 1679.6400

Friction:
Overall Type: Medium

               t (s)               x (m)               y (m)               z (m)               p0 ()               p1 ()               p2 ()               p3 ()          vx (m s-1)          vy (m s-1)          vz (m s-1)        wx (rot s-1)        wy (rot s-1)        wz (rot s-1)           Etot (kJ)           Ekin (kJ)      Ekintrans (kJ)        Ekinrot (kJ)              zt (m)             Fv (kN)             Fh (kN)        Slippage (m)      mu_s (N s m-1)       v_res (m s-1)     w_res (rot s-1)           JumpH (m)        ProjDist (m)               Jc ()           JH_Jc (m)              SD (m)
               0.000          696621.380          167730.020            1680.960               1.000               0.000               0.000               0.000               0.000               0.000               0.000               0.000               0.000               0.000            1192.526               0.000               0.000               0.000            1677.754               0.000               0.000               0.000               0.350               0.000               0.000               3.206               0.000               0.000               0.000               0.000
               0.010          696621.380          167730.020            1680.959               1.000               0.000              -0.000               0.000               0.000               0.000              -0.098               0.000               0.000               0.000            1192.526               0.010               0.010               0.000            1677.754               0.000               0.000               0.000               0.350               0.098               0.000               3.205               0.000               0.000               0.000               0.000
               0.020          696621.380          167730.020            1680.958               1.000               0.000              -0.000               0.000               0.000               0.000              -0.196               0.000               0.000               0.000            1192.526               0.039               0.039               0.000            1677.754               0.000               0.000               0.000               0.350               0.196               0.000               3.204               0.000               0.000               0.000               0.000
               0.040          696621.380          167730.020            1680.952               1.000               0.000              -0.000               0.000               0.000               0.000              -0.392               0.000               0.000               0.000            1192.526               0.158               0.158               0.000            1677.754               0.000               0.000               0.000               0.350               0.392               0.000               3.198               0.000               0.000               0.000               0.000
               0.060          696621.380          167730.020            1680.942               1.000               0.000              -0.000               0.000               0.000               0.000              -0.589               0.000               0.000               0.000            1192.526               0.355               0.355               0.000            1677.754               0.000               0.000               0.000               0.350               0.589               0.000               3.188               0.000               0.000               0.000               0.000

I have managed to import one single file, and to retain only the desired variables which are: x, y, z, Etot:

  trjct <- read.table('trajectory_test_tschamut_Pos1.log', skip = 23)
  trjct <- trjct[,c("V1","V2","V3", "V4", "V15")]
  colnames(trjct) <- c("t", "x", "y", "z", "Etot")

> str(trjct)
'data.frame':   1149 obs. of  5 variables:
 $ t   : num  0 0.01 0.02 0.04 0.06 0.08 0.11 0.13 0.15 0.16 ...
 $ x   : num  696621 696621 696621 696621 696621 ...
 $ y   : num  167730 167730 167730 167730 167730 ...
 $ z   : num  1681 1681 1681 1681 1681 ...
 $ Etot: num  1193 1193 1193 1193 1193 ...

However I have 91 of these files and would like to analyse them simultaneously. Therefore, I want to create one large dataset, that distingishes the data from every file by adding an ID - similiar question has been answered here.

I have applied the code to my data and needs and adjusted it here and there, but I always keep getting some errors.

# importing all files at the same time
  files.list <- list.files(pattern = ".log")
  trjct <- data.frame(t=numeric(),
                      x=numeric(),
                      z=numeric(),
                      Etot=numeric(),
                      stringsAsFactors=FALSE)

  for (i in 1: length(files.list)) {
    df.next <- read.table(files.list[[i]], header=F, skip = 23)
    df.next$ID <- paste0('simu', i)
    df <- rbind(df, df.next)
  }

I am getting an error:

Error in rep(xi, length.out = nvar) : 
  attempt to replicate an object of type 'closure'

QUESTIONS:

Where is the problem and how can I fix it?
Is there a better solution?

回答1:

You could also check out purrr::map_df which behaves like lapply or for loop but returns a data.frame

read_traj <- function(fi) {
    df <- read.table(fi, header=F, skip=23)
    df <- df[, c(1:4, 15)]
    colnames(df) <- c("t", "x", "y", "z", "Etot")
    return(df)
}

files.list <- list.files(pattern = ".log")
library(tidyverse)

map_df has a handy feature .id=... that creates a column, id, with numbers 1...N where N is number of files.

map_df(files.list, ~read_traj(.x), .id="id")

If you want to save the file name instead, use the id column to access files.list

map_df(files.list, ~read_traj(.x), .id="id") %>%
  mutate(id = files.list[as.numeric(id)])

回答2:

First of all, you should encapsulate the reading part in a function :

read_log_file <- function(path) {
  trjct <- read.table(path, skip = 23)
  trjct <- trjct[,c("V1","V2","V3", "V4", "V15")]
  colnames(trjct) <- c("t", "x", "y", "z", "Etot")
  return(trjct)
}

Then, you can create a list of data.frame using mapply (kind of apply which can take two parameters, go to datacamp article on apply family if you want to know more).

files.list <- list.files(pattern = ".log")
ids <- 1:length(files.list)

df_list = mapply(function(path, id) {
    df = read_log_file(path)
    df$ID = id
    return(df)
}, files.list, ids, SIMPLIFY=FALSE)

Note the SIMPLIFY=FALSE part, it avoids mapply to return a kind of data.frame and return a raw list of data.frame instead.

Finally, you can concatenate all your data.frame in one with bind_rows from dplyr package :