I have a large number of CSV files that I want to read into R. All the column headings in the CSVs are the same. At first I thought I would need to create a loop based on th
Kinda messy but works:
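filenames <- c("foo.csv","bar.csv")
# placeholder data: a 4-row and a 6-row matrix standing in for the imported files
import.list <- list(matrix(,4,4),matrix(6,6))
# repeat each file name (minus the ".csv" extension) once per row of the matching import
source <- unlist(sapply(seq_along(filenames),function(i)rep(sub("\\.csv$","",filenames[i]),nrow(import.list[[i]]))))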
source
[1] "foo" "foo" "foo" "foo" "bar" "bar" "bar" "bar" "bar" "bar"
combined$source <- source
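The same labels can be built a little more directly, because rep() accepts a vector of repeat counts. This is only a sketch: the two small data frames stand in for real imports, and src is a hypothetical name chosen so that base::source() is not masked.

```r
# Stand-ins for two imported files: 4 rows and 6 rows
filenames   <- c("foo.csv", "bar.csv")
import.list <- list(data.frame(x = 1:4), data.frame(x = 1:6))

# Strip the extension, then repeat each name once per row of its data frame
labels <- tools::file_path_sans_ext(filenames)
src    <- rep(labels, times = vapply(import.list, nrow, integer(1)))

# Stack the pieces and attach the label column
combined        <- do.call(rbind, import.list)
combined$source <- src
```

This relies on import.list being in the same order as filenames.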
I found that this approach works for me. It adds a new column and merges all the CSV files in a folder.
Using setNames():
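library(readr)       # read_csv()
library(data.table)  # rbindlist()
file.list <- list.files(pattern = "\\.csv$")  # pattern is a regex, not a glob
file.list <- setNames(file.list, file.list)   # name each element after its file
df.list <- lapply(file.list, read_csv)
df.list <- Map(function(df, name) {
  df$issue <- name
  df
}, df.list, names(df.list))
# idcol also writes the list names into "Issue", so the Map() step above is optional
df <- rbindlist(df.list,use.names = TRUE, fill = TRUE, idcol = "Issue")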
This creates a new column holding the source file name and merges the files.
data.table solution
Update: here is a complete data.table solution, using keep.rownames. It assumes all your CSVs are in one folder:
library(data.table)
my.path <- "C:/some/path/to/your/folder" # set the path
filenames <- list.files(path = my.path, full.names = TRUE) # full paths of the files
# this creates an rn column holding the file path followed by the row number;
# simplify = FALSE makes sapply() return a named list of data frames
my.dt <- data.table(do.call("rbind", sapply(filenames, read.csv,
    sep = ";", simplify = FALSE)), keep.rownames = TRUE)
Basic syntax solution
I used Grothendieck's solution and added a line to create a column from the row names. As simple as that:
something <- do.call("rbind", sapply(filenames, read.csv, sep=";", simplify = FALSE))
something$mycolumn <- row.names(something)
If you only want part of the filename, replace the second line with something like this:
something$mycolumn <- substring(row.names(something),1,3)
This will use the first three characters of the filename as the value in the new column.
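Note that the row names produced by rbind() carry a row counter after the file name (for example foo.csv.1, foo.csv.2), so mycolumn holds more than the bare file name. One way to recover just the name is to strip the trailing counter with a regular expression. A minimal sketch, assuming the file names end in .csv and using made-up data frames (the pieces list is hypothetical) in place of real read.csv() results:

```r
# Made-up data frames standing in for read.csv() results, keyed by file name
pieces <- list("foo.csv" = data.frame(x = 1:2),
               "bar.csv" = data.frame(x = 3:5))
something <- do.call("rbind", pieces)

# Row names look like "foo.csv.1"; drop the trailing ".<row number>"
something$mycolumn <- sub("\\.[0-9]+$", "", row.names(something))

# Optionally reset the row names now that the file has its own column
row.names(something) <- NULL
```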
Try this:
do.call("rbind", sapply(filenames, read.csv, simplify = FALSE))
The row names will indicate the source and line number.
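To see what those row names look like, here is a small self-contained sketch. It writes two throwaway CSVs into a temporary directory (the csv_demo folder and the a.csv and b.csv files are made up for illustration) and names the file vector by basename() so the row names are not prefixed with the full path.

```r
# Write two small throwaway CSVs into a temporary directory
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(tmp, "b.csv"), row.names = FALSE)

# Full paths for reading, bare file names as the element names
filenames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
names(filenames) <- basename(filenames)

combined <- do.call("rbind", sapply(filenames, read.csv, simplify = FALSE))
rownames(combined)
# [1] "a.csv.1" "a.csv.2" "b.csv.1" "b.csv.2" "b.csv.3"
```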
Here is a solution using the import_list() function from rio, which is designed exactly for this purpose.
# set up some example files to import
rio::export(mtcars, "mtcars1.csv")
rio::export(mtcars, "mtcars2.csv")
rio::export(mtcars, "mtcars3.csv")
The default behavior of import_list() is to get a list of data frames:
str(rio::import_list(dir(pattern = "mtcars")), 1)
## List of 3
## $ :'data.frame': 32 obs. of 11 variables:
## $ :'data.frame': 32 obs. of 11 variables:
## $ :'data.frame': 32 obs. of 11 variables:
But you can use the rbind argument to instead construct a single data frame (note the _file column at the end):
str(rio::import_list(dir(pattern = "mtcars"), rbind = TRUE))
## 'data.frame': 96 obs. of 12 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : int 6 6 4 6 8 6 8 4 4 6 ...
## $ disp : num 160 160 108 258 360 ...
## $ hp : int 110 110 93 110 175 105 245 62 95 123 ...
## $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec : num 16.5 17 18.6 19.4 17 ...
## $ vs : int 0 0 1 1 0 1 0 1 1 1 ...
## $ am : int 1 1 1 0 0 0 0 0 0 0 ...
## $ gear : int 4 4 4 3 3 3 3 4 4 4 ...
## $ carb : int 4 4 1 1 2 1 4 2 2 4 ...
## $ _file: chr "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" ...
and the rbind_label argument to specify the name of the column that identifies each file:
str(rio::import_list(dir(pattern = "mtcars"), rbind = TRUE, rbind_label = "source"))
## 'data.frame': 96 obs. of 12 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : int 6 6 4 6 8 6 8 4 4 6 ...
## $ disp : num 160 160 108 258 360 ...
## $ hp : int 110 110 93 110 175 105 245 62 95 123 ...
## $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec : num 16.5 17 18.6 19.4 17 ...
## $ vs : int 0 0 1 1 0 1 0 1 1 1 ...
## $ am : int 1 1 1 0 0 0 0 0 0 0 ...
## $ gear : int 4 4 4 3 3 3 3 4 4 4 ...
## $ carb : int 4 4 1 1 2 1 4 2 2 4 ...
## $ source: chr "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" ...
For full disclosure: I am the maintainer of rio.
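You have already done all the hard work. With a fairly small modification this should be straightforward.
The logic is: create a small helper function that reads a single CSV and adds a column holding the file name, then call that helper on every file.
The following should work: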
library(plyr)  # ldply()
read_csv_filename <- function(filename){
  ret <- read.csv(filename)
  ret$Source <- filename  # record which file each row came from
  ret
}
import.list <- ldply(filenames, read_csv_filename)
Note that I have proposed another small improvement to your code: read.csv() returns a data.frame - this means you can use ldply() rather than llply().
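If plyr is not available, the same helper can be paired with base R instead. This is a sketch rather than part of the original answer: it swaps ldply() for lapply() plus do.call(rbind, ...), and it uses two temporary files (f1 and f2, made up here) so that it runs on its own.

```r
# Helper from the answer: read one file and tag every row with its file name
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- filename
  ret
}

# Two throwaway files so the sketch is self-contained
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:2), f1, row.names = FALSE)
write.csv(data.frame(x = 3:5), f2, row.names = FALSE)
filenames <- c(f1, f2)

# lapply() reads each file; do.call(rbind, ...) stacks the results
import.df <- do.call(rbind, lapply(filenames, read_csv_filename))
```

Because the list passed to rbind() is unnamed here, the row names are plain sequence numbers and the file name lives only in the Source column.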