When importing CSV into R how to generate column with name of the CSV?


I have a large number of CSV files that I want to read into R. All the column headings in the CSVs are the same. At first I thought I would need to create a loop based on th


6 Answers

终归单人心

2020-12-01 07:16

Kinda messy but works:

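```
filenames <- c("foo.csv", "bar.csv")
# Stand-ins for the imported data: 4 rows for foo.csv, 6 rows for bar.csv
import.list <- list(matrix(, 4, 4), matrix(, 6, 6))

# Repeat each file's base name once per row of its data set.
# lapply() always returns a list, so unlist() also works when all files have the same number of rows.
source <- unlist(lapply(seq_along(filenames), function(i)
  rep(sub("\\.csv$", "", filenames[i]), nrow(import.list[[i]]))))

source
[1] "foo" "foo" "foo" "foo" "bar" "bar" "bar" "bar" "bar" "bar"

# combined is the data frame you get by row-binding the imported files
combined$source <- source
```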
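The same labelling can be written without the index loop by passing the row counts to `rep()`. The following is a minimal sketch, not the answer's original code: the two small data frames stand in for the imported files, and the names `n.rows` and `combined` are chosen here for illustration.

```r
filenames <- c("foo.csv", "bar.csv")

# Stand-ins for the imported files: 4 rows and 6 rows (illustrative data)
import.list <- list(data.frame(x = 1:4), data.frame(x = 1:6))

# Number of rows contributed by each file
n.rows <- vapply(import.list, nrow, integer(1))

# Row-bind the data, then label each row with its file's base name
combined <- do.call(rbind, import.list)
combined$source <- rep(tools::file_path_sans_ext(filenames), times = n.rows)

table(combined$source)
```

`tools::file_path_sans_ext()` removes the extension, so the match does not depend on a hand-written regular expression.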


走了就别回头了

2020-12-01 07:19

This worked for me: it adds a new column with the file name and merges all the CSV files in a folder.

Using setNames():

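```
library(readr)       # read_csv()
library(data.table)  # rbindlist()

# pattern is a regular expression, not a glob
file.list <- list.files(pattern = "\\.csv$")
file.list <- setNames(file.list, file.list)  # name each element after its file

df.list <- lapply(file.list, read_csv)
# idcol takes its values from the list names, so Issue holds the source file name;
# a separate Map() step that adds the same information is not needed
df <- rbindlist(df.list, use.names = TRUE, fill = TRUE, idcol = "Issue")
```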

This adds a column holding the source file name and merges all the files into one table.


北恋

2020-12-01 07:22

data.table solution

Update: here is a complete data.table solution that uses the keep.rownames argument. It assumes all your CSVs are in one folder:

```
library(data.table)
my.path <- "C:/some/path/to/your/folder"  # set the path
filenames <- list.files(path = my.path, full.names = TRUE)  # full paths of the files

# keep.rownames = TRUE creates an rn column holding the file path plus the row number.
# simplify = FALSE keeps sapply() from collapsing the data frames into a matrix.
my.dt <- data.table(do.call("rbind", sapply(filenames, read.csv, sep = ";",
                                            simplify = FALSE)),
                    keep.rownames = TRUE)
```

Basic syntax solution

I used Grothendieck's solution and added a line to create a column from the row names. As simple as that:

```
something <- do.call("rbind", sapply(filenames, read.csv, sep=";", simplify = FALSE))
something$mycolumn <- row.names(something)
```

If you only want a part of the filename, replace the 2nd line with something like this:

```
something$mycolumn <- substring(row.names(something),1,3)
```

This uses the first 3 characters of each row name as the value in the new column. The row names start with whatever is in filenames, so if that vector holds full paths, the first characters come from the folder path rather than the file name.
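When `filenames` holds full paths, the row names look like `folder/file.csv.1`, so a fixed `substring()` may pick up part of the folder. The sketch below shows one way to recover just the base file name. It assumes the row names follow the `name.rownumber` pattern produced by `rbind()` on a named list; the demo folder and the files `foo.csv` and `bar.csv` are made up for illustration.

```r
# Hypothetical demo files, written with ";" as the separator
demo.dir <- file.path(tempdir(), "rowname_demo")
dir.create(demo.dir, showWarnings = FALSE)
write.table(data.frame(x = 1:2, y = c("a", "b")),
            file.path(demo.dir, "foo.csv"), sep = ";", row.names = FALSE)
write.table(data.frame(x = 3:5, y = c("c", "d", "e")),
            file.path(demo.dir, "bar.csv"), sep = ";", row.names = FALSE)

filenames <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)
something <- do.call("rbind", sapply(filenames, read.csv, sep = ";", simplify = FALSE))

# Drop the trailing ".<row number>", then the folder, then the extension
file.part <- sub("\\.[0-9]+$", "", row.names(something))
something$mycolumn <- tools::file_path_sans_ext(basename(file.part))

something
```

A file whose name itself ends in a dot followed by digits would need a different pattern.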


独厮守ぢ

2020-12-01 07:24
Try this:
```
do.call("rbind", sapply(filenames, read.csv, simplify = FALSE))
```
The row names will indicate the source and line number.

没有蜡笔的小新

2020-12-01 07:27

Here is a solution using the import_list() function from rio, which is designed exactly for this purpose.

```
# set up some example files to import
rio::export(mtcars, "mtcars1.csv")
rio::export(mtcars, "mtcars2.csv")
rio::export(mtcars, "mtcars3.csv")
```

The default behavior of import_list() is to get a list of data frames:

```
str(rio::import_list(dir(pattern = "mtcars")), 1)
## List of 3
##  $ :'data.frame':       32 obs. of  11 variables:
##  $ :'data.frame':       32 obs. of  11 variables:
##  $ :'data.frame':       32 obs. of  11 variables:
```

But you can use the rbind argument to instead construct a single data frame (note the _file column at the end):

```
str(rio::import_list(dir(pattern = "mtcars"), rbind = TRUE))
## 'data.frame':   96 obs. of  12 variables:
##  $ mpg  : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl  : int  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp : num  160 160 108 258 360 ...
##  $ hp   : int  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt   : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec : num  16.5 17 18.6 19.4 17 ...
##  $ vs   : int  0 0 1 1 0 1 0 1 1 1 ...
##  $ am   : int  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear : int  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb : int  4 4 1 1 2 1 4 2 2 4 ...
##  $ _file: chr  "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" ...
```

You can also use the rbind_label argument to specify the name of the column that identifies each file:

```
str(rio::import_list(dir(pattern = "mtcars"), rbind = TRUE, rbind_label = "source"))
## 'data.frame':   96 obs. of  12 variables:
##  $ mpg   : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl   : int  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp  : num  160 160 108 258 360 ...
##  $ hp    : int  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat  : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt    : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec  : num  16.5 17 18.6 19.4 17 ...
##  $ vs    : int  0 0 1 1 0 1 0 1 1 1 ...
##  $ am    : int  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear  : int  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb  : int  4 4 1 1 2 1 4 2 2 4 ...
##  $ source: chr  "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" ...
```

For full disclosure: I am the maintainer of rio.


清酒与你

2020-12-01 07:34
You have already done all the hard work. With a fairly small modification this should be straightforward.

The logic is:
1. Create a small helper function that reads an individual CSV and adds a column with the file name.
2. Call this helper function in llply().

The following should work:
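```
library(plyr)  # provides ldply()

# Read one CSV and record which file it came from
read_csv_filename <- function(filename){
    ret <- read.csv(filename)
    ret$Source <- filename  # new column holding the file name
    ret
}

# ldply() applies the helper to each file and row-binds the results
import.list <- ldply(filenames, read_csv_filename)
```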
Note that I have proposed another small improvement to your code: read.csv() returns a data.frame, so you can use ldply(), which combines the results into a single data frame, rather than llply(), which returns a list.
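For readers who would rather not depend on plyr, the same helper can be combined with base R alone. This is a minimal sketch rather than the answer's own code: the demo folder, the files `jan.csv` and `feb.csv`, and the name `import.df` are made up for illustration, and `basename()` is an added choice so that only the file name is stored, not the folder.

```r
# Hypothetical demo files written to a temporary folder
demo.dir <- file.path(tempdir(), "source_column_demo")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(demo.dir, "jan.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:5, value = c(30, 40, 50)),
          file.path(demo.dir, "feb.csv"), row.names = FALSE)

filenames <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)

# Read one CSV and record which file it came from
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- basename(filename)  # file name without the folder
  ret
}

# lapply() returns a list of data frames; do.call(rbind, ...) stacks them
import.df <- do.call(rbind, lapply(filenames, read_csv_filename))
import.df
```

Because every file has the same column headings, `rbind()` can stack the pieces directly.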