Read CSV in R and filter columns by name

Question

Let's say I have a CSV with dozens or hundreds of columns and I want to read in just 2 or 3 of them. I know about the colClasses solution as described here, but the code gets very unreadable.

I want something like usecols from pandas' read_csv.

Loading everything and selecting afterwards is not an option: the file is too big to fit in memory.

Answer 1:

I would use the data.table package and, with fread(), specify which columns to keep or drop via the select or drop arguments. From ?fread:

select Vector of column names or numbers to keep, drop the rest.

drop Vector of column names or numbers to drop, keep the rest.
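As a minimal sketch of how these two arguments might be used, the snippet below writes the built-in iris data set to disk and reads back only some of its columns. The file name iris_fread.csv and the variable names keep and rest are chosen here for illustration only.

```r
library(data.table)

# Write a small reproducible file (built-in iris data set);
# "iris_fread.csv" is an arbitrary name used for this sketch
write.csv(iris, "iris_fread.csv", row.names = FALSE)

# Keep only two columns, selected by name; the rest are not loaded
keep <- fread("iris_fread.csv", select = c("Sepal.Length", "Species"))

# Alternatively, drop two columns by name and keep everything else
rest <- fread("iris_fread.csv", drop = c("Sepal.Width", "Petal.Width"))

names(keep)
names(rest)

# Clean up the temporary file
unlink("iris_fread.csv")
```

Passing names rather than positions keeps the call readable when the file has hundreds of columns, which is close to what usecols does in pandas.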

Best!

Answer 2:

One way is to use the sqldf package. If you know SQL, you can read in large files while filtering for only the parts you want.

I will use the built-in dataset iris to make the example reproducible, saving it to disk first.

write.csv(iris, "iris.csv", row.names = FALSE)

Now the problem.
The row.names argument is used as in the write.csv call above.
Note the backticks around Sepal.Length. They are needed because of the dot character in the column name.

library(sqldf)

sql <- "select `Sepal.Length`, Species from file"
sub_iris <- read.csv.sql("iris.csv", sql = sql, row.names = FALSE)

head(sub_iris)
#  Sepal.Length  Species
#1          5.1 "setosa"
#2          4.9 "setosa"
#3          4.7 "setosa"
#4          4.6 "setosa"
#5          5.0 "setosa"
#6          5.4 "setosa"

And the final cleanup.

unlink("iris.csv")
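Since the query is ordinary SQL, the same call can in principle restrict rows as well as columns. The following is only a sketch under a few assumptions: the file is written with quote = FALSE so that the string values carry no literal quote characters, the file name iris_rows.csv and the threshold 7 are arbitrary, and Sepal.Length is read as a numeric column.

```r
library(sqldf)

# Write the file without quotes so string fields are stored bare;
# "iris_rows.csv" is an arbitrary name used for this sketch
write.csv(iris, "iris_rows.csv", quote = FALSE, row.names = FALSE)

# Select two columns and keep only the rows matching the WHERE clause
sql <- "select `Sepal.Length`, Species from file where `Sepal.Length` > 7"
big <- read.csv.sql("iris_rows.csv", sql = sql)

head(big)

# Clean up the temporary file
unlink("iris_rows.csv")
```

The filtering happens inside the temporary SQLite database that sqldf creates, so only the matching rows and columns come back as a data frame.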

Source: https://stackoverflow.com/questions/54611277/read-csv-in-r-and-filter-columns-by-name

Tags

csv

readr