Read CSV in R and filter columns by name

℡╲_俬逩灬. 提交于 2020-01-24 07:07:06

问题


Let's say I have a CSV with dozens or hundreds of columns and I want to pull in just about 2 or 3 columns. I know about the colClasses solution as described here but the code gets very unreadable.

I want something like usecols from pandas' read_csv.

Loading everything and just selecting afterwards is not a solution (the file is super big, it doesn't fit in memory).


回答1:


I will use package data.table and then with fread() specify columns to keep/drop by arguments selector drop. From ?fread

select Vector of column names or numbers to keep, drop the rest.

drop Vector of column names or numbers to drop, keep the rest.

Best!




回答2:


One way is to use package sqldf. If you know SQL, it is possible to read in large files filtering only the parts you want.

I will use built-in dataset iris to make the example reproducible, saving it to disk first.

write.csv(iris, "iris.csv", row.names = FALSE)

Now the problem.
Argument row.names is like in the write.csv instruction.
Note the backticks around Sepal.Length. This is due to the dot character in the column name.

library(sqldf)

sql <- "select `Sepal.Length`, Species from file"
sub_iris <- read.csv.sql("iris.csv", sql = sql, row.names = FALSE)

head(sub_iris)
#  Sepal.Length  Species
#1          5.1 "setosa"
#2          4.9 "setosa"
#3          4.7 "setosa"
#4          4.6 "setosa"
#5          5.0 "setosa"
#6          5.4 "setosa"

And final clean up.

unlink("iris.csv")


来源:https://stackoverflow.com/questions/54611277/read-csv-in-r-and-filter-columns-by-name

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!