Skip all leading empty lines in read.csv

筅森魡賤 submitted on 2019-12-10 13:59:44

Question


I want to import CSV files into R, using the first non-empty line as the data frame's column names. I know that the skip argument (e.g. skip = 0) sets how many lines are skipped before reading begins. However, the row number of the first non-empty line varies between files.

How do I work out how many lines are empty, and dynamically skip them for each file?

As pointed out in the comments, I need to clarify what "blank" means. My csv files look like:

,,,
w,x,y,z
a,b,5,c
a,b,5,c
a,b,5,c
a,b,4,c
a,b,4,c
a,b,4,c

That is, each file starts with one or more rows that contain only commas.


Answer 1:


read.csv automatically skips blank lines (unless you set blank.lines.skip=FALSE). See ?read.csv
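A quick way to check this behaviour is to pass a few lines to read.csv through its text argument. This is a small sketch; the sample lines are made up for illustration and contain truly empty lines (no commas):

```r
# Two truly empty lines precede the header; read.csv ignores them because
# blank.lines.skip = TRUE by default
raw_lines <- c("", "", "w,x,y,z", "a,b,5,c", "a,b,4,c")
df <- read.csv(text = raw_lines)
names(df)  # "w" "x" "y" "z"
nrow(df)   # 2
```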

After I wrote the above, the poster explained that the "blank" lines are not actually blank: they contain commas with nothing between them. In that case, use fread from the data.table package, which handles this. Its skip= argument can be set to any character string found in the header line, and reading then starts at the first line containing that string:

library(data.table)
DT <- fread("myfile.csv", skip = "w") # assuming w is in the header
DF <- as.data.frame(DT)

The last line can be omitted if a data.table is acceptable as the result.
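If data.table is not available, the same idea (start reading at the first line that contains a known piece of the header) can be sketched in base R with readLines and grep. This sketch assumes the header line contains the literal text "w,x,y,z" and that no earlier line does; a temporary file with made-up data stands in for myfile.csv:

```r
# Example input: a temporary file whose "blank" lines contain only commas
path <- tempfile(fileext = ".csv")
writeLines(c(",,,", ",,,", "w,x,y,z", "a,b,5,c", "a,b,4,c"), path)

# Find the first line containing the header text (fixed = TRUE: no regex)
all_lines <- readLines(path)
header_line <- grep("w,x,y,z", all_lines, fixed = TRUE)[1]
if (is.na(header_line)) stop("header text not found in ", path)

# Skip everything before the header line, then read as usual
df <- read.csv(path, skip = header_line - 1)
```

Unlike fread, this reads the file twice (once with readLines, once with read.csv), which is usually fine for small files.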




Answer 2:


Depending on your file size, this may not be the best solution, but it will do the job.

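The strategy: instead of parsing the file with a delimiter, read it as raw lines, count the characters on each line (ignoring commas and whitespace, so that a row such as ,,, counts as empty) and store the counts in temp. A while loop then finds the first line with a non-zero count, and the file is read from that line onwards and stored as data_<filename>.

flist = list.files(pattern = "\\.csv$")  # only the CSV files in the working directory
for (onefile in flist) {
  # characters per line, ignoring commas and whitespace
  temp = nchar(gsub("[,[:space:]]", "", readLines(onefile)))
  i = 1
  # advance past the leading empty (or comma-only) lines
  while (i <= length(temp) && temp[i] == 0) {
    i = i + 1
  }
  # skip the i - 1 empty lines; the first non-empty line supplies the column names
  temp = read.table(onefile, sep = ",", header = TRUE, skip = (i - 1))
  assign(paste0("data_", onefile), temp)
}

header = TRUE makes read.table take the column names from the first non-empty line; drop it if your files have no header row.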
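The same idea (count the leading empty lines, then skip them) can be wrapped in a function without an explicit while loop. A minimal sketch, assuming a "blank" line is one containing nothing but the separator and whitespace; the function name read_skip_blank is hypothetical and not part of any package:

```r
# Hypothetical helper: read a delimited file, skipping any leading lines
# that contain nothing but the separator and whitespace (e.g. ",,,")
read_skip_blank <- function(path, sep = ",") {
  raw_lines <- readLines(path)
  # TRUE for lines that are empty once separators and whitespace are removed
  is_blank <- trimws(gsub(sep, "", raw_lines, fixed = TRUE)) == ""
  first <- which(!is_blank)[1]
  if (is.na(first)) stop("no non-empty line found in ", path)
  # skip the first - 1 blank lines; the next line is used as the header
  read.table(path, sep = sep, header = TRUE, skip = first - 1,
             quote = "\"", comment.char = "")
}

# Usage with made-up example data
path <- tempfile(fileext = ".csv")
writeLines(c(",,,", "w,x,y,z", "a,b,5,c", "a,b,4,c"), path)
df <- read_skip_blank(path)
```

Because the separator is removed with fixed = TRUE, the same sketch should also work for tab-delimited files via read_skip_blank(path, sep = "\t").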




Answer 3:


If the leading lines are truly empty, read.csv skips them automatically. If they contain commas but no values, you can use:

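df = read.csv(file = 'd.csv')
# read.csv takes line 1 (a comma-only line) as the header, so row k of df is line k + 1;
# the index of the first row with a non-empty first column is therefore the number of lines to skip
# (this assumes the file starts with at least one comma-only line)
df = read.csv(file = 'd.csv', skip = which(df[, 1] != '')[1])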

It's not efficient if you have large files (since you have to import twice), but it works.
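One way to avoid importing the file twice is to read the raw lines once with readLines, drop the leading comma-only lines, and hand the remaining lines to read.csv through its text argument. A sketch under the same assumption that a "blank" line holds only commas and whitespace; a temporary file with made-up data stands in for d.csv:

```r
# Example input: a temporary file stands in for 'd.csv'
path <- tempfile(fileext = ".csv")
writeLines(c(",,,", ",,,", "w,x,y,z", "a,b,5,c", "a,b,4,c"), path)

# Read the raw lines once
raw_lines <- readLines(path)
# A line is "blank" if it contains only commas and whitespace
blank <- grepl("^[,[:space:]]*$", raw_lines)
# Index of the first line with real content (the header)
first <- which(!blank)[1]
stopifnot(!is.na(first))

# Parse from the header onwards, using the lines already in memory
df <- read.csv(text = raw_lines[first:length(raw_lines)])
```

Indexing with first:length(raw_lines) rather than a negative index such as -seq_len(first - 1) keeps the case first = 1 correct, since a negative empty index would select nothing.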

To import a tab-delimited file with the same problem (a variable number of blank lines), use:

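df = read.table(file = 'd.txt', sep = '\t')
# with the default header = FALSE the header line is read as data, so the first row with a
# non-empty first column is the header itself: skip the lines before it and read it as the header
df = read.table(file = 'd.txt', sep = '\t', header = TRUE, skip = which(df[, 1] != '')[1] - 1)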


Source: https://stackoverflow.com/questions/26456814/skip-all-leading-empty-lines-in-read-csv
