Read csv file in R with double quotes

借酒劲吻你 2021-01-17 22:36

Suppose I have a CSV file that looks like this:

Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""


        
3 Answers
  • 2021-01-17 23:18

    fread from data.table handles this just fine:

    library(data.table)
    
    fread('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
    A,3,"","I have comma, ha!",I have open double quotes",A,""')
    #   Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
    #1:    A  3      I have comma, ha! I have open double quotes"     A       
    
  • 2021-01-17 23:20

    I'm not too sure about the structure of CSV files, but you said the author had escaped the comma in the text under CONTENT.

    This reads the text as is, keeping the " at the end:

    # quote = "" turns off quote handling, so quote characters stay in the data
    read.csv2("Test.csv", header = TRUE, sep = ",", quote = "")
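One thing worth checking before relying on `quote = ""`: with quoting turned off, the comma inside "I have comma, ha!" counts as a separator too. The sketch below is only an illustration, assuming the file contains exactly the two lines from the question (held in memory here rather than in Test.csv); it counts the fields per line with `count.fields`.

```r
# The two lines from the question, held in memory instead of Test.csv.
csv_lines <- c(
  'Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE',
  'A,3,"","I have comma, ha!",I have open double quotes",A,""'
)

# Count comma-separated fields per line with quote handling disabled.
con <- textConnection(csv_lines)
n_fields <- count.fields(con, sep = ",", quote = "")
close(con)

print(n_fields)  # 7 8
```

The header has 7 fields but the data line has 8, so the columns may not line up with the header when quoting is disabled.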
    
  • 2021-01-17 23:25

    This is not valid CSV, so you'll have to do your own parsing. But, assuming the convention is as follows, you can toggle the quote argument of scan field by field and still take advantage of most of its abilities:

    1. If the field starts with a quote, it is quoted.
    2. If the field does not start with a quote, it is raw.

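    next_field <- function(stream) {
      # Peek at the first character of the field, then rewind.
      p <- seek(stream)
      d <- readChar(stream, 1)
      seek(stream, p)
      if (d == "\"") {
        # Starts with a quote: let scan treat it as a quoted field.
        field <- scan(stream, "", 1, sep = ",", quote = "\"", blank.lines.skip = FALSE)
      } else {
        # Otherwise read it raw, keeping any quote characters.
        field <- scan(stream, "", 1, sep = ",", quote = "", blank.lines.skip = FALSE)
      }
      return(field)
    }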
    

    Assuming the above convention, this is sufficient to parse the file as follows:

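    s <- file("example.csv", open = "rt")
    header <- readLines(s, 1)
    header <- scan(what = "", text = header, sep = ",")
    # Read one field per header column from the first data row.
    line <- replicate(length(header), next_field(s))
    close(s)
    
    setNames(as.data.frame(lapply(line, type.convert, as.is = TRUE)), header)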
    
      Type ID NAME           CONTENT                   RESPONSE GRADE SOURCE
    1    A  3   NA I have comma, ha! I have open double quotes"     A     NA
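The same two-rule convention can also be applied to a line held as a string, which may be handier when the file has many rows. The sketch below swaps the scan/seek approach for plain regular expressions; `split_line` is a hypothetical helper, not part of the answer, and it assumes quoted fields contain no embedded quotes and are followed directly by a comma or the end of the line.

```r
# Hypothetical helper: split one line under the convention above.
# A field that starts with a quote runs to the next quote;
# any other field runs to the next comma (quote characters are kept).
split_line <- function(line) {
  fields <- character(0)
  rest <- line
  repeat {
    quoted <- regmatches(rest, regexpr("^\"[^\"]*\"", rest))
    if (length(quoted) == 1) {
      token <- quoted
      field <- substr(token, 2, nchar(token) - 1)  # drop the enclosing quotes
    } else {
      token <- regmatches(rest, regexpr("^[^,]*", rest))
      field <- token
    }
    fields <- c(fields, field)
    rest <- substr(rest, nchar(token) + 1, nchar(rest))
    if (!startsWith(rest, ",")) break              # no separator left: done
    rest <- substr(rest, 2, nchar(rest))           # skip the comma
  }
  fields
}

row <- split_line('A,3,"","I have comma, ha!",I have open double quotes",A,""')
print(row)
```

Applied with lapply over readLines output, this would give one character vector per row; type conversion is left out of the sketch.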
    

    In practice, though, you may prefer to write the parsed fields back to another file with every field quoted, so that you can simply run read.csv on the corrected file.
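A minimal sketch of that write-back step, assuming the fields have already been recovered by some field-by-field parse (they are hard-coded here for illustration). The names `header`, `fields`, and `fixed` are placeholders, and a temporary file stands in for the corrected file.

```r
# Fields as recovered by a field-by-field parse (hard-coded for illustration).
header <- c("Type", "ID", "NAME", "CONTENT", "RESPONSE", "GRADE", "SOURCE")
fields <- c("A", "3", "", "I have comma, ha!",
            "I have open double quotes\"", "A", "")

# Write every field quoted, doubling any embedded quote character.
fixed <- tempfile(fileext = ".csv")
write.table(rbind(header, fields), fixed, sep = ",", quote = TRUE,
            qmethod = "double", row.names = FALSE, col.names = FALSE)

# The corrected file can now be read with plain read.csv.
df <- read.csv(fixed, stringsAsFactors = FALSE)
print(df)
```

Using `qmethod = "double"` writes the stray quote as a doubled quote inside a quoted field, which is the form read.csv expects.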
