Reading a csv file with embedded quotes into R

落爺英雄遲暮 提交于 2019-12-01 06:53:49

问题


I have to work with a .csv file that comes like this:

"IDEA ID,""IDEA TITLE"",""VOTE VALUE"""
"56144,""Net Present Value PLUS (NPV+)"",1"
"56144,""Net Present Value PLUS (NPV+)"",1"

If I use read.csv, I obtain a data frame with one variable. What I need is a data frame with three columns, where columns are separated by commas. How can I handle the quotes at the beginning of the line and the end of the line?


回答1:


I don't think there's going to be an easy way to do this without stripping the initial and terminal quotation marks first. If you have sed on your system (Unix [Linux/MacOS] or Windows+Cygwin?) then

read.csv(pipe("sed -e 's/^\"//' -e 's/\"$//' qtest.csv"))

should work. Otherwise

read.csv(text=gsub("(^\"|\"$)","",readLines("qtest.csv")))

is a little less efficient for big files (you have to read in the whole thing before processing it), but should work anywhere.

(There may be a way to do the regular expression for sed in the same, more-compact form using parentheses that the second example uses, but I got tired of trying to sort out where all the backslashes belonged.)




回答2:


I suggest both removing the initial/terminal quotes and turning the back-to-back double quotes into single double quotes. The latter is crucial in case some of the strings contain commas themselves, as in

"1,""A mostly harmless string"",11"
"2,""Another mostly harmless string"",12"
"3,""These, commas, cause, trouble"",13"

Removing only the initial/terminal quotes while keeping the back-to-back quote leads the read.csv() function to produce 6 variables, as it interprets all commas in the last row as value separators. So the complete code might look like this:

data.text <- readLines("fullofquotes.csv")  # Reads data from file into a character vector.
data.text <- gsub("^\"|\"$", "", data.text) # Removes initial/terminal quotes.
data.text <- gsub("\"\"", "\"", data.text)  # Replaces "" by ".
data <- read.csv(text=data.text, header=FALSE)

Or, of course, all in a single line

data <- read.csv(text=gsub("\"\"", "\"", gsub("^\"|\"$", "", readLines("fullofquotes.csv", header=FALSE))))


来源:https://stackoverflow.com/questions/24647108/reading-a-csv-file-with-embedded-quotes-into-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!