Reading a csv file with embedded quotes into R

Submitted by ε祈祈猫儿з on 2019-12-01 08:56:37

I don't think there's going to be an easy way to do this without stripping the initial and terminal quotation marks first. If you have sed on your system (Unix [Linux/macOS], or Windows with Cygwin), then

read.csv(pipe("sed -e 's/^\"//' -e 's/\"$//' qtest.csv"))

should work. Otherwise

read.csv(text=gsub("(^\"|\"$)","",readLines("qtest.csv")))

is a little less efficient for big files (you have to read in the whole thing before processing it), but should work anywhere.

(There may be a way to do the regular expression for sed in the same, more-compact form using parentheses that the second example uses, but I got tired of trying to sort out where all the backslashes belonged.)
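For what it's worth, here is a sketch of that more compact form, assuming your sed accepts the -E flag for extended regular expressions. The temporary file and its contents are made up for illustration and merely stand in for qtest.csv.

```r
# Hypothetical sample file standing in for qtest.csv:
# every line is wrapped in one pair of double quotes.
qtest <- tempfile(fileext = ".csv")
writeLines(c('"a,b,c"', '"1,2,3"', '"4,5,6"'), qtest)

# With -E, ^"|"$ matches a quote at either end of the line, and the
# g flag lets both ends be replaced in a single pass over each line.
cmd <- paste("sed -E 's/^\"|\"$//g'", shQuote(qtest))
d <- read.csv(pipe(cmd))
print(d)
```

Only the two quotes inside the R string need backslashes; the sed expression itself sits in single quotes for the shell, so nothing else has to be escaped.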

I suggest both removing the initial/terminal quotes and collapsing each pair of back-to-back double quotes ("") into a single double quote ("). The latter is crucial in case some of the strings contain commas themselves, as in

"1,""A mostly harmless string"",11"
"2,""Another mostly harmless string"",12"
"3,""These, commas, cause, trouble"",13"

Removing only the initial/terminal quotes while keeping the back-to-back quotes leads read.csv() to produce 6 variables, because it then treats every comma in the last row as a field separator. So the complete code might look like this:

data.text <- readLines("fullofquotes.csv")  # Reads data from file into a character vector.
data.text <- gsub("^\"|\"$", "", data.text) # Removes initial/terminal quotes.
data.text <- gsub("\"\"", "\"", data.text)  # Replaces "" by ".
data <- read.csv(text=data.text, header=FALSE)

Or, of course, all in a single line (note that header=FALSE belongs to read.csv(), not to readLines()):

data <- read.csv(text=gsub("\"\"", "\"", gsub("^\"|\"$", "", readLines("fullofquotes.csv"))), header=FALSE)
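To see the difference the second substitution makes, here is a small self-contained sketch that runs both variants on the three sample lines from above. The lines are typed in as a character vector instead of being read from a file, and the names raw, outer.only and both are only for illustration.

```r
# The three sample lines, as readLines() would return them.
raw <- c('"1,""A mostly harmless string"",11"',
         '"2,""Another mostly harmless string"",12"',
         '"3,""These, commas, cause, trouble"",13"')

# Step 1 only: strip the outer quotes, leave the doubled quotes alone.
outer.only <- gsub("^\"|\"$", "", raw)
ncol(read.csv(text = outer.only, header = FALSE))  # 6

# Steps 1 and 2: also replace "" by ".
both <- gsub("\"\"", "\"", outer.only)
data <- read.csv(text = both, header = FALSE)
ncol(data)                                         # 3
as.character(data$V2[3])
```

With both substitutions the third row keeps its commas inside a single quoted field, so the string ends up intact in column V2.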