I am using read.csv.sql to conditionally read in data (my data set is extremely large, so this was the solution I chose to filter it and reduce it in size).
The problem is that sqldf provides text preprocessing facilities, but the code shown in the question does not use them, which makes it overly complex.
1) Regarding text substitution, use fn$ (from gsubfn, which sqldf automatically loads) as discussed on the GitHub page for sqldf. Assuming that we used quote = FALSE in the write.csv call, since SQLite does not handle quotes natively:
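# Single value: fn$ substitutes the value of spec for $spec in the SQL string
spec <- 'setosa'
out <- fn$read.csv.sql("iris.csv", "select * from file where Species = '$spec' ")
# Several values: build a quoted, comma-separated list for use with in (...)
spec <- c("setosa", "versicolor")
string <- toString(sprintf("'%s'", spec)) # add quotes and make comma-separated
out <- fn$read.csv.sql("iris.csv", "select * from file where Species in ($string) ")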
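To check the pieces above without touching sqldf, here is a minimal base R sketch. It assumes the file was written with row.names = FALSE (not stated above), writes to a temporary directory rather than the working directory, and uses a hypothetical helper, sql_in_list, that also doubles any embedded single quotes. The final sprintf only mimics the string that fn$ would produce after substituting $string.

```r
# Write iris without quotes (row.names = FALSE is an assumption of this sketch).
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(iris, csv_path, quote = FALSE, row.names = FALSE)
writeLines(readLines(csv_path, n = 2))  # header and first data row, no quotes

# Hypothetical helper (not part of sqldf): quote values for a SQL in (...) list.
sql_in_list <- function(values) {
  # Double any embedded single quotes, the standard SQL escape.
  escaped <- gsub("'", "''", values, fixed = TRUE)
  toString(sprintf("'%s'", escaped))
}

spec <- c("setosa", "versicolor")
string <- sql_in_list(spec)

# Mimic the statement that results once $string has been substituted.
sql <- sprintf("select * from file where Species in (%s)", string)
print(sql)
```

Printing the statement before running it is a cheap way to catch quoting mistakes in the substituted text.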
2) Regarding deleting double quotes, a simpler way would be to use the following filter= argument:
read.csv.sql("iris.csv", filter = "tr -d \\042") # Windows
or
read.csv.sql("iris.csv", filter = "tr -d \\\\042") # Linux / bash
depending on your shell. The first one worked for me on Windows (with Rtools installed and on the PATH) and the second worked for me on Linux with bash. It is possible that other variations could be needed for other shells.
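To see how the two variants differ, it can help to print what each R string literal contains once R has processed its own escapes. The sketch below uses only base R and illustrative variable names. The reading that bash then removes one further level of backslashes before tr sees its argument is an assumption of this sketch, not something verified here.

```r
# The two filter strings exactly as typed in R source.
win <- "tr -d \\042"     # Windows variant
nix <- "tr -d \\\\042"   # Linux / bash variant

# What R actually passes along after handling its own escapes.
cat(win, "\n")  # one backslash before 042
cat(nix, "\n")  # two backslashes before 042

# 042 is an octal character code; confirm that it is the double quote.
code <- strtoi("042", base = 8L)
quote_char <- intToUtf8(code)
print(code)
print(quote_char)
```

If neither variant works in a given shell, printing the string with cat like this shows how many backslashes survive R, which narrows down where the remaining escaping has to happen.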
2a) Another possibility for removing quotes is to install the free csvfix utility (available on Windows, Linux and Mac) and then use the following filter= argument. It should work in all shells and on all platforms, since it does not involve any characters that are typically interpreted specially by either R or most shells.
read.csv.sql("iris.csv", filter = "csvfix echo -smq")
2b) Another cross-platform utility that could be used is xsv. The eol= argument is only needed on Windows, since xsv produces UNIX-style line endings, but it won't hurt on other platforms, so the following line should work on all platforms.
read.csv.sql("iris.csv", eol = "\n", filter = "xsv fmt")
2c) sqldf also includes an awk program (csv.awk) that can be used. It outputs UNIX style newlines so specify eol = "\n" on Windows. On other platforms it won't hurt if you specify it but you can omit it if you wish since that is the default on those platforms.
csv.awk <- system.file("csv.awk", package = "sqldf")
rm_quotes_cmd <- sprintf('gawk -f "%s"', csv.awk)
read.csv.sql("iris.csv", eol = "\n", filter = rm_quotes_cmd)
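Since the options in 2a) to 2c) each depend on an external program, one way to choose between them is to test what is on the PATH first. The following is only a sketch: pick_quote_filter is a hypothetical helper, not part of sqldf, and the order of preference is arbitrary.

```r
# Hypothetical helper: return a filter= command for the first tool found on
# the PATH, or NULL if none of csvfix, xsv or gawk is available.
pick_quote_filter <- function() {
  have <- function(cmd) nzchar(Sys.which(cmd))
  if (have("csvfix")) return("csvfix echo -smq")
  if (have("xsv")) return("xsv fmt")
  # system.file() returns "" when the file or the package is not found.
  csv.awk <- system.file("csv.awk", package = "sqldf")
  if (have("gawk") && nzchar(csv.awk)) {
    return(sprintf('gawk -f "%s"', csv.awk))
  }
  NULL
}

filter_cmd <- pick_quote_filter()
print(filter_cmd)

# Usage with sqldf loaded; eol = "\n" is harmless on non-Windows platforms:
# if (!is.null(filter_cmd)) {
#   out <- read.csv.sql("iris.csv", eol = "\n", filter = filter_cmd)
# }
```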
3) Regarding general tips, note that the verbose = TRUE argument to read.csv.sql can be useful for seeing what is going on.
read.csv.sql("iris.csv", verbose = TRUE)