问题
I need to do a loop over sqldf statements and for that I need to call the loop variable inside the sqldf code:
My table "data", might be:
data <- read.table(text ="
loaddate DaysRange DaysRangeNext
1 2014-03-16 30 30
2 2014-03-16 0 0
3 2014-03-16 0 0
4 2014-03-16 60 NA
5 2014-04-16 30 30
6 2014-04-16 0 30
"
,header = TRUE)
then I format loaddate as a date:
data$loaddate<-as.Date(as.character(data$loaddate), format='%Y-%m-%d')
Let's say I have a vector "loaddates":
loaddates<- unique(sort(data$loaddate))
And I need to run the following code for each loaddate:
for (i in loaddates) {
sqldf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = i
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
") }
But I get the followng error:
Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such column: i
Is there a way to keep the variable value and use it inside the loop?
Thanks.
EDITION:
I tried:
sqldf(
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.LoadDate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000))
But I got:
> [1] loaddate DaysRange DaysRangeNext clientes <0 rows> > (or 0-length row.names)
回答1:
Variable i
will not be replaced in the query as it is. You need sprintf
to assign a value to it. (I also don't know if you need to account for break lines but just to be sure I provide it below. Maybe you don't need for sqldf; in which case just remove strwrap).
#let's assume loaddates is the following:
loaddates <- 'something'
One way to get the query as you want, i.e. with no break lines and with i
taking the values of loaddates you need:
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM deuda AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000)
which will output:
[1] "SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, COUNT(*) AS clientes FROM deuda AS D WHERE D.CodEmp = 'TGG' and D.loaddate = something GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext ORDER BY D.DaysRange, D.DaysRangeNext"
Which is what you need in one line with no break lines or with the variable i
unassigned.
In your loop it should be:
for (i in loaddates) {
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM deuda AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000)
}
With your dataset:
library(sqldf)
data <- read.table(text ="
loaddate DaysRange DaysRangeNext
1 2014-03-16 30 30
2 2014-03-16 0 0
3 2014-03-16 0 0
4 2014-03-16 60 NA
5 2014-04-16 30 30
6 2014-04-16 0 30
"
,header = TRUE,stringsAsFactors=F)
loaddates<- unique(sort(data$loaddate))
for (i in loaddates) {
print(sqldf(
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000) ))
}
Output:
loaddate DaysRange DaysRangeNext clientes
1 2014-03-16 0 0 2
2 2014-03-16 30 30 1
3 2014-03-16 60 NA 1
loaddate DaysRange DaysRangeNext clientes
1 2014-04-16 0 30 1
2 2014-04-16 30 30 1
回答2:
You might get this working by defining your SQL statement within the loop but outside the function call.
for (i in loaddates) {
statement = paste( " SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = ", i,
"GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext " )
sqldf(statement)
}
回答3:
First create a new data frame. Then join to it:
num_Pcode <- as.numeric("3550")
df_Pcode_0 <- as.data.frame(num_Pcode)
df_Pcode_0
...
returns num_Pcode
.
回答4:
fn$sqldf
allows the use of $
within the sql statement to interpolate R variables. See Example 5 on the sqldf github home page and see the bottom of the help page ?fn
for many more examples. We could reduce setNames(loaddates, loaddates)
to just loaddates
if we don't need names on the output.
Map(function(i)
fn$sqldf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = $i
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
"), setNames(loaddates, loaddates))
giving:
$`2014-03-16`
loaddate DaysRange DaysRangeNext clientes
1 2014-03-16 0 0 2
2 2014-03-16 30 30 1
3 2014-03-16 60 NA 1
$`2014-04-16`
loaddate DaysRange DaysRangeNext clientes
1 2014-04-16 0 30 1
2 2014-04-16 30 30 1
来源:https://stackoverflow.com/questions/27925052/r-call-variable-inside-sqldf