Unable to use rank() over functions in R using sqldf

Question

arm<-as.data.frame(matrix(c(1,1,1,2,2,6,7,4,9,10),ncol=2))

colnames(arm)<-c("a","b")

Hi guys, this is a dataset I created in R. I want to rank column b within each group of column a. The code below throws the following error, no matter what changes I make to the syntax (adding [], "", etc.):

Error in sqliteSendQuery(con, statement, bind.data) : error in statement: near "(": syntax error

I am using the sqldf package:

arm2<-sqldf("select a,
         b,
         rank() over (partition by a order by b) as rank1 
         from arm")

Then I installed the RH2 package and it started to throw the following error:

Error in .verify.JDBC.result(s, "Unable to execute JDBC statement ", statement) : Unable to execute JDBC statement select a, b, rank() over (partition by a order by b) as rank1 from arm (Function "rank" not found; SQL statement: select a, b, rank() over (partition by a order by b) as rank1 from arm [90022-175])

Could you please tell me how to use the SQL rank() over function with the sqldf package in R? Thanks.

Answer 1:

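sqldf uses SQLite by default, and the SQLite version used here does not support the rank() window function - see here (window functions were only added in SQLite 3.25.0). Judging by the error message you got from H2, it does not support rank() either, though support is currently planned.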

sqldf can also use PostgreSQL instead of SQLite, and PostgreSQL does support rank(): see here for an example. Your code as posted should then work.

If you don't want to use PostgreSQL, you can get the data out in the right order with SQLite and sqldf using:

sqldf("select a, b from arm 
          order by a, b", drv = "SQLite")

but the ranking column is more difficult - see some related answers: 1, 2, 3
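One common workaround for the ranking column, which does not need window functions, is a correlated subquery that counts how many rows in the same group have a smaller b. The following is only a sketch of that idea, not taken from the linked answers; the table aliases t1 and t2 are illustrative names, and it assumes sqldf is running on its default SQLite backend:

```r
library(sqldf)

# Same data as in the question
arm <- as.data.frame(matrix(c(1,1,1,2,2,6,7,4,9,10), ncol = 2))
colnames(arm) <- c("a", "b")

# rank1 = 1 + number of rows in the same "a" group with a strictly smaller b.
# Ties receive the same rank, which mirrors the behaviour of SQL's rank().
arm2 <- sqldf("select t1.a, t1.b,
                 (select count(*) + 1
                    from arm t2
                   where t2.a = t1.a and t2.b < t1.b) as rank1
               from arm t1
               order by t1.a, t1.b")
```

The subquery is evaluated once per row, so this may be slow on large tables.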

Since you are already in R, you could use dplyr, a native R package:

library(dplyr)
arm %>% group_by(a) %>%
        mutate(rank = rank(b))

Or data.table, a faster alternative:

library(data.table)
setDT(arm)[ , rank := rank(b), by = a]
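One caveat: R's rank() averages tied values by default, whereas SQL's rank() gives tied rows the lowest rank of the tie. If SQL-style ranks matter, a minimal base R sketch (assuming the column should be called rank1, as in the question) could look like this:

```r
# Same data as in the question
arm <- as.data.frame(matrix(c(1,1,1,2,2,6,7,4,9,10), ncol = 2))
colnames(arm) <- c("a", "b")

# ave() applies the function to b within each group of a;
# ties.method = "min" reproduces the tie handling of SQL's rank()
arm$rank1 <- ave(arm$b, arm$a,
                 FUN = function(x) rank(x, ties.method = "min"))
```

In the dplyr version, min_rank(b) in place of rank(b) gives the same tie handling.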

Source: https://stackoverflow.com/questions/32364351/unable-to-use-rank-over-functions-in-r-using-sqldf

Tags

sql

syntax

sqldf