How to generate SQL from dbplyr without a database connection?

别说谁变了你拦得住时间么 submitted on 2020-02-22 04:10:28

Question


I currently have access to an Apache Hive database via the beeline CLI. We are still negotiating with IT to get R on the server. Until then, I would like to (ab)use the R dbplyr package to generate SQL queries on another machine, copy them over, and run them as raw SQL. I have used sql_render in dbplyr before when I had a valid database connection, but I do not know how to use it without one. Ideally, I would write something like:

con <- dummy_connection('hive')   # this does not exist, I think
qry <- tbl(con,'mytable') %>%     # complex logic to build a query
  select(var1,var2) %>%
  filter(var1 > 0)   # etc...
sql_render(qry) %>%               # cat it to a file to be used on another machine.
  as.character() %>%
  cat() 

Is there a way to make this 'dummy' connection? And can it be done in such a way that I can specify the variant of SQL?


Answer 1:


You can generate an in-memory SQLite database using just R:

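library(DBI)        # generic database interface
library(RSQLite)    # SQLite driver
library(tidyverse)  # dplyr verbs, plus ggplot2's diamonds dataset
library(dbplyr)     # translates dplyr code to SQL

# Create a throwaway SQLite database that lives only in memory
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Load the example data and copy it into the database
data("diamonds")

dbWriteTable(con, "diamonds", diamonds)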

With the in-memory database and a connection to it, you can (ab)use dbplyr to have R write the SQL for you: build the query against a table with tbl(con, ...) and pass it to sql_render().
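As a minimal sketch of that workflow, the snippet below rebuilds the query from the question against an in-memory SQLite table. The table name mytable and its columns var1 and var2 are taken from the question; the two rows of data are made-up placeholders, since only the column names and types matter for generating SQL.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# In-memory SQLite database; nothing is written to disk
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Placeholder stand-in for the real Hive table (values are invented)
dbWriteTable(con, "mytable", data.frame(var1 = c(-1, 2), var2 = c("a", "b")))

# Build the query lazily; no rows are fetched at this point
qry <- tbl(con, "mytable") %>%
  select(var1, var2) %>%
  filter(var1 > 0)

# Render the query to a plain character string and print it
sql_text <- as.character(sql_render(qry))
cat(sql_text, "\n")

dbDisconnect(con)
```

The printed statement uses SQLite's dialect and quoting, so it should be reviewed before being run through beeline.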

This produces SQLite-flavoured SQL rather than Hive SQL, so the output may need adjusting by hand before it runs on Hive. Hopefully it still speeds up the path from R to SQLite to Hive (or your preferred SQL variant).
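If the goal is Hive syntax specifically, dbplyr also ships simulated backends that need no database at all. The sketch below is not part of the original answer: it assumes a reasonably recent dbplyr in which simulate_hive() is available and lazy_frame() accepts the .name argument, and the column values are placeholders that only describe the schema.

```r
library(dplyr)
library(dbplyr)

# lazy_frame() describes a table without any database behind it;
# simulate_hive() selects dbplyr's Hive translation rules.
# Assumption: the installed dbplyr supports the .name argument.
mytable <- lazy_frame(
  var1 = 1, var2 = "a",        # placeholder values, only names and types are used
  con = simulate_hive(),
  .name = "mytable"
)

qry <- mytable %>%
  select(var1, var2) %>%
  filter(var1 > 0)

# Render to a character string that can be copied to the Hive machine
hive_sql <- as.character(sql_render(qry))
cat(hive_sql, "\n")
```

To save the statement instead of printing it, writeLines(hive_sql, "query.sql") would write it to a file, where query.sql is just an example filename.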

Also see the following links:

  • SQLite vignette
  • Bradley's demo (source of above code)


Source: https://stackoverflow.com/questions/49078185/how-to-generate-sql-from-dbplyr-without-a-database-connection
