How to get a SQL database connection compatible with DBI::dbGetQuery when converting an R script to a Databricks R notebook?

Submitted by 爷,独闯天下 on 2019-12-13 03:48:38

Question


I have an R script that uses odbc::dbConnect to connect to a SQL database (some databases are on Azure; others are on-premise but connected to the Azure VPNs via the company's network, though I don't have any understanding of the network infrastructure itself). The script then uses DBI::dbGetQuery to run a series of fairly complicated SQL queries and stores the results as R data frames, which can be manipulated and fed into my models.
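The pattern described above can be sketched as follows; the server, database, and credential values are hypothetical placeholders, not details from the original script:

```r
library(DBI)
library(odbc)

# Hypothetical connection details -- replace with your own server/database
con <- DBI::dbConnect(
  odbc::odbc(),
  Driver   = "ODBC Driver 17 for SQL Server",
  Server   = "myserver.database.windows.net",
  Database = "mydb",
  UID      = "user",
  PWD      = "password"
)

# Run a query and get back a plain R data frame
df <- DBI::dbGetQuery(con, "SELECT TOP 10 * FROM some_table")

DBI::dbDisconnect(con)
```

Because the result of dbGetQuery is an ordinary in-memory data frame, everything downstream of this point in the script works with base R and standard packages.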

Because of insufficient memory on my local PC to run the script, I am having to transfer it to a Databricks notebook and run it on a cluster with a more powerful driver node. However, I am running out of time and am not able or willing to completely rewrite everything to be sparkR/sparklyr-compatible or parallelisable; I just want to run my standard R script as close to the one I have already written as possible, with minimal edits for Spark compatibility.

I am aware that the odbc, RODBC and RJDBC packages do not work on Databricks notebooks. I have looked into using SparkR::read.jdbc and Sparklyr::spark_read_jdbc() but these directly read from a jdbc source into a SparkR/sparklyr dataframe rather than creating a connection which can be accessed via DBI::dbGetQuery, which is not what I'm trying to do. I can't find any resources anywhere on how to do this, though it seems like it ought to be possible.
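For context, a minimal sketch of the sparklyr approach mentioned above (with hypothetical JDBC options) shows why it does not fit: spark_read_jdbc returns a Spark dataframe reference, not a DBI connection to the source database that dbGetQuery could run SQL against:

```r
library(sparklyr)

# Connect to the Databricks cluster from inside the notebook
sc <- spark_connect(method = "databricks")

# Hypothetical JDBC options -- this reads the table straight into
# a Spark dataframe; there is no intermediate connection object
# that DBI::dbGetQuery could use against the SQL Server itself
tbl <- spark_read_jdbc(
  sc,
  name = "some_table",
  options = list(
    url      = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb",
    dbtable  = "some_table",
    user     = "user",
    password = "password",
    driver   = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  )
)
```

Queries against the source database would have to be pushed down via the `dbtable` option (e.g. as a subquery), rather than issued interactively through DBI as the existing script does.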

Is there any way, using sparklyr (or another package if necessary), to create a connection object to a database that can be accessed via DBI::dbGetQuery?

Source: https://stackoverflow.com/questions/56701533/how-to-get-sql-database-connection-compatible-with-dbidbgetquery-function-when
