Joining across databases with dbplyr

不羁岁月 提交于 2019-12-08 04:25:58

问题


I am working with database tables with dbplyr

I have a local table and want to join it with a large (150m rows) table on the database

The database PRODUCTION is read only

# Set up the connection and point to the table

library(odbc); library(dbplyr)    

my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;DATABASE=PRODUCTION;UID=",
                            t2690_username,";PWD=",t2690_password, sep="")

t2690 <- dbConnect(odbc::odbc(), .connection_string=my_conn_string)

order_line <- tbl(t2690, "order_line") #150m rows

I also have a local table, let's call it orders

# fill df with random data

orders <- data.frame(matrix(rexp(50), nrow = 100000, ncol = 5))

names(orders) <- c("customer_id", paste0(rep("variable_", 4), 1:4))

let's say I wanted to join these two tables, I get the following error:

complete_orders <- orders %>% left_join(order_line)

> Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow)

The issue is, if I were to set copy = TRUE, it would try to download the whole of order_line and my computer would quickly run out of memory

Another option could be to upload the orders table to the database. The issue here is that the PRODUCTION database is read only - I would have to upload to a different database. Trying to copy across databases in dbplyr results in the same error.

The only solution I have found is to upload into the writable database and use sql to join them, which is far from ideal


回答1:


I have found the answer, you can use the in_schema() function within the tbl pointer to work across schemas within the same connection

# Connect without specifying a database
my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;UID=",
                            t2690_username,";PWD=",t2690_password, sep="")    

# Upload the local table to the TEMP db then point to it
orders <- tbl(t2690, in_schema("TEMP", "orders"))

order_line <-  tbl(t2690, in_schema("PRODUCTION", "order_line"))

complete_orders <- orders %>% left_join(order_line)


来源:https://stackoverflow.com/questions/50327140/joining-across-databases-with-dbplyr

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!