parlapply on sqlQuery from RODBC

こ雲淡風輕ζ 提交于 2019-12-06 10:01:28

问题


R Version : 2.14.1 x64 Running on Windows 7 Connecting to a database on a remote Microsoft SQL Server 2012

I have an unordered vectors of names, say:

names<-c(“A”, “B”, “A”, “C”,”C”)

each of which have an id in a table in my db. I need to convert the names to their corresponding ids.

I currently have the following code to do it.

###
names<-c(“A”, “B”, “A”, “C”,”C”)
dbConn<-odbcDriverConnect(connection=”connection string”) #successfully connects

nameToID<-function(name, dbConn){
                #dbConn : active db connection formed via odbcDriverConnect
                #name     : a char string

                sqlQuery(dbConn, paste(“select id from table where name=’”, name, “’”, sep=””))
}
sapply(names, nameToID, dbConn=dbConn)
###

Barring better ways to do this, which could involve loading the table into R then working with the problem there (which is possible), I understand why the following doesn’t work, but I cannot seem to find a solution. Attempting to use parallelization via the package ‘parallel’ :

###
names<-c(“A”, “B”, “A”, “C”,”C”)
dbConn<-odbcDriverConnect(connection=”connection string”) #successfully connects

nameToID<-function(name, dbConn){
                #dbConn : active db connection formed via odbcDriverConnect
                #name     : a char string

                sqlQuery(dbConn, paste(“select id from table where name=’”, name, “’”, sep=””))
}

mc<-detectCores()
cl<-makeCluster(mc)
clusterExport(cl, c(“sqlQuery”, “dbConn”))
parSapply(cl, names, nameToID, dbConn=dbConn)    #incorrect passing of nameToID’s second argument
###

As in the comment, this is not the correct way to assign the second argument to nameToID.

I have also tried the following:

parSapply(cl, names, function(x) nameToID(x, dbConn))

in place of the previous parSapply call, but that also does not work, with the error being thrown saying “the first parameter is not an open RODBC connection”, presumably referring to the first parameter of the sqlQuery(). dbConn remains open though

The following code does work with parallization.

###
names<-c(“A”, “B”, “A”, “C”,”C”)
dbConn<-odbcDriverConnect(connection=”connection string”) #successfully connects
nameToID<-function(name){
                #name     : a char string
                dbConn<-odbcDriverConnect(connection=”string”)
                result<-sqlQuery(dbConn, paste(“select id from table where name=’”, name, “’”, sep=””))
                odbcClose(dbConn)
                result
}

mc<-detectCores()
cl<-makeCluster(mc)
clusterExport(cl, c(“sqlQuery”, “odbcDriverConnect”, “odbcClose”, “dbConn”, “nameToID”))      #throwing everything in
parSapply(cl, names, nameToID)
###

But the constant opening and closing of the connection ruins the gains from parallelization, and seems just a bit silly.

So the overall question would be how to pass the second parameter (the open db connection) to the function within parSapply, in much the same way as it is done in the regular apply? In general, how does one pass a second, third, nth parameter to a function within a parallel routine?

Thanks and if you need any more information let me know.

-DT


回答1:


Database connection objects can't be exported or passed as function arguments because they contain socket connections. If you try, it will be serialized, sent to the workers and deserialized, but it won't work correctly since the socket connection won't be valid.

The solution is to create the database connection on each worker before calling parSapply. I often do that using clusterEvalQ:

clusterEvalQ(cl, {
    library(RODBC)
    dbConn <- odbcDriverConnect(connection="connection string")
    NULL
})

Now the worker function can be written as:

nameToID <- function(name) {
    sqlQuery(dbConn, paste("select id from table where name='", name, "'", sep=""))
}

and called with:

parSapply(cl, names, nameToID)  

Also note that since RODBC is loaded on each of the workers you don't have to export functions defined in it, which I think is good programming practice.



来源:https://stackoverflow.com/questions/24541822/parlapply-on-sqlquery-from-rodbc

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!