I am operating on chunks of panel data with about 3*10^6 rows and 11 columns. This data is stored in a SQL database (the data I use is a subset of an even larger dataset). I want to process chunks of the data (each date seperately) without importing the entire thing all at once, but the operations I have to run on each chunk are computationally intensive, so I would like to use the foreach package.
However, I am running into some problems with running the sql call in parallel. If I do a simple call such as (sqlQuery(channel, "select 1")
) I can run that in parallel. If I do a more complex statement I get SIGPIPE errors. Has anyone else run into similar problems?
You did not tell us what database you were using. On a SQLServer connected with RODBC, if have successfully done this by using transactions.
channel = odbcConnect(database)
odbcSetAutoCommit(channel,FALSE)
# Make requests here
odbcEndTran(channel,TRUE)
I am not sure if this will also help when you have only read request as in your case, but it might be worth giving a try because it is only twp additional lines. I asked about the server, because transactions work nicely with MSSqlServer under ODBC, and I had some mixed results with others (but did not try hard).
来源:https://stackoverflow.com/questions/8245246/parallel-database-calls-with-rodbc