Question
I am working with large datasets, and tidyr's spread() usually fails with error messages indicating that it could not allocate enough memory for the operation.
Therefore, I have been exploring dbplyr. However, as it says here, and as shown below, spread() does not work on dbplyr tables.
My question is whether there is another way to accomplish what tidyr::spread() does while working with tbl_dbi and tbl_sql data, without downloading the data to local memory.
Using sample data from here, I show below what I currently get and what I would like to get instead.
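library(dplyr)     # tribble(), %>%, copy_to(), collect()
library(tidyr)     # spread()
library(magrittr)  # %<>% used further down
# sample tbl_dbi and tbl_sql data
df_sample <- tribble(~group1, ~group2, ~group3, ~identifier, ~value,
                     8, 24, 6, 'mt_0', 12,
                     18, 24, 6, 'mt_1', 4)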
con <- DBI::dbConnect(RSQLite::SQLite(), "")
df_db <- copy_to(con, df_sample, 'df_sample')
#attempting to spread tbl_dbi and tbl_sql without downloading to local memory
# this does not work
df_db %>% spread(identifier, value)
Error in UseMethod("spread_") :
no applicable method for 'spread_' applied to an object of class "c('tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
#attempting to spread tbl_dbi and tbl_sql after downloading to local memory
# this spreads the data but the output is in memory
# I would like to keep the output as 'tbl_dbi', 'tbl_sql', and 'tbl_lazy'
df_db %<>% collect() %>% spread(identifier, value)
class(df_db)
[1] "tbl_df" "tbl" "data.frame"
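For illustration only, the kind of result being asked for can be sketched with conditional aggregation, which dbplyr is able to translate to SQL so that the table stays lazy. This is a minimal sketch under assumptions, not a confirmed answer: the identifier values 'mt_0' and 'mt_1' are hard-coded (spread() would discover them from the data), and the name df_wide is made up for this example.

```r
library(dplyr)  # tribble(), %>%, copy_to(), group_by(), summarise()

# Same sample data as above, copied into a temporary SQLite database.
df_sample <- tribble(~group1, ~group2, ~group3, ~identifier, ~value,
                     8, 24, 6, 'mt_0', 12,
                     18, 24, 6, 'mt_1', 4)
con <- DBI::dbConnect(RSQLite::SQLite(), "")
df_db <- copy_to(con, df_sample, 'df_sample')

# One summarised column per identifier value (hard-coded: an assumption).
# ifelse() is translated to CASE WHEN and max() to the SQL MAX aggregate,
# so the work happens inside the database.
df_wide <- df_db %>%
  group_by(group1, group2, group3) %>%
  summarise(
    mt_0 = max(ifelse(identifier == 'mt_0', value, NA), na.rm = TRUE),
    mt_1 = max(ifelse(identifier == 'mt_1', value, NA), na.rm = TRUE)
  ) %>%
  ungroup()

class(df_wide)       # still a lazy table; nothing has been collected
show_query(df_wide)  # inspect the SQL that would be sent to the database
```

If the identifier values are not known in advance, they would first have to be queried from the table, for example with distinct(), before the summarise() call is built.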
Thanks in advance for any help
Source: https://stackoverflow.com/questions/54838395/how-to-spread-tbl-dbi-and-tbl-sql-data-without-downloading-to-local-memory