How to load large datasets to R from BigQuery?

ぃ、小莉子 提交于 2019-12-08 07:17:03

问题


I have tried two ways with Bigrquery package such that

library(bigrquery)
library(DBI)

con <- dbConnect(
  bigrquery::bigquery(),
  project = "YOUR PROJECT ID HERE",
  dataset = "YOUR DATASET"
)
test<- dbGetQuery(con, sql, n = 10000, max_pages = Inf)

and

sql <- `YOUR LARGE QUERY HERE` #long query saved to View and its select here
tb <- bigrquery::bq_project_query(project, sql)
bq_table_download(tb, max_results = 1000)

but failing to the error "Error: Requested Resource Too Large to Return [responseTooLarge]", potentially related issue here, but I am interested in any tool to get the job done: I tried already the solutions outlined here but they failed.

How can I load large datasets to R from BigQuery?


回答1:


As @hrbrmstr kind of suggested you, the documentation mentions specifically:

> #' @param page_size The number of rows returned per page. Make this smaller
> #'   if you have many fields or large records and you are seeing a
> #'   'responseTooLarge' error.

In this documentation from r-project.org you will read a different advise in the explanation of this function (page 13):

This retrieves rows in chunks of page_size. It is most suitable for results of smaller queries (<100 MB, say). For larger queries, it is better to export the results to a CSV file stored on google cloud and use the bq command line tool to download locally.




回答2:


I just started using BigQuery too. I think it should be something like this.

The current bigrquery release can be installed from CRAN:

install.packages("bigrquery")

The newest development release can be installed from GitHub:

install.packages('devtools')
devtools::install_github("r-dbi/bigrquery")

Usage Low-level API

library(bigrquery)
billing <- bq_test_project() # replace this with your project ID 
sql <- "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`"

tb <- bq_project_query(billing, sql)
#> Auto-refreshing stale OAuth token.
bq_table_download(tb, max_results = 10)

DBI

library(DBI)

con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = billing
)
con 
#> <BigQueryConnection>
#>   Dataset: publicdata.samples
#>   Billing: bigrquery-examples

dbListTables(con)
#> [1] "github_nested"   "github_timeline" "gsod"            "natality"       
#> [5] "shakespeare"     "trigrams"        "wikipedia"

dbGetQuery(con, sql, n = 10)



library(dplyr)

natality <- tbl(con, "natality")

natality %>%
  select(year, month, day, weight_pounds) %>% 
  head(10) %>%
  collect()


来源:https://stackoverflow.com/questions/52138048/how-to-load-large-datasets-to-r-from-bigquery

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!