How to download all sheets in a google sheet in R

Posted by 我的未来我决定 on 2021-02-10 06:14:18

Question


I'm looking to download all of the sheets in a single Google Sheets spreadsheet in R. I'm currently using the gsheet package by maxconway, which lets me download a sheet from its URL, but it only works on one sheet at a time, and each sheet is identified by its gid. The spreadsheet I'm trying to download has over 100 sheets, which makes downloading them one by one with gsheet massively inconvenient. Does anyone know of an R package that automates this, or of a way to loop through all of the sheets in a single spreadsheet?
Here is the code I currently have, which downloads just the first of over 100 sheets as a tibble:

library(gsheet)
all_rolls <- gsheet2tbl('https://docs.google.com/spreadsheets/d/1OEg29XbL_YpO0m5JrLQpOPYTnxVsIg8iP67EYUrtRJg/edit#gid=26346344')

> head(all_rolls)
# A tibble: 6 x 14
  Episode Time   Character `Type of Roll` `Total Value` `Natural Value` `Crit?` `Damage Dealt` `# Kills`
    <int> <drtn> <chr>     <chr>          <chr>         <chr>           <chr>   <chr>              <int>
1       1 37'53" Vex'ahlia Intelligence   20            18              <NA>    <NA>                  NA
2       1 41'48" Grog      Persuasion     19            18              <NA>    <NA>                  NA
3       1 43'25" Keyleth   Persuasion     2             2               <NA>    <NA>                  NA
4       1 46'35" Tiberius  Persuasion     12            3               <NA>    <NA>                  NA
5       1 46'35" Tiberius  Persuasion     27            18              <NA>    <NA>                  NA
6       1 46'35" Percy     Assist         21            15              <NA>    <NA>                  NA
# … with 5 more variables: Notes <chr>, `Non-Roll Kills` <chr>, X12 <chr>, X13 <chr>, X14 <chr>

Note: I've tried removing the #gid field, but then it just downloads the first sheet.


Answer 1:


UPDATE 2021-01-31: updated code to use new functions that replaced sheets_find() and sheets_sheets() as of googlesheets4 version 0.2.0.

The googlesheets4 package includes a function that lists all of the Google Sheets spreadsheets in an account's Google Drive: gs4_find(). From that list, one can use the spreadsheet IDs to read the spreadsheets into R.

library(googlesheets4)
gs4_auth()
theSheets <- gs4_find()
theSheets

My test account on Google has one Google sheet, a spreadsheet of Pokémon Stats.

> theSheets
# A tibble: 1 x 3
  name         id                                           drive_resource   
* <chr>        <chr>                                        <list>           
1 PokemonStats 13rGxY7ScDUl7bFJ9NipO7QUafEACYTH4MagFjcj4pVw <named list [34]>

We can use the ID field to download the sheet.

pokemonData <- read_sheet(theSheets$id[1])
head(pokemonData)


> head(pokemonData)
# A tibble: 6 x 13
  Number Name  Type1 Type2 Total    HP Attack Defense SpecialAtk SpecialDef Speed
   <dbl> <chr> <chr> <chr> <dbl> <dbl>  <dbl>   <dbl>      <dbl>      <dbl> <dbl>
1      1 Bulb… Grass Pois…   318    45     49      49         65         65    45
2      2 Ivys… Grass Pois…   405    60     62      63         80         80    60
3      3 Venu… Grass Pois…   525    80     82      83        100        100    80
4      3 Venu… Grass Pois…   625    80    100     123        122        120    80
5      4 Char… Fire  NA      309    39     52      43         60         50    65
6      5 Char… Fire  NA      405    58     64      58         80         65    80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>
> 

One could use the vector theSheets$id with lapply() to read a group of sheets from Google Drive as follows:

sheetList <- lapply(theSheets$id, read_sheet)
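
The list that lapply() returns here is unnamed, so individual spreadsheets can only be picked out by position. Below is a minimal sketch of one way to name each element after its spreadsheet. read_named_spreadsheets() is a hypothetical helper, not a googlesheets4 function, and its reader argument is an assumption added only so the helper can be tried with a stand-in function instead of a live Google account.

```r
# Hypothetical helper (not part of googlesheets4): read every spreadsheet
# listed in a gs4_find()-style table and name each result after its spreadsheet.
# `sheets` is assumed to have `id` and `name` columns, as theSheets does.
read_named_spreadsheets <- function(sheets, reader = googlesheets4::read_sheet) {
  ids <- as.character(sheets$id)
  out <- lapply(ids, function(id) reader(id))
  names(out) <- sheets$name
  out
}

# Usage with the objects from this answer (requires authentication):
# sheetList <- read_named_spreadsheets(theSheets)
# sheetList$PokemonStats
```

The default reader is only looked up when it is actually used, so passing a different function avoids touching googlesheets4 at all.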

To read a specific worksheet within a Google Sheets spreadsheet, we add the sheet= argument to read_sheet(). Here we read the Pokémon Types from the second worksheet in the Pokémon Stats spreadsheet.

pokemonTypes <- read_sheet(theSheets$id[1], sheet = 2)
head(pokemonTypes)

...and the output:

> head(pokemonTypes)
# A tibble: 6 x 1
  Type    
  <chr>   
1 Fire    
2 Grass   
3 Poison  
4 Water   
5 Bug     
6 Fighting
> 

Reading all worksheets in a spreadsheet

We can automate the process of reading multiple tabs from a single spreadsheet. The sheet_names() function is useful for this purpose.

# technique where we read multiple worksheets by worksheet name
# using functions from googlesheets4 version 0.2.0. 
theSheets <- gs4_find()
# get metadata from first sheet
sheetMetadata <- gs4_get(theSheets$id[1])
# get worksheet tab names
sheetNames <- sheet_names(theSheets$id[1])
sheetNames

At this point we can see that there are two worksheet tabs in the Pokémon Stats spreadsheet. We use the vector sheetNames with lapply() to read all the worksheets within the main spreadsheet.

theWorksheets <- lapply(sheetNames, function(x) {
     # read one worksheet tab, selected by name
     read_sheet(theSheets$id[1], sheet = x)
})
# use the `names()` function to name the data frames stored in the list
names(theWorksheets) <- sheetNames
lapply(theWorksheets,head)

...and the output:

> lapply(theWorksheets,head)
$Pokemon
# A tibble: 6 x 13
  Number Name  Type1 Type2 Total    HP Attack Defense SpecialAtk SpecialDef Speed
   <dbl> <chr> <chr> <chr> <dbl> <dbl>  <dbl>   <dbl>      <dbl>      <dbl> <dbl>
1      1 Bulb… Grass Pois…   318    45     49      49         65         65    45
2      2 Ivys… Grass Pois…   405    60     62      63         80         80    60
3      3 Venu… Grass Pois…   525    80     82      83        100        100    80
4      3 Venu… Grass Pois…   625    80    100     123        122        120    80
5      4 Char… Fire  NA      309    39     52      43         60         50    65
6      5 Char… Fire  NA      405    58     64      58         80         65    80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>

$Metadata
# A tibble: 6 x 1
  Type    
  <chr>   
1 Fire    
2 Grass   
3 Poison  
4 Water   
5 Bug     
6 Fighting

> 

At this point individual worksheets can be accessed with the $ form of the extract operator, as theWorksheets$Pokemon or theWorksheets$Metadata.
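
Applied to the original question, the same pattern does not need gs4_find() at all, because sheet_names() and read_sheet() also accept a spreadsheet URL. The sketch below wraps the steps in a hypothetical helper, read_all_worksheets(), which is not a googlesheets4 function. The list_tabs and read_tab arguments are assumptions added for illustration so the helper can be exercised with stand-in functions. The commented usage assumes the spreadsheet from the question is shared publicly, in which case gs4_deauth() skips the sign-in step.

```r
# Hypothetical helper (not part of googlesheets4): read every worksheet tab
# of one spreadsheet into a list named by tab.
# `ss` can be a spreadsheet ID or URL.
read_all_worksheets <- function(ss,
                                list_tabs = googlesheets4::sheet_names,
                                read_tab = googlesheets4::read_sheet) {
  tabs <- list_tabs(ss)                       # character vector of tab names
  out <- lapply(tabs, function(tab) read_tab(ss, sheet = tab))
  names(out) <- tabs                          # allows access as out$TabName
  out
}

# Usage against the spreadsheet from the question
# (assumes it is publicly readable, so no sign-in is needed):
# googlesheets4::gs4_deauth()
# url <- "https://docs.google.com/spreadsheets/d/1OEg29XbL_YpO0m5JrLQpOPYTnxVsIg8iP67EYUrtRJg"
# all_tabs <- read_all_worksheets(url)
# names(all_tabs)
```

Each tab is fetched with a separate request, so reading over 100 tabs this way may take a while.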



Source: https://stackoverflow.com/questions/61422767/how-to-download-all-sheets-in-a-google-sheet-in-r
