Question
I'm looking to download all of the sheets in a single google sheet in R.
I'm currently using the gsheet package by [maxconway][1], which allows me to download a sheet using its URL, but it only works on individual sheets, which are differentiated by a gid.
The Google Sheet I'm trying to download has over 100 sheets, which makes downloading them one by one with gsheet massively inconvenient. Does anyone know of any R packages that automate this, or of any way to loop through all of the sheets in a single Google Sheet?
Here is the code I currently have which downloads just the first of over 100 sheets as a tibble:
all_rolls <- gsheet2tbl('https://docs.google.com/spreadsheets/d/1OEg29XbL_YpO0m5JrLQpOPYTnxVsIg8iP67EYUrtRJg/edit#gid=26346344')
> head(all_rolls)
# A tibble: 6 x 14
Episode Time Character `Type of Roll` `Total Value` `Natural Value` `Crit?` `Damage Dealt` `# Kills`
<int> <drtn> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 1 37'53" Vex'ahlia Intelligence 20 18 <NA> <NA> NA
2 1 41'48" Grog Persuasion 19 18 <NA> <NA> NA
3 1 43'25" Keyleth Persuasion 2 2 <NA> <NA> NA
4 1 46'35" Tiberius Persuasion 12 3 <NA> <NA> NA
5 1 46'35" Tiberius Persuasion 27 18 <NA> <NA> NA
6 1 46'35" Percy Assist 21 15 <NA> <NA> NA
# … with 5 more variables: Notes <chr>, `Non-Roll Kills` <chr>, X12 <chr>, X13 <chr>, X14 <chr>
Note: I've tried removing the #gid field, but then it just downloads the first sheet.
Answer 1:
UPDATE 2021-01-31: updated code to use the new functions that replaced sheets_find() and sheets_sheets() as of googlesheets4 version 0.2.0.
The googlesheets4 package includes a function to list all sheets associated with an account's Google Drive: gs4_find(). From the list of sheets one can use the sheet IDs to read the sheets into R.
library(googlesheets4)
gs4_auth()
theSheets <- gs4_find()
theSheets
My test account on Google has one Google sheet, a spreadsheet of Pokémon Stats.
> theSheets
# A tibble: 1 x 3
name id drive_resource
* <chr> <chr> <list>
1 PokemonStats 13rGxY7ScDUl7bFJ9NipO7QUafEACYTH4MagFjcj4pVw <named list [34]>
We can use the ID field to download the sheet.
pokemonData <- read_sheet(theSheets$id[1])
head(pokemonData)
> head(pokemonData)
# A tibble: 6 x 13
Number Name Type1 Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Bulb… Grass Pois… 318 45 49 49 65 65 45
2 2 Ivys… Grass Pois… 405 60 62 63 80 80 60
3 3 Venu… Grass Pois… 525 80 82 83 100 100 80
4 3 Venu… Grass Pois… 625 80 100 123 122 120 80
5 4 Char… Fire NA 309 39 52 43 60 50 65
6 5 Char… Fire NA 405 58 64 58 80 65 80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>
>
One could use the vector theSheets$id with lapply() to read a group of sheets from Google Drive as follows:
sheetList <- lapply(theSheets$id, read_sheet)
To read a particular worksheet within a Google Sheets spreadsheet, we add the sheet= argument to read_sheet(). Here we read the Pokémon Types from the second worksheet in the Pokémon Stats spreadsheet.
pokemonTypes <- read_sheet(theSheets$id[1], sheet = 2)
head(pokemonTypes)
...and the output:
> head(pokemonTypes)
# A tibble: 6 x 1
Type
<chr>
1 Fire
2 Grass
3 Poison
4 Water
5 Bug
6 Fighting
>
Reading all worksheets in a spreadsheet
We can automate the process of reading multiple tabs from a single spreadsheet. The sheet_names() function is useful for this purpose.
# technique where we read multiple worksheets by worksheet name
# using functions from googlesheets4 version 0.2.0.
theSheets <- gs4_find()
# get metadata from first sheet
sheetMetadata <- gs4_get(theSheets$id[1])
# get worksheet tab names
sheetNames <- sheet_names(theSheets$id[1])
sheetNames
At this point we can see that there are two worksheet tabs in the Pokémon Stats spreadsheet. We use the vector sheetNames with lapply() to read all the worksheets within the main spreadsheet.
theWorksheets <- lapply(sheetNames, function(x) {
  read_sheet(theSheets$id[1], sheet = x)
})
# use the `names()` function to name the data frames stored in the list
names(theWorksheets) <- sheetNames
lapply(theWorksheets,head)
...and the output:
> lapply(theWorksheets,head)
$Pokemon
# A tibble: 6 x 13
Number Name Type1 Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 Bulb… Grass Pois… 318 45 49 49 65 65 45
2 2 Ivys… Grass Pois… 405 60 62 63 80 80 60
3 3 Venu… Grass Pois… 525 80 82 83 100 100 80
4 3 Venu… Grass Pois… 625 80 100 123 122 120 80
5 4 Char… Fire NA 309 39 52 43 60 50 65
6 5 Char… Fire NA 405 58 64 58 80 65 80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>
$Metadata
# A tibble: 6 x 1
Type
<chr>
1 Fire
2 Grass
3 Poison
4 Water
5 Bug
6 Fighting
>
At this point individual worksheets can be accessed with the $ form of the extract operator, as theWorksheets$Pokemon or theWorksheets$Metadata.
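The steps above can be wrapped into one small function. What follows is only a sketch: read_all_worksheets() is a hypothetical helper name, not part of googlesheets4, and the lister and reader arguments are an assumption added so the helper can be tried out without network access. By default they point at sheet_names() and read_sheet() from googlesheets4.

```r
# Sketch: read every worksheet (tab) of one spreadsheet into a named list.
# `read_all_worksheets` is a hypothetical helper, not a googlesheets4 function.
# `lister` and `reader` default to googlesheets4::sheet_names() and
# googlesheets4::read_sheet(); they are exposed as arguments only so the
# helper can be exercised with stand-in functions.
read_all_worksheets <- function(ss,
                                lister = googlesheets4::sheet_names,
                                reader = googlesheets4::read_sheet) {
  # worksheet (tab) names, in the order they appear in the spreadsheet
  tabs <- lister(ss)
  # one read per tab; each element is whatever `reader` returns (a tibble)
  out <- lapply(tabs, function(tab) reader(ss, sheet = tab))
  # name the list elements after their tabs so they can be extracted with $
  names(out) <- tabs
  out
}

# Usage (needs network access; assumes the spreadsheet is publicly readable):
# library(googlesheets4)
# gs4_deauth()  # skip the OAuth flow, which public sheets do not need
# url <- "https://docs.google.com/spreadsheets/d/1OEg29XbL_YpO0m5JrLQpOPYTnxVsIg8iP67EYUrtRJg/edit#gid=26346344"
# all_tabs <- read_all_worksheets(url)
# names(all_tabs)
```

For a spreadsheet that is shared publicly, such as the one in the question, gs4_deauth() should make it possible to pass the URL directly instead of looking up the ID with gs4_find(). With over 100 tabs this issues one request per tab, so it may be slow or run into the Sheets API request quota.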
Source: https://stackoverflow.com/questions/61422767/how-to-download-all-sheets-in-a-google-sheet-in-r