This is a commonly asked question.
The scenario is:-
folderA____ folderA1____folderA1a
\\____folderA2____folderA2a
\\___fo
Depending on your requirements this may work. I shared the root folder I wanted to search with a specific email address, then added "'sharedEmailAddress' in readers" to the query. This limited the results to that sub tree only.
EDIT: April 2020 Google have announced that multi-parent files is being disabled from September 2020. This alters the narrative below and means option 2 is no longer an option. It might be possible to implement Option 2 using shortcuts. I will update this answer further as I test the new restrictions/features
We are all used to the idea of folders (aka directories) in Windows/nix etc. In the real world, a folder is a container, into which documents are placed. It is also possible to place smaller folders inside bigger folders. Thus the big folder can be thought of as containing all of the documents inside its smaller children folders.
However, in Google Drive, a Folder is NOT a container, so much so that in the first release of Google Drive, they weren't even called Folders, they were called Collections. A Folder is simply a File with (a) no contents, and (b) a special mime-type (application/vnd.google-apps.folder). The way Folders are used is exactly the same way that tags (aka labels) are used. The best way to understand this is to consider GMail. If you look at the top of an open mail item, you see two icons. A folder with the tooltip "Move to" and a label with the tooltip "Labels". Click on either of these and the same dialogue box appears and is all about labels. Your labels are listed down the left hand side, in a tree display that looks a lot like folders. Importantly, a mail item can have multiple labels, or you could say, a mail item can be in multiple folders. Google Drive's Folders work in exactly the same way that GMail labels work.
Having established that a Folder is simply a label, there is nothing stopping you from organising your labels in a hierarchy that resembles a folder tree, in fact this is the most common way of doing so.
It should now be clear that a file (let's call it MyFile) in folderA2b is NOT a child or grandchild of folderA. It is simply a file with a label (confusingly called a Parent) of "folderA2b". OK, so how DO I get all the files "under" folderA?
Alternative 1. Recursion
The temptation would be to list the children of folderA, for any children that are folders, recursively list their children, rinse, repeat. In a very small number of cases, this might be the best approach, but for most, it has the following problems:-
Alternative 2. The common parent
This works best if all of the files are being created by your app (ie. you are using drive.file scope). As well as the folder hierarchy above, create a dummy parent folder called say "MyAppCommonParent". As you create each file as a child of its particular Folder, you also make it a child of MyAppCommonParent. This becomes a lot more intuitive if you remember to think of Folders as labels. You can now easily retrieve all descdendants by simply querying MyAppCommonParent in parents
.
Alternative 3. Folders first
Start by getting all folders. Yep, all of them. Once you have them all in memory, you can crawl through their parents properties and build your tree structure and list of Folder IDs. You can then do a single files.list?q='folderA' in parents or 'folderA1' in parents or 'folderA1a' in parents...
. Using this technique you can get everything in two http calls.
The pseudo code for option 3 is a bit like...
// get all folders from Drive files.list?q=mimetype=application/vnd.google-apps.folder and trashed=false&fields=parents,name // store in a Map, keyed by ID // find the entry for folderA and note the ID // find any entries where the ID is in the parents, note their IDs // for each such entry, repeat recursively // use all of the IDs noted above to construct a ... // files.list?q='folderA-ID' in parents or 'folderA1-ID' in parents or 'folderA1a-ID' in parents...
Alternative 2 is the most effificient, but only works if you have control of file creation. Alternative 3 is generally more efficient than Alternative 1, but there may be certain small tree sizes where 1 is best.
Sharing a javascript solution using recursion to build an array of folders, starting with the first level folder and moving down the hierarchy. This array is composed by recursively cycling through the parent Id's of the file in question.
The extract below makes 3 separate queries to the gapi:
the code iterates through the list of files, then creating an array of folder names.
const { google } = require('googleapis')
const gOAuth = require('./googleOAuth')
// resolve the promises for getting G files and folders
const getGFilePaths = async () => {
//update to use Promise.All()
let gRootFolder = await getGfiles().then(result => {return result[2][0]['parents'][0]})
let gFolders = await getGfiles().then(result => {return result[1]})
let gFiles = await getGfiles().then(result => {return result[0]})
// create the path files and create a new key with array of folder paths, returning an array of files with their folder paths
return pathFiles = gFiles
.filter((file) => {return file.hasOwnProperty('parents')})
.map((file) => ({...file, path: makePathArray(gFolders, file['parents'][0], gRootFolder)}))
}
// recursive function to build an array of the file paths top -> bottom
let makePathArray = (folders, fileParent, rootFolder) => {
if(fileParent === rootFolder){return []}
else {
let filteredFolders = folders.filter((f) => {return f.id === fileParent})
if(filteredFolders.length >= 1 && filteredFolders[0].hasOwnProperty('parents')) {
let path = makePathArray(folders, filteredFolders[0]['parents'][0])
path.push(filteredFolders[0]['name'])
return path
}
else {return []}
}
}
// get meta-data list of files from gDrive, with query parameters
const getGfiles = () => {
try {
let getRootFolder = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
fields: 'files(name, parents)',
q: "'root' in parents and trashed = false and mimeType = 'application/vnd.google-apps.folder'"})
let getFolders = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
fields: 'files(id,name,parents), nextPageToken',
q: "trashed = false and mimeType = 'application/vnd.google-apps.folder'"})
let getFiles = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
fields: 'files(id,name,parents, mimeType, fullFileExtension, webContentLink, exportLinks, modifiedTime), nextPageToken',
q: "trashed = false and mimeType != 'application/vnd.google-apps.folder'"})
return Promise.all([getFiles, getFolders, getRootFolder])
}
catch(error) {
return `Error in retriving a file reponse from Google Drive: ${error}`
}
}
// make call out gDrive to get meta-data files. Code adds all files in a single array which are returned in pages
const getGdriveList = async (params) => {
const gKeys = await gOAuth.get()
const drive = google.drive({version: 'v3', auth: gKeys})
let list = []
let nextPgToken
do {
let res = await drive.files.list(params)
list.push(...res.data.files)
nextPgToken = res.data.nextPageToken
params.pageToken = nextPgToken
}
while (nextPgToken)
return list
}
Sharing a Python solution to the excellent Alternative 3 by @pinoyyid, above, in case it's useful to anyone. I'm not a developer so it's probably hopelessly un-pythonic... but it works, only makes 2 API calls, and is pretty quick.
'<folder-id>' in parents
segment per subfolder found.Interestingly, Google Drive seems to have a hard limit of 599 '<folder-id>' in parents
segments per query, so if your folder-to-search has more subfolders than this, you need to chunk the list.
FOLDER_TO_SEARCH = '123456789' # ID of folder to search
DRIVE_ID = '654321' # ID of shared drive in which it lives
MAX_PARENTS = 500 # Limit set safely below Google max of 599 parents per query.
def get_all_folders_in_drive():
"""
Return a dictionary of all the folder IDs in a drive mapped to their parent folder IDs (or to the
drive itself if a top-level folder). That is, flatten the entire folder structure.
"""
folders_in_drive_dict = {}
page_token = None
max_allowed_page_size = 1000
just_folders = "trashed = false and mimeType = 'application/vnd.google-apps.folder'"
while True:
results = drive_api_ref.files().list(
pageSize=max_allowed_page_size,
fields="nextPageToken, files(id, name, mimeType, parents)",
includeItemsFromAllDrives=True, supportsAllDrives=True,
corpora='drive',
driveId=DRIVE_ID,
pageToken=page_token,
q=just_folders).execute()
folders = results.get('files', [])
page_token = results.get('nextPageToken', None)
for folder in folders:
folders_in_drive_dict[folder['id']] = folder['parents'][0]
if page_token is None:
break
return folders_in_drive_dict
def get_subfolders_of_folder(folder_to_search, all_folders):
"""
Yield subfolders of the folder-to-search, and then subsubfolders etc. Must be called by an iterator.
:param all_folders: The dictionary returned by :meth:`get_all_folders_in-drive`.
"""
temp_list = [k for k, v in all_folders.items() if v == folder_to_search] # Get all subfolders
for sub_folder in temp_list: # For each subfolder...
yield sub_folder # Return it
yield from get_subfolders_of_folder(sub_folder, all_folders) # Get subsubfolders etc
def get_relevant_files(self, relevant_folders):
"""
Get files under the folder-to-search and all its subfolders.
"""
relevant_files = {}
chunked_relevant_folders_list = [relevant_folders[i:i + MAX_PARENTS] for i in
range(0, len(relevant_folders), MAX_PARENTS)]
for folder_list in chunked_relevant_folders_list:
query_term = ' in parents or '.join('"{0}"'.format(f) for f in folder_list) + ' in parents'
relevant_files.update(get_all_files_in_folders(query_term))
return relevant_files
def get_all_files_in_folders(self, parent_folders):
"""
Return a dictionary of file IDs mapped to file names for the specified parent folders.
"""
files_under_folder_dict = {}
page_token = None
max_allowed_page_size = 1000
just_files = f"mimeType != 'application/vnd.google-apps.folder' and trashed = false and ({parent_folders})"
while True:
results = drive_api_ref.files().list(
pageSize=max_allowed_page_size,
fields="nextPageToken, files(id, name, mimeType, parents)",
includeItemsFromAllDrives=True, supportsAllDrives=True,
corpora='drive',
driveId=DRIVE_ID,
pageToken=page_token,
q=just_files).execute()
files = results.get('files', [])
page_token = results.get('nextPageToken', None)
for file in files:
files_under_folder_dict[file['id']] = file['name']
if page_token is None:
break
return files_under_folder_dict
if __name__ == "__main__":
all_folders_dict = get_all_folders_in_drive() # Flatten folder structure
relevant_folders_list = [FOLDER_TO_SEARCH] # Start with the folder-to-archive
for folder in get_subfolders_of_folder(FOLDER_TO_SEARCH, all_folders_dict):
relevant_folders_list.append(folder) # Recursively search for subfolders
relevant_files_dict = get_relevant_files(relevant_folders_list) # Get the files