Google Drive API: list files with no parent

旧时难觅i 2020-12-30 08:47

The files in the Google domain that I administer have gotten into a bad state; there are thousands of files residing in the root directory. I want to identify these files and mo

6 Answers
  • 2020-12-30 09:08

    Brute force, but simple, and it works.

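        // Drive API v2 (Java client). Assumes `drive` is an authorized Drive service object.
        List<File> orphans = new ArrayList<File>();
        Files.List request = drive.files().list();
        do {
            try {
                FileList files = request.execute();

                for (File f : files.getItems()) {
                    // A file with a missing or empty parents list is an orphan.
                    if (f.getParents() == null || f.getParents().isEmpty()) {
                        System.out.println("Orphan found:\t" + f.getTitle());
                        orphans.add(f);
                    }
                }

                // Advance to the next page; a null token ends the loop.
                request.setPageToken(files.getNextPageToken());
            } catch (IOException e) {
                System.out.println("An error occurred: " + e);
                request.setPageToken(null);
            }
        } while (request.getPageToken() != null
                && request.getPageToken().length() > 0);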
    
  • 2020-12-30 09:11

    The documentation recommends the following query: is:unorganized owner:me.

  • 2020-12-30 09:11

    The premise is:

    • List all files.
    • A file with no 'parents' field is an orphan.
    • The script deletes every orphan it finds.

    Before you start, you need to:

    • Create an OAuth client ID.
    • Add the '../auth/drive' scope to your OAuth client ID and validate your app with Google, so that you have delete permissions.

    Copy-and-paste-ready demo:

    from __future__ import print_function
    import pickle
    import os.path
    from googleapiclient.discovery import build
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.auth.transport.requests import Request
    
    # If modifying these scopes, delete the file token.pickle.
    SCOPES = ['https://www.googleapis.com/auth/drive']
    
    def callback(request_id, response, exception):
        if exception:
            print("Exception:", exception)
    
    def main():
        """
        Description:
        Shows basic usage of the Drive v3 API to delete orphan files.
        """
    
        """ --- CHECK CREDENTIALS --- """
        creds = None
        # The file token.pickle stores the user's access and refresh tokens, and is
        # created automatically when the authorization flow completes for the first
        # time.
        if os.path.exists('token.pickle'):
            with open('token.pickle', 'rb') as token:
                creds = pickle.load(token)
        # If there are no (valid) credentials available, let the user log in.
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                flow = InstalledAppFlow.from_client_secrets_file(
                    'credentials.json', SCOPES)
                creds = flow.run_local_server(port=0)
            # Save the credentials for the next run
            with open('token.pickle', 'wb') as token:
                pickle.dump(creds, token)
    
        """ --- OPEN CONNECTION --- """
        service = build('drive', 'v3', credentials=creds)
    
        page_token = None  # None means "start from the first page"
        files = None
        orphans = []
        page_size = 100
        batch_counter = 0

        print("LISTING ORPHAN FILES")
        print("-----------------------------")
        while True:
            # List one page of files, requesting only the fields we need
            r = service.files().list(pageToken=page_token,
                                     pageSize=page_size,
                                     fields="nextPageToken, files(id, name, parents)"
                                     ).execute()
            page_token = r.get('nextPageToken')
            files = r.get('files', [])

            # Filter orphans
            # NOTE: a file with no 'parents' field is an orphan
            for file in files:
                if file.get('parents'):
                    print("File with a parent found.")
                else:
                    print("Orphan file found.")
                    orphans.append(file['id'])

            # Exit condition: no further pages
            if page_token is None:
                break
    
        print("DELETING ORPHAN FILES")
        print("-----------------------------")
        while len(orphans) > 0:
            # Recompute the size each round so the last, smaller batch
            # does not index past the end of the list
            batch_size = min(len(orphans), 100)
            batch = service.new_batch_http_request(callback=callback)
            for file_id in orphans[:batch_size]:
                print("File with id {0} queued for deletion.".format(file_id))
                batch.add(service.files().delete(fileId=file_id))
            del orphans[:batch_size]
            batch.execute()
            batch_counter += 1
            print("BATCH {0} DELETED - {1} FILES DELETED".format(batch_counter,
                                                                 batch_size))
    
    
    if __name__ == '__main__':
        main()
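
The deletion loop above works through the orphan IDs in groups of at most 100. A minimal sketch of just that chunking step, with no Drive calls involved, might look like this. `chunked` is a hypothetical helper name and the IDs are made up; neither comes from the Google client library.

```python
def chunked(ids, size=100):
    """Yield successive slices of `ids`, each with at most `size` items.

    `chunked` is a hypothetical helper for illustration only; it is not
    part of the Google API client library.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Made-up file IDs: 250 of them, split into batches of at most 100.
orphan_ids = ["id-{0}".format(n) for n in range(250)]
batch_sizes = [len(batch) for batch in chunked(orphan_ids)]
print(batch_sizes)
```

Each yielded slice could then be fed into one batch request, so the final, smaller group never needs special handling.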
    

    This method won't delete files in the root directory, because their 'parents' field lists the root folder. If some of your orphan files are not listed, Google is probably already deleting them automatically; that process can take up to 24 hours.
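
To make the distinction concrete, here is a small sketch that separates orphans from files that have a parent, using hand-written dictionaries shaped like Drive v3 file resources. The IDs, the names, and the `split_orphans` helper are all made up for illustration, and "ROOT_FOLDER_ID" is only a placeholder for whatever ID the root folder has in a real account.

```python
def split_orphans(files):
    """Return (orphans, parented) from a list of file resource dicts.

    A file counts as an orphan when its 'parents' key is missing or empty.
    `split_orphans` is a hypothetical helper, not a Drive API call.
    """
    orphans, parented = [], []
    for f in files:
        if f.get("parents"):
            parented.append(f)
        else:
            orphans.append(f)
    return orphans, parented


# Made-up sample data; "ROOT_FOLDER_ID" stands in for the real root folder ID.
sample = [
    {"id": "a1", "name": "in-root.txt", "parents": ["ROOT_FOLDER_ID"]},
    {"id": "b2", "name": "orphan.txt"},
    {"id": "c3", "name": "in-subfolder.txt", "parents": ["folder-123"]},
    {"id": "d4", "name": "empty-parents.txt", "parents": []},
]

orphans, parented = split_orphans(sample)
print([f["id"] for f in orphans])
print([f["id"] for f in parented])
```

The file sitting in the root directory ends up in `parented`, which is why a script that only looks for a missing 'parents' field leaves it alone.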

  • 2020-12-30 09:24

    Try using this in your query:

    'root' in parents 
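
As a rough sketch of how that query could be used from Python with the Drive v3 client: the function below takes an already-built `service` object (for example from `build('drive', 'v3', credentials=creds)`) and pages through everything whose parent is the root folder. `list_root_files` is a hypothetical name, and the `fields` selection is an assumption about which attributes are wanted.

```python
def list_root_files(service, page_size=100):
    """Collect all files whose parent is the root folder ('root' alias).

    `service` is expected to be a Drive v3 service object; the function
    name and the chosen fields are assumptions made for this sketch.
    """
    results = []
    page_token = None  # None requests the first page
    while True:
        response = service.files().list(
            q="'root' in parents",
            pageSize=page_size,
            pageToken=page_token,
            fields="nextPageToken, files(id, name)",
        ).execute()
        results.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return results
```

With an authorized service, usage would be along the lines of `for f in list_root_files(service): print(f["name"], f["id"])`.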
    
  • 2020-12-30 09:28

    In Java:

    List<File> result = new ArrayList<File>();
    Files.List request = drive.files().list();
    request.setQ("'root' in parents");

    FileList files = request.execute();
    
    for (com.google.api.services.drive.model.File element : files.getItems()) {
        System.out.println(element.getTitle());
    }
    

    'root' is the parent folder when the file or folder is in the root directory.

  • 2020-12-30 09:28

    Adreian Lopez, thanks for your script. It really saved me a lot of manual work. Below are the steps that I followed to implement your script:

    1. Created the folder c:\temp\pythonscript\

    2. Created OAuth 2.0 Client ID using https://console.cloud.google.com/apis/credentials and downloaded the credentials file to c:\temp\pythonscript\ folder.

    3. Renamed the above client_secret_#######-#############.apps.googleusercontent.com.json as credentials.json

    4. Copied Adreian Lopez's Python script and saved it as c:\temp\pythonscript\deleteGoogleDriveOrphanFiles.py

    5. Opened the "Microsoft Store" on Windows 10 and installed Python 3.8

    6. Opened the Command Prompt and entered: cd c:\temp\pythonscript\

    7. Ran pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

    8. Ran python deleteGoogleDriveOrphanFiles.py and followed the steps on the screen to create the c:\temp\pythonscript\token.pickle file and start deleting the orphan files. This step can take quite a while.

    9. Checked the storage usage at https://one.google.com/u/1/storage

    10. Repeated step 8 as necessary.
