How to import data from Google Cloud Storage to Google Colab

忘了有多久 2021-02-07 13:08

I am currently working on a 10 GB dataset. I have uploaded it to Google Cloud Storage, but I don't know how to import it into Google Colab.

2 Answers
  •  孤街浪徒
    2021-02-07 13:44

    Using a dedicated service account and Python:

    from google.oauth2 import service_account
    from google.cloud import storage
    import json
    import filecmp
    import pandas as pd
    

    Paste the contents of the service account's JSON key file as a string:

    SERVICE_ACCOUNT = json.loads(r"""{
      "type": "service_account",
      "project_id": "[REPLACE WITH YOUR FILE]",
      "private_key_id": "[REPLACE WITH YOUR FILE]",
      "private_key": "[REPLACE WITH YOUR FILE]",
      "client_email": "[REPLACE WITH YOUR FILE]",
      "client_id": "[REPLACE WITH YOUR FILE]",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "[REPLACE WITH YOUR FILE]"
    }""")
    
    BUCKET = "[NAME OF YOUR BUCKET TO READ/WRITE YOUR DATA]"
    

    Use the key to create credentials and a storage client:

    credentials = service_account.Credentials.from_service_account_info(
        SERVICE_ACCOUNT,
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
    
    # `storage` is imported as a package (not `from ... import client`) so that
    # assigning the `client` variable does not shadow the module; otherwise
    # re-running this cell in Colab would fail.
    client = storage.Client(
        credentials=credentials,
        project=credentials.project_id,
    )
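Because the key is pasted by hand, a misspelled field name or a leftover placeholder only shows up later as an authentication error. A small check on `SERVICE_ACCOUNT` can surface that earlier. This is a minimal sketch: `check_service_account_info` and the `REQUIRED_FIELDS` list are hypothetical helpers based on the key layout shown above, not part of `google-auth`.

```python
import json

# Hypothetical field list, taken from the key layout shown above;
# this is an illustration, not an official schema from google-auth.
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "token_uri",
)


def check_service_account_info(info):
    """Return a list of problems in a parsed service-account key (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = info.get(field)
        if not value:
            problems.append(f"missing field: {field}")
        elif str(value).startswith("[REPLACE"):
            problems.append(f"placeholder not replaced: {field}")
        elif field == "type" and value != "service_account":
            problems.append("type must be 'service_account'")
    return problems


if __name__ == "__main__":
    # Hypothetical pasted key with a misspelled field name and a leftover placeholder.
    pasted = json.loads(
        '{"type": "service_account", "privat_sae_key_id": "abc",'
        ' "project_id": "[REPLACE WITH YOUR FILE]"}'
    )
    for problem in check_service_account_info(pasted):
        print(problem)
```

Calling `check_service_account_info(SERVICE_ACCOUNT)` before `from_service_account_info` should return an empty list; a misspelled key such as `privat_sae_key_id` is reported as a missing `private_key_id`.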
    

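    Upload and download functions:

    def save_file(local_filename, remote_filename):
        # client.bucket() builds a Bucket reference without an extra API call.
        bucket = client.bucket(BUCKET)
        blob = bucket.blob(remote_filename)
        # Upload the local file to gs://BUCKET/remote_filename.
        blob.upload_from_filename(local_filename)
    
    def download_file(local_filename, remote_filename):
        bucket = client.bucket(BUCKET)
        blob = bucket.blob(remote_filename)
        # Stream the object to disk, so a large file is not held in memory.
        blob.download_to_filename(local_filename)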
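For a 10 GB object, keeping a second local copy just to run `filecmp` is wasteful. Cloud Storage stores a base64-encoded MD5 hash for objects uploaded in a single request (composite objects may have no MD5, only a CRC32C), so a downloaded file can instead be checked against it. A minimal sketch, assuming that case; `local_md5_base64` is a hypothetical helper that reads the file in chunks:

```python
import base64
import hashlib


def local_md5_base64(path, chunk_size=8 * 1024 * 1024):
    """Base64-encoded MD5 of a local file, read in fixed-size chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps reading until end of file,
        # so memory use stays at roughly one chunk regardless of file size.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")
```

With the `client` and `BUCKET` defined above, the comparison would be `client.bucket(BUCKET).get_blob("test.csv").md5_hash == local_md5_base64("/tmp/test2.csv")`; `get_blob` fetches the object's metadata, including `md5_hash`.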
    

    Let's check with a CSV file generated by Pandas:

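    df_test = pd.DataFrame(
        {"col1": [1, 2, 3],
         "col2": [4, 5, 6]}
    )
    # to_csv() writes the file and returns None, so keep the DataFrame separate.
    df_test.to_csv("/tmp/test.csv")
    
    save_file("/tmp/test.csv", "test.csv")        # upload to the bucket
    download_file("/tmp/test2.csv", "test.csv")   # download it back
    assert filecmp.cmp("/tmp/test.csv", "/tmp/test2.csv")  # files must be identical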
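Once a 10 GB CSV has been downloaded to the Colab disk, loading it in one go with `pd.read_csv` may exceed the available RAM. One option is to process it row by row. The sketch below is a hypothetical example (`column_sums` is not from the original answer) using only the standard `csv` module to sum numeric columns while holding a single row in memory:

```python
import csv


def column_sums(path, columns):
    """Sum the given numeric columns of a CSV file, one row at a time."""
    totals = {name: 0.0 for name in columns}
    rows = 0
    with open(path, newline="") as f:
        # DictReader yields one dict per row, keyed by the header line.
        for row in csv.DictReader(f):
            for name in columns:
                totals[name] += float(row[name])
            rows += 1
    return totals, rows
```

For the test file above, `column_sums("/tmp/test2.csv", ["col1", "col2"])` returns `({"col1": 6.0, "col2": 15.0}, 3)`. If pandas is preferred, `pd.read_csv(path, chunksize=...)` returns an iterator of smaller DataFrames that can be processed the same way.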
    
