How to import data from Google Cloud Storage to Google Colab

忘了有多久 2021-02-07 13:08

I am currently working with a 10 GB data set. I have uploaded it to Google Cloud Storage, but I don't know how to import it into Google Colab.

2 answers

孤街浪徒 (original poster)

2021-02-07 13:44

Using a dedicated service account and Python:

from google.oauth2 import service_account
from google.cloud.storage import client
import pandas as pd
import json
import filecmp

Paste the contents of the service account JSON key file as a string:

SERVICE_ACCOUNT = json.loads(r"""{
  "type": "service_account",
  "project_id": "[REPLACE WITH YOUR FILE]",
  "private_key_id": "[REPLACE WITH YOUR FILE]",
  "private_key": "[REPLACE WITH YOUR FILE]",
  "client_email": "[REPLACE WITH YOUR FILE]",
  "client_id": "[REPLACE WITH YOUR FILE]",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "[REPLACE WITH YOUR FILE]"
}""")

BUCKET = "[NAME OF YOUR BUCKET TO READ/WRITE YOUR DATA]"
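
Before building credentials, it can help to confirm that every placeholder in the template above has been replaced. The following is a minimal sketch, not part of the original answer: the helper name unfilled_fields and the sample dictionary are hypothetical, and the check only looks for values that still start with the "[REPLACE" marker used in the template.

```python
PLACEHOLDER_PREFIX = "[REPLACE"  # matches the "[REPLACE WITH YOUR FILE]" markers


def unfilled_fields(info):
    """Return the sorted keys whose values are still template placeholders."""
    return sorted(
        key
        for key, value in info.items()
        if isinstance(value, str) and value.startswith(PLACEHOLDER_PREFIX)
    )


# Hypothetical, partly filled-in key used only to illustrate the check.
example_info = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "[REPLACE WITH YOUR FILE]",
    "client_email": "[REPLACE WITH YOUR FILE]",
    "token_uri": "https://oauth2.googleapis.com/token",
}

missing = unfilled_fields(example_info)
print(missing)
```

Calling unfilled_fields(SERVICE_ACCOUNT) before creating the client should return an empty list once the key has been pasted in completely.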

Create the client from the service account key:

credentials = service_account.Credentials.from_service_account_info(
    SERVICE_ACCOUNT,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

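# Note: this rebinds the name `client` from the imported module to the
# Client instance, so re-running this statement without re-running the
# import fails.
client = client.Client(
    credentials=credentials,
    project=credentials.project_id,
)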

Save and download functions:

def save_file(local_filename, remote_filename):
    """Upload a local file to BUCKET under the name remote_filename."""
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(remote_filename)
    blob.upload_from_filename(local_filename)

def download_file(local_filename, remote_filename):
    """Download remote_filename from BUCKET and write it to local_filename."""
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(remote_filename)
    blob.download_to_filename(local_filename)
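
Once download_file has written the data to Colab's local disk, a 10 GB file may not fit in memory if loaded in one go. Assuming the data set is a CSV file (the question does not say), the sketch below streams it row by row with the standard library; the helper count_rows_streaming and the small sample file are hypothetical stand-ins for the real data.

```python
import csv
import os
import tempfile


def count_rows_streaming(path):
    """Count data rows in a CSV without loading the whole file into memory."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for _ in reader)


# Small stand-in for a file fetched with download_file (hypothetical data).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["col1", "col2"])
    writer.writerows([[1, 4], [2, 5], [3, 6]])

row_count = count_rows_streaming(sample_path)
print(row_count)
```

The same idea carries over to pandas, where read_csv accepts a chunksize argument and then yields DataFrames piece by piece instead of one large frame.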

Let's check with a CSV file generated by Pandas:

df_test = pd.DataFrame(
    {"col1": [1,2,3],
     "col2": [4,5,6]}
)
# to_csv returns None when given a path, so call it separately
df_test.to_csv(path_or_buf="/tmp/test.csv")

save_file("/tmp/test.csv","test.csv")
download_file("/tmp/test2.csv","test.csv")
assert filecmp.cmp('/tmp/test.csv', '/tmp/test2.csv')
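
The assertion above uses filecmp.cmp with its default shallow=True, which treats two files as equal when their os.stat signatures (type, size, modification time) match and compares contents only otherwise. As a minimal sketch, using local stand-in files in a temporary directory rather than a real bucket, passing shallow=False forces a byte-by-byte comparison; the file names and contents here are illustrative only.

```python
import filecmp
import os
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "test.csv")
copy = os.path.join(workdir, "test2.csv")
altered = os.path.join(workdir, "test3.csv")

# Two identical files and one with the same size but a different last value.
with open(original, "w") as handle:
    handle.write(",col1,col2\n0,1,4\n1,2,5\n2,3,6\n")
with open(copy, "w") as handle:
    handle.write(",col1,col2\n0,1,4\n1,2,5\n2,3,6\n")
with open(altered, "w") as handle:
    handle.write(",col1,col2\n0,1,4\n1,2,5\n2,3,7\n")

# shallow=False always compares the file contents.
same = filecmp.cmp(original, copy, shallow=False)
different = filecmp.cmp(original, altered, shallow=False)
print(same, different)
```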
