Currently I am working on a dataset that is 10 GB in size. I have uploaded it to Google Cloud Storage, but I don't know how to import it into Google Colab.
You can do this with a dedicated service account and the Python client library. Imports:
from google.oauth2 import service_account
from google.cloud import storage
from io import BytesIO
import pandas as pd
import json
import filecmp
Define the service account key as a JSON string and parse it, and name the bucket to work with:
SERVICE_ACCOUNT = json.loads(r"""{
"type": "service_account",
"project_id": "[REPLACE WITH YOUR FILE]",
"privat_sae_key_id": "[REPLACE WITH YOUR FILE]",
"private_key": "[REPLACE WITH YOUR FILE]",
"client_email": "[REPLACE WITH YOUR FILE]",
"client_id": "[REPLACE WITH YOUR FILE]",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "[REPLACE WITH YOUR FILE]"
}""")
BUCKET = "[NAME OF YOUR BUCKET TO READ/WRITE YOUR DATA]"
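If you would rather not paste the key contents inline, you can load the same dictionary from the key file itself. This is a minimal sketch; the path "/content/service-account.json" is hypothetical and assumes you uploaded the key file to the Colab session:
# Alternative (assumption): load the key from an uploaded JSON file instead
# of pasting its contents. "/content/service-account.json" is a hypothetical path.
with open("/content/service-account.json") as f:
    SERVICE_ACCOUNT = json.load(f)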
Use the service account credentials to create the storage client:
credentials = service_account.Credentials.from_service_account_info(
    SERVICE_ACCOUNT,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = storage.Client(
    credentials=credentials,
    project=credentials.project_id,
)
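As a quick sanity check (a minimal sketch, assuming the service account can list objects in the bucket), you can print a few object names before moving on:
# Sanity check: print the first few object names in the bucket.
# Assumes the credentials above grant at least storage.objects.list.
for blob in client.list_blobs(BUCKET, max_results=5):
    print(blob.name)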
Upload and download helper functions:
def save_file(local_filename, remote_filename):
    # Upload a local file to the bucket under the given object name.
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(remote_filename)
    blob.upload_from_filename(local_filename)

def download_file(local_filename, remote_filename):
    # Download an object from the bucket to a local file.
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(remote_filename)
    blob.download_to_filename(local_filename)
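Since BytesIO is already imported, you can also read an object straight into a DataFrame without touching the local disk. This is a sketch, assuming the object is a CSV that fits in memory and that your google-cloud-storage version provides download_as_bytes:
def read_csv_blob(remote_filename):
    # Read a CSV object from the bucket directly into a pandas DataFrame.
    # Assumes the object is small enough to hold in memory.
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(remote_filename)
    return pd.read_csv(BytesIO(blob.download_as_bytes()))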
Let's check with a CSV file generated by Pandas:
df_test = pd.DataFrame(
    {"col1": [1, 2, 3],
     "col2": [4, 5, 6]}
)
df_test.to_csv(path_or_buf="/tmp/test.csv")
save_file("/tmp/test.csv","test.csv")
download_file("/tmp/test2.csv","test.csv")
assert filecmp.cmp('/tmp/test.csv', '/tmp/test2.csv')
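For the 10 GB dataset from the question, download it once to the Colab VM's local disk (assuming the VM has enough free disk space) and read it in chunks so the whole file never has to sit in memory. The object name "mydata.csv" below is hypothetical; replace it with the name of your uploaded file:
# Hypothetical object name; replace with the name of your uploaded dataset.
download_file("/tmp/mydata.csv", "mydata.csv")

# Process the CSV in chunks so the whole 10 GB never has to fit in memory.
row_count = 0
for chunk in pd.read_csv("/tmp/mydata.csv", chunksize=1_000_000):
    row_count += len(chunk)  # replace with your own per-chunk processing
print(row_count)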