Hello and thanks for your time and consideration. I am developing a Jupyter Notebook in the Google Cloud Platform / Datalab. I have created a Pandas DataFrame and would like to write this DataFrame to both Google Cloud Storage (GCS) and/or BigQuery.
I have a slightly simpler solution for the task using Dask. You can convert your DataFrame to a Dask DataFrame, which can be written to CSV on Cloud Storage:
import dask.dataframe as dd
import pandas as pd

df  # your Pandas DataFrame

# Convert to a single-partition Dask DataFrame
ddf = dd.from_pandas(df, npartitions=1, sort=True)

# Write to Cloud Storage; the '*' in the path is replaced with the partition number,
# and 'token' must be credentials that gcsfs can use to authenticate
ddf.to_csv('gs://YOUR_BUCKET/ddf-*.csv', index=False, sep=',', header=False,
           storage_options={'token': gcs.session.credentials})
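As a quick check (not part of the original answer), the exported part files can be read straight back from the bucket with Dask, assuming the same placeholder bucket name and credentials as above and that gcsfs is installed:

# Read every exported part back from Cloud Storage (placeholder bucket name)
check = dd.read_csv('gs://YOUR_BUCKET/ddf-*.csv', header=None,
                    storage_options={'token': gcs.session.credentials})
print(len(check))  # row count should match the original DataFrame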
Update on @Anthonios Partheniou's answer. The code is a bit different now, as of Nov. 29, 2017.
Pass a tuple containing project_id and dataset_id to bq.Dataset.
import google.datalab.bigquery as bq

# define a BigQuery dataset
bigquery_dataset_name = ('project_id', 'dataset_id')
dataset = bq.Dataset(name=bigquery_dataset_name)
Pass a tuple containing project_id, dataset_id and the table name to bq.Table.
# define a BigQuery table
bigquery_table_name = ('project_id', 'dataset_id', 'table_name')
table = bq.Table(bigquery_table_name)
# Create the BigQuery dataset if it does not already exist
if not dataset.exists():
    dataset.create()
# Create the table, overwriting it if it already exists;
# the schema is inferred from the DataFrame
table_schema = bq.Schema.from_data(dataFrame_name)
table.create(schema=table_schema, overwrite=True)
# Write the DataFrame to a BigQuery table
table.insert(dataFrame_name)
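To confirm the rows landed (this step is not in the original answer), one option is to pull the table back into a DataFrame. This is a minimal sketch that reuses the placeholder names above and assumes the Table object in your installed version of google.datalab.bigquery exposes to_dataframe():

# Read the table back to verify the insert (assumes Table.to_dataframe() is available)
result_df = table.to_dataframe()
print(len(result_df), 'rows written to', bigquery_table_name)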