Write a Pandas DataFrame to Google Cloud Storage or BigQuery

予麋鹿 2020-12-02 13:16

Hello, and thanks for your time and consideration. I am developing a Jupyter Notebook on Google Cloud Platform / Datalab. I have created a Pandas DataFrame and would like to write it to Google Cloud Storage and/or BigQuery.

8 Answers
  • 2020-12-02 14:13

    Here is a slightly simpler solution for the task using Dask. You can convert your Pandas DataFrame to a Dask DataFrame, which can then be written to CSV on Cloud Storage:

    import dask.dataframe as dd
    import pandas

    df  # your Pandas DataFrame
    # 'gcs' is assumed to be an authenticated Cloud Storage client (e.g. a gcsfs
    # filesystem object created in the notebook); its credentials are reused below
    ddf = dd.from_pandas(df, npartitions=1, sort=True)
    ddf.to_csv('gs://YOUR_BUCKET/ddf-*.csv', index=False, sep=',', header=False,
               storage_options={'token': gcs.session.credentials})
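
    Note that gcs is not defined in the snippet above; it is assumed to be an authenticated Google Cloud Storage client created elsewhere in the notebook. A minimal sketch of setting one up with gcsfs follows; the project id and the token='cloud' option are assumptions about your environment, not part of the original answer:

    import gcsfs

    # Assumed authentication: token='cloud' reuses the machine's default
    # Google Cloud credentials (works on Datalab / Compute Engine VMs)
    gcs = gcsfs.GCSFileSystem(project='your-project-id', token='cloud')

    # Alternatively, skip the gcs object and pass the token choice directly:
    # ddf.to_csv('gs://YOUR_BUCKET/ddf-*.csv', index=False,
    #            storage_options={'token': 'cloud'})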
    
  • 2020-12-02 14:17

    Writing a Pandas DataFrame to BigQuery

    Update to @Anthonios Partheniou's answer.
    The code is a bit different now, as of Nov. 29, 2017.

    To define a BigQuery dataset

    Pass a tuple containing project_id and dataset_id to bq.Dataset.

    import google.datalab.bigquery as bq

    # define a BigQuery dataset
    bigquery_dataset_name = ('project_id', 'dataset_id')
    dataset = bq.Dataset(name=bigquery_dataset_name)
    

    To define a BigQuery table

    Pass a tuple containing project_id, dataset_id and the table name to bq.Table.

    # define a BigQuery table    
    bigquery_table_name = ('project_id', 'dataset_id', 'table_name')
    table = bq.Table(bigquery_table_name)
    

    Create the dataset and table, and write the DataFrame to the table in BigQuery

    # Create the BigQuery dataset if it does not already exist
    if not dataset.exists():
        dataset.create()

    # Infer the table schema from the DataFrame, then create the table,
    # overwriting it if it already exists
    table_schema = bq.Schema.from_data(dataFrame_name)
    table.create(schema=table_schema, overwrite=True)

    # Write the DataFrame to the BigQuery table
    table.insert(dataFrame_name)
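
    To verify the insert, you can read the table back into a Pandas DataFrame. A minimal sketch using the same google.datalab.bigquery API; the SELECT statement is a placeholder and the output-options pattern may vary slightly between Datalab versions:

    # Read the table back into a DataFrame to confirm the write (assumed pattern)
    query = bq.Query('SELECT * FROM `project_id.dataset_id.table_name` LIMIT 10')
    result_df = query.execute(output_options=bq.QueryOutput.dataframe()).result()
    print(result_df.head())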
    