Write a Pandas DataFrame to Google Cloud Storage or BigQuery

予麋鹿 2020-12-02 13:16

Hello and thanks for your time and consideration. I am developing a Jupyter Notebook in the Google Cloud Platform / Datalab. I have created a Pandas DataFrame and would like

8 Answers
  • 2020-12-02 14:13

    I have a slightly simpler solution for this task using Dask. You can convert your DataFrame to a Dask DataFrame, which can then be written to CSV files on Cloud Storage:

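    import dask.dataframe as dd
    import pandas

    df  # your Pandas DataFrame
    ddf = dd.from_pandas(df, npartitions=1, sort=True)
    # Call to_csv on the Dask DataFrame itself; '*' is replaced by the partition number.
    # `gcs` is assumed to be an authenticated gcsfs.GCSFileSystem created earlier.
    ddf.to_csv('gs://YOUR_BUCKET/ddf-*.csv', index=False, sep=',', header=False,
               storage_options={'token': gcs.session.credentials})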
    
  • 2020-12-02 14:17

    Writing a Pandas DataFrame to BigQuery

    Update on @Anthonios Partheniou's answer.
    The code is a bit different now - as of Nov. 29 2017

    To define a BigQuery dataset

    Pass a tuple containing project_id and dataset_id to bq.Dataset.

    # define a BigQuery dataset
    # (assumed import, not shown in the original answer)
    import google.datalab.bigquery as bq
    bigquery_dataset_name = ('project_id', 'dataset_id')
    dataset = bq.Dataset(name=bigquery_dataset_name)
    

    To define a BigQuery table

    Pass a tuple containing project_id, dataset_id and the table name to bq.Table.

    # define a BigQuery table    
    bigquery_table_name = ('project_id', 'dataset_id', 'table_name')
    table = bq.Table(bigquery_table_name)
    

    Create the dataset and table, then write to the table in BigQuery

    # Create BigQuery dataset
    if not dataset.exists():
        dataset.create()
    
    # Create or overwrite the existing table if it exists
    table_schema = bq.Schema.from_data(dataFrame_name)
    table.create(schema = table_schema, overwrite = True)
    
    # Write the DataFrame to a BigQuery table
    table.insert(dataFrame_name)
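
The three steps above can be collected into one function. This is only a sketch: the function name `write_dataframe_to_bigquery` is hypothetical, and the `bq` module is passed in as an argument so the code does not hard-code an import path. In Datalab this would presumably be `google.datalab.bigquery`, which is an assumption here. It uses only the calls shown in this answer.

```python
def write_dataframe_to_bigquery(bq, df, project_id, dataset_id, table_name):
    """Create the dataset if missing, (re)create the table, and insert df.

    `bq` is the BigQuery module object (assumed: google.datalab.bigquery).
    """
    # Define and, if necessary, create the dataset
    dataset = bq.Dataset(name=(project_id, dataset_id))
    if not dataset.exists():
        dataset.create()

    # Define the table, then create it (overwriting any existing table)
    table = bq.Table((project_id, dataset_id, table_name))
    schema = bq.Schema.from_data(df)
    table.create(schema=schema, overwrite=True)

    # Write the DataFrame rows into the table
    table.insert(df)
    return table
```

A call might then look like `write_dataframe_to_bigquery(bq, dataFrame_name, 'project_id', 'dataset_id', 'table_name')`. Note that `overwrite=True` replaces an existing table, so any data already in it is lost.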
    