Create sql table from dask dataframe using map_partitions and pd.df.to_sql

北荒 2020-12-31 10:48

Dask doesn't have a df.to_sql() like pandas, so I am trying to replicate that functionality and create an SQL table using the map_partitions method.

2 Answers
  • 2020-12-31 11:16

    Simply put, you have created a dataframe that is a prescription of the work to be done, but you have not executed it. To execute it, you need to call .compute() on the result.

    Note that the output here is not really a dataframe: each partition evaluates to None (because to_sql returns nothing), so it might be cleaner to express this with df.to_delayed, something like

    import dask
    import pandas as pd

    # ddf is your dask dataframe; db_url is a SQLAlchemy connection string
    dto_sql = dask.delayed(pd.DataFrame.to_sql)
    out = [dto_sql(d, 'table_name', db_url, if_exists='append', index=True)
           for d in ddf.to_delayed()]
    dask.compute(*out)
    

    Also note that whether you get good parallelism will depend on the database driver and the database system itself.
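    To make the idea above concrete, here is a minimal, self-contained sketch of the same pattern against a local SQLite file. The database path, table name, and the `write_partition` helper are all illustrative, not part of the original answer; SQLite serializes writers, so the tasks are run with the synchronous scheduler rather than in parallel.

    ```python
    import os
    import sqlite3

    import dask
    import dask.dataframe as dd
    import pandas as pd

    # Start from a clean file so the append below is reproducible.
    if os.path.exists("demo.db"):
        os.remove("demo.db")

    pdf = pd.DataFrame({"id": range(8), "val": list("abcdefgh")})
    ddf = dd.from_pandas(pdf, npartitions=4)

    def write_partition(part, db_path, table):
        # Open a fresh connection per partition; pandas accepts a
        # DBAPI2 sqlite3 connection directly.
        conn = sqlite3.connect(db_path)
        try:
            part.to_sql(table, conn, if_exists="append", index=False)
            conn.commit()
        finally:
            conn.close()
        return len(part)

    tasks = [dask.delayed(write_partition)(d, "demo.db", "demo")
             for d in ddf.to_delayed()]
    # SQLite allows only one writer at a time, so run tasks sequentially;
    # with a server database you could drop scheduler= and write in parallel.
    rows_written = sum(dask.compute(*tasks, scheduler="synchronous"))
    ```

    With a server-side database (Postgres, MySQL) each delayed task can open its own connection and the default threaded scheduler can write partitions concurrently.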

  • 2020-12-31 11:30

    UPDATE: Dask's DataFrame.to_sql() is now available: https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.to_sql
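    A minimal usage sketch of the built-in method, assuming SQLAlchemy is installed and using an illustrative SQLite file and table name. Note that, unlike pandas, the second argument is a database URI string, not a connection object.

    ```python
    import os

    import dask.dataframe as dd
    import pandas as pd

    # Start from a clean file so the example is reproducible.
    if os.path.exists("dask_demo.db"):
        os.remove("dask_demo.db")

    ddf = dd.from_pandas(
        pd.DataFrame({"id": range(6), "val": list("abcdef")}),
        npartitions=3,
    )

    # Dask's to_sql takes a URI string (requires sqlalchemy), not a
    # live connection, because each worker opens its own connection.
    ddf.to_sql("demo", "sqlite:///dask_demo.db",
               if_exists="replace", index=False)

    # Read the count back through the same URI.
    n = pd.read_sql_query("SELECT COUNT(*) AS n FROM demo",
                          "sqlite:///dask_demo.db")["n"][0]
    ```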
