Timestamp error when concatenating and writing Parquet files with Python and pandas

春和景丽 2021-02-19 00:44

I tried to concat() two Parquet files with pandas in Python.
The concatenation works, but when I try to write the resulting DataFrame to a Parquet file, it displays the error:

5 answers
  •  盖世英雄少女心
    2021-02-19 01:08

    I think this is a bug and you should do what Wes says. However, if you need working code now, I have a workaround.

    The solution that worked for me was to cast the timestamp columns to millisecond precision. If you need nanosecond precision, this will ruin your data, but if that's the case, it may be the least of your problems.

    import pandas as pd

    # Read the two source files
    table1 = pd.read_parquet("path1.parquet")
    table2 = pd.read_parquet("path2.parquet")

    # Cast the timestamp column to millisecond precision (drops sub-millisecond digits)
    table1["Date"] = table1["Date"].astype("datetime64[ms]")
    table2["Date"] = table2["Date"].astype("datetime64[ms]")

    # Concatenate and write the result as gzip-compressed Parquet
    table = pd.concat([table1, table2], ignore_index=True)
    table.to_parquet("./file.gzip", compression="gzip")
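
To see what the millisecond cast actually discards, here is a minimal sketch. The sample timestamp is made up for illustration, and it uses `dt.floor("ms")` rather than `astype`, on the assumption that flooring shows the same truncation without depending on how a given pandas version handles the `astype` cast.

```python
import pandas as pd

# Hypothetical sample value with nanosecond digits; the "Date" column
# name follows the workaround above.
df = pd.DataFrame({"Date": pd.to_datetime(["2021-02-19 00:44:00.123456789"])})

# Floor to millisecond resolution: everything below 1 ms is discarded.
truncated = df["Date"].dt.floor("ms")

# The difference is exactly the sub-millisecond part that would be lost.
lost = df["Date"] - truncated

print(truncated.iloc[0])
print(lost.iloc[0])
```

If `lost` is zero for every row in your own data, the millisecond workaround should not change any values.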
    
