What's the fastest way to pickle a pandas DataFrame?

北荒 2021-02-19 19:25

Which is better: using the pandas built-in method or pickle.dump?

The standard pickle method looks like this:

pickle.dump(my_dataframe, open('t

2 Answers
  • 2021-02-19 19:28

    Thanks to @qwwqwwq I discovered that pandas has a built-in to_pickle method for dataframes. I did a quick time test:

    In [1]: %timeit pickle.dump(df, open('test_pickle.p', 'wb'))
    10 loops, best of 3: 91.8 ms per loop
    
    In [2]: %timeit df.to_pickle('testpickle.p')
    10 loops, best of 3: 88 ms per loop
    

    So the built-in is only narrowly faster, about 4% in this run (88 ms vs. 91.8 ms). To me, this is useful because it means it's probably not worth refactoring code to use the built-in. Hope this helps someone!
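
For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. The DataFrame contents and size are assumptions (the answer does not say what df held), and passing pickle.HIGHEST_PROTOCOL to pickle.dump is also an assumption, made so that both calls are likely to use the same protocol. Absolute numbers will differ by machine and pandas version.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame; the original answer does not describe its df.
df = pd.DataFrame(np.random.rand(50_000, 10))

tmpdir = tempfile.mkdtemp()
path_dump = os.path.join(tmpdir, "test_pickle.p")
path_builtin = os.path.join(tmpdir, "testpickle.p")


def with_pickle_dump():
    # The context manager closes the file handle after every write.
    with open(path_dump, "wb") as f:
        pickle.dump(df, f, protocol=pickle.HIGHEST_PROTOCOL)


def with_to_pickle():
    df.to_pickle(path_builtin)


# Best of 3 repeats, 10 calls per repeat, reported per call.
t_dump = min(timeit.repeat(with_pickle_dump, number=10, repeat=3)) / 10
t_builtin = min(timeit.repeat(with_to_pickle, number=10, repeat=3)) / 10

print(f"pickle.dump:  {t_dump * 1e3:.1f} ms per call")
print(f"df.to_pickle: {t_builtin * 1e3:.1f} ms per call")
```

A gap of a few percent, like the one measured above, is small enough that run-to-run noise can flip the ordering, so it is worth repeating the measurement a few times before drawing conclusions.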

  • 2021-02-19 19:36

    Easy benchmark, right?

    No difference at all. In fact, I expect that pandas implements __getstate__, so that calling pickle.dump(df, f) is effectively the same as calling df.to_pickle().

    If you search the pandas source code for __getstate__, for example, you will find that it is implemented on several objects.
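
One way to check this in practice is to write a frame with each method and read it back with the other. The sketch below does that with a small made-up DataFrame and hypothetical file names. It does not prove that the two calls share an implementation, only that the files they produce are interchangeable.

```python
import os
import pickle
import tempfile

import pandas as pd

# Small illustrative frame (made-up data).
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmpdir:
    path_a = os.path.join(tmpdir, "via_dump.pkl")
    path_b = os.path.join(tmpdir, "via_to_pickle.pkl")

    # Write once with each method.
    with open(path_a, "wb") as f:
        pickle.dump(df, f)
    df.to_pickle(path_b)

    # Read each file back with the *other* API.
    from_dump = pd.read_pickle(path_a)
    with open(path_b, "rb") as f:
        from_builtin = pickle.load(f)

# Both round trips should reproduce the original frame.
print(from_dump.equals(df), from_builtin.equals(df))
```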
