Python Pandas MemoryError


余生分开走 2021-01-14 22:14

I have these packages installed:

python: 2.7.3.final.0
python-bits: 64
OS: Linux
machine: x86_64
processor: x86_64
byteorder: little
pandas: 0.13.1

2 answers

醉梦人生 (original poster)

2021-01-14 22:54

I can also reproduce it on 0.13.1, but the issue does not occur in 0.12 or in 0.14 (released yesterday), so it seems to be a bug in 0.13.
So you could try upgrading your pandas version: on 0.14, the vectorized approach is much faster than apply (about 5 s vs. more than 1 min on my machine) and uses less peak memory (about 200 MiB vs. 980 MiB, measured with %memit).

Using your sample data repeated 50000 times (giving a DataFrame of 450k rows), and using the apply_id function from @jsalonen:

In [23]: pd.__version__ 
Out[23]: '0.14.0'

In [24]: %timeit df_train['Store'].astype(str) +'_' + df_train['Dept'].astype(str)+'_'+ df_train['Date_Str'].astype(str)
1 loops, best of 3: 5.42 s per loop

In [25]: %timeit df_train.apply(apply_id, 1)
1 loops, best of 3: 1min 11s per loop

In [26]: %load_ext memory_profiler

In [27]: %memit df_train['Store'].astype(str) +'_' + df_train['Dept'].astype(str)+'_'+ df_train['Date_Str'].astype(str)
peak memory: 201.75 MiB, increment: 0.01 MiB

In [28]: %memit df_train.apply(apply_id, 1)
peak memory: 982.56 MiB, increment: 780.79 MiB
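
For readers who want to try the two approaches side by side, here is a minimal sketch. The sample rows are made up, and this apply_id is only a guess at what @jsalonen's function looks like (it is not shown in this thread); only the column names Store, Dept and Date_Str come from the session above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df_train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date_Str": ["2010-02-05", "2010-02-12", "2010-02-05"],
})


def apply_id(row):
    # Assumed shape of the row-wise helper: join the three fields with "_".
    return "{}_{}_{}".format(row["Store"], row["Dept"], row["Date_Str"])


# Vectorized: each column is converted to str once, then the whole
# columns are concatenated element-wise.
vectorized_ids = (
    df_train["Store"].astype(str) + "_"
    + df_train["Dept"].astype(str) + "_"
    + df_train["Date_Str"].astype(str)
)

# Row-wise: apply_id is called once per row, with each row passed as a Series.
row_wise_ids = df_train.apply(apply_id, axis=1)

# Both approaches should produce the same IDs.
assert vectorized_ids.tolist() == row_wise_ids.tolist()
print(vectorized_ids.tolist())
```

Both give the same result; the row-wise version just does the work one row at a time, which is where the extra time and memory in the measurements above come from.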
