Python Pandas MemoryError

余生分开走 2021-01-14 22:14

I have the following versions installed:

python: 2.7.3.final.0
python-bits: 64
OS: Linux
machine: x86_64
processor: x86_64
byteorder: little
pandas: 0.13.1
         


        
2 Answers
  •  醉梦人生
    2021-01-14 22:54

    I can also reproduce it on 0.13.1, but the issue does not occur in 0.12 or in 0.14 (released yesterday), so it appears to be a bug in 0.13.
    So try upgrading your pandas version: on 0.14 the vectorized approach is much faster than apply (5 s vs. more than 1 min on my machine) and uses less peak memory (about 200 MiB vs. 980 MiB, measured with %memit).

    These timings use your sample data repeated 50,000 times (giving a DataFrame of 450k rows) and the apply_id function from @jsalonen:

    In [23]: pd.__version__ 
    Out[23]: '0.14.0'
    
    In [24]: %timeit df_train['Store'].astype(str) +'_' + df_train['Dept'].astype(str)+'_'+ df_train['Date_Str'].astype(str)
    1 loops, best of 3: 5.42 s per loop
    
    In [25]: %timeit df_train.apply(apply_id, 1)
    1 loops, best of 3: 1min 11s per loop
    
    In [26]: %load_ext memory_profiler
    
    In [27]: %memit df_train['Store'].astype(str) +'_' + df_train['Dept'].astype(str)+'_'+ df_train['Date_Str'].astype(str)
    peak memory: 201.75 MiB, increment: 0.01 MiB
    
    In [28]: %memit df_train.apply(apply_id, 1)
    peak memory: 982.56 MiB, increment: 780.79 MiB
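
For reference, here is a minimal sketch of the two approaches compared above. The sample rows are made up, and `apply_id` is only an assumed reconstruction of @jsalonen's row-wise helper, which is not shown in this thread. The column names are taken from the session above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the session above.
df_train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date_Str": ["2010-02-05", "2010-02-12", "2010-02-05"],
})


def apply_id(row):
    # Assumed shape of the row-wise helper: join the three fields with "_".
    return "{}_{}_{}".format(row["Store"], row["Dept"], row["Date_Str"])


# Vectorized: convert each column to str once, then concatenate whole columns.
vectorized_ids = (
    df_train["Store"].astype(str) + "_"
    + df_train["Dept"].astype(str) + "_"
    + df_train["Date_Str"].astype(str)
)

# Row-wise: call the Python function once per row (axis=1).
row_wise_ids = df_train.apply(apply_id, axis=1)

# Both approaches should produce the same ids.
assert vectorized_ids.tolist() == row_wise_ids.tolist()
```

Both produce the same ids. The row-wise version builds a Series object for every row, which is a plausible reason for its extra time and memory in the measurements above.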
    
