pandas efficient dataframe set row

感动是毒 2021-02-18 18:30

First I have the following empty DataFrame preallocated:

df=DataFrame(columns=range(10000),index=range(1000))

Then I want to update the d

2 answers
  •  滥情空心
    2021-02-18 19:08

    Here are 3 methods, using only 100 columns and 1000 rows

    In [4]: import numpy as np; from pandas import DataFrame
    In [5]: row = np.random.randn(100)
    

    Row wise assignment

    In [6]: def method1():
       ...:     df = DataFrame(columns=range(100),index=range(1000))
       ...:     for i in range(len(df)):
       ...:         df.iloc[i] = row
       ...:     return df
       ...: 
    

    Build up the arrays in a list, create the frame all at once

    In [9]: def method2():
       ...:     return DataFrame([ row for i in range(1000) ])
       ...: 
    

    Columnwise assignment (with transposes at both ends)

    In [13]: def method3():
       ....:     df = DataFrame(columns=range(100),index=range(1000)).T
       ....:     for i in range(1000):
       ....:         df[i] = row
       ....:     return df.T
       ....: 
    

    These all have the same output frame

    In [22]: (method2() == method1()).all().all()
    Out[22]: True
    
    In [23]: (method2() == method3()).all().all()
    Out[23]: True
    
    
    In [8]: %timeit method1()
    1 loops, best of 3: 1.76 s per loop
    
    In [10]: %timeit method2()
    1000 loops, best of 3: 7.79 ms per loop
    
    In [14]: %timeit method3()
    1 loops, best of 3: 1.33 s per loop
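
    To re-run the comparison outside IPython on Python 3, something like the following sketch could be used. The function names `assign_rows` and `build_once` and the smaller row count are assumptions made here to keep the run short; absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

row = np.random.randn(100)


def assign_rows(n_rows=200):
    # Row-wise assignment into a preallocated, empty (object-dtype) frame.
    df = pd.DataFrame(columns=range(100), index=range(n_rows))
    for i in range(n_rows):
        df.iloc[i] = row
    return df


def build_once(n_rows=200):
    # Collect the rows in a list, then create the frame in one step.
    return pd.DataFrame([row for _ in range(n_rows)])


# Total wall-clock time for 3 calls of each function.
t_assign = timeit.timeit(assign_rows, number=3)
t_build = timeit.timeit(build_once, number=3)
print(f"row-wise assignment: {t_assign:.4f} s for 3 runs")
print(f"build all at once:   {t_build:.4f} s for 3 runs")
```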
    

    Building up a list and THEN creating the frame all at once is clearly faster than either form of assignment, by roughly two orders of magnitude in the timings above (7.79 ms versus 1.33 s and 1.76 s). Each assignment involves copying, whereas building the frame all at once copies the data only once.
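
    If the goal is to preallocate and then fill row by row, as in the question, one variant of the same idea (not part of the original answer) is to preallocate a plain NumPy array, fill it, and wrap it in a DataFrame once at the end. This is a minimal sketch; the sizes and the random rows are placeholders for whatever actually produces the data.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 1000, 100  # illustrative sizes, matching the answer above

# Preallocate a plain float array instead of an empty DataFrame.
data = np.empty((n_rows, n_cols))

rng = np.random.default_rng(0)
for i in range(n_rows):
    # Placeholder row; substitute the real per-row computation here.
    data[i] = rng.standard_normal(n_cols)

# Create the frame in a single step from the filled array.
df = pd.DataFrame(data, index=range(n_rows), columns=range(n_cols))
```

    Writing into the array row by row avoids the per-assignment overhead of `df.iloc[i] = ...`, and the resulting frame has a float dtype rather than the object dtype of an empty preallocated frame.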
