First I have the following empty DataFrame preallocated:
df=DataFrame(columns=range(10000),index=range(1000))
Then I want to update the d
Here are 3 methods, using only 100 columns and 1000 rows (with numpy imported as np and DataFrame imported from pandas):
In [5]: row = np.random.randn(100)
Row-wise assignment:
In [6]: def method1():
   ...:     df = DataFrame(columns=range(100),index=range(1000))
   ...:     for i in range(len(df)):
   ...:         df.iloc[i] = row
   ...:     return df
   ...:
Build up the arrays in a list, then create the frame all at once:
In [9]: def method2():
   ...:     return DataFrame([row for i in range(1000)])
   ...:
Column-wise assignment (with transposes at both ends):
In [13]: def method3():
   ....:     df = DataFrame(columns=range(100),index=range(1000)).T
   ....:     for i in range(1000):
   ....:         df[i] = row
   ....:     return df.T
   ....:
All three methods produce the same output frame:
In [22]: (method2() == method1()).all().all()
Out[22]: True
In [23]: (method2() == method3()).all().all()
Out[23]: True
In [8]: %timeit method1()
1 loops, best of 3: 1.76 s per loop
In [10]: %timeit method2()
1000 loops, best of 3: 7.79 ms per loop
In [14]: %timeit method3()
1 loops, best of 3: 1.33 s per loop
Building up a list and then creating the frame all at once is orders of magnitude faster than any form of assignment (about 8 ms versus 1.3 to 1.8 s in the timings above). Each assignment involves copying, whereas building the frame in one go copies the data only once.
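If each row is computed separately rather than repeated, the same build-then-construct pattern still applies. The following is a minimal sketch, not a definitive implementation: compute_row and build_frame are hypothetical names introduced here for illustration, and compute_row stands in for whatever actually produces your row data.

```python
import numpy as np
import pandas as pd


def compute_row(i, n_cols):
    # Hypothetical stand-in for the real per-row computation:
    # here, row i is simply filled with the value i.
    return np.full(n_cols, float(i))


def build_frame(n_rows, n_cols):
    # Collect all rows in a plain Python list first ...
    rows = [compute_row(i, n_cols) for i in range(n_rows)]
    # ... then construct the DataFrame in a single call.
    return pd.DataFrame(rows, index=range(n_rows), columns=range(n_cols))


df = build_frame(1000, 100)
print(df.shape)
```

Because the frame is created from numeric arrays in one step, the columns come out as float64 rather than the object dtype of an empty preallocated frame.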