Pandas DataFrame concat vs append

后端未结

关注

 4  1273

I have a list of 4 pandas dataframes containing a day of tick data that I want to merge into a single data frame. I cannot understand the behavior of concat on my timestamps

相关标签:

4条回答

忘掉有多难

2020-11-28 04:07

So what are you doing is with append and concat is almost equivalent. The difference is the empty DataFrame. For some reason this causes a big slowdown, not sure exactly why, will have to look at some point. Below is a recreation of basically what you did.

I almost always use concat (though in this case they are equivalent, except for the empty frame); if you don't use the empty frame they will be the same speed.

In [17]: df1 = pd.DataFrame(dict(A = range(10000)),index=pd.date_range('20130101',periods=10000,freq='s'))

In [18]: df1
Out[18]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10000 entries, 2013-01-01 00:00:00 to 2013-01-01 02:46:39
Freq: S
Data columns (total 1 columns):
A    10000  non-null values
dtypes: int64(1)

In [19]: df4 = pd.DataFrame()

The concat

In [20]: %timeit pd.concat([df1,df2,df3])
1000 loops, best of 3: 270 us per loop

This is equavalent of your append

In [21]: %timeit pd.concat([df4,df1,df2,df3])
10 loops, best of 

 3: 56.8 ms per loop

0 讨论(0)

佛祖请我去吃肉

2020-11-28 04:11

I have implemented a tiny benchmark (please find the code on Gist) to evaluate the pandas' concat and append. I updated the code snippet and the results after the comment by ssk08 - thanks alot!

The benchmark ran on a Mac OS X 10.13 system with Python 3.6.2 and pandas 0.20.3.

+--------+---------------------------------+---------------------------------+
|        | ignore_index=False              | ignore_index=True               |
+--------+---------------------------------+---------------------------------+
| size   | append | concat | append/concat | append | concat | append/concat |
+--------+--------+--------+---------------+--------+--------+---------------+
| small  | 0.4635 | 0.4891 | 94.77 %       | 0.4056 | 0.3314 | 122.39 %      |
+--------+--------+--------+---------------+--------+--------+---------------+
| medium | 0.5532 | 0.6617 | 83.60 %       | 0.3605 | 0.3521 | 102.37 %      |
+--------+--------+--------+---------------+--------+--------+---------------+
| large  | 0.9558 | 0.9442 | 101.22 %      | 0.6670 | 0.6749 | 98.84 %       |
+--------+--------+--------+---------------+--------+--------+---------------+

Using ignore_index=False append is slightly faster, with ignore_index=True concat is slightly faster.

tl;dr No significant difference between concat and append.

0 讨论(0)

佛祖请我去吃肉

2020-11-28 04:12
Pandas concat vs append vs join vs merge
- Concat gives the flexibility to join based on the axis( all rows or all columns)
- Append is the specific case(axis=0, join='outer') of concat
- Join is based on the indexes (set by set_index) on how variable =['left','right','inner','couter']
- Merge is based on any particular column each of the two dataframes, this columns are variables on like 'left_on', 'right_on', 'on'
0 讨论(0)
发布评论:

提交评论
- 加载中...
轻奢々

2020-11-28 04:23

One more thing you have to keep in mind that the APPEND() method in Pandas doesn't modify the original object. Instead it creates a new one with combined data. Because of involving creation and data buffer, its performance is not well. You'd better use CONCAT() function when doing multi-APPEND operations.

0 讨论(0)
发布评论:

提交评论
- 加载中...

Pandas DataFrame concat vs append

Pandas concat vs append vs join vs merge