How do I concatenate pandas DataFrames without copying the data?

I want to concatenate two pandas DataFrames without copying the data. That is, I want the concatenated DataFrame to be a view on the data in the two original DataFrames. I t

1 answer

误落风尘

2021-01-06 04:22
You can't (at least easily). When you call concat, ultimately np.concatenate gets called.

See this answer explaining why you can't concatenate arrays without copying. The short of it is that the arrays are not guaranteed to be contiguous in memory.
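At the pandas level, the copy can be observed directly. The following is a minimal sketch, assuming two small single-column float frames (the names df1, df2, and out are made up for illustration); it uses np.shares_memory to check whether the concatenated result overlaps either input buffer.

```python
import numpy as np
import pandas as pd

# Two illustrative frames; names and values are assumptions for this sketch.
df1 = pd.DataFrame({"a": np.arange(3.0)})
df2 = pd.DataFrame({"a": np.arange(3.0, 6.0)})

# Row-wise concatenation builds a new, single block of memory.
out = pd.concat([df1, df2], ignore_index=True)

out_values = out["a"].to_numpy()
shares_with_df1 = np.shares_memory(out_values, df1["a"].to_numpy())
shares_with_df2 = np.shares_memory(out_values, df2["a"].to_numpy())

print("shares memory with df1:", shares_with_df1)
print("shares memory with df2:", shares_with_df2)
```

Both checks should report that no memory is shared, which is consistent with the result being a fresh copy rather than a view.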

Here's a simple example in NumPy:
```
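import numpy as np

a = np.random.rand(2, 10)
x, y = a  # x and y are views onto the two rows of a
z = np.vstack((x, y))  # vstack allocates a new array and copies into it
print('x.base is a and y.base is a ==', x.base is a and y.base is a)
print('x.base is z or y.base is z ==', x.base is z or y.base is z)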
```
Output:
```
x.base is a and y.base is a == True
x.base is z or y.base is z == False
```
Even though x and y share the same base, namely a, concatenate (and thus vstack) cannot assume that they do since one often wants to concatenate arbitrarily strided arrays.

You can easily generate two arrays with different strides that share the same memory, like so:
```
import numpy as np

a = np.arange(10)
b = a[::2]  # every second element: a view, not a copy
print(a.strides)
print(b.strides)
```
Output (on a platform where the default integer is 8 bytes):
```
(8,)
(16,)
```
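That the two arrays really do share memory can be confirmed without relying on the platform's integer size. This is a small sketch using np.shares_memory and a write through the strided view; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(10)
b = a[::2]  # strided view over the same buffer

# The view steps over two elements of a at a time, whatever the itemsize is.
stride_ratio = b.strides[0] // a.strides[0]
shares = np.shares_memory(a, b)

# Writing through the view changes the original array.
b[0] = 99
first_of_a = int(a[0])

print("stride ratio:", stride_ratio)
print("shares memory:", shares)
print("a[0] after writing b[0]:", first_of_a)
```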
This is why reinterpreting a strided slice with a different-sized dtype fails, while the same call on a contiguous copy works:
```
In [214]: a = arange(10)

In [215]: a[::2].view(int16)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-215-0366fadb1128> in <module>()
----> 1 a[::2].view(int16)

ValueError: new type not compatible with array.

In [216]: a[::2].copy().view(int16)
Out[216]: array([0, 0, 0, 0, 2, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 8, 0, 0, 0], dtype=int16)
```
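The same behaviour can be reproduced as a script. This is a sketch that assumes an explicit int64 dtype so the sizes are predictable; the exact wording of the ValueError differs between NumPy versions, so only the exception type is checked.

```python
import numpy as np

a = np.arange(10, dtype=np.int64)
strided = a[::2]

# The slice skips elements, so it is not one contiguous run of bytes.
is_contiguous = bool(strided.flags["C_CONTIGUOUS"])

try:
    strided.view(np.int16)
    view_failed = False
except ValueError:
    view_failed = True

# Copying into a contiguous buffer makes the reinterpretation possible.
contiguous = np.ascontiguousarray(strided)
reinterpreted = contiguous.view(np.int16)

print("strided slice contiguous:", is_contiguous)
print("view on strided slice failed:", view_failed)
print("int16 elements after copying:", reinterpreted.size)
```

Each of the five 8-byte integers becomes four 2-byte integers, so the reinterpreted array has 20 elements, matching the session above.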
EDIT: Using pd.merge(df1, df2, copy=False) (or df1.merge(df2, copy=False)) when the dtypes of df1 and df2 differ will not make a copy. Otherwise, a copy is made.