I want to concatenate two pandas DataFrames without copying the data. That is, I want the concatenated DataFrame to be a view on the data in the two original DataFrames. I t
You can't (at least not easily). When you call concat, np.concatenate ultimately gets called.
See this answer explaining why you can't concatenate arrays without copying. The short of it is that the arrays are not guaranteed to be contiguous in memory.
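As a rough check at the pandas level, the sketch below concatenates two small frames and asks NumPy whether the result shares memory with either input. The frame and column names (df1, df2, "a") are made up for illustration, and the check assumes a single integer column so that to_numpy() exposes the underlying buffer rather than a converted copy.

```python
import numpy as np
import pandas as pd

# Two small illustrative frames (names and contents are hypothetical).
df1 = pd.DataFrame({"a": np.arange(3)})
df2 = pd.DataFrame({"a": np.arange(3, 6)})

# Row-wise concatenation has to build one new block of data.
out = pd.concat([df1, df2], ignore_index=True)

# Check whether the result's buffer overlaps either input's buffer.
shares_df1 = np.shares_memory(df1["a"].to_numpy(), out["a"].to_numpy())
shares_df2 = np.shares_memory(df2["a"].to_numpy(), out["a"].to_numpy())
print(shares_df1, shares_df2)
```

If both values come out False, the concatenated frame owns its own copy of the data and is not a view on df1 or df2.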
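Here's a simple example:
import numpy as np

a = np.random.rand(2, 10)
x, y = a  # x and y are views onto the two rows of a
z = np.vstack((x, y))  # vstack allocates a new array
print('x.base is a and y.base is a ==', x.base is a and y.base is a)
print('x.base is z or y.base is z ==', x.base is z or y.base is z)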
Output:
x.base is a and y.base is a == True
x.base is z or y.base is z == False
Even though x and y share the same base, namely a, concatenate (and thus vstack) cannot assume that they do, since one often wants to concatenate arbitrarily strided arrays.
You can easily generate two arrays with different strides that share the same memory, like so:
import numpy as np
a = np.arange(10)
b = a[::2]  # every other element; shares memory with a
print(a.strides)
print(b.strides)
Output:
(8,)
(16,)
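The relationship between the two stride values can be spelled out in a short sketch. It assumes nothing beyond NumPy itself; the variable names step, stride_matches and same_memory are only illustrative. The exact byte counts depend on the platform's default integer size, so the check is written in terms of itemsize rather than hard-coded numbers.

```python
import numpy as np

a = np.arange(10)
step = 2
b = a[::step]  # a view that skips every other element

# The stride of the slice is the step times the size of one element.
stride_matches = b.strides[0] == step * a.itemsize

# b owns no data of its own: it points into a's buffer.
same_memory = np.shares_memory(a, b)
print(stride_matches, same_memory, b.base is a)
```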
This is why the following happens:
In [214]: a = arange(10)
In [215]: a[::2].view(int16)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-215-0366fadb1128> in <module>()
----> 1 a[::2].view(int16)
ValueError: new type not compatible with array.
In [216]: a[::2].copy().view(int16)
Out[216]: array([0, 0, 0, 0, 2, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 8, 0, 0, 0], dtype=int16)
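The same behaviour can be inspected through the array flags. The sketch below is only an illustration with made-up variable names (strided, packed, view_failed): it checks contiguity before and after the copy and records whether reinterpreting the strided slice as int16 raises. The exact error message differs between NumPy versions, so only the exception type is checked.

```python
import numpy as np

a = np.arange(10)
strided = a[::2]         # view with gaps between elements
packed = strided.copy()  # new buffer with the elements laid out back to back

strided_is_contiguous = strided.flags["C_CONTIGUOUS"]
packed_is_contiguous = packed.flags["C_CONTIGUOUS"]

# Reinterpreting to a dtype of a different size needs contiguous data.
try:
    strided.view(np.int16)
    view_failed = False
except ValueError:
    view_failed = True

print(strided_is_contiguous, packed_is_contiguous, view_failed)
```

The copy is what makes the reinterpretation possible, and it is also why the result no longer shares memory with a.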
EDIT: Using pd.merge(df1, df2, copy=False) (or df1.merge(df2, copy=False)) will not make a copy when df1 and df2 have different dtypes. Otherwise, a copy is made.