How do I concatenate pandas DataFrames without copying the data?

小蘑菇 2021-01-06 03:25

I want to concatenate two pandas DataFrames without copying the data. That is, I want the concatenated DataFrame to be a view on the data in the two original DataFrames. I t

1 answer
  • 2021-01-06 04:22

    You can't (at least not easily). When you call pd.concat, np.concatenate ultimately gets called.

    See this answer explaining why you can't concatenate arrays without copying. The short of it is that the arrays are not guaranteed to be contiguous in memory.

    Here's a simple example

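    import numpy as np

    a = np.random.rand(2, 10)
    x, y = a  # x and y are views onto the two rows of a
    z = np.vstack((x, y))  # vstack allocates a new array
    print('x.base is a and y.base is a ==', x.base is a and y.base is a)
    print('x.base is z or y.base is z ==', x.base is z or y.base is z)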
    

    Output:

    x.base is a and y.base is a == True
    x.base is z or y.base is z == False
    

    Even though x and y share the same base, namely a, concatenate (and thus vstack) cannot assume that they do since one often wants to concatenate arbitrarily strided arrays.
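
    As a complementary check, here is a minimal sketch using np.shares_memory, which tests buffer overlap directly instead of comparing .base attributes. The variable names shares_with_source, rows_share_source, and source_unchanged are illustrative only and not part of the original answer.

```python
import numpy as np

a = np.random.rand(2, 10)
x, y = a                   # row views; both use a's buffer
z = np.vstack((x, y))      # concatenation result

# Does the result overlap the source buffer? Do the row views?
shares_with_source = np.shares_memory(z, a)
rows_share_source = np.shares_memory(x, a) and np.shares_memory(y, a)

# Writing to the concatenated result should leave the source untouched.
original_value = a[0, 0]
z[0, 0] = original_value + 1.0
source_unchanged = a[0, 0] == original_value

print(shares_with_source, rows_share_source, source_unchanged)
```

    If the result were a view, the write through z would show up in a; it does not, because vstack copied the data.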

    You can easily generate two arrays with different strides that share the same memory like so:

    import numpy as np

    a = np.arange(10)
    b = a[::2]  # every other element; shares memory with a
    print(a.strides)
    print(b.strides)
    

    Output:

    (8,)
    (16,)
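
    The same point can be probed a little further. The sketch below (the variable names are illustrative assumptions) checks that the strided slice really shares a's buffer, that it is not contiguous, and that a write through it is visible in a.

```python
import numpy as np

a = np.arange(10)
b = a[::2]                 # strided view: every other element of a

same_buffer = np.shares_memory(a, b)
a_contiguous = a.flags['C_CONTIGUOUS']
b_contiguous = b.flags['C_CONTIGUOUS']

# b steps over twice as many bytes per element as a does.
stride_ratio = b.strides[0] // a.strides[0]

# Because b is a view, writing to b[1] changes a[2].
b[1] = -1
write_visible_in_source = a[2] == -1

print(same_buffer, a_contiguous, b_contiguous, stride_ratio, write_visible_in_source)
```

    The stride ratio is computed rather than hard-coded because the byte values (8 and 16 above) depend on the platform's default integer size.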
    

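    This is why the following happens. Reinterpreting an array with a dtype of a different item size requires contiguous memory along the last axis, so view fails on the strided slice but works on a contiguous copy. (The session assumes NumPy's names are imported into the namespace; the exact error text may differ between NumPy versions.)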

    In [214]: a = arange(10)
    
    In [215]: a[::2].view(int16)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-215-0366fadb1128> in <module>()
    ----> 1 a[::2].view(int16)
    
    ValueError: new type not compatible with array.
    
    In [216]: a[::2].copy().view(int16)
    Out[216]: array([0, 0, 0, 0, 2, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 8, 0, 0, 0], dtype=int16)
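
    For a script-friendly version of the same experiment, the following sketch catches the ValueError instead of relying on the traceback text. It fixes the dtype to np.int64 explicitly, which is an assumption made here so the element counts do not depend on the platform default.

```python
import numpy as np

a = np.arange(10, dtype=np.int64)
strided = a[::2]           # non-contiguous view of a

# Reinterpreting the strided view with a smaller item size is rejected.
view_failed = False
try:
    strided.view(np.int16)
except ValueError:
    view_failed = True

# A contiguous copy can be reinterpreted: 5 int64 values become 20 int16 values.
reinterpreted = strided.copy().view(np.int16)

print(view_failed, reinterpreted.size)
```

    The copy is exactly what a zero-copy concatenation would have to avoid, which is why the strided case cannot be handled as a view.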
    

    EDIT: Using pd.merge(df1, df2, copy=False) (or df1.merge(df2, copy=False)) will not make a copy when the dtypes of df1 and df2 differ (compare df1.dtypes with df2.dtypes; a DataFrame has no single dtype attribute). Otherwise, a copy is made.
