How to duplicate Python dataframe one by one?

面向向阳花 2021-01-05 13:01

I have a pandas.DataFrame as follows:

df1 = 
    a    b
0   1    2
1   3    4

I'd like to repeat each row three times.

5 Answers
  • 2021-01-05 13:11

    I do not know whether it is more efficient than your loop, but it is easy enough to construct:

    Code:

    pd.concat([df] * 3).sort_index()
    

    Test Code:

    import pandas as pd

    df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))
    print(pd.concat([df] * 3).sort_index())
    

    Results:

       a  b
    0  1  2
    0  1  2
    0  1  2
    1  3  4
    1  3  4
    1  3  4
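If a fresh 0..5 index is wanted instead of the repeated labels, one option is to reset the index after sorting. This is a minimal sketch, assuming the same two-row `df` as in the test code; the name `out` is only for illustration. The order of operations matters: passing `ignore_index=True` to `pd.concat` would renumber the rows before they are sorted, leaving the copies interleaved.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))

# Sort first so the copies of each row sit together, then renumber 0..n-1.
out = pd.concat([df] * 3).sort_index().reset_index(drop=True)
print(out)
```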
    
  • 2021-01-05 13:12

    You can use numpy.repeat with a scalar repeat count of 3, then pass the original columns to the DataFrame constructor:

    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
    print (df)
       a  b
    0  1  2
    1  1  2
    2  1  2
    3  3  4
    4  3  4
    5  3  4
    

    If you really want a duplicated index (which can complicate some pandas functions, such as reindex, which fails on duplicate labels):

    r = np.repeat(np.arange(len(df.index)), 3)
    df = pd.DataFrame(df.values[r], df.index[r], df.columns)
    print (df)
       a  b
    0  1  2
    0  1  2
    0  1  2
    1  3  4
    1  3  4
    1  3  4
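To illustrate the caveat about duplicated index labels, here is a small sketch showing that `reindex` raises a `ValueError` on a frame whose index contains duplicates. It assumes the original two-row frame; the names `dup` and `failed` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))

# Repeat each row three times, keeping the original (now duplicated) labels.
r = np.repeat(np.arange(len(df.index)), 3)
dup = pd.DataFrame(df.values[r], df.index[r], df.columns)

failed = False
try:
    dup.reindex([0, 1])  # reindexing needs unique labels on the source axis
except ValueError as exc:
    failed = True
    print("reindex failed:", exc)
```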
    
  • 2021-01-05 13:22

    You can use np.repeat

    df = pd.DataFrame(np.repeat(df.values,[3,3], axis = 0), columns = df.columns)
    

    You get

        a   b
    0   1   2
    1   1   2
    2   1   2
    3   3   4
    4   3   4
    5   3   4
    

    Time testing:

    %timeit pd.DataFrame(np.repeat(df.values,[3,3], axis = 0))
    1000 loops, best of 3: 235 µs per loop
    
    %timeit pd.concat([df] * 3).sort_index()
    best of 3: 1.26 ms per loop
    

    NumPy is faster in most cases, so no surprises there.

    EDIT: I am not sure whether you want repeated indices, but in case you do:

    pd.DataFrame(np.repeat(df.values,3, axis = 0), index = np.repeat(df.index, 3), columns = df.columns)
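The `%timeit` lines above only work inside IPython. A rough sketch of the same comparison with the standard-library `timeit` module follows. The helper names `via_numpy` and `via_concat` are invented here, and the timings will vary with machine, pandas version, and frame size, so no numbers are quoted.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))

def via_numpy(frame, n=3):
    # Repeat the raw values and the index labels n times each.
    return pd.DataFrame(np.repeat(frame.values, n, axis=0),
                        index=np.repeat(frame.index, n),
                        columns=frame.columns)

def via_concat(frame, n=3):
    # Stack n copies, then sort so the copies of each row are adjacent.
    return pd.concat([frame] * n).sort_index()

# Both approaches should produce the same frame for this input.
pd.testing.assert_frame_equal(via_numpy(df), via_concat(df))

for fn in (via_numpy, via_concat):
    seconds = timeit.timeit(lambda: fn(df), number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} us per call")
```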
    
  • 2021-01-05 13:25

    Build a one-dimensional indexer to slice both the values array and the index. You must take care of the index as well to get your desired result.

    • use np.repeat on an np.arange to get the indexer
    • construct a new dataframe using this indexer on both values and the index

    r = np.arange(len(df)).repeat(3)
    pd.DataFrame(df.values[r], df.index[r], df.columns)
    
       a  b
    0  1  2
    0  1  2
    0  1  2
    1  3  4
    1  3  4
    1  3  4
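The two steps above can be wrapped in a small helper. This is a sketch only: the name `repeat_rows` and the string-labelled example frame are assumptions for illustration, chosen to show that a positional indexer works whatever the index labels are.

```python
import numpy as np
import pandas as pd

def repeat_rows(frame, n):
    # Positional indexer: 0,0,...,1,1,... (each position n times).
    r = np.arange(len(frame)).repeat(n)
    # Apply the same indexer to the values and to the index labels.
    return pd.DataFrame(frame.values[r], frame.index[r], frame.columns)

labelled = pd.DataFrame([[1, 2], [3, 4]], index=['x', 'y'], columns=list('ab'))
result = repeat_rows(labelled, 3)
print(result)
```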
    
  • 2021-01-05 13:29

    Not the fastest (not the slowest either) but the shortest solution so far.

    # Build an index array and use it to extract the rows for the new df. This handles index and data all at once.
    df.iloc[np.repeat(df.index, 3)]
    
    Out[270]:
       a  b
    0  1  2
    0  1  2
    0  1  2
    1  3  4
    1  3  4
    1  3  4
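Because `iloc` selects by position, the one-liner above relies on the index being the default 0..n-1 range. Here is a hedged sketch of two variants that do not depend on that; the mixed-dtype frame with string labels is an invented example. Unlike the approaches that go through `df.values`, row selection keeps each column's dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 3], 'b': [2.5, 4.5]}, index=['x', 'y'])

# Label-based: repeat the labels and select with loc (labels must be unique).
by_label = df.loc[df.index.repeat(3)]

# Position-based: repeat the positions and select with iloc.
by_position = df.iloc[np.arange(len(df)).repeat(3)]

print(by_label)
print(by_label.dtypes)
```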
    