Replicating rows in a pandas data frame by a column value

北海茫月 2020-11-28 09:35

I want to replicate rows in a pandas DataFrame. Each row should be repeated n times, where n is the value of a column in that row.

import pandas as pd

what_i_have = pd.D         


        
3 Answers
  • 2020-11-28 10:11

    Not the best solution, but I want to share this: you could also use DataFrame.reindex() together with Index.repeat():

    df.reindex(df.index.repeat(df.n)).drop('n', axis=1)
    

    Output:

    
      id   v
    0  A  10
    1  B  13
    1  B  13
    2  C   8
    2  C   8
    2  C   8
    

    You can further append .reset_index(drop=True) to replace the repeated index labels with a fresh 0, 1, 2, ... index.
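
A self-contained sketch of the reindex approach above may help when trying it out. The input frame is an assumption: it is built to match the id/n/v columns that appear in the outputs on this page, and the names `repeated` and `result` are only illustrative.

```python
import pandas as pd

# Assumed input, matching the id/n/v columns shown in the answers.
df = pd.DataFrame({"id": ["A", "B", "C"], "n": [1, 2, 3], "v": [10, 13, 8]})

# Index.repeat builds an index in which each label appears n times;
# reindex then pulls the matching row for every label in that index.
repeated = df.reindex(df.index.repeat(df["n"]))

# Drop the helper column and renumber the rows from 0.
result = repeated.drop(columns="n").reset_index(drop=True)
print(result)
```

This relies on the original index being unique, since reindex cannot look up rows through duplicate labels.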

  • 2020-11-28 10:24

    You could use set_index and Series.repeat. Note that this carries over only the selected column (v here):

    In [1057]: df.set_index(['id'])['v'].repeat(df['n']).reset_index()
    Out[1057]:
      id   v
    0  A  10
    1  B  13
    2  B  13
    3  C   8
    4  C   8
    5  C   8
    

    Details

    In [1058]: df
    Out[1058]:
      id  n   v
    0  A  1  10
    1  B  2  13
    2  C  3   8
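
For anyone who wants to run this answer end to end, here is a minimal sketch that builds the frame shown under "Details" and applies the same set_index/repeat chain. Passing the counts with `.to_numpy()` is a choice made here, not part of the original answer, and the names `repeated` and `out` are only illustrative.

```python
import pandas as pd

# The frame shown under "Details".
df = pd.DataFrame({"id": ["A", "B", "C"], "n": [1, 2, 3], "v": [10, 13, 8]})

# Move id into the index and select v, which gives a Series keyed by id.
# Series.repeat then repeats each element by the corresponding count.
repeated = df.set_index("id")["v"].repeat(df["n"].to_numpy())

# reset_index turns the id labels back into an ordinary column.
out = repeated.reset_index()
print(out)
```

Because the chain selects a single column before repeating, any other value columns in the frame would not appear in the result.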
    
  • 2020-11-28 10:28

    You could use np.repeat to get the repeated indices and then use that to index into the frame:

    >>> df2 = df.loc[np.repeat(df.index.values,df.n)]
    >>> df2
      id  n   v
    0  A  1  10
    1  B  2  13
    1  B  2  13
    2  C  3   8
    2  C  3   8
    2  C  3   8
    

    After which there's only a bit of cleaning up to do:

    >>> df2 = df2.drop("n",axis=1).reset_index(drop=True)
    >>> df2
      id   v
    0  A  10
    1  B  13
    2  B  13
    3  C   8
    4  C   8
    5  C   8
    

    Note that if the index might contain duplicate labels, .loc would select every row that shares a label, so you could use .iloc instead:

    In [86]: df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)
    Out[86]: 
      id   v
    0  A  10
    1  B  13
    2  B  13
    3  C   8
    4  C   8
    5  C   8
    

    which uses the positions, and not the index labels.
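
To make the difference between labels and positions concrete, here is a small sketch on a frame with a deliberately non-unique index. The index labels "x" and "y" and the variable names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not unique ("x" appears twice).
df = pd.DataFrame(
    {"id": ["A", "B", "C"], "n": [1, 2, 3], "v": [10, 13, 8]},
    index=["x", "x", "y"],
)

# Label-based lookup: each requested "x" matches both rows labelled "x",
# so rows A and B are replicated more often than their n values ask for.
by_label = df.loc[np.repeat(df.index.values, df["n"])]

# Position-based lookup: row i is taken exactly n[i] times.
positions = np.repeat(np.arange(len(df)), df["n"])
by_position = df.iloc[positions].drop(columns="n").reset_index(drop=True)

print(len(by_label), len(by_position))
print(by_position)
```

Comparing the two lengths shows the over-replication from .loc, while the .iloc result has the intended number of rows regardless of the index.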
