Python Pandas replicate rows in dataframe

逝去的感伤 2020-11-29 20:07

If the data look like:

Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-05,24924.5,FALSE
1,1,2010-02-12,46039.49,TRUE
1,1,2010-02-19,41595.55,FALSE
1,1,201         


        
5 Answers
  • 2020-11-29 20:40

    Another way is to use the concat() function:

    import pandas as pd
    
    In [603]: df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))
    
    In [604]: df
    Out[604]: 
      col1  col2
    0    a     0
    1    b     1
    2    c     2
    
    In [605]: pd.concat([df]*3, ignore_index=True) # Ignores the index
    Out[605]: 
      col1  col2
    0    a     0
    1    b     1
    2    c     2
    3    a     0
    4    b     1
    5    c     2
    6    a     0
    7    b     1
    8    c     2
    
    In [606]: pd.concat([df]*3)
    Out[606]: 
      col1  col2
    0    a     0
    1    b     1
    2    c     2
    0    a     0
    1    b     1
    2    c     2
    0    a     0
    1    b     1
    2    c     2
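    If you need to know which copy each row came from, one option (a sketch using the same df as above) is to pass keys= to pd.concat instead of ignore_index=True. Each copy then gets its own label in the first level of a MultiIndex:

```python
import pandas as pd

df = pd.DataFrame({'col1': list("abc"), 'col2': range(3)}, index=range(3))

# keys= labels each copy, giving a two-level index: (copy number, original index)
out = pd.concat([df] * 3, keys=range(3))
print(out)

# Select just the second copy; it keeps its original index 0..2
print(out.loc[1])
```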
    
  • 2020-11-29 20:49

    You can put df_try inside a list and then do what you have in mind. DataFrame.append was removed in pandas 2.0, so pass the list to pd.concat instead:

    >>> pd.concat([df] + [df_try]*5, ignore_index=True)
    
        Store  Dept       Date  Weekly_Sales IsHoliday
    0       1     1 2010-02-05      24924.50     False
    1       1     1 2010-02-12      46039.49      True
    2       1     1 2010-02-19      41595.55     False
    3       1     1 2010-02-26      19403.54     False
    4       1     1 2010-03-05      21827.90     False
    5       1     1 2010-03-12      21043.39     False
    6       1     1 2010-03-19      22136.64     False
    7       1     1 2010-03-26      26229.21     False
    8       1     1 2010-04-02      57258.43     False
    9       1     1 2010-02-12      46039.49      True
    10      1     1 2010-02-12      46039.49      True
    11      1     1 2010-02-12      46039.49      True
    12      1     1 2010-02-12      46039.49      True
    13      1     1 2010-02-12      46039.49      True
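    A self-contained version of the same idea, written for pandas 2.x, where DataFrame.append no longer exists. The answer does not show how df_try is defined, so this sketch assumes it holds the holiday row(s), df[df['IsHoliday']], which matches the output above:

```python
import pandas as pd

# The nine sample rows shown in the output above
df = pd.DataFrame([
    [1, 1, '2010-02-05', 24924.50, False],
    [1, 1, '2010-02-12', 46039.49, True],
    [1, 1, '2010-02-19', 41595.55, False],
    [1, 1, '2010-02-26', 19403.54, False],
    [1, 1, '2010-03-05', 21827.90, False],
    [1, 1, '2010-03-12', 21043.39, False],
    [1, 1, '2010-03-19', 22136.64, False],
    [1, 1, '2010-03-26', 26229.21, False],
    [1, 1, '2010-04-02', 57258.43, False],
], columns=['Store', 'Dept', 'Date', 'Weekly_Sales', 'IsHoliday'])

# Assumption: df_try is the row(s) to replicate, here the holiday week
df_try = df[df['IsHoliday']]

# pd.concat takes a list of frames: the original plus five copies of df_try
result = pd.concat([df] + [df_try] * 5, ignore_index=True)
print(result)
```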
    
  • 2020-11-29 20:53

    This is an old question, but since it still comes up at the top of my results in Google, here's another way.

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))
    

    Say you want to replicate the rows where col1 == "b".

    reps = [3 if val=="b" else 1 for val in df.col1]
    df.loc[np.repeat(df.index.values, reps)]
    

    You could replace 3 if val=="b" else 1 in the list comprehension with a function that returns, say, 3 if val=="b" or 4 if val=="c", and so on, which makes this approach quite flexible.
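    A sketch of that idea, assuming the same df as above; n_copies is a hypothetical helper name that encodes the "3 for b, 4 for c, otherwise 1" rule. The last lines show an equivalent form using Index.repeat with Series.map:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list("abc"), 'col2': range(3)}, index=range(3))

def n_copies(val):
    """Hypothetical rule: how many times a row should appear, based on col1."""
    if val == "b":
        return 3
    if val == "c":
        return 4
    return 1

reps = [n_copies(val) for val in df.col1]
out = df.loc[np.repeat(df.index.values, reps)]
print(out)

# Equivalent: compute the counts with map and repeat the index directly
out2 = df.loc[df.index.repeat(df['col1'].map(n_copies).to_numpy())]
```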

  • 2020-11-29 21:02
    import pandas as pd

    df = df_try
    for i in range(4):
        df = pd.concat([df, df_try])  # DataFrame.append was removed in pandas 2.0

    # Here, we have df_try times 5

    df = pd.concat([df, df])

    # Here, we have df_try times 10
    
  • 2020-11-29 21:05

    Repeatedly appending or concatenating in Pandas is slow, because each call copies all of the data accumulated so far. I recommend building a plain list of rows instead and turning it into a dataframe once at the end (unless you are appending a single row or concatenating only a few dataframes).

    import pandas as pd
    
    df = pd.DataFrame([
    [1,1,'2010-02-05',24924.5,False],
    [1,1,'2010-02-12',46039.49,True],
    [1,1,'2010-02-19',41595.55,False],
    [1,1,'2010-02-26',19403.54,False],
    [1,1,'2010-03-05',21827.9,False],
    [1,1,'2010-03-12',21043.39,False],
    [1,1,'2010-03-19',22136.64,False],
    [1,1,'2010-03-26',26229.21,False],
    [1,1,'2010-04-02',57258.43,False]
    ], columns=['Store','Dept','Date','Weekly_Sales','IsHoliday'])
    
    temp_df = []
    for row in df.itertuples(index=False):
        if row.IsHoliday:
            temp_df.extend([list(row)]*5)
        else:
            temp_df.append(list(row))
    
    df = pd.DataFrame(temp_df, columns=df.columns)
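    For comparison, a vectorized sketch of the same rule (five copies of each holiday row, one copy of every other row) using np.where and Index.repeat; no timing comparison with the loop is claimed here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1, 1, '2010-02-05', 24924.5, False],
    [1, 1, '2010-02-12', 46039.49, True],
    [1, 1, '2010-02-19', 41595.55, False],
    [1, 1, '2010-02-26', 19403.54, False],
    [1, 1, '2010-03-05', 21827.9, False],
    [1, 1, '2010-03-12', 21043.39, False],
    [1, 1, '2010-03-19', 22136.64, False],
    [1, 1, '2010-03-26', 26229.21, False],
    [1, 1, '2010-04-02', 57258.43, False],
], columns=['Store', 'Dept', 'Date', 'Weekly_Sales', 'IsHoliday'])

# Per-row repeat counts: 5 for holiday rows, 1 otherwise
reps = np.where(df['IsHoliday'], 5, 1)

# Repeat each index label, select those rows, then renumber 0..n-1
out = df.loc[df.index.repeat(reps)].reset_index(drop=True)
print(out)
```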
    