Inserting new rows in pandas data frame at specific indices

名媛妹妹 2020-12-09 20:12

I have the following data frame df with three columns, "identifier", "values" and "subid":

     identifier   values    subid
0      1              


        
2 answers
  • 2020-12-09 20:51

    Preserving the index order is the tricky part. I'm not sure this is the most efficient way to do it, but it should work. First, collect the rows to insert: for each index in x, copy that row and lower its subid by 1.

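    import pandas as pd

    x = [2, 8, 12]  # index labels before which a copied row is inserted
    rows = []

    # assumes df has the default 0..n-1 index, so each label is also a position
    for i in df.index:
        if i in x:
            # copy row i with subid lowered by 1; 'index' records where it goes
            rows.append({
                'index': i,
                'identifier': df.loc[i, 'identifier'],
                'values': df.loc[i, 'values'],
                'subid': df.loc[i, 'subid'] - 1,
            })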
    

    Then, iterate through the new rows list, and perform an incremental concat, inserting each new row into the correct spot.

    offset = 0  # number of rows already inserted; shifts every later insert position

    for d in rows:
        pos = d['index'] + offset
        # 'index' only records the insert position, so it is not kept as a column
        new_row = pd.DataFrame([d]).drop(columns='index')
        df = pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]])
        offset += 1

    # rebuild a clean 0..n-1 index
    df = df.reset_index(drop=True)
    df

        identifier  values  subid
    0            1     101      1
    1            1     102      1
    2            1     103      1
    3            1     103      2
    4            1     104      2
    5            1     105      2
    6            2     106      3
    7            2     107      3
    8            2     108      3
    9            2     109      3
    10           2     109      4
    11           2     110      4
    12           3     111      5
    13           3     112      5
    14           3     113      5
    15           3     113      6
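
For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample frame is an assumption, reconstructed from the output shown above because the question's own data is cut off, and insert_copies_before is a hypothetical helper name, not part of pandas. It also assumes the default 0..n-1 index, so labels and positions coincide.

```python
import pandas as pd

# Sample frame reconstructed from the output above (an assumption; the
# question's original data is truncated).
df = pd.DataFrame({
    'identifier': [1] * 5 + [2] * 5 + [3] * 3,
    'values': list(range(101, 114)),
    'subid': [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})


def insert_copies_before(frame, positions):
    """Insert a copy of each listed row just before it, with subid - 1.

    Hypothetical helper; assumes a default RangeIndex (label == position).
    """
    offset = 0  # rows inserted so far; each insert shifts later positions by one
    for pos in sorted(positions):
        at = pos + offset                    # where the original row sits now
        new_row = frame.iloc[[at]].copy()    # one-row DataFrame, dtypes preserved
        new_row['subid'] -= 1
        frame = pd.concat([frame.iloc[:at], new_row, frame.iloc[at:]])
        offset += 1
    return frame.reset_index(drop=True)


result = insert_copies_before(df, [2, 8, 12])
print(result)
```

The offset matters because every insert pushes the remaining rows down by one: the row that started at position 8 sits at position 9 once the first copy has been inserted.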
    
  • 2020-12-09 21:07

    Subtract 1 from values in every row whose identifier differs from that of the prior row.

    # edit in place
    df['values'] -= df.identifier.ne(df.identifier.shift().bfill())
    df
    
        identifier  values
    0            1     101
    1            1     102
    2            1     103
    3            1     104
    4            1     105
    5            2     105
    6            2     107
    7            2     108
    8            2     109
    9            2     110
    10           3     110
    11           3     112
    12           3     113
    

    Or, without modifying df:

    # new dataframe
    df.assign(values=df['values'] - df.identifier.ne(df.identifier.shift().bfill()))
    
        identifier  values
    0            1     101
    1            1     102
    2            1     103
    3            1     104
    4            1     105
    5            2     105
    6            2     107
    7            2     108
    8            2     109
    9            2     110
    10           3     110
    11           3     112
    12           3     113
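
To make the one-liner easier to follow, the sketch below splits it into steps. The input frame is an assumption (values 101 to 113, inferred from the output above), and the names changed and adjusted are only illustrative.

```python
import pandas as pd

# Assumed input, inferred from the output above.
df = pd.DataFrame({
    'identifier': [1] * 5 + [2] * 5 + [3] * 3,
    'values': list(range(101, 114)),
})

# True on each row whose identifier differs from the previous row's.
# shift() leaves NaN in row 0; bfill() replaces it with row 0's own
# identifier, so the first row never counts as a change.
changed = df['identifier'].ne(df['identifier'].shift().bfill())

# Cast the boolean mask to int explicitly so that True subtracts 1.
adjusted = df.assign(values=df['values'] - changed.astype(int))
print(adjusted)
```

Note that the column has to be accessed as df['values'] rather than df.values, since the latter is the DataFrame attribute that returns the underlying array.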
    