Attach a calculated column to an existing dataframe

隐瞒了意图╮ 2020-12-14 19:11

I am starting to learn pandas. I was following the question here, but I could not get the proposed solution to work: I get an indexing error. This is what I have:

1 answer
  • 2020-12-14 19:31

    The problem, as the error message says, is that the index of the calculated column you want to insert is incompatible with the index of df.

    The index of df is a simple index:

    In [8]: df.index
    Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')
    

    while the index of the calculated column (call it new_column) is a MultiIndex, as you can already see in the output:

    In [15]: new_column.index
    Out[15]: 
    MultiIndex
    [(u'X', 3), (u'X', 1), (u'X', 0), (u'Y', 8), (u'Y', 7), (u'Y', 5), (u'Z', 6), (u'Z', 2), (u'Z', 4)]
    

    For this reason, you cannot insert it into the frame. However, this is a bug in 0.12: the same code does work in 0.13 (the version against which the answer in the linked question was tested), where the keyword as_index=False should ensure that the column L1 is not added to the index.

    SOLUTION for 0.12:
    Remove the first level of the MultiIndex, so you get back the original index:

    In [13]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
    In [14]: df["new"] = new_column.reset_index(level=0, drop=True)
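
The level drop can be checked in isolation. The sketch below uses made-up values (the question's data is not shown) and builds new_column by hand with the same MultiIndex as in the output above, so it does not depend on how a particular pandas version handles as_index in groupby().apply():

```python
import pandas as pd

# Hypothetical frame; only the index 0..8 and the L1 grouping mirror the answer.
df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L3": [1, 2, 3, 5, 1, 1, 6, 3, 4],
})

# Stand-in for the groupby result: same (L1, original index) pairs as above.
# Values are the cumulative share of L3 within each group, largest L3 first.
new_column = pd.Series(
    [0.625, 0.875, 1.0, 0.5, 0.875, 1.0, 0.6, 0.9, 1.0],
    index=pd.MultiIndex.from_tuples([
        ("X", 3), ("X", 1), ("X", 0),
        ("Y", 8), ("Y", 7), ("Y", 5),
        ("Z", 6), ("Z", 2), ("Z", 4),
    ]),
)

# Drop the group level so only the original row labels remain.
flat = new_column.reset_index(level=0, drop=True)

# Assignment aligns on index labels, so the row order of flat does not matter.
df["new"] = flat
```

Because the assignment aligns on labels, the value stored under label 3 in flat ends up in row 3 of df even though it comes first in new_column.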
    

    In pandas 0.13 (in development) this is fixed (https://github.com/pydata/pandas/pull/4670). That is why as_index=False is used in the groupby call: the column L1 (by which you group) is not added to the index (which would create a MultiIndex), so the original index is retained and the result can be appended to the original frame. In 0.12, however, the as_index keyword seems to be ignored when using apply.
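
On current pandas releases, pd.expanding_sum and DataFrame.sort no longer exist, so the 0.12 code above will not run there. Below is a minimal sketch of the same calculation without apply. It assumes made-up data and that the goal is the cumulative share of L3 within each L1 group, largest L3 first. This is a swapped-in approach, not the one from the linked question:

```python
import pandas as pd

# Hypothetical data; the question's actual frame is not shown.
df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L3": [1, 2, 3, 5, 1, 1, 6, 3, 4],
})

# Sort so that, within each L1 group, the largest L3 comes first.
ordered = df.sort_values(["L1", "L3"], ascending=[True, False])

# Running sum per group; the result keeps the original row labels.
running = ordered.groupby("L1")["L3"].cumsum()

# Group total broadcast back to every row, again on the original labels.
total = df.groupby("L1")["L3"].transform("sum")

# Division and assignment both align on the index, so no MultiIndex appears.
df["new"] = running / total
```

Since cumsum and transform both return results labelled like their input rows, there is no extra index level to remove afterwards.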
