I am starting to learn pandas. I was following the question here, but I could not get the proposed solution to work for me: I get an indexing error. This is what I have:
The problem is, as the error message says, that the index of the calculated column you want to insert is incompatible with the index of df.
The index of df is a simple index:
In [8]: df.index
Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')
while the index of the calculated column (let's call it new_column) is a MultiIndex, as you can also see in the output:
In [15]: new_column.index
Out[15]:
MultiIndex
[(u'X', 3), (u'X', 1), (u'X', 0), (u'Y', 8), (u'Y', 7), (u'Y', 5), (u'Z', 6), (u'Z', 2), (u'Z', 4)]
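To make the mismatch concrete, here is a minimal sketch on a recent pandas version. The four-row frame and the hand-built Series are made-up stand-ins (only the column names L1 and L3 come from the question); the Series mimics the shape of the 0.12 result, keyed by (group, original row label).

```python
import pandas as pd

# Hypothetical data: only the column names L1 and L3 come from the question.
df = pd.DataFrame({"L1": ["X", "Y", "X", "Y"], "L3": [1.0, 2.0, 3.0, 4.0]})

# Hand-built stand-in for the groupby/apply result on 0.12:
# the outer level is the group key, the inner level is the original row label.
new_column = pd.Series(
    [0.75, 1.0, 4.0 / 6.0, 1.0],
    index=pd.MultiIndex.from_tuples([("X", 2), ("X", 0), ("Y", 3), ("Y", 1)]),
)

print(df.index.nlevels)          # 1: a flat index
print(new_column.index.nlevels)  # 2: a MultiIndex
```

The two indexes have a different number of levels, so the row labels of the frame cannot be matched directly against the labels of the Series.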
For this reason, you cannot insert it into the frame. However, this is a bug in 0.12, as this does work in 0.13 (for which the answer in the linked question was tested), and the keyword as_index=False should ensure the column L1 is not added to the index.
SOLUTION for 0.12:
Remove the first level of the MultiIndex, so you get back the original index:
In [13]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
In [14]: df["new"] = new_column.reset_index(level=0, drop=True)
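The effect of the reset_index(level=0, drop=True) step can be sketched in isolation. As an assumption for illustration, the MultiIndex Series below is built by hand on made-up data rather than produced by the groupby call, so the snippet also runs on pandas versions where pd.expanding_sum and DataFrame.sort no longer exist.

```python
import pandas as pd

# Hypothetical data: only the column names L1 and L3 come from the question.
df = pd.DataFrame({"L1": ["X", "Y", "X", "Y"], "L3": [1.0, 2.0, 3.0, 4.0]})

# Hand-built stand-in for the MultiIndex result: (group key, original row label).
new_column = pd.Series(
    [0.75, 1.0, 4.0 / 6.0, 1.0],
    index=pd.MultiIndex.from_tuples([("X", 2), ("X", 0), ("Y", 3), ("Y", 1)]),
)

# Drop the outer (group) level so only the original row labels remain.
flat = new_column.reset_index(level=0, drop=True)
print(list(flat.index))  # [2, 0, 3, 1]

# Assignment aligns on labels, so the row order of the Series does not matter.
df["new"] = flat
print(df)
```

Because the assignment aligns on index labels, each value lands in the row it was computed from, even though the Series is ordered by group and by descending L3.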
In pandas 0.13 (in development) this is fixed (https://github.com/pydata/pandas/pull/4670). This is why as_index=False is used in the groupby call: the column L1 (by which you group) is then not added to the index (which would create a MultiIndex), so the original index is retained and the result can be assigned back to the original frame. But it seems the as_index keyword is ignored in 0.12 when using apply.
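For readers on a recent pandas release, where pd.expanding_sum and DataFrame.sort are no longer available, the same idea of keeping the original index can be sketched without apply. This is a swapped-in alternative and not the code from the linked answer: it uses sort_values, a grouped cumsum, and transform, all of which return results labelled by the original rows. The data is made up.

```python
import pandas as pd

# Hypothetical data: only the column names L1 and L3 come from the question.
df = pd.DataFrame({"L1": ["X", "Y", "X", "Y"], "L3": [1.0, 2.0, 3.0, 4.0]})

# Running sum of L3 within each group, taken in descending L3 order.
# The result keeps the original row labels (in sorted order).
running = df.sort_values("L3", ascending=False).groupby("L1")["L3"].cumsum()

# Group total broadcast back to every row, also on the original index.
totals = df.groupby("L1")["L3"].transform("sum")

# Both sides share the flat original index, so no MultiIndex is involved.
df["new"] = running / totals
print(df)
```

Since neither intermediate result carries the group key in its index, there is no level to drop before assigning the column.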