问题
Is there any way I can retain the original index of my large dataframe after I perform a groupby? The reason I need to this is because I need to do an inner merge back to my original df (after my groupby) to regain those lost columns. And the index value is the only 'unique' column to perform the merge back into. Does anyone know how I can achieve this?
My DataFrame is quite large. My groupby looks like this:
df.groupby(['col1', 'col2']).agg({'col3': 'count'}).reset_index()
This drops my original indexes from my original dataframe, which I want to keep.
回答1:
I think you are are looking for transform in this situation:
df['count'] = df.groupby(['col1', 'col2'])['col3'].transform('count')
回答2:
You can elevate your index to a column via reset_index
. Then aggregate your index to a tuple via agg
, together with your count
aggregation.
Below is a minimal example.
import pandas as pd, numpy as np
df = pd.DataFrame(np.random.randint(0, 4, (50, 5)),
index=np.random.randint(0, 4, 50))
df = df.reset_index()
res = df.groupby([0, 1]).agg({2: 'count', 'index': lambda x: tuple(x)}).reset_index()
# 0 1 2 index
# 0 0 0 4 (2, 0, 0, 2)
# 1 0 1 4 (0, 3, 1, 1)
# 2 0 2 1 (1,)
# 3 0 3 1 (3,)
# 4 1 0 4 (1, 2, 1, 3)
# 5 1 1 2 (1, 3)
# 6 1 2 4 (2, 1, 2, 2)
# 7 1 3 1 (2,)
# 8 2 0 5 (0, 3, 0, 2, 2)
# 9 2 1 2 (0, 2)
# 10 2 2 5 (1, 1, 3, 3, 2)
# 11 2 3 2 (0, 1)
# 12 3 0 4 (0, 3, 3, 3)
# 13 3 1 4 (1, 3, 0, 1)
# 14 3 2 3 (3, 2, 1)
# 15 3 3 4 (3, 3, 2, 1)
回答3:
You should not use 'reset_index()' if you want to keep your original indexes
来源:https://stackoverflow.com/questions/49216357/how-to-keep-original-index-of-a-dataframe-after-groupby-2-columns