Question
My dataframe looks like this:
ID Class
0 9
1 8
1 6
2 6
2 2
3 15
3 1
3 8
What I would like to do is merge rows that share the same ID value, like this:
ID Class1 Class2 Class3
0 9
1 8 6
2 6 2
3 15 1 8
So for each ID that occurs more than once, I want to create new column(s) and move the values from those rows into the new columns. What is the fastest way to do this? I tried using groupby, but it didn't give me appropriate results.
Answer 1:
Use set_index with cumcount to build the new column labels, reshape with unstack, and finally rename the columns with add_prefix:
# Parentheses let the method chain span several lines
df = (df.set_index(['ID', df.groupby('ID').cumcount()])['Class']
        .unstack()
        .add_prefix('Class')
        .reset_index())
print(df)
ID Class0 Class1 Class2
0 0 9.0 NaN NaN
1 1 8.0 6.0 NaN
2 2 6.0 2.0 NaN
3 3 15.0 1.0 8.0
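As a self-contained sketch of the approach above, the following rebuilds the sample data from the question and numbers the new columns from 1, as in the desired output. The names pos and wide are illustrative, and the + 1 offset is an addition that the answer itself does not use.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 3, 3, 3],
    "Class": [9, 8, 6, 6, 2, 15, 1, 8],
})

# Position of each row within its ID group, shifted so it starts at 1
# (assumption: the question's Class1..Class3 naming is wanted).
pos = df.groupby("ID").cumcount() + 1

wide = (
    df.set_index(["ID", pos])["Class"]  # (ID, position) becomes the index
      .unstack()                        # positions become columns
      .add_prefix("Class")              # 1, 2, 3 -> Class1, Class2, Class3
      .reset_index()
)
print(wide)
```

The values come back as floats because the missing cells are filled with NaN, which an integer column cannot hold.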
Another solution is to create a list per group and then build a new DataFrame with the constructor:
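import pandas as pd

# One list of Class values per ID
s = df.groupby('ID')['Class'].apply(list)
# Shorter lists are padded with NaN by the constructor
df = (pd.DataFrame(s.values.tolist(), index=s.index)
        .add_prefix('Class')
        .reset_index())
print(df)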
ID Class0 Class1 Class2
0 0 9 NaN NaN
1 1 8 6.0 NaN
2 2 6 2.0 NaN
3 3 15 1.0 8.0
EDIT: To get one indicator (0/1) column per class value instead:
df = df.set_index('ID')
df1=pd.get_dummies(df['Class']).reindex(columns=range(17), fill_value=0).add_prefix('Class')
df1 = df1.groupby(level=0).max().reset_index()
print(df1)
ID Class0 Class1 Class2 Class3 Class4 Class5 Class6 Class7 Class8 \
0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 1 0 1
2 2 0 0 1 0 0 0 1 0 0
3 3 0 1 0 0 0 0 0 0 1
Class9 Class10 Class11 Class12 Class13 Class14 Class15 Class16
0 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 0
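A self-contained sketch of the indicator variant follows. Two things here are assumptions rather than part of the answer: dtype=int is passed to get_dummies so that the result holds 0/1 integers on pandas versions that would otherwise return booleans, and n_classes simply mirrors the range(17) used above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 3, 3, 3],
    "Class": [9, 8, 6, 6, 2, 15, 1, 8],
})

n_classes = 17  # assumed number of possible class values (0..16)

indicators = (
    pd.get_dummies(df.set_index("ID")["Class"], dtype=int)  # one column per value seen
      .reindex(columns=range(n_classes), fill_value=0)      # add unseen values as all-zero columns
      .add_prefix("Class")
      .groupby(level=0).max()                               # collapse rows that share an ID
      .reset_index()
)
print(indicators)
```

Taking the maximum per ID works because a column is 1 for an ID as soon as any of its rows carries that class value.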
Answer 2:
Alternatively, you can try:
df.groupby('ID').Class.apply(list).apply(pd.Series).add_prefix('Class_').fillna(' ')
Out[602]:
Class_0 Class_1 Class_2
ID
0 9.0
1 8.0 6
2 6.0 2
3 15.0 1 8
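Filling the gaps with a blank string leaves columns that mix numbers and text. If whole numbers with explicit missing markers are preferred, one possible alternative is sketched below. It swaps in pivot and the nullable Int64 dtype, neither of which appears in the answer, and the names pos and wide are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 3, 3, 3],
    "Class": [9, 8, 6, 6, 2, 15, 1, 8],
})

wide = (
    df.assign(pos=df.groupby("ID").cumcount())         # position within each ID group
      .pivot(index="ID", columns="pos", values="Class")
      .add_prefix("Class_")
      .astype("Int64")                                 # nullable integers; gaps become <NA>
)
print(wide)
```

As in the answer's output, ID stays in the index here; reset_index() would turn it back into a regular column.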
Source: https://stackoverflow.com/questions/45918559/python-merging-rows-with-same-value-in-one-column