pandas group by and assign a group id then ungroup

Anonymous (unverified), submitted 2019-12-03 08:33:39

Question:

I have a large data set in the following format:

    id, socialmedia
    1, facebook
    2, facebook
    3, google
    4, google
    5, google
    6, twitter
    7, google
    8, twitter
    9, snapchat
    10, twitter
    11, facebook

I want to group by socialmedia, assign a groupId column, and then ungroup (expand) back to the individual records:

    id, socialmedia, groupId
    1, facebook, 1
    2, facebook, 1
    3, google, 2
    4, google, 2
    5, google, 2
    6, twitter, 3
    7, google, 2
    8, twitter, 3
    9, snapchat, 4
    10, twitter, 3
    11, facebook, 1
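A minimal sketch for loading the sample data above into a DataFrame, assuming it is stored as comma-plus-space separated text (inlined here with io.StringIO for illustration). Passing skipinitialspace=True is an assumption on the reader's side: without it, read_csv keeps the blank after each comma, so the second column would be named ' socialmedia' and every value would start with a space.

```python
import io

import pandas as pd

# The question's sample data, one record per line (inlined for illustration).
raw = """id, socialmedia
1, facebook
2, facebook
3, google
4, google
5, google
6, twitter
7, google
8, twitter
9, snapchat
10, twitter
11, facebook
"""

# skipinitialspace=True drops the blank after each comma, so the header is
# 'socialmedia' (not ' socialmedia') and the values carry no leading space.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(df.columns.tolist())
```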

I tried the following, but it fails with the error 'DataFrameGroupBy' object does not support item assignment:

    x['grpId'] = x.groupby('socialmedia')['socialmedia'].rank(method='dense').astype(int)
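The error message says x is a DataFrameGroupBy, which suggests (an assumption, since the rest of the code isn't shown) that x was the result of an earlier groupby call rather than the DataFrame itself. Separately, ranking inside groupby compares each row only with the rows of its own group, where all values are equal, so every row would get rank 1. A sketch of the same dense-rank idea applied to the whole column instead, assuming a DataFrame named df with a socialmedia column:

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(1, 12),
    "socialmedia": ["facebook", "facebook", "google", "google", "google",
                    "twitter", "google", "twitter", "snapchat", "twitter",
                    "facebook"],
})

# Assign to the DataFrame, not to a groupby object. A dense rank over the
# whole column gives equal strings the same rank and numbers the distinct
# values 1, 2, 3, ... in sorted (alphabetical) order.
df["groupId"] = df["socialmedia"].rank(method="dense").astype(int)
print(df)
```

Because the ranking is alphabetical, snapchat gets 3 and twitter gets 4 here, the reverse of the question's expected output, which numbers values by first appearance.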

Answer 1:

Using ngroup:

    df['grpId'] = df.groupby('socialmedia').ngroup().add(1)
    df
    Out[354]:
        id socialmedia  grpId
    0    1    facebook      1
    1    2    facebook      1
    2    3      google      2
    3    4      google      2
    4    5      google      2
    5    6     twitter      4
    6    7      google      2
    7    8     twitter      4
    8    9    snapchat      3
    9   10     twitter      4
    10  11    facebook      1
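ngroup numbers the groups in the order the groupby object iterates them, which by default is the sorted order of the keys; that is why twitter gets 4 in this output rather than the 3 in the question. Passing sort=False to groupby keeps the groups in order of first appearance. A sketch, assuming the column is named socialmedia:

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(1, 12),
    "socialmedia": ["facebook", "facebook", "google", "google", "google",
                    "twitter", "google", "twitter", "snapchat", "twitter",
                    "facebook"],
})

# Default: groups numbered in sorted key order
# (facebook, google, snapchat, twitter).
df["sortedId"] = df.groupby("socialmedia").ngroup().add(1)

# sort=False: groups numbered in order of first appearance
# (facebook, google, twitter, snapchat), matching the expected output.
df["grpId"] = df.groupby("socialmedia", sort=False).ngroup().add(1)
print(df)
```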

Or use pd.factorize, or the 'category' dtype:

    df['grpId'] = pd.factorize(df['socialmedia'])[0] + 1
    df
    Out[358]:
        id socialmedia  grpId
    0    1    facebook      1
    1    2    facebook      1
    2    3      google      2
    3    4      google      2
    4    5      google      2
    5    6     twitter      3
    6    7      google      2
    7    8     twitter      3
    8    9    snapchat      4
    9   10     twitter      3
    10  11    facebook      1

    df['grpId'] = df['socialmedia'].astype('category').cat.codes.add(1)
    df
    Out[356]:
        id socialmedia  grpId
    0    1    facebook      1
    1    2    facebook      1
    2    3      google      2
    3    4      google      2
    4    5      google      2
    5    6     twitter      4
    6    7      google      2
    7    8     twitter      4
    8    9    snapchat      3
    9   10     twitter      4
    10  11    facebook      1
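The two variants differ in how they number the groups. pd.factorize returns a pair (codes, uniques) with codes assigned in order of first appearance, which is why it reproduces the question's expected IDs (twitter = 3), and uniques serves as the lookup table from ID back to label. astype('category') sorts the categories, so twitter gets 4. If the categorical route is preferred but with first-appearance order, passing the categories explicitly through pd.CategoricalDtype is one option. Both encode missing values as -1, i.e. 0 after the + 1. A small comparison sketch:

```python
import pandas as pd

s = pd.Series(["facebook", "facebook", "google", "google", "google",
               "twitter", "google", "twitter", "snapchat", "twitter",
               "facebook"])

# factorize: codes numbered by first appearance; uniques is the lookup table.
codes, uniques = pd.factorize(s)
factorize_ids = codes + 1
labels = uniques[factorize_ids - 1]  # ID -> label, recovers the original values

# astype('category'): categories are sorted, so codes follow alphabetical order.
category_ids = s.astype("category").cat.codes.add(1)

# Explicit categories in order of first appearance reproduce the factorize IDs.
first_seen = pd.CategoricalDtype(categories=s.unique())
ordered_ids = s.astype(first_seen).cat.codes.add(1)

# Missing values are coded -1 by both approaches, so they become 0 after + 1.
with_missing = pd.Series(["a", None, "b"])
na_factorize = pd.factorize(with_missing)[0] + 1
na_category = with_missing.astype("category").cat.codes.add(1)
```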


Answer 2:

You can use the sklearn.preprocessing.LabelEncoder class:

    In [79]: from sklearn.preprocessing import LabelEncoder

    In [80]: le = LabelEncoder()

    In [81]: df['groupId'] = le.fit_transform(df['socialmedia']) + 1

    In [82]: df
    Out[82]:
        id socialmedia  groupId
    0    1    facebook        1
    1    2    facebook        1
    2    3      google        2
    3    4      google        2
    4    5      google        2
    5    6     twitter        4
    6    7      google        2
    7    8     twitter        4
    8    9    snapchat        3
    9   10     twitter        4
    10  11    facebook        1
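LabelEncoder assigns integers to the sorted distinct values (stored in its classes_ attribute), so its IDs match the ngroup and category-code results rather than the question's first-appearance order. If scikit-learn isn't otherwise needed, NumPy's np.unique with return_inverse=True yields the same sorted encoding; the sketch below is that NumPy substitute, not LabelEncoder itself:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": range(1, 12),
    "socialmedia": ["facebook", "facebook", "google", "google", "google",
                    "twitter", "google", "twitter", "snapchat", "twitter",
                    "facebook"],
})

# classes: sorted distinct values; codes: position of each row's value in classes.
classes, codes = np.unique(df["socialmedia"].to_numpy(), return_inverse=True)
df["groupId"] = codes + 1

# classes plays the role of LabelEncoder.classes_: classes[id - 1] is the label.
print(classes)
```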


Answer 3:

We could also create a dictionary and map it:

    import pandas as pd

    df = pd.DataFrame(dict(id=range(1, 5), social=["Facebook", "Twitter", "Facebook", "Google"]))

    # Map each distinct value to an ID, numbered from 1 in order of first appearance
    d = dict((k, v) for v, k in enumerate(df['social'].unique(), 1))
    df['groupid'] = df['social'].map(d)

    print(df)

Returns

       id    social  groupid
    0   1  Facebook        1
    1   2   Twitter        2
    2   3  Facebook        1
    3   4    Google        3

Or as a one-liner:

    df['groupid'] = df['social'].map({k: v for v, k in enumerate(df['social'].unique(), 1)})
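One advantage of the explicit dictionary is that it can be kept and reused, for example to encode a later batch of records with the same IDs. Series.map returns NaN for values missing from the dictionary, so new values need some handling; the sketch below appends them with the next free ID, which is an illustrative policy rather than part of the original answer. The names train, new and mapping are hypothetical:

```python
import pandas as pd

train = pd.Series(["Facebook", "Twitter", "Facebook", "Google"])
mapping = {k: v for v, k in enumerate(train.unique(), 1)}

new = pd.Series(["Google", "Snapchat", "Twitter"])

# Values absent from the dict ("Snapchat") become NaN, and the result is float.
partial = new.map(mapping)

# Hypothetical policy: IDs are 1..n, so len(mapping) + 1 is the next unused ID.
for value in new.unique():
    if value not in mapping:
        mapping[value] = len(mapping) + 1

ids = new.map(mapping)
print(ids.tolist())
```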

Timings:

    %timeit df['grpId'] = df.groupby('social').ngroup().add(1)
    %timeit df['grpId'] = pd.factorize(df['social'])[0] + 1
    %timeit df['grpId'] = df['social'].astype('category').cat.codes.add(1)
    %timeit df['groupid'] = df['social'].map(dict((k, v) for v, k in enumerate(df['social'].unique(), 1)))

Returns

100 loops, best of 3: 1.5 ms per loop   
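These %timeit runs use the four-row example frame, where fixed per-call overhead dominates, so they say little about a large data set. A sketch for comparing the approaches on bigger synthetic data with the standard-library timeit module; the row count, the random labels and the repeat counts are arbitrary assumptions, and no particular timings are claimed:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption): 100,000 rows drawn from four labels.
rng = np.random.default_rng(0)
labels = ["Facebook", "Twitter", "Google", "Snapchat"]
df = pd.DataFrame({"social": rng.choice(labels, size=100_000)})

candidates = {
    "ngroup": lambda: df.groupby("social").ngroup().add(1),
    "factorize": lambda: pd.factorize(df["social"])[0] + 1,
    "category": lambda: df["social"].astype("category").cat.codes.add(1),
    "dict map": lambda: df["social"].map(
        {k: v for v, k in enumerate(df["social"].unique(), 1)}
    ),
}

# Best of 3 repeats of 10 calls each, reported as seconds per call.
results = {
    name: min(timeit.repeat(fn, number=10, repeat=3)) / 10
    for name, fn in candidates.items()
}
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:10s} {seconds * 1e3:.2f} ms per call")
```

Note that ngroup and the category codes number groups alphabetically, while factorize and the dictionary map use first-appearance order, so the four columns are not identical even though each defines the same grouping.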

