pandas group by and assign a group id then ungroup

我寻月下人不归 2020-12-03 20:10

I have a large data set in the following format:

id, socialmedia
1, facebook
2, facebook
3, google
4, google
5, google
6, twitter
7, google
8, twitter
9, sn         


        
3 Answers
  • 2020-12-03 20:45

    You can use the sklearn.preprocessing.LabelEncoder class, which numbers the labels in sorted order (add 1 for 1-based ids):

    In [79]: from sklearn.preprocessing import LabelEncoder
    
    In [80]: le = LabelEncoder()
    
    In [81]: df['groupId'] = le.fit_transform(df['socialmedia'])+1
    
    In [82]: df
    Out[82]:
        id socialmedia  groupId
    0    1    facebook        1
    1    2    facebook        1
    2    3      google        2
    3    4      google        2
    4    5      google        2
    5    6     twitter        4
    6    7      google        2
    7    8     twitter        4
    8    9    snapchat        3
    9   10     twitter        4
    10  11    facebook        1
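
    If scikit-learn is not already a dependency, the same sorted numbering can be reproduced with NumPy alone. The following is a minimal sketch, assuming the eleven rows shown in the output above; the name id_to_label is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the output shown above.
df = pd.DataFrame({
    "id": range(1, 12),
    "socialmedia": ["facebook", "facebook", "google", "google", "google",
                    "twitter", "google", "twitter", "snapchat", "twitter",
                    "facebook"],
})

# np.unique returns the sorted distinct labels plus, for every row,
# the position of that row's label within the sorted array.
labels, codes = np.unique(df["socialmedia"].to_numpy(dtype=object),
                          return_inverse=True)
df["groupId"] = codes + 1  # shift from 0-based to 1-based ids

# Lookup table for turning an id back into its label (illustrative name).
id_to_label = dict(enumerate(labels, start=1))
```

    Keeping the labels array around plays the same role as LabelEncoder's inverse_transform: it lets you recover the original name from a group id.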
    
  • 2020-12-03 20:57

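    By using ngroup, which numbers the groups in sorted key order. Note that the column name in this answer carries a leading space (' socialmedia'), presumably because the sample was parsed without stripping the space after each comma: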

    df['grpId']=df.groupby(' socialmedia').ngroup().add(1)
    df
    Out[354]: 
        id  socialmedia  grpId
    0    1     facebook      1
    1    2     facebook      1
    2    3       google      2
    3    4       google      2
    4    5       google      2
    5    6      twitter      4
    6    7       google      2
    7    8      twitter      4
    8    9     snapchat      3
    9   10      twitter      4
    10  11     facebook      1
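
    The leading space in ' socialmedia' is easy to trip over. As a hedged sketch of where it likely comes from, the snippet below parses the first eight rows of the question's sample twice: once as-is, and once with read_csv's skipinitialspace=True. The variable names raw_text, raw and clean are illustrative only.

```python
import io
import pandas as pd

# First eight (complete) rows of the sample from the question,
# with the space after each comma kept as posted.
raw_text = """id, socialmedia
1, facebook
2, facebook
3, google
4, google
5, google
6, twitter
7, google
8, twitter
"""

# Default parsing keeps the space: the column is ' socialmedia'
# and the values are ' facebook', ' google', ...
raw = pd.read_csv(io.StringIO(raw_text))

# skipinitialspace=True drops whitespace that follows the delimiter.
clean = pd.read_csv(io.StringIO(raw_text), skipinitialspace=True)
clean["grpId"] = clean.groupby("socialmedia").ngroup().add(1)
```

    With the cleaned frame, the column can be referenced as 'socialmedia' in every snippet in this thread. Renaming with df.columns.str.strip() fixes the header as well, but leaves the leading space in the values.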
    

    Or pd.factorize (ids in order of first appearance) and the 'category' dtype (ids in sorted order):

    df['grpId']=pd.factorize(df[' socialmedia'])[0]+1
    
    df
    Out[358]: 
        id  socialmedia  grpId
    0    1     facebook      1
    1    2     facebook      1
    2    3       google      2
    3    4       google      2
    4    5       google      2
    5    6      twitter      3
    6    7       google      2
    7    8      twitter      3
    8    9     snapchat      4
    9   10      twitter      3
    10  11     facebook      1
    

    df['grpId']=df[' socialmedia'].astype('category').cat.codes.add(1)
    df
    Out[356]: 
        id  socialmedia  grpId
    0    1     facebook      1
    1    2     facebook      1
    2    3       google      2
    3    4       google      2
    4    5       google      2
    5    6      twitter      4
    6    7       google      2
    7    8      twitter      4
    8    9     snapchat      3
    9   10      twitter      4
    10  11     facebook      1
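
    The outputs above differ for twitter and snapchat because the methods order the groups differently: ngroup and category codes follow sorted label order, while factorize follows order of first appearance. A small sketch, assuming a column named 'socialmedia' without the leading space, shows how the sort arguments switch each method to the other ordering:

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(1, 12),
    "socialmedia": ["facebook", "facebook", "google", "google", "google",
                    "twitter", "google", "twitter", "snapchat", "twitter",
                    "facebook"],
})
s = df["socialmedia"]

# Sorted label order: facebook=1, google=2, snapchat=3, twitter=4
by_ngroup = df.groupby("socialmedia").ngroup().add(1)
by_category = s.astype("category").cat.codes.add(1)
by_factorize_sorted = pd.factorize(s, sort=True)[0] + 1

# Order of first appearance: facebook=1, google=2, twitter=3, snapchat=4
by_factorize = pd.factorize(s)[0] + 1
by_ngroup_unsorted = df.groupby("socialmedia", sort=False).ngroup().add(1)
```

    Which ordering is preferable depends on the use case: sorted ids are stable if rows are shuffled, whereas appearance-order ids are stable if new labels are appended at the end of the data.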
    
  • 2020-12-03 21:02

    We could also create a dictionary and map it:

    import pandas as pd
    
    df = pd.DataFrame(dict(id=range(1,5),social=["Facebook","Twitter","Facebook","Google"]))
    
    # Map each distinct value to a 1-based id, in order of first appearance
    d = dict((k,v) for v,k in enumerate(df['social'].unique(),1))
    df['groupid'] = df['social'].map(d)
    
    print(df)
    

    Returns

       id    social  groupid
    0   1  Facebook        1
    1   2   Twitter        2
    2   3  Facebook        1
    3   4    Google        3
    

    Or one-line like this:

    df['groupid'] = df['social'].map({k:v for v,k in enumerate(df['social'].unique(),1)})
    

    Timings:

    %timeit df['grpId']=df.groupby('social').ngroup().add(1)
    %timeit df['grpId']=pd.factorize(df['social'])[0]+1
    %timeit df['grpId']=df['social'].astype('category').cat.codes.add(1)
    %timeit df['groupid'] = df['social'].map(dict((k,v) for v,k in enumerate(df['social'].unique(),1)))
    

    Returns

    100 loops, best of 3: 1.5 ms per loop   <- Wen1 (groupby + ngroup)
    1000 loops, best of 3: 493 µs per loop  <- Wen2 (pd.factorize)
    1000 loops, best of 3: 990 µs per loop  <- Wen3 (category codes)
    1000 loops, best of 3: 802 µs per loop  <- Antonvbr (dict + map)
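
    These timings appear to come from the small four-row frame defined above, so they mostly measure fixed overhead. To compare the approaches on data closer to a "large data set", something like the following sketch could be used outside IPython. The frame size, the label list and the candidates dictionary are assumptions made for illustration, and the results will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100,000 rows drawn from four labels (assumed sizes).
rng = np.random.default_rng(0)
labels = ["Facebook", "Twitter", "Google", "Snapchat"]
df = pd.DataFrame({"social": rng.choice(labels, size=100_000)})

# The four approaches from this thread, wrapped as zero-argument callables.
candidates = {
    "ngroup": lambda: df.groupby("social").ngroup().add(1),
    "factorize": lambda: pd.factorize(df["social"])[0] + 1,
    "category": lambda: df["social"].astype("category").cat.codes.add(1),
    "dict map": lambda: df["social"].map(
        {k: v for v, k in enumerate(df["social"].unique(), 1)}
    ),
}

# Best of 3 repeats, 5 calls each, reported as time per call.
results = {}
for name, fn in candidates.items():
    results[name] = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name:10s} {results[name] * 1e3:8.3f} ms per call")
```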
    