Python Pandas: How to split a sorted dictionary in a column of a dataframe

[愿得一人] 2021-01-15 01:45

I have a DataFrame like this:

id  asn      orgs
0   3320    {'Deutsche Telekom AG': 2288}
1   47886   {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}
2         


        
2 answers
  • 2021-01-15 01:51

    Another approach is to define a function that calls min on the dict and returns a Series, so you can assign the result to multiple columns (function body taken from @Alex Martelli's answer):

    In [17]:
    
    def func(x):
        k = min(x, key=x.get)
        return pd.Series([k, x[k]])
    df[['orgs', 'value']] = df['orgs'].apply(func)
    df
    
    Out[17]:
         asn  id                        orgs  value
    0   3320   0         Deutsche Telekom AG   2288
    1  47886   1  Equinix (Netherlands) B.V.      7
    2  47601   2             fusion services   1024
    3  33438   3     Highwinds Network Group    893
    

    EDIT

    If your data has empty dicts, you can test the length first:

    In [34]:
    
    df = pd.DataFrame({'id':[0,1,2,3,4],
                       'asn':[3320,47886,47601,33438,56],
                       'orgs':[{'Deutsche Telekom AG': 2288},
                               {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
                               {'fusion services': 1024, 'GCE Global Maritime':16859},
                               {'Highwinds Network Group': 893},{}]})
    df
    Out[34]:
         asn  id                                               orgs
    0   3320   0                      {'Deutsche Telekom AG': 2288}
    1  47886   1    {'Equinix (Netherlands) B.V.': 7, 'Joyent': 16}
    2  47601   2  {'GCE Global Maritime': 16859, 'fusion service...
    3  33438   3                   {'Highwinds Network Group': 893}
    4     56   4                                                 {}
    In [36]:
    
    import numpy as np

    def func(x):
        # An empty dict has no minimum, so return NaN for both columns.
        if len(x) > 0:
            k = min(x, key=x.get)
            return pd.Series([k, x[k]])
        return pd.Series([np.nan, np.nan])
    
    df[['orgs', 'value']] = df['orgs'].apply(func)
    df
    
    Out[36]:
         asn  id                        orgs  value
    0   3320   0         Deutsche Telekom AG   2288
    1  47886   1  Equinix (Netherlands) B.V.      7
    2  47601   2             fusion services   1024
    3  33438   3     Highwinds Network Group    893
    4     56   4                         NaN    NaN
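
Building a pd.Series for every row, as above, can be slow on large frames. Below is a minimal sketch of the same min-based logic that returns plain tuples and builds both columns in one pass. The helper name smallest_entry is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def smallest_entry(d):
    """Return the (key, value) pair with the smallest value, or NaNs for an empty dict."""
    if not d:
        return (np.nan, np.nan)
    k = min(d, key=d.get)
    return (k, d[k])


df = pd.DataFrame({
    'id': [0, 1, 2, 3, 4],
    'asn': [3320, 47886, 47601, 33438, 56],
    'orgs': [{'Deutsche Telekom AG': 2288},
             {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
             {'fusion services': 1024, 'GCE Global Maritime': 16859},
             {'Highwinds Network Group': 893},
             {}],
})

# One tuple per row, then a single DataFrame aligned on the original index.
pairs = [smallest_entry(d) for d in df['orgs']]
result = pd.DataFrame(pairs, index=df.index, columns=['orgs', 'value'])

# Overwrite the dict column with the chosen key and add the matching value.
df['orgs'] = result['orgs']
df['value'] = result['value']
```

Swapping min for max in the helper would select the largest entry instead.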
    
  • 2021-01-15 02:09

    This should work:

    In [1]: import pandas as pd  
    In [2]: import operator
    In [3]: df = pd.DataFrame({ 'id' : [0,1,2,3],
       ...:                      'asn' : [3320, 47886, 47601, 33438],
       ...:                      'orgs' : [{'Deutsche Telekom AG': 2288}, {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {'fusion services': 1024, 'GCE Global Maritime':16859}, {'Highwinds Network Group': 893}]
       ...:                    })
    
    In [4]: df.orgs, df['value'] = zip(*df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0]))
    
    In [5]: df
    Out[5]:
         asn  id                     orgs  value
    0   3320   0      Deutsche Telekom AG   2288
    1  47886   1                   Joyent     16
    2  47601   2      GCE Global Maritime  16859
    3  33438   3  Highwinds Network Group    893
    

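    For each row, the lambda sorts the dict items by value in descending order and takes the first (key, value) tuple, i.e. the entry with the largest value. zip(*...) then transposes those per-row tuples into two sequences, which are assigned to df.orgs and df['value'].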
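
To make the zip(*...) step concrete, here is a small sketch in plain Python, without pandas, using three of the dicts from the example. The variable names are illustrative only. It also shows that max with the same key picks the same pair without sorting the whole dict.

```python
import operator

rows = [
    {'Deutsche Telekom AG': 2288},
    {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
    {'fusion services': 1024, 'GCE Global Maritime': 16859},
]

# One (org, count) tuple per row: the item with the largest value.
top_pairs = [
    sorted(d.items(), key=operator.itemgetter(1), reverse=True)[0]
    for d in rows
]

# zip(*pairs) transposes the list of pairs into a tuple of names and a tuple of counts.
names, counts = zip(*top_pairs)

# max() with the same key selects the same item in a single pass.
alt_pairs = [max(d.items(), key=operator.itemgetter(1)) for d in rows]
```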

    For empty dictionaries:

    In [3]: df = pd.DataFrame({ 'id' : [0,1,2,3],
       ...:                      'asn' : [3320, 47886, 47601, 33438],
       ...:                      'orgs' : [{'Deutsche Telekom AG': 2288}, {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {'fusion services': 1024, 'GCE Global Maritime':16859}, {}]
       ...:                    })
    In [4]: df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0] if len(x) else ('',''))
    Out[4]:
    0     (Deutsche Telekom AG, 2288)
    1                    (Joyent, 16)
    2    (GCE Global Maritime, 16859)
    3                            (, )
    Name: orgs, dtype: object
    
    In [5]: df.orgs, df['value'] = zip(*df.orgs.apply(lambda x : sorted(x.items(),key=operator.itemgetter(1),reverse=True)[0] if len(x) else ('','')))
    
    In [6]: df
    Out[6]:
         asn  id                 orgs  value
    0   3320   0  Deutsche Telekom AG   2288
    1  47886   1               Joyent     16
    2  47601   2  GCE Global Maritime  16859
    3  33438   3
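
With the ('', '') placeholder, the value column holds a mix of integers and an empty string. If a numeric column is needed later, one option is pd.to_numeric with errors='coerce'. In the sketch below the column is reconstructed by hand, which is an assumption about what the code above produces.

```python
import pandas as pd

# Assumed contents of the 'value' column: ints plus '' for the empty dict.
value = pd.Series([2288, 16, 16859, ''], dtype=object)

# errors='coerce' replaces anything that cannot be parsed as a number with NaN.
numeric_value = pd.to_numeric(value, errors='coerce')
```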
    