Parsing a dictionary in a pandas DataFrame cell into new row cells (new columns)

孤城傲影 2020-12-17 20:47

I have a pandas DataFrame with one column whose cells each hold a dictionary of key:value pairs, like this:

{\"name\":\"Test Thorton\",\"compa         


        
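The escaped quotes in the sample above suggest the cells may hold JSON strings rather than Python dicts. Both answers below assume real dict objects, so a string column would first need decoding. A minimal sketch, assuming a hypothetical column called data that holds JSON text; the key name comes from the truncated sample, while city and the values are made up for illustration:

```python
import json

import pandas as pd

# Hypothetical frame: the "data" column holds JSON text, not dict objects.
df = pd.DataFrame({
    "id": [1, 2],
    "data": ['{"name": "Alice", "city": "Oslo"}',
             '{"name": "Bob", "city": "Lima"}'],
})

# Decode each JSON string into a Python dict.
parsed = df["data"].map(json.loads)

# One column per key; reuse df's index so the rows line up on join.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

# Replace the JSON column with the new columns.
result = df.drop(columns="data").join(expanded)
print(result)
```

If the cells are already dicts, skip the json.loads step and build the frame from the column's tolist() directly.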
2 Answers
  • 2020-12-17 21:24

    Consider df:

    import pandas as pd
    
    df = pd.DataFrame([
            ['a', 'b', 'c', 'd', dict(F='y', G='v')],
            ['a', 'b', 'c', 'd', dict(F='y', G='v')],
        ], columns=list('ABCDE'))
    
    df
    
       A  B  C  D                     E
    0  a  b  c  d  {'F': 'y', 'G': 'v'}
    1  a  b  c  d  {'F': 'y', 'G': 'v'}
    

    Option 1
    Use pd.Series.apply to turn each dict into a row, then assign the new columns in place

    df.E.apply(pd.Series)
    
       F  G
    0  y  v
    1  y  v
    

    Assign the result to new columns, then drop E:

    df[['F', 'G']] = df.E.apply(pd.Series)
    df.drop('E', axis=1)  # returns a new frame; use df = df.drop('E', axis=1) to keep it
    
       A  B  C  D  F  G
    0  a  b  c  d  y  v
    1  a  b  c  d  y  v
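Option 1 hard-codes the new column names F and G. A minimal sketch that takes the names from the dict keys instead and returns a new frame rather than mutating df; expand_dict_column is a hypothetical helper name, not a pandas function:

```python
import pandas as pd


def expand_dict_column(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a copy of df with the dicts in `col` spread into new columns."""
    # apply(pd.Series) turns each dict into a row; its keys become column names
    expanded = df[col].apply(pd.Series)
    # join aligns on the index, so a non-default index is handled correctly
    return df.drop(columns=col).join(expanded)


df = pd.DataFrame([
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
], columns=list('ABCDE'))

print(expand_dict_column(df, 'E'))
```

apply(pd.Series) builds one Series per row, so it gets slow on large frames; the timings in the next answer show the gap.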
    

    Option 2
    Pipeline the whole thing using the pd.DataFrame.assign method

    df.drop('E', axis=1).assign(**pd.DataFrame(df.E.values.tolist()))  # axis must be a keyword in pandas >= 2.0
    
       A  B  C  D  F  G
    0  a  b  c  d  y  v
    1  a  b  c  d  y  v
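Option 2 builds the new frame with a default 0..n-1 index, and assign aligns on index labels. That works for the example, but if df has any other index (for instance after filtering), the new columns come out as NaN. A sketch assuming a deliberately non-default index, contrasting the naive call with one that passes index=df.index:

```python
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
    ['a', 'b', 'c', 'd', dict(F='x', G='w')],
], columns=list('ABCDE'), index=[10, 20])  # non-default index on purpose

# Naive: the new frame is labelled 0, 1, so assign() finds no matches -> NaN.
naive = df.drop(columns='E').assign(**pd.DataFrame(df['E'].tolist()))

# Fixed: give the expanded frame the same index as df before assigning.
expanded = pd.DataFrame(df['E'].tolist(), index=df.index)
result = df.drop(columns='E').assign(**expanded)

print(naive)
print(result)
```

The same index=df.index argument applies to the pd.concat approach in the next answer.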
    
  • 2020-12-17 21:34

    I think you can use concat:

    import pandas as pd
    
    df = pd.DataFrame({1:['a','h'],2:['b','h'], 5:[{6:'y', 7:'v'},{6:'u', 7:'t'}] })
    
    print (df)
       1  2                 5
    0  a  b  {6: 'y', 7: 'v'}
    1  h  h  {6: 'u', 7: 't'}
    
    print (df.loc[:,5].values.tolist())
    [{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]
    
    df1 = pd.DataFrame(df.loc[:,5].values.tolist())
    print (df1)
       6  7
    0  y  v
    1  u  t
    
    print (pd.concat([df, df1], axis=1))
       1  2                 5  6  7
    0  a  b  {6: 'y', 7: 'v'}  y  v
    1  h  h  {6: 'u', 7: 't'}  u  t
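If the dicts are nested, pd.DataFrame(list_of_dicts) leaves each inner dict as a single object column. pd.json_normalize (available at top level since pandas 1.0) flattens nested keys into dotted column names. A sketch under that assumption; the column name info and all keys and values are hypothetical, since the question's sample is truncated:

```python
import pandas as pd

# Hypothetical nested records.
df = pd.DataFrame({
    'id': [1, 2],
    'info': [
        {'name': 'Alice', 'company': {'name': 'Acme', 'city': 'Oslo'}},
        {'name': 'Bob', 'company': {'name': 'Initech', 'city': 'Lima'}},
    ],
})

# Flatten nested dicts into "outer.inner" column names.
flat = pd.json_normalize(df['info'].tolist())
# json_normalize returns a fresh 0..n-1 index; reuse df's so concat aligns.
flat.index = df.index

result = pd.concat([df.drop(columns='info'), flat], axis=1)
print(result)
```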
    

    Timings on a 2,000-row frame (len(df) = 2k). Setup first; pir wraps Option 1 from the other answer:

    df = pd.concat([df]*1000).reset_index(drop=True)
    
    def pir(df):
        # Option 1 from the other answer; note it adds F and G to the frame passed in
        df[['F', 'G']] = df[5].apply(pd.Series)
        # drop returns a new frame, so return it instead of discarding it
        return df.drop(5, axis=1)
    
    In [2]: %timeit (pd.concat([df, pd.DataFrame(df.loc[:,5].values.tolist())], axis=1))
    100 loops, best of 3: 2.99 ms per loop
    
    In [3]: %timeit (pir(df))
    1 loop, best of 3: 625 ms per loop
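The %timeit lines only run in IPython. A plain-script sketch of the same comparison using the standard timeit module, which first checks that both approaches give the same frame. via_apply and via_tolist are hypothetical names, and timings depend on the machine and pandas version, so none are quoted here:

```python
import timeit

import pandas as pd


def via_apply(df: pd.DataFrame) -> pd.DataFrame:
    # One pd.Series per row: simple but slow on large frames.
    return df.drop(columns=5).join(df[5].apply(pd.Series))


def via_tolist(df: pd.DataFrame) -> pd.DataFrame:
    # One DataFrame built from the whole list of dicts in a single call.
    expanded = pd.DataFrame(df[5].tolist(), index=df.index)
    return pd.concat([df.drop(columns=5), expanded], axis=1)


base = pd.DataFrame({1: ['a', 'h'], 2: ['b', 'h'],
                     5: [{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]})
big = pd.concat([base] * 1000).reset_index(drop=True)  # 2,000 rows

# Both approaches should agree before comparing their speed.
pd.testing.assert_frame_equal(via_apply(big), via_tolist(big), check_dtype=False)

for fn in (via_apply, via_tolist):
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```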
    