Splitting a dictionary/list inside a Pandas column and converting it into a new DataFrame

Question

I have data saved in an Excel file. I am querying this data using Python 2.7 and turning it into a Pandas DataFrame. I have a column called CATEGORY in my DataFrame. It holds a dictionary (or list?) of values. The DataFrame looks like this:

[1] df
ID                                          CATEGORY
1                                       {60: 'SHOES'}
2                                    {46: 'HARDWARE'}
3                                       {60: 'SHOES'}
4                          {219: 'GOVERNMENT OFFICE'}
5                         {87: 'ARCADES',60: 'SHOES'}

I need to split this column into separate columns so that the DataFrame looks like this:

[2] df2
CATEGORY_ID                   CATEGORY_NAME
60                                    SHOES
46                                 HARDWARE
219                       GOVERNMENT OFFICE
87                                  ARCADES

I also need to add a new column to my original DataFrame that lists the category IDs for each row:

[3] df
ID           CATEGORY_id         
1                    60
2                    46
3                    60
4                   219 
5                 87,60

Can anyone please help me with this?

Answer 1:

I think you need:

ast.literal_eval to convert the strings to dictionaries
stack to reshape the data into long form
reset_index to convert the index to a column
drop_duplicates to remove duplicates
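
Whether the ast step is needed depends on what the column actually holds, which the question leaves open. The following is a minimal sketch for checking that and parsing only when necessary. The helper name to_dict and the two-row sample Series are hypothetical and are not part of the original post.

```python
import ast
import pandas as pd

def to_dict(value):
    """Return value as a dict, parsing it first if it is a string."""
    if isinstance(value, dict):
        return value
    return ast.literal_eval(value)

# Hypothetical sample: one cell is a string, the other is already a dict.
sample = pd.Series(["{60: 'SHOES'}", {46: "HARDWARE"}])

# Show which Python types the column contains.
print(sample.map(type).value_counts())

# After this step every cell is a real dict.
parsed = sample.apply(to_dict)
print(parsed.tolist())
```

If every cell is already a dict, the ast.literal_eval call can simply be left out.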

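import ast
import pandas as pd

# Parse each string into a dict, build one column per key, then stack into long form.
# The result is stored in df2 so that the original df stays available.
df2 = (pd.DataFrame(df['CATEGORY'].apply(ast.literal_eval).values.tolist())
       .stack()
       .reset_index(level=0, drop=True)   # drop the row number, keep the dict key
       .reset_index()                     # the key becomes a column named 'index'
       .drop_duplicates()
       .rename(columns={'index':'CATEGORY_ID', 0:'CATEGORY_NAME'}))
print(df2)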
   CATEGORY_ID      CATEGORY_NAME
0           60              SHOES
1           46           HARDWARE
3          219  GOVERNMENT OFFICE
5           87            ARCADES
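
For anyone who wants to reproduce this result without the Excel file, here is a self-contained sketch. It rebuilds the sample data from the question and assumes that CATEGORY holds strings. Note that it swaps the stack chain for a plain list comprehension over the parsed dicts, so it is an alternative route to the same table rather than the code above. The names raw, parsed, pairs and lookup are illustrative.

```python
import ast
import pandas as pd

# Sample data from the question; CATEGORY is assumed to hold strings.
raw = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "CATEGORY": [
        "{60: 'SHOES'}",
        "{46: 'HARDWARE'}",
        "{60: 'SHOES'}",
        "{219: 'GOVERNMENT OFFICE'}",
        "{87: 'ARCADES',60: 'SHOES'}",
    ],
})

# Parse each string into a real dict.
parsed = raw["CATEGORY"].apply(ast.literal_eval)

# Flatten every (key, value) pair across all rows.
pairs = [(key, name) for d in parsed for key, name in d.items()]

# Build the lookup table and keep the first occurrence of each pair.
lookup = (pd.DataFrame(pairs, columns=["CATEGORY_ID", "CATEGORY_NAME"])
            .drop_duplicates()
            .reset_index(drop=True))
print(lookup)
```

The comprehension avoids the intermediate wide frame with one column per key, which may matter when there are many distinct category IDs.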

EDIT: The solution can be simplified a bit. To join multiple CATEGORY_ID values that share the same ID, use groupby with join:

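import ast
# df here must be the original DataFrame with the ID and CATEGORY columns.
# Passing index=df['ID'] carries each row's ID through the stack.
df = (pd.DataFrame(df['CATEGORY'].apply(ast.literal_eval).values.tolist(), index=df['ID'])
       .stack()
       .reset_index()                     # the dict key becomes a column named 'level_1'
       .rename(columns={'level_1':'CATEGORY_ID', 0:'CATEGORY_NAME'})
       )
print(df)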
   ID  CATEGORY_ID      CATEGORY_NAME
0   1           60              SHOES
1   2           46           HARDWARE
2   3           60              SHOES
3   4          219  GOVERNMENT OFFICE
4   5           60              SHOES
5   5           87            ARCADES


df1 = df.groupby('ID')['CATEGORY_ID'].apply(lambda x: ', '.join(x.astype(str))).reset_index()
print (df1)
   ID CATEGORY_ID
0   1          60
1   2          46
2   3          60
3   4         219
4   5      60, 87
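
The question asks for the joined IDs as a new column on the original DataFrame, while df1 is a separate frame. One possible way to get there directly is sketched below: parse each cell and join its keys in place, with no reshaping. The helper join_keys and the three-row sample are hypothetical. The sketch assumes string cells and Python 3.7 or later, where dicts keep insertion order; on Python 2.7 the order of the joined keys is not guaranteed.

```python
import ast
import pandas as pd

# Small sample in the shape of the question's data (strings assumed).
df = pd.DataFrame({
    "ID": [1, 2, 5],
    "CATEGORY": [
        "{60: 'SHOES'}",
        "{46: 'HARDWARE'}",
        "{87: 'ARCADES',60: 'SHOES'}",
    ],
})

def join_keys(text):
    """Parse a dict literal and join its keys with commas."""
    return ",".join(str(key) for key in ast.literal_eval(text))

# Add the joined keys as a new column next to the original data.
df["CATEGORY_id"] = df["CATEGORY"].apply(join_keys)
print(df[["ID", "CATEGORY_id"]])
```

Because the keys are read in the order they appear in each cell, the last row comes out as 87,60, which matches the layout requested in the question.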

Source: https://stackoverflow.com/questions/49335168/splitting-dictionary-list-inside-a-pandas-column-and-convert-as-new-dataframe

Tags

python

pandas

dictionary

dataframe

data-analysis