Question
I have data saved in an Excel file. I am querying this data using Python 2.7 and turning it into a Pandas DataFrame. I have a column called CATEGORY in my DataFrame. Each cell holds a dictionary (or list?) of values. The DataFrame looks like this:
[1] df
ID CATEGORY
1 {60: 'SHOES'}
2 {46: 'HARDWARE'}
3 {60: 'SHOES'}
4 {219: 'GOVERNMENT OFFICE'}
5 {87: 'ARCADES',60: 'SHOES'}
I need to split this column into separate columns so that the DataFrame looks like this:
[2] df2
CATEGORY_ID CATEGORY_NAME
60 SHOES
46 HARDWARE
219 GOVERNMENT OFFICE
87 ARCADES
I also need to add a new column to my original DataFrame that lists the category IDs for each row:
[3] df
ID CATEGORY_id
1 60
2 46
3 60
4 219
5 87,60
Can anyone please help me with this?
Answer 1:
I think you need:
- ast to convert strings to dictionaries
- reshape by stack
- convert index to column by reset_index
- remove duplicates by drop_duplicates
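import ast
import pandas as pd
# Parse each CATEGORY string into a dict, then build one column per dict key.
# The result is stored as df2 so the original df stays available.
df2 = (pd.DataFrame(df['CATEGORY'].apply(ast.literal_eval).values.tolist())
         .stack()                            # long format: one row per (row, key) pair
         .reset_index(level=0, drop=True)    # discard the original row number
         .reset_index()                      # move the dict key into a column
         .drop_duplicates()
         .rename(columns={'index':'CATEGORY_ID', 0:'CATEGORY_NAME'}))
print (df2)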
CATEGORY_ID CATEGORY_NAME
0 60 SHOES
1 46 HARDWARE
3 219 GOVERNMENT OFFICE
5 87 ARCADES
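To try this on the question's sample data, here is a minimal, self-contained sketch. It assumes the CATEGORY cells arrive as strings (as they typically would when read from Excel), and it swaps the stack-based reshape for a plain list comprehension over the parsed dictionaries, so it is an alternative route to the same lookup table rather than the chain above. The names sample, parsed, pairs and lookup are made up for illustration.

```python
import ast

import pandas as pd

# Sample data from the question; CATEGORY cells are assumed to be strings.
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "CATEGORY": [
        "{60: 'SHOES'}",
        "{46: 'HARDWARE'}",
        "{60: 'SHOES'}",
        "{219: 'GOVERNMENT OFFICE'}",
        "{87: 'ARCADES',60: 'SHOES'}",
    ],
})

# literal_eval parses each string into a real dict without executing code.
parsed = sample["CATEGORY"].apply(ast.literal_eval)

# Flatten every (key, value) pair into one row, then drop repeated pairs.
pairs = [(cat_id, name) for d in parsed for cat_id, name in d.items()]
lookup = (pd.DataFrame(pairs, columns=["CATEGORY_ID", "CATEGORY_NAME"])
          .drop_duplicates()
          .reset_index(drop=True))
print(lookup)
```

Because the pairs are collected row by row, the categories appear in order of first occurrence, which matches the layout requested in the question.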
EDIT: The solution can be simplified a bit. To join the CATEGORY_ID values that belong to the same ID, use groupby with join:
import ast
df = (pd.DataFrame(df['CATEGORY'].apply(ast.literal_eval).values.tolist(), index=df['ID'])
        .stack()
        .reset_index()
        .rename(columns={'level_1':'CATEGORY_ID', 0:'CATEGORY_NAME'})
      )
print (df)
ID CATEGORY_ID CATEGORY_NAME
0 1 60 SHOES
1 2 46 HARDWARE
2 3 60 SHOES
3 4 219 GOVERNMENT OFFICE
4 5 60 SHOES
5 5 87 ARCADES
df1 = df.groupby('ID')['CATEGORY_ID'].apply(lambda x: ', '.join(x.astype(str))).reset_index()
print (df1)
ID CATEGORY_ID
0 1 60
1 2 46
2 3 60
3 4 219
4 5 60, 87
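If the goal is to attach the joined IDs directly to the original DataFrame, as the question asks, one possible sketch is to join the dictionary keys cell by cell instead of going through groupby. This assumes the CATEGORY cells are strings and that the code runs on Python 3.7+, where dictionaries keep insertion order (on Python 2.7 the key order inside a cell is not guaranteed). The names original and join_keys are hypothetical.

```python
import ast

import pandas as pd

# Sample data from the question; CATEGORY cells are assumed to be strings.
original = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "CATEGORY": [
        "{60: 'SHOES'}",
        "{46: 'HARDWARE'}",
        "{60: 'SHOES'}",
        "{219: 'GOVERNMENT OFFICE'}",
        "{87: 'ARCADES',60: 'SHOES'}",
    ],
})


def join_keys(cell):
    """Parse one CATEGORY string and return its keys as 'k1,k2,...'."""
    return ",".join(str(key) for key in ast.literal_eval(cell))


# Add the new column in place; every row keeps its own ID.
original["CATEGORY_id"] = original["CATEGORY"].apply(join_keys)
print(original[["ID", "CATEGORY_id"]])
```

Unlike the groupby result, this keeps the keys in the order they appear inside each cell, so row 5 comes out as 87,60.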
Source: https://stackoverflow.com/questions/49335168/splitting-dictionary-list-inside-a-pandas-column-and-convert-as-new-dataframe