Python pandas dataframe pivot only works with pivot_table() but not with set_index() and unstack()

问题

I am trying to pivot following type of sample data in Pandas dataframe in Python. I came across couple of other stackoverflow answers that discussed how to do the pivot: pivot_table No numeric types to aggregate

However, when I use pivot_table(), I am able to pivot the data. But when I use set_index() and unstack(), I get following error:

AttributeError: 'NoneType' object has no attribute 'unstack'

Sample Data:

id  responseTime    label   answers
ABC 2018-06-24  Category_1  [3]
ABC 2018-06-24  Category_2  [10]
ABC 2018-06-24  Category_3  [10]
DEF 2018-06-25  Category_1  [7]
DEF 2018-06-25  Category_8  [10]
GHI 2018-06-28  Category_3  [7]

Desired Output:

id  responseTime    category_1  category_2 category_3 category_8
ABC  2018-06-24           [3]     [10]         [10]       NULL
DEF  2018-06-25           [7]     NULL         NULL       [10]
GHI  2018-06-28           NULL    NULL         [7]        NULL

This works:

 df=pdDF.pivot_table(index=['items_id','responseTime'], columns='label', values='answers', aggfunc='first')

This doesn't work:

pdDF.set_index(['items_id','responseTime','label'], append=True, inplace=True).unstack('label')

I also used pdDF[pdDF.isnull().any(axis=1)] to make sure I don't have any NULL data in answers column. I also used append=False but same error happened.

From other threads, it seems set_index() and unstack() are more efficient than pivot_table(). I also don't want to use pivot_table() because it requires aggregation function and my answers column doesn't contain numeric data. I didn't want to use default (mean()) so I ended up using first(). Any insights on why one method works and another doesn't?

回答1:

AttributeError: 'NoneType' object has no attribute 'unstack'

When you use inplace = True in set_index it modified the dataframe in place. It doesn't return anything(None). So you can't use unstack on None object.

inplace : boolean, default False

Modify the DataFrame in place (do not create a new object)

Use:

df1 = pdDF.set_index(['items_id','responseTime','label']).unstack('label')    
print(df1)

# Output:

id  responseTime    category_1  category_2 category_3 category_8
ABC  2018-06-24           [3]     [10]         [10]       NULL
DEF  2018-06-25           [7]     NULL         NULL       [10]
GHI  2018-06-28           NULL    NULL         [7]        NULL

来源：https://stackoverflow.com/questions/52230219/python-pandas-dataframe-pivot-only-works-with-pivot-table-but-not-with-set-ind

标签

python

pandas

pivot-table

attributeerror