Pivoting a Pandas Dataframe containing strings - 'No numeric types to aggregate' error

前端 未结 2 2060
有刺的猬
有刺的猬 2020-12-01 11:09

There is a good number of questions about this error, but after looking around I\'m still not able to find/wrap my mind around a solution yet. I\'m trying to pivot a data fr

相关标签:
2条回答
  • 2020-12-01 11:48

    There are several ways.

    1

    df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate(lambda x: x).unstack().reset_index()
    df1.columns=df1.columns.tolist()
    print (df1)
    

    2

    df1 = df.set_index(["id","contact_id","Network_Name","question"])['response_answer'].unstack().reset_index()
    df1.columns=df1.columns.tolist()
    print (df1)
    

    3

    df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate('first').unstack().reset_index()
    df1.columns=df1.columns.tolist()
    print (df1)
    

    4

    df1 = df.pivot_table(index=["id","contact_id","Network_Name"], columns='question', values=['response_answer'], aggfunc='first')
    df1.columns = df1.columns.droplevel()
    df1 = df1.reset_index()
    df1.columns=df1.columns.tolist()
    print (df1)
    

    Same ans.

        id  contact_id  Network_Name       City State Trip_End_Location
    0   16      137519          2206       None    Ca              None
    1   17      137520          2206       None    Ca              None
    2   18      137521          2206       None    Ca              None
    3   19      137522          2206       None    Ca              None
    4   20      137523          2208  Lancaster  None              None
    5   21      137524          2208  Lancaster  None              None
    6   22      137525          2208  Lancaster  None              None
    7   23      137526          2208  Lancaster  None              None
    8   24      137527          2208       None  None              Home
    9   25      137528          2208       None  None              Home
    10  26      137529          2208       None  None              Home
    11  27      137530          2208       None  None              Home
    
    0 讨论(0)
  • 2020-12-01 12:00

    The default aggfunc in pivot_table is np.sum and it doesn't know what to do with strings and you haven't indicated what the index should be properly. Trying something like:

    pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],
                                         columns='question', 
                                         values='response_answer',
                                         aggfunc=lambda x: ' '.join(x))
    

    This explicitly sets one row per id, contact_id pair and pivots the set of response_answer values on question. The aggfunc just assures that if you have multiple answers to the same question in the raw data that we just concatenate them together with spaces. The syntax of pivot_table might vary depending on your pandas version.

    Here's a quick example:

    In [24]: import pandas as pd
    
    In [25]: import random
    
    In [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})
    
    In [27]: df.head()
    Out[27]:
         id question response
    0  3100        1      116
    1  4500        2      113
    2  5000        1      120
    3  3900        2      103
    4  4300        0      117
    
    In [28]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 100 entries, 0 to 99
    Data columns (total 3 columns):
    id          100 non-null int64
    question    100 non-null object
    response    100 non-null object
    dtypes: int64(1), object(2)
    memory usage: 3.1+ KB
    
    In [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()
    Out[29]:
    question        0        1    2        3
    id
    1000      110 120      NaN  100      NaN
    1100          NaN  106 108  104      NaN
    1200      104 113      119  NaN      101
    1300          102      NaN  116  108 120
    1400          NaN      NaN  116      NaN
    
    0 讨论(0)
提交回复
热议问题