Pandas pivot table ValueError: Index contains duplicate entries, cannot reshape

死守一世寂寞 2021-01-03 03:37

I have a dataframe as shown below (top 3 rows):

Sample_Name Sample_ID   Sample_Type IS  Component_Name  IS_Name Component_Group_Name    Outlier_Reasons Actua         


        
2 answers
  • 2021-01-03 04:08

    You should be able to do this with pandas.pivot_table(), which is described in the pandas documentation.

    With your dataframe stored as df, use the following code:

    import pandas as pd
    df = pd.read_table('table_from_which_to_read')

    # One row per sample, one column per component; duplicate pairs are averaged by default
    new_df = pd.pivot_table(df, index=['Sample_Name'], columns='Component_Name', values='Calculated_Concentration')
    

    If you want something other than the mean of the concentration value, you will need to change the aggfunc parameter.
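
    As a rough illustration of how aggfunc changes the result, here is a minimal sketch. The small frame is made-up data, not the asker's table; only the column names are borrowed from the question.

```python
import pandas as pd

# Illustrative data (an assumption, not the asker's table):
# the pair ('foo', 'x') appears twice, so some aggregation is needed.
df = pd.DataFrame({
    "Sample_Name": ["foo", "foo", "foo", "bar"],
    "Component_Name": ["x", "x", "y", "x"],
    "Calculated_Concentration": [1.0, 3.0, 5.0, 7.0],
})

# Default aggfunc is the mean: the two foo/x values are averaged.
mean_df = pd.pivot_table(
    df,
    index="Sample_Name",
    columns="Component_Name",
    values="Calculated_Concentration",
)

# Passing aggfunc="max" keeps the largest value of each duplicate pair instead.
max_df = pd.pivot_table(
    df,
    index="Sample_Name",
    columns="Component_Name",
    values="Calculated_Concentration",
    aggfunc="max",
)

print(mean_df)
print(max_df)
```

    Pairs that never occur in the data (here bar/y) come out as NaN in both tables.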

    EDIT

    Since you don't want to aggregate over the values, you can reshape the data by calling set_index on your DataFrame (see the pandas documentation for DataFrame.set_index).

    import pandas as pd
    df = pd.DataFrame({'NonUniqueLabel': ['Item1', 'Item1', 'Item1', 'Item2'],
                       'SemiUniqueValue': ['X', 'Y', 'Z', 'X'],
                       'Value': [1.0, 100, 5, None]})

    # Move both label columns into a two-level index
    new_df = df.set_index(['NonUniqueLabel', 'SemiUniqueValue'])
    

    The resulting table should match the layout you expect and will have a MultiIndex.
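
    If a wide layout is the goal, one possible follow-up (a sketch that goes slightly beyond the steps described here) is to call unstack() on the indexed values. It uses the same toy frame, with the second column named 'SemiUniqueValue', and it assumes every index pair is unique; with repeated pairs unstack() raises the same duplicate-entries error as pivot().

```python
import pandas as pd

# Toy frame from the example; None becomes NaN in the float column.
df = pd.DataFrame({
    "NonUniqueLabel": ["Item1", "Item1", "Item1", "Item2"],
    "SemiUniqueValue": ["X", "Y", "Z", "X"],
    "Value": [1.0, 100, 5, None],
})

# Long form: one row per (label, value-name) pair, held in a MultiIndex.
new_df = df.set_index(["NonUniqueLabel", "SemiUniqueValue"])

# Wide form: move the inner index level into the columns.
wide = new_df["Value"].unstack()

print(new_df)
print(wide)
```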

  • 2021-01-03 04:24

    You can use groupby() and unstack() to get around the error you're seeing with pivot().

    Here's some example data, with a few edge cases added and some column values removed or substituted to keep it a minimal, complete, verifiable example (MCVE):

    # df
           Sample_Name  Sample_ID     IS  Component_Name  Calculated_Concentration  Outlier_Reasons
    Index
    1              foo        NaN   True               x                       NaN              NaN
    1              foo        NaN   True               y                       NaN              NaN
    2              foo        NaN  False               z                 125.92766              NaN
    2              bar        NaN  False               x                      1.00              NaN
    2              bar        NaN  False               y                      2.00              NaN
    2              bar        NaN  False               z                       NaN              NaN
    
    (df.groupby(['Sample_Name','Component_Name'])
       .Calculated_Concentration
       .first()
       .unstack()
    )
    

    Output:

    Component_Name    x   y          z
    Sample_Name                       
    bar             1.0 2.0        NaN
    foo             NaN NaN  125.92766
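
    To see why pivot() fails and which rows cause it, a small sketch may help. The frame below is hypothetical data with one deliberately repeated (Sample_Name, Component_Name) pair. Note that first() skips NaN and keeps the first non-null value in each group, so check that this is the value you actually want.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the pair ('foo', 'x') occurs twice.
dup = pd.DataFrame({
    "Sample_Name": ["foo", "foo", "bar"],
    "Component_Name": ["x", "x", "y"],
    "Calculated_Concentration": [np.nan, 4.0, 2.0],
})

# pivot() refuses to choose between the two foo/x rows.
try:
    dup.pivot(index="Sample_Name", columns="Component_Name",
              values="Calculated_Concentration")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

# List every row that belongs to a repeated pair.
dupes = dup[dup.duplicated(["Sample_Name", "Component_Name"], keep=False)]

# groupby + first() resolves each pair to its first non-null value.
wide = (dup.groupby(["Sample_Name", "Component_Name"])["Calculated_Concentration"]
           .first()
           .unstack())

print(pivot_error)
print(dupes)
print(wide)
```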
    