Sorting the order of bars in pandas/matplotlib bar plots

醉酒成梦 2020-12-08 02:45

What is the Pythonic/pandas way of sorting 'levels' within a column to give a specific ordering of bars in a bar plot?

For example, given:

         


        
3 answers
  • 2020-12-08 03:20

    You'll have to provide a mapping to specify how to order the day names. (If they were stored as proper dates, there would be other ways to do this.)
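
    As a rough sketch of the "proper dates" case mentioned above: if the column held real timestamps, the weekday number from `dt.dayofweek` could serve as the sort key directly. This is an assumption for illustration only; the `date` column and the `df_dates` frame are hypothetical and not part of the question's data. The `key` argument of `sort_values` needs pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical data: a real datetime column instead of day-name strings.
df_dates = pd.DataFrame({
    'date': pd.to_datetime(['2020-12-11', '2020-12-07', '2020-12-09']),
    'amount': [4, 1, 2],
})

# dt.dayofweek gives Monday=0 ... Sunday=6, so it sorts in weekday order.
df_dates = df_dates.sort_values('date', key=lambda s: s.dt.dayofweek)

# Derive readable labels for the x axis after sorting.
df_dates['day'] = df_dates['date'].dt.day_name()

# df_dates.plot(kind='bar', x='day', y='amount')  # uncomment to draw
```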

    Updated:

    Build the key. You could write out a dictionary explicitly or use something clever like this dict comprehension.

    weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
    mapping = {day: i for i, day in enumerate(weekdays)}
    key = df['day'].map(mapping)
    

    And the sorting is simple:

    df.iloc[key.argsort()]
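
    For reference, here is a self-contained sketch of the steps above. The small frame is a stand-in, since the question's own example data is not shown. It also shows what should be an equivalent spelling using the `key` argument of `sort_values`, which assumes pandas 1.1 or later.

```python
import pandas as pd

# Stand-in frame with the days deliberately out of order.
df = pd.DataFrame({
    'day': ['Fri', 'Mon', 'Sun', 'Weds', 'Tues', 'Thurs', 'Sat'],
    'amount': [4, 1, 1, 2, 2, 2, 1],
})

weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}

# Positional reordering, as in the answer: map names to ranks, then argsort.
by_argsort = df.iloc[df['day'].map(mapping).argsort()]

# Same ordering via sort_values with a key function (pandas >= 1.1).
by_key = df.sort_values('day', key=lambda s: s.map(mapping))

# by_key.plot(kind='bar', x='day', y='amount')  # uncomment to draw
```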
    
  • 2020-12-08 03:26

    The code below extends Dan's answer to address the "FURTHER GENERALIZATION" section of the OP's question. First, a complete example for the simple case (just one variable), based on Dan's solution:

    import pandas as pd
    
    # Create dataframe 
    df=pd.DataFrame({
        'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
        'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
        'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]
    })
    
    
    # Calculate the total amount for each day
    # (select 'amount' before summing so the string column 'group' is not aggregated)
    df_grouped = df.groupby('day')['amount'].sum().reset_index()
    
    # Use Dan's trick to order days names in the table created by groupby
    weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
    mapping = {day: i for i, day in enumerate(weekdays)}
    key = df_grouped['day'].map(mapping)    
    df_grouped = df_grouped.iloc[key.argsort()]
    
    # Draw the bar chart
    df_grouped.plot(kind='bar', x='day')
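
    A shorter variant of the same idea, offered as a sketch rather than as part of Dan's method: `groupby` leaves `day` in the index of the summed Series, so `reindex` can impose the weekday order without building a mapping. The name `totals` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a'] * 7 + ['b'] * 7,
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3],
})
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']

# Sum per day (index is 'day'), then reorder the index to weekday order.
totals = df.groupby('day')['amount'].sum().reindex(weekdays)

# totals.plot(kind='bar')  # uncomment to draw
```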
    

    And now, we use the same ordering technique to order the rows of the pivot table (instead of the rows created by groupby).

    import pandas as pd
    
    # Create dataframe 
    df=pd.DataFrame({
        'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
        'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
        'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]
    })
    
    # Get the amount for each day AND EACH GROUP
    df_grouped = df.groupby(['group', 'day'])['amount'].sum().reset_index()
    
    # Create a pivot table with one column per group, the format pandas needs to plot multiple series
    # (pivot arguments must be passed by keyword in pandas 2.0 and later)
    df_pivot = df_grouped.pivot(index='day', columns='group', values='amount').reset_index()
    
    # Use Dan's trick to order days names in the table created by PIVOT (not the table created by groupby, in the previous example)
    weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
    mapping = {day: i for i, day in enumerate(weekdays)}
    key = df_pivot['day'].map(mapping)    
    df_pivot = df_pivot.iloc[key.argsort()]
    
    # Draw the bar chart
    df_pivot.plot(kind='bar', x='day')
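
    An alternative worth considering, swapped in here in place of the mapping trick: declare `day` as an ordered `pd.Categorical` once, and grouping and unstacking should then follow the category order on their own. This is a sketch under that assumption, not the approach used above.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a'] * 7 + ['b'] * 7,
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3],
})
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']

# Declare the order once; sorting and grouping then respect it.
df['day'] = pd.Categorical(df['day'], categories=weekdays, ordered=True)

# observed=True keeps only the day/group combinations present in the data.
df_pivot = (
    df.groupby(['day', 'group'], observed=True)['amount']
      .sum()
      .unstack('group')
)

# df_pivot.plot(kind='bar')  # uncomment to draw
```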
    

    The result is a grouped bar chart with the days in weekday order and one bar per group for each day.

  • 2020-12-08 03:37

    I know this response is late, but a simpler solution to the two cases presented, without a dictionary or mapping, is shown below.

    Setting 'day' as the index lets you use .loc to select rows in a specific order.

    1) For the two separate plots

    import pandas as pd
    
    df=pd.DataFrame({'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
         'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
         'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]})
    
    order = ['Mon', 'Tues', 'Weds','Thurs','Fri','Sat','Sun']
    df.set_index('day').loc[order].groupby('group').plot(kind='bar')
    

    2) For the pivot example with the dodged plot:

    order = ['Mon', 'Tues', 'Weds','Thurs','Fri','Sat','Sun']
    # pivot arguments must be passed by keyword in pandas 2.0 and later
    df.pivot(index='day', columns='group', values='amount').loc[order].plot(kind='bar')
    

    Note that pivot already puts day in the index, so you can use .loc here again.
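
    One caveat, sketched with hypothetical data: `.loc` with a list of labels raises `KeyError` in pandas 1.0 and later if any label is absent from the index, for example when no rows exist for a given day. `reindex` accepts the same list and fills the missing rows with NaN instead. The frame `df_partial` below is made up for illustration.

```python
import pandas as pd

# Hypothetical data with no 'Sun' rows at all.
df_partial = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b'],
    'day': ['Tues', 'Mon', 'Mon', 'Tues'],
    'amount': [2, 1, 1, 3],
})
order = ['Mon', 'Tues', 'Sun']

pivoted = df_partial.pivot(index='day', columns='group', values='amount')

# .loc raises KeyError because 'Sun' is not in the index.
try:
    pivoted.loc[order]
    loc_failed = False
except KeyError:
    loc_failed = True

# reindex keeps the requested order and fills the missing day with NaN.
reordered = pivoted.reindex(order)

# reordered.plot(kind='bar')  # uncomment to draw; 'Sun' appears as an empty slot
```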

    Edit: use .loc rather than .ix in these solutions. .ix was deprecated in pandas 0.20 and removed in pandas 1.0, and it could give surprising results when column names and index labels were numbers.
