Stacked bar plot in matplotlib with a label on each section

眼角桃花 2020-11-27 05:59

I am trying to replicate the following image in matplotlib, and it seems barh is my only option. However, it appears that you can't stack barh graphs.

2 answers
  • 2020-11-27 06:16
    • The easiest way to plot a horizontal or vertical stacked bar is to load the data into a pandas.DataFrame
      • This will plot and annotate correctly, even when not all categories ('People') have all segments (e.g. some value is 0 or NaN)
    • Once the data is in the dataframe:
      1. It's easier to manipulate and analyze
      2. It can be plotted with the matplotlib engine, using:
        • pandas.DataFrame.plot.barh
          • label_text = f'{width}' for annotations
        • pandas.DataFrame.plot.bar
          • label_text = f'{height}' for annotations
          • SO: Vertical Stacked Bar Chart with Centered Labels
    • These methods return a matplotlib.axes.Axes or a numpy.ndarray of them.
    • The .patches attribute of the Axes holds a list of matplotlib.patches.Rectangle objects, one for each of the sections of the stacked bar.
      • Each .Rectangle has methods for extracting the various values that define the rectangle.
      • Each .Rectangle is in order from left to right and bottom to top, so all the .Rectangle objects for each level appear in order when iterating through .patches.
    • The labels are made using an f-string, label_text = f'{width:.2f}%', so any additional text can be added as needed.
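To make the description of .patches concrete, here is a minimal sketch that inspects the rectangles of a small stacked barh plot. The dataframe `toy` and its values are made up for illustration and are not part of the original answer; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import pandas as pd

# hypothetical toy data: 3 people, 2 segments each
toy = pd.DataFrame({"Female": [1.0, 2.0, 3.0], "Male": [4.0, 5.0, 6.0]},
                   index=["A", "B", "C"])
ax = toy.plot.barh(stacked=True)

# one Rectangle per (row, column) cell of the dataframe
widths = [rect.get_width() for rect in ax.patches]  # data values
lefts = [rect.get_x() for rect in ax.patches]       # where each segment starts

print(len(ax.patches))
print(widths)
print(lefts)
```

In this toy case there should be six rectangles: the three 'Female' segments first (each starting at 0), followed by the three 'Male' segments, each starting where the matching 'Female' segment ends.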

    Create a DataFrame

    import pandas as pd
    import numpy as np
    
    # create sample data as shown in the OP
    np.random.seed(365)
    people = ('A','B','C','D','E','F','G','H')
    bottomdata = 3 + 10 * np.random.rand(len(people))
    topdata = 3 + 10 * np.random.rand(len(people))
    
    # create the dataframe
    df = pd.DataFrame({'Female': bottomdata, 'Male': topdata}, index=people)
    
    # display(df)
       Female   Male
    A   12.41   7.42
    B    9.42   4.10
    C    9.85   7.38
    D    8.89  10.53
    E    8.44   5.92
    F    6.68  11.86
    G   10.67  12.97
    H    6.05   7.87
    

    Plot and Annotate

    • Plotting the bar is 1 line; the remainder is annotating the rectangles
    import matplotlib.pyplot as plt
    
    # plot the dataframe with 1 line
    ax = df.plot.barh(stacked=True, figsize=(8, 6))
    
    # .patches is everything inside of the chart
    for rect in ax.patches:
        # Find where everything is located
        height = rect.get_height()
        width = rect.get_width()
        x = rect.get_x()
        y = rect.get_y()
        
        # For horizontal bars, the width of the rectangle is the data value and can be used as the label
        label_text = f'{width:.2f}%'  # f'{width:.2f}' to format decimal values
        
        # ax.text(x, y, text)
        label_x = x + width / 2
        label_y = y + height / 2
        
        # only plot labels greater than given width
        if width > 0:
            ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)
    
    # move the legend
    ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    
    # add labels
    ax.set_ylabel("People", fontsize=18)
    ax.set_xlabel("Percent", fontsize=18)
    plt.show()
    

    Example with Missing Segment

    # set one of the dataframe values to 0
    df.iloc[4, 1] = 0
    
    • Note the annotations are all in the correct location from df.
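As an alternative to looping over .patches by hand, newer matplotlib versions offer Axes.bar_label. The sketch below assumes matplotlib 3.4 or later (where bar_label and BarContainer.datavalues were added) and rebuilds the same dataframe as above, including the zeroed segment. The list `all_labels` is a name introduced here only to collect the returned text objects.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import numpy as np
import pandas as pd

# same sample data as above
np.random.seed(365)
people = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H')
df = pd.DataFrame({'Female': 3 + 10 * np.random.rand(len(people)),
                   'Male': 3 + 10 * np.random.rand(len(people))}, index=people)
df.iloc[4, 1] = 0  # missing segment

ax = df.plot.barh(stacked=True, figsize=(8, 6))

all_labels = []
# ax.containers holds one BarContainer per dataframe column
for container in ax.containers:
    # blank label for zero-width segments
    labels = [f'{w:.2f}%' if w > 0 else '' for w in container.datavalues]
    all_labels.extend(
        ax.bar_label(container, labels=labels, label_type='center', fontsize=8))

ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
```

Passing label_type='center' places each label in the middle of its segment, which matches what the manual x + width / 2 calculation does.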

  • 2020-11-27 06:24

    Edit 2: for more heterogeneous data. (I've left the above method since I find it more usual to work with the same number of records per series)

    Answering the two parts of the question:

    a) barh returns a container of handles to all the patches that it drew. You can use the coordinates of the patches to aid the text positions.

    b) Following these two answers to the question that I noted before (see Horizontal stacked bar chart in Matplotlib), you can stack bar graphs horizontally by setting the 'left' input.

    and additionally c) handling data that is less uniform in shape.
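Points a) and b) can be sketched in their smallest form as follows. The arrays `first` and `second` hold made-up values and are not from the original answer; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

people = ('A', 'B', 'C')
y_pos = np.arange(len(people))
first = np.array([3.0, 5.0, 2.0])   # hypothetical values
second = np.array([4.0, 1.0, 6.0])  # hypothetical values

fig, ax = plt.subplots()
# a) each barh call returns a container of the patches it drew
bars_first = ax.barh(y_pos, first, align='center')
# b) stacking: start the second series where the first one ends
bars_second = ax.barh(y_pos, second, left=first, align='center')

for bars in (bars_first, bars_second):
    for patch in bars:
        # centre of the rectangle, derived from the patch coordinates
        x = patch.get_x() + patch.get_width() / 2
        y = patch.get_y() + patch.get_height() / 2
        ax.text(x, y, f'{patch.get_width():g}', ha='center', va='center')

ax.set_yticks(y_pos)
ax.set_yticklabels(people)
```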

    One way to handle data that is less uniform in shape is simply to process each segment independently.

    import numpy as np
    import matplotlib.pyplot as plt
    
    # some labels for each row
    people = ('A','B','C','D','E','F','G','H')
    r = len(people)
    
    # how many data points overall (average of 3 per person)
    n = r * 3
    
    # which person does each segment belong to?
    rows = np.random.randint(0, r, (n,))
    # how wide is the segment?
    widths = np.random.randint(3,12, n,)
    # what label to put on the segment (xrange in py2.7, range for py3)
    labels = range(n)
    colors ='rgbwmc'
    
    patch_handles = []
    
    fig = plt.figure(figsize=(10,8))
    ax = fig.add_subplot(111)
    
    
    
    left = np.zeros(r,)
    row_counts = np.zeros(r,)
    
    for (r, w, l) in zip(rows, widths, labels):
        print(r, w, l)  # print() call works on Python 3
        patch_handles.append(ax.barh(r, w, align='center', left=left[r],
            color=colors[int(row_counts[r]) % len(colors)]))
        left[r] += w
        row_counts[r] += 1
        # we know there is only one patch but could enumerate if expanded
        patch = patch_handles[-1][0] 
        bl = patch.get_xy()
        x = 0.5*patch.get_width() + bl[0]
        y = 0.5*patch.get_height() + bl[1]
        ax.text(x, y, "%d%%" % (l), ha='center',va='center')
      
    y_pos = np.arange(len(people))
    ax.set_yticks(y_pos)
    ax.set_yticklabels(people)
    ax.set_xlabel('Distance')
    
    plt.show()
    

    This produces a graph of heterogeneous horizontal bars, with a different number of segments present in each series.

    Note that this is not particularly efficient, since each segment uses an individual call to ax.barh. There may be more efficient methods (e.g. padding a matrix with zero-width segments or nan values), but this is likely to be problem-specific and is a distinct question.
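The padding idea mentioned above could look roughly like this. It is only a sketch under assumptions: the dictionary `segments` and its values are hypothetical, and zero padding is one of several possible choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

people = ('A', 'B', 'C')
# hypothetical ragged data: a different number of segments per person
segments = {'A': [4, 6], 'B': [3], 'C': [5, 2, 7]}

# pad every row with zero-width segments up to the longest row
max_len = max(len(v) for v in segments.values())
padded = np.zeros((len(people), max_len))
for i, p in enumerate(people):
    padded[i, :len(segments[p])] = segments[p]

fig, ax = plt.subplots()
y_pos = np.arange(len(people))
left = np.zeros(len(people))
for j in range(max_len):
    # one barh call per segment position instead of one per segment
    bars = ax.barh(y_pos, padded[:, j], left=left, align='center')
    for patch in bars:
        w = patch.get_width()
        if w > 0:  # skip the zero-width padding
            ax.text(patch.get_x() + w / 2,
                    patch.get_y() + patch.get_height() / 2,
                    f'{w:g}', ha='center', va='center')
    left += padded[:, j]

ax.set_yticks(y_pos)
ax.set_yticklabels(people)
```

This reduces the number of ax.barh calls from one per segment to one per column of the padded matrix, at the cost of drawing a few invisible zero-width rectangles.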


    Edit: updated to answer both parts of the question.

    import numpy as np
    import matplotlib.pyplot as plt
    
    people = ('A','B','C','D','E','F','G','H')
    segments = 4
    
    # generate some multi-dimensional data & arbitrary labels
    data = 3 + 10* np.random.rand(segments, len(people))
    percentages = (np.random.randint(5,20, (len(people), segments)))
    y_pos = np.arange(len(people))
    
    fig = plt.figure(figsize=(10,8))
    ax = fig.add_subplot(111)
    
    colors ='rgbwmc'
    patch_handles = []
    left = np.zeros(len(people)) # left alignment of data starts at zero
    for i, d in enumerate(data):
        patch_handles.append(ax.barh(y_pos, d, 
          color=colors[i%len(colors)], align='center', 
          left=left))
        # accumulate the left-hand offsets
        left += d
        
    # go through all of the bar segments and annotate
    for j in range(len(patch_handles)):
        for i, patch in enumerate(patch_handles[j].get_children()):
            bl = patch.get_xy()
            x = 0.5*patch.get_width() + bl[0]
            y = 0.5*patch.get_height() + bl[1]
            ax.text(x, y, "%d%%" % (percentages[i, j]), ha='center', va='center')
    
    ax.set_yticks(y_pos)
    ax.set_yticklabels(people)
    ax.set_xlabel('Distance')
    
    plt.show()
    

    You can achieve a result along these lines (note: the percentages I used have nothing to do with the bar widths, as the relationship in the example seems unclear):

    example output

    See Horizontal stacked bar chart in Matplotlib for some ideas on stacking horizontal bar plots.

