How does one insert statistical annotations (stars or p-values) into matplotlib / seaborn plots?

北海茫月 2020-12-07 12:10

This seems like a trivial question, but I've been searching for a while and can't seem to find an answer. It also seems like something that should be a standard part of th

2 Answers
  • 2020-12-07 12:48

    Here is how to add a statistical annotation to a Seaborn box plot:

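    import matplotlib.pyplot as plt
    import seaborn as sns
    
    tips = sns.load_dataset("tips")
    sns.boxplot(x="day", y="total_bill", data=tips, palette="PRGn")
    
    # statistical annotation
    x1, x2 = 2, 3   # columns 'Sat' and 'Sun' (first column: 0, see plt.xticks())
    # y: bracket base, just above the highest data point; h: bracket height; col: color
    y, h, col = tips['total_bill'].max() + 2, 2, 'k'
    # draw the bracket: up from x1, across to x2, back down
    plt.plot([x1, x1, x2, x2], [y, y+h, y+h, y], lw=1.5, c=col)
    # center the label above the bracket ("ns" is typed in by hand, not computed)
    plt.text((x1+x2)*.5, y+h, "ns", ha='center', va='bottom', color=col)
    
    plt.show()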
    

    The result is a box plot with a bracket drawn above the 'Sat' and 'Sun' boxes, labeled "ns".
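
The "ns" label in the snippet above is typed in by hand. If you already have a p-value from a test of your choice, a small helper can pick the label and draw the bracket for you. This is only a sketch: `p_to_stars` and `annotate_pair` are names made up for this example, and the star thresholds follow one common convention, not a fixed standard, so adjust them to whatever your field uses.

```python
def p_to_stars(p):
    """Map a p-value to a significance label.

    The thresholds are an assumed convention (0.05 / 0.01 / 0.001 / 0.0001).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value must be in [0, 1], got {p!r}")
    for threshold, label in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")):
        if p <= threshold:
            return label
    return "ns"


def annotate_pair(ax, x1, x2, y, h, p, color="k"):
    """Draw a bracket between x1 and x2 on a matplotlib Axes and label it.

    y is the bracket base, h its height, p the p-value to display as stars.
    """
    # same bracket shape as in the answer: up from x1, across, down to x2
    ax.plot([x1, x1, x2, x2], [y, y + h, y + h, y], lw=1.5, c=color)
    # label centered just above the bracket
    ax.text((x1 + x2) * 0.5, y + h, p_to_stars(p),
            ha="center", va="bottom", color=color)
```

With the box plot from the answer, something like `annotate_pair(plt.gca(), 2, 3, tips['total_bill'].max() + 2, 2, p)` should reproduce the same bracket, where `p` could come from, for example, `scipy.stats.mannwhitneyu` applied to the 'Sat' and 'Sun' bills.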

  • 2020-12-07 12:56

    You may also want to add several annotations to different pairs of boxes. In that case, it helps to handle the vertical placement of the lines and texts automatically. I and other contributors wrote a small function for this (see GitHub repo), which stacks the lines on top of one another without overlap. Annotations can be placed either inside or outside the plot, and several statistical tests are implemented: Mann-Whitney and t-test (independent and paired). Here is a minimal example.

    import matplotlib.pyplot as plt
    import seaborn as sns
    from statannot import add_stat_annotation
    
    sns.set(style="whitegrid")
    df = sns.load_dataset("tips")
    
    x = "day"
    y = "total_bill"
    order = ['Sun', 'Thur', 'Fri', 'Sat']
    ax = sns.boxplot(data=df, x=x, y=y, order=order)
    add_stat_annotation(ax, data=df, x=x, y=y, order=order,
                        box_pairs=[("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")],
                        test='Mann-Whitney', text_format='star', loc='outside', verbose=2)
    

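    With a hue variable, each box is identified by a (category, hue level) tuple (this reuses the imports and df from above):

    x = "day"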
    y = "total_bill"
    hue = "smoker"
    ax = sns.boxplot(data=df, x=x, y=y, hue=hue)
    add_stat_annotation(ax, data=df, x=x, y=y, hue=hue,
                        box_pairs=[(("Thur", "No"), ("Fri", "No")),
                                     (("Sat", "Yes"), ("Sat", "No")),
                                     (("Sun", "No"), ("Thur", "Yes"))
                                    ],
                        test='t-test_ind', text_format='full', loc='inside', verbose=2)
    plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
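
For anyone curious how stacking without overlap can work in principle, here is a rough sketch of one way to do it. To be clear, this is not the code from the package above: `stack_brackets` is a made-up name, and the sketch assumes a single baseline `y_start` above all boxes instead of looking at the height of each box. It places narrow brackets first and lifts a bracket one level for every already-placed bracket whose x-range it touches.

```python
def stack_brackets(pairs, y_start, step):
    """Assign a y position to each (x1, x2) pair so brackets do not collide.

    pairs:   iterable of (x1, x2) box positions (0 is the first box)
    y_start: y of the lowest bracket level (assumed to be above all boxes)
    step:    vertical distance between two levels
    Returns a list of (lo, hi, y) in placement order (narrowest first).
    """
    placed = []   # (lo, hi, level) of brackets already positioned
    result = []
    # narrow brackets first, so they end up closest to the boxes
    for x1, x2 in sorted(pairs, key=lambda p: abs(p[1] - p[0])):
        lo, hi = min(x1, x2), max(x1, x2)
        # levels taken by brackets whose x-range touches this one
        used = {lvl for plo, phi, lvl in placed if lo <= phi and plo <= hi}
        level = 0
        while level in used:
            level += 1
        placed.append((lo, hi, level))
        result.append((lo, hi, y_start + level * step))
    return result
```

For the first example above (order Sun, Thur, Fri, Sat, so the three box pairs are positions (1, 2), (1, 3) and (2, 0)), every bracket touches the others and each one gets its own level. Each returned `(lo, hi, y)` can then be drawn with the same `plt.plot` / `plt.text` calls as in the other answer.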
    
