Tweaking seaborn.boxplot

清歌不尽 2021-01-31 03:23

I would like to compare a set of distributions of scores (score), grouped by some categories (centrality) and colored by some other (model

2 Answers
  • 2021-01-31 04:07

    It has been a while since this question had any activity, but I'll address the OP's question about the odd-looking lower bounds for anyone who needs help in the future.

    Once you set the y-axis to a logarithmic scale, y = 0 can no longer be represented, because log(y) tends to -inf as y approaches 0.

    Therefore, when the values in the lower part of a box are zero or very close to it, the lower edge of the box cannot be drawn (or lies far below the visible range), and the box looks as if it were 'cut in half'.

    Needless to say, negative y values cannot be represented on a logarithmic scale either.
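
    To make this concrete, here is a minimal sketch using only the standard library. The scores are made up, the helper name log_axis_positions is hypothetical, and the whiskers are simplified to the plain minimum and maximum (matplotlib's default whiskers follow the 1.5 × IQR rule instead). It reports where each box statistic would land on a log10 axis, with None marking the ones that have no finite position.

```python
import math
import statistics


def log_axis_positions(values):
    """Map simplified box statistics to their log10-axis positions.

    A position of None means the statistic is <= 0 and therefore
    has no finite position on a logarithmic axis.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    stats = {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }
    return {
        name: (math.log10(value) if value > 0 else None)
        for name, value in stats.items()
    }


# Hypothetical scores: several zeros plus a few larger values.
scores = [0, 0, 0, 1, 10, 100, 1000]
positions = log_axis_positions(scores)
for name, position in positions.items():
    print(name, position)
```

    With data like this, the lower whisker and the lower edge of the box both come out as None, which is the situation in which the box appears truncated on a log axis.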

  • 2021-01-31 04:08

    Outlier display

    You should be able to pass any arguments to seaborn.boxplot that you can pass to plt.boxplot (see documentation), so you could adjust the display of the outliers by setting flierprops. Here are some examples of what you can do with your outliers.

    If you don't want to display them, you could do

    import seaborn

    seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                    showfliers=False)
    

    or you could make them light gray like so:

    flierprops = dict(markerfacecolor='0.75', markersize=5,
                  linestyle='none')
    seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                    flierprops=flierprops)
    

    Order of groups

    You can set the order of the groups manually with hue_order, e.g.

    seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                    hue_order=["original", "Havel..","etc"])
    

    Scaling of y-axis

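    You could get the minimum and maximum of all y-values and set the y-axis limits accordingly. seaborn.boxplot returns a matplotlib Axes, so the limits are set on that object rather than passed as a keyword argument. Something like this:

    import numpy as np

    y_values = data["score"].values
    ax = seaborn.boxplot(x="centrality", y="score", hue="model", data=data)
    ax.set_ylim(np.min(y_values), np.max(y_values))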
    

    EDIT: This last point doesn't really make sense, since the automatic y-axis limits already include all the values, but I'm leaving it as an example of how to adjust these settings. As mentioned in the comments, log-scaling probably makes more sense.
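
    For reference, a rough sketch of the log-scaling route. The DataFrame here is a synthetic stand-in for the question's data (the category names and the lognormal scores are assumptions, chosen so that every value is strictly positive), and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import numpy as np
import pandas as pd
import seaborn

# Synthetic stand-in for the question's data (hypothetical values).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "centrality": np.repeat(["degree", "closeness"], 100),
    "model": np.tile(np.repeat(["original", "random"], 50), 2),
    "score": rng.lognormal(mean=0.0, sigma=2.0, size=200),
})

# seaborn.boxplot returns the matplotlib Axes it drew on,
# so the scale can be switched on that object afterwards.
ax = seaborn.boxplot(x="centrality", y="score", hue="model", data=data)
ax.set_yscale("log")

# To view the result, save it with ax.figure.savefig(...) or use an
# interactive backend and call matplotlib.pyplot.show().
```

    If the real scores contain zeros, ax.set_yscale("symlog") is one option worth trying, since it keeps a linear region around zero.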
