Making pyplot.hist() first and last bins include outliers

余生分开走 2021-02-12 14:30

The pyplot.hist() documentation specifies that when setting a range for a histogram, "lower and upper outliers are ignored".
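A minimal sketch of that behavior, using np.histogram (which pyplot.hist() relies on for binning). The sample data and the (0, 10) range are made up for illustration:

```python
import numpy as np

# Hypothetical sample: nine in-range values plus one low and one high outlier.
data = np.array([-50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 200])

# Values outside `range` are not counted in any bin.
counts, edges = np.histogram(data, bins=3, range=(0, 10))

# Number of samples that fell outside the range and were dropped.
dropped = data.size - counts.sum()
print(counts, dropped)
```

Here the two values outside the range do not appear in any bin, so the counts sum to less than the number of samples.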

Is it possible to make th

2 answers
  •  情歌与酒
    2021-02-12 15:05

    I was also struggling with this, and didn't want to use .clip() because it could be misleading, so I wrote a little function (borrowing heavily from this) to indicate that the upper and lower bins contained outliers:

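    import matplotlib.pyplot as plt
    import numpy as np

    def outlier_aware_hist(data, lower=None, upper=None):
        """Plot a histogram whose first and last bars also count the outliers."""
        data = np.asarray(data)

        # Compare with None explicitly so that a bound of 0 is not discarded.
        if lower is None or lower < data.min():
            lower = data.min()
            lower_outliers = False
        else:
            lower_outliers = True
    
        if upper is None or upper > data.max():
            upper = data.max()
            upper_outliers = False
        else:
            upper_outliers = True
    
        n, bins, patches = plt.hist(data, range=(lower, upper), bins='auto')
    
        if lower_outliers:
            # Add the values below the range to the first bar and recolor it.
            n_lower_outliers = (data < lower).sum()
            patches[0].set_height(patches[0].get_height() + n_lower_outliers)
            patches[0].set_facecolor('c')
            patches[0].set_label('Lower outliers: ({:.2f}, {:.2f})'.format(data.min(), lower))
    
        if upper_outliers:
            # Add the values above the range to the last bar and recolor it.
            n_upper_outliers = (data > upper).sum()
            patches[-1].set_height(patches[-1].get_height() + n_upper_outliers)
            patches[-1].set_facecolor('m')
            patches[-1].set_label('Upper outliers: ({:.2f}, {:.2f})'.format(upper, data.max()))
    
        if lower_outliers or upper_outliers:
            plt.legend()
            # The end bars may now be taller than the autoscaled y-limits, so rescale.
            ax = plt.gca()
            ax.relim()
            ax.autoscale_view()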
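For comparison, here is a rough sketch of the .clip() approach mentioned above, shown with np.histogram so the counts are easy to inspect. The helper name clipped_histogram and the sample data are hypothetical:

```python
import numpy as np

def clipped_histogram(data, lower, upper, bins=10):
    """Count values after clamping them into [lower, upper]."""
    data = np.asarray(data, dtype=float)
    # Outliers are moved onto the nearest bound, so they land in the end bins.
    clipped = np.clip(data, lower, upper)
    return np.histogram(clipped, bins=bins, range=(lower, upper))

# Hypothetical sample with one low and one high outlier.
data = np.array([-50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 200])
counts, edges = clipped_histogram(data, 0, 10, bins=3)
print(counts)
```

Every sample is counted, but the plot gives no hint that the end bins cover values beyond the axis range, which is why this can be misleading.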
    

    You can also combine outlier_aware_hist with an automatic outlier detector (borrowed from here) like so:

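    import numpy as np

    def mad(data):
        """Median absolute deviation of the data around its median."""
        data = np.asarray(data)
        median = np.median(data)
        diff = np.abs(data - median)
        return np.median(diff)
    
    def calculate_bounds(data, z_thresh=3.5):
        """Return (lower, upper) bounds at z_thresh modified z-scores from the median."""
        MAD = mad(data)
        median = np.median(data)
        # 0.6745 scales the MAD so it is comparable to a standard deviation.
        const = z_thresh * MAD / 0.6745
        return (median - const, median + const)
    
    # data is your own 1-D array of values.
    outlier_aware_hist(data, *calculate_bounds(data))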
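To make the bounds concrete, here is a small worked example of the same MAD rule on made-up data. The function name mad_bounds and the sample values are assumptions for illustration only:

```python
import numpy as np

def mad_bounds(data, z_thresh=3.5):
    """Bounds at z_thresh modified z-scores from the median (MAD-based)."""
    data = np.asarray(data, dtype=float)
    median = np.median(data)
    mad = np.median(np.abs(data - median))
    half_width = z_thresh * mad / 0.6745
    return median - half_width, median + half_width

# Hypothetical sample: median is 3, absolute deviations are [2, 1, 0, 1, 97], so MAD is 1.
data = np.array([1, 2, 3, 4, 100])
lower, upper = mad_bounds(data)

# Values outside the bounds are the ones the end bins would absorb.
outliers = data[(data < lower) | (data > upper)]
print(lower, upper, outliers)
```

With MAD equal to 1, the half-width is 3.5 / 0.6745, roughly 5.19, so the bounds are about (-2.19, 8.19) and only 100 is flagged. Note that if more than half of the values are identical the MAD is 0 and both bounds collapse onto the median.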
    
