Circular / polar histogram in python

后端 未结 2 1883
被撕碎了的回忆
被撕碎了的回忆 2021-02-02 07:34

I have periodic data and the distribution for it is best visualised around a circle. Now the question is how can I do this visualisation using matplotlib? If not, c

2条回答
  •  情歌与酒
    2021-02-02 08:24

    Quick answer

    Use the function circular_hist() I wrote below.

    By default this function plots frequency proportional to area, not radius (the reasoning behind this decision is offered below under "longer form answer").

    def circular_hist(ax, x, bins=16, density=True, offset=0, gaps=True):
        """
        Produce a circular histogram of angles on ax.
    
        Parameters
        ----------
        ax : matplotlib.axes._subplots.PolarAxesSubplot
            axis instance created with subplot_kw=dict(projection='polar').
    
        x : array
            Angles to plot, expected in units of radians.
    
        bins : int, optional
            Defines the number of equal-width bins in the range. The default is 16.
    
        density : bool, optional
            If True plot frequency proportional to area. If False plot frequency
            proportional to radius. The default is True.
    
        offset : float, optional
            Sets the offset for the location of the 0 direction in units of
            radians. The default is 0.
    
        gaps : bool, optional
            Whether to allow gaps between bins. When gaps = False the bins are
            forced to partition the entire [-pi, pi] range. The default is True.
    
        Returns
        -------
        n : array or list of arrays
            The number of values in each bin.
    
        bins : array
            The edges of the bins.
    
        patches : `.BarContainer` or list of a single `.Polygon`
            Container of individual artists used to create the histogram
            or list of such containers if there are multiple input datasets.
        """
        # Wrap angles to [-pi, pi)
        x = (x+np.pi) % (2*np.pi) - np.pi
    
        # Force bins to partition entire circle
        if not gaps:
            bins = np.linspace(-np.pi, np.pi, num=bins+1)
    
        # Bin data and record counts
        n, bins = np.histogram(x, bins=bins)
    
        # Compute width of each bin
        widths = np.diff(bins)
    
        # By default plot frequency proportional to area
        if density:
            # Area to assign each bin
            area = n / x.size
            # Calculate corresponding bin radius
            radius = (area/np.pi) ** .5
        # Otherwise plot frequency proportional to radius
        else:
            radius = n
    
        # Plot data on ax
        patches = ax.bar(bins[:-1], radius, zorder=1, align='edge', width=widths,
                         edgecolor='C0', fill=False, linewidth=1)
    
        # Set the direction of the zero angle
        ax.set_theta_offset(offset)
    
        # Remove ylabels for area plots (they are mostly obstructive)
        if density:
            ax.set_yticks([])
    
        return n, bins, patches
    

    Example usage:

    import matplotlib.pyplot as plt
    import numpy as np
    
    angles0 = np.random.normal(loc=0, scale=1, size=10000)
    angles1 = np.random.uniform(0, 2*np.pi, size=1000)
    
    # Construct figure and axis to plot on
    fig, ax = plt.subplots(1, 2, subplot_kw=dict(projection='polar'))
    
    # Visualise by area of bins
    circular_hist(ax[0], angles0)
    # Visualise by radius of bins
    circular_hist(ax[1], angles1, offset=np.pi/2, density=False)
    

    Longer form answer

    I'd always recommend caution when using circular histograms as they can easily mislead readers.

    In particular, I'd advise staying away from circular histograms where frequency and radius are plotted proportionally. I recommend this because the mind is greatly affected by the area of the bins, not just by their radial extent. This is similar to how we're used to interpreting pie charts: by area.

    So, instead of using the radial extent of a bin to visualise the number of data points it contains, I'd recommend visualising the number of points by area.

    The problem

    Consider the consequences of doubling the number of data points in a given histogram bin. In a circular histogram where frequency and radius are proportional, the radius of this bin will increase by a factor of 2 (as the number of points has doubled). However, the area of this bin will have been increased by a factor of 4! This is because the area of the bin is proportional to the radius squared.

    If this doesn't sound like too much of a problem yet, let's see it graphically:

    Both of the above plots visualise the same data points.

    In the lefthand plot it's easy to see that there are twice as many data points in the (0, pi/4) bin than there are in the (-pi/4, 0) bin.

    However, take a look at the right hand plot (frequency proportional to radius). At first glance your mind is greatly affected by the area of the bins. You'd be forgiven for thinking there are more than twice as many points in the (0, pi/4) bin than in the (-pi/4, 0) bin. However, you'd have been misled. It is only on closer inspection of the graphic (and of the radial axis) that you realise there are exactly twice as many data points in the (0, pi/4) bin than in the (-pi/4, 0) bin. Not more than twice as many, as the graph may have originally suggested.

    The above graphics can be recreated with the following code:

    import numpy as np
    import matplotlib.pyplot as plt
    plt.style.use('seaborn')
    
    # Generate data with twice as many points in (0, np.pi/4) than (-np.pi/4, 0)
    angles = np.hstack([np.random.uniform(0, np.pi/4, size=100),
                        np.random.uniform(-np.pi/4, 0, size=50)])
    
    bins = 2
    
    fig = plt.figure()
    ax = fig.add_subplot(1, 2, 1)
    polar_ax = fig.add_subplot(1, 2, 2, projection="polar")
    
    # Plot "standard" histogram
    ax.hist(angles, bins=bins)
    # Fiddle with labels and limits
    ax.set_xlim([-np.pi/4, np.pi/4])
    ax.set_xticks([-np.pi/4, 0, np.pi/4])
    ax.set_xticklabels([r'$-\pi/4$', r'$0$', r'$\pi/4$'])
    
    # bin data for our polar histogram
    count, bin = np.histogram(angles, bins=bins)
    # Plot polar histogram
    polar_ax.bar(bin[:-1], count, align='edge', color='C0')
    
    # Fiddle with labels and limits
    polar_ax.set_xticks([0, np.pi/4, 2*np.pi - np.pi/4])
    polar_ax.set_xticklabels([r'$0$', r'$\pi/4$', r'$-\pi/4$'])
    polar_ax.set_rlabel_position(90)
    

    A solution

    Since we are so greatly affected by the area of the bins in circular histograms, I find it more effective to ensure that the area of each bin is proportional to the number of observations in it, instead of the radius. This is similar to how we are used to interpreting pie charts, where area is the quantity of interest.

    Let's use the dataset we used in the previous example to reproduce the graphics based on area, instead of radius:

    I believe readers have less chance of being misled at first glance of this graphic.

    However, when plotting a circular histogram with area proportional to radius we have the disadvantage that you'd never have known that there are exactly twice as many points in the (0, pi/4) bin than in the (-pi/4, 0) bin just by eyeballing the areas. Although, you could counter this by annotating each bin with its corresponding density. I think this disadvantage is preferable to misleading a reader.

    Of course I'd ensure that an informative caption was placed alongside this figure to explain that here we visualise frequency with area, not radius.

    The above plots were created as:

    fig = plt.figure()
    ax = fig.add_subplot(1, 2, 1)
    polar_ax = fig.add_subplot(1, 2, 2, projection="polar")
    
    # Plot "standard" histogram
    ax.hist(angles, bins=bins, density=True)
    # Fiddle with labels and limits
    ax.set_xlim([-np.pi/4, np.pi/4])
    ax.set_xticks([-np.pi/4, 0, np.pi/4])
    ax.set_xticklabels([r'$-\pi/4$', r'$0$', r'$\pi/4$'])
    
    # bin data for our polar histogram
    counts, bin = np.histogram(angles, bins=bins)
    # Normalise counts to compute areas
    area = counts / angles.size
    # Compute corresponding radii from areas
    radius = (area / np.pi)**.5
    
    polar_ax.bar(bin[:-1], radius, align='edge', color='C0')
    
    # Label angles according to convention
    polar_ax.set_xticks([0, np.pi/4, 2*np.pi - np.pi/4])
    polar_ax.set_xticklabels([r'$0$', r'$\pi/4$', r'$-\pi/4$'])
    

提交回复
热议问题