Allocate scatter plot into specific bins

隐瞒了意图╮ 2021-01-23 08:49

I have a scatter plot that gets sorted into 4 Bins. These are separated by two arcs and a line in the middle (see figure belo

2 Answers
  •  鱼传尺愫
    2021-01-23 09:52

    Patches provide containment tests: contains_point for a single point, and contains_points for an array of points.
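A minimal, self-contained sketch of both calls follows. The ellipse geometry, the axis limits and the test points are made up for illustration and are not taken from the question. Both methods test against display (pixel) coordinates, so data coordinates go through ax.transData first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Ellipse

fig, ax = plt.subplots()
ax.set_xlim(-100, 100)
ax.set_ylim(-40, 140)

# Hypothetical ellipse: centre (0, 50), full width 140, full height 110.
ell = Ellipse((0, 50), 140, 110, alpha=0.3)
ax.add_patch(ell)

# Single point: transform data -> display coordinates, then test.
# radius=0 means no tolerance around the patch outline.
inside = ell.contains_point(ax.transData.transform((0, 50)), radius=0)
outside = ell.contains_point(ax.transData.transform((90, 130)), radius=0)

# Many points: contains_points takes an (N, 2) array and returns a boolean mask.
pts = np.array([[0, 50], [60, 50], [90, 130], [0, 110]])
mask = ell.contains_points(ax.transData.transform(pts), radius=0)

print(inside, outside, mask)
```

Here the first two array points lie inside the ellipse and the last two outside, so the mask can be used directly for boolean indexing.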

    As a starting point, here is a code snippet that you can add between the part where you add your patches and the #Sorting the coordinates into bins code block.

    It adds two additional (transparent) ellipses so that you can test whether the arcs would contain a point if they were fully closed ellipses. The bin calculation is then just a boolean combination of tests: whether a point lies in the big oval, whether it lies in the left or right ellipse, and whether its x-coordinate is negative or positive.

    # Transparent helper ellipses that close the two arcs
    # (Oval, ang1, ang2, ax, X and Y come from the question's code)
    ov1 = mpl.patches.Ellipse(ang1, 70, 110, alpha=0)
    ov2 = mpl.patches.Ellipse(ang2, 70, 110, alpha=0)
    ax.add_patch(ov1)
    ax.add_patch(ov2)
    
    for px, py in zip(X, Y):
        # contains_point works in display coordinates, so transform once
        p_disp = ax.transData.transform([px, py])
        in_oval = Oval.contains_point(p_disp, 0)
        in_left = ov1.contains_point(p_disp, 0)
        in_right = ov2.contains_point(p_disp, 0)
        on_left = px < 0
        on_right = px > 0
        if in_oval:
            if in_left:
                n_bin = 1
            elif in_right:
                n_bin = 4
            elif on_left:
                n_bin = 2
            elif on_right:
                n_bin = 3
            else:
                # px == 0 and not inside either helper ellipse
                n_bin = -1
        else:
            n_bin = -1
        print('({:>2}/{:>2}) is {}'.format(px, py, 'in Bin ' + str(n_bin) if n_bin > 0 else 'outside'))
    

    The output is:

    (24/94) is in Bin 3
    (15/61) is in Bin 3
    (71/76) is in Bin 4
    (72/83) is in Bin 4
    ( 6/69) is in Bin 3
    (13/86) is in Bin 3
    (77/78) is outside
    (52/57) is in Bin 4
    (52/45) is in Bin 4
    (62/94) is in Bin 4
    (46/82) is in Bin 4
    (43/74) is in Bin 4
    (31/56) is in Bin 4
    (35/70) is in Bin 4
    (41/94) is in Bin 4
    

    Note that you still need to decide how to bin points with x-coordinate 0. At the moment they count as outside, because neither on_left nor on_right covers them.
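One possible way to close that gap is sketched below. The function name assign_bin and the choice to count x == 0 as the left half (Bin 2) are assumptions made for illustration; the original answer leaves this decision open. The three boolean flags are expected to come from the containment tests in the loop.

```python
def assign_bin(in_oval, in_left, in_right, px):
    """Return a bin number 1-4, or -1 for points outside the big oval.

    Assumption (not from the original answer): points with px == 0
    are counted as belonging to the left half, i.e. Bin 2.
    """
    if not in_oval:
        return -1
    if in_left:
        return 1
    if in_right:
        return 4
    # Not inside either helper ellipse: decide by the sign of x,
    # with the boundary x == 0 assigned to the left side.
    return 2 if px <= 0 else 3


print(assign_bin(True, False, False, 0))   # boundary point -> 2
print(assign_bin(False, False, False, 0))  # outside the oval -> -1
```

Replacing `px <= 0` with `px < 0` would assign the boundary to Bin 3 instead.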

    PS: Thanks to @ImportanceOfBeingErnest for the hint to the necessary transformation: https://stackoverflow.com/a/49112347/8300135

    Note: for all the following EDITS you'll need to import numpy as np
    EDIT: Function for counting the bin distribution per X, Y array input:

    def bin_counts(X, Y):
        """Count the points of one X, Y pair of 1D inputs per bin (1 to 4)."""
        bc = dict()
        # transform once; contains_points expects display coordinates
        pts = ax.transData.transform(np.array([X, Y]).T)
        E = Oval.contains_points(pts, 0)    # inside the big oval
        E_l = ov1.contains_points(pts, 0)   # inside the left helper ellipse
        E_r = ov2.contains_points(pts, 0)   # inside the right helper ellipse
        L = np.array(X) < 0
        R = np.array(X) > 0
        bc[1] = np.sum(E & E_l)
        bc[2] = np.sum(E & L & ~E_l)
        bc[3] = np.sum(E & R & ~E_r)
        bc[4] = np.sum(E & E_r)
        return bc
    

    Will lead to this result:

    bin_counts(X, Y)
    Out: {1: 0, 2: 0, 3: 4, 4: 10}
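As a cross-check that needs no figure, the same boolean masks can also be built from the ellipse equation instead of contains_points. This is a different technique from the one used in this answer, and it assumes unrotated ellipses. The helper names in_ellipse and bin_counts_analytic, as well as all geometry values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def in_ellipse(x, y, center, width, height):
    """Boolean mask of points strictly inside an axis-aligned ellipse.

    width and height are full diameters, as in matplotlib's Ellipse.
    """
    cx, cy = center
    return ((x - cx) / (width / 2)) ** 2 + ((y - cy) / (height / 2)) ** 2 < 1


def bin_counts_analytic(X, Y, oval, left, right):
    """Count points per bin; each shape is a (center, width, height) tuple."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    E = in_ellipse(X, Y, *oval)
    E_l = in_ellipse(X, Y, *left)
    E_r = in_ellipse(X, Y, *right)
    return {
        1: int(np.sum(E & E_l)),
        2: int(np.sum(E & (X < 0) & ~E_l)),
        3: int(np.sum(E & (X > 0) & ~E_r)),
        4: int(np.sum(E & E_r)),
    }


# Hypothetical geometry, not the values from the question.
oval = ((0, 50), 140, 110)
left = ((-70, 50), 70, 110)
right = ((70, 50), 70, 110)

counts = bin_counts_analytic([-60, -10, 10, 60, 0, 200],
                             [50, 50, 50, 50, 50, 50],
                             oval, left, right)
print(counts)  # {1: 1, 2: 1, 3: 1, 4: 1}
```

The points at x = 0 and x = 200 are not counted in any bin: the first lies on the dividing line, the second outside the oval. Results for points very close to an outline may differ slightly from contains_points, which tests against a curve approximation in pixel space.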
    

    EDIT2: many rows in two 2D-arrays for X and Y:

    np.random.seed(42)
    X = np.random.randint(-80, 80, size=(100, 10))
    Y = np.random.randint(0, 120, size=(100, 10))
    

    looping over all the rows:

    for xr, yr in zip(X, Y):
        print(bin_counts(xr, yr))
    

    result:

    {1: 1, 2: 2, 3: 6, 4: 0}
    {1: 1, 2: 0, 3: 4, 4: 2}
    {1: 5, 2: 2, 3: 1, 4: 1}
    ...
    {1: 3, 2: 2, 3: 2, 4: 0}
    {1: 2, 2: 4, 3: 1, 4: 1}
    {1: 1, 2: 1, 3: 6, 4: 2}
    

    EDIT3: To return, instead of the number of points in each bin, a list of four arrays containing the x,y-coordinates of the points in each bin, use the following:

    X = [24,15,71,72,6,13,77,52,52,62,46,43,31,35,41]  
    Y = [94,61,76,83,69,86,78,57,45,94,82,74,56,70,94]      
    
    def bin_points(X, Y):
        """Return a list of four (n, 2) arrays with the points of bins 1 to 4."""
        X = np.array(X)
        Y = np.array(Y)
        # transform once; contains_points expects display coordinates
        pts = ax.transData.transform(np.array([X, Y]).T)
        E = Oval.contains_points(pts, 0)
        E_l = ov1.contains_points(pts, 0)
        E_r = ov2.contains_points(pts, 0)
        L = X < 0
        R = X > 0
        # one boolean mask per bin, in the order Bin 1 to Bin 4
        masks = [E & E_l, E & L & ~E_l, E & R & ~E_r, E & E_r]
        return [np.array([X[m], Y[m]]).T for m in masks]
    
    print(bin_points(X, Y))
    [array([], shape=(0, 2), dtype=int32), array([], shape=(0, 2), dtype=int32), array([[24, 94],
           [15, 61],
           [ 6, 69],
           [13, 86]]), array([[71, 76],
           [72, 83],
           [52, 57],
           [52, 45],
           [62, 94],
           [46, 82],
           [43, 74],
           [31, 56],
           [35, 70],
           [41, 94]])]
    

    ...and again, for applying this to the big 2D-arrays, just iterate over them:

    np.random.seed(42)
    X = np.random.randint(-100, 100, size=(100, 10))
    Y = np.random.randint(-40, 140, size=(100, 10))
    
    bincol = ['r', 'g', 'b', 'y', 'k']
    
    for xr, yr in zip(X, Y):
        for i, binned_points in enumerate(bin_points(xr, yr)):
            ax.scatter(*binned_points.T, c=bincol[i], marker='o' if i<4 else 'x')
    
