I use matplotlib 1.15.1 and I try to generate scattergrams like this:
The ellipses have fixes s
This approach should test if a point is within an ellipse, given the ellipse's centre, width, height and angle. You find the point's x and y coordinates relative to the ellipse centre, then transform those using the angle to be the coordinates along the major and minor axes. Finally, you find the normalised distance of the point from the cell centre, where a distance of 1 would be on the ellipse, less than 1 is inside, and more than 1 is outside.
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
fig,ax = plt.subplots(1)
ax.set_aspect('equal')
# Some test points
x = np.random.rand(500)*0.5+0.7
y = np.random.rand(500)*0.5+0.7
# The ellipse
g_ell_center = (0.8882, 0.8882)
g_ell_width = 0.36401857095483
g_ell_height = 0.16928136341606
angle = 30.
g_ellipse = patches.Ellipse(g_ell_center, g_ell_width, g_ell_height, angle=angle, fill=False, edgecolor='green', linewidth=2)
ax.add_patch(g_ellipse)
cos_angle = np.cos(np.radians(180.-angle))
sin_angle = np.sin(np.radians(180.-angle))
xc = x - g_ell_center[0]
yc = y - g_ell_center[1]
xct = xc * cos_angle - yc * sin_angle
yct = xc * sin_angle + yc * cos_angle
rad_cc = (xct**2/(g_ell_width/2.)**2) + (yct**2/(g_ell_height/2.)**2)
# Set the colors. Black if outside the ellipse, green if inside
colors_array = np.array(['black'] * len(rad_cc))
colors_array[np.where(rad_cc <= 1.)[0]] = 'green'
ax.scatter(x,y,c=colors_array,linewidths=0.3)
plt.show()
Note, this whole script takes 0.6 seconds to run and process 500 points. That includes creating and saving the figure, etc.
The process of setting the colors_array using the np.where
method above takes 0.00007s for 500 points.
Note, in an older implementation shown below, setting the colors_array in a loop took 0.00016 s:
colors_array = []
for r in rad_cc:
if r <= 1.:
# point in ellipse
colors_array.append('green')
else:
# point not in ellipse
colors_array.append('black')
Your current implementation should only be calling contains_point
25,000 to 50,000 times, which isn't a lot. So, I'm guessing that the implementation of contains_point
is targeted toward precision rather than speed.
Since you have a distribution of points where only a small percentage will be in any given ellipse, and therefore most will rarely be anywhere near any given ellipse, you can easily use rectangular coordinates as a short-cut to figure out whether the point is close enough to the ellipse to be worth calling contains_point
.
Compute the left and right x coordinates and top and bottom y coordinates of the ellipse, possibly with a bit of padding to account for rendering differences, then check if the point is within those, such as the following pseudo-code:
if point.x >= ellipse_left and point.x <= ellipse_right and _
point.y >= ellipse_top and point.y <= ellipse_bottom:
if ellipse.contains_point(point, radius=0):
... use the contained point here
This approach eliminates expensive calculations for most of the points, allowing simple comparisons instead to rule out the obvious mismatches, while preserving the accuracy of the computations where the point is close enough that it might be in the ellipse. If e.g. only 1% of your points are anywhere near a given ellipse, this approach will eliminate 99% of your calls to contains_point
and instead replace them with much faster comparisons.