Best way to efficiently find high density regions

北荒 2021-02-02 11:59

Over the course of my coding, I have come across the following problem: find a region of fixed size in a 2D space that has the highest density of particles. The particles can …

3 Answers
  •  一向  2021-02-02 12:20

    Algorithm 1

    Create a 500x500 2D array in which each cell holds the number of particles in that cell. Then convolve that array with a 50x50 kernel; in the result, each cell holds the count of particles in a 50x50 region around that cell. Finally, find the cell with the largest value.

    If you are using a 50x50 box as a region, the kernel can be decomposed into two separate convolutions, one for each axis. The resulting algorithm is O(n^2) space and time, where n is the width and height of the 2D space you are searching.
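
    As a rough illustration, here is a minimal sketch of this approach in Python, assuming the particle coordinates sit in two NumPy arrays and using SciPy's uniform_filter for the separable box filter; the array names, sizes, and the choice of SciPy are assumptions of the sketch, not part of the original question.

        import numpy as np
        from scipy.ndimage import uniform_filter

        SPACE = 500   # width/height of the 2D space (assumed)
        D = 50        # width/height of the region (assumed)

        rng = np.random.default_rng(0)
        xs = rng.uniform(0, SPACE, 10_000)   # hypothetical particle positions
        ys = rng.uniform(0, SPACE, 10_000)

        # Count particles per unit cell.
        counts, _, _ = np.histogram2d(xs, ys, bins=SPACE,
                                      range=[[0, SPACE], [0, SPACE]])

        # Box filter = separable convolution with a DxD kernel of ones.
        # uniform_filter averages, so multiply by D*D to recover window counts.
        window_counts = uniform_filter(counts, size=D, mode="constant") * (D * D)

        # The cell with the largest windowed count marks (approximately the
        # center of) the densest DxD region.
        i, j = np.unravel_index(np.argmax(window_counts), window_counts.shape)
        print("densest region near cell", (i, j),
              "with about", int(round(window_counts[i, j])), "particles")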

    As a reminder, a one-dimensional convolution with a boxcar function can be completed in O(n) time and space, and it can be done in place. Let x(t) be the input for t = 1..n, and let y(t) be the output. Define x(t) = 0 and y(t) = 0 for t < 1 and t > n. Define the kernel f(t) to be 1 for t = 0..d-1 and 0 elsewhere. The definition of convolution gives us the following formula:

    y(t) = sum_i x(t-i) * f(i) = sum_{i=0..d-1} x(t-i)

    This looks like it takes time O(n*d), but we can rewrite it as a recurrence:

    y(t) = y(t-1) + x(t) - x(t-d)

    This shows that the one-dimensional convolution is O(n), independent of d. To perform the two-dimensional convolution, you simply perform the one-dimensional convolution along each axis. This works because the boxcar kernel can be decomposed; most kernels cannot. The Gaussian kernel is another kernel that can be decomposed, which is why Gaussian blur in an image editing program is so fast.
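
    To make the recurrence concrete, here is a small sketch, again with assumed names, of the O(n) running-sum boxcar and of applying it along both axes of the count grid; this is one way to write it down, not code from the original answer.

        import numpy as np

        def boxcar_1d(x, d):
            """Sum of the d most recent samples at each position, in O(n) regardless of d."""
            y = np.zeros(len(x))
            run = 0.0
            for t in range(len(x)):
                run += x[t]              # add the newest sample: y(t) = y(t-1) + x(t) ...
                if t - d >= 0:
                    run -= x[t - d]      # ... - x(t-d): drop the sample leaving the window
                y[t] = run
            return y

        def boxcar_2d(counts, d):
            """Separable 2D boxcar: filter every row, then every column of the result."""
            rows = np.apply_along_axis(boxcar_1d, 1, counts, d)
            return np.apply_along_axis(boxcar_1d, 0, rows, d)

        # If counts[i, j] is the number of particles in unit cell (i, j), then
        # boxcar_2d(counts, 50)[i, j] is the number of particles in the 50x50
        # window whose maximum-index corner is cell (i, j).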

    For the kind of numbers you specify, this will be extremely fast. 500x500 is an extremely small data set, and your computer can check 202,500 regions in a few milliseconds at most. You will have to ask yourself whether it is worth the extra hours, days, or weeks of time it will take you to optimize further.

    This is the same as justhalf's solution, except due to the decomposed convolution, the region size does not affect the algorithm's speed.

    Algorithm 2

    Assume there is at least one point. Without loss of generality, consider the 2D space to be the entire plane. Let d be the width and height of the region. Let N be the number of points.

    Lemma: There exists a region of maximum density which has a point on its left edge.

    Proof: Let R be a region of maximum density. Let R' be the same region, translated right by the distance between the left edge of R and the leftmost point in R. Every point in R must also lie in R', so R' is also a region of maximum density, and the leftmost point of R lies on the left edge of R'.

    The algorithm

    1. Insert all points into a k-d tree. This can be done in O(N log^2 N) time.

    2. For each point, consider the region of width d and height 2d where the point is centered on the left edge of the region. Call this region R.

    3. Query the k-d tree for the points in region R. Call this set S. This can be done in O(N^(1/2) + |S|) time.

    4. Find the d x d subregion of R containing the largest number of points in S. This can be done in O(|S| log |S|) time by sorting S by y-coordinate and then performing a linear scan.

    The resulting algorithm runs in O(N^(3/2) + N |S| log |S|) time. A rough sketch of this procedure is given below.
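
    Here is a sketch of this algorithm in Python, under stated assumptions: points is a list of (x, y) tuples and d is the region size. For brevity the k-d tree range query is replaced by a naive linear filter, so the sketch runs in O(N^2 + N |S| log |S|); substituting a real k-d tree (or any spatial index with rectangular range queries) would recover the stated bound.

        def densest_region(points, d):
            """Return (count, lower-left corner) of a densest d x d region."""
            best_count, best_origin = 0, None
            for (px, py) in points:
                # Steps 2-3: points in the d x 2d region R whose left edge is
                # centered on (px, py). (A k-d tree would answer this query faster.)
                S = [(x, y) for (x, y) in points
                     if px <= x <= px + d and py - d <= y <= py + d]
                # Step 4: sort by y and slide a window of height d with two pointers.
                ys = sorted(y for (_, y) in S)
                lo = 0
                for hi in range(len(ys)):
                    while ys[hi] - ys[lo] > d:
                        lo += 1
                    if hi - lo + 1 > best_count:
                        best_count = hi - lo + 1
                        best_origin = (px, ys[lo])
            return best_count, best_origin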

    Comparison

    Algorithm #1 is superior to algorithm #2 when the density is high. Algorithm #2 is only superior when the density of particles is very low, and the density at which algorithm #2 is superior decreases as the total board size increases.

    Note that the continuous case can be considered to have zero density, at which point only algorithm #2 works.
