Best way to efficiently find high density regions

北荒 2021-02-02 11:59

Over the course of my coding, I have come across a problem as follows: find a region of fixed size in a 2D space that has the highest density of particles. The particles can

3 Answers
  • 2021-02-02 12:20

    Algorithm 1

    Create a 500x500 2D array in which each cell holds the number of particles in that cell. Then convolve that array with a 50x50 kernel of ones: each cell of the resulting array holds the number of particles in a 50x50 region. Finally, find the cell with the largest value.

    If you are using a 50x50 box as a region, the kernel can be decomposed into two separate one-dimensional convolutions, one for each axis. The resulting algorithm takes O(n^2) space and time, where n is the width and height of the 2D space you are searching.

    As a reminder, a one-dimensional convolution with a boxcar function can be completed in O(n) time and space, and it can be done in place. Let x(t) be the input for t=1..n, and let y(t) be the output. Define x(t)=0 and y(t)=0 for t<1 and t>n. Define the kernel f(t) to be 1 for t=0..d-1 and 0 elsewhere. The definition of convolution gives us the following formula:

    y(t) = sum_{i} x(t-i) * f(i) = sum_{i=0}^{d-1} x(t-i)

    This looks like it takes time O(n*d), but we can rewrite it as a recurrence:

    y(t) = y(t-1) + x(t) - x(t-d)

    This shows that the one-dimensional convolution is O(n), independent of d. To perform the two-dimensional convolution, you simply perform the one-dimensional convolution along each axis in turn. This works because the boxcar kernel can be decomposed; most kernels cannot. The Gaussian kernel is another kernel that can be decomposed, which is why Gaussian blur in an image editing program is so fast.
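
    The recurrence for a single axis can be sketched in C++ roughly as follows. This is a minimal illustration, not the answerer's code: the name `boxSum` is made up here, indices are 0-based rather than 1-based, and a running total stands in for y(t-1).

    ```cpp
    #include <cstddef>
    #include <vector>

    // Sliding-window (boxcar) sum: y[t] = x[t] + x[t-1] + ... + x[t-d+1],
    // treating x as 0 outside its bounds. Runs in O(n) regardless of d.
    // Hypothetical helper for illustration; 0-based indices.
    std::vector<long long> boxSum(const std::vector<long long>& x, std::size_t d) {
        std::vector<long long> y(x.size(), 0);
        long long running = 0;                // plays the role of y(t-1)
        for (std::size_t t = 0; t < x.size(); ++t) {
            running += x[t];                  // + x(t)
            if (t >= d) running -= x[t - d];  // - x(t-d); that term is 0 while t < d
            y[t] = running;
        }
        return y;
    }
    ```

    Each element is added once and subtracted at most once, so the cost does not depend on the window size d.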

    For the kind of numbers you specify, this will be extremely fast. 500x500 is an extremely small data set, and your computer can check all of the roughly 200,000 candidate regions in a few milliseconds at most. You will have to ask yourself whether it is worth the extra hours, days, or weeks it would take you to optimize further.

    This is the same as justhalf's solution, except that, thanks to the decomposed convolution, the region size does not affect the algorithm's speed.
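
    Putting Algorithm 1 together, a sketch might look like the code below. The names `Grid`, `Best`, and `densestWindow` are invented for this example, the per-cell counts are assumed to be already filled in, and only windows that lie fully inside the grid are considered (an assumption; the answer does not say how to treat the borders).

    ```cpp
    #include <cstddef>
    #include <vector>

    using Grid = std::vector<std::vector<int>>;  // counts[r][c] = particles in cell (r, c)

    struct Best { std::size_t row, col; int count; };  // bottom-right cell of the best window

    // Finds the d x d window with the largest total using two running-sum passes.
    // Returns count == -1 if no d x d window fits inside the grid.
    Best densestWindow(const Grid& counts, std::size_t d) {
        const std::size_t rows = counts.size();
        const std::size_t cols = rows ? counts[0].size() : 0;
        Best best{0, 0, -1};
        if (d == 0 || d > rows || d > cols) return best;

        // Pass 1: boxcar sum along each row.
        Grid rowSums(rows, std::vector<int>(cols, 0));
        for (std::size_t r = 0; r < rows; ++r) {
            int running = 0;
            for (std::size_t c = 0; c < cols; ++c) {
                running += counts[r][c];
                if (c >= d) running -= counts[r][c - d];
                rowSums[r][c] = running;
            }
        }

        // Pass 2: boxcar sum down each column of the row sums, tracking the maximum.
        for (std::size_t c = d - 1; c < cols; ++c) {
            int running = 0;
            for (std::size_t r = 0; r < rows; ++r) {
                running += rowSums[r][c];
                if (r >= d) running -= rowSums[r - d][c];
                if (r + 1 >= d && running > best.count) best = {r, c, running};
            }
        }
        return best;
    }
    ```

    Both passes touch every cell a constant number of times, so the total work is O(n^2) for an n x n grid whatever the value of d.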

    Algorithm 2

    Assume there is at least one point. Without loss of generality, consider the 2D space to be the entire plane. Let d be the width and height of the region. Let N be the number of points.

    Lemma: There exists a region of maximum density which has a point on its left edge.

    Proof: Let R be a region of maximum density. Let R' be the same region, translated right by the distance between the left edge of R and the leftmost point in R. All points in R must also lie in R', therefore R' is also a region of maximum density.

    The algorithm

    1. Insert all points into a K-D tree. This can be done in O(N log^2 N) time.

    2. For each point, consider the region of width d and height 2d where the point is centered on the left edge of the region. Call this region R.

    3. Query the K-D tree for the points in region R. Call this set S. This can be done in O(N^(1/2) + |S|) time.

    4. Find the d x d subregion of R containing the largest number of points in S. This can be done in O(|S| log |S|) time by sorting S by y-coordinate and then performing a linear scan.

    The resulting algorithm runs in O(N^(3/2) + N |S| log |S|) time.
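
    As a rough sketch of steps 2 to 4, the code below swaps the K-D tree query for a plain linear filter over all points, so it runs in O(N^2 log N) in the worst case rather than the bound quoted above. `Point` and `maxPointsInSquare` are names invented for this example, and the region boundaries are assumed to be inclusive.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Point { double x, y; };

    // Largest number of points covered by any axis-aligned d x d square
    // (boundaries inclusive, an assumption of this sketch).
    // NOTE: the K-D tree range query is replaced by a linear scan here.
    std::size_t maxPointsInSquare(const std::vector<Point>& pts, double d) {
        std::size_t best = 0;
        std::vector<double> ys;
        for (const Point& p : pts) {
            // Steps 2-3: collect S, the points in the d x 2d region R whose
            // left edge is centered on p.
            ys.clear();
            for (const Point& q : pts) {
                if (q.x >= p.x && q.x <= p.x + d && q.y >= p.y - d && q.y <= p.y + d)
                    ys.push_back(q.y);
            }
            // Step 4: sort S by y, then slide a window of height d over it.
            std::sort(ys.begin(), ys.end());
            std::size_t lo = 0;
            for (std::size_t hi = 0; hi < ys.size(); ++hi) {
                while (ys[hi] - ys[lo] > d) ++lo;  // shrink until the window fits
                best = std::max(best, hi - lo + 1);
            }
        }
        return best;
    }
    ```

    Replacing the inner filter with a real range query is what brings the per-point cost down to the O(N^(1/2) + |S|) mentioned in step 3.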

    Comparison

    Algorithm #1 is superior to algorithm #2 when the density is high. Algorithm #2 is only superior when the density of particles is very low, and the density at which algorithm #2 is superior decreases as the total board size increases.

    Note that the continuous case can be considered to have zero density, at which point only algorithm #2 works.

  • 2021-02-02 12:20

    Divide the region into a 1000x1000 grid of cells and count the number of particles in every (overlapping) 2x2 block of cells. You can assign each particle to a cell simply by normalizing its coordinates to 0..1, scaling to 0..999, and casting to integer. The counts can easily be stored as a 2D array of integers (ushort, uint, or ulong... mmmm tea). This is equivalent to the simple 2D spatial partitioning used in broad-phase collision detection.
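
    A minimal sketch of this binning approach is shown below. `Particle`, `Block`, `cellIndex`, and `densestBlock` are names invented for the example, the coordinates are assumed to lie in a known range [lo, hi] on both axes, and the scaling multiplies by the cell count and clamps, which is a small variation on the 0..999 scaling described above.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Particle { double x, y; };
    struct Block { int cx, cy; unsigned count; };  // lower cell indices of a 2x2 block

    constexpr int kCells = 1000;  // grid resolution per axis

    // Map a coordinate in [lo, hi] to a cell index in 0..kCells-1.
    int cellIndex(double v, double lo, double hi) {
        double unit = (v - lo) / (hi - lo);               // normalize to 0..1
        int idx = static_cast<int>(unit * kCells);        // scale and truncate
        return std::min(std::max(idx, 0), kCells - 1);    // v == hi lands in the last cell
    }

    // Count particles per cell, then return the 2x2 block with the most particles.
    Block densestBlock(const std::vector<Particle>& ps, double lo, double hi) {
        std::vector<unsigned> grid(static_cast<std::size_t>(kCells) * kCells, 0);
        for (const Particle& p : ps) {
            std::size_t row = static_cast<std::size_t>(cellIndex(p.y, lo, hi));
            ++grid[row * kCells + cellIndex(p.x, lo, hi)];
        }
        Block best{0, 0, 0};
        for (int cy = 0; cy + 1 < kCells; ++cy) {
            for (int cx = 0; cx + 1 < kCells; ++cx) {
                std::size_t i = static_cast<std::size_t>(cy) * kCells + cx;
                unsigned c = grid[i] + grid[i + 1] + grid[i + kCells] + grid[i + kCells + 1];
                if (c > best.count) best = {cx, cy, c};
            }
        }
        return best;
    }
    ```

    Because neighbouring 2x2 blocks overlap, a cluster that straddles a cell boundary is still counted together in at least one block.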

  • 2021-02-02 12:40

    I don't know what brute-force method you use, but the most brute-force way would be O(n^2 d^2): iterate over every region in O(n^2) time, then count the number of particles in that region in O(d^2) time, where d is the size of your region.

    This problem is exactly the same as the Rat Attack problem: since the region area is fixed, the density is proportional to the count. The solution there is O(n^2 + k*d^2), where

    1. n is the size of the whole area (length of the side)
    2. k is the number of particles
    3. d is the size of each region (length of the side)

    by this algorithm:

    1. For each particle, update the count of the O(d^2) regions affected by this particle
    2. Iterate over all O(n^2) possible regions, find the maximum

    as shown in this code. I copy the relevant part here for your reference:

    #include <algorithm>
    #include <cstdio>
    #include <cstring>

    using namespace std;

    const int N = 1024;      // Here n is assumed to be 1024
    int mat [N + 3] [N + 3]; // mat [i] [j] = total count in the region centered at (i, j)

    int main ()
    {
        int testCases; scanf ("%d", &testCases);

        while ( testCases-- ) {

            memset (mat, 0, sizeof (mat)); // Reset the counts for this test case

            int d; scanf ("%d", &d); // d is the half-width of the region: it spans x-d..x+d and y-d..y+d
            int k; scanf ("%d", &k); // k is the number of particles

            int x, y, cost;

            for ( int i = 0; i < k; i++ ) {
                scanf ("%d %d %d", &x, &y, &cost); // Read each particle's position and weight

                // Add this particle to every region (identified by its center) that contains it
                for ( int cx = max (0, x - d); cx <= min (x + d, N); cx++ ) {
                    for ( int cy = max (0, y - d); cy <= min (y + d, N); cy++ ) mat [cx] [cy] += cost;
                }
            }

            int resX = 0, resY = 0, maxi = -1;

            // Find the maximum count over all regions
            for ( int i = 0; i <= N; i++ ) {
                for ( int j = 0; j <= N; j++ ) {
                    if ( maxi < mat [i] [j] ) {
                        maxi = mat [i] [j];
                        resX = i;
                        resY = j;
                    }
                }
            }

            printf ("%d %d %d\n", resX, resY, maxi);

        }
        return 0;
    }
    

    I've put my comments in the code to explain it to you.
