A range intersection algorithm better than O(n)?

前端 未结 9 710
悲&欢浪女
悲&欢浪女 2020-12-04 18:27

Range intersection is a simple, but non-trivial problem.

Its has been answered twice already:

  • Find number range intersection
  • Comparing date ra
相关标签:
9条回答
  • 2020-12-04 18:52

    Edit: It sounds like this solution is more or less an Interval Tree. A more complete implementation of an Interval Tree can be found here.

    class TreeNode
    {
    public:
        long pivot;
        List<Range> leaves;  //Any ranges that intersect the pivot
        TreeNode left;        //Tree nodes that fall to the left of the pivot
        TreeNode right;       //Tree nodes that fall to the right of the pivot
    };
    

    Prep O(n log n):

    1. Create the list of ranges
    2. Pick the pivot points (possibly by using a sorted list of the end dates.) ??
    3. Build your tree.

    Search:

    1. Use binary search to find the first pivot that is >= TestRange.End
    2. Traverse the tree until the pivot > TestRange.Start

      2a. Add the leaves to your result.


    Example:

    Ranges:

    • 0 - 2
    • 1 - 2
    • 2 - 3
    • 1 - 4
    • 2 - 4
    • 0 - 5
    • 4 - 5
    • 2 - 6
    • 3 - 7

    Tree:

                                 4
                   --------------+------------------
                   3             |                 7
                   |            1-4                |
                   |            2-4                |
                   |            0-5                |
                   |            4-5                |
          ---------+------                 --------+--------
          2        |    null              6        |       null
     -----+----   2-3                 ----+----   3-7
    null  |  null                   null  |  null    
         0-2                             2-6
         1-2
    
    0 讨论(0)
  • 2020-12-04 18:56

    If ranges overlap, and one wants to retrieve all the ranges that overlap (or contain) a given target range, most of the solutions above don't appear to work.

    As some have pointed out, if (worst-case) all the ranges happen to intersect the target range (for example, if the target range is {0..MAXINT} or similar) then of course it takes O(n) to return the n ranges.

    But isn't the interesting and typical/average case, where only a very small % of the n total ranges do intersect the target range? Call the number that do intersect "m" -- in that case, you might conceivably be able to do as well as O(m). And if n=10^9 and m=10, that's a make-or-break difference.

    Consider the simple case of a text document that has various regions marked up for their "type" -- perhaps you want to find all the marked-up units that contain or intersect a given contiguous range of text (for example, a paragraph). In HTML, XML, or similar those can only be ancestors of the text-node(s) containing at least some characters of the target range. In typical representations with parent pointers in each node, that's O(m) -- way better than O(n), especially because m is (for short or synchronous target ranges) merely the tree nesting depth, which tends to be even lower than ln(n) because big XML documents in practice get bushier not deeper.

    The interesting case is harder: what if your "elements" do not form a tree as in XML, but can overlap as in MECS, CLIX, LMNL, and some other systems? You still want to find all the regions/"elements" that overlap your target, but they aren't so easily organized.

    On the other hand, you should be able to do very well because marked-up ranges in many applications are most frequently small -- there are far more words, sentences, and paragraphs in a book, than there are chapters. So even though there may be a huge number of ranges that start before the target and a huge number that end after it, the intersection will be very small on average.

    I think that's what the original questioner was getting at, and I'm afraid i didn't see an answer that addresses that problem. If it's not what the original question was about, then I'd like to pose it as a new question.

    0 讨论(0)
  • 2020-12-04 19:00

    This depends on your exact problem, in the linked question, the ranges where distinct, no common part, and the searched ranged could span multiple ranges. If your problem is the same, its is really easy: Take an array of the ranges, sort them by their lowest values (since they do not overlap, this would be also the same order as sorted by their upper values).

    Now just make a binsearch for the your target lower value (or smaller if not exact) and one for the target upper value(or bigger if not exact). The resulting indexes are the the ranges which are coverd. You have to check wheter the ranges at the indexes itself are in- or excluded, but that are just 2 checks. Overall complexity O(log n).

    0 讨论(0)
  • 2020-12-04 19:05

    Overlapping Ranges:

    Prep O(n log n):

    1. Make a array / vector of the ranges.
    2. Sort the vector by the end of the range (break ties by sorting by the start of the range)
    3. Make a second vector of ints. This represents the point at which you can stop searching.

      int stop[size];
      stop[size-1] = Ranges[size - 1].start;
      for (int i = size - 2; i >= 0; i--)
      {
          stop[i] = min(Ranges[i].start, stop[i+1]);
      }
      

    Search:

    1. Use binary search to find the first range with an End value of >= TestRange.Start
    2. Iterator starting at the binary search until stop[i] > TestRange.End:

      2a. If the range if the current range is within the TestRange, add it to your result.

    0 讨论(0)
  • 2020-12-04 19:05

    Just as a quad tree works for a set of 2d points, a simple binary tree should work for this case. Build a tree with your ranges.

    To explain further: Each node in the tree contains two integers, the beginning and end of the range, and the two children if it's not a leaf node. To find the ranges that your input range spans, then starting at the top of the tree

      - if the node range intersects the input range:
         - if it's a leaf node, then add the range to your result list
         - if it's not a leaf node, then traverse down to the child nodes and repeat this process.
    

    It should be O(logN)

    Further detail: The binary tree would be structured like a 1-d version of a quad tree. Each node would have three integers (sorry I said two above, but now I realize you need three), the lowest representing the lowest value of the lowest range that's below this node, the highest representing the highest value of the highest range that's below this node, and the pivot. The left child would span from this node's lowest to its pivot. The right child would span from this node's pivot to this node's highest. If there is only one range that goes from "lowest" to "highest", you wouldn't have a pivot and this would be a leaf. Ideally you'd pick the pivots for each node to keep the tree balanced.

    0 讨论(0)
  • 2020-12-04 19:07

    When I had this problem, I used a sorted array of ranges and a binary search to look for intersections. This is (I believe) O(log n) performance, with a little bit of overhead to deal with overlapping ranges.

    The answer to your question is, I think, derivable from the code below, but stopping short of the insertion. I present the entire code to avoid confusion by the differing context - I needed to insert a range of Unicode codepoints into a list of codepoint ranges.

    -- EDIT --

    Adapting the code below to determine intersections of multiple ranges involves a trivial forward search from the insertion point until a range is found which no longer intersects.

    -- END EDIT --

    The Range class contains:

    final int                               lower;                                  // lower end of range
    final int                               upper;                                  // upper end of range
    
    public int compareTo(Object obj) {
        if(obj==null) { return -1; }
    
        Range                           oth=(Range)obj;
    
        if(lower<oth.lower) { return -1; }
        if(lower>oth.lower) { return  1; }
        if(upper<oth.upper) { return -1; }
        if(upper>oth.upper) { return  1; }
        return 0;
        }
    

    Range Insertion:

    public Builder addRange(int fir, int las) {
        if(fir!=-1) { fir&=0x001FFFFF; }
        if(las!=-1) { las&=0x001FFFFF; }
    
        if(codepoints==null || codepoints.length==0) {
            codepoints=new Range[]{new Range(fir,las)};
            }
        else {
            int                         idx=Range.findChar(codepoints,fir);
            int                         ins=(idx<0 ? -(idx+1) : idx);
    
            if(idx<0) {
                if     (ins>0                 && fir==(codepoints[ins-1].upper+1)) { idx=(ins-1); }  // new range adjoins the following range (can't overlap or idx would be >=0)
                else if(ins<codepoints.length && las>=(codepoints[ins  ].lower-1)) { idx=ins;     }  // new range overlaps or adjoins the following range
                }
    
            if(idx<0) {
                codepoints=(Range[])Util.arrayInsert(codepoints,ins,new Range(fir,las));
                }
            else {
                boolean                 rmv=false;
    
                for(int xa=(idx+1); xa<codepoints.length && codepoints[xa].lower<=las; xa++) {
                    if(las<codepoints[xa].upper) { las=codepoints[xa].upper; }
                    codepoints[xa]=null;
                    rmv=true;
                    }
                if(codepoints[idx].lower>fir || codepoints[idx].upper<las) {
                    codepoints[idx]=new Range((codepoints[idx].lower < fir ? codepoints[idx].lower : fir),(codepoints[idx].upper>las ? codepoints[idx].upper : las));
                    }
                if(rmv) { codepoints=Range.removeNulls(codepoints); }
                }
            }
        return this;
        }
    

    Binary Search:

    static int findChar(Range[] arr, int val) {
        if(arr.length==1) {
            if     (val< arr[0].lower) { return -1; }                             // value too low
            else if(val<=arr[0].upper) { return  0; }                             // value found
            else                       { return -2; }                             // value too high
            }
        else {
            int                             lowidx=0;                               // low index
            int                             hghidx=(arr.length-1);                  // high index
            int                             mididx;                                 // middle index
            Range                           midval;                                 // middle value
    
            while(lowidx<=hghidx) {
                mididx=((lowidx+hghidx)>>>1);
                midval=arr[mididx];
                if     (val< midval.lower) { hghidx=(mididx-1); }                   // value too low
                else if(val<=midval.upper) { return mididx;     }                   // value found
                else                       { lowidx=(mididx+1); }                   // value too high
                }
            return -(lowidx+1);                                                     // value not found.
            }
        }
    
    0 讨论(0)
提交回复
热议问题