My math-fu is failing me! I need an efficient way of reducing network ranges to supersets, e.g. if I input a list of IP ranges:
You know that you can easily convert IPv4 addresses to integers (unsigned 32-bit numbers), don't you? Working with integers is much easier. So basically every address is a number in the range 0 to 2^32 - 1. Every range has a start number and an end number. Your example
1.1.1.1 to 2.2.2.5
1.1.1.2 to 2.2.2.4
can be written as
16,843,009 to 33,686,021
16,843,010 to 33,686,020
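The conversion can be checked with a short Python sketch. It assumes the standard-library `ipaddress` module; the helper names `ip_to_int` and `int_to_ip` are made up for illustration.

```python
import ipaddress

def ip_to_int(dotted: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))

def int_to_ip(value: int) -> str:
    """Inverse conversion, from integer back to dotted-quad form."""
    return str(ipaddress.IPv4Address(value))

print(ip_to_int("1.1.1.1"), ip_to_int("2.2.2.5"))  # 16843009 33686021
print(ip_to_int("1.1.1.2"), ip_to_int("2.2.2.4"))  # 16843010 33686020
```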
So it's pretty easy to see if one range is within the other range. A range is completely within the other range if the following condition holds
startIP2 >= startIP1 && startIP2 <= endIP1 &&
endIP2 >= startIP1 && endIP2 <= endIP1
In that case the range startIP2-endIP2 is completely within startIP1-endIP1. If only the first line is true, then startIP2 is within the range startIP1-endIP1, but the end lies beyond it, so you need to extend the range at the end. If only the second line is true, endIP2 is within the range, but the start lies before it, so you need to extend the range at the beginning. If both lines are false, the ranges are either completely disjoint, in which case they are two independent ranges, or range 2 starts before startIP1 and ends after endIP1, in which case it completely contains range 1.
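The case analysis can be sketched as a small Python function. This is only an illustration: the function name `relation` and the returned labels are invented here, and each range is assumed to be inclusive with start <= end. It also handles the situation where range 2 completely covers range 1, in which neither the start nor the end of range 2 falls inside range 1 even though the two overlap.

```python
def relation(start1: int, end1: int, start2: int, end2: int) -> str:
    """Describe how range 2 relates to range 1 (all bounds inclusive)."""
    start_inside = start1 <= start2 <= end1   # first line of the condition
    end_inside = start1 <= end2 <= end1       # second line of the condition
    if start_inside and end_inside:
        return "contained"        # range 2 lies completely within range 1
    if start_inside:
        return "extends_end"      # range 2 starts inside, ends after range 1
    if end_inside:
        return "extends_start"    # range 2 starts before, ends inside range 1
    if start2 < start1 and end2 > end1:
        return "contains"         # range 2 completely covers range 1
    return "disjoint"             # no address in common

print(relation(16843009, 33686021, 16843010, 33686020))  # contained
```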
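What you need to do is simply check the ranges for overlap. If two ranges overlap, then they get merged into a single range. Two ranges overlap if each one starts no later than the other one ends; once the ranges are sorted by start, that reduces to checking whether the end of one range is greater than or equal to the start of the next.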
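A minimal sketch of that idea in Python, assuming the ranges are already converted to inclusive (start, end) integer pairs. The function name `merge_ranges` is made up for illustration, and sorting by start first is a choice made here so that each range only has to be compared with the most recently merged one.

```python
def merge_ranges(ranges):
    """Merge overlapping inclusive (start, end) integer ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged range: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Starts after everything merged so far: open a new range.
            merged.append((start, end))
    return merged

print(merge_ranges([(16843010, 33686020), (16843009, 33686021), (50, 60), (55, 70)]))
# [(50, 70), (16843009, 33686021)]
```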
This is a union-of-segments computation. An optimal algorithm, running in O(n log n), consists of the following steps:
At the end, you obtain a sorted list of disjoint supersets. Still, two supersets A and B can be adjacent (the endpoint of A is just before the starting point of B). If you want A and B to be merged, you can do this either by a simple postprocessing step, or by slightly modifying step 3: when LE-RE reaches zero, you would consider it the end of a superset only if the next element in L is not the direct successor of your current element.
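One possible reading of this endpoint sweep, sketched in Python. It is an interpretation rather than the exact steps: L is taken to be the sorted list of all endpoints, LE and RE are taken to be the numbers of left and right endpoints seen so far, and the `merge_adjacent` flag and the function name are inventions for this example. Ranges are assumed to be inclusive integer pairs.

```python
def union_of_ranges(ranges, merge_adjacent=False):
    """Union of inclusive (start, end) integer ranges via an endpoint sweep."""
    # L: every endpoint, tagged 0 for left and 1 for right, so that at equal
    # values a left endpoint sorts before a right endpoint.
    L = sorted([(s, 0) for s, _ in ranges] + [(e, 1) for _, e in ranges])
    result = []
    left_seen = right_seen = 0   # LE and RE in the description above
    current_start = None         # start of the superset being built
    for i, (value, kind) in enumerate(L):
        if kind == 0:
            if current_start is None:
                current_start = value
            left_seen += 1
        else:
            right_seen += 1
            if left_seen == right_seen:  # LE-RE reached zero
                next_value = L[i + 1][0] if i + 1 < len(L) else None
                if merge_adjacent and next_value == value + 1:
                    continue  # direct successor follows: keep the superset open
                result.append((current_start, value))
                current_start = None
    return result

ranges = [(1, 5), (3, 8), (9, 12), (20, 25)]
print(union_of_ranges(ranges))                       # [(1, 8), (9, 12), (20, 25)]
print(union_of_ranges(ranges, merge_adjacent=True))  # [(1, 12), (20, 25)]
```

The sort dominates the cost, which gives the O(n log n) bound; the sweep itself is a single linear pass.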
Alright, my coworker came up with this answer, which I think is pretty excellent. Let me know if you see any issues: