Iterative Solution to End-Overlapping Indices

Question

I have a list of tuples, each representing a range of numbers. My goal is to return all possible subsets of this collection (see the note below; I am really after the longest ones) whose ranges either do not overlap at all or touch only at an endpoint, where one range's second value equals the next range's first value. The function I have been using is a recursive solution to this problem.

def get_all_end_overlapping_indices(lst, i, out):
    all_possibilities = []

    def _get_all_end_overlapping_indices_helper(list_in, i, out):
        r = -1
        if i == len(list_in):
            if out:
                all_possibilities.append(out)
            return

        n = i + 1

        while n < len(list_in) and r > list_in[n][0]:
            n += 1
        _get_all_end_overlapping_indices_helper(list_in, n, out)
        r = list_in[i][1]

        n = i + 1
        while n < len(list_in) and r > list_in[n][0]:
            n += 1
        _get_all_end_overlapping_indices_helper(list_in, n, out + [list_in[i]])

    lst.sort()
    _get_all_end_overlapping_indices_helper(list_in = lst, i = 0, out = [])
    
    return all_possibilities

With lst = [(0.0, 2.0), (0.0, 4.0), (2.5, 4.5), (2.0, 5.75), (2.0, 4.0), (6.0, 7.25), (4.0, 5.5)], we get the following result:

[(6.0, 7.25)]
[(4.0, 5.5)]
[(4.0, 5.5), (6.0, 7.25)]
[(2.5, 4.5)]
[(2.5, 4.5), (6.0, 7.25)]
[(2.0, 5.75)]
[(2.0, 5.75), (6.0, 7.25)]
[(2.0, 4.0)]
[(2.0, 4.0), (6.0, 7.25)]
[(2.0, 4.0), (4.0, 5.5)]
[(2.0, 4.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 4.0)]
[(0.0, 4.0), (6.0, 7.25)]
[(0.0, 4.0), (4.0, 5.5)]
[(0.0, 4.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 2.0)]
[(0.0, 2.0), (6.0, 7.25)]
[(0.0, 2.0), (4.0, 5.5)]
[(0.0, 2.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 2.0), (2.5, 4.5)]
[(0.0, 2.0), (2.5, 4.5), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 5.75)]
[(0.0, 2.0), (2.0, 5.75), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 4.0)]
[(0.0, 2.0), (2.0, 4.0), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 4.0), (4.0, 5.5)]
[(0.0, 2.0), (2.0, 4.0), (4.0, 5.5), (6.0, 7.25)]

Because I will eventually be dealing with larger collections of tuples, and this runs quite slowly, I would like to implement an iterative solution; unfortunately, I'm stumped. This snippet originally came from the question "Find all possible combinations that overlap by end and start". Although it works, I find it hard to follow how it works. Could anyone give some tips on how to construct an iterative solution to this problem?

Note: I'm actually looking for only the longest outputs (see below). I can always filter out the shorter ones (i.e. the ones contained in the longest ones) later, but if it makes things easier, I can gladly do without them.

[(0.0, 2.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 2.0), (2.5, 4.5), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 5.75), (6.0, 7.25)]
[(0.0, 2.0), (2.0, 4.0), (4.0, 5.5), (6.0, 7.25)]
[(0.0, 4.0), (4.0, 5.5), (6.0, 7.25)]

Answer 1:

We can solve this problem in polynomial time by reducing it to the problem of the longest path in a DAG (directed acyclic graph).

First, we need to model the problem as a DAG. Each tuple is a vertex, and we add an edge from (a, b) to (c, d) if and only if b <= c.
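As a small illustration of this edge rule, the sketch below lists the edges for three of the sample tuples. The helper name end_overlap_edges is hypothetical and not part of the answer's code, and it assumes each tuple is (start, end).

```python
from itertools import permutations


def end_overlap_edges(tuples):
    # Hypothetical helper: one edge u -> v for every ordered pair of
    # distinct positions where u ends no later than v starts.
    return [(u, v) for u, v in permutations(tuples, 2) if u[1] <= v[0]]


sample = [(0.0, 2.0), (2.0, 4.0), (2.5, 4.5)]
for u, v in end_overlap_edges(sample):
    print(u, "->", v)
```

Only (0.0, 2.0) has outgoing edges here, because (2.0, 4.0) and (2.5, 4.5) overlap each other in more than an endpoint.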

Two things follow: (1) the graph is acyclic by construction, and (2) the longest path in this graph corresponds to the longest sequence of end-overlapping tuples.

Luckily, the longest path problem, which is NP-hard in the general case, is not hard in a DAG. The problem is described at length in this document (page 4).

The overall complexity of finding the longest sequence of end-overlapping tuples should then be O(n²) to build the graph, O(n²) to topologically sort the vertices, and O(n²) to find the longest path, so O(n²) in the worst case. This is much faster than the recursive approach, because it finds only the longest sequence instead of enumerating every combination.
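To make the O(n²) bound concrete, here is a minimal sketch of the same longest-path idea without an explicit graph. It assumes every tuple satisfies start <= end, so that plain sorting already yields a valid topological order; the function name longest_chain is hypothetical.

```python
def longest_chain(tuples):
    # Assumption: each tuple is (start, end) with start <= end, so an
    # edge can only point from an earlier to a later item once sorted.
    items = sorted(tuples)
    n = len(items)
    if n == 0:
        return []

    best = [1] * n     # best[j]: length of the longest chain ending at j
    prev = [None] * n  # prev[j]: predecessor of j on that chain
    for j in range(n):
        for i in range(j):
            if items[i][1] <= items[j][0] and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i

    # Walk back from the end of a longest chain.
    j = max(range(n), key=best.__getitem__)
    chain = []
    while j is not None:
        chain.append(items[j])
        j = prev[j]
    return chain[::-1]


lst = [
    (0.0, 2.0), (0.0, 4.0), (2.5, 4.5), (2.0, 5.75),
    (2.0, 4.0), (6.0, 7.25), (4.0, 5.5)
]
print(longest_chain(lst))
# [(0.0, 2.0), (2.0, 4.0), (4.0, 5.5), (6.0, 7.25)]
```

The two nested loops visit each ordered pair once, which is where the quadratic cost comes from.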

Below is Python 3 code that computes the longest sequence of tuples. In case I misunderstood the 'overlap' relation on tuples, it can easily be changed in the overlap_condition function.

def overlap_condition(tup1, tup2):
    if tup1 == tup2:
        return False
    a, b = tup1
    c, d = tup2
    return b <= c


def adj_mat_from_tup_list(tup_lst):
    # adj_mat[i][j] == 1 means there is an edge from tuple i to tuple j.
    return [
        [
            1 if overlap_condition(tup_lst[i], tup_lst[j]) else 0
            for j in range(len(tup_lst))
        ] for i in range(len(tup_lst))
    ]


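def topological_sort(adj_mat):
    # Kahn's algorithm. Note: this zeroes out the edges of adj_mat as it runs.
    sorted_v = []
    # Start from the sources: vertices with no incoming edge.
    sources = {
        i for i in range(len(adj_mat))
        if not any(adj_mat[j][i] == 1 for j in range(len(adj_mat)))
    }

    while sources:
        v = sources.pop()
        sorted_v += [v]
        for j in range(len(adj_mat)):
            if adj_mat[v][j] == 1:
                # Remove edge v -> j; j becomes a source once it has
                # no incoming edges left.
                adj_mat[v][j] = 0
                if not any(adj_mat[w][j] for w in range(len(adj_mat))):
                    sources.add(j)
    return sorted_v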


def get_longest_path(adj_mat, sorted_v):
    # dists[v]: number of edges on the longest path ending at v.
    dists = {v: 0 for v in range(len(adj_mat))}
    preds = {v: None for v in range(len(adj_mat))}
    for v in sorted_v:
        for u in range(len(adj_mat)):
            # Record u as predecessor only when it actually improves the path.
            if adj_mat[u][v] and dists[u] + 1 > dists[v]:
                dists[v] = dists[u] + 1
                preds[v] = u

    # Walk back from a vertex with the largest distance.
    current_v = max(dists, key=dists.get)
    result = [current_v]
    while preds[current_v] is not None:
        current_v = preds[current_v]
        result += [current_v]
    return result[::-1]


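def get_all_end_overlap_tups(tup_lst):
    # topological_sort clears the edges of the matrix it receives, so it
    # gets its own copy and a fresh matrix is built for the path search.
    sorted_v = topological_sort(adj_mat_from_tup_list(tup_lst))
    adj_mat = adj_mat_from_tup_list(tup_lst)
    # Despite the name, this returns a single longest sequence.
    return [tup_lst[i] for i in get_longest_path(adj_mat, sorted_v)]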


lst = [
    (0.0, 2.0), (0.0, 4.0), (2.5, 4.5), (2.0, 5.75),
    (2.0, 4.0), (6.0, 7.25), (4.0, 5.5)
]

print(get_all_end_overlap_tups(lst))
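
The code above returns a single longest sequence. If every combination is still needed, as in the question, the recursion can be unrolled with an explicit stack. The following is only a sketch of that idea, not the original author's method; the name iter_end_overlapping is hypothetical, and the cost stays exponential because the number of combinations is.

```python
def iter_end_overlapping(lst):
    # Work on a sorted copy so the caller's list is left untouched.
    items = sorted(lst)
    results = []
    # Each stack entry is (index of the next candidate, chain built so far).
    stack = [(0, [])]
    while stack:
        i, out = stack.pop()
        if i == len(items):
            if out:
                results.append(out)
            continue

        # "Include" branch: take items[i], then jump to the first later
        # item that starts at or after its end.
        end = items[i][1]
        n = i + 1
        while n < len(items) and end > items[n][0]:
            n += 1
        stack.append((n, out + [items[i]]))

        # "Skip" branch: pushed last so it is explored first, which
        # matches the order of the recursive version.
        stack.append((i + 1, out))
    return results


lst = [
    (0.0, 2.0), (0.0, 4.0), (2.5, 4.5), (2.0, 5.75),
    (2.0, 4.0), (6.0, 7.25), (4.0, 5.5)
]
for chain in iter_end_overlapping(lst):
    print(chain)
```

For the sample list this yields the same 27 combinations as the recursive function in the question, starting with [(6.0, 7.25)].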

Source: https://stackoverflow.com/questions/62734114/iterative-solution-to-end-overlapping-indices

Tags

python

list

loops

recursion

iteration