merge lists with intersection

前端 未结 3 2129
南笙
南笙 2020-12-18 02:38

Given that:

g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]

How can I compare each list within g so that for

相关标签:
3条回答
  • 2020-12-18 02:47

    Here's a quickie that will give a list of all the sets that intersect:

    sets = [set(i+j) for i in g for j in g if i!=j and (set(i) & set(j))]
    

    Note that each result will be repeated, since each list is being compared twice, once on the left and once on the right.

    0 讨论(0)
  • 2020-12-18 03:03

    If g or g's elements are huge you can use Disjoint Sets to boost efficiency.

    This data structure can be used to classify each element into the set it should belong to.

    The first step is to build a Disjoint Set collection with all g sets labeled by their index in g:

    g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7],[99]]
    g = map(set, g)
    dss = CDisjointSets()
    for i in xrange(len(g)):
        dss.MakeSet(i)
    

    Then, the sets gets joined whenever the intersection is not empty:

    for i in xrange(len(g)):
        for j in xrange(i+1, len(g)):
            if g[i].intersection(g[j]):
                dss.Join(i,j)
    

    At this point dss gives you a common label for sets of g that should joined together:

    print(dss)
    

    parent(0) = 0 parent(1) = 1 parent(2) = 2 parent(3) = 3 parent(4) = 2 parent(5) = 3 parent(6) = 3 parent(7) = 7 parent(8) = 8 parent(9) = 2 parent(10) = 10

    Now you just have to build the new sets joining those which have the same label:

    l2set = dict()
    for i in xrange(len(g)):
        label = dss.FindLabel(i).getLabel()
        l2set[label] = l2set.get(label, set()).union(g[i])
    print(l2set)
    

    Resulting in:

    {0: set([]), 1: set([]), 2: set([0, 2, 3, 7]), 3: set([1, 4, 5, 6]), 7: set([]), 8:   set([]), 10: set([99])}
    

    This is the implementation of Disjoint Sets I used but surely you can find anotherone with a better sintax:

    """ Disjoint Sets
        -------------
        Pablo Francisco Pérez Hidalgo
        December,2012. """
    class CDisjointSets:
    
        #Class to represent each set
        class DSet:
            def __init__(self, label_value):
                self.__label = label_value
                self.rank = 1
                self.parent = self
            def getLabel(self):
                return self.__label
    
        #CDisjointSets Private attributes
        __sets = None
    
        #CDisjointSets Constructors and public methods.
        def __init__(self):
            self.__sets = {}
    
        def MakeSet(self, label):
            if label in self.__sets: #This check slows the operation a lot,
                return False         #it should be removed if it is sure that
                                     #two sets with the same label are not goind
                                     #to be created.
            self.__sets[label] = self.DSet(label)
    
        #Pre: 'labelA' and 'labelB' are labels or existing disjoint sets.
        def Join(self, labelA, labelB):
            a = self.__sets[labelA]
            b = self.__sets[labelB]
            pa = self.Find(a)
            pb = self.Find(b)
            if pa == pb: 
                return #They are already joined
            parent = pa
            child = pb
            if pa.rank < pb.rank:
                parent = pb
                child = pa
            child.parent = parent
            parent.rank = max(parent.rank, child.rank+1)
    
        def Find(self,x):
            if x == x.parent:
                return x
            x.parent = self.Find(x.parent)
            return x.parent
    
        def FindLabel(self, label):
            return self.Find(self.__sets[label])
    
        def __str__(self):
            ret = ""
            for e in self.__sets:
                ret = ret + "parent("+self.__sets[e].getLabel().__str__()+") = "+self.FindLabel(e).parent.getLabel().__str__() + "\n"
            return ret
    
    0 讨论(0)
  • 2020-12-18 03:10

    As a much faster way You can first create a list of the set of items with len more than one (s) . then go through your list and update in place with union function !

    s=map(set,g)
    def find_intersection(m_list):
        for i,v in enumerate(m_list) : 
            for j,k in enumerate(m_list[i+1:],i+1):
               if v & k:
                  m_list[i]=v.union(m_list.pop(j))
                  return find_intersection(m_list)
        return m_list
    

    Demo :

    g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
    s=map(set,g)
    print find_intersection(s)
    
    [set([0, 2, 3, 7]), set([1, 4, 5, 6])]
    
    g=[[1,2,3],[3,4,5],[5,6],[6,7],[9,10],[10,11]]
    s=map(set,g)
    print find_intersection(s)
    
    [set([1, 2, 3, 4, 5, 6, 7]), set([9, 10, 11])]
    
    g=[[], [1], [0,2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
    s=map(set,g)
    print find_intersection(s)
    
    [set([1, 4, 5, 6]), set([0, 2, 3, 7])]
    

    Benchmark with @Mark's answer :

    from timeit import timeit
    
    
    s1="""g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
    sets = [set(i+j) for i in g for j in g if i!=j and (set(i) & set(j))]
        """
    s2="""g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]]
    
    s=map(set,g)
    
    def find_intersection(m_list):
        for i,v in enumerate(m_list) : 
            for j,k in enumerate(m_list[i+1:],i+1):
               if v & k:
                  s[i]=v.union(m_list.pop(j))
                  return find_intersection(m_list)
        return m_list 
        """
    
    print ' first: ' ,timeit(stmt=s1, number=100000)
    print 'second : ',timeit(stmt=s2, number=100000)
    
    first:  3.8284008503
    second :  0.213887929916
    
    0 讨论(0)
提交回复
热议问题