Find duplicate items within a list of lists of tuples in Python

醉话见心 2021-01-14 22:36

I want to find the matching items in the list given below. My list may be very large.

The very first item in the tuple, "N1_10", is duplicated and matched with anot

3 Answers
  • 2021-01-14 23:03

    Update: After rereading your question, it appears that you're trying to create equivalence classes, rather than collecting values for keys. If

    [[(1, 2), (3, 4), (2, 3)]]
    

    should become

    [(1, 2, 3, 4)]
    

    , then you need to treat your input as a graph and apply a connected-components algorithm. You could turn your data structure into an adjacency-list representation and traverse it with a breadth-first or depth-first search, or iterate over your list and build disjoint sets (union-find). Either way, your code suddenly involves a fair amount of graph-related complexity, and it will be hard to guarantee any output ordering based on the order of the input. Here's an algorithm based on breadth-first search:

    import collections
    
    # build an adjacency list representation of your input
    graph = collections.defaultdict(set)
    for l in ListA:
        for first, second in l:
            graph[first].add(second)
            graph[second].add(first)
    
    # breadth-first search the graph to produce the output
    output = []
    marked = set() # a set of all nodes whose connected component is known
    for node in graph:
        if node not in marked:
            # this node is not in any previously seen connected component
            # run a breadth-first search to determine its connected component
            frontier = {node}
            connected_component = []
            while frontier:
                marked |= frontier
                connected_component.extend(frontier)
    
                # find all unmarked nodes directly connected to frontier nodes
                # they will form the new frontier
                new_frontier = set()
                for frontier_node in frontier:
                    new_frontier |= graph[frontier_node] - marked
                frontier = new_frontier
            output.append(tuple(connected_component))
    

    Don't just copy this without understanding it, though; understand what it's doing, or write your own implementation. You'll probably need to be able to maintain this. (I would've used pseudocode, but Python is practically as simple as pseudocode already.)
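The disjoint-set option mentioned earlier can be sketched with a small union-find structure. This is a minimal sketch under stated assumptions, not the code above: `group_pairs` is a hypothetical helper name, and the input is assumed to have the same list-of-lists-of-tuples shape as `ListA`. Path halving in `find` keeps lookups cheap, so a single pass over the tuples is close to linear time, which may matter if the list is very large.

```python
def group_pairs(list_of_lists):
    """Hypothetical helper: group linked items into connected components
    with union-find and return one tuple per component."""
    parent = {}  # maps each item to its parent in the disjoint-set forest

    def find(item):
        # Register an unseen item as its own root.
        parent.setdefault(item, item)
        # Path halving: point each visited node at its grandparent.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for sublist in list_of_lists:
        for first, second in sublist:
            root_a, root_b = find(first), find(second)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two components

    # Collect members under their root. Dicts keep insertion order
    # (Python 3.7+), so groups follow each item's first appearance.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [tuple(members) for members in groups.values()]


print(group_pairs([[(1, 2), (3, 4), (2, 3)]]))  # [(1, 2, 3, 4)]
```

Because dicts preserve insertion order, this particular sketch happens to return groups in order of first appearance; that is a property of this sketch, not something union-find guarantees in general.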

    In case my original interpretation of your question was correct, and your input is a collection of key-value pairs that you want to aggregate, here's my original answer:

    Original answer

    import collections
    
    clusterer = collections.defaultdict(list)
    
    for l in ListA:
        for k, v in l:
            clusterer[k].append(v)
    
    output = list(clusterer.items())
    

    defaultdict(list) is a dict that automatically creates an empty list as the value for any key that isn't already present. The loop goes over all the tuples and appends each value to the list for its key; the last line then builds a list of (key, value_list) pairs from the defaultdict.

    (The output of this code is not quite in the form you specified, but I believe this form is more useful. If you want to change the form, that should be a simple matter.)
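To see what the key-grouping version produces, here is a self-contained run on the `ListA` sample posted in another answer in this thread (an assumption, since the question's own list is cut off). The `group_by_key` wrapper is a hypothetical name; its body is the same `defaultdict(list)` loop. The last two lines keep only keys that collected more than one value, which is one way to surface the duplicated first items the question asks about.

```python
import collections


def group_by_key(list_of_lists):
    """Hypothetical wrapper: collect every value under its key
    (the first element of each tuple)."""
    clusterer = collections.defaultdict(list)
    for sublist in list_of_lists:
        for key, value in sublist:
            clusterer[key].append(value)
    return list(clusterer.items())


ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

for key, values in group_by_key(ListA):
    print(key, values)

# Keep only keys that appear in more than one tuple.
duplicates = {k: v for k, v in group_by_key(ListA) if len(v) > 1}
print(duplicates)  # {'N1_10': ['N2_28', 'N3_98']}
```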

  • 2021-01-14 23:10
    tupleList = [(1, 2), (3, 4), (1, 4), (3, 2), (1, 2), (7, 9), (9, 8), (5, 6)]
    
    newSetSet = {frozenset(aTuple) for aTuple in tupleList}
    setSet = set()
    
    # repeat the merge pass until it stops changing anything
    while newSetSet != setSet:
        print('*')
        setSet = newSetSet
        newSetSet = set()
        for set0 in setSet:
            merged = False
            for set1 in setSet:
                # union any two distinct sets that share an element
                if set0 & set1 and set0 != set1:
                    newSetSet.add(set0 | set1)
                    merged = True
            if not merged:
                newSetSet.add(set0)
    
            # debug output after each set is processed
            print([tuple(element) for element in setSet])
            print([tuple(element) for element in newSetSet])
            print()
    
    print([tuple(element) for element in newSetSet])
    
    # Result (group order may vary):  [(1, 2, 3, 4), (5, 6), (8, 9, 7)]
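The loop above works on a flat list of tuples. A minimal sketch of the same merge-until-nothing-changes idea applied to the question's list of lists, assuming the `ListA` layout posted in another answer: `itertools.chain.from_iterable` flattens the sublists first, and `merge_overlapping` is a hypothetical name. Each pass compares every set with every other set, so this can get slow on a very large list.

```python
import itertools


def merge_overlapping(list_of_lists):
    """Hypothetical helper: merge tuples that share an item until
    no two groups overlap; returns a set of frozensets."""
    pairs = itertools.chain.from_iterable(list_of_lists)
    new_groups = {frozenset(pair) for pair in pairs}
    groups = set()
    # Repeat the merge pass until it reaches a fixed point.
    while new_groups != groups:
        groups = new_groups
        new_groups = set()
        for set0 in groups:
            merged = False
            for set1 in groups:
                if set0 & set1 and set0 != set1:
                    new_groups.add(set0 | set1)
                    merged = True
            if not merged:
                new_groups.add(set0)
    return groups


ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

# Sort for a repeatable display; set iteration order is arbitrary.
for group in sorted(sorted(g) for g in merge_overlapping(ListA)):
    print(group)
```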
    
  • 2021-01-14 23:15

    Does output order matter? This is the simplest way I could think of:

    ListA  = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],[('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
                [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]
    
    idx = dict()
    
    for sublist in ListA:
        for pair in sublist:
            for item in pair:
                mapping = idx.get(item,set())
                mapping.update(pair)
                idx[item] = mapping 
                for subitem in mapping:
                    submapping = idx.get(subitem,set())
                    submapping.update(mapping)
                    idx[subitem] = submapping
    
    
    # print each distinct group once
    for x in {frozenset(v) for v in idx.values()}:
        print(list(x))
    

    Output (the order of groups, and of items within them, may vary between runs):

    ['N3_72', 'N1_22']
    ['N2_28', 'N3_98', 'N1_10']
    ['N2_61', 'N3_37']
    ['N2_33', 'N3_28']
    ['N2_55', 'N3_62']
    ['N2_44', 'N1_35']
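If output order does matter, one option is to sort each group's items and then sort the groups, so the result no longer depends on set iteration order (which for strings can change between runs in Python 3 because of hash randomization). A sketch that wraps the index-building loop above in a hypothetical `linked_groups` function, assuming the same `ListA` input; `setdefault` replaces the `get`-then-assign pattern but does the same thing.

```python
def linked_groups(list_of_lists):
    """Hypothetical helper: build the item -> linked-items index,
    then return the distinct groups as sorted lists, in sorted order."""
    idx = {}
    for sublist in list_of_lists:
        for pair in sublist:
            for item in pair:
                mapping = idx.setdefault(item, set())
                mapping.update(pair)
                # Propagate the merged set to every item it now contains.
                for subitem in mapping:
                    idx.setdefault(subitem, set()).update(mapping)
    # Deduplicate identical groups, then sort for a repeatable order.
    return sorted(sorted(group) for group in {frozenset(s) for s in idx.values()})


ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

for group in linked_groups(ListA):
    print(group)
```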
    