filter out “reversed” duplicated tuples from a list in Python

滥情空心 2021-01-15 14:48

I have a list like this:

[('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.100', 'A'),
 ('192.168.1.103', '192.168.1.101', 'B'), ('192.168.1.104', '192.168.1.100', 'C')]


        
相关标签:
4 Answers
  • 2021-01-15 15:12

    The straightforward but inefficient (O(n²)) approach, where l is the list from the question (thanks, @Rafał Dowgird!):

    >>> uniq = []
    >>> for i in l:                                             # O(n), n being the size of l
    ...     if not (i in uniq or (i[1], i[0], i[2]) in uniq):   # O(n) list scan
    ...         uniq.append(i)                                  # O(1)
    ... 
    >>> uniq
    [('192.168.1.100', '192.168.1.101', 'A'), 
     ('192.168.1.103', '192.168.1.101', 'B'), 
     ('192.168.1.104', '192.168.1.100', 'C')]
    

    A more efficient approach using a Python set:

    >>> uniq = set()
    >>> for i in l:                                             # O(n), n = |l|
    ...     if not (i in uniq or (i[1], i[0], i[2]) in uniq):   # average O(1) hash lookup
    ...         uniq.add(i)
    ... 
    >>> list(uniq)
    [('192.168.1.104', '192.168.1.100', 'C'), 
     ('192.168.1.100', '192.168.1.101', 'A'), 
     ('192.168.1.103', '192.168.1.101', 'B')]
    

    A set does not keep the input order, so you can sort the result by the last element:

    >>> sorted(uniq, key=lambda i: i[2])
    [('192.168.1.100', '192.168.1.101', 'A'), 
     ('192.168.1.103', '192.168.1.101', 'B'), 
     ('192.168.1.104', '192.168.1.100', 'C')]
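Sorting by the label recovers the input order here only because the labels happen to appear in order. A minimal sketch that keeps the input order in general, assuming the first two items are the addresses and every other item must match exactly: store a normalized key for each tuple already kept in a set, and append only the first tuple seen for each key. The function name dedupe_reversed is hypothetical.

```python
def dedupe_reversed(pairs):
    """Keep the first occurrence of each tuple, treating (a, b, *rest)
    and (b, a, *rest) as duplicates. Input order is preserved."""
    seen = set()    # normalized keys already emitted
    result = []     # kept tuples, in original input order
    for t in pairs:
        a, b = t[0], t[1]
        # Canonical order for the two addresses, so a tuple and its
        # reversed twin map to the same key; remaining items unchanged.
        key = (min(a, b), max(a, b)) + tuple(t[2:])
        if key not in seen:  # average O(1) set lookup
            seen.add(key)
            result.append(t)
    return result


data = [('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.100', 'A'),
        ('192.168.1.103', '192.168.1.101', 'B'), ('192.168.1.104', '192.168.1.100', 'C')]
print(dedupe_reversed(data))
```

Each tuple is kept in its original form (addresses not reordered), and the whole pass is O(n) on average.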
    
  • 2021-01-15 15:12
    >>> L=[('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.100', 'A'), 
    ...  ('192.168.1.103', '192.168.1.101', 'B'), ('192.168.1.104', '192.168.1.100', 'C')]
    >>> set(tuple(sorted((a,b))+[c]) for a,b,c in L)
    {('192.168.1.100', '192.168.1.104', 'C'), ('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.103', 'B')}
    
  • 2021-01-15 15:33

    One possible way to do this:

    >>> somelist = [('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.100', 'A'),
    ...             ('192.168.1.103', '192.168.1.101', 'B'), ('192.168.1.104', '192.168.1.100', 'C')]
    >>> list(set((y, x, z) if x > y else (x, y, z) for (x, y, z) in somelist))
    [('192.168.1.100', '192.168.1.104', 'C'), ('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.103', 'B')]
    

    Assuming the only difference between duplicates is the order of the two IP addresses (the first two items), pass a generator expression to set() that always puts the two addresses of each tuple in sorted order, then build a list from the set.
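Comparing the address strings with > is lexicographic, so '192.168.1.10' sorts before '192.168.1.9'. That does not affect duplicate detection (any consistent order works), but if you also want the normalized tuples to put the numerically lower address first, here is a sketch that sorts with the standard-library ipaddress module as the key. The helper name normalize_numeric is hypothetical, and it assumes all addresses are the same IP version, since IPv4 and IPv6 address objects cannot be compared with each other.

```python
import ipaddress


def normalize_numeric(t):
    """Return t with its first two items (IP address strings) ordered
    numerically rather than lexicographically; other items are kept."""
    lo, hi = sorted(t[:2], key=ipaddress.ip_address)
    return (lo, hi) + tuple(t[2:])


somelist = [('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.100', 'A'),
            ('192.168.1.103', '192.168.1.101', 'B'), ('192.168.1.104', '192.168.1.100', 'C')]
# Deduplicate via a set of normalized tuples, then sort for a stable, readable order.
print(sorted({normalize_numeric(t) for t in somelist}))
```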

    Following Rafał's comment, here is another solution that keeps each surviving tuple as it originally appeared, instead of reordering its addresses:

    >>> someset = set()
    >>> for e in somelist:
    ...     if e not in someset and e[0:2][::-1] + e[2:] not in someset:
    ...         someset.add(e)
    ...
    >>> list(someset)
    

    I use a set in the solution above because its membership tests are much faster than a list's. Note that list(someset) does not preserve the input order.
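The difference in membership-test cost can be checked with the standard timeit module. A rough sketch, where the collection size, repeat count, and the name membership_times are arbitrary choices for illustration: a list compares the probe against its elements one by one, while a set does an average O(1) hash lookup, so the gap grows with the collection size. Absolute timings depend on the machine.

```python
import timeit


def membership_times(n=10_000, number=1_000):
    """Time `number` failed lookups in a list and in a set of n strings.

    A miss is the worst case for the list, which must compare the probe
    against every element; the set hashes the probe and does an average
    O(1) lookup."""
    items = [f'10.0.{i // 256}.{i % 256}' for i in range(n)]
    as_list = items
    as_set = set(items)
    probe = '255.255.255.255'  # guaranteed not to be in the collection
    list_time = timeit.timeit(lambda: probe in as_list, number=number)
    set_time = timeit.timeit(lambda: probe in as_set, number=number)
    return list_time, set_time


list_time, set_time = membership_times()
print(f'list: {list_time:.4f}s  set: {set_time:.4f}s')
```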

  • 2021-01-15 15:37

    Group by the normalized values (i.e. with the two addresses sorted) and return the original tuples:

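    data = [('192.168.1.100', '192.168.1.101', 'A'),
            ('192.168.1.101', '192.168.1.100', 'A'),
            ('192.168.1.103', '192.168.1.101', 'B'),
            ('192.168.1.104', '192.168.1.100', 'C')]
    # Key: the tuple with its two addresses sorted; value: the original tuple.
    # A later tuple with the same key overwrites an earlier one.
    normalized = {(min(t[0], t[1]), max(t[0], t[1]), t[2]): t for t in data}
    result = list(normalized.values())  # values() is a view in Python 3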
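Because a later tuple overwrites an earlier one under the same key, this keeps the last tuple of each reversed pair. If you would rather see every original grouped under its normalized key (for example to keep the first one, or to report which tuples were duplicates), here is a variant using collections.defaultdict. The names groups, first_seen, and duplicates are only illustrative, and the "input order" of the groups relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import defaultdict

data = [('192.168.1.100', '192.168.1.101', 'A'),
        ('192.168.1.101', '192.168.1.100', 'A'),
        ('192.168.1.103', '192.168.1.101', 'B'),
        ('192.168.1.104', '192.168.1.100', 'C')]

# Map each normalized key (addresses sorted) to every original tuple
# that produced it, in input order.
groups = defaultdict(list)
for t in data:
    key = (min(t[0], t[1]), max(t[0], t[1]), t[2])
    groups[key].append(t)

first_seen = [members[0] for members in groups.values()]       # first tuple of each group
duplicates = {k: v for k, v in groups.items() if len(v) > 1}   # keys that occurred more than once
print(first_seen)
print(duplicates)
```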
    