Filter out “reversed” duplicate tuples from a list in Python


I have a list like this:

[('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.100', 'A'), 
 ('192.168.1.103', '192.168.1.101


                      
4 answers

悲哀的现实
2021-01-15 15:12

The straightforward, yet inefficient (O(n²)) approach, where l is the list from the question (thanks, @Rafał Dowgird!):

>>> uniq = []
>>> for i in l:                                            # O(n), n being the size of l
...     if not (i in uniq or (i[1], i[0], i[2]) in uniq):  # O(n) scan of a list
...         uniq.append(i)                                 # O(1)
... 
>>> uniq
[('192.168.1.100', '192.168.1.101', 'A'), 
 ('192.168.1.103', '192.168.1.101', 'B'), 
 ('192.168.1.104', '192.168.1.100', 'C')]


A more efficient approach using a Python set:

>>> uniq = set()
>>> for i in l:                                            # O(n), n = |l|
...     if not (i in uniq or (i[1], i[0], i[2]) in uniq):  # O(1) on average (hash table)
...         uniq.add(i)
... 
>>> list(uniq)
[('192.168.1.104', '192.168.1.100', 'C'), 
 ('192.168.1.100', '192.168.1.101', 'A'), 
 ('192.168.1.103', '192.168.1.101', 'B')]


A set is unordered, so the list comes back in arbitrary order. You can sort it by the last element:

>>> sorted(uniq, key=lambda i: i[2])
[('192.168.1.100', '192.168.1.101', 'A'), 
 ('192.168.1.103', '192.168.1.101', 'B'), 
 ('192.168.1.104', '192.168.1.100', 'C')]
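
If the original list order matters, the two ideas above can be combined: a set for fast lookups and a list for the output. The following is a minimal sketch, not part of the original answer; the function name `dedupe_reversed` and the use of a `frozenset` key are assumptions made for illustration.

```python
def dedupe_reversed(rows):
    """Drop tuples whose first two fields were already seen in either order."""
    seen = set()   # order-insensitive keys encountered so far
    kept = []      # original tuples, in first-seen order
    for a, b, tag in rows:
        key = (frozenset((a, b)), tag)  # ignores the order of a and b
        if key not in seen:
            seen.add(key)
            kept.append((a, b, tag))
    return kept


l = [('192.168.1.100', '192.168.1.101', 'A'),
     ('192.168.1.101', '192.168.1.100', 'A'),
     ('192.168.1.103', '192.168.1.101', 'B'),
     ('192.168.1.104', '192.168.1.100', 'C')]
print(dedupe_reversed(l))
```

This keeps the first occurrence of each pair untouched and runs in O(n) on average, without needing a final sort.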

                                                                        
                                                        
            
            
              
                
孤独总比滥情好
2021-01-15 15:12

>>> L=[('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.100', 'A'), 
...  ('192.168.1.103', '192.168.1.101', 'B'), ('192.168.1.104', '192.168.1.100', 'C')]
>>> set(tuple(sorted((a,b))+[c]) for a,b,c in L)
set([('192.168.1.100', '192.168.1.104', 'C'), ('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.103', 'B')])
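
One thing worth noticing about this one-liner is that it returns the normalized tuples, not the originals. The sketch below restates it for Python 3 (the variable name `normalized` is an assumption) and checks which orientation of the 'C' row survives. The addresses are compared as strings, which is enough here because only a consistent order is needed, not a numeric one.

```python
L = [('192.168.1.100', '192.168.1.101', 'A'),
     ('192.168.1.101', '192.168.1.100', 'A'),
     ('192.168.1.103', '192.168.1.101', 'B'),
     ('192.168.1.104', '192.168.1.100', 'C')]

# Same idea as above: sort the two addresses, then re-attach the label.
normalized = {tuple(sorted((a, b))) + (c,) for a, b, c in L}

# Sets are unordered, so sort for a reproducible listing.
for row in sorted(normalized, key=lambda t: t[2]):
    print(row)

# The 'C' row is stored with its addresses swapped relative to the input.
print(('192.168.1.104', '192.168.1.100', 'C') in normalized)
print(('192.168.1.100', '192.168.1.104', 'C') in normalized)
```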

                                                                        
                                                        
            
            
              
                
悲哀的现实
2021-01-15 15:33

One possible way to do this would be as follows:

>>> somelist = [('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.100', 'A'),
...  ('192.168.1.103', '192.168.1.101', 'B'), ('192.168.1.104', '192.168.1.100', 'C')]
>>> list(set((y,x,z) if x > y else (x,y,z) for (x,y,z) in somelist))
[('192.168.1.100', '192.168.1.104', 'C'), ('192.168.1.100', '192.168.1.101', 'A'), ('192.168.1.101', '192.168.1.103', 'B')]


Assuming the tuples differ only in the order of the IP addresses (the first two items), pass a generator expression to set() that always puts the two addresses in the same order, then build a list from the resulting set.

Considering Rafel's comment, here is another solution, which keeps each non-duplicate tuple in its original orientation:

>>> someset = set()
>>> for e in somelist:
...     if e not in someset and e[0:2][::-1] + e[2:] not in someset:
...         someset.add(e)  # e is stored exactly as it appeared in somelist
... 
>>> list(someset)


I use a set in the solution above to make the membership tests faster: a lookup in a set takes O(1) time on average, whereas a lookup in a list takes O(n).
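
To get a feel for the difference in membership cost, here is a rough timing sketch using the standard `timeit` module. The data is hypothetical (2000 made-up tuples shaped like the ones in the question), and the actual numbers will vary from machine to machine.

```python
import timeit

# Hypothetical data: 2000 distinct tuples shaped like the ones in the question.
rows = [('10.0.%d.%d' % (i // 250, i % 250), '10.1.0.1', 'X') for i in range(2000)]
as_list = list(rows)
as_set = set(rows)
probe = rows[-1]  # worst case for the list: the match is the very last element

# Time 1000 membership tests against each container.
list_time = timeit.timeit(lambda: probe in as_list, number=1000)
set_time = timeit.timeit(lambda: probe in as_set, number=1000)
print('list: %.6f s, set: %.6f s' % (list_time, set_time))
```

The list lookup has to compare against every element until it finds a match, while the set lookup hashes the tuple and jumps to its bucket.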
                                                                        
                                                        
            
            
              
                
闹比i
2021-01-15 15:37

Group by normalized values (i.e. with the two addresses sorted) and return the original tuples:

data = [('192.168.1.100', '192.168.1.101', 'A'),
  ('192.168.1.101', '192.168.1.100', 'A'),
  ('192.168.1.103', '192.168.1.101', 'B'),
  ('192.168.1.104', '192.168.1.100', 'C')]
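# key: tuple with the two addresses in sorted order; value: the original tuple
normalized = {(min(t[0], t[1]), max(t[0], t[1]), t[2]): t
              for t in data}
result = list(normalized.values())  # list() so this also works on Python 3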
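
Note that when a dict is built this way, a later tuple overwrites an earlier one with the same key, so for the 'A' pair the second (reversed) tuple is the one returned. If the first occurrence should win instead, `dict.setdefault` can be used. This is a small sketch assuming Python 3.7 or later, where dicts preserve insertion order; the name `first_seen` is made up for illustration.

```python
data = [('192.168.1.100', '192.168.1.101', 'A'),
        ('192.168.1.101', '192.168.1.100', 'A'),
        ('192.168.1.103', '192.168.1.101', 'B'),
        ('192.168.1.104', '192.168.1.100', 'C')]

first_seen = {}
for t in data:
    key = (min(t[0], t[1]), max(t[0], t[1]), t[2])
    first_seen.setdefault(key, t)  # does nothing if key is already present

result = list(first_seen.values())
print(result)
```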

                                                                        
                                                        
            
            
              
                