biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]
Make a new dictionary, with 'u2.com' and 'abc.com' as the keys, and your list elements as the values. The dictionary will enforce uniqueness. Something like this:
uniquelist = dict((element['link'], element) for element in reversed(biglist))
(The reversed() is there so that the first occurrence of each link is the one that remains in the dictionary. If you take it out, you will keep the last occurrence instead.)
Then you can get elements back into a list like this:
biglist = list(uniquelist.values())
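Putting that together with the sample data, a quick end-to-end sketch (the printed order assumes Python 3.7+, where plain dicts preserve insertion order):

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'}
]

# Keying by 'link' and iterating in reverse keeps the first dict seen for each link.
uniquelist = dict((element['link'], element) for element in reversed(biglist))
print(list(uniquelist.values()))
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]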
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]
def dedupe(lst):
    d = {}
    for x in lst:
        link = x["link"]
        if link in d:
            # already saw this link; skip the duplicate
            continue
        d[link] = x
    return list(d.values())
lst = dedupe(biglist)
dedupe() keeps the first of any duplicates.
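On Python 3.7+, where plain dicts preserve insertion order, the result also comes back in first-seen order; a quick check with the sample data above:

print(dedupe(biglist))
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]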
You can use defaultdict to group the items by link, then remove duplicates afterwards if you want to.
from collections import defaultdict

nodupes = defaultdict(list)
for d in biglist:
    nodupes[d['link']].append(d['title'])
This will give you:
defaultdict(<class 'list'>, {'u2.com': ['U2 Band', 'Live Concert by U2'], 'abc.com': ['ABC Station']})
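The grouping alone doesn't give you a deduplicated list of dicts; one way to finish the job from here, keeping the first title seen for each link, would be something along these lines (a sketch, not part of the original answer; iteration order assumes Python 3.7+):

deduped = [{'title': titles[0], 'link': link} for link, titles in nodupes.items()]
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]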
Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]

known_links = set()
newlist = []
for d in biglist:
    link = d['link']
    if link in known_links: continue
    newlist.append(d)
    known_links.add(link)
biglist[:] = newlist
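If you need this in more than one place, the same idea wraps naturally into a small helper (a sketch; the function name and key parameter are just illustrative):

def dedupe_in_place(items, key='link'):
    # Keep the first item seen for each key value, preserving original order.
    seen = set()
    kept = []
    for d in items:
        k = d[key]
        if k in seen:
            continue
        kept.append(d)
        seen.add(k)
    items[:] = kept

dedupe_in_place(biglist)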
You can sort the list, using the link field of each dictionary as the sort key, then iterate through the list once and remove duplicates (or rather, create a new list with duplicates removed, as is the Python idiom), like so:
# sort the list using the 'link' item as the sort key
biglist.sort(key=lambda elt: elt['link'])

newbiglist = []
for item in biglist:
    if newbiglist == [] or item['link'] != newbiglist[-1]['link']:
        newbiglist.append(item)
This code keeps the first element (by relative order in the original biglist) from any group of "duplicates". That works because the .sort() algorithm used by Python is guaranteed to be a stable sort: it does not change the relative order of elements that compare equal (in this case, elements with the same link).
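For comparison, the same sort-then-scan idea can be written with itertools.groupby (a sketch, not part of the original answer; thanks to the stable sort it keeps the same first-occurrence elements as the loop above):

from itertools import groupby

biglist.sort(key=lambda elt: elt['link'])
newbiglist = [next(group) for _, group in groupby(biglist, key=lambda elt: elt['link'])]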