Remove duplicates from a list of dictionaries when only one of the key values is different

I have seen some similar answers, but I can't find something specific for this case:

I have a list of dictionaries like this:

[
 {"element":Bla, ...


                      
5 answers

时光说笑, 2020-12-12 03:06

Apologies for the terrible variable names. There is probably a cleaner way, but this should work:

seen = {(item["element"], item["version"]): False for item in mylist}

output = []
for item in mylist:
    item_key = (item["element"], item["version"])
    if not seen[item_key]:
        output.append(item)
        seen[item_key] = True
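
The same idea also works without pre-populating a dictionary of flags: a set of already-seen keys is enough. This is a minimal sketch, not a definitive implementation. The helper name `dedupe_by_keys` and the sample records are assumptions for illustration, since the question's own data is truncated.

```python
def dedupe_by_keys(items, keys=("element", "version")):
    """Keep the first dict seen for each combination of the given keys."""
    seen = set()
    output = []
    for item in items:
        # Build a hashable key from the fields that define a duplicate.
        item_key = tuple(item[k] for k in keys)
        if item_key not in seen:
            seen.add(item_key)
            output.append(item)
    return output


# Illustrative data (assumed, not the asker's real records).
mylist = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"element": "Bla", "version": 2, "date": "12/05/12"},
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]
deduped = dedupe_by_keys(mylist)
```

Passing the key names as a parameter keeps the function usable when the dictionaries carry more fields than `element` and `version`.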

                                                                        
                                                        
            
            
              
                
清歌不尽, 2020-12-12 03:16

Pandas can solve this quickly:

import pandas as pd
Bla = "Bla"
d = [
{"element":Bla, "version":2, "date":"12/04/12"},
{"element":Bla, "version":2, "date":"12/05/12"},
{"element":Bla, "version":3, "date":"12/04/12"}
]
df = pd.DataFrame(d)
df[~df.drop("date", axis=1).duplicated()]


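Output (older pandas versions sort the columns alphabetically, as shown here; newer ones keep the insertion order of the keys):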

       date element  version
0  12/04/12     Bla        2
2  12/04/12     Bla        3

                                                                        
                                                        
            
            
              
                
一生所求, 2020-12-12 03:22

You say you have a lot of other keys in the dictionary not mentioned in the question.

Here is an O(n) algorithm to do what you need:

>>> from pprint import pprint
>>> seen = set()
>>> result = []
>>> for d in dicts:
...     h = d.copy()
...     h.pop('date')
...     h = tuple(h.items())
...     if h not in seen:
...         result.append(d)
...         seen.add(h)

>>> pprint(result)
[{'date': '12/04/12', 'element': 'Bla', 'version': 2},
 {'date': '12/04/12', 'element': 'Bla', 'version': 3}]


h is a copy of the dict; the date key is removed from it with pop.

A tuple of the remaining items is then created, because a tuple is hashable and can be added to a set. This assumes that every remaining value is hashable and that the dicts list their keys in the same order.

If h has never been seen before, we append d to result and add h to seen. Both additions to seen and lookups (h not in seen) are O(1) on average.

At the end, result contains only the dicts that are unique in terms of their h values.
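
If the dicts may list their keys in different orders, a `frozenset` of the items can stand in for the tuple. The following is a sketch under that assumption. The function name `dedupe_ignoring` and the sample data are illustrative, and every compared value must still be hashable.

```python
def dedupe_ignoring(dicts, ignored=("date",)):
    """Drop dicts that are equal once the ignored keys are left out."""
    seen = set()
    result = []
    for d in dicts:
        # A frozenset of (key, value) pairs is hashable and order-independent.
        h = frozenset((k, v) for k, v in d.items() if k not in ignored)
        if h not in seen:
            seen.add(h)
            result.append(d)
    return result


# Illustrative data (assumed); the second dict has its keys in another order.
dicts = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"date": "12/05/12", "version": 2, "element": "Bla"},
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]
result = dedupe_ignoring(dicts)
```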
                                                                        
                                                        
            
            
              
                
迷失自我, 2020-12-12 03:22

You could use the "unique_everseen" recipe from the itertools documentation to create a new list.

list(unique_everseen(original_list, key=lambda e: '{element}@{version}'.format(**e)))


If your "key" needs to be wider than the lambda I have written (to accommodate more values), then it's probably worth extracting to a function:

def key_without_date(element):
    # Join every value except the date into one string key.
    return '@'.join("{}".format(v) for k, v in element.items() if k != 'date')

list(unique_everseen(original_list, key=key_without_date))
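
Note that `unique_everseen` cannot be imported from the `itertools` module itself. It is a recipe listed in the itertools documentation, and the third-party `more_itertools` package ships a version of it. The sketch below is a simplified variant of that recipe, defined locally so the example runs on its own. The sample list is an assumption, and the key function here returns a sorted tuple rather than a joined string, which is a swapped-in choice to avoid depending on key order.

```python
def unique_everseen(iterable, key=None):
    """Yield unique elements in order, remembering every key seen so far."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


def key_without_date(d):
    # Sort the (key, value) pairs so the result ignores insertion order.
    return tuple(sorted((k, v) for k, v in d.items() if k != "date"))


# Illustrative data (assumed, not the asker's real records).
original_list = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"element": "Bla", "version": 2, "date": "12/05/12"},
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]
unique = list(unique_everseen(original_list, key=key_without_date))
```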

                                                                        
                                                        
            
            
              
                
我寻月下人不归, 2020-12-12 03:30

This works:

LoD=[
{"element":'Bla', "version":2, 'list':[1,2,3], "date":"12/04/12"},
{"element":'Bla', "version":2, 'list':[1,2,3], "date":"12/05/12"},
{"element":'Bla', "version":3, 'list':[1,2,3], "date":"12/04/12"}
]

LoDcopy=[]
seen=set()


for d in LoD:
    dc = d.copy()
    del dc['date']      # ignore the date when comparing
    s = str(dc)         # lists are unhashable, so compare string forms
    if s in seen:
        continue
    seen.add(s)
    LoDcopy.append(d)

print(LoDcopy)


prints (on Python 3.7+, wrapped here for readability):

[{'element': 'Bla', 'version': 2, 'list': [1, 2, 3], 'date': '12/04/12'},
 {'element': 'Bla', 'version': 3, 'list': [1, 2, 3], 'date': '12/04/12'}]
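
Comparing string forms depends on the dicts listing their keys in the same order. One possible alternative, sketched here on the assumption that every value is JSON-serializable, is to build the comparison string with `json.dumps` and `sort_keys=True`. The function name `dedupe_with_unhashable` and the reordered second record are illustrative.

```python
import json


def dedupe_with_unhashable(records, ignored=("date",)):
    """Deduplicate dicts whose values may be unhashable (lists, nested dicts)."""
    seen = set()
    unique = []
    for d in records:
        trimmed = {k: v for k, v in d.items() if k not in ignored}
        # sort_keys makes the string independent of key insertion order.
        s = json.dumps(trimmed, sort_keys=True)
        if s not in seen:
            seen.add(s)
            unique.append(d)
    return unique


LoD = [
    {"element": "Bla", "version": 2, "list": [1, 2, 3], "date": "12/04/12"},
    {"version": 2, "element": "Bla", "list": [1, 2, 3], "date": "12/05/12"},
    {"element": "Bla", "version": 3, "list": [1, 2, 3], "date": "12/04/12"},
]
LoDcopy = dedupe_with_unhashable(LoD)
```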

                                                                        
                                                        
            
            
              
                