I presently have the ability to remove duplicates if there is no key in front of the nested dictionary, i.e. when the input is a flat list of dicts. To remove duplicates from such a list of dicts:
list_of_unique_dicts = []
for dict_ in list_of_dicts:
    if dict_ not in list_of_unique_dicts:
        list_of_unique_dicts.append(dict_)
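For instance, on a small made-up list (the sample data below is illustrative only, not taken from my real data), this keeps the first occurrence of each dict and preserves order:

list_of_dicts = [
    {'a': {'x': 1}},
    {'b': {'x': 2}},
    {'a': {'x': 1}},   # exact duplicate of the first entry
]

list_of_unique_dicts = []
for dict_ in list_of_dicts:
    if dict_ not in list_of_unique_dicts:
        list_of_unique_dicts.append(dict_)

print(list_of_unique_dicts)
# [{'a': {'x': 1}}, {'b': {'x': 2}}]

Note that each membership test scans the whole result list, so this is O(n²) overall, which is why it slows down badly for large inputs (timed below).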
If the order in the result is not important, you can use a set to remove the duplicates by converting the dicts into frozen sets:
def remove_dict_duplicates(list_of_dicts):
    """
    Remove duplicates.
    """
    # Pack each {key: inner_dict} element into a hashable
    # (key, frozenset of inner items) pair so a set can deduplicate them.
    packed = set((k, frozenset(v.items())) for elem in list_of_dicts
                 for k, v in elem.items())
    # Unpack back into the original {key: inner_dict} shape.
    return [{k: dict(v)} for k, v in packed]
This assumes that all values of the innermost dicts are hashable.
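For example, a quick round trip with made-up sample data (not my original input) shows both the deduplication and the loss of order, as well as what happens when an inner value is unhashable:

sample = [
    {'a': {'x': 1}},
    {'b': {'x': 2}},
    {'a': {'x': 1}},   # duplicate
]

print(remove_dict_duplicates(sample))
# e.g. [{'b': {'x': 2}}, {'a': {'x': 1}}] -- order may differ between runs

# Unhashable inner values (e.g. lists) break the frozenset packing:
# remove_dict_duplicates([{'a': {'x': [1, 2]}}])
# TypeError: unhashable type: 'list'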
Giving up the order yields potential speedups for large lists. For example, creating a list with 100,000 elements:
inner = {'asndb_prefix': '50.999.0.0/16',
         'cidr': '50.999.0.0/14',
         'cymru_asn': '14618',
         'cymru_country': 'US',
         'cymru_owner': 'AMAZON-AES - Amazon.com, Inc., US',
         'cymru_prefix': '50.16.0.0/16',
         'ip': '50.16.221.xxx',
         'network_id': '50.16.xxx.0/24',
         'pyasn_asn': 14618,
         'whois_asn': '14618'}
large_list = list_of_dicts + [{x: inner} for x in range(int(1e5))]
Checking for duplicates in the growing result list again and again takes quite a while:
def remove_dupes(list_of_dicts):
    """Source: answer from wim
    """
    list_of_unique_dicts = []
    for dict_ in list_of_dicts:
        if dict_ not in list_of_unique_dicts:
            list_of_unique_dicts.append(dict_)
    return list_of_unique_dicts
%timeit remove_dupes(large_list)
1 loop, best of 3: 2min 55s per loop
My approach, using a set, is considerably faster (roughly 300 times on this input):
%timeit remove_dict_duplicates(large_list)
1 loop, best of 3: 590 ms per loop
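If you need both the speed and the original order, the same (key, frozenset) packing can be used as keys of a dict, since dicts preserve insertion order in Python 3.7+. This is only a sketch under the same assumptions (single-key outer dicts, hashable inner values); the name remove_dict_duplicates_ordered is made up here:

def remove_dict_duplicates_ordered(list_of_dicts):
    """Sketch: remove duplicates in O(n) while keeping first-seen order.

    Relies on dict insertion order (Python 3.7+); same hashability
    assumptions as remove_dict_duplicates above.
    """
    packed = {}
    for elem in list_of_dicts:
        for k, v in elem.items():
            key = (k, frozenset(v.items()))
            if key not in packed:          # keep only the first occurrence
                packed[key] = {k: v}
    return list(packed.values())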