I presently have the ability to remove duplicates if there is no key in front of the nested dictionary, i.e. when the input is a flat list of dicts. To remove duplicates from such a list of dicts:
list_of_unique_dicts = []
for dict_ in list_of_dicts:
    if dict_ not in list_of_unique_dicts:
        list_of_unique_dicts.append(dict_)
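For instance, on a small made-up list (the sample data below is illustrative only, not taken from my real data), this keeps the first occurrence of each dict and preserves order:

list_of_dicts = [
    {'a': {'x': 1}},
    {'b': {'x': 2}},
    {'a': {'x': 1}},   # exact duplicate of the first entry
]

list_of_unique_dicts = []
for dict_ in list_of_dicts:
    if dict_ not in list_of_unique_dicts:
        list_of_unique_dicts.append(dict_)

print(list_of_unique_dicts)
# [{'a': {'x': 1}}, {'b': {'x': 2}}]

Note that each membership test scans the whole result list, so this is O(n²) overall, which is why it slows down badly for large inputs (timed below).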
If the order in the result is not important, you can use a set to remove the duplicates by converting the dicts into frozen sets:
def remove_dict_duplicates(list_of_dicts):
    """
    Remove duplicates.
    """
    # Pack each {key: inner_dict} element into a hashable
    # (key, frozenset of inner items) pair so a set can deduplicate them.
    packed = set((k, frozenset(v.items())) for elem in list_of_dicts
                 for k, v in elem.items())
    # Unpack back into the original {key: inner_dict} shape.
    return [{k: dict(v)} for k, v in packed]
This assumes that all values of the innermost dicts are hashable.
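For example, a quick round trip with made-up sample data (not my original input) shows both the deduplication and the loss of order, as well as what happens when an inner value is unhashable:

sample = [
    {'a': {'x': 1}},
    {'b': {'x': 2}},
    {'a': {'x': 1}},   # duplicate
]

print(remove_dict_duplicates(sample))
# e.g. [{'b': {'x': 2}}, {'a': {'x': 1}}] -- order may differ between runs

# Unhashable inner values (e.g. lists) break the frozenset packing:
# remove_dict_duplicates([{'a': {'x': [1, 2]}}])
# TypeError: unhashable type: 'list'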
Giving up the order yields potential speedups for large lists. For example, creating a list with 100,000 elements:
inner = {'asndb_prefix': '50.999.0.0/16',
         'cidr': '50.999.0.0/14',
         'cymru_asn': '14618',
         'cymru_country': 'US',
         'cymru_owner': 'AMAZON-AES - Amazon.com, Inc., US',
         'cymru_prefix': '50.16.0.0/16',
         'ip': '50.16.221.xxx',
         'network_id': '50.16.xxx.0/24',
         'pyasn_asn': 14618,
         'whois_asn': '14618'}
large_list = list_of_dicts + [{x: inner} for x in range(int(1e5))]
Checking for duplicates in the growing result list again and again takes quite a while:
def remove_dupes(list_of_dicts):
    """Source: answer from wim
    """
    list_of_unique_dicts = []
    for dict_ in list_of_dicts:
        if dict_ not in list_of_unique_dicts:
            list_of_unique_dicts.append(dict_)
    return list_of_unique_dicts
%timeit remove_dupes(large_list)
1 loop, best of 3: 2min 55s per loop
My approach, using a set, is considerably faster (roughly 300 times on this input):
%timeit remove_dict_duplicates(large_list)
1 loop, best of 3: 590 ms per loop
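If you need both the speed and the original order, the same (key, frozenset) packing can be used as keys of a dict, since dicts preserve insertion order in Python 3.7+. This is only a sketch under the same assumptions (single-key outer dicts, hashable inner values); the name remove_dict_duplicates_ordered is made up here:

def remove_dict_duplicates_ordered(list_of_dicts):
    """Sketch: remove duplicates in O(n) while keeping first-seen order.

    Relies on dict insertion order (Python 3.7+); same hashability
    assumptions as remove_dict_duplicates above.
    """
    packed = {}
    for elem in list_of_dicts:
        for k, v in elem.items():
            key = (k, frozenset(v.items()))
            if key not in packed:          # keep only the first occurrence
                packed[key] = {k: v}
    return list(packed.values())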