I have seen some similar answers, but I can't find something specific for this case:
I have a list of dictionaries like this:
[
{\"element\":Bla, \
Apologies for the terrible variable names. There is probably a cleaner way, but this should work:
seen = {(item["element"], item["version"]): False for item in mylist}
output = []
for item in mylist:
    item_key = (item["element"], item["version"])
    if not seen[item_key]:
        output.append(item)
        seen[item_key] = True
Pandas can solve this quickly:
import pandas as pd
Bla = "Bla"
d = [
{"element":Bla, "version":2, "date":"12/04/12"},
{"element":Bla, "version":2, "date":"12/05/12"},
{"element":Bla, "version":3, "date":"12/04/12"}
]
df = pd.DataFrame(d)
df[~df.drop("date", axis=1).duplicated()]
output:
       date element  version
0  12/04/12     Bla        2
2  12/04/12     Bla        3
You say you have a lot of other keys in the dictionary not mentioned in the question.
Here is an O(n) algorithm to do what you need:
>>> from pprint import pprint
>>> seen = set()
>>> result = []
>>> for d in dicts:
...     h = d.copy()
...     h.pop('date')
...     h = tuple(h.items())
...     if h not in seen:
...         result.append(d)
...         seen.add(h)
...
>>> pprint(result)
[{'date': '12/04/12', 'element': 'Bla', 'version': 2},
{'date': '12/04/12', 'element': 'Bla', 'version': 3}]
h is a copy of the dict. The date key is removed from it with pop. Then a tuple is created from the remaining items, since a tuple is a hashable type that can be added to a set.
If h has never been seen before, we append the original dict to result and add h to seen. Additions to seen are O(1), as are lookups (h not in seen).
At the end, result contains only the elements that are unique in terms of their h values.
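The steps above can also be wrapped in a reusable function. This is only a sketch of one possible variation, not the code from the answer: the name dedupe_ignoring and its ignore parameter are made up for illustration, and it builds a frozenset of the remaining items instead of a tuple, so that two dicts holding the same items in a different key order are still treated as duplicates. It assumes every remaining value is hashable.

```python
def dedupe_ignoring(dicts, ignore=("date",)):
    """Keep the first dict for each distinct combination of non-ignored items.

    Hypothetical helper: the name and the `ignore` parameter are illustrative.
    """
    seen = set()
    result = []
    for d in dicts:
        # frozenset is hashable and does not depend on the order of the keys.
        key = frozenset((k, v) for k, v in d.items() if k not in ignore)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


dicts = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"version": 2, "element": "Bla", "date": "12/05/12"},  # same items, different key order
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]
result = dedupe_ignoring(dicts)
print(result)
```

Each dict is visited once and each set operation is O(1) on average, so the whole pass stays O(n).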
You could use the "unique_everseen" recipe from the itertools documentation to create a new list.
list(unique_everseen(original_list, key=lambda e: '{element}@{version}'.format(**e)))
If your "key" needs to be wider than the lambda I have written (to accommodate more values), then it's probably worth extracting to a function:
def key_without_date(element):
    # items() works on Python 2 and 3; sorting keeps the key independent of dict order
    return '@'.join("{}".format(v) for k, v in sorted(element.items()) if k != 'date')
list(unique_everseen(original_list, key=key_without_date))
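Note that unique_everseen is a recipe listed in the itertools documentation rather than a function you can import from the itertools module, so it has to be defined (or taken from a third-party package such as more_itertools) before the calls above will run. Below is a simplified sketch of that recipe, written from the description above rather than copied from the docs, together with the lambda key from this answer. The sample original_list is assumed from the question.

```python
def unique_everseen(iterable, key=None):
    """Yield elements in their original order, skipping repeated keys.

    Simplified sketch of the itertools-docs recipe, not the official text.
    """
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


# Sample data assumed from the question.
original_list = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"element": "Bla", "version": 2, "date": "12/05/12"},
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]

unique = list(
    unique_everseen(original_list, key=lambda e: "{element}@{version}".format(**e))
)
print(unique)
```

Because the key must be hashable, a string or tuple built from the relevant fields works well here.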
This works:
LoD=[
{"element":'Bla', "version":2, 'list':[1,2,3], "date":"12/04/12"},
{"element":'Bla', "version":2, 'list':[1,2,3], "date":"12/05/12"},
{"element":'Bla', "version":3, 'list':[1,2,3], "date":"12/04/12"}
]
LoDcopy = []
seen = set()
for d in LoD:
    dc = d.copy()
    del dc['date']
    # The string form is hashable even when a value (such as a list) is not
    s = str(dc)
    if s in seen:
        continue
    seen.add(s)
    LoDcopy.append(d)
print(LoDcopy)
prints:
[{'date': '12/04/12', 'version': 2, 'list': [1, 2, 3], 'element': 'Bla'},
{'date': '12/04/12', 'version': 3, 'list': [1, 2, 3], 'element': 'Bla'}]
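One caveat with using the string form as the key is that it depends on the order in which the dict stores its keys. As a hedged variation on the same idea, the sketch below serializes the trimmed dict with json.dumps(..., sort_keys=True) so the key is order-independent. The function name dedupe_by_json and its ignore parameter are made up for illustration, and the approach assumes every value is JSON-serializable and every key is a string.

```python
import json


def dedupe_by_json(dicts, ignore=("date",)):
    """Keep the first dict for each distinct JSON form of the non-ignored items.

    Hypothetical helper: assumes values are JSON-serializable and keys are strings.
    """
    seen = set()
    result = []
    for d in dicts:
        trimmed = {k: v for k, v in d.items() if k not in ignore}
        # sort_keys=True makes the string identical regardless of key order.
        key = json.dumps(trimmed, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


LoD = [
    {"element": "Bla", "version": 2, "list": [1, 2, 3], "date": "12/04/12"},
    {"list": [1, 2, 3], "version": 2, "element": "Bla", "date": "12/05/12"},
    {"element": "Bla", "version": 3, "list": [1, 2, 3], "date": "12/04/12"},
]
deduped = dedupe_by_json(LoD)
print(deduped)
```

As with the str-based version, list values are fine here because only the serialized string is stored in the set.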