I am trying to remove the duplicates from the following list:
distinct_cur = [{'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 195, 'st': 0
You're creating a set out of different elements and expecting it to remove duplicates based on a criterion that only you know.
You have to iterate through your list and add a dictionary to the result only if its doc value hasn't been seen before, for instance like this:
done = set()
result = []
for d in distinct_cur:
    if d['doc'] not in done:
        done.add(d['doc'])  # note it down for further iterations
        result.append(d)
This keeps only the first occurrence of each dictionary with a given doc key, by recording the seen keys in an auxiliary set.
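A runnable sketch of the loop above, using hypothetical sample data (the real distinct_cur has more keys per dictionary):

```python
# Hypothetical sample data; the real distinct_cur is longer.
distinct_cur = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'well done', 'foc': 10},
    {'doc': 'good job', 'foc': 7},   # duplicate 'doc', will be dropped
]

done = set()
result = []
for d in distinct_cur:
    if d['doc'] not in done:
        done.add(d['doc'])  # remember this key for later iterations
        result.append(d)

print(result)
# keeps only the first occurrence of each 'doc' value
```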
Another possibility is to build a dictionary keyed on the "doc" value, iterating over the list in reverse so that earlier items overwrite later ones:
result = {i['doc']:i for i in reversed(distinct_cur)}.values()
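A quick demonstration with hypothetical sample data: because the comprehension runs over reversed(distinct_cur), the first occurrence of each doc key is processed last and wins.

```python
# Hypothetical sample data for illustration.
distinct_cur = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'well done', 'foc': 10},
    {'doc': 'good job', 'foc': 7},   # processed first, then overwritten
]

result = {i['doc']: i for i in reversed(distinct_cur)}.values()
print(list(result))
```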
I see 2 similar solutions, depending on your problem domain: do you want to keep the first instance of a key or the last? Keeping the last (so later matches overwrite earlier ones) is simpler:
d = {r['doc']: r for r in distinct_cur}.values()
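Sketching the last-instance variant with the same hypothetical data: iterating forward means the last dictionary with a given doc key overwrites the earlier ones.

```python
# Hypothetical sample data for illustration.
distinct_cur = [
    {'doc': 'good job', 'foc': 195},  # overwritten by the later entry
    {'doc': 'well done', 'foc': 10},
    {'doc': 'good job', 'foc': 7},
]

d = {r['doc']: r for r in distinct_cur}.values()
print(list(d))
```

Note that dicts preserve insertion order, so the surviving entries keep the positions where their keys were first inserted.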
Try this:
distinct_cur = [dict(t) for t in set(tuple(d.items()) for d in distinct_cur)]
Worked for me... Note that this only removes dictionaries that are identical in every key, not those that merely share the same doc value, and the resulting order is not guaranteed.
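A small sketch (with hypothetical data) showing what this approach does and does not remove: exact duplicates collapse, but two dictionaries that differ in any value both survive.

```python
# Hypothetical sample data for illustration.
distinct_cur = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 195},  # exact duplicate -> removed
    {'doc': 'good job', 'foc': 7},    # same 'doc' but different 'foc' -> kept
]

deduped = [dict(t) for t in set(tuple(d.items()) for d in distinct_cur)]
print(len(deduped))  # order of a set is not guaranteed, so check length/membership
```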
A one-liner to deduplicate the list of dictionaries distinct_cur on the doc key, keeping the last occurrence of each:
[i for n, i in enumerate(distinct_cur) if i.get('doc') not in [y.get('doc') for y in distinct_cur[n + 1:]]]
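To illustrate with hypothetical data: an item is kept only if no later item shares its doc value, so the last occurrence of each key survives. Note this scans the tail of the list for every element, so it is O(n²).

```python
# Hypothetical sample data for illustration.
distinct_cur = [
    {'doc': 'good job', 'foc': 195},  # a later 'good job' exists -> dropped
    {'doc': 'well done', 'foc': 10},
    {'doc': 'good job', 'foc': 7},
]

result = [i for n, i in enumerate(distinct_cur)
          if i.get('doc') not in [y.get('doc') for y in distinct_cur[n + 1:]]]
print(result)
```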