Question
I would like to remove duplicate dicts from a list.
Specifically, if two dicts have the same value under the key paper_title, keep one and remove the other duplicate.
For example, given the list below
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]
It should return
return_value = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
                {"paper_title": 'Unique One', 'Paper_year': 3},
                {"paper_title": 'Unique two', 'Paper_year': 3}]
According to the tutorial, this can be achieved using a list comprehension or frozenset, like so:
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]
return_value = [i for n, i in enumerate(test_list) if i not in test_list[n + 1:]]
However, it removes no duplicates:
return_value = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
                {"paper_title": 'This is duplicate', 'Paper_year': 3},
                {"paper_title": 'Unique One', 'Paper_year': 3},
                {"paper_title": 'Unique two', 'Paper_year': 3}]
May I know which part of the code I should change?
Also, is there a faster way to achieve a similar result?
Answer 1:
It is because your sample dicts are all strictly different. If you change Paper_year to the same value, it works as expected:
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 3},  # Changed 2 to 3
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]
[i for n, i in enumerate(test_list) if i not in test_list[n + 1:]]
#[{'Paper_year': 3, 'paper_title': 'This is duplicate'},
# {'Paper_year': 3, 'paper_title': 'Unique One'},
# {'Paper_year': 3, 'paper_title': 'Unique two'}]
One way to achieve the expected output is itertools.groupby:
from itertools import groupby
f = lambda x: x["paper_title"]  # group on the title only
[next(g) for k, g in groupby(sorted(test_list, key=f), key=f)]
Output:
[{'Paper_year': 2, 'paper_title': 'This is duplicate'},
{'Paper_year': 3, 'paper_title': 'Unique One'},
{'Paper_year': 3, 'paper_title': 'Unique two'}]
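The groupby approach relies on the sort to bring equal titles together. As a sketch of an alternative not shown in this answer, a plain dict keyed by title keeps the first occurrence in O(n) and preserves the input order (assuming Python 3.7+, where dicts are insertion-ordered):

```python
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]

seen = {}
for d in test_list:
    # setdefault stores only the first dict seen for each title;
    # later duplicates leave the stored entry untouched.
    seen.setdefault(d["paper_title"], d)

result = list(seen.values())
```

This avoids the O(n log n) sort and, unlike the sorted-groupby version, never reorders the surviving entries.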
Answer 2:
In your code you are comparing whole dicts for duplicates; what you want is to compare the value of a single key:
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]

def check_presence(l, v):  # list, value
    for i in l:
        if i['paper_title'] == v:
            return True
    return False

return_value = [i for n, i in enumerate(test_list)
                if not check_presence(test_list[:n], test_list[n]['paper_title'])]
print(return_value)
Answer 3:
This simple code can be used:
j = []  # result without duplicate titles
z = []  # paper_title values already seen
for i in test_list:
    value = i["paper_title"]
    if value not in z:
        j.append(i)
        z.append(value)
Answer 4:
So unlike the tutorial you are following, you are trying to find unique entries based on a single key in the dictionary rather than unique entries across all key values.
The condition you've added in the comprehension is:
i not in test_list[n + 1:]
which checks whether i is equal to any of the entries from position n + 1 to the end of the list.
Since {"paper_title": 'This is duplicate', 'Paper_year': 2} != {"paper_title": 'This is duplicate', 'Paper_year': 3}, you end up with both results in the list that you construct.
This is unlike the tutorial, in which {'Akshat': 3} == {'Akshat': 3}, so the second entry is excluded.
Others have already responded with solutions that utilize the key, but I had already typed this far, so I hope this explanation adds a little more context to why it wasn't working.
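Following on from that explanation, a minimal change to the original comprehension (a sketch of my own, not code from the answer above) is to compare only the paper_title values instead of whole dicts:

```python
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]

# Pre-extract the titles so the membership test sees only the key values.
titles = [d["paper_title"] for d in test_list]

# Keep a dict only if its title has not already appeared earlier in the list.
return_value = [d for n, d in enumerate(test_list)
                if d["paper_title"] not in titles[:n]]
```

This keeps the first occurrence per title, matching the expected output, though like the original comprehension it is quadratic in the list length.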
Answer 5:
As per the other answers, there are no exact duplicates. The simplest way to implement your requirement is to use pandas, IMHO:
import pandas as pd

test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]
test_list = (pd.DataFrame(test_list)
             .groupby("paper_title").first()
             .reset_index()
             .to_dict(orient="records"))
test_list
Output:
[{'paper_title': 'This is duplicate', 'Paper_year': 2},
{'paper_title': 'Unique One', 'Paper_year': 3},
{'paper_title': 'Unique two', 'Paper_year': 3}]
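If pandas is already in play, drop_duplicates is a more direct route than groupby/first: it keeps the first row per title by default and, unlike groupby, preserves the original row order (a sketch on the question's data, not part of the answer above):

```python
import pandas as pd

test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]

# drop_duplicates(subset=...) compares only the named column;
# keep='first' is the default, so the year-2 row survives.
result = (pd.DataFrame(test_list)
          .drop_duplicates(subset="paper_title")
          .to_dict(orient="records"))
```

On this sample the output happens to match the groupby version because the titles are already in alphabetical order, but on unsorted data drop_duplicates would not reorder them.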
Source: https://stackoverflow.com/questions/62787181/unable-to-remove-duplicate-dicts-in-list-using-list-comprehension-or-frozenset