Question
I would like to remove duplicate dicts from a list.
Specifically, if two dicts have the same value under the key paper_title, keep one and remove the other duplicate.
For example, given the list below
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]
It should return
return_value = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
                {"paper_title": 'Unique One', 'Paper_year': 3},
                {"paper_title": 'Unique two', 'Paper_year': 3}]
According to the tutorial, this can be achieved using a list comprehension or frozenset, like so:
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]
return_value = [i for n, i in enumerate(test_list) if i not in test_list[n + 1:]]
However, it removes no duplicates:
return_value = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
                {"paper_title": 'This is duplicate', 'Paper_year': 3},
                {"paper_title": 'Unique One', 'Paper_year': 3},
                {"paper_title": 'Unique two', 'Paper_year': 3}]
May I know which part of the code I should change?
Also, is there a faster way to achieve a similar result?
Answer 1:
It is because your sample dicts are all strictly different. If you change Paper_year to the same value, it works as expected:
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 3},  # Changed 2 to 3
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]
[i for n, i in enumerate(test_list) if i not in test_list[n + 1:]]
#[{'Paper_year': 3, 'paper_title': 'This is duplicate'},
# {'Paper_year': 3, 'paper_title': 'Unique One'},
# {'Paper_year': 3, 'paper_title': 'Unique two'}]
One way to achieve the expected output is itertools.groupby:
from itertools import groupby
f = lambda x: x["paper_title"]  # group on the title only
[next(g) for k, g in groupby(sorted(test_list, key=f), key=f)]
Output:
[{'Paper_year': 2, 'paper_title': 'This is duplicate'},
{'Paper_year': 3, 'paper_title': 'Unique One'},
{'Paper_year': 3, 'paper_title': 'Unique two'}]
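The groupby approach relies on the sort to bring equal titles together. As a sketch of an alternative not shown in this answer, a plain dict keyed by title keeps the first occurrence in O(n) and preserves the input order (assuming Python 3.7+, where dicts are insertion-ordered):

```python
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]

seen = {}
for d in test_list:
    # setdefault stores only the first dict seen for each title;
    # later duplicates leave the stored entry untouched.
    seen.setdefault(d["paper_title"], d)

result = list(seen.values())
```

This avoids the O(n log n) sort and, unlike the sorted-groupby version, never reorders the surviving entries.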
Answer 2:
In your code you are comparing whole dicts for duplicates; what you want is to compare the value of a single key:
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]

def check_presence(l, v):  # list, value
    for i in l:
        if i['paper_title'] == v:
            return True
    return False

return_value = [i for n, i in enumerate(test_list)
                if not check_presence(test_list[:n], test_list[n]['paper_title'])]
print(return_value)
Answer 3:
This simple code can be used:
j = []  # result without duplicate titles
z = []  # paper_title values already seen
for i in test_list:
    value = i["paper_title"]
    if value not in z:
        j.append(i)
        z.append(value)
Answer 4:
So unlike the tutorial you are following, you are trying to find unique entries based on a single key in the dictionary rather than unique entries across all key values.
The condition you've added in the comprehension is:
i not in test_list[n + 1:]
which checks whether i is equal to any of the entries from position n + 1 to the end of the list.
Since {"paper_title": 'This is duplicate', 'Paper_year': 2} != {"paper_title": 'This is duplicate', 'Paper_year': 3}, you end up with both results in the list that you construct.
This is unlike the tutorial, in which {'Akshat': 3} == {'Akshat': 3}, so the second entry is excluded.
Others have already responded with solutions that utilize the key, but I had already typed this far, so I hope this explanation adds a little more context to why it wasn't working.
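Following on from that explanation, a minimal change to the original comprehension (a sketch of my own, not code from the answer above) is to compare only the paper_title values instead of whole dicts:

```python
test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]

# Pre-extract the titles so the membership test sees only the key values.
titles = [d["paper_title"] for d in test_list]

# Keep a dict only if its title has not already appeared earlier in the list.
return_value = [d for n, d in enumerate(test_list)
                if d["paper_title"] not in titles[:n]]
```

This keeps the first occurrence per title, matching the expected output, though like the original comprehension it is quadratic in the list length.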
Answer 5:
As per the other answers, there are no exact duplicates. The simplest way to implement your requirement is to use pandas, IMHO:
import pandas as pd

test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]
test_list = (pd.DataFrame(test_list)
             .groupby("paper_title").first()
             .reset_index()
             .to_dict(orient="records"))
test_list
Output:
[{'paper_title': 'This is duplicate', 'Paper_year': 2},
{'paper_title': 'Unique One', 'Paper_year': 3},
{'paper_title': 'Unique two', 'Paper_year': 3}]
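If pandas is already in play, drop_duplicates is a more direct route than groupby/first: it keeps the first row per title by default and, unlike groupby, preserves the original row order (a sketch on the question's data, not part of the answer above):

```python
import pandas as pd

test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
             {"paper_title": 'This is duplicate', 'Paper_year': 3},
             {"paper_title": 'Unique One', 'Paper_year': 3},
             {"paper_title": 'Unique two', 'Paper_year': 3}]

# drop_duplicates(subset=...) compares only the named column;
# keep='first' is the default, so the year-2 row survives.
result = (pd.DataFrame(test_list)
          .drop_duplicates(subset="paper_title")
          .to_dict(orient="records"))
```

On this sample the output happens to match the groupby version because the titles are already in alphabetical order, but on unsorted data drop_duplicates would not reorder them.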
Source: https://stackoverflow.com/questions/62787181/unable-to-remove-duplicate-dicts-in-list-using-list-comprehension-or-frozenset