How can I find the duplicates in a Python list and create another list of the duplicates? The list only contains integers.
Method 1:
list(set([val for idx, val in enumerate(input_list) if val in input_list[idx+1:]]))
Explanation: [val for idx, val in enumerate(input_list) if val in input_list[idx+1:]] is a list comprehension that returns an element if the same element also appears later in the list, that is, anywhere after its current index.
Example: input_list = [42,31,42,31,3,31,31,5,6,6,6,6,6,7,42]
Starting with the first element in the list, 42, at index 0, it checks whether 42 is present in input_list[1:] (i.e., from index 1 to the end of the list). Because 42 is present in input_list[1:], it returns 42.
Then it goes to the next element, 31, at index 1, and checks whether 31 is present in input_list[2:] (i.e., from index 2 to the end of the list). Because 31 is present in input_list[2:], it returns 31.
Similarly, it goes through all the elements in the list and collects only the repeated elements into a list.
An element that occurs more than twice is collected more than once, so that list can itself contain duplicates. To keep one of each, we call the Python built-in set(), which removes them.
That leaves us with a set rather than a list, so we convert it back with list(). Note that a set does not preserve the original order.
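As a quick check of Method 1, here is a minimal runnable sketch using the example list above. The name duplicates is my own, and sorted() is used only because a set has no guaranteed order. Each `val in input_list[idx + 1:]` test scans the rest of the list, so this approach is quadratic in the list length.

```python
input_list = [42, 31, 42, 31, 3, 31, 31, 5, 6, 6, 6, 6, 6, 7, 42]

# Keep each value that appears again later in the list, then deduplicate.
duplicates = list(set(
    val for idx, val in enumerate(input_list) if val in input_list[idx + 1:]
))

print(sorted(duplicates))  # [6, 31, 42]
```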
Method 2:
def dupes(ilist):
    temp_list = []  # initially empty temporary list
    dupe_list = []  # initially empty duplicate list
    for each in ilist:
        if each in temp_list:  # Found a duplicate element
            if each not in dupe_list:  # Avoid duplicate elements in dupe_list
                dupe_list.append(each)  # Add duplicate element to dupe_list
        else:
            temp_list.append(each)  # Add a new (non-duplicate) element to temp_list
    return dupe_list
Explanation: We start with two empty lists, then traverse the input list and check whether each element already exists in temp_list (initially empty). If it is not in temp_list, we add it with the append method.
If it already exists in temp_list, the current element is a duplicate, so we add it to dupe_list with append, unless it is already there.
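For longer inputs, the `in` checks against lists in Method 2 become the bottleneck, since each one is a linear scan. A possible variant, sketched here under the assumption that the elements are hashable (integers are), tracks membership in sets instead. The names dupes_with_sets, seen and reported are hypothetical and not part of the answer above.

```python
def dupes_with_sets(ilist):
    seen = set()       # values encountered at least once
    reported = set()   # values already added to dupe_list
    dupe_list = []     # duplicates, in the order they are first repeated
    for each in ilist:
        if each in seen:
            if each not in reported:
                reported.add(each)
                dupe_list.append(each)
        else:
            seen.add(each)
    return dupe_list


print(dupes_with_sets([42, 31, 42, 31, 3, 31, 31, 5, 6, 6, 6, 6, 6, 7, 42]))
# [42, 31, 6]
```

The result list keeps the order in which each duplicate is first detected, which the set-based one-liner in Method 1 does not.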
raw_list = [1, 2, 3, 3, 4, 5, 6, 6, 7, 2, 3, 4, 2, 3, 4, 1, 3, 4]
clean_list = list(set(raw_list))
duplicated_items = []
for item in raw_list:
    try:
        clean_list.remove(item)
    except ValueError:
        duplicated_items.append(item)
print(duplicated_items)
# [3, 6, 2, 3, 4, 2, 3, 4, 1, 3, 4]
You basically remove duplicates by converting to a set (clean_list), then iterate over raw_list, removing each item from clean_list the first time it occurs in raw_list. If item is not found, the raised ValueError exception is caught and the item is added to the duplicated_items list.
If the index of each duplicated item is needed, just enumerate the list (for index, item in enumerate(raw_list):) and store the index alongside the item. This approach works for large lists (thousands of elements or more), although each clean_list.remove call is itself a linear scan.
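To make the enumerate suggestion concrete, here is a minimal sketch that records the position of every repeated occurrence. It swaps the try/except remove trick for a set of already-seen values, which is my substitution rather than the answer's method, and the name duplicate_positions is hypothetical.

```python
def duplicate_positions(raw_list):
    seen = set()     # values that have already occurred once
    positions = []   # (index, item) for every repeated occurrence
    for index, item in enumerate(raw_list):
        if item in seen:
            positions.append((index, item))
        else:
            seen.add(item)
    return positions


print(duplicate_positions([1, 2, 3, 3, 4, 5, 6, 6, 7, 2]))
# [(3, 3), (7, 6), (9, 2)]
```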
collections.Counter is new in Python 2.7, so it fails on earlier versions:
Python 2.5.4 (r254:67916, May 31 2010, 15:03:39)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = [1,2,3,2,1,5,6,5,5,5]
>>> import collections
>>> print [x for x, y in collections.Counter(a).items() if y > 1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Counter'
>>>
In an earlier version you can use a conventional dict instead:
a = [1,2,3,2,1,5,6,5,5,5]
d = {}
for elem in a:
    if elem in d:
        d[elem] += 1
    else:
        d[elem] = 1
print [x for x, y in d.items() if y > 1]
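On current Python 3, where collections.Counter is available, the same counting idea can be sketched as follows. The output order shown assumes Python 3.7 or later, where Counter keeps elements in first-seen order; the variable names counts and duplicates are my own.

```python
from collections import Counter

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

counts = Counter(a)  # maps each value to its number of occurrences
duplicates = [x for x, n in counts.items() if n > 1]

print(duplicates)  # [1, 2, 5]
```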
Python 3.8+ one-liner, if you would rather not write your own algorithm or use third-party libraries (groupby comes from the standard library's itertools):
from itertools import groupby

l = [1,2,3,2,1,5,6,5,5,5]
res = [(x, count) for x, g in groupby(sorted(l)) if (count := len(list(g))) > 1]
print(res)
Prints each item and its count:
[(1, 2), (2, 2), (5, 4)]
groupby takes an optional key function, so you can define your groupings in different ways and return additional tuple fields as needed.
groupby is lazy, so it shouldn't be too slow; the sorted call in front of it is the dominant cost.
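As an illustration of the key-function point, here is a small sketch that treats integers as duplicates when their absolute values match and returns the group members as an extra tuple field. The choice of abs as the key and the names values, grouped and res are assumptions made for the example.

```python
from itertools import groupby

values = [3, -3, 1, 2, -2, 2]

# Sort and group with the same key so that equal keys end up adjacent.
grouped = (
    (key, list(group))
    for key, group in groupby(sorted(values, key=abs), key=abs)
)
res = [(key, len(members), members) for key, members in grouped if len(members) > 1]

print(res)
# [(2, 3, [2, -2, 2]), (3, 2, [3, -3])]
```

Sorting with the same key matters: groupby only merges adjacent items, so an unsorted input would split one logical group into several.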
There are a lot of answers here, but I think this is a relatively readable and easy-to-understand approach:
def get_duplicates(sorted_list):
    if not sorted_list:  # Avoid an IndexError on an empty list
        return set()
    duplicates = []
    last = sorted_list[0]
    for x in sorted_list[1:]:
        if x == last:  # Same as the previous element, so it is a duplicate
            duplicates.append(x)
        last = x
    return set(duplicates)
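Note: the input must already be sorted (for example, pass sorted(my_list)), because only adjacent elements are compared.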
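To show why the sorted input matters, here is a compact sketch of the same adjacent-comparison idea written with zip. The name adjacent_duplicates is hypothetical and this is not the answer's own function, just an equivalent formulation for experimenting.

```python
def adjacent_duplicates(sorted_list):
    # Pair each element with its successor; equal neighbours are duplicates.
    return {a for a, b in zip(sorted_list, sorted_list[1:]) if a == b}


data = [3, 1, 2, 3, 1]
print(adjacent_duplicates(sorted(data)))  # {1, 3}
print(adjacent_duplicates([1, 2, 1]))     # set(), the unsorted repeat is missed
```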
A bit late, but maybe helpful for some. For a largish list, I found this worked for me.
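l = [1, 2, 3, 5, 4, 1, 3, 1]
s = set(l)
d = []
for x in l:
    if x in s:
        s.remove(x)  # First occurrence: drop it from the set
    else:
        d.append(x)  # Already removed, so this is a repeat
print(d)
# [1, 3, 1]
This returns every repeated occurrence, and only those, in their original order.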