How do I find the duplicates in a list and create another list with them?

前端未结

关注

 30  1571

How can I find the duplicates in a Python list and create another list of the duplicates? The list only contains integers.

相关标签:

30条回答

北海茫月

2020-11-22 01:18
Method 1:
```
list(set([val for idx, val in enumerate(input_list) if val in input_list[idx+1:]]))
```
Explanation: [val for idx, val in enumerate(input_list) if val in input_list[idx+1:]] is a list comprehension, that returns an element, if the same element is present from it's current position, in list, the index.

Example: input_list = [42,31,42,31,3,31,31,5,6,6,6,6,6,7,42]

starting with the first element in list, 42, with index 0, it checks if the element 42, is present in input_list[1:] (i.e., from index 1 till end of list) Because 42 is present in input_list[1:], it will return 42.

Then it goes to the next element 31, with index 1, and checks if element 31 is present in the input_list[2:] (i.e., from index 2 till end of list), Because 31 is present in input_list[2:], it will return 31.

similarly it goes through all the elements in the list, and will return only the repeated/duplicate elements into a list.

Then because we have duplicates, in a list, we need to pick one of each duplicate, i.e. remove duplicate among duplicates, and to do so, we do call a python built-in named set(), and it removes the duplicates,

Then we are left with a set, but not a list, and hence to convert from a set to list, we use, typecasting, list(), and that converts the set of elements to a list.

Method 2:
```
def dupes(ilist):
    temp_list = [] # initially, empty temporary list
    dupe_list = [] # initially, empty duplicate list
    for each in ilist:
        if each in temp_list: # Found a Duplicate element
            if not each in dupe_list: # Avoid duplicate elements in dupe_list
                dupe_list.append(each) # Add duplicate element to dupe_list
        else: 
            temp_list.append(each) # Add a new (non-duplicate) to temp_list

    return dupe_list
```
Explanation: Here We create two empty lists, to start with. Then keep traversing through all the elements of the list, to see if it exists in temp_list (initially empty). If it is not there in the temp_list, then we add it to the temp_list, using append method.

If it already exists in temp_list, it means, that the current element of the list is a duplicate, and hence we need to add it to dupe_list using append method.
0 讨论(0)
发布评论:

提交评论
- 加载中...
滥情空心

2020-11-22 01:18
```
raw_list = [1,2,3,3,4,5,6,6,7,2,3,4,2,3,4,1,3,4,]

clean_list = list(set(raw_list))
duplicated_items = []

for item in raw_list:
    try:
        clean_list.remove(item)
    except ValueError:
        duplicated_items.append(item)


print(duplicated_items)
# [3, 6, 2, 3, 4, 2, 3, 4, 1, 3, 4]
```
You basically remove duplicates by converting to set (clean_list), then iterate the raw_list, while removing each item in the clean list for occurrence in raw_list. If item is not found, the raised ValueError Exception is caught and the item is added to duplicated_items list.

If the index of duplicated items is needed, just enumerate the list and play around with the index. (for index, item in enumerate(raw_list):) which is faster and optimised for large lists (like thousands+ of elements)
0 讨论(0)
发布评论:

提交评论
- 加载中...

别那么骄傲

2020-11-22 01:20

collections.Counter is new in python 2.7:


Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
a = [1,2,3,2,1,5,6,5,5,5]
import collections
print [x for x, y in collections.Counter(a).items() if y > 1]
Type "help", "copyright", "credits" or "license" for more information.
  File "", line 1, in 
AttributeError: 'module' object has no attribute 'Counter'
>>>

In an earlier version you can use a conventional dict instead:

a = [1,2,3,2,1,5,6,5,5,5]
d = {}
for elem in a:
    if elem in d:
        d[elem] += 1
    else:
        d[elem] = 1

print [x for x, y in d.items() if y > 1]

0 讨论(0)

孤城傲影

2020-11-22 01:20
Python 3.8 one-liner if you don't care to write your own algorithm or use libraries:
```
l = [1,2,3,2,1,5,6,5,5,5]

res = [(x, count) for x, g in groupby(sorted(l)) if (count := len(list(g))) > 1]

print(res)
```
Prints item and count:
```
[(1, 2), (2, 2), (5, 4)]
```
groupby takes a grouping function so you can define your groupings in different ways and return additional Tuple fields as needed.

groupby is lazy so it shouldn't be too slow.
0 讨论(0)
发布评论:

提交评论
- 加载中...
旧巷少年郎

2020-11-22 01:20
There are a lot of answers up here, but I think this is relatively a very readable and easy to understand approach:
```
def get_duplicates(sorted_list):
    duplicates = []
    last = sorted_list[0]
    for x in sorted_list[1:]:
        if x == last:
            duplicates.append(x)
        last = x
    return set(duplicates)
```
Notes:
- If you wish to preserve duplication count, get rid of the cast to 'set' at the bottom to get the full list
- If you prefer to use generators, replace duplicates.append(x) with yield x and the return statement at the bottom (you can cast to set later)
0 讨论(0)
发布评论:

提交评论
- 加载中...
夕颜

2020-11-22 01:22
A bit late, but maybe helpful for some. For a largish list, I found this worked for me.
```
l=[1,2,3,5,4,1,3,1]
s=set(l)
d=[]
for x in l:
    if x in s:
        s.remove(x)
    else:
        d.append(x)
d
[1,3,1]
```
Shows just and all duplicates and preserves order.
0 讨论(0)
发布评论:

提交评论
- 加载中...