How do I find the duplicates in a list and create another list with them?

梦谈多话 2020-11-22 00:56

How can I find the duplicates in a Python list and create another list of the duplicates? The list only contains integers.

30 Answers
  • 2020-11-22 01:18

    Method 1:

    list(set([val for idx, val in enumerate(input_list) if val in input_list[idx+1:]]))
    

    Explanation: [val for idx, val in enumerate(input_list) if val in input_list[idx+1:]] is a list comprehension that keeps an element if the same value appears again anywhere after its current index.

    Example: input_list = [42,31,42,31,3,31,31,5,6,6,6,6,6,7,42]

    Starting with the first element, 42, at index 0, it checks whether 42 is present in input_list[1:] (i.e., from index 1 to the end of the list). Because 42 is present in input_list[1:], 42 is kept.

    It then moves to the next element, 31, at index 1, and checks whether 31 is present in input_list[2:] (i.e., from index 2 to the end of the list). Because 31 is present in input_list[2:], 31 is kept.

    It goes through every element in the same way and collects only the repeated elements into a list.

    An element that occurs more than twice is collected more than once, so this intermediate list itself contains duplicates. To keep one of each, we pass it to the Python built-in set(), which discards the repeats.

    That leaves a set rather than a list, so we convert it back with list(). Note that a set is unordered, so the order of the result is not guaranteed, and the repeated slicing and searching makes this approach quadratic in the length of the list.
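
    To see Method 1 end to end, here is a minimal sketch that wraps the one-liner in a function and runs it on the example list. The function name find_duplicates_by_slice is only an illustrative choice, and the result is sorted purely to make the printed output predictable, since set() does not keep order.

```python
def find_duplicates_by_slice(input_list):
    # Keep a value whenever it shows up again later in the list,
    # then use set() to collapse repeated hits into one entry each.
    repeated = [val for idx, val in enumerate(input_list)
                if val in input_list[idx + 1:]]
    return list(set(repeated))


input_list = [42, 31, 42, 31, 3, 31, 31, 5, 6, 6, 6, 6, 6, 7, 42]
result = sorted(find_duplicates_by_slice(input_list))
print(result)  # [6, 31, 42]
```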

    Method 2:

    def dupes(ilist):
        temp_list = [] # initially, empty temporary list
        dupe_list = [] # initially, empty duplicate list
        for each in ilist:
            if each in temp_list: # Found a Duplicate element
                if each not in dupe_list: # Avoid duplicate elements in dupe_list
                    dupe_list.append(each) # Add duplicate element to dupe_list
            else: 
                temp_list.append(each) # Add a new (non-duplicate) to temp_list
    
        return dupe_list
    

    Explanation: We start with two empty lists and then traverse the input list, checking whether each element already exists in temp_list (initially empty). If it is not in temp_list, we add it with append.

    If it already exists in temp_list, the current element is a duplicate, so we add it to dupe_list with append, unless dupe_list already contains it.
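
    Each in check against temp_list or dupe_list scans a list, so the loop slows down as the number of distinct values grows. As a possible alternative (not part of the answer above), the same logic can be sketched with sets for the membership tests, while a list keeps the order in which duplicates were first detected. The name dupes_with_sets is hypothetical.

```python
def dupes_with_sets(ilist):
    seen = set()      # values encountered at least once
    reported = set()  # values already added to dupe_list
    dupe_list = []    # duplicates, in the order they were first detected
    for each in ilist:
        if each in seen:
            if each not in reported:
                reported.add(each)
                dupe_list.append(each)
        else:
            seen.add(each)
    return dupe_list


print(dupes_with_sets([42, 31, 42, 31, 3, 31, 31, 5, 6, 6, 6, 6, 6, 7, 42]))
# [42, 31, 6]
```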

  • 2020-11-22 01:18
    raw_list = [1,2,3,3,4,5,6,6,7,2,3,4,2,3,4,1,3,4,]
    
    clean_list = list(set(raw_list))
    duplicated_items = []
    
    for item in raw_list:
        try:
            clean_list.remove(item)
        except ValueError:
            duplicated_items.append(item)
    
    
    print(duplicated_items)
    # [3, 6, 2, 3, 4, 2, 3, 4, 1, 3, 4]
    

    First, remove the duplicates by converting to a set (clean_list). Then iterate over raw_list and remove each item from clean_list as it is encountered. When an item has already been removed, remove() raises ValueError; the exception is caught and the item is added to duplicated_items.

    If you also need the indices of the duplicated items, enumerate the list (for index, item in enumerate(raw_list):) and record the index alongside the item. This is fine for lists with thousands of elements, though clean_list.remove() is a linear scan, so lists with many distinct values will be slower.
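
    The enumerate variant could look like the following sketch. It keeps the same try/except technique and simply stores (index, item) pairs; storing pairs is an assumption about what "play around with the index" might mean in practice.

```python
raw_list = [1, 2, 3, 3, 4, 5, 6, 6, 7, 2, 3, 4, 2, 3, 4, 1, 3, 4]

clean_list = list(set(raw_list))
duplicated_items = []  # (index, item) pairs for every repeated occurrence

for index, item in enumerate(raw_list):
    try:
        clean_list.remove(item)  # first occurrence: consume it
    except ValueError:
        duplicated_items.append((index, item))  # already consumed: a repeat

print(duplicated_items)
# [(3, 3), (7, 6), (9, 2), (10, 3), (11, 4), (12, 2), (13, 3), (14, 4), (15, 1), (16, 3), (17, 4)]
```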

  • 2020-11-22 01:20

    collections.Counter is new in Python 2.7, so it is not available in older interpreters:

    
    Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) 
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = [1,2,3,2,1,5,6,5,5,5]
    >>> import collections
    >>> print [x for x, y in collections.Counter(a).items() if y > 1]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'module' object has no attribute 'Counter'
    >>> 
    

    In an earlier version you can use a conventional dict instead:

    a = [1,2,3,2,1,5,6,5,5,5]
    d = {}
    for elem in a:
        if elem in d:
            d[elem] += 1
        else:
            d[elem] = 1
    
    print [x for x, y in d.items() if y > 1]
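
    On Python 2.7 or later, including Python 3, the counting loop can be replaced by collections.Counter. A minimal sketch in Python 3 syntax (print as a function); the variable names counts and duplicates are illustrative:

```python
from collections import Counter

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
counts = Counter(a)  # maps each value to its number of occurrences
duplicates = [x for x, y in counts.items() if y > 1]
print(duplicates)  # on Python 3.7+, in first-seen order: [1, 2, 5]
```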
    
  • 2020-11-22 01:20

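    Python 3.8+ one-liner (it relies on the := operator) if you don't want to write your own algorithm or use third-party libraries. groupby comes from the standard-library itertools module:

    from itertools import groupby

    l = [1,2,3,2,1,5,6,5,5,5]

    res = [(x, count) for x, g in groupby(sorted(l)) if (count := len(list(g))) > 1]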
    
    print(res)
    

    Prints item and count:

    [(1, 2), (2, 2), (5, 4)]
    

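    groupby accepts an optional key function, so you can define your groupings in different ways and return additional tuple fields as needed. If you pass a key, sort with the same key first, because groupby only groups consecutive items.

    groupby itself is lazy, so the grouping step adds little overhead; the sorted() call dominates the cost at O(n log n).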
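
    As an illustration of a custom grouping, the sketch below treats integers with the same absolute value as duplicates of each other and returns the group members as an extra tuple field. Grouping by abs is an arbitrary choice for the example, not something the answer prescribes.

```python
from itertools import groupby

values = [3, -3, 1, 2, -2, 2, 5]

# Sort and group with the same key so equal keys end up adjacent.
res = [
    (key, len(members), members)
    for key, group in groupby(sorted(values, key=abs), key=abs)
    if len(members := list(group)) > 1
]

print(res)  # [(2, 3, [2, -2, 2]), (3, 2, [3, -3])]
```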

  • 2020-11-22 01:20

    There are a lot of answers here, but I think this is a relatively readable and easy-to-understand approach. It expects the list to be sorted already:

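    def get_duplicates(sorted_list):
        # Expects a sorted list, so equal values sit next to each other.
        duplicates = []
        if not sorted_list:  # Empty input: avoids IndexError on sorted_list[0]
            return set()
        last = sorted_list[0]
        for x in sorted_list[1:]:
            if x == last:  # Same as the previous element, so it is a repeat
                duplicates.append(x)
            last = x
        return set(duplicates)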
    

    Notes:

    • If you wish to preserve the duplication count, drop the conversion to set at the bottom and return the full list
    • If you prefer a generator, replace duplicates.append(x) with yield x and remove the return statement at the bottom (you can convert the result to a set later)
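
    The generator variant from the second note might be sketched as follows. The name iter_duplicates is hypothetical, and the sketch walks an iterator instead of slicing, which also lets it handle empty input without an IndexError.

```python
def iter_duplicates(sorted_items):
    # Expects sorted input, so equal values are adjacent.
    it = iter(sorted_items)
    try:
        last = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for x in it:
        if x == last:
            yield x  # x repeats the previous element
        last = x


data = sorted([1, 2, 3, 2, 1, 5, 6, 5, 5, 5])
print(list(iter_duplicates(data)))         # [1, 2, 5, 5, 5]
print(sorted(set(iter_duplicates(data))))  # [1, 2, 5]
```

    Yielding every repeat keeps the duplication count available; converting to a set afterwards gives one entry per duplicated value.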
  • 2020-11-22 01:22

    A bit late, but maybe helpful for some. For a largish list, I found this worked for me.

    l=[1,2,3,5,4,1,3,1]
    s=set(l)
    d=[]
    for x in l:
        if x in s:
            s.remove(x)
        else:
            d.append(x)
    print(d)
    # [1, 3, 1]
    

    This returns every repeated occurrence (not just one entry per value), nothing else, and preserves the original order.
