How do I find the duplicates in a list and create another list with them?

梦谈多话 2020-11-22 00:56

How can I find the duplicates in a Python list and create another list of the duplicates? The list only contains integers.

30 answers
  • 2020-11-22 01:00

    How about simply looping through each element in the list, checking its number of occurrences, and adding every element that occurs more than once to a set? Printing that set then shows the duplicates. Hope this helps someone out there.

    myList = [2, 4, 6, 8, 4, 6, 12]
    newList = set()
    
    for i in myList:
        # count() rescans the whole list for every element, so this loop is O(n^2)
        if myList.count(i) >= 2:
            newList.add(i)
    
    print(list(newList))
    # [4, 6]
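    Because `myList.count(i)` rescans the whole list for every element, the loop above does roughly n² comparisons. As a hedged alternative (the helper name `find_duplicates` is hypothetical), a single pass with a `seen` set finds the same values in linear time and keeps them in the order in which they first repeat:

```python
def find_duplicates(items):
    """Return each value that occurs more than once, in first-repeat order."""
    seen = set()      # values encountered so far
    dup_set = set()   # fast membership check for values already reported
    dups = []         # duplicates, ordered by their second occurrence
    for item in items:
        if item in seen:
            if item not in dup_set:
                dup_set.add(item)
                dups.append(item)
        else:
            seen.add(item)
    return dups


myList = [2, 4, 6, 8, 4, 6, 12]
print(find_duplicates(myList))  # [4, 6]
```

    The extra `dup_set` keeps a value that appears three or more times from being reported twice, without scanning `dups` on every repeat.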
    
  • 2020-11-22 01:00

    A very simple and quick way of finding duplicates in Python, with a single pass over the list, is:

    testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']
    
    testListDict = {}
    
    for item in testList:
        try:
            testListDict[item] += 1
        except KeyError:
            # first time this item is seen
            testListDict[item] = 1
    
    print(testListDict)
    

    The output is as follows (Python 3.7+, where dicts keep insertion order):

    >>> print(testListDict)
    {'red': 2, 'blue': 3, 'green': 1}
    

    There is more like this on my blog: http://www.howtoprogramwithpython.com
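    The dictionary above holds counts, not the duplicates themselves. A minimal follow-up sketch (assuming Python 3.7+, where dicts keep insertion order) filters the counts into the list the question asks for; `dict.get` with a default of 0 is used here in place of the try/except:

```python
testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Count occurrences; dict.get supplies 0 for a key not seen yet
counts = {}
for item in testList:
    counts[item] = counts.get(item, 0) + 1

# Keep only the keys that were seen more than once
duplicates = [item for item, n in counts.items() if n > 1]
print(duplicates)  # ['red', 'blue']
```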

  • 2020-11-22 01:00

    Here are some further tests. Of course, doing this...

    set([x for x in l if l.count(x) > 1])
    

    ...is too costly. It is hundreds of times faster (and the gap grows with the length of the list) to use the following method:

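    def dups_count_dict(l):
        d = {}
    
        # First loop: count occurrences of each item
        for item in l:
            if item not in d:
                d[item] = 0
    
            d[item] += 1
    
        # Second loop: keep only the items seen more than once
        result_d = {key: val for key, val in d.items() if val > 1}
    
        return list(result_d.keys())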
    

    Only two loops, and no costly l.count() calls.
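    The `if item not in d: d[item] = 0` step can be delegated to `collections.defaultdict`. A sketch of the same two-loop method under that substitution (the name `dups_defaultdict` is hypothetical, and it returns a list rather than dict keys):

```python
from collections import defaultdict


def dups_defaultdict(l):
    # defaultdict(int) supplies 0 for unseen keys, replacing the
    # "if item not in d: d[item] = 0" check
    d = defaultdict(int)
    for item in l:
        d[item] += 1
    # Second loop runs over the distinct values only
    return [key for key, val in d.items() if val > 1]


print(dups_defaultdict([1, 2, 3, 3, 3, 4, 5, 6, 6, 7]))  # [3, 6]
```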

    Below is the output of a benchmark comparing the three methods (times in seconds); the benchmark code follows it:

    dups_count: 13.368s # this is a function which uses l.count()
    dups_count_dict: 0.014s # this is a final best function (of the 3 functions)
    dups_count_counter: 0.024s # collections.Counter
    

    The testing code:

    import numpy as np
    from time import time
    from collections import Counter
    
    class TimerCounter(object):
        def __init__(self):
            self._time_sum = 0
    
        def start(self):
            self.time = time()
    
        def stop(self):
            self._time_sum += time() - self.time
    
        def get_time_sum(self):
            return self._time_sum
    
    
    def dups_count(l):
        return set([x for x in l if l.count(x) > 1])
    
    
    def dups_count_dict(l):
        d = {}
    
        for item in l:
            if item not in d:
                d[item] = 0
    
            d[item] += 1
    
        result_d = {key: val for key, val in d.items() if val > 1}
    
        return list(result_d.keys())
    
    
    def dups_counter(l):
        counter = Counter(l)
    
        result_d = {key: val for key, val in counter.items() if val > 1}
    
        return list(result_d.keys())
    
    
    
    def gen_array():
        np.random.seed(17)
        return list(np.random.randint(0, 5000, 10000))
    
    
    def assert_equal_results(*results):
        primary_result = results[0]
        other_results = results[1:]
    
        for other_result in other_results:
            assert set(primary_result) == set(other_result) and len(primary_result) == len(other_result)
    
    
    if __name__ == '__main__':
        dups_count_time = TimerCounter()
        dups_count_dict_time = TimerCounter()
        dups_count_counter = TimerCounter()
    
        l = gen_array()
    
        for i in range(3):
            dups_count_time.start()
            result1 = dups_count(l)
            dups_count_time.stop()
    
            dups_count_dict_time.start()
            result2 = dups_count_dict(l)
            dups_count_dict_time.stop()
    
            dups_count_counter.start()
            result3 = dups_counter(l)
            dups_count_counter.stop()
    
            assert_equal_results(result1, result2, result3)
    
        print('dups_count: %.3f' % dups_count_time.get_time_sum())
        print('dups_count_dict: %.3f' % dups_count_dict_time.get_time_sum())
        print('dups_count_counter: %.3f' % dups_count_counter.get_time_sum())
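    For readers without NumPy, a hedged stdlib-only variant of the same comparison is sketched below using `timeit.timeit`, which handles the repeated calls. The three functions are restated so the block runs on its own, and `random.randint` stands in for NumPy's generator. The 2,000-item list size is an assumption chosen so the quadratic version finishes quickly; timings depend on the machine, so none are quoted here:

```python
import random
import timeit
from collections import Counter


def dups_count(l):
    # Quadratic: l.count() scans the whole list for every element
    return set(x for x in l if l.count(x) > 1)


def dups_count_dict(l):
    d = {}
    for item in l:
        d[item] = d.get(item, 0) + 1
    return [key for key, val in d.items() if val > 1]


def dups_counter(l):
    return [key for key, val in Counter(l).items() if val > 1]


random.seed(17)
# randint is inclusive, so 0..4999 matches np.random.randint(0, 5000)
data = [random.randint(0, 4999) for _ in range(2000)]

# All three methods must agree before timing them
assert set(dups_count(data)) == set(dups_count_dict(data)) == set(dups_counter(data))

for func in (dups_count, dups_count_dict, dups_counter):
    # timeit calls the lambda `number` times and returns the total in seconds
    seconds = timeit.timeit(lambda: func(data), number=3)
    print('%s: %.3f' % (func.__name__, seconds))
```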
    
  • 2020-11-22 01:01

    This is how I did it, because I challenged myself not to use any other methods:

    def dupList(oldlist):
        # Copy into a list so that tuples are accepted too
        oldlist = [x for x in oldlist]
        newList = []
        # Indices already identified as repeats of an earlier element
        forbidden = []
        for i in range(len(oldlist)):
            if i in forbidden:
                continue
            # First occurrence of this value: keep it
            newList.append(oldlist[i])
            # Mark every later occurrence of the same value so it is skipped
            for j in range(i + 1, len(oldlist)):
                if j not in forbidden and oldlist[j] == oldlist[i]:
                    forbidden.append(j)
        return newList
    

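    So your sample works as follows (note that this returns the list with the duplicates removed, not the duplicates themselves):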

    >>> a = [1,2,3,3,3,4,5,6,6,7]
    >>> dupList(a)
    [1, 2, 3, 4, 5, 6, 7]
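    Since the question asks for the duplicates rather than a de-duplicated list, the same no-helpers constraint can be turned around. A sketch (the name `dup_values` is hypothetical) that uses only loops, comparisons and `append`; like the function above, it is quadratic:

```python
def dup_values(oldlist):
    # Collect each value that appears at more than one index
    dups = []
    for i in range(len(oldlist)):
        for j in range(i + 1, len(oldlist)):
            if oldlist[i] == oldlist[j]:
                # Record the value only once, however often it repeats
                already = False
                for d in dups:
                    if d == oldlist[i]:
                        already = True
                        break
                if not already:
                    dups.append(oldlist[i])
                break
    return dups


a = [1, 2, 3, 3, 3, 4, 5, 6, 6, 7]
print(dup_values(a))  # [3, 6]
```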
    
  • 2020-11-22 01:02

    I would do this with pandas, because I use pandas a lot:

    import pandas as pd
    a = [1,2,3,3,3,4,5,6,6,7]
    vc = pd.Series(a).value_counts()
    vc[vc > 1].index.tolist()
    

    Gives

    [3, 6]
    

    It probably isn't the most efficient approach, but it is less code than many of the other answers, so I thought I would contribute it.
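    Without the pandas dependency, `collections.Counter` offers a close standard-library analogue, with `most_common()` playing the role of `value_counts()`. A minimal sketch, assuming the order among values with equal counts does not matter (it may differ from what pandas produces):

```python
from collections import Counter

a = [1, 2, 3, 3, 3, 4, 5, 6, 6, 7]

# most_common() yields (value, count) pairs from the highest count down,
# similar to the ordering of value_counts()
dups = [value for value, count in Counter(a).most_common() if count > 1]
print(dups)  # [3, 6]
```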

  • 2020-11-22 01:03

    list2 = [1, 2, 3, 4, 1, 2, 3]
    lset = set()
    dups = set()
    for item in list2:
        # A value already in lset has been seen before, so it is a duplicate
        if item in lset:
            dups.add(item)
        else:
            lset.add(item)
    print(list(dups))
    