Removing duplicates from a list of lists

萌比男神i 2020-11-22 10:37

I have a list of lists in Python:

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

And I want to remove the duplicate elements from it.

12 Answers
  • 2020-11-22 10:53

    Create a dictionary that uses each inner list, converted to a tuple, as a key, then print the keys.

    • create a dictionary with the tuple as key and the index as value
    • print the list of keys of the dictionary

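    k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
    
    # lists are unhashable, so each one is converted to a tuple to serve as a key
    dict_tuple = {tuple(item): index for index, item in enumerate(k)}
    
    print([list(itm) for itm in dict_tuple.keys()])
    
    # prints [[1, 2], [4], [5, 6, 2], [3]] on Python 3.7+ (key order is arbitrary on older versions)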
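
    Since dicts keep insertion order on Python 3.7 and later, the same tuple-as-key idea can also preserve the order of first appearance. A minimal sketch, where the helper name dedupe_keep_order is made up for illustration:

```python
def dedupe_keep_order(list_of_lists):
    """Return the unique inner lists, in order of first appearance."""
    # Lists are unhashable, so each one is converted to a tuple to act as a key.
    # dict.fromkeys ignores repeated keys, and dicts are insertion-ordered (3.7+).
    unique_tuples = dict.fromkeys(tuple(item) for item in list_of_lists)
    return [list(t) for t in unique_tuples]


k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
print(dedupe_keep_order(k))  # [[1, 2], [4], [5, 6, 2], [3]]
```

    This only works if the inner lists contain hashable items such as numbers or strings.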
    
  • 2020-11-22 10:53

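    Strangely, the answers above remove the duplicates but keep one copy of each. What if I want to remove every entry whose value is duplicated? The following should be useful. It marks rows in place, although the final filtering step does build a new list, and it assumes that rows with the same first field are adjacent.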

    a = [[1, 'somevalue1'], [1, 'somevalue2'], [2, 'somevalue1'], [3, 'somevalue4'], [5, 'somevalue5'], [5, 'somevalue1'], [5, 'somevalue1'], [5, 'somevalue8'], [6, 'somevalue9'], [6, 'somevalue0'], [6, 'somevalue1'], [7, 'somevalue7']]
    
    print(a)
    temp = None  # key of the previous row; None so that a key of 0 is handled too
    for position, (pageNo, item) in enumerate(a):
        if pageNo != temp:
            temp = pageNo  # first row of a new run of keys
        else:
            # same key as the previous row: mark both rows for removal
            a[position] = 0
            a[position - 1] = 0
    a = [x for x in a if x != 0]  # filter out the marked rows (builds a new list)
    print(a)
    

    and the output is:

    [[1, 'somevalue1'], [1, 'somevalue2'], [2, 'somevalue1'], [3, 'somevalue4'], [5, 'somevalue5'], [5, 'somevalue1'], [5, 'somevalue1'], [5, 'somevalue8'], [6, 'somevalue9'], [6, 'somevalue0'], [6, 'somevalue1'], [7, 'somevalue7']]
    [[2, 'somevalue1'], [3, 'somevalue4'], [7, 'somevalue7']]
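
    If rows with the same first field are not guaranteed to be adjacent, counting the keys first is one alternative. This is only a sketch of a different technique, not the loop above, and the function name drop_duplicated_keys is hypothetical:

```python
from collections import Counter


def drop_duplicated_keys(rows):
    """Keep only the rows whose first field occurs exactly once."""
    # First pass: count how often each first field appears.
    counts = Counter(row[0] for row in rows)
    # Second pass: keep rows whose key was seen only once.
    return [row for row in rows if counts[row[0]] == 1]


a = [[1, 'somevalue1'], [1, 'somevalue2'], [2, 'somevalue1'],
     [3, 'somevalue4'], [5, 'somevalue5'], [5, 'somevalue1'],
     [7, 'somevalue7']]
print(drop_duplicated_keys(a))
# [[2, 'somevalue1'], [3, 'somevalue4'], [7, 'somevalue7']]
```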
    
  • 2020-11-22 10:53

    A bit of background: I just started with Python and learned comprehensions.

    k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
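    # join each inner list into a string key, dedupe with a set, then split back into ints
    dedup = [[int(s) for s in elem.split('.')] for elem in set('.'.join(str(int_elem) for int_elem in _list) for _list in k)]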
    
  • 2020-11-22 10:55
    >>> k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
    >>> k = sorted(k)
    >>> k
    [[1, 2], [1, 2], [3], [4], [4], [5, 6, 2]]
    >>> dedup = [k[i] for i in range(len(k)) if i == 0 or k[i] != k[i-1]]
    >>> dedup
    [[1, 2], [3], [4], [5, 6, 2]]
    

    I don't know if it's necessarily faster, but you don't have to use tuples and sets.
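
    The same sort-then-compare-neighbours idea can also be written with itertools.groupby from the standard library. A minimal sketch, assuming the inner lists can be compared with each other so that sorting works:

```python
from itertools import groupby

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

# groupby only merges adjacent equal items, so the list is sorted first.
dedup = [key for key, _group in groupby(sorted(k))]
print(dedup)  # [[1, 2], [3], [4], [5, 6, 2]]
```

    Like the version above, this returns the result in sorted order rather than in the original order.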

  • 2020-11-22 10:57

    Even your "long" list is pretty short. Also, did you choose them to match the actual data? Performance will vary with what these data actually look like. For example, you have a short list repeated over and over to make a longer list. This means that the quadratic solution is linear in your benchmarks, but not in reality.

    For really large lists, the set code is your best bet: it's linear (although space-hungry). The sort and groupby methods are O(n log n), and the "loop in" method is obviously quadratic, so you know how these will scale as n gets really big. If this is the real size of the data you are analyzing, then who cares? It's tiny.
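
    To see the scaling on data that is not just a short list repeated, something like the following could be timed. It is a rough sketch: the function names, the data size, and the random test data are all assumptions for illustration, not the original benchmark.

```python
import random
import timeit


def dedupe_set(lists):
    # Linear on average: one hash lookup per item.
    return [list(t) for t in set(tuple(i) for i in lists)]


def dedupe_loop_in(lists):
    # Quadratic in the worst case: each "in" test scans the result list.
    result = []
    for item in lists:
        if item not in result:
            result.append(item)
    return result


random.seed(0)
# Mostly distinct items, unlike a short list repeated many times.
data = [[random.randrange(10**6), random.randrange(10**6)] for _ in range(2000)]

for func in (dedupe_set, dedupe_loop_in):
    seconds = timeit.timeit(lambda: func(data), number=3)
    print(f"{func.__name__}: {seconds:.4f} s")
```

    Increasing the data size should make the gap between the two grow quickly, since almost every item is distinct here.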

    Incidentally, I'm seeing a noticeable speedup if I don't form an intermediate list to make the set, that is to say if I replace

    kt = [tuple(i) for i in k]
    skt = set(kt)
    

    with

    skt = set(tuple(i) for i in k)
    

    The real solution may depend on more information: Are you sure that a list of lists is really the representation you need?
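
    For example, if the items never need to be mutated and their order does not matter, they could be collected as tuples in a set from the start, so duplicates never build up. A small sketch under those assumptions (the name seen is just illustrative):

```python
# Hypothetical alternative representation: a set of tuples instead of a list of lists.
seen = set()
for item in ([1, 2], [4], [5, 6, 2], [1, 2], [3], [4]):
    seen.add(tuple(item))  # adding a tuple that is already present is a no-op

print(len(seen))       # 4
print((1, 2) in seen)  # True
```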

  • 2020-11-22 11:03

    Tuples and a set comprehension ({}) can be used to remove duplicates:

    >>> [list(tupl) for tupl in {tuple(item) for item in k }]
    [[1, 2], [5, 6, 2], [3], [4]]
    >>> 
    