Getting first n unique elements from Python list

無奈伤痛 2021-02-04 23:59

I have a python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from

12 Answers
  •  悲哀的现实
    2021-02-05 00:37

    I would use a set to remember what was seen and return from the generator when you have seen enough:

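    a = [1, 2, 2, 3, 3, 4, 5, 6]

    def get_unique_N(iterable, N):
        """Yield, in order, the first N unique elements of iterable.

        May yield fewer than N elements if the data runs out first."""
        if N <= 0:
            return  # nothing requested; without this guard N == 0 would yield everything
        seen = set()  # elements already yielded
        for e in iterable:
            if e in seen:
                continue
            seen.add(e)
            yield e
            if len(seen) == N:
                return  # PEP 479: end a generator with return, not raise StopIteration

    k = get_unique_N(a, 4)
    print(list(k))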
        
    

    Output:

    [1, 2, 3, 4]
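
Two properties of the generator approach may be worth seeing in action: it stops early on an endless iterable, and it yields fewer than the requested number when the data is too short. The sketch below restates the function under the hypothetical name get_unique_n so that it runs on its own; the sample inputs are made up for illustration.

```python
import itertools


def get_unique_n(iterable, n):
    """Yield, in order, the first n unique elements of iterable."""
    if n <= 0:
        return
    seen = set()  # elements already yielded
    for element in iterable:
        if element in seen:
            continue
        seen.add(element)
        yield element
        if len(seen) == n:
            return  # stop consuming the input as soon as n uniques were found


# Only two unique values exist, so fewer than the 5 requested are returned.
short_result = list(get_unique_n([7, 7, 8], 5))

# itertools.cycle never ends, but the generator returns after 3 uniques.
endless_result = list(get_unique_n(itertools.cycle([3, 1, 3, 2]), 3))

print(short_result)    # [7, 8]
print(endless_result)  # [3, 1, 2]
```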
    

    According to PEP 479 you should return from generators rather than raise StopIteration. Thanks to @khelwood and @iBug for pointing that out in the comments; one never stops learning.

    With Python 3.6 you get a DeprecationWarning, and from Python 3.7 on a StopIteration raised inside a generator is turned into a RuntimeError (see the Transition Plan in PEP 479) if you still use raise StopIteration.
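
As a rough illustration of the PEP 479 behaviour, the sketch below compares two hypothetical helper generators, first_n_with_raise and first_n_with_return (neither is from the question). It assumes Python 3.7 or newer, where a StopIteration raised inside a generator body surfaces as a RuntimeError.

```python
def first_n_with_raise(iterable, n):
    """Hypothetical variant that ends the generator by raising StopIteration."""
    for count, item in enumerate(iterable):
        if count == n:
            raise StopIteration  # on Python 3.7+ this becomes a RuntimeError
        yield item


def first_n_with_return(iterable, n):
    """Hypothetical variant that ends the generator with a plain return."""
    for count, item in enumerate(iterable):
        if count == n:
            return  # the way PEP 479 recommends finishing a generator
        yield item


try:
    list(first_n_with_raise([1, 2, 3], 2))
    outcome = "no error"
except RuntimeError as exc:
    outcome = "RuntimeError: {}".format(exc)

print(outcome)
print(list(first_n_with_return([1, 2, 3], 2)))  # [1, 2]
```

The return version behaves the same on every Python 3 release, which is why it is the safer choice.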


    Your solution, which uses elif element not in itr[:index] and count, performs an O(k) lookup for every element, where k is the length of the slice. Using a set reduces each lookup to O(1) on average, but it needs more memory because the set has to be kept as well. It is a speed vs. memory tradeoff; which is better depends on the application and the data.

    Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:

    For 6 unique elements (in the longer list):

    • yours would perform lookups costing O(1) + O(2) + ... + O(5001)
    • mine would perform 5001 lookups of O(1) each, plus the memory for the set {1, 2, 3, 4, 5, 6}
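
To make the comparison above concrete, here is a minimal sketch that counts the work done by each approach on the longer list. The functions first_n_unique_slice and first_n_unique_set are hypothetical stand-ins written for this illustration, not the exact code from the question. The slice version spells out the linear scan that `element not in itr[:index]` performs, including its early exit on the first match, so its count is lower than the worst-case sum.

```python
def first_n_unique_slice(items, n):
    """Slice-based approach; returns (result, element comparisons made)."""
    result, comparisons = [], 0
    for index, element in enumerate(items):
        found = False
        # Explicit version of `element in items[:index]` so we can count.
        for earlier in items[:index]:
            comparisons += 1
            if earlier == element:
                found = True
                break
        if not found:
            result.append(element)
            if len(result) == n:
                break
    return result, comparisons


def first_n_unique_set(items, n):
    """Set-based approach; returns (result, set lookups made)."""
    result, seen, lookups = [], set(), 0
    for element in items:
        lookups += 1  # one average O(1) membership test per element
        if element in seen:
            continue
        seen.add(element)
        result.append(element)
        if len(result) == n:
            break
    return result, lookups


data = [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]

slice_result, slice_comparisons = first_n_unique_slice(data, 6)
set_result, set_lookups = first_n_unique_set(data, 6)

print(slice_result == set_result)      # True
print(set_lookups)                     # 5001
print(slice_comparisons > set_lookups)  # True
```

The set version touches each element once, while the slice version rescans the prefix again and again; the price is the extra set of at most n elements.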
