Getting first n unique elements from Python list

無奈伤痛 2021-02-04 23:59

I have a python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from

12 answers
  • 2021-02-05 00:34

    If your objects are hashable (ints are), you can write a utility function using the fromkeys method of collections.OrderedDict (or, from Python 3.7 on, a plain dict, since dicts are officially insertion-ordered there):

    from collections import OrderedDict
    
    
    def nub(iterable):
        """Returns unique elements preserving order."""
        return OrderedDict.fromkeys(iterable).keys()
    

    The implementation of iterate then simplifies to

    from itertools import islice
    
    
    def iterate(itr, upper=5):
        return islice(nub(itr), upper)
    

    or, if you always want a list as output,

    def iterate(itr, upper=5):
        return list(nub(itr))[:upper]
    

    Improvements

    As @Chris_Rands pointed out, this solution walks through the entire collection. We can avoid that by writing nub as a generator, as other answers already do:

    def nub(iterable):
        seen = set()
        add_seen = seen.add
        for element in iterable:
            if element in seen:
                continue
            yield element
            add_seen(element)
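
    As a rough illustration of why the generator form helps, the sketch below combines it with islice and feeds it an endless iterator. The helper name first_n_unique and the itertools.cycle input are assumptions made for this example, not part of the answer above.

```python
from itertools import cycle, islice


def nub(iterable):
    """Lazily yield each distinct element once, in first-seen order."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element


def first_n_unique(iterable, n):
    """Hypothetical helper: stop pulling from `iterable` after n uniques."""
    return list(islice(nub(iterable), n))


# cycle() never ends, so an eager OrderedDict.fromkeys() would hang here;
# the lazy version returns once three distinct values have been produced.
print(first_n_unique(cycle([1, 2, 2, 3, 3, 4]), 3))  # [1, 2, 3]
```

    The eager list(nub(itr))[:upper] variant would not terminate on such input, which is the practical difference between the two forms.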
    
  • 2021-02-05 00:34

    Assuming equal elements are adjacent, as in the example (groupby only collapses consecutive duplicates), this is an opportunity to have fun with the groupby function in itertools:

    from itertools import groupby, islice
    
    def first_unique(data, upper):
        return islice((key for (key, _) in groupby(data)), 0, upper)
    
    a = [1, 2, 2, 3, 3, 4, 5, 6]
    
    print(list(first_unique(a, 5)))
    

    Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.
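
    The adjacency assumption matters. A minimal sketch of what happens when repeats are not next to each other (the scattered list is a made-up input for illustration):

```python
from itertools import groupby, islice


def first_unique(data, upper):
    # Same approach as above: keep one key per run of equal neighbours.
    return islice((key for key, _ in groupby(data)), upper)


grouped = [1, 2, 2, 3, 3, 4, 5, 6]   # equal values sit next to each other
scattered = [1, 2, 1, 3, 2, 4]       # hypothetical input with gaps between repeats

print(list(first_unique(grouped, 5)))    # [1, 2, 3, 4, 5]
print(list(first_unique(scattered, 5)))  # [1, 2, 1, 3, 2] -> 1 and 2 repeat
```

    For input like the second list, a set-based approach is needed instead.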

  • 2021-02-05 00:34

    Given

    import itertools as it
    
    
    a = [1, 2, 2, 3, 3, 4, 5, 6]
    

    Code

    A simple list comprehension (similar to @cdlane's answer).

    [k for k, _ in it.groupby(a)][:5]
    # [1, 2, 3, 4, 5]
    

    Alternatively, in Python 3.6+ (dict insertion order is a CPython implementation detail in 3.6 and a language guarantee from 3.7):

    list(dict.fromkeys(a))[:5]
    # [1, 2, 3, 4, 5]
    
  • 2021-02-05 00:37

    I would use a set to remember what was seen and return from the generator when you have seen enough:

    a = [1, 2, 2, 3, 3, 4, 5, 6]
        
    def get_unique_N(iterable, N):
        """Yields (in order) the first N unique elements of iterable.
        Might yield fewer if the data is too short."""
        if N <= 0:
            return  # nothing requested; without this guard N == 0 would yield everything
        seen = set()
        for e in iterable:
            if e in seen:
                continue
            seen.add(e)
            yield e
            if len(seen) == N:
                return
                
    k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4)
    print(list(k))
        
    

    Output:

    [1, 2, 3, 4]
    

    According to PEP 479 you should return from generators rather than raise StopIteration. Thanks to @khelwood and @iBug for pointing that out; one never stops learning.

    In Python 3.6, raising StopIteration inside a generator triggers a DeprecationWarning, and from 3.7 on it is turned into a RuntimeError (see the Transition Plan section of PEP 479).
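
    A small sketch of the difference, assuming Python 3.7 or later; both function names are hypothetical and exist only for this comparison:

```python
def stop_by_raising(iterable, n):
    """Pre-PEP 479 style: signals the end by raising StopIteration."""
    for count, element in enumerate(iterable):
        if count == n:
            raise StopIteration  # converted to RuntimeError on Python 3.7+
        yield element


def stop_by_returning(iterable, n):
    """Recommended style: a plain return ends the generator."""
    for count, element in enumerate(iterable):
        if count == n:
            return
        yield element


print(list(stop_by_returning("abcdef", 3)))  # ['a', 'b', 'c']

try:
    list(stop_by_raising("abcdef", 3))
except RuntimeError as error:
    print("RuntimeError:", error)
```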


    Your solution using elif element not in itr[:index] and count<upper: does O(k) lookups, with k being the length of the slice. Using a set reduces this to O(1) per lookup but needs more memory, because the set has to be kept as well. It is a speed vs. memory tradeoff; which is better depends on the application and the data.

    Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:

    For 6 uniques (in the longer list):

    • yours would do lookups costing O(1)+O(2)+...+O(5001)
    • mine would do 5001 lookups of O(1) each, plus the memory for the set {1, 2, 3, 4, 5, 6}
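
    To make the comparison concrete, here is a rough sketch that counts the work done by each approach. Both functions are hypothetical reconstructions written for this illustration (the slice-based one only approximates the code in the question), and the list uses 100 repeats per value instead of 1000 to keep the run short.

```python
def first_unique_slice(data, n):
    """Slice-based membership test: every lookup scans the prefix."""
    result, comparisons = [], 0
    for index, element in enumerate(data):
        if len(result) == n:
            break
        found = False
        for earlier in data[:index]:   # linear scan, counted one by one
            comparisons += 1
            if earlier == element:
                found = True
                break
        if not found:
            result.append(element)
    return result, comparisons


def first_unique_set(data, n):
    """Set-based membership test: one hash lookup per visited element."""
    seen, result, lookups = set(), [], 0
    for element in data:
        if len(result) == n:
            break
        lookups += 1
        if element not in seen:
            seen.add(element)
            result.append(element)
    return result, lookups


# Scaled-down version of the long list: 100 copies each of 1..5, then a 6.
data = [value for value in range(1, 6) for _ in range(100)] + [6]

slice_result, slice_comparisons = first_unique_slice(data, 6)
set_result, set_lookups = first_unique_set(data, 6)

print(slice_result == set_result)        # True
print(slice_comparisons, set_lookups)    # element comparisons vs. hash lookups
```

    The counts are not directly comparable units (element comparisons vs. hash lookups), but the gap grows quickly with the length of the runs.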
  • 2021-02-05 00:39

    Example list:

    a = [1, 2, 2, 3, 3, 4, 5, 6]
    

    The function returns all unique items, or only as many as requested.

    The 1st argument is the list to work with; the 2nd (optional) argument is the number of unique items to return (the default, None, means all unique elements are returned).

    def unique_elements(lst, number_of_elements=None):
        return list(dict.fromkeys(lst))[:number_of_elements]
    

    Here is an example of how it works. The list is named "a", and we want its first 2 unique elements:

    print(unique_elements(a, 2))
    

    Output:

    [1, 2]

  • 2021-02-05 00:41

    You can use OrderedDict or, since Python 3.7, an ordinary dict, because dicts are guaranteed to preserve insertion order. Note that this won't work with sets, whose iteration order is not defined.

    N = 3
    a = [1, 2, 2, 3, 3, 3, 4]
    d = {x: True for x in a}
    list(d.keys())[:N]
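
    A short sketch of the ordering point, using a made-up unsorted list so that first-seen order differs from sorted order:

```python
a = [3, 1, 3, 2, 1]   # hypothetical unsorted input
N = 2

# dict keys keep first-insertion order (guaranteed from Python 3.7).
ordered_unique = list(dict.fromkeys(a))
print(ordered_unique[:N])   # [3, 1]

# A set holds the same distinct values, but its iteration order is
# unspecified, so slicing list(set(a)) may not give the first N seen.
print(set(a) == set(ordered_unique))   # True
```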
    