Getting first n unique elements from Python list


I have a python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from


12 Answers

余生分开走

2021-02-05 00:34
If your objects are hashable (ints are), you can write a utility function using the fromkeys method of collections.OrderedDict (or, starting with Python 3.7, of a plain dict, since dicts are now officially ordered):
```
from collections import OrderedDict


def nub(iterable):
    """Returns unique elements preserving order."""
    return OrderedDict.fromkeys(iterable).keys()
```
With that helper, the implementation of iterate can be simplified to:
```
from itertools import islice


def iterate(itr, upper=5):
    return islice(nub(itr), upper)
```
or, if you always want a list as output:
```
def iterate(itr, upper=5):
    return list(nub(itr))[:upper]
```
Improvements

As @Chris_Rands mentioned, this solution walks through the entire collection. We can improve on that by writing nub as a generator, as others have already done:
```
def nub(iterable):
    seen = set()
    add_seen = seen.add
    for element in iterable:
        if element in seen:
            continue
        yield element
        add_seen(element)
```
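To illustrate why the generator form matters, here is a minimal sketch that combines it with islice. The endless input stream is made up for the demonstration; the point is only that islice stops pulling items once it has enough, so the remaining input is never examined.

```python
from itertools import count, islice


def nub(iterable):
    """Lazily yield each distinct element once, in first-seen order."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element


def iterate(itr, upper=5):
    # islice stops requesting items from nub() after `upper` of them,
    # so the rest of the input is never touched.
    return list(islice(nub(itr), upper))


# Hypothetical endless input: 0, 0, 1, 1, 2, 2, ...
endless = (x // 2 for x in count())
first_five = iterate(endless)
print(first_five)  # [0, 1, 2, 3, 4]
```

The OrderedDict.fromkeys version would never return on such an input, because it has to consume everything before it can hand back the keys.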
梦谈多话

2021-02-05 00:34
Assuming equal elements are always adjacent, as in the list shown, this is an opportunity to have fun with the groupby function in itertools:
```
from itertools import groupby, islice

def first_unique(data, upper):
    return islice((key for (key, _) in groupby(data)), 0, upper)

a = [1, 2, 2, 3, 3, 4, 5, 6]

print(list(first_unique(a, 5)))
```
Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.
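
A caveat worth spelling out: groupby only merges duplicates that sit next to each other. The sketch below (the scattered input is my own example, not from the question) shows what happens when that assumption does not hold.

```python
from itertools import groupby, islice


def first_unique(data, upper):
    # groupby starts a new group whenever the value changes,
    # so only *adjacent* duplicates are collapsed.
    return islice((key for key, _ in groupby(data)), upper)


grouped = list(first_unique([1, 2, 2, 3, 3, 4, 5, 6], 5))
scattered = list(first_unique([1, 2, 1, 3, 2], 5))
resorted = list(first_unique(sorted([1, 2, 1, 3, 2]), 5))
print(grouped)    # [1, 2, 3, 4, 5]
print(scattered)  # [1, 2, 1, 3, 2]
print(resorted)   # [1, 2, 3]
```

Sorting first restores uniqueness, but it gives up the original order of first appearance, so it is only an option when that order does not matter.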
心在旅途

2021-02-05 00:34
Given
```
import itertools as it


a = [1, 2, 2, 3, 3, 4, 5, 6]
```
Code

A simple list comprehension (similar to @cdlane's answer).
```
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
```
Alternatively, in Python 3.7+ (or CPython 3.6, where dict ordering is only an implementation detail):
```
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
```
悲哀的现实

2021-02-05 00:37
I would use a set to remember what was seen and return from the generator when you have seen enough:
```
a = [1, 2, 2, 3, 3, 4, 5, 6]


def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield fewer if the data is too short."""
    if N <= 0:
        return  # nothing requested, so yield nothing
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return


k = get_unique_N(a, 4)
print(list(k))
```
Output:
```
[1, 2, 3, 4]
```
According to PEP-479 you should return from generators rather than raise StopIteration. Thanks to @khelwood & @iBug for that comment; one never stops learning.

With Python 3.6 you get a DeprecationWarning; from 3.7 on, raise StopIteration inside a generator results in a RuntimeError (see the Transition Plan in the PEP).
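
A small sketch of the difference, assuming Python 3.7 or later (the two generator names are made up for the demonstration):

```python
def stops_with_return():
    yield 1
    return  # the PEP 479-friendly way to end a generator early


def stops_with_raise():
    yield 1
    raise StopIteration  # converted to RuntimeError on Python 3.7+


clean = list(stops_with_return())

try:
    list(stops_with_raise())
    outcome = "no error"
except RuntimeError:
    outcome = "RuntimeError"

print(clean, outcome)  # [1] RuntimeError
```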

Your solution using elif element not in itr[:index] and count<upper: needs O(k) time per lookup, with k being the length of the slice. Using a set reduces this to O(1) per lookup but uses more memory, because the set has to be kept as well. It is a speed vs. memory tradeoff; which is better depends on the application and the data.

Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:

For 6 uniques in the longer list:
- your version would perform lookups costing O(1)+O(2)+...+O(5001)
- mine would perform 5001 lookups of O(1) each, plus the memory for the set {1, 2, 3, 4, 5, 6}
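
To make the comparison concrete, here is a rough sketch of both strategies side by side. The slice-based function is only my reconstruction from the quoted condition, not the asker's actual code, and both function names are hypothetical.

```python
def unique_by_slice(items, upper):
    """Hypothetical reconstruction: membership test against a list slice."""
    result = []
    for index, element in enumerate(items):
        if len(result) == upper:
            break
        # items[:index] copies and scans up to `index` elements: O(index)
        if element not in items[:index]:
            result.append(element)
    return result


def unique_by_set(items, upper):
    """Set-based variant: O(1) average membership test per element."""
    seen = set()
    result = []
    for element in items:
        if len(result) == upper:
            break
        if element not in seen:
            seen.add(element)
            result.append(element)
    return result


long_list = [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]
slice_result = unique_by_slice(long_list, 6)
set_result = unique_by_set(long_list, 6)
print(slice_result == set_result, set_result)  # True [1, 2, 3, 4, 5, 6]
```

Both return the same list; they differ only in how much work each membership test costs, which you can measure with timeit on your own data.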
陌清茗

2021-02-05 00:39
Example list:
```
a = [1, 2, 2, 3, 3, 4, 5, 6]
```
The function returns either all unique items of the list or only the requested number of them.

The first argument is the list to work with; the second (optional) argument is the number of unique items to return. Its default, None, means that all unique elements are returned.
```
def unique_elements(lst, number_of_elements=None):
    return list(dict.fromkeys(lst))[:number_of_elements]
```
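The None default works because slicing with [:None] is the same as slicing with [:]. A quick sketch to confirm this, repeating the function so the snippet runs on its own:

```python
def unique_elements(lst, number_of_elements=None):
    # seq[:None] is equivalent to seq[:], so None means "no limit".
    return list(dict.fromkeys(lst))[:number_of_elements]


a = [1, 2, 2, 3, 3, 4, 5, 6]
all_unique = unique_elements(a)
first_three = unique_elements(a, 3)
print(all_unique)   # [1, 2, 3, 4, 5, 6]
print(first_three)  # [1, 2, 3]
```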
Here is an example of how it works. The list is named a, and we want its first 2 unique elements:
```
print(unique_elements(a, 2))
```
Output: [1, 2]
时光取名叫无心

2021-02-05 00:41
You can use OrderedDict or, since Python 3.7, an ordinary dict, because dicts are guaranteed to preserve insertion order. Note that this won't work with sets, which do not guarantee any particular order.
```
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]
```
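As a small check of the ordering claim, the sketch below uses an input of my own where the first appearances are not in sorted order, so it is visible that keys stay where they were first inserted.

```python
from collections import OrderedDict

N = 3
a = [3, 1, 3, 2, 1, 4]

# Keys keep the position of their first insertion; later duplicates
# do not move them.
via_dict = list(dict.fromkeys(a))[:N]
via_ordered = list(OrderedDict.fromkeys(a))[:N]
print(via_dict, via_ordered)  # [3, 1, 2] [3, 1, 2]
```

dict.fromkeys(a) builds the same keys as the {x: True for x in a} comprehension; the values are simply None instead of True.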