I have a Python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n unique elements from the list.
If your objects are hashable (ints are hashable), you can write a utility function using the fromkeys method of the collections.OrderedDict class (or, starting from Python 3.7, a plain dict, since dicts became officially ordered), like
from collections import OrderedDict

def nub(iterable):
    """Returns unique elements preserving order."""
    return OrderedDict.fromkeys(iterable).keys()
and then the implementation of iterate can be simplified to
from itertools import islice

def iterate(itr, upper=5):
    return islice(nub(itr), upper)
or, if you always want a list as output:
def iterate(itr, upper=5):
    return list(nub(itr))[:upper]
As @Chris_Rands mentioned, this solution walks through the entire collection. We can improve on this by writing the nub utility as a generator, as others have already done:
def nub(iterable):
    seen = set()
    add_seen = seen.add
    for element in iterable:
        if element in seen:
            continue
        yield element
        add_seen(element)
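A minimal sketch of why the generator form matters: because it yields as it goes, it can be combined with islice even on an input that never ends, where the fromkeys version would never return. The use of itertools.cycle as the endless input is an assumption made purely for illustration.

```python
from itertools import cycle, islice

def nub(iterable):
    """Yield unique elements lazily, preserving first-seen order."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element

# cycle() never ends, so an eager OrderedDict.fromkeys(...) would hang here.
# The lazy generator only consumes as much input as islice asks for.
endless = cycle([1, 2, 2, 3, 3, 4, 5, 6])
first_three = list(islice(nub(endless), 3))
print(first_three)  # [1, 2, 3]
```

Note that asking for more unique elements than the endless input contains (more than 6 here) would still loop forever, since the generator cannot know that no new value will appear.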
Assuming the elements are ordered as shown (so that duplicates are always adjacent), this is an opportunity to have fun with the groupby function in itertools:
from itertools import groupby, islice

def first_unique(data, upper):
    return islice((key for (key, _) in groupby(data)), 0, upper)

a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(first_unique(a, 5)))
Updated to use islice instead of enumerate, per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.
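The groupby approach depends on the ordering assumption, because groupby only collapses runs of consecutive equal elements. A short sketch of the difference follows; the scattered list is a made-up input chosen to show the failure case.

```python
from itertools import groupby, islice

def first_unique(data, upper):
    # groupby emits one key per run of consecutive equal elements.
    return islice((key for key, _ in groupby(data)), upper)

grouped = [1, 2, 2, 3, 3, 4, 5, 6]   # duplicates are adjacent
scattered = [1, 2, 1, 3, 2, 4]       # duplicates are NOT adjacent

grouped_result = list(first_unique(grouped, 5))
scattered_result = list(first_unique(scattered, 5))

print(grouped_result)    # [1, 2, 3, 4, 5]
print(scattered_result)  # [1, 2, 1, 3, 2]  (repeats slip through)
```

If the input is not guaranteed to be grouped this way, a set-based or dict-based approach is the safer choice.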
Given
import itertools as it
a = [1, 2, 2, 3, 3, 4, 5, 6]
Code
A simple list comprehension (similar to @cdlane's answer).
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
Alternatively, in Python 3.7+ (or CPython 3.6, where dict ordering is an implementation detail):
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
I would use a set to remember what was seen and return from the generator when you have seen enough:
a = [1, 2, 2, 3, 3, 4, 5, 6]
def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield less if data too short."""
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return

k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4)
print(list(k))
Output:
[1, 2, 3, 4]
According to PEP 479, you should return from generators, not raise StopIteration (thanks to @khelwood & @iBug for that comment; one never stops learning).
With Python 3.6 you get a DeprecationWarning, and with 3.7 you get a RuntimeError if you are still using raise StopIteration (see the Transition Plan section of the PEP).
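A small sketch of the PEP 479 behaviour, assuming Python 3.7 or later. The two generator names are hypothetical and exist only to contrast the two ways of stopping.

```python
def stops_with_return(limit):
    """Ends the generator the recommended way, with a plain return."""
    n = 0
    while True:
        if n == limit:
            return
        yield n
        n += 1

def stops_with_raise(limit):
    """Ends the generator by raising StopIteration (discouraged)."""
    n = 0
    while True:
        if n == limit:
            raise StopIteration
        yield n
        n += 1

ok = list(stops_with_return(3))
print(ok)  # [0, 1, 2]

try:
    list(stops_with_raise(3))
    outcome = "no error"
except RuntimeError:
    # On Python 3.7+ the StopIteration is converted into a RuntimeError.
    outcome = "RuntimeError"
print(outcome)
```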
Your solution using elif element not in itr[:index] and count<upper: performs O(k) lookups, with k being the length of the slice. Using a set reduces this to O(1) lookups but uses more memory, because the set has to be kept as well. It is a speed vs. memory tradeoff; which is better depends on the application and the data.
Consider [1, 2, 3, 4, 4, 4, 4, 5] vs. [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:
For 6 uniques (in the longer list):
the slice-based approach performs lookups of O(1)+O(2)+...+O(5001)
the set-based approach performs 5001*O(1) lookups, plus memory for set({1, 2, 3, 4, 5, 6})
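To get a feel for the tradeoff, here is a rough sketch that runs both styles on the longer list. The function first_unique_slice is a hypothetical reconstruction of the slice-based idea (only the elif condition from the question is known), and first_unique_set is a list-returning variant of the set approach; absolute timings will vary by machine, so none are shown.

```python
from timeit import timeit

def first_unique_slice(itr, upper):
    """Hypothetical reconstruction: membership test against a list slice."""
    result = []
    for index, element in enumerate(itr):
        if len(result) == upper:
            break
        # Building itr[:index] copies up to `index` elements every iteration.
        if element not in itr[:index]:
            result.append(element)
    return result

def first_unique_set(itr, upper):
    """Membership test against a set of already-seen elements."""
    seen = set()
    result = []
    for element in itr:
        if len(result) == upper:
            break
        if element not in seen:
            seen.add(element)
            result.append(element)
    return result

long_list = [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]

slice_result = first_unique_slice(long_list, 6)
set_result = first_unique_set(long_list, 6)

# Timings are machine-dependent; compare the two numbers, not the absolutes.
print("slice:", timeit(lambda: first_unique_slice(long_list, 6), number=3))
print("set:  ", timeit(lambda: first_unique_set(long_list, 6), number=3))
```

Both functions return the same six elements; the difference is only in how much work each membership test does and how much extra memory is held.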
a = [1, 2, 2, 3, 3, 4, 5, 6]
The first argument is the list to work with. The second (optional) argument is the number of unique items to return; its default, None, means that all unique elements are returned.
def unique_elements(lst, number_of_elements=None):
    return list(dict.fromkeys(lst))[:number_of_elements]
Here is an example of how it works. The list is named "a", and we want the first 2 unique elements:
print(unique_elements(a, 2))
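The default of None works because slicing with [:None] is the same as slicing with [:], so no special case is needed. A brief sketch of the three situations (a count, no count, and a count larger than the number of unique elements):

```python
def unique_elements(lst, number_of_elements=None):
    # dict.fromkeys keeps the first occurrence of each key in insertion order
    # (guaranteed from Python 3.7 on).
    return list(dict.fromkeys(lst))[:number_of_elements]

a = [1, 2, 2, 3, 3, 4, 5, 6]

first_two = unique_elements(a, 2)
all_unique = unique_elements(a)
too_many = unique_elements(a, 100)

print(first_two)   # [1, 2]
print(all_unique)  # [1, 2, 3, 4, 5, 6]
print(too_many)    # [1, 2, 3, 4, 5, 6]
```

Unlike the generator-based versions, this always processes the whole list before slicing.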
You can use OrderedDict or, since Python 3.7, an ordinary dict, because both preserve insertion order. Note that this won't work with sets, which are unordered.
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]