Python has an ordered dictionary. What about an ordered set?
I can do you one better than an OrderedSet: boltons has a pure-Python, 2/3-compatible IndexedSet type that is not only an ordered set, but also supports indexing (as with lists).
Simply pip install boltons
(or copy setutils.py
into your codebase), import the IndexedSet
and:
>>> from boltons.setutils import IndexedSet
>>> x = IndexedSet(list(range(4)) + list(range(8)))
>>> x
IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
>>> x - set(range(2))
IndexedSet([2, 3, 4, 5, 6, 7])
>>> x[-1]
7
>>> fcr = IndexedSet('freecreditreport.com')
>>> ''.join(fcr[:fcr.index('.')])
'frecditpo'
Everything is unique and retained in order. Full disclosure: I wrote the IndexedSet
, but that also means you can bug me if there are any issues. :)
While others have pointed out that there is no built-in implementation of an insertion-order preserving set in Python (yet), I am feeling that this question is missing an answer which states what there is to be found on PyPI.
There are the packages:
Some of these implementations are based on the recipe posted by Raymond Hettinger to ActiveState which is also mentioned in other answers here.
my_set[5]
)remove(item)
Both implementations have O(1) for add(item)
and __contains__(item)
(item in my_set
).
In case you're already using pandas in your code, its Index
object behaves pretty like an ordered set, as shown in this article.
Examples from the article:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
indA & indB # intersection
indA | indB # union
indA - indB # difference
indA ^ indB # symmetric difference
The ParallelRegression package provides a setList( ) ordered set class that is more method-complete than the options based on the ActiveState recipe. It supports all methods available for lists and most if not all methods available for sets.
There's no OrderedSet
in official library.
I make an exhaustive cheatsheet of all the data structure for your reference.
DataStructure = {
'Collections': {
'Map': [
('dict', 'OrderDict', 'defaultdict'),
('chainmap', 'types.MappingProxyType')
],
'Set': [('set', 'frozenset'), {'multiset': 'collection.Counter'}]
},
'Sequence': {
'Basic': ['list', 'tuple', 'iterator']
},
'Algorithm': {
'Priority': ['heapq', 'queue.PriorityQueue'],
'Queue': ['queue.Queue', 'multiprocessing.Queue'],
'Stack': ['collection.deque', 'queue.LifeQueue']
},
'text_sequence': ['str', 'byte', 'bytearray']
}
As other answers mention, as for python 3.7+, the dict is ordered by definition. Instead of subclassing OrderedDict
we can subclass abc.collections.MutableSet
or typing.MutableSet
using the dict's keys to store our values.
class OrderedSet(typing.MutableSet[T]):
"""A set that preserves insertion order by internally using a dict."""
def __init__(self, iterable: t.Iterator[T]):
self._d = dict.fromkeys(iterable)
def add(self, x: T) -> None:
self._d[x] = None
def discard(self, x: T) -> None:
self._d.pop(x)
def __contains__(self, x: object) -> bool:
return self._d.__contains__(x)
def __len__(self) -> int:
return self._d.__len__()
def __iter__(self) -> t.Iterator[T]:
return self._d.__iter__()
Then just:
x = OrderedSet([1, 2, -1, "bar"])
x.add(0)
assert list(x) == [1, 2, -1, "bar", 0]
I put this code in a small library, so anyone can just pip install
it.