Does Python have an ordered set?

予麋鹿 2020-11-21 13:20

Python has an ordered dictionary. What about an ordered set?

  • 2020-11-21 13:45

    I can do you one better than an OrderedSet: boltons has a pure-Python, 2/3-compatible IndexedSet type that is not only an ordered set, but also supports indexing (as with lists).

    Simply pip install boltons (or copy setutils.py into your codebase), import IndexedSet, and:

    >>> from boltons.setutils import IndexedSet
    >>> x = IndexedSet(list(range(4)) + list(range(8)))
    >>> x
    IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
    >>> x - set(range(2))
    IndexedSet([2, 3, 4, 5, 6, 7])
    >>> x[-1]
    7
    >>> fcr = IndexedSet('freecreditreport.com')
    >>> ''.join(fcr[:fcr.index('.')])
    'frecditpo'

    Everything is unique and retained in order. Full disclosure: I wrote the IndexedSet, but that also means you can bug me if there are any issues. :)

  • 2020-11-21 13:52

    Implementations on PyPI

    While others have pointed out that Python has no built-in insertion-order-preserving set (yet), I feel this question is missing an answer that lists what is available on PyPI.

    There are the packages:

    • ordered-set (Python based)
    • orderedset (Cython based)
    • collections-extended
    • boltons (under setutils.IndexedSet, Python-based)
    • oset (last updated in 2012)

    Some of these implementations are based on the recipe Raymond Hettinger posted to ActiveState, which is also mentioned in other answers here.

    Some differences

    • ordered-set (version 1.1)
        • advantage: O(1) for lookups by index (e.g. my_set[5])
    • oset (version 0.1.3)
        • advantage: O(1) for remove(item)
        • disadvantage: apparently O(n) for lookups by index

    Both implementations have O(1) for add(item) and __contains__(item) (item in my_set).
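
To make the trade-off above concrete, here is a minimal sketch of one way to get O(1) index lookups together with O(1) add and membership tests: keep a list for order and a dict that maps each item to its position. The class name `ListBackedOrderedSet` and its attributes are hypothetical, chosen for illustration; this is not the code of either package. Note how `remove` becomes O(n) in this layout, which is the flip side of the fast index lookup.

```python
class ListBackedOrderedSet:
    """Hypothetical sketch: a list keeps order, a dict maps item -> position."""

    def __init__(self, iterable=()):
        self._items = []   # items in insertion order
        self._index = {}   # item -> position in self._items
        for item in iterable:
            self.add(item)

    def add(self, item):
        # Average O(1): one dict lookup plus one list append.
        if item not in self._index:
            self._index[item] = len(self._items)
            self._items.append(item)

    def __contains__(self, item):
        # Average O(1): plain dict membership test.
        return item in self._index

    def __getitem__(self, position):
        # O(1): plain list indexing.
        return self._items[position]

    def remove(self, item):
        # O(n): later items shift left, so their stored positions
        # must be decremented. Raises KeyError if item is absent.
        position = self._index.pop(item)
        del self._items[position]
        for later in self._items[position:]:
            self._index[later] -= 1

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


s = ListBackedOrderedSet("abca")   # the duplicate "a" is ignored
s.remove("a")
print(list(s), s[0])
```

A linked-list layout would reverse the trade-off: removal becomes cheap, but finding the item at a given position requires walking the links.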

  • 2020-11-21 13:52

    If you're already using pandas in your code, its Index object behaves much like an ordered set, as shown in this article.

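    Examples adapted from the article:

    import pandas as pd

    indA = pd.Index([1, 3, 5, 7, 9])
    indB = pd.Index([2, 3, 5, 7, 11])
    # Method forms are used because recent pandas versions treat &, |, ^ and -
    # on an Index as element-wise operations rather than set operations.
    indA.intersection(indB)          # intersection
    indA.union(indB)                 # union
    indA.difference(indB)            # difference
    indA.symmetric_difference(indB)  # symmetric difference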
  • 2020-11-21 13:52

    The ParallelRegression package provides a setList( ) ordered set class that is more method-complete than the options based on the ActiveState recipe. It supports all methods available for lists and most if not all methods available for sets.

  • 2020-11-21 13:54

    There's no OrderedSet in the standard library. I made a cheat sheet of the standard data structures for your reference.

    DataStructure = {
        'Collections': {
            'Map': [
                ('dict', 'OrderedDict', 'defaultdict'),
                ('ChainMap', 'types.MappingProxyType')
            ],
            'Set': [('set', 'frozenset'), {'multiset': 'collections.Counter'}]
        },
        'Sequence': {
            'Basic': ['list', 'tuple', 'iterator']
        },
        'Algorithm': {
            'Priority': ['heapq', 'queue.PriorityQueue'],
            'Queue': ['queue.Queue', 'multiprocessing.Queue'],
            'Stack': ['collections.deque', 'queue.LifoQueue']
        },
        'text_sequence': ['str', 'bytes', 'bytearray']
    }
  • 2020-11-21 13:55

    As other answers mention, as of Python 3.7 the dict preserves insertion order by definition. Instead of subclassing OrderedDict, we can subclass collections.abc.MutableSet or typing.MutableSet and use the dict's keys to store our values.

    import typing as t

    T = t.TypeVar("T")

    class OrderedSet(t.MutableSet[T]):
        """A set that preserves insertion order by internally using a dict."""
        def __init__(self, iterable: t.Iterable[T] = ()):
            self._d = dict.fromkeys(iterable)
        def add(self, x: T) -> None:
            self._d[x] = None
        def discard(self, x: T) -> None:
            # Remove x if present; do nothing otherwise.
            self._d.pop(x, None)
        def __contains__(self, x: object) -> bool:
            return self._d.__contains__(x)
        def __len__(self) -> int:
            return self._d.__len__()
        def __iter__(self) -> t.Iterator[T]:
            return self._d.__iter__()

    Then just:

    x = OrderedSet([1, 2, -1, "bar"])
    x.add(0)
    assert list(x) == [1, 2, -1, "bar", 0]

    I put this code in a small library, so anyone can just pip install it.
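
If a full class is more than you need, the same dict-ordering guarantee can be used directly. The sketch below assumes Python 3.7+ and uses only built-ins; the variable names are illustrative. It shows order-preserving de-duplication with `dict.fromkeys`, and it also shows a caveat: the set operators on dict key views return a plain `set`, so the ordering is lost unless you filter explicitly.

```python
items = [3, 1, 3, 2, 1]

# Keys of a dict are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while keeping first-seen order.
unique_in_order = list(dict.fromkeys(items))
print(unique_in_order)  # [3, 1, 2]

a = dict.fromkeys([1, 2, 3])
b = dict.fromkeys([2, 3, 4])

# Key views support set operators, but the result is an ordinary
# (unordered) set, not an ordered one.
common = a.keys() & b.keys()

# To keep the order of `a`, filter its keys explicitly instead.
only_in_a = [key for key in a if key not in b]
common_in_order = [key for key in a if key in b]
```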
