Does anybody know if Python has an equivalent to Java\'s SortedSet interface?
Heres what I\'m looking for: lets say I have an object of type foo
, and I know
If you're looking for an implementation of an efficient container type for Python implemented using something like a balanced search tree (A Red-Black tree for example) then it's not part of the standard library.
I was able to find this, though:
http://www.brpreiss.com/books/opus7/
The source code is available here:
http://www.brpreiss.com/books/opus7/public/Opus7-1.0.tar.gz
I don't know how the source code is licensed, and I haven't used it myself, but it would be a good place to start looking if you're not interested in rolling your own container classes.
There's PyAVL which is a C module implementing an AVL tree.
Also, this thread might be useful to you. It contains a lot of suggestions on how to use the bisect module to enhance the existing Python dictionary to do what you're asking.
Of course, using insort() that way would be pretty expensive for insertion and deletion, so consider it carefully for your application. Implementing an appropriate data structure would probably be a better approach.
In any case, to understand whether you should keep the data structure sorted or sort it when you iterate over it you'll have to know whether you intend to insert a lot or iterate a lot. Keeping the data structure sorted makes sense if you modify its content relatively infrequently but iterate over it a lot. Conversely, if you insert and delete members all the time but iterate over the collection relatively infrequently, sorting the collection of keys before iterating will be faster. There is no one correct approach.
Do you have the possibility of using Jython? I just mention it because using TreeMap, TreeSet, etc. is trivial. Also if you're coming from a Java background and you want to head in a Pythonic direction Jython is wonderful for making the transition easier. Though I recognise that use of TreeSet in this case would not be part of such a "transition".
For Jython superusers I have a question myself: the blist package can't be imported because it uses a C file which must be imported. But would there be any advantage of using blist instead of TreeSet? Can we generally assume the JVM uses algorithms which are essentially as good as those of CPython stuff?
You can use insort
from the bisect module to insert new elements efficiently in an already sorted list:
from bisect import insort
items = [1,5,7,9]
insort(items, 3)
insort(items, 10)
print items # -> [1, 3, 5, 7, 9, 10]
Note that this does not directly correspond to SortedSet
, because it uses a list. If you insert the same item more than once you will have duplicates in the list.
Take a look at BTrees. It look like you need one of them. As far as I understood you need structure that will support relatively cheap insertion of element into storage structure and cheap sorting operation (or even lack of it). BTrees offers that.
I've experience with ZODB.BTrees, and they scale to thousands and millions of elements.
Similar to blist.sortedlist, the sortedcontainers module provides a sorted list, sorted set, and sorted dict data type. It uses a modified B-tree in the underlying implementation and is faster than blist in most cases.
The sortedcontainers module is pure-Python so installation is easy:
pip install sortedcontainers
Then for example:
from sortedcontainers import SortedList, SortedDict, SortedSet
help(SortedList)
The sortedcontainers module has 100% coverage testing and hours of stress. There's a pretty comprehensive performance comparison that lists most of the options you'd consider for this.
If you only need the keys, and no associated value, Python offers sets:
s = set(a_list)
for k in sorted(s):
print k
However, you'll be sorting the set each time you do this. If that is too much overhead you may want to look at HeapQueues. They may not be as elegant and "Pythonic" but maybe they suit your needs.