On Thu, Nov 11, 2021 at 11:01:32AM -0800, Richard Levasseur wrote:
Should the stdlib have e.g. SortedList? Probably not, because the use cases
of such data types are too niche to a one-size-fits-all implementation, and
there are too many implementations with too many of their own settings.
Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
because those are somewhat fundamental data structures and implementing,
testing, and verifying them is very much non-trivial. While niche, having
them at your immediate disposal is very powerful.
By that reasoning, we shouldn't have dicts, for exactly the same reason:
for anyone wanting an associative array, there are so many implementation
variants to choose from:
- hash table with linear probing
- hash table with chaining
- AVL tree
- red-black tree
- judy array
- splay tree
- treap
- scapegoat tree
and many more, and most of them can be tuned.
If Python didn't already have dict, your argument against SortedList
would equally apply to it: they are "fundamental data structures and
implementing, testing, and verifying them is very much non-trivial".
So if your argument is correct, that would imply that standardizing on
one single dict implementation, one that isn't even tunable by the
caller, was a mistake. We should have provided a dozen different hash
tables, trees and treaps.
But... we didn't, and I don't think that Python is a worse language
because we only have one associative array implementation in the stdlib.
Whenever I need an associative array, I don't lose sleep over whether I
could get 2% better performance for hits, at a cost of 17% worse
performance for misses, by swapping over to some other implementation. I
just reach for dict, knowing that it will almost always be good enough.
Last year, for fun, after wishing there was a SortedSet in the standard
lib, I ended up implementing a Red-Black Tree and BTree based sorted
dictionary/set[1]. After then trying to use them for my use case[2], I
found that, in order to fully and truly exploit their benefits, the basic
Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
would let me, e.g. binary search to a particular spot and then iterate, or
give me a range between two points, etc.
I believe that sortedcontainers.SortedSet provides that functionality
via the irange() method.
http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html#sortedcontainers.SortedList.irange