How does Scala's Vector work?

醉梦人生 2021-02-02 08:09

I read this page about the time complexity of Scala collections. As it says, Vector's complexity is eC (effectively constant) for all operations.

It made me wonder w

3 answers
  •  夕颜 (original poster)
    2021-02-02 08:18

    The keyword here is Trie. Vector is implemented as a Trie data structure. See http://en.wikipedia.org/wiki/Trie.

    More precisely, it is a "bit-mapped vector trie". I've just found a concise enough description of the structure (along with an implementation, apparently in Rust) here:

    https://bitbucket.org/astrieanna/bitmapped-vector-trie

    The most relevant excerpt is:

    A Bitmapped Vector Trie is basically a 32-tree. Level 1 is an array of size 32, of whatever data type. Level 2 is an array of 32 Level 1's, and so on, until Level 7 is an array of 2 Level 6's.
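
To make the excerpt concrete, here is a minimal Scala sketch of how an index could be routed through such a 32-way trie, consuming 5 bits of the index per level. The names `TrieIndexSketch`, `path`, and `lookup` are hypothetical, and the nested `Array[AnyRef]` layout is an assumption made for illustration; this is not the actual `scala.collection.immutable.Vector` code.

```scala
object TrieIndexSketch {
  val Bits  = 5            // 2^5 = 32 children per node
  val Width = 1 << Bits    // 32
  val Mask  = Width - 1    // 0x1f, keeps the low 5 bits

  // Slot to follow at each level, from the root down to the leaf array.
  // `depth` is the number of levels in the trie (1 = a single 32-element array).
  def path(index: Int, depth: Int): List[Int] =
    (depth - 1 to 0 by -1).map(level => (index >>> (level * Bits)) & Mask).toList

  // Walk a trie built from nested arrays (assumed layout, for illustration only).
  def lookup(root: Array[AnyRef], index: Int, depth: Int): AnyRef = {
    var node: AnyRef = root
    for (slot <- path(index, depth))
      node = node.asInstanceOf[Array[AnyRef]](slot)
    node
  }
}
```

For instance, `TrieIndexSketch.path(1000, 2)` yields `List(31, 8)`, since 1000 = 31 * 32 + 8: slot 31 in the Level 2 array, then slot 8 in the Level 1 array it points to.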

    UPDATE: In reply to Lai Yu-Hsuan's comment about complexity:

    I will have to assume you meant "depth" here :-D. The legend for "eC" says "The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys.".

    If you are willing to consider the worst case, and given that there is an upper bound on the maximum size of the vector, then yes, we can say that the complexity is constant. Say that we consider the maximum size to be 2^32; then the worst case is at most 7 operations. Then again, we can always consider the worst case for any type of collection, find an upper bound and call that constant complexity, but for a list, for example, this would mean a constant of 4 billion, which is not quite practical.

    But Vector is the opposite, 7 operations being more than practical, and this is how we can afford to consider its complexity constant in practice.
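
The figure of 7 can be checked with a few lines of Scala. This is only a sketch of the arithmetic, not anything taken from the library: `DepthSketch` and `levels` are hypothetical names, and the function simply counts how many levels a tree with a given branching factor needs to hold `n` elements.

```scala
object DepthSketch {
  // Number of levels needed so that branching^levels >= n.
  // The upper limit on n is an assumption that keeps the Long arithmetic from overflowing.
  def levels(n: Long, branching: Int = 32): Int = {
    require(n >= 1 && n <= (1L << 40), "n out of the range supported by this sketch")
    require(branching >= 2, "branching factor must be at least 2")
    var capacity = branching.toLong // elements a single level can hold
    var depth    = 1
    while (capacity < n) {
      capacity *= branching
      depth += 1
    }
    depth
  }
}
```

With a branching factor of 32, `DepthSketch.levels(1L << 32)` is 7, while the same size with a branching factor of 2 needs 32 levels.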

    Another way to look at this: we are not talking about log(2,N), but log(32,N). If you try to plot that, you'll see it is practically a horizontal line. So pragmatically speaking, you'll never be able to see much increase in processing time as the collection grows. Yes, that's still not really constant (which is why it is marked as "eC" and not just "C"), and you'll be able to see a difference around short vectors (but again, a very small difference, because the number of operations grows so slowly).
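
For a rough feel of how flat log(32,N) is compared with log(2,N), the following Scala sketch prints both for a few sizes. `LogGrowthSketch` and `logBase` are hypothetical helper names, and the sample sizes are arbitrary choices for illustration.

```scala
object LogGrowthSketch {
  // Logarithm of n in an arbitrary base, via the change-of-base formula.
  def logBase(base: Double, n: Double): Double = math.log(n) / math.log(base)

  def main(args: Array[String]): Unit =
    for (n <- Seq(32.0, 1024.0, 1e6, 1e9))
      println(f"N=$n%.0f  log(2,N)=${logBase(2, n)}%.1f  log(32,N)=${logBase(32, n)}%.1f")
}
```

Because 32 = 2^5, log(32,N) is always one fifth of log(2,N), so whatever depth a binary tree would need, the 32-way trie needs only a fifth of it.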
