Why is insertion into my tree faster on sorted input than random input?

慢半拍i 2021-02-02 12:33

Now I've always heard binary search trees are faster to build from randomly selected data than ordered data, simply because ordered data requires explicit rebalancing to keep t

8 Answers
  •  广开言路
    2021-02-02 13:06

    You're only seeing a difference of about 2x. Unless you've tuned the daylights out of this code, that's basically in the noise. Most well-written programs, especially those involving data structures, can easily have more room for improvement than that. Here's an example.

    I just ran your code and took a few stackshots. Here's what I saw:

    Random Insert:

    1 Insert:64 -> HeapifyLeft:81 -> RotateRight:150
    1 Insert:64 -> Make:43 -> Treap:35
    1 Insert:68 -> Make:43
    

    Ordered Insert:

    1 Insert:61
    1 OrderedInsert:224
    1 Insert:68 -> Make:43
    1 Insert:68 -> HeapifyRight:90 -> RotateLeft:107
    1 Insert:68
    1 Insert:68 -> Insert:55 -> IsEmpty.get:51
    

    This is a pretty small number of samples, but it suggests in the case of random input that Make (line 43) is consuming a higher fraction of time. That is this code:

        private Treap Make(Treap left, T value, Treap right, int priority)
        {
            return new Treap(Comparer, left, value, right, priority);
        }
    

    I then took 20 stackshots of the Random Insert code to get a better idea of what it was doing:

    1 Insert:61
    4 Insert:64
    3 Insert:68
    2 Insert:68 -> Make:43
    1 Insert:64 -> Make:43
    1 Insert:68 -> Insert:57 -> Make:48 -> Make:43
    2 Insert:68 -> Insert:55
    1 Insert:64 -> Insert:55
    1 Insert:64 -> HeapifyLeft:81 -> RotateRight:150
    1 Insert:64 -> Make:43 -> Treap:35
    1 Insert:68 -> HeapifyRight:90 -> RotateLeft:107 -> IsEmpty.get:51
    1 Insert:68 -> HeapifyRight:88
    1 Insert:61 -> AnonymousMethod:214
    

    This reveals some information.
    25% of time is spent in line Make:43 or its callees.
    20% of time (4 of the 20 samples end at Make:43) is spent in that line itself, not in a recognized callee; in other words, in new, making a new node.
    90% of time is spent in lines Insert:64 and 68 (which call Make and Heapify).
    10% of time is spent in RotateLeft and Right.
    15% of time is spent in Heapify or its callees.
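
    For anyone who wants to check this kind of tally, here is a rough sketch of how inclusive percentages can be counted from the 20 samples listed above: a sample counts toward a frame if that frame appears anywhere on its stack. The class and method names (StackshotTally, CountContaining) are made up for illustration; only the sample data comes from the listing.

```csharp
using System;
using System.Linq;

public static class StackshotTally
{
    // The 20 random-insert samples from the listing above, as (count, stack) pairs.
    public static readonly (int Count, string Stack)[] Samples =
    {
        (1, "Insert:61"),
        (4, "Insert:64"),
        (3, "Insert:68"),
        (2, "Insert:68 -> Make:43"),
        (1, "Insert:64 -> Make:43"),
        (1, "Insert:68 -> Insert:57 -> Make:48 -> Make:43"),
        (2, "Insert:68 -> Insert:55"),
        (1, "Insert:64 -> Insert:55"),
        (1, "Insert:64 -> HeapifyLeft:81 -> RotateRight:150"),
        (1, "Insert:64 -> Make:43 -> Treap:35"),
        (1, "Insert:68 -> HeapifyRight:90 -> RotateLeft:107 -> IsEmpty.get:51"),
        (1, "Insert:68 -> HeapifyRight:88"),
        (1, "Insert:61 -> AnonymousMethod:214"),
    };

    public static int Total => Samples.Sum(s => s.Count);

    // Number of samples whose stack mentions at least one of the given frame names.
    public static int CountContaining(params string[] frames)
    {
        return Samples
            .Where(s => frames.Any(f => s.Stack.Contains(f)))
            .Sum(s => s.Count);
    }

    public static void Main()
    {
        Console.WriteLine($"Make:43 or callees:  {CountContaining("Make:43")} of {Total}");               // 5 of 20
        Console.WriteLine($"Insert:64 / 68:      {CountContaining("Insert:64", "Insert:68")} of {Total}"); // 18 of 20
        Console.WriteLine($"RotateLeft / Right:  {CountContaining("Rotate")} of {Total}");                 // 2 of 20
        Console.WriteLine($"Heapify or callees:  {CountContaining("Heapify")} of {Total}");                // 3 of 20
    }
}
```

    With only 20 samples each of these fractions has a wide error bar, so treat them as a rough ranking of where time goes, not as precise measurements.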

    I also did a fair amount of single-stepping (at the source level), and came to the suspicion that, since the tree is immutable, it spends a lot of time making new nodes because it doesn't want to change old ones. Then the old ones are garbage collected because nobody refers to them anymore.

    This has got to be inefficient.

    I'm still not answering your question of why inserting ordered numbers is faster than randomly generated numbers, but it doesn't really surprise me, because the tree is immutable.

    I don't think you can expect any performance reasoning about tree algorithms to carry over easily to immutable trees, because the slightest change deep in the tree causes it to be rebuilt on the way back out, at a high cost in new-ing and garbage collection.
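
    To make the rebuilding cost concrete, here is a minimal sketch of an immutable insert, assuming a plain unbalanced binary search tree. It is not the Treap class from the question: Node, its Allocations counter, and Insert are hypothetical names, and there are no priorities or rotations. It only illustrates that one insert allocates a fresh copy of every node on the search path, while untouched subtrees are shared.

```csharp
using System;

// Hypothetical immutable BST node, for illustration only.
public sealed class Node
{
    public static int Allocations;   // counts every node ever constructed

    public readonly Node Left;
    public readonly int Value;
    public readonly Node Right;

    public Node(Node left, int value, Node right)
    {
        Left = left;
        Value = value;
        Right = right;
        Allocations++;
    }

    // Returns a new root. Every node on the path from the root to the
    // insertion point is re-created; subtrees off that path are reused.
    public static Node Insert(Node root, int value)
    {
        if (root == null) return new Node(null, value, null);
        if (value < root.Value)
            return new Node(Insert(root.Left, value), root.Value, root.Right);
        if (value > root.Value)
            return new Node(root.Left, root.Value, Insert(root.Right, value));
        return root; // value already present: nothing to copy
    }
}

public static class Program
{
    public static void Main()
    {
        Node root = null;
        foreach (int v in new[] { 50, 30, 70, 20, 40 })
            root = Node.Insert(root, v);

        int before = Node.Allocations;
        Node updated = Node.Insert(root, 10);   // lands three levels below the root

        Console.WriteLine(Node.Allocations - before);                  // 4: three copied ancestors plus the new leaf
        Console.WriteLine(ReferenceEquals(updated.Right, root.Right)); // True: the untouched subtree is shared
        Console.WriteLine(root.Left.Left.Left == null);                // True: the old version is unchanged
    }
}
```

    In this sketch the copies superseded by each insert (the old 50, 30 and 20 nodes, once nothing refers to the old root) become garbage. A treap that also rotates on the way back out would allocate on top of this, which fits the Make:43 samples above.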
