Why is insertion into my tree faster on sorted input than random input?

慢半拍i 2021-02-02 12:33

Now I've always heard binary search trees are faster to build from randomly selected data than ordered data, simply because ordered data requires explicit rebalancing to keep t

8 answers
  • Yes, it's the number of rotations that is causing the extra time. Here's what I did:

    • Removed the lines checking priority in HeapifyLeft and HeapifyRight so rotations are always done.
    • Added a Console.WriteLine after the if in RotateLeft and RotateRight.
    • Added a Console.WriteLine in the IsEmpty part of the Insert method to see what was being inserted.
    • Ran the test once with 5 values each.

    Output:

    TimeIt(5, RandomInsert)
    Inserting 0.593302943554382
    Inserting 0.348900582338171
    RotateRight
    Inserting 0.75496212381635
    RotateLeft
    RotateLeft
    Inserting 0.438848891499848
    RotateRight
    RotateLeft
    RotateRight
    Inserting 0.357057290783644
    RotateLeft
    RotateRight
    
    TimeIt(5, OrderedInsert)
    Inserting 0.150707998383189
    Inserting 1.58281302712057
    RotateLeft
    Inserting 2.23192588297274
    RotateLeft
    Inserting 3.30518679009061
    RotateLeft
    Inserting 4.32788012657682
    RotateLeft
    

    Result: 2 times as many rotations on random data.
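
    As a rough illustration of the experiment above, here is a sketch in Python (not the asker's C# code, which is not shown here). The function name `count_forced_rotations` and the node layout are made up for this example. It assumes that "rotations are always done" means the new node is rotated up at every level of its insertion path, which matches the one-rotation-per-insert pattern in the ordered output.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def count_forced_rotations(keys):
    """Insert keys, rotating the new node up at every level.

    Returns (left_rotations, right_rotations). Hypothetical helper:
    it mimics a treap insert with the priority check removed.
    """
    counts = {"left": 0, "right": 0}

    def rotate_left(node):
        # Lift node.right above node.
        counts["left"] += 1
        child = node.right
        node.right, child.left = child.left, node
        return child

    def rotate_right(node):
        # Lift node.left above node.
        counts["right"] += 1
        child = node.left
        node.left, child.right = child.right, node
        return child

    def insert(node, key):
        if node is None:
            return Node(key)
        if key < node.key:
            node.left = insert(node.left, key)
            return rotate_right(node)  # no priority check: always rotate
        node.right = insert(node.right, key)
        return rotate_left(node)  # no priority check: always rotate

    root = None
    for key in keys:
        root = insert(root, key)
    return counts["left"], counts["right"]


if __name__ == "__main__":
    n = 1000
    rng = random.Random(42)  # arbitrary fixed seed for repeatability
    print("ordered:", count_forced_rotations(range(n)))
    print("random: ", count_forced_rotations([rng.random() for _ in range(n)]))
```

    For ordered input of n keys this gives exactly n - 1 left rotations and no right rotations, because each new key attaches directly under the root. For random input every level on the insertion path adds a rotation, so the total is at least n - 1.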

  • 2021-02-02 12:54

    Self-balancing trees exist to fix the problems associated with non-randomly-distributed data. By definition, they trade away a bit of the best-case performance to vastly improve the worst-case performance associated with non-balanced BSTs, specifically that of sorted input.

    You're actually overthinking this problem, because slower insertion of random data vs. ordered data is a characteristic of any balanced tree. Try it on an AVL and you'll see the same results.

    Cameron had the right idea, removing the priority check to force the worst case. If you do that and instrument your tree so you can see how many rebalances are happening for each insert, it actually becomes very obvious what's going on. When inserting sorted data, the tree always rotates left and the root's right child is always empty. Insertion always results in exactly one rebalance because the insertion node has no children and no recursion occurs. On the other hand, when you run it on the random data, almost immediately you start to see multiple rebalances happening on every insert, as many as 5 or 6 of them in the smallest case (50 inserts), because it's happening on subtrees as well.

    With priority checking turned back on, not only are rebalances typically less expensive due to more nodes being pushed into the left subtree (which they never leave, because no insertions happen there), but they are also less likely to occur. Why? Because in the treap, high-priority nodes float to the top, and the constant left-rotations (not accompanied by right-rotations) start to push all the high-priority nodes into the left subtree as well. The result is that rebalances happen less frequently due to the uneven distribution of probability.

    If you instrument the rebalancing code you'll see that this is true; for both the sorted and random input, you end up with almost identical numbers of left-rotations, but the random input also gives the same number of right-rotations, which makes for twice as many in all. This shouldn't be surprising - Gaussian input should result in a Gaussian distribution of rotations. You'll also see that there are only about 60-70% as many top-level rebalances for the sorted input, which perhaps is surprising, and again, that's due to the sorted input messing with the natural distribution of priorities.

    You can also verify this by inspecting the full tree at the end of an insertion loop. With the random input, priorities tend to decrease fairly linearly by level; with the sorted input, priorities tend to stay very high until you get to one or two levels from the bottom.
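
    One way to do that inspection is sketched below in Python. It is a textbook max-heap treap written for this example (`build` and `mean_priority_by_level` are hypothetical names), not the asker's implementation, so its priority profile may not match the skew described above.

```python
import random
from collections import defaultdict


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def rotate_left(node):
    child = node.right
    node.right, child.left = child.left, node
    return child


def rotate_right(node):
    child = node.left
    node.left, child.right = child.right, node
    return child


def insert(node, key, rng):
    """Standard treap insert: rotate only when the child outranks its parent."""
    if node is None:
        return Node(key, rng.random())
    if key < node.key:
        node.left = insert(node.left, key, rng)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = insert(node.right, key, rng)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node


def build(keys, seed=0):
    rng = random.Random(seed)  # priorities come from this generator
    root = None
    for key in keys:
        root = insert(root, key, rng)
    return root


def mean_priority_by_level(root):
    """Return the average node priority at depth 0, 1, 2, ..."""
    totals = defaultdict(lambda: [0.0, 0])  # depth -> [priority sum, node count]
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        totals[depth][0] += node.priority
        totals[depth][1] += 1
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return [totals[d][0] / totals[d][1] for d in sorted(totals)]


if __name__ == "__main__":
    n = 1000
    keys_rng = random.Random(7)  # arbitrary seed for the random keys
    for label, keys in (("ordered", range(n)),
                        ("random", [keys_rng.random() for _ in range(n)])):
        profile = mean_priority_by_level(build(keys))
        print(label, [round(p, 2) for p in profile])
```

    Printing the two profiles side by side shows how quickly the average priority drops with depth for each input order.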

    Hopefully I've done a decent job explaining this... let me know if any of it is too vague.

  • 2021-02-02 12:57

    Aaronaught has done a really decent job explaining this.

    For these two special cases, I find it easier to grasp it in terms of the insertion path lengths.

    For random input, your insertion path goes down to one of the leaves, and the length of the path (and thus the number of rotations) is bounded by the height of the tree.

    In the sorted case, you walk along the right spine of the treap, and the bound is the length of the spine, which is less than or equal to the height.

    Since you rotate nodes along the insertion path, and your insertion path is the spine in this case, these rotations will always shorten the spine (which results in a shorter insertion path at the next insertion, since that path is again just the spine, and so on).

    Edit: for the random case the insertion path is 1.75x longer.
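
    The path-length argument can be checked with a small Python sketch. This is a textbook treap written for illustration, not the asker's code, and `trace` and `right_spine_length` are hypothetical helpers. It records the depth at which each new leaf is attached, together with the length of the right spine just before the insert.

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def rotate_left(node):
    child = node.right
    node.right, child.left = child.left, node
    return child


def rotate_right(node):
    child = node.left
    node.left, child.right = child.right, node
    return child


def insert(node, key, rng, depth, attach_depths):
    """Treap insert that records the depth at which the new leaf is attached."""
    if node is None:
        attach_depths.append(depth)
        return Node(key, rng.random())
    if key < node.key:
        node.left = insert(node.left, key, rng, depth + 1, attach_depths)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = insert(node.right, key, rng, depth + 1, attach_depths)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node


def right_spine_length(root):
    """Number of nodes reached by following right links from the root."""
    length = 0
    while root is not None:
        length += 1
        root = root.right
    return length


def trace(keys, seed=0):
    """Return (attach_depth, right_spine_length_before_insert) per key."""
    rng = random.Random(seed)
    root, attach_depths, spines = None, [], []
    for key in keys:
        spines.append(right_spine_length(root))
        root = insert(root, key, rng, 0, attach_depths)
    return list(zip(attach_depths, spines))


if __name__ == "__main__":
    n = 2000
    keys_rng = random.Random(3)  # arbitrary seed for the random keys
    ordered = trace(range(n))
    shuffled = trace([keys_rng.random() for _ in range(n)])
    print("mean attach depth, ordered:", sum(d for d, _ in ordered) / n)
    print("mean attach depth, random: ", sum(d for d, _ in shuffled) / n)
```

    For ascending keys the attach depth always equals the right-spine length, as described above. The ratio between the two printed means is one way to compare against the 1.75x figure, though the exact value depends on the implementation and the seed.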

  • 2021-02-02 12:59

    I ran your code, and I think it has to do with the number of rotations. During ordered input, the number of rotations is optimal, and the tree will never have to rotate back.

    During random input the tree will have to perform more rotations, because it may have to rotate back and forth.

    To really find out, I would have to add counters for the numbers of left and right rotations for each run. You can probably do this yourself.
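
    A sketch of such counters, in Python rather than the asker's C#, might look like the following. It uses a textbook treap with the priority check in place, and `count_rotations` is a hypothetical name chosen for this example.

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def count_rotations(keys, seed=0):
    """Build a treap from keys and return {'left': ..., 'right': ...} rotation counts."""
    rng = random.Random(seed)  # source of the random priorities
    counts = {"left": 0, "right": 0}

    def rotate_left(node):
        counts["left"] += 1
        child = node.right
        node.right, child.left = child.left, node
        return child

    def rotate_right(node):
        counts["right"] += 1
        child = node.left
        node.left, child.right = child.right, node
        return child

    def insert(node, key):
        if node is None:
            return Node(key, rng.random())
        if key < node.key:
            node.left = insert(node.left, key)
            if node.left.priority > node.priority:  # heap order violated
                node = rotate_right(node)
        else:
            node.right = insert(node.right, key)
            if node.right.priority > node.priority:  # heap order violated
                node = rotate_left(node)
        return node

    root = None
    for key in keys:
        root = insert(root, key)
    return counts


if __name__ == "__main__":
    n = 5000
    keys_rng = random.Random(11)  # arbitrary seed for the random keys
    print("ordered:", count_rotations(range(n)))
    print("random: ", count_rotations([keys_rng.random() for _ in range(n)]))
```

    With ascending keys the right-rotation counter stays at zero, since every new key is attached at the end of the right spine and can only be lifted by left rotations.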

    UPDATE:

    I put breakpoints on RotateLeft and RotateRight. During ordered input, RotateRight is never used. During random input, both are hit, and it seems to me that they are used more frequently.

    UPDATE 2:

    I added some output to the 50 item ordered run (substituting with integers for clarity), to learn more:

    TimeIt(50, OrderedInsert)
    LastValue = 0, Top.Value = 0, Right.Count = 0, Left.Count = 0
    RotateLeft @value=0
    LastValue = 1, Top.Value = 1, Right.Count = 0, Left.Count = 1
    LastValue = 2, Top.Value = 1, Right.Count = 1, Left.Count = 1
    LastValue = 3, Top.Value = 1, Right.Count = 2, Left.Count = 1
    RotateLeft @value=3
    RotateLeft @value=2
    RotateLeft @value=1
    LastValue = 4, Top.Value = 4, Right.Count = 0, Left.Count = 4
    LastValue = 5, Top.Value = 4, Right.Count = 1, Left.Count = 4
    LastValue = 6, Top.Value = 4, Right.Count = 2, Left.Count = 4
    RotateLeft @value=6
    LastValue = 7, Top.Value = 4, Right.Count = 3, Left.Count = 4
    LastValue = 8, Top.Value = 4, Right.Count = 4, Left.Count = 4
    RotateLeft @value=8
    RotateLeft @value=7
    LastValue = 9, Top.Value = 4, Right.Count = 5, Left.Count = 4
    LastValue = 10, Top.Value = 4, Right.Count = 6, Left.Count = 4
    RotateLeft @value=10
    RotateLeft @value=9
    RotateLeft @value=5
    RotateLeft @value=4
    LastValue = 11, Top.Value = 11, Right.Count = 0, Left.Count = 11
    LastValue = 12, Top.Value = 11, Right.Count = 1, Left.Count = 11
    RotateLeft @value=12
    LastValue = 13, Top.Value = 11, Right.Count = 2, Left.Count = 11
    RotateLeft @value=13
    LastValue = 14, Top.Value = 11, Right.Count = 3, Left.Count = 11
    LastValue = 15, Top.Value = 11, Right.Count = 4, Left.Count = 11
    RotateLeft @value=15
    RotateLeft @value=14
    LastValue = 16, Top.Value = 11, Right.Count = 5, Left.Count = 11
    LastValue = 17, Top.Value = 11, Right.Count = 6, Left.Count = 11
    RotateLeft @value=17
    LastValue = 18, Top.Value = 11, Right.Count = 7, Left.Count = 11
    LastValue = 19, Top.Value = 11, Right.Count = 8, Left.Count = 11
    RotateLeft @value=19
    LastValue = 20, Top.Value = 11, Right.Count = 9, Left.Count = 11
    LastValue = 21, Top.Value = 11, Right.Count = 10, Left.Count = 11
    RotateLeft @value=21
    LastValue = 22, Top.Value = 11, Right.Count = 11, Left.Count = 11
    RotateLeft @value=22
    RotateLeft @value=20
    RotateLeft @value=18
    LastValue = 23, Top.Value = 11, Right.Count = 12, Left.Count = 11
    LastValue = 24, Top.Value = 11, Right.Count = 13, Left.Count = 11
    LastValue = 25, Top.Value = 11, Right.Count = 14, Left.Count = 11
    RotateLeft @value=25
    RotateLeft @value=24
    LastValue = 26, Top.Value = 11, Right.Count = 15, Left.Count = 11
    LastValue = 27, Top.Value = 11, Right.Count = 16, Left.Count = 11
    RotateLeft @value=27
    LastValue = 28, Top.Value = 11, Right.Count = 17, Left.Count = 11
    RotateLeft @value=28
    RotateLeft @value=26
    RotateLeft @value=23
    RotateLeft @value=16
    RotateLeft @value=11
    LastValue = 29, Top.Value = 29, Right.Count = 0, Left.Count = 29
    LastValue = 30, Top.Value = 29, Right.Count = 1, Left.Count = 29
    LastValue = 31, Top.Value = 29, Right.Count = 2, Left.Count = 29
    LastValue = 32, Top.Value = 29, Right.Count = 3, Left.Count = 29
    RotateLeft @value=32
    RotateLeft @value=31
    LastValue = 33, Top.Value = 29, Right.Count = 4, Left.Count = 29
    RotateLeft @value=33
    RotateLeft @value=30
    LastValue = 34, Top.Value = 29, Right.Count = 5, Left.Count = 29
    RotateLeft @value=34
    LastValue = 35, Top.Value = 29, Right.Count = 6, Left.Count = 29
    LastValue = 36, Top.Value = 29, Right.Count = 7, Left.Count = 29
    LastValue = 37, Top.Value = 29, Right.Count = 8, Left.Count = 29
    RotateLeft @value=37
    LastValue = 38, Top.Value = 29, Right.Count = 9, Left.Count = 29
    LastValue = 39, Top.Value = 29, Right.Count = 10, Left.Count = 29
    RotateLeft @value=39
    LastValue = 40, Top.Value = 29, Right.Count = 11, Left.Count = 29
    RotateLeft @value=40
    RotateLeft @value=38
    RotateLeft @value=36
    LastValue = 41, Top.Value = 29, Right.Count = 12, Left.Count = 29
    LastValue = 42, Top.Value = 29, Right.Count = 13, Left.Count = 29
    RotateLeft @value=42
    LastValue = 43, Top.Value = 29, Right.Count = 14, Left.Count = 29
    LastValue = 44, Top.Value = 29, Right.Count = 15, Left.Count = 29
    RotateLeft @value=44
    LastValue = 45, Top.Value = 29, Right.Count = 16, Left.Count = 29
    LastValue = 46, Top.Value = 29, Right.Count = 17, Left.Count = 29
    RotateLeft @value=46
    RotateLeft @value=45
    LastValue = 47, Top.Value = 29, Right.Count = 18, Left.Count = 29
    LastValue = 48, Top.Value = 29, Right.Count = 19, Left.Count = 29
    LastValue = 49, Top.Value = 29, Right.Count = 20, Left.Count = 29
    

    The ordered items always get added to the right side of the tree, naturally. When the right side gets bigger than the left, a RotateLeft happens. RotateRight never happens. A new top node is selected roughly every time the tree doubles. The randomness of the priority value jitters it a little, so it goes 0, 1, 4, 11, 29 in this run.

    A random run reveals something interesting:

    TimeIt(50, RandomInsert)
    LastValue = 0,748661640914465, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 0
    LastValue = 0,669427539533669, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 1
    RotateRight @value=0,669427539533669
    LastValue = 0,318363281115127, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 2
    RotateRight @value=0,669427539533669
    LastValue = 0,33133987678743, Top.Value = 0,748661640914465, Right.Count = 0, Left.Count = 3
    RotateLeft @value=0,748661640914465
    LastValue = 0,955126694382693, Top.Value = 0,955126694382693, Right.Count = 0, Left.Count = 4
    RotateRight @value=0,669427539533669
    RotateLeft @value=0,33133987678743
    RotateLeft @value=0,318363281115127
    RotateRight @value=0,748661640914465
    RotateRight @value=0,955126694382693
    LastValue = 0,641024029180884, Top.Value = 0,641024029180884, Right.Count = 3, Left.Count = 2
    LastValue = 0,20709771951991, Top.Value = 0,641024029180884, Right.Count = 3, Left.Count = 3
    LastValue = 0,830862050331599, Top.Value = 0,641024029180884, Right.Count = 4, Left.Count = 3
    RotateRight @value=0,20709771951991
    RotateRight @value=0,318363281115127
    LastValue = 0,203250563798123, Top.Value = 0,641024029180884, Right.Count = 4, Left.Count = 4
    RotateLeft @value=0,669427539533669
    RotateRight @value=0,748661640914465
    RotateRight @value=0,955126694382693
    LastValue = 0,701743399585478, Top.Value = 0,641024029180884, Right.Count = 5, Left.Count = 4
    RotateLeft @value=0,669427539533669
    RotateRight @value=0,701743399585478
    RotateLeft @value=0,641024029180884
    LastValue = 0,675667521858433, Top.Value = 0,675667521858433, Right.Count = 4, Left.Count = 6
    RotateLeft @value=0,33133987678743
    RotateLeft @value=0,318363281115127
    RotateLeft @value=0,203250563798123
    LastValue = 0,531275219531392, Top.Value = 0,675667521858433, Right.Count = 4, Left.Count = 7
    RotateRight @value=0,748661640914465
    RotateRight @value=0,955126694382693
    RotateLeft @value=0,701743399585478
    LastValue = 0,704049674190604, Top.Value = 0,675667521858433, Right.Count = 5, Left.Count = 7
    RotateRight @value=0,203250563798123
    RotateRight @value=0,531275219531392
    RotateRight @value=0,641024029180884
    RotateRight @value=0,675667521858433
    LastValue = 0,161392807104342, Top.Value = 0,161392807104342, Right.Count = 13, Left.Count = 0
    RotateRight @value=0,203250563798123
    RotateRight @value=0,531275219531392
    RotateRight @value=0,641024029180884
    RotateRight @value=0,675667521858433
    RotateLeft @value=0,161392807104342
    LastValue = 0,167598206162266, Top.Value = 0,167598206162266, Right.Count = 13, Left.Count = 1
    LastValue = 0,154996359793002, Top.Value = 0,167598206162266, Right.Count = 13, Left.Count = 2
    RotateLeft @value=0,33133987678743
    LastValue = 0,431767346538495, Top.Value = 0,167598206162266, Right.Count = 14, Left.Count = 2
    RotateRight @value=0,203250563798123
    RotateRight @value=0,531275219531392
    RotateRight @value=0,641024029180884
    RotateRight @value=0,675667521858433
    RotateLeft @value=0,167598206162266
    LastValue = 0,173774613614089, Top.Value = 0,173774613614089, Right.Count = 14, Left.Count = 3
    RotateRight @value=0,830862050331599
    LastValue = 0,76559642412029, Top.Value = 0,173774613614089, Right.Count = 15, Left.Count = 3
    RotateRight @value=0,76559642412029
    RotateLeft @value=0,748661640914465
    RotateRight @value=0,955126694382693
    RotateLeft @value=0,704049674190604
    RotateLeft @value=0,675667521858433
    LastValue = 0,75742144871383, Top.Value = 0,173774613614089, Right.Count = 16, Left.Count = 3
    LastValue = 0,346844367844446, Top.Value = 0,173774613614089, Right.Count = 17, Left.Count = 3
    RotateRight @value=0,830862050331599
    LastValue = 0,787565814232251, Top.Value = 0,173774613614089, Right.Count = 18, Left.Count = 3
    LastValue = 0,734950566540915, Top.Value = 0,173774613614089, Right.Count = 19, Left.Count = 3
    RotateLeft @value=0,20709771951991
    RotateRight @value=0,318363281115127
    RotateLeft @value=0,203250563798123
    RotateRight @value=0,531275219531392
    RotateRight @value=0,641024029180884
    RotateRight @value=0,675667521858433
    RotateRight @value=0,75742144871383
    RotateLeft @value=0,173774613614089
    LastValue = 0,236504829598826, Top.Value = 0,236504829598826, Right.Count = 17, Left.Count = 6
    RotateLeft @value=0,830862050331599
    RotateLeft @value=0,787565814232251
    RotateLeft @value=0,76559642412029
    RotateRight @value=0,955126694382693
    LastValue = 0,895606500048007, Top.Value = 0,236504829598826, Right.Count = 18, Left.Count = 6
    LastValue = 0,599106418713511, Top.Value = 0,236504829598826, Right.Count = 19, Left.Count = 6
    LastValue = 0,8182332901369, Top.Value = 0,236504829598826, Right.Count = 20, Left.Count = 6
    RotateRight @value=0,734950566540915
    LastValue = 0,704216948572647, Top.Value = 0,236504829598826, Right.Count = 21, Left.Count = 6
    RotateLeft @value=0,346844367844446
    RotateLeft @value=0,33133987678743
    RotateRight @value=0,431767346538495
    RotateLeft @value=0,318363281115127
    RotateRight @value=0,531275219531392
    RotateRight @value=0,641024029180884
    RotateRight @value=0,675667521858433
    RotateRight @value=0,75742144871383
    LastValue = 0,379157059536854, Top.Value = 0,236504829598826, Right.Count = 22, Left.Count = 6
    RotateLeft @value=0,431767346538495
    LastValue = 0,46832062046431, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 6
    RotateRight @value=0,154996359793002
    LastValue = 0,0999000217299443, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 7
    RotateLeft @value=0,20709771951991
    LastValue = 0,229543754006524, Top.Value = 0,236504829598826, Right.Count = 23, Left.Count = 8
    RotateRight @value=0,8182332901369
    LastValue = 0,80358425984326, Top.Value = 0,236504829598826, Right.Count = 24, Left.Count = 8
    RotateRight @value=0,318363281115127
    LastValue = 0,259324726769386, Top.Value = 0,236504829598826, Right.Count = 25, Left.Count = 8
    RotateRight @value=0,318363281115127
    LastValue = 0,307835293145774, Top.Value = 0,236504829598826, Right.Count = 26, Left.Count = 8
    RotateLeft @value=0,431767346538495
    LastValue = 0,453910283024381, Top.Value = 0,236504829598826, Right.Count = 27, Left.Count = 8
    RotateLeft @value=0,830862050331599
    LastValue = 0,868997387527021, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 8
    RotateLeft @value=0,20709771951991
    RotateRight @value=0,229543754006524
    RotateLeft @value=0,203250563798123
    LastValue = 0,218358597354199, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 9
    RotateRight @value=0,0999000217299443
    RotateRight @value=0,161392807104342
    LastValue = 0,0642934488431986, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 10
    RotateRight @value=0,154996359793002
    RotateLeft @value=0,0999000217299443
    LastValue = 0,148295871982489, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 11
    LastValue = 0,217621828065078, Top.Value = 0,236504829598826, Right.Count = 28, Left.Count = 12
    RotateRight @value=0,599106418713511
    LastValue = 0,553135806020878, Top.Value = 0,236504829598826, Right.Count = 29, Left.Count = 12
    LastValue = 0,982277666210326, Top.Value = 0,236504829598826, Right.Count = 30, Left.Count = 12
    RotateRight @value=0,8182332901369
    LastValue = 0,803671114520948, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 12
    RotateRight @value=0,203250563798123
    RotateRight @value=0,218358597354199
    LastValue = 0,19310415405459, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 13
    LastValue = 0,0133136604043253, Top.Value = 0,236504829598826, Right.Count = 31, Left.Count = 14
    RotateLeft @value=0,46832062046431
    RotateRight @value=0,531275219531392
    RotateRight @value=0,641024029180884
    RotateRight @value=0,675667521858433
    RotateRight @value=0,75742144871383
    LastValue = 0,483394719419719, Top.Value = 0,236504829598826, Right.Count = 32, Left.Count = 14
    RotateLeft @value=0,431767346538495
    RotateRight @value=0,453910283024381
    LastValue = 0,453370328738061, Top.Value = 0,236504829598826, Right.Count = 33, Left.Count = 14
    LastValue = 0,762330518459124, Top.Value = 0,236504829598826, Right.Count = 34, Left.Count = 14
    LastValue = 0,699010426969738, Top.Value = 0,236504829598826, Right.Count = 35, Left.Count = 14
    

    Rotations happen not so much because the tree is unbalanced, but because of the priorities, which are randomly selected. For example, we get 4 rotations at the 13th insertion. We have a tree balanced at 5/7 (which is fine), but get to 13/0! It would seem that the use of random priorities deserves further investigation. Anyhow, it is plain to see that the random inserts cause a lot more rotations than the ordered inserts.

  • 2021-02-02 12:59

    @Guge is right. However there is a little tiny bit more to it. I am not saying that it is the biggest factor in this case - however it is there and it is hard to do anything about it.

    For a sorted input, lookups likely touch the nodes that are hot in the cache. (This is true in general for balanced trees like AVL trees, red-black trees, B-trees, etc.)

    Since inserts start with a lookup, this has an effect on insert/delete performance as well.

    Again, I am not claiming that it is the most significant factor in each and every case. It is there, however, and will likely result in sorted inputs always being faster than random ones in these data structures.

  • 2021-02-02 13:01

    Try this: a database built on a treap.

    http://code.google.com/p/treapdb/
