What is the effect of ordering if…else if statements by probability?

花落未央 2020-12-07 13:39

Specifically, if I have a series of if...else if statements, and I somehow know beforehand the relative probability that each statement will evaluate to true, does it make a difference in execution time to order them by that probability?

10 answers
  • 2020-12-07 14:11

    If you already know the relative probabilities of the conditions, then for performance it is better to order them from most likely to least likely: in the common case only the first condition (the true one) has to be evaluated.

    With an unfavorable ordering, execution evaluates several false conditions at run time before reaching the true one, which wastes cycles.
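
    For illustration, a minimal sketch (the message types and handler functions here are hypothetical): if roughly 90% of inputs match the first test, the chain usually evaluates exactly one condition.

    // Hypothetical dispatcher: ~90% of messages are DATA,
    // ~9% are ACK, ~1% are ERROR (assumed distribution).
    if (msg->type == MSG_DATA)        // most likely: tested first
        handle_data(msg);
    else if (msg->type == MSG_ACK)    // less likely
        handle_ack(msg);
    else                              // rare
        handle_error(msg);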

  • 2020-12-07 14:12

    The way I usually see this handled in high-performance code is to keep the order that is most readable, but to give the compiler hints. Here is one example from the Linux kernel:

    if (likely(access_ok(VERIFY_READ, from, n))) {
        kasan_check_write(to, n);
        res = raw_copy_from_user(to, from, n);
    }
    if (unlikely(res))
        memset(to + (n - res), 0, res);
    

    Here the assumption is that the access check will pass, and that no error is returned in res. Trying to reorder either of these if clauses would just obscure the code, but the likely() and unlikely() macros actually help readability by pointing out which is the normal case and which is the exception.

    The Linux implementation of those macros uses GCC-specific features. Clang and the Intel C compiler appear to support the same syntax, but MSVC has no such feature.
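
    For reference, the kernel's macros boil down to GCC's __builtin_expect; a portable wrapper (a sketch, with a no-op fallback for compilers that lack the builtin) could look like this:

    #if defined(__GNUC__) || defined(__clang__)
    #define likely(x)   __builtin_expect(!!(x), 1)   // hint: condition is usually true
    #define unlikely(x) __builtin_expect(!!(x), 0)   // hint: condition is usually false
    #else
    #define likely(x)   (x)   // no hint available: plain pass-through
    #define unlikely(x) (x)
    #endif

    (C++20 later standardized the [[likely]] and [[unlikely]] attributes for the same purpose.)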

  • 2020-12-07 14:16

    Just my 5 cents. It seems the effect of ordering if statements depends on:

    1. Probability of each if statement.

    2. Number of iterations, so the branch predictor could kick in.

    3. Likely/unlikely compiler hints, i.e. code layout.

    To explore those factors, I benchmarked the following functions:

    ordered_ifs()

    for (i = 0; i < data_sz * 1024; i++) {
        if (data[i] < check_point) // highly likely
            s += 3;
        else if (data[i] > check_point) // somewhat likely
            s += 2;
        else if (data[i] == check_point) // very unlikely
            s += 1;
    }
    

    reversed_ifs()

    for (i = 0; i < data_sz * 1024; i++) {
        if (data[i] == check_point) // very unlikely
            s += 1;
        else if (data[i] > check_point) // somewhat likely
            s += 2;
        else if (data[i] < check_point) // highly likely
            s += 3;
    }
    

    ordered_ifs_with_hints()

    for (i = 0; i < data_sz * 1024; i++) {
        if (likely(data[i] < check_point)) // highly likely
            s += 3;
        else if (data[i] > check_point) // somewhat likely
            s += 2;
        else if (unlikely(data[i] == check_point)) // very unlikely
            s += 1;
    }
    

    reversed_ifs_with_hints()

    for (i = 0; i < data_sz * 1024; i++) {
        if (unlikely(data[i] == check_point)) // very unlikely
            s += 1;
        else if (data[i] > check_point) // somewhat likely
            s += 2;
        else if (likely(data[i] < check_point)) // highly likely
            s += 3;
    }
    

    The Data

    The data array contains random numbers in the range [0, 100):

    #include <stdint.h>
    #include <stdlib.h>

    const int RANGE_MAX = 100;
    uint8_t data[DATA_MAX * 1024];  // DATA_MAX: maximum data_sz, defined elsewhere

    static void data_init(int data_sz)
    {
        int i;
        srand(0);
        for (i = 0; i < data_sz * 1024; i++)
            data[i] = rand() % RANGE_MAX;
    }
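
    The Time / CPU / Iterations columns in the results below suggest Google Benchmark. A harness along the following lines (my reconstruction, not necessarily the author's exact setup) would produce the /50/4-style argument pairs:

    #include <benchmark/benchmark.h>

    static void BM_ordered_ifs(benchmark::State& state) {
        const int check_point = state.range(0);  // probability knob, in %
        const int data_sz     = state.range(1);  // data size, in KiB
        data_init(data_sz);
        for (auto _ : state) {
            long s = 0;
            for (int i = 0; i < data_sz * 1024; i++) {
                if (data[i] < check_point)        s += 3;
                else if (data[i] > check_point)   s += 2;
                else if (data[i] == check_point)  s += 1;
            }
            benchmark::DoNotOptimize(s);  // keep the compiler from deleting the loop
        }
    }
    BENCHMARK(BM_ordered_ifs)->Args({50, 4})->Args({50, 8})
                             ->Args({75, 4})->Args({75, 8})
                             ->Args({100, 4})->Args({100, 8});
    BENCHMARK_MAIN();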
    

    The Results

    The following results are for an Intel i5 @ 3.2 GHz and G++ 6.3.0. The first argument is check_point (i.e. the probability, in %, of the highly likely if statement), the second argument is data_sz (i.e. the number of iterations, in units of 1024):

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/4                    4660 ns       4658 ns     150948
    ordered_ifs/50/8                   25636 ns      25635 ns      27852
    ordered_ifs/75/4                    4326 ns       4325 ns     162613
    ordered_ifs/75/8                   18242 ns      18242 ns      37931
    ordered_ifs/100/4                   1673 ns       1673 ns     417073
    ordered_ifs/100/8                   3381 ns       3381 ns     207612
    reversed_ifs/50/4                   5342 ns       5341 ns     126800
    reversed_ifs/50/8                  26050 ns      26050 ns      26894
    reversed_ifs/75/4                   3616 ns       3616 ns     193130
    reversed_ifs/75/8                  15697 ns      15696 ns      44618
    reversed_ifs/100/4                  3738 ns       3738 ns     188087
    reversed_ifs/100/8                  7476 ns       7476 ns      93752
    ordered_ifs_with_hints/50/4         5551 ns       5551 ns     125160
    ordered_ifs_with_hints/50/8        23191 ns      23190 ns      30028
    ordered_ifs_with_hints/75/4         3165 ns       3165 ns     218492
    ordered_ifs_with_hints/75/8        13785 ns      13785 ns      50574
    ordered_ifs_with_hints/100/4        1575 ns       1575 ns     437687
    ordered_ifs_with_hints/100/8        3130 ns       3130 ns     221205
    reversed_ifs_with_hints/50/4        6573 ns       6572 ns     105629
    reversed_ifs_with_hints/50/8       27351 ns      27351 ns      25568
    reversed_ifs_with_hints/75/4        3537 ns       3537 ns     197470
    reversed_ifs_with_hints/75/8       16130 ns      16130 ns      43279
    reversed_ifs_with_hints/100/4       3737 ns       3737 ns     187583
    reversed_ifs_with_hints/100/8       7446 ns       7446 ns      93782
    

    Analysis

    1. The Ordering Does Matter

    For 4K iterations and (almost) 100% probability of the highly likely statement, the difference is huge: the reversed order takes 223% of the ordered one's time:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/100/4                   1673 ns       1673 ns     417073
    reversed_ifs/100/4                  3738 ns       3738 ns     188087
    

    For 4K iterations and 50% probability of the highly likely statement, the difference is about 15%:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/4                    4660 ns       4658 ns     150948
    reversed_ifs/50/4                   5342 ns       5341 ns     126800
    

    2. Number of Iterations Does Matter

    The difference between 4K and 8K iterations for (almost) 100% probability of the highly likely statement is about a factor of two (as expected, since the amount of work doubles):

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/100/4                   1673 ns       1673 ns     417073
    ordered_ifs/100/8                   3381 ns       3381 ns     207612
    

    But the difference between 4K and 8K iterations for 50% probability of the highly likely statement is a factor of 5.5:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/4                    4660 ns       4658 ns     150948
    ordered_ifs/50/8                   25636 ns      25635 ns      27852
    

    Why is that? Because of branch prediction misses. Here are the branch-miss rates for each of the cases mentioned above:

    ordered_ifs/100/4    0.01% of branch-misses
    ordered_ifs/100/8    0.01% of branch-misses
    ordered_ifs/50/4     3.18% of branch-misses
    ordered_ifs/50/8     15.22% of branch-misses
    

    So on my i5 the branch predictor fails spectacularly for not-so-likely branches and large data sets.
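
    These rates can be collected on Linux with perf; for example, against a Google Benchmark binary (hypothetically named bench):

    $ perf stat -e branches,branch-misses ./bench --benchmark_filter='ordered_ifs/50/8'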

    3. Hints Help a Bit

    For 4K iterations the results are somewhat worse for 50% probability and somewhat better for close to 100% probability:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/4                    4660 ns       4658 ns     150948
    ordered_ifs/100/4                   1673 ns       1673 ns     417073
    ordered_ifs_with_hints/50/4         5551 ns       5551 ns     125160
    ordered_ifs_with_hints/100/4        1575 ns       1575 ns     437687
    

    But for 8K iterations the results are always a bit better:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/8                   25636 ns      25635 ns      27852
    ordered_ifs/100/8                   3381 ns       3381 ns     207612
    ordered_ifs_with_hints/50/8        23191 ns      23190 ns      30028
    ordered_ifs_with_hints/100/8        3130 ns       3130 ns     221205
    

    So, the hints also help, but just a tiny bit.

    The overall conclusion: always benchmark your code, because the results may surprise you.

    Hope that helps.

  • 2020-12-07 14:18

    As a general rule, most if not all Intel CPUs statically predict forward branches as not taken the first time they see them. See Matt Godbolt's work on this.

    After that, the branch goes into a branch prediction cache, and past behavior is used to inform future branch prediction.

    So in a tight loop, the effect of misordering is going to be relatively small. The branch predictor will learn which set of branches is most likely, and if you have a non-trivial amount of work in the loop the small differences won't add up to much.

    In general code, most compilers by default (lacking another reason) will lay out the produced machine code roughly the way you ordered it in your source. Thus an if statement becomes a forward branch that is taken when its condition fails.

    So you should order your branches in the order of decreasing likelihood to get the best branch prediction from a "first encounter".

    A microbenchmark that loops tightly many times over a set of conditions and does trivial work is going to be dominated by tiny effects of instruction count and the like, with little in the way of relative branch-prediction effects. So in this case you must profile, as rules of thumb won't be reliable.

    On top of that, vectorization and many other optimizations apply to tiny tight loops.

    So in general code, put the most likely code first in the chain, and that will result in the fewest mispredictions on branches the predictor has not yet cached. In tight loops, follow the general rule to start, and if you need to know more you have little choice but to profile.

    Naturally this all goes out the window if some tests are far cheaper than others.
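
    For example (the functions here are hypothetical): a trivial header-field test is worth placing before a checksum that scans the whole buffer, because short-circuit evaluation skips the expensive check whenever the cheap one fails.

    // is_valid_header() is assumed to be a one-field test;
    // verify_checksum() scans the entire packet.
    if (is_valid_header(pkt) && verify_checksum(pkt))
        process(pkt);
    else
        drop(pkt);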
