What is the most efficient way to construct large block matrices in Mathematica?

没有蜡笔的小新 2021-02-05 20:58

Inspired by Mike Bantegui's question on constructing a matrix defined as a recurrence relation, I wonder if there is any general guidance that could be given on setting up larg

2 Answers
  •  情歌与酒
    2021-02-05 21:38

    Looking at what Compile does to Do loops is instructive. Consider this:

    L=1200;
    Do[.7, {i, 1, 2 L}, {j, 1, i}] // Timing
    Do[.3 + .4, {i, 1, 2 L}, {j, 1, i}] // Timing
    Do[.3 + .4 + .5, {i, 1, 2 L}, {j, 1, i}] // Timing
    Do[.3 + .4 + .5 + .8, {i, 1, 2 L}, {j, 1, i}] // Timing 
    (*
    {0.390163, Null}
    {1.04115, Null}
    {1.95333, Null}
    {2.42332, Null}
    *)
    

    First, it seems safe to assume that Do does not automatically compile its argument once it exceeds some length (as Map, Nest, etc. do): as you keep adding constants, the time taken grows roughly linearly with the number of constants. This is further supported by the absence of such an option in SystemOptions["CompileOptions"].

    Next, the loop body runs n(n+1)/2 times with n = 2*L, which is about 2.9*10^6 times for our L = 1200. Each extra addition therefore costs roughly 0.2 to 0.3 microseconds in the timings above, which indicates that a lot more is going on than is necessary.
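
    As a quick check of the iteration count, the loop body executions can be counted directly with the same bounds as the timing runs. This is only a sketch; the variable name count is chosen here for illustration.

    ```mathematica
    L = 1200;

    (* count how many times the body of the double loop runs *)
    count = 0;
    Do[count++, {i, 1, 2 L}, {j, 1, i}];
    count
    (* 2881200 *)

    (* the same number as a closed sum over the inner loop lengths *)
    Sum[i, {i, 1, 2 L}]
    (* 2881200 *)
    ```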

    Next let us try

    Compile[{{L,_Integer}},Do[.7,{i,1,2 L},{j,1,i}]]@1200//Timing
    Compile[{{L,_Integer}},Do[.7+.7,{i,1,2 L},{j,1,i}]]@1200//Timing
    Compile[{{L,_Integer}},Do[.7+.7+.7+.7,{i,1,2 L},{j,1,i}]]@1200//Timing
    (*
    {0.032081, Null}
    {0.032857, Null}
    {0.032254, Null}
    *)
    

    So here things are more reasonable. Let's take a look:

    Needs["CompiledFunctionTools`"]
    f1 = Compile[{{L, _Integer}}, 
       Do[.7 + .7 + .7 + .7, {i, 1, 2 L}, {j, 1, i}]];
    f2 = Compile[{{L, _Integer}}, Do[2.8, {i, 1, 2 L}, {j, 1, i}]];
    CompilePrint[f1]
    CompilePrint[f2]
    

    The two CompilePrint calls give the same output, namely:

            1 argument
            9 Integer registers
            Underflow checking off
            Overflow checking off
            Integer overflow checking on
            RuntimeAttributes -> {}
    
            I0 = A1
            I5 = 0
            I2 = 2
            I1 = 1
            Result = V255
    
        1   I4 = I2 * I0
        2   I6 = I5
        3   goto 8
        4   I7 = I6
        5   I8 = I5
        6   goto 7
        7   if[ ++ I8 < I7] goto 7
        8   if[ ++ I6 < I4] goto 4
        9   Return
    

    f1==f2 returns True.

    Now, do

    f5 = Compile[{{L, _Integer}}, Block[{t = 0.},
            Do[t = Sin[i*j], {i, 1, 2 L}, {j, 1, i}]; t]];
    f6 = Compile[{{L, _Integer}}, Block[{t = 0.},
            Do[t = Sin[.45], {i, 1, 2 L}, {j, 1, i}]; t]];
    CompilePrint[f5]
    CompilePrint[f6]
    

    I won't show the full listings, but the first contains the line R3 = Sin[ R1], while the second contains an assignment to a register, R1 = 0.43496553411123023, that is, Sin[.45] evaluated once at compile time. (That value is, however, still copied in the innermost part of the loop by R2 = R1; perhaps if we output to C, gcc will eventually optimize this away.)

    So, in these very simple cases, uncompiled Do just blindly executes the body without inspecting it, while Compile does perform various simple optimizations (in addition to outputting bytecode). Although I am choosing examples here that exaggerate how literally Do interprets its argument, this kind of thing partly explains the large speedup after compiling.

    As for the huge speedup in Mike Bantegui's question yesterday, I think that in such simple problems (just looping and multiplying things) there is no reason that the automatically generated C code can't be optimized by the C compiler to run as fast as possible. The generated C code is too hard for me to understand, but the bytecode is readable and I don't see anything all that wasteful in it. So it is not that shocking that it is so fast when compiled to C. Using built-in functions shouldn't be any faster than that, since there shouldn't be any difference in the algorithm (if there is, the Do loop shouldn't have been written that way).
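
    To see what compiling to C buys on a given machine, the same loop can be compiled twice, once to bytecode and once with CompilationTarget -> "C", and timed. This is a sketch rather than a benchmark: it assumes a C compiler is installed and visible to Mathematica (otherwise Compile falls back to bytecode with a message), and the names fWVM and fC are made up for this example.

    ```mathematica
    (* bytecode version, run on the Wolfram virtual machine *)
    fWVM = Compile[{{n, _Integer}},
       Block[{t = 0.},
        Do[t = Sin[i*j], {i, 1, 2 n}, {j, 1, i}]; t]];

    (* same body, compiled through generated C code;
       assumes a working C compiler is available *)
    fC = Compile[{{n, _Integer}},
       Block[{t = 0.},
        Do[t = Sin[i*j], {i, 1, 2 n}, {j, 1, i}]; t],
       CompilationTarget -> "C"];

    fWVM[1200] // AbsoluteTiming
    fC[1200] // AbsoluteTiming
    ```

    Both functions return the last value assigned to t, so comparing the two results is a cheap sanity check before comparing the timings.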

    All this should be checked case by case, of course. In my experience, Do loops are usually the fastest way to go for this kind of operation. However, compilation has its limits: if you are producing large objects and passing them between two compiled functions (as arguments), this transfer can become the bottleneck. One solution is to put everything into one giant function and compile that; this gets harder and harder to do (you are forced to write C in Mathematica, so to speak). Or you can try compiling the individual functions and using CompilationOptions -> {"InlineCompiledFunctions" -> True} in the Compile. Things can get tricky very fast, though.
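
    A minimal sketch of the inlining approach might look like the following. The functions inner and outer are hypothetical, and the extra option "InlineExternalDefinitions" -> True is an assumption added here so that Compile can see the definition of inner at compile time; it is not part of the suggestion above.

    ```mathematica
    (* hypothetical helper: a small compiled function *)
    inner = Compile[{{x, _Real}}, x^2 + 1.];

    (* hypothetical caller: sums inner over 1..n, asking Compile to
       inline the compiled helper instead of calling back out to it *)
    outer = Compile[{{n, _Integer}},
       Module[{t = 0.},
        Do[t += inner[1. i], {i, 1, n}];
        t],
       CompilationOptions -> {"InlineExternalDefinitions" -> True,
         "InlineCompiledFunctions" -> True}];

    outer[3]
    (* 17. *)
    ```

    Running CompilePrint[outer] and looking for a MainEvaluate call is one way to check whether the helper was actually inlined.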

    But this is getting too long.
