Expression templates: improving performance in evaluating expressions?

后端 未结 2 2015
天涯浪人
天涯浪人 2021-01-13 03:17

By the expression templates technique, a matrix expression like

D = A*B+sin(C)+3.;

is pretty much equivalent, in terms of computing perform

相关标签:
2条回答
  • 2021-01-13 03:56

    Evaluation of expression template typically happens when you save result to some special type like:

    Result D = A*B+sin(C)+3.;
    

    Result type of expression:

    A*B+sin(C)+3.
    

    is not Result, but it is something that convertable to Result. And evaluation happens during such conversion.


    My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused

    Such kind of "transfromation":

    Result D = A*B+sin(C)+3.;
    Result F = D*E;
    

    to

    Result F = (A*B+sin(C)+3.)*E;
    

    Is possible when you do not evaluate D. To do this, typically you should capture D as it's real , expression type. For instance, with help of auto:

    auto &&D = A*B+sin(C)+3.;
    Result F = D*E;
    

    However, you should be carefull - sometimes expression template captures references to it's operands, and if you have some rvalue which would expire after it's expression:

    auto &&D = A*get_large_rvalue();
    // At this point, result of **get_large_rvalue** is destructed
    // And D has expiried reference
    Result F = D*E;
    

    Where get_large_rvalue is:

    LargeMatrix get_large_rvalue();
    

    It's result is rvalue, it expiries at the end of full expression when get_large_rvalue was called. If something within expression would store pointer/reference to it (for later evaluation) and you would "defer" evaluation - pointer/reference will outlive pointed/referenced object.

    In order to prevent this, you should do:

    auto &&intermediate = get_large_rvalue(); // it would live till the end of scope
    auto &&D = A*intermediate ;
    Result F = D*E;
    

    I'm not familiar with C++11 but, as I understand, auto asks the compiler to determine the type of a variable from its initialization

    Yes, exactly. This is called Type Inference/Deduction.

    C++98/03 had type deduction only for template functions, in C++11 there is auto.

    Do you know how do CUDA and C++11 interact each other?

    I haven't used CUDA (though I used OpenCL), but I guess that there will be no any problems in Host code with C++11. Maybe some C++11 features are not supported within Device code, but for your purpose - you need auto only in Host code

    Finally, is there any possibility with only C++?

    Do you mean pre-C++11? I.e. C++98/C++03? Yes, it is possible, but it has more syntax noise, maybe that would be reason to reject it:

    // somehwhere
    {
        use_D(A*B+sin(C)+3.);
    }
    // ...
    template<typename Expression>
    void use_D(Expression D) // depending on your expression template library
                             //   it may be better to use (const Expression &e)
    {
        Result F = D*E;
    }
    

    I'm now using CUDA/Visual Studio 2010 under Windows. Could you please recommend a compiler/toolset/environment for both OS' to use C++11 in the framework of my interest (GPGPU and CUDA, in you know any)

    MSVC 2010 does supports some parts of C++11. In particular it supports auto. So, if you need only auto from C++11 - MSVC2010 is OK.

    But if you may use MSVC2012 - I would recommed to stick with it - it has much better C++11 support.

    Also, the trick auto &&intermediate = get_large_rvalue(); seems to be not "transparent" to a third party user (which is not supposed to know such an issue). Am I right? Any alternative?

    If expression template stores references to some values, and you defer it's evaluation. You should be sure that all it's references are alive at the place of evaluation. Use any method which you want - it can be done without auto, like:

    LargeMatrix temp = get_large_rvalue();
    

    Or maybe even global/static variable (less prefered method).

    A last comment/question: to use auto &&D = A*B+sin(C)+3.; it seems that I should overload the operator= for assignments between two expressions, right?

    No, such form does not requires nor copy/move assignment operator nor copy/move constructor.

    Basically it just names temporary value, and prolongs it's lifetime to the end of scope. Check this SO.

    But, if you would use another form:

    auto D = A*B+sin(C)+3.;
    

    In such case copy/move/conversion constructor maybe required in order to compile (though actual copy can be optimized away by compiler by use of Copy Ellision)

    Also, switching between using auto (for the intermediate expressions) and Result to force calculation seems to be non-transparent to a third party user. Any alternative?

    I am not sure if there is any alternative. This is by nature of expression templates. While you using them in expressions - they return some internal intermediate types, but when you store to some "special" type - evaluation is triggered.

    0 讨论(0)
  • 2021-01-13 04:00

    In c++11 you can use auto

    auto D = A*B+sin(C)+3.;
    

    assuming you are using expression templates, the type of D would be <some template type which represents an expression>. Now, you have to use this carefully, because you are saving up some memory (no need to allocate space for a matrix) but depending on how you use it this may not be the best.

    Think about

    F = D*E

    An element D[i][j] needs to be "visited" many times when computing D*E (actually n times where n is the size of the matrices). If D is a plain-matrix type this is no problem. If D is an expression you are evaluating it many, many times.

    On the contray, doing

    F = D + E
    

    is fine.

    Think about this: you can not write F = E*(A*B+sin(C)+3.); using only two nested loops.

    0 讨论(0)
提交回复
热议问题