Can GCC be coerced to generate efficient constructors for memory-aligned objects?

庸人自扰 2021-01-31 17:29

I'm optimizing a constructor that is called in one of our app's innermost loops. The class in question is about 100 bytes wide, consists of a bunch of ints,

1 answer
  •  有刺的猬
    2021-01-31 18:05

    Here's how I would do it. Don't declare any constructor, so that Frobozz stays a plain aggregate that can be brace-initialised and copied as a block. Instead, declare a constant Frobozz that holds the default values:

    const Frobozz DefaultFrobozz =
      {
      0, 1, -1, 0,        // int na,nb,nc,nd;
      false, true, false, // bool ba,bb,bc;
      'a', 'b', 'c',      // char ca,cb,cc;
      -1, 1.0             // float fa,fb;
      } ;
    

    Then in OversimplifiedExample:

    Frobozz params (DefaultFrobozz) ;
    

    With gcc -O3 (version 4.5.2), the initialisation of params reduces to:

    leal    -72(%ebp), %edi
    movl    $_DefaultFrobozz, %esi
    movl    $16, %ecx
    rep movsl
    

    which is about as good as it gets in a 32-bit environment.
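
For reference, here is a minimal self-contained sketch of the approach. The layout of Frobozz is an assumption inferred from the comments in the initialiser above, so the real class from the question may have more members or an alignment attribute. The helper make_params is hypothetical and exists only to show the copy-then-override pattern.

```cpp
#include <cassert>
#include <type_traits>

// Assumed layout, reconstructed from the comments in the initialiser;
// the real Frobozz in the question may differ.
struct Frobozz {
    int   na, nb, nc, nd;
    bool  ba, bb, bc;
    char  ca, cb, cc;
    float fa, fb;
};

// With no user-declared constructor the type stays trivially copyable,
// which is what lets the compiler turn the copy into a block move.
static_assert(std::is_trivially_copyable<Frobozz>::value,
              "Frobozz must stay trivially copyable");

// One constant instance that carries every default value.
const Frobozz DefaultFrobozz = {
    0, 1, -1, 0,          // int   na, nb, nc, nd
    false, true, false,   // bool  ba, bb, bc
    'a', 'b', 'c',        // char  ca, cb, cc
    -1.0f, 1.0f           // float fa, fb
};

// Hypothetical helper: copy the defaults, then override what differs.
Frobozz make_params(int na) {
    Frobozz params(DefaultFrobozz);   // plain copy, no constructor body runs
    params.na = na;
    assert(params.nb == 1);           // untouched fields keep their defaults
    return params;
}
```

Whether the copy compiles down to rep movs or to a sequence of wide loads and stores depends on the compiler version, the target and the size of the struct, so it is worth checking the generated assembly (for example with g++ -O3 -S) before relying on it.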

    Warning: I tried this with the 64-bit g++ version 4.7.0 20110827 (experimental), and it generated an explicit sequence of 64-bit copies instead of a block move. The processor doesn't allow rep movsq, but I would expect rep movsl to be faster than a sequence of 64-bit loads and stores. Perhaps not. (But the -Os switch -- optimise for space -- does use a rep movsl instruction.) Anyway, try this and let us know what happens.

    Edited to add: I was wrong about the processor not allowing rep movsq. Intel's documentation says "The MOVS, MOVSB, MOVSW, and MOVSD instructions can be preceded by the REP prefix", but it seems that this is just a documentation glitch. In any case, if I make Frobozz big enough, then the 64-bit compiler generates rep movsq instructions; so it probably knows what it's doing.
