Why can assembly instructions contain multiplications in the “lea” instruction?

醉酒成梦 2021-02-05 18:58

I am working on a very low level part of the application in which performance is critical.

While investigating the generated assembly, I noticed the following instruction

3 Answers
  • 2021-02-05 19:32

    Actually, this is not something specific to the lea instruction.

    This type of addressing is called Scaled Addressing Mode. The multiplication is achieved by a bit shift, which is trivial:

    [Figure: a left shift]
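
    To make the shift concrete, here is a minimal C sketch (not part of the original answer; scale_to_shift and scaled_index are hypothetical names) showing that multiplying an index by 1, 2, 4, or 8 is the same as shifting it left by 0, 1, 2, or 3 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map an x86 scale factor to its shift count. */
unsigned scale_to_shift(unsigned scale)
{
    /* Only 1, 2, 4 and 8 can be used as a scale factor. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    unsigned shift = 0;
    while ((1u << shift) != scale)
        shift++;
    return shift;
}

/* index * scale computed with a shift only, no multiplier involved. */
uint32_t scaled_index(uint32_t index, unsigned scale)
{
    return index << scale_to_shift(scale);
}
```

    For example, scaled_index(7, 4) gives 28, the same value as 7 << 2.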

    You can use scaled addressing with mov too. For example (note that this is not the same operation: mov loads from the computed address, while lea only computes it; the only similarity is that 4*ebx is a multiplication inside the address expression):

     mov edx, [esi+4*ebx]
    

    (source: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html#memory)
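
    In C terms, an instruction like the mov above is the kind of code a compiler may emit for reading element i of an array of 4-byte values, with the array pointer in esi and the index in ebx. The sketch below is an illustration only: load_element and element_address are hypothetical names, and the actual instructions depend on the compiler, its options, and the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: read the i-th 32-bit element of an array.
   On 32-bit x86 this can map to a single mov with a scale of 4. */
int32_t load_element(const int32_t *base, size_t i)
{
    assert(base != NULL);
    return base[i];
}

/* The same address computed by hand: base + 4 * i, in bytes. */
uintptr_t element_address(const int32_t *base, size_t i)
{
    return (uintptr_t)base + sizeof(int32_t) * i;
}
```

    The scale factor is simply the element size, which is why array indexing and scaled addressing fit together so well.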

    For a more complete listing, see this Intel document. Table 2-3 shows that a scale factor of 2, 4, or 8 is allowed (or 1, meaning no scaling). Nothing else.

    Latency (in number of cycles): I don't think this should be affected at all. A shift by a fixed amount is just a matter of wiring, and selecting between the few possible shifts costs about one multiplexer's worth of delay.

  • 2021-02-05 19:38

    To expand on your last question:

    Is the multiplication limited to powers of 2 (I would assume this is the case)?

    Note that you get the result of base + scale * index, so while scale has to be 1, 2, 4 or 8 (the size of x86 integer datatypes), you can get the equivalent of a multiplication by some different constants by using the same register as base and index, e.g.:

    lea eax, [eax*4 + eax]   ; multiply by 5
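
    As a rough C model of this trick (lea_same_register and the timesN wrappers are hypothetical names, not compiler output), using the same value as base and index gives a multiplication by scale + 1, that is by 2, 3, 5, or 9:

```c
#include <assert.h>
#include <stdint.h>

/* Models lea r, [x + x*scale]: the result is x * (scale + 1),
   wrapping modulo 2^32 just like the 32-bit instruction. */
uint32_t lea_same_register(uint32_t x, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return x + x * scale;
}

uint32_t times3(uint32_t x) { return lea_same_register(x, 2); }
uint32_t times5(uint32_t x) { return lea_same_register(x, 4); }
uint32_t times9(uint32_t x) { return lea_same_register(x, 8); }
```

    For instance, times5(7) returns 35, matching the lea eax, [eax*4 + eax] example.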
    

    Compilers use this for strength reduction. For example, for a multiplication by 100, depending on compiler options (target CPU model, optimization options), you may get:

    lea    (%edx,%edx,4),%eax   # eax = orig_edx * 5
    lea    (%eax,%eax,4),%eax   # eax = eax * 5 = orig_edx * 25
    shl    $0x2,%eax            # eax = eax * 4 = orig_edx * 100
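
    The three instructions can be mirrored step by step in C. This is only a sketch to check the arithmetic (times100 and check_times100 are hypothetical names); whether a compiler actually picks this sequence depends on its options and the target CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 100 as 5 * 5 * 4, one C statement per instruction. */
uint32_t times100(uint32_t x)
{
    uint32_t r = x + x * 4;  /* lea (%edx,%edx,4),%eax : r = x * 5   */
    r = r + r * 4;           /* lea (%eax,%eax,4),%eax : r = x * 25  */
    r <<= 2;                 /* shl $0x2,%eax          : r = x * 100 */
    return r;
}

/* Sanity check: the three-step version agrees with a plain multiply,
   including when the 32-bit result wraps around. */
void check_times100(void)
{
    assert(times100(0) == 0);
    assert(times100(7) == 700);
    assert(times100(0xFFFFFFFFu) == (uint32_t)(0xFFFFFFFFu * 100u));
}
```

    Because every step is unsigned 32-bit arithmetic, overflow wraps the same way as a single multiplication by 100 would.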
    
  • 2021-02-05 19:48

    To expand on my comment and to answer the rest of the question...

    Yes, it's limited to powers of two (2, 4, and 8 specifically), so no multiplier is needed since it's just a shift. The point of it is to quickly generate an address from an index variable and a pointer, where the datatype is a simple 2-, 4-, or 8-byte word. (Though it's often abused for other uses as well.)

    As for the number of cycles needed: according to Agner Fog's tables, it looks like the latency of the lea instruction is constant on some machines and variable on others.

    On Sandy Bridge there's a 2-cycle penalty if it's "complex or rip relative". But it doesn't say what "complex" means... So we can only guess unless you do a benchmark.
