I am working on a very low level part of the application in which performance is critical.
While investigating the generated assembly, I noticed the following instruction
To expand on my comment and to answer the rest of the question...
Yes, the scale factor is limited to powers of two (1, 2, 4, and 8 specifically, where 1 simply means no scaling), so no multiplier is needed since it's just a shift. The point of it is to quickly generate an address from an index variable and a pointer - where the datatype is a simple 2, 4, or 8 byte word. (Though it's often abused for other uses as well, such as general arithmetic.)
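To make the "it's just a shift" point concrete, here is a minimal C sketch of the two uses described above. The function names (`element_address`, `times_five`, `check_examples`) are made up for illustration, and whether a compiler actually emits `lea` for either one depends on the compiler and flags, so treat the instructions mentioned in the comments as an assumption rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of element `index` in an array of 4-byte
   words, computed the way a scaled-index addressing mode does it:
   base + (index << 2), i.e. base + index*4. */
static inline const int32_t *element_address(const int32_t *base, size_t index)
{
    return (const int32_t *)((const char *)base + (index << 2));
}

/* Hypothetical helper showing the "abuse": x*5 rewritten as x + x*4.
   A compiler may (not must) turn this into one instruction such as
   lea eax, [rdi + rdi*4]. */
static inline uint32_t times_five(uint32_t x)
{
    return x + (x << 2);
}

/* Small self-check: the shift-based address matches ordinary indexing. */
static inline void check_examples(void)
{
    int32_t words[4] = {10, 20, 30, 40};
    assert(element_address(words, 3) == &words[3]);
    assert(times_five(7) == 35);
}
```

Compiling this with optimizations and looking at the assembly output is a quick way to see which form your own compiler picks.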
As for the number of cycles that are needed: According to Agner Fog's tables it looks like the latency of the lea instruction is constant on some machines and variable on others.
On Sandy Bridge there's a 2-cycle penalty if it's "complex or rip relative". But the tables don't say what "complex" means... So we can only guess unless you run a benchmark.