Is a logical right shift by a power of 2 faster in AVR?

栀梦 2020-12-06 19:21

I would like to know if performing a logical right shift is faster when shifting by a power of 2.

For example, is

myUnsigned >> 4

any faster than myUnsigned >> 3?
  • 2020-12-06 19:45

    Indeed, the ATmega doesn't have a barrel shifter, like most (if not all) other 8-bit MCUs. Therefore it can only shift by 1 bit per instruction, rather than by an arbitrary amount like more powerful CPUs. As a result, shifting by 4 is theoretically slower than shifting by 3.

    However, the ATmega does have a nibble-swap instruction (SWAP), so in fact x >> 4 is faster than x >> 3.

    Assuming x is a uint8_t, x >>= 3 is implemented as 3 single-bit right shifts:

    x >>= 1;
    x >>= 1;
    x >>= 1;
    

    whereas x >>= 4 only needs a swap and a bit clear:

    swap(x);    // swap the top and bottom nibbles AB <-> BA
    x &= 0x0f;
    

    or

    x &= 0xf0;
    swap(x);
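    It is easy to check exhaustively on a host machine that both sequences compute x >> 4. The C sketch below is not AVR code. swap_nibbles is a hypothetical helper that only emulates what the SWAP instruction does, and the other function names are also illustrative. check_shr4_sequences compares both sequences with a plain x >> 4 for all 256 byte values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: emulates the AVR SWAP instruction by exchanging
   the high and low nibbles of a byte (0xAB -> 0xBA). */
uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* First sequence: swap, then clear the new high nibble. */
uint8_t shr4_swap_then_mask(uint8_t x)
{
    return (uint8_t)(swap_nibbles(x) & 0x0f);
}

/* Second sequence: clear the low nibble, then swap. */
uint8_t shr4_mask_then_swap(uint8_t x)
{
    return swap_nibbles((uint8_t)(x & 0xf0));
}

/* Exhaustively compares both sequences with a plain x >> 4. */
void check_shr4_sequences(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t x = (uint8_t)v;
        assert(shr4_swap_then_mask(x) == (uint8_t)(x >> 4));
        assert(shr4_mask_then_swap(x) == (uint8_t)(x >> 4));
    }
}
```

    On the AVR itself there is no need to write this by hand. As the Compiler Explorer demos in this answer show, the compiler already turns x >> 4 into swap plus andi.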
    

    For larger shifts that cross register boundaries, there are also various ways to optimize.

    For a uint16_t variable y made up of a low byte y0 and a high byte y1, y >> 8 is simply

    y0 = y1;
    y1 = 0;
    

    Similarly, y >> 9 can be optimized to

    y0 = y1 >> 1;
    y1 = 0;
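    The byte-level identities above can be checked the same way on a host machine. In this sketch, hi_byte and from_bytes are hypothetical helpers that stand in for the two 8-bit registers y0 and y1. The code rebuilds y >> 8 and y >> 9 from the high byte alone and compares the results with the full 16-bit shift for every possible value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the high byte y1 that an 8-bit CPU would hold
   in its own register. */
uint8_t hi_byte(uint16_t y)
{
    return (uint8_t)(y >> 8);
}

/* Hypothetical helper: recombines a (low y0, high y1) byte pair. */
uint16_t from_bytes(uint8_t y0, uint8_t y1)
{
    return (uint16_t)(((uint16_t)y1 << 8) | y0);
}

/* y >> 8: move the high byte into the low byte, clear the high byte. */
uint16_t shr8_bytewise(uint16_t y)
{
    uint8_t y0 = hi_byte(y);
    uint8_t y1 = 0;
    return from_bytes(y0, y1);
}

/* y >> 9: the same byte move plus a single 1-bit shift of the low byte. */
uint16_t shr9_bytewise(uint16_t y)
{
    uint8_t y0 = (uint8_t)(hi_byte(y) >> 1);
    uint8_t y1 = 0;
    return from_bytes(y0, y1);
}

/* Exhaustively compares the byte-wise versions with real 16-bit shifts. */
void check_cross_byte_shifts(void)
{
    for (uint32_t v = 0; v <= 0xffff; v++) {
        uint16_t y = (uint16_t)v;
        assert(shr8_bytewise(y) == (uint16_t)(y >> 8));
        assert(shr9_bytewise(y) == (uint16_t)(y >> 9));
    }
}
```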
    

    and hence costs about as much as a shift by 3 on a char, despite the much larger distance.


    In conclusion, the shift time depends on the shift distance, but longer or non-power-of-2 shifts are not necessarily slower. Generally it takes at most 3 instructions to shift within an 8-bit char.

    Here are some demos from Compiler Explorer:

    • A right shift by 4 is done with a swap and an AND, as above:

        swap r24
        andi r24,lo8(15)
      
    • A right shift by 3 has to be done with 3 instructions:

        lsr r24
        lsr r24
        lsr r24
      

    Left shifts are optimized in the same manner.
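    As an illustration of the mirror-image case, and assuming the same swap-and-mask idea applies (the demos above only show right shifts), x << 4 on a uint8_t can be built from a swap followed by clearing the low nibble. swap_nibbles is again a hypothetical helper that emulates SWAP on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: emulates the AVR SWAP instruction. */
uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* x << 4 as swap + mask: the old low nibble ends up on top, and the
   old high nibble that wrapped into the low half is cleared. */
uint8_t shl4_swap_then_mask(uint8_t x)
{
    return (uint8_t)(swap_nibbles(x) & 0xf0);
}

/* Exhaustively compares with a plain 8-bit left shift. */
void check_shl4(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t x = (uint8_t)v;
        assert(shl4_swap_then_mask(x) == (uint8_t)(x << 4));
    }
}
```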

    See also "Which is faster: x<<1 or x<<10?"

  • 2020-12-06 19:47

    In the AVR instruction set, shifts (LSR, ASR, LSL) move only one bit at a time. So, for this particular microcontroller, x >> n means the compiler actually emits n individual single-bit shifts (lsr for an unsigned value), and I guess >>3 is one instruction faster than >>4.

    This makes the AVR fairly unusual, by the way.

  • 2020-12-06 19:48

    It depends on how the processor is built. If the processor has a barrel shifter (or barrel rotator), it can shift by any number of bits in one operation, but that costs chip area and power. The most economical hardware would only be able to rotate right by one, with options for handling the wrap-around bit. Next would be one that could rotate by one either left or right. I can imagine a structure with a 1-shifter, a 2-shifter, a 4-shifter, etc., in which case a shift by 4 might be faster than a shift by 3.
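    A toy software model makes the imagined structure concrete. This is a hypothetical design, not a description of any real CPU, and every name in it is illustrative. It uses three fixed-distance stages (shift by 1, 2 and 4). Stage k runs only when bit k of the shift amount n is set, so the number of active stages equals the number of 1 bits in n.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the toy model: the shifted value plus how many of the
   fixed-distance stages (1, 2, 4) were actually used. */
typedef struct {
    uint8_t value;
    int stages_used;
} staged_shift_result;

/* Toy model of a shifter built from a 1-shifter, 2-shifter and 4-shifter.
   Stage k shifts right by (1 << k) and is enabled by bit k of n. */
staged_shift_result staged_shr(uint8_t x, unsigned n)
{
    staged_shift_result r = { x, 0 };
    assert(n < 8); /* three stages cover shift amounts 0..7 */
    for (unsigned k = 0; k < 3; k++) {
        if (n & (1u << k)) {
            r.value = (uint8_t)(r.value >> (1u << k));
            r.stages_used++;
        }
    }
    return r;
}
```

    If the stages were applied one after another, the cost would follow the number of 1 bits in the shift amount rather than its size. A shift by 4 uses one stage while a shift by 3 uses two. A full barrel shifter would instead handle any amount in a single operation.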

  • 2020-12-06 19:49

    On "replacing a divide with a bit-shift":

    A shift is not the same as a division for negative numbers:

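    char div2 (void)
    {
        return (-1) / 2;
        // compiles to: ldi r24,0   (C division truncates toward zero)
    }
    
    char asr1 (void)
    {
        return (-1) >> 1;
        // compiles to: ldi r24,-1  (arithmetic shift rounds toward minus infinity)
    }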
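    The same difference appears at run time for any negative odd value, not only for constants that the compiler folds. The sketch below uses hypothetical function names to compare / 2 with >> 1. It also shows one common fix: add 1 to negative values before shifting, so that the arithmetic shift rounds toward zero the way C division does. It assumes a compiler such as GCC that implements >> on negative signed values as an arithmetic shift. The C standard leaves that behavior implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* C division truncates toward zero: -7 / 2 == -3. */
int16_t half_by_div(int16_t x)
{
    return (int16_t)(x / 2);
}

/* Arithmetic right shift rounds toward minus infinity: -7 >> 1 == -4.
   Right-shifting a negative value is implementation-defined in C;
   GCC uses an arithmetic shift. */
int16_t half_by_shift(int16_t x)
{
    return (int16_t)(x >> 1);
}

/* Shift-based halving that matches x / 2: bias negative values by
   1 (that is, 2^1 - 1) before shifting so the result rounds toward zero. */
int16_t half_by_shift_rounded(int16_t x)
{
    return (int16_t)((x + (x < 0)) >> 1);
}

/* Exhaustively checks that the biased shift agrees with division. */
void check_half(void)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++) {
        int16_t x = (int16_t)v;
        assert(half_by_shift_rounded(x) == half_by_div(x));
    }
}
```

    The bias trick generalizes to division by 2^k: add 2^k - 1 to negative values before shifting right by k. A plain >> 1 on its own is not a drop-in replacement for / 2 on signed values.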
    
  • 2020-12-06 19:53

    You have to consult the documentation of your processor for this information. Even for a given instruction set, costs may differ from one model to another. For instance, on a really small processor, shifting by one could conceivably be faster than shifting by other values (this is the case for rotate instructions on some IA32 processors, but only because compilers so rarely emit those instructions).

    According to http://atmel.com/dyn/resources/prod_documents/8271S.pdf, all logical shifts take one cycle on the ATmega328. But of course, as pointed out in the comments, each logical shift moves only one bit. So a shift by n done this way costs n cycles in n instructions.

  • 2020-12-06 20:00

    With all respect, you should not even start talking about performance until you start measuring. Compile your program with the division. Run it. Measure the time. Repeat with the shift.
