Optimising this C (AVR) code

Submitted by 这一生的挚爱 on 2019-12-05 01:58:29
ndim

I can see a few areas to start working on, listed in no particular order:

1. Reduce the number of registers to push, as each push/pop pair takes four cycles (two for the push, two for the pop). For example, avr-gcc allows you to remove a few registers from its register allocator (the -ffixed-REG option), so you can use them as register variables in that single ISR and be sure they still contain the value from the last invocation. You might also get rid of the push of r1 and the eor r1,r1 if your program never sets r1 to anything but 0.

2. Use a local temporary variable for the new value of the array index, to avoid unnecessary loads from and stores to that volatile variable. Something like this:

#include <avr/interrupt.h>
#include <stdint.h>

volatile uint8_t amplitudePlace;

ISR(TIMER1_COMPA_vect) {
    uint8_t place = amplitudePlace;   /* read the volatile exactly once */
    /* ... do all your stuff with place to avoid memory accesses to amplitudePlace ... */
    amplitudePlace = place;           /* write it back exactly once */
}
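A host-side sketch of the same pattern, assuming the question's 60-entry table: `port_out` is a hypothetical stand-in for PORTD and `timer_isr_body()` a hypothetical stand-in for the ISR, so the logic compiles and runs on a PC. The index is read from the volatile once, updated in a local, and written back once.

```c
#include <stdint.h>

#define NUM_AMPS 60                    /* table length from the question */

static uint8_t amplitudes[NUM_AMPS];
static volatile uint8_t port_out;      /* hypothetical stand-in for PORTD */
static volatile uint8_t amplitudePlace;

/* Hypothetical stand-in for the body of ISR(TIMER1_COMPA_vect) */
static void timer_isr_body(void)
{
    uint8_t place = amplitudePlace;    /* one load from the volatile */

    port_out = amplitudes[place];
    if (++place >= NUM_AMPS)           /* wrap using the local copy only */
        place = 0;

    amplitudePlace = place;            /* one store back to the volatile */
}
```

On the target, the .lss listing should then show a single load and a single store of amplitudePlace per interrupt.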

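3. Count backwards from 59 to 0 instead of from 0 to 59 to avoid the separate comparison instruction (the decrement sets the flags anyway, so testing whether the result went negative costs nothing extra). In AVR assembly, with the index in r24 (ldi only works on r16 to r31):

     dec  r24          ; place - 1, sets N when the result drops below 0
     brpl Foo          ; not negative: still within 0..59
     ldi  r24, 59      ; wrapped below 0: restart at 59
Foo:

instead of

     inc  r24          ; place + 1
     cpi  r24, 60      ; separate comparison against the table length
     brlo Foo          ; still below 60: keep it
     ldi  r24, 0       ; reached 60: restart at 0
Foo: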
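The same idea in C, as a sketch: `next_place_down()` is a hypothetical helper, and whether avr-gcc actually turns the sign test into a plain dec/brpl has to be confirmed in the generated listing. Testing for a negative result (rather than for zero) keeps index 0 in the cycle, so all 60 entries are visited once per period.

```c
#include <stdint.h>

#define NUM_AMPS 60   /* table length from the question */

/* Hypothetical helper: step the index 59, 58, ..., 1, 0, 59, ...
   The decrement itself tells us whether we went below 0, which on AVR
   can map to dec + brpl with no separate compare against NUM_AMPS. */
static uint8_t next_place_down(uint8_t place)
{
    int8_t p = (int8_t)(place - 1);   /* 0 - 1 becomes -1 */

    if (p < 0)
        p = NUM_AMPS - 1;             /* reload the top index */
    return (uint8_t)p;
}
```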

4. Perhaps use pointers and pointer comparisons (with precalculated values!) instead of array indexes. Whether this beats counting backwards needs to be checked. Maybe align the arrays to 256-byte boundaries and keep only the low 8 bits of each pointer in a register, saving the loads and stores of the upper 8 bits of the addresses. (If you are running out of SRAM, you can still fit the contents of four of those 60-byte arrays into one 256-byte array and keep the advantage that all addresses consist of 8 constant high bits and 8 variable low bits.)

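#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

uint8_t array[60];
static uint8_t *idx = array;                       /* shortcut for &array[0] */
static const uint8_t *const max_idx = &array[59];  /* precalculated end pointer */

ISR(TIMER1_COMPA_vect) {
    PORTD = *idx;   /* or whichever port you drive */
    ++idx;
    if (idx > max_idx) {
        idx = array;
    }
}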

The problem is that pointers are 16 bits wide, whereas your simple array index was 8 bits. One way around that is to place your arrays so that the upper 8 bits of their addresses are constant (hi8(array) in assembly code) and to deal only with the lower 8 bits, which are the only ones that change in the ISR. That does mean writing assembly code, though. The generated assembly code from above might be a good starting point for writing that version of the ISR in assembly.
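To illustrate the address layout from the SRAM note above, here is a host-side sketch, assuming GCC's `aligned` attribute (avr-gcc accepts it as well, but check the map file): four hypothetical 60-sample waveforms share one 256-byte-aligned buffer, each is described only by an 8-bit start and end offset, and the low byte of every element's address equals its offset. The helper names are illustrative only.

```c
#include <stdint.h>

#define NUM_AMPS 60   /* samples per waveform, as in the question */

/* Hypothetical layout: four 60-byte waveforms at offsets 0, 60, 120, 180
   inside one buffer whose address has a zero low byte. */
static uint8_t tables[256] __attribute__((aligned(256)));

static uint8_t place;       /* 8-bit offset: the only part that changes */
static uint8_t start, end;  /* offsets of the selected waveform */

static void select_waveform(uint8_t n)   /* n = 0..3 */
{
    start = (uint8_t)(n * NUM_AMPS);
    end   = (uint8_t)(start + NUM_AMPS);
    place = start;
}

/* Returns the sample the ISR would write to the port */
static uint8_t next_sample(void)
{
    uint8_t s = tables[place];
    if (++place == end)      /* 8-bit compare, no 16-bit pointer arithmetic */
        place = start;
    return s;
}
```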

5. If feasible from a timing point of view, change the sample buffer size to a power of 2 so you can replace the if-reset-to-zero part with a simple i = (i+1) & ((1 << POWER)-1);. If you go with the 8-bit/8-bit address split proposed in 4., going all the way to 256 for the power of two (duplicating sample data as necessary to fill the 256-byte buffer) even saves you the AND instruction after the ADD, because an 8-bit index wraps from 255 to 0 on its own.
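A sketch of both variants, assuming the buffer lengths are chosen as described (64 and 256 are example sizes, and the names are hypothetical). With the 256-entry table the uint8_t index wraps from 255 to 0 by itself, so there is nothing to mask.

```c
#include <stdint.h>

#define POWER 6                         /* 2^6 = 64 samples (example size) */
#define BUF_LEN (1u << POWER)

static uint8_t buf64[BUF_LEN];
static uint8_t buf256[256];

static uint8_t i64, i256;               /* ISR-private indices */

/* Power-of-two buffer: one ADD plus one AND, no compare or branch */
static uint8_t next_sample64(void)
{
    uint8_t s = buf64[i64];
    i64 = (uint8_t)((i64 + 1) & (BUF_LEN - 1));
    return s;
}

/* 256-entry buffer: unsigned 8-bit overflow wraps 255 -> 0 for free */
static uint8_t next_sample256(void)
{
    uint8_t s = buf256[i256];
    i256++;
    return s;
}
```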

6. If the ISR only uses instructions that do not affect the status register, stop pushing and popping SREG.

General

The following Makefile rule might come in handy, especially for manually checking all the other assembly code against your assumptions:

firmware-%.lss: firmware-%.elf
        $(OBJDUMP) -h -S $< > $@

This generates a complete assembly language listing of the whole firmware image, annotated with the corresponding source lines. You can use it to verify which registers are (not) used. Note that startup code that runs only once, long before you first enable interrupts, will not interfere with your ISR's later exclusive use of registers.

If you decide not to write that ISR directly in assembly code, I would recommend writing the C code and checking the generated assembly code after every compilation, so that you immediately see what your changes end up generating.

You might end up writing a dozen or so variants of the ISR in C and assembly, adding up the cycles for each variant, and then choosing the best one.

Note: Without any register reservation, I end up with about 31 cycles for the ISR (excluding entering and leaving, which add another 8 or 10 cycles). Completely getting rid of the register pushing would get the ISR down to 15 cycles. Switching to a sample buffer with a constant size of 256 bytes and giving the ISR exclusive use of four registers gets it down to 6 cycles spent in the ISR (plus 8 or 10 to enter/leave).

I'd say the best thing would be to write your ISR in pure assembler. It's very short and simple code, and you have the existing disassembly to guide you. For something of this nature, though, you ought to be able to beat the compiler: e.g. use fewer registers, to save on push and pop; restructure it so that it doesn't load amplitudePlace from memory three separate times, etc.

Must you share all those variables with the rest of the program? Every variable you share must be volatile, so the compiler isn't allowed to optimize accesses to it. At least amplitudePlace looks like it could be changed to a local static variable, and then the compiler may be able to optimize it further.
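A minimal sketch of that change, assuming nothing outside the interrupt needs to read the index: it moves into the handler as a `static` local, which keeps its value between calls but no longer has to be volatile. `port_out` and `timer_isr_body()` are hypothetical host-side stand-ins for PORTD and the ISR, and the 64-entry table matches the masking version of the interrupt.

```c
#include <stdint.h>

static uint8_t amplitudes[64];
static volatile uint8_t port_out;     /* hypothetical stand-in for PORTD */

/* Hypothetical stand-in for ISR(TIMER1_COMPA_vect) */
static void timer_isr_body(void)
{
    /* Private to the handler: keeps its value between calls, and since
       no other code can see it, it does not need to be volatile. */
    static uint8_t amplitudePlace;

    port_out = amplitudes[amplitudePlace++];
    amplitudePlace &= 63;             /* 64-entry table */
}
```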

To clarify, your interrupt should be this:

ISR(TIMER1_COMPA_vect) 
{
    PORTD = amplitudes[amplitudePlace++];
    amplitudePlace &= 63;
}

This will require your table to be 64 entries long. If you can choose the address of your table (so that its low 7 address bits are zero), you can get away with a single pointer: increment it, then AND it with 0xFFBF.
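The address arithmetic can be checked without hardware. A sketch, assuming a hypothetical table base of 0x0100 (any base whose low 7 address bits are zero works): addresses are plain uint16_t values here, just like the 16-bit pointer on AVR. Incrementing past the last entry sets bit 6, and the AND with 0xFFBF clears it again, landing back on the first entry.

```c
#include <stdint.h>

#define TABLE_BASE 0x0100u   /* hypothetical address, low 7 bits zero */

/* Advance a 16-bit address through 64 entries, TABLE_BASE .. TABLE_BASE + 63 */
static uint16_t next_addr(uint16_t addr)
{
    addr++;                               /* base + 63 -> base + 64 sets bit 6 */
    return (uint16_t)(addr & 0xFFBFu);    /* clearing bit 6 wraps to base */
}
```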

If using variables instead of fixed constants is slowing things down, you can replace the pointer variable with a specific array:

PORTD = amplitudes13[amplitudePlace++];

Then you use a different interrupt function for each waveform. This is not likely to be a big saving, but we're getting down to tens of cycles in total.

As for register usage: once you have a really simple ISR like this, you can check its prologue and epilogue, which push and pop the processor state. If your ISR only uses one register, you can write it in assembler and save and restore only that one register. This reduces the interrupt overhead without affecting the rest of the program. Some compilers might do this for you, but I doubt it.

If there is time and space, you can also create a long table and replace the ++ with += freq, where freq makes the waveform an integer multiple of the base frequency (2x, 3x, 4x, etc.) by stepping through that many samples at a time.
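A sketch of the skipping idea, assuming a hypothetical 256-entry table holding one period of the base waveform: adding `freq` instead of 1 plays the waveform freq times faster at the same interrupt rate, and the uint8_t index still wraps on its own.

```c
#include <stdint.h>

static uint8_t wave[256];     /* one period of the base waveform (hypothetical) */
static uint8_t place;         /* ISR-private index */
static uint8_t freq = 1;      /* 1 = base frequency, 2 = 2x, 3 = 3x, ... */

/* Returns the sample the ISR would write to the port */
static uint8_t next_sample(void)
{
    uint8_t s = wave[place];
    place = (uint8_t)(place + freq);   /* skip freq - 1 samples each time */
    return s;
}
```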

Instead of stepping through the table one entry at a time with varying interrupt rates, have you considered turning the problem around and stepping at a variable rate with a fixed interrupt frequency? That way the ISR itself would be heavier, but you may be able to afford running it at a lower rate. Plus, with a little fixed-point arithmetic you can easily generate a wider spectrum of frequencies without messing around with multiple tables.
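A sketch of that fixed-point approach (often called a phase accumulator), with hypothetical names: the interrupt rate stays fixed, a 16-bit phase advances by `step` on every interrupt, and its upper 8 bits select the entry of a 256-entry table. The output frequency is then interrupt_rate * step / 65536, so fractional steps through the table come for free.

```c
#include <stdint.h>

static uint8_t wave[256];       /* one period of the waveform (hypothetical) */
static uint16_t phase;          /* 8.8 fixed point: high byte = table index */
static uint16_t step = 0x0100;  /* 0x0100 = one entry per interrupt */

/* Returns the sample the fixed-rate ISR would write to the port */
static uint8_t next_sample(void)
{
    phase += step;              /* wraps modulo 65536 */
    return wave[phase >> 8];    /* upper 8 bits index the table */
}
```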

Anyway, there are a hundred and one ways of cheating to save cycles for this type of problem, if you can afford to bend your requirements a little to suit the hardware. For instance, you could chain your timer's output to clock another hardware timer and use the second timer's counter as your table index. You might reserve global registers or abuse unused I/O registers to store variables. You can look up two entries at a time (or interpolate) in your COMPA interrupt and set up a tiny second COMPB interrupt in between to emit the buffered entry. And so on, and so forth.

With a little hardware abuse and carefully crafted assembly code you should be able to do this in 15 cycles or so without too much trouble. Whether you can make it play nice with the rest of the system is another question.

Maybe it suffices to get rid of the conditional and the comparison altogether by using an arithmetic expression:

ISR(TIMER1_COMPA_vect) 
{
        PORTD = amplitudes[amplitudePlace];

        amplitudePlace = (amplitudePlace + 1) % numOfAmps;
}

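This is only faster if your CPU executes the modulo operation with reasonable speed. AVR has no hardware divide instruction, so unless numOfAmps is a power-of-two compile-time constant (which the compiler can usually reduce to an AND), the % becomes a call to a library division routine; check the generated code. If it still doesn't suffice, try writing this version in assembler.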
