I have a C function which tries to copy a framebuffer to FSMC RAM.
The function drags the frame rate of the game loop down to 10 FPS. I would like to know how to analyze
This does not exactly answer your question, but I see you are after fast loop execution.
Here are some tips from the book 'ARM System Developer's Guide: Designing and Optimizing System Software (The Morgan Kaufmann Series in Computer Architecture and Design)': http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745
Chapter 5 contains a section named 'C looping structures'. Here is the summary of the section:
Writing Loops Efficiently
Based on the summary, your inner loop might look like this:

    unsigned int i = 240;   // Use unsigned loop counters by default
                            // and the continuation condition i != 0.
                            // The row width must be a multiple of 4.
    do
    {
        // Unroll important loops to reduce the loop overhead
        LCD_WriteData( (u16)frameBuffer[ (--i) + (j*fbWidth) ] );
        LCD_WriteData( (u16)frameBuffer[ (--i) + (j*fbWidth) ] );
        LCD_WriteData( (u16)frameBuffer[ (--i) + (j*fbWidth) ] );
        LCD_WriteData( (u16)frameBuffer[ (--i) + (j*fbWidth) ] );
    }
    while ( i != 0 );       // Use do-while loops rather than for
                            // loops when you know the loop will
                            // iterate at least once
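The loop above walks the row from the last pixel to the first. If the display needs the pixels in ascending order, one option is to keep the count-down counter but step a pointer forward. This is only a sketch under assumptions: `write_row`, `lcd_log` and `lcd_log_len` are names I made up, the `LCD_WriteData` here is a PC-side stand-in that records writes instead of touching the FSMC, and I assume the frame buffer holds 16-bit pixels and the width is a non-zero multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the LCD data register: it records each write
   so the order can be checked on a PC. On the target this would be the
   real FSMC write. */
static uint16_t lcd_log[240];
static size_t lcd_log_len = 0;

static void LCD_WriteData(uint16_t value)
{
    if (lcd_log_len < sizeof lcd_log / sizeof lcd_log[0]) {
        lcd_log[lcd_log_len++] = value;
    }
}

/* Write one row of `width` pixels in ascending order.
   Assumes width is a non-zero multiple of 4 (hypothetical helper). */
static void write_row(const uint16_t *row, unsigned int width)
{
    unsigned int i = width / 4;   /* number of unrolled iterations */

    assert(width != 0 && width % 4 == 0);

    do {
        /* Four writes per iteration; the pointer moves forward while
           the counter runs down to zero. */
        LCD_WriteData(*row++);
        LCD_WriteData(*row++);
        LCD_WriteData(*row++);
        LCD_WriteData(*row++);
    } while (--i != 0);
}
```

For your case the call per line would be something like `write_row(&frameBuffer[j * fbWidth], 240)`. Whether this beats the indexed version depends on the compiler and optimization level, so measure both.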
You might also want to experiment with pragmas, e.g.:
#pragma Otime
http://www.keil.com/support/man/docs/armcc/armcc_chr1359124989673.htm
#pragma unroll(n)
http://www.keil.com/support/man/docs/armcc/armcc_chr1359124992247.htm
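A rough sketch of where these pragmas could go, assuming the armcc toolchain from the links above. `copy_row` is a hypothetical helper, not something from your code. As far as I recall from the armcc documentation, `#pragma unroll(n)` has to sit immediately before the loop and is only a request that takes effect at `-O3 -Otime`, so check the linked pages before relying on it. The `__CC_ARM` guards are there so that other compilers do not trip over unknown pragmas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ask armcc to favour speed over size for the functions that follow.
   Guarded so that other compilers never see the pragma. */
#if defined(__CC_ARM)
#pragma Otime
#endif

/* Hypothetical helper: copy n 16-bit pixels from src to dst. */
static void copy_row(uint16_t *dst, const uint16_t *src, size_t n)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    /* Request unrolling by 4; the compiler may still decline. */
#if defined(__CC_ARM)
#pragma unroll(4)
#endif
    for (i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

With the pragma the loop stays readable in the source and the compiler does the unrolling, which is easier to undo than hand-unrolled code if it turns out not to help.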
And since it is a Cortex-M3, try to find out whether the MCU hardware lets you arrange the code and data to take advantage of its Harvard architecture (I saw a 30% speed increase doing this; see my other answer).
Not everything here may apply to your application (for example, filling a buffer in reverse order). I just wanted to draw your attention to the book and to possible points for optimization.