I am trying to profile a simple C program using valgrind:
[zsun@nel6005001 ~]$ valgrind --tool=memcheck ./fl.out
==2238== Memcheck, a memory error dete
The problem is that you are running valgrind on a program compiled with -pg. You cannot use valgrind and gprof together. The valgrind manual suggests using OProfile if you are on Linux and need to profile the actual emulation of the program under valgrind.
You are not going to be able to compute 10000! like that. You will need some sort of bignum implementation for computing factorials. This is because int is "usually" 4 bytes long, which means it can "usually" hold values up to 2^31 - 1 (or 2^32 - 1 if unsigned), and 13! is more than that. Even if you used an unsigned long ("usually" 8 bytes), you'd overflow by the time you reached 21!.
As for what "profiling timer expired" means: it means valgrind received the signal SIGPROF (http://en.wikipedia.org/wiki/SIGPROF), which probably means your program took too long.
By the way, this isn't computing factorial.
If you're really trying to find out where the time goes, you could try stackshots. I put an infinite loop around your code and took 10 of them. Here's the code:
6: void forloop(void){
7: int fac=1;
8: int count=5;
9: int i,k;
10:
11: for (i = 1; i <= count; i++){
12: for(k=1;k<=count;k++){
13: fac = fac * i;
14: }
15: }
16: }
17:
18: int main(int argc, char* argv[])
19: {
20: int i;
21: for (;;){
22: forloop();
23: }
24: return 0;
25: }
And here are the stackshots, re-ordered with the most frequent at the top:
forloop() line 12
main() line 23
forloop() line 12 + 21 bytes
main() line 23
forloop() line 12 + 21 bytes
main() line 23
forloop() line 12 + 9 bytes
main() line 23
forloop() line 13 + 7 bytes
main() line 23
forloop() line 13 + 3 bytes
main() line 23
forloop() line 6 + 22 bytes
main() line 23
forloop() line 14
main() line 23
forloop() line 7
main() line 23
forloop() line 11 + 9 bytes
main() line 23
What does this tell you? It says that line 12 consumes about 40% of the time, and line 13 consumes about 20% of the time. It also tells you that line 23 consumes nearly 100% of the time.
That means unrolling the loop at line 12 might give you a speedup factor of up to about 100/(100-40) = 100/60 = 1.67x. Of course, there are other ways to speed up this code as well, such as eliminating the inner loop, if you're really trying to compute factorial.
I'm just pointing this out because it's a bone-simple way to do profiling.