This is quite an interesting question, so let me set the scene. I work at The National Museum of Computing, and we have just managed to get a Cray Y-MP EL supercomputer from
You're running one process on a multi-core machine, so it only runs on one core.
The solution is easy enough, since you're just trying to peg the processor: if you have N cores, run your program N times (in parallel, of course).
Here is some code that runs your program NUM_OF_CORES times in parallel. It's POSIXy code (it uses fork), so you should run it under Linux or another POSIX system. If what I'm reading about the Cray is correct, it might be easier to port this code than the OpenMP code in the other answer.
```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>     /* fork() */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid() */

#define NUM_OF_CORES 8
#define MAX_PRIME 100000

/* Count the primes up to MAX_PRIME by deliberately slow trial division,
   so that each process keeps one core busy. */
static void do_primes(void)
{
    unsigned long i, num, primes = 0;
    for (num = 1; num <= MAX_PRIME; ++num) {
        /* Find the smallest divisor of num that is >= 2. */
        for (i = 2; (i <= num) && (num % i != 0); ++i)
            ;
        /* num is prime iff its smallest divisor is num itself. */
        if (i == num)
            ++primes;
    }
    printf("Calculated %lu primes.\n", primes);
}

int main(void)
{
    time_t start, end;
    pid_t pids[NUM_OF_CORES];
    int i;

    /* start of test */
    start = time(NULL);
    for (i = 0; i < NUM_OF_CORES; ++i) {
        pids[i] = fork();
        if (pids[i] < 0) {
            perror("fork");
            exit(1);
        }
        if (pids[i] == 0) {
            /* Child: do the work, then exit instead of continuing the loop. */
            do_primes();
            exit(0);
        }
    }
    /* Parent: wait for every child to finish. */
    for (i = 0; i < NUM_OF_CORES; ++i) {
        waitpid(pids[i], NULL, 0);
    }
    end = time(NULL);

    printf("This machine calculated all prime numbers under %d %d times "
           "in %.0f seconds\n", MAX_PRIME, NUM_OF_CORES, difftime(end, start));
    return 0;
}
```
```text
$ ./primes
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
This machine calculated all prime numbers under 100000 8 times in 8 seconds
```
Try to parallelize your program using, e.g., OpenMP. It is a very simple and effective framework for writing parallel programs.
Also be very aware of how you're loading the CPU. A CPU can do many different kinds of work, and while many of them will be reported as "loading the CPU 100%", each may be using 100% of a different part of the CPU. In other words, it's very hard to compare two CPUs for performance, and especially two CPUs with different architectures. Task A may favor one CPU over another, while task B can easily favor the other (since the two CPUs may have different internal resources and may execute code very differently).
This is why software is just as important as hardware in making a computer perform optimally, and that is certainly true for "supercomputers" as well.
One measure of CPU performance could be instructions per second, but then again, instructions aren't created equal across CPU architectures. Another measure could be cache I/O performance, but cache infrastructures differ too. Yet another could be instructions per watt, since power delivery and dissipation are often the limiting factors when designing a cluster computer.
So your first question should be: which performance parameter is important to you? What do you want to measure? If you want to see which machine gets the most FPS out of Quake 4, the answer is easy: your gaming rig will, as the Cray can't run that program at all ;-)
Cheers, Steen