Question
Description of Problem
The program below performs two simple operations in a loop, one based on addition and one on multiplication, and accumulates the results in two variables called total1 and total2. In terms of computation, total2 will take longer to compute. As implemented, the code times the entire loop, covering both operations together.
Question
Is it possible to time the computation of total1 and total2 separately? I am asking because I want the specific time taken by each of them.
Purpose of Task
I am fully aware that long long is expensive in terms of memory and is not the most memory-efficient choice. The sole purpose of this code and question is timing, not code optimization.
C Code
#include <stdio.h>
#include <time.h>
int main()
{
    long long total1 = 0, total2 = 0, i = 0;
    double simulation_time = 0;
    clock_t Start = clock();
    do
    {
        total1 += i + i;
        total2 += i * i * i * i;
        i++;
    } while (i < 1000000000);
    clock_t End = clock();
    /* long long must be printed with %lld, not %u */
    printf("Total 1 = %lld \n", total1);
    printf("Total 2 = %lld \n", total2);
    simulation_time = (double)(End - Start) / CLOCKS_PER_SEC;
    printf("Runtime of Whole Simulation using clock_t: %f\n", simulation_time);
    return 0;
}
Answer 1:
I am not sure I understand your problem, but to time each operation separately you simply have to make two separate loops.
#include <stdio.h>
#include <time.h>
int main()
{
    long long total1 = 0, total2 = 0, i = 0, j = 1000000000;
    double simulation_time1, simulation_time2;
    clock_t Start, End;
    /* addition */
    Start = clock();
    do
    {
        total1 += i + i;
        i++;
    } while (i < j);
    End = clock();
    simulation_time1 = (double)(End - Start) / CLOCKS_PER_SEC;
    /* multiplication */
    i = 0; /* reset the counter; otherwise this do-while loop would run only once */
    Start = clock();
    do
    {
        total2 += i * i * i * i;
        i++;
    } while (i < j);
    End = clock();
    simulation_time2 = (double)(End - Start) / CLOCKS_PER_SEC;
    /* long long must be printed with %lld, not %u */
    printf("Total 1 = %lld \n", total1);
    printf("Total 2 = %lld \n", total2);
    printf("Runtime of Whole Simulation: %f\n"
           "Runtime of Addition: %f\n"
           "Runtime of Multiplication: %f\n",
           simulation_time1 + simulation_time2,
           simulation_time1, simulation_time2);
    return 0;
}
Answer 2:
You have two operations you wish to time separately. The first is accumulation of i+i, and the second is accumulation of i*i*i*i.
I am going to assume you are using GCC on x86-64 with -O2.
If we comment out total2, the generated assembly for the calculation of total1 is:
movabs rdx, 999999999000000000
Clever compiler! It does the entire computation at compile time. So the time taken by that is basically zero.
If we instead comment out total1, the assembly for the loop to calculate total2 is:
.L2:
    mov  rdx, rax
    imul rdx, rax          ; i squared
    add  rax, 1
    imul rdx, rdx          ; i squared squared
    add  rsi, rdx          ; accumulate
    cmp  rax, 1000000000   ; loop condition
    jne  .L2
Rather than trying to microbenchmark single lines of code, we can consult Agner Fog's instruction tables: http://www.agner.org/optimize/instruction_tables.pdf
Assuming you're using Intel Haswell, and doing a little port allocation by hand, the tables tell us:
.L2:                       ; ports  cycles  latency
    mov  rdx, rax          ; p0     0.25    1
    imul rdx, rax          ; p1     1       3
    add  rax, 1            ; p0     0.25    1
    imul rdx, rdx          ; p1     1       3
    add  rsi, rdx          ; p0     0.25    1
    cmp  rax, 1000000000   ; p5     0.25    1
    jne  .L2               ; p6     1-2
Some of those instructions can overlap, so this should be roughly 3-4 core cycles per iteration. On a 3-4 GHz processor, it will take about 1 second to do one billion iterations of the loop.
Source: https://stackoverflow.com/questions/51009275/getting-the-timing-of-a-specific-part-of-code-in-a-loop-in-c