UVA Problem no. 10055, Hashmat the Brave Warrior, is probably the easiest problem there. The input consists of a series of pairs of unsigned integers ≤ 2^32 (thus mandating the use of 64-bit integers…). For each pair, the task is to print out the difference between the greater and the lesser integer.
According to the statistics, the fastest solutions run in below 0.01 sec. However, all my attempts to solve this typically run in 0.02 sec, with apparently random deviations of ± 0.01 sec.
I tried:
#include <cstdint>
#include <iostream>
using namespace std;
int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    uint_fast64_t i, j;
    while(cin >> i >> j) {
        if(i > j)
            cout << i - j << '\n';
        else
            cout << j - i << '\n';
    }
}
And also:
#include <cstdlib>
#include <cstdint>
#include <iostream>
using namespace std;
int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    int_fast64_t i, j;
    while(cin >> i >> j) {
        cout << abs(i - j) << '\n';
    }
}
And also:
#include <algorithm>
#include <cstdint>
#include <iostream>
using namespace std;
int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    uint_fast64_t i, j;
    while(cin >> i >> j) {
        cout << max(i, j) - min(i, j) << '\n';
    }
}
All with the same results.
I also tried using printf()/scanf() instead of cin/cout, still with the same results (besides, my benchmarks showed that cin/cout preceded by cin.tie(nullptr) can be even a little faster than printf()/scanf() – at least unless there are some ways to optimize the performance of cstdio that I'm not aware of).
Is there any way to optimize this down to below 0.01 sec., or should I assume that guys who’ve achieved this time are either extremely lucky or cheaters printing out a precomputed answer to the judge’s input?
The programs are compiled with C++11 5.3.0 - GNU C++ Compiler, with options: -lm -lcrypt -O2 -std=c++11 -pipe -DONLINE_JUDGE.
EDIT: This is my attempt to combine the advice of @Sorin and @MSalters:
#include <stdio.h>
#include <stdint.h>
unsigned long long divisors[] = {
    1000000000, 1000000000, 1000000000, 1000000000,
    100000000, 100000000, 100000000,
    10000000, 10000000, 10000000,
    1000000, 1000000, 1000000, 1000000,
    100000, 100000, 100000,
    10000, 10000, 10000,
    1000, 1000, 1000, 1000,
    100, 100, 100,
    10, 10, 10,
    1, 1, 1
};
int main()
{
    unsigned long long int i, j, res;
    static unsigned char inbuff[2500000]; /* static: too big for the stack */
    unsigned char *in = inbuff;
    static char outbuff[2500000];
    char *out = outbuff;
    int c = 0;
    /* Read the whole input once, up front, and NUL-terminate it */
    inbuff[fread(inbuff, 1, sizeof inbuff - 1, stdin)] = '\0';
    while(1) {
        i = j = 0;
        /* Skip whitespace before the first number and check for end of input */
        do {
            c = *(in++);
        } while(c != '\0' && !(c >= '0' && c <= '9'));
        /* If end of input, print the buffered answers and return */
        if(c == '\0') {
            if(out != outbuff)
                *(--out) = '\0'; /* drop the last '\n'; puts() appends one */
            puts(outbuff);
            return 0;
        }
        /* Read the first integer */
        do {
            i = 10 * i + (c - '0');
            c = *(in++);
        } while(c >= '0' && c <= '9');
        /* Skip whitespace between the first and second integer */
        do {
            c = *(in++);
        } while(!(c >= '0' && c <= '9'));
        /* Read the second integer */
        do {
            j = 10 * j + (c - '0');
            c = *(in++);
        } while(c >= '0' && c <= '9');
        if(i > j)
            res = i - j;
        else
            res = j - i;
        /* Buffer the answer */
        if(res == 0) {
            *(out++) = '0';
        } else {
            /* Map the leading-zero count to the highest power of 10 */
            unsigned long long divisor = divisors[__builtin_clzll(res) - 31];
            /* The estimate can be one digit too high; correct it */
            if(res < divisor) {
                divisor /= 10;
            }
            /* Emit the digits */
            while(divisor != 0) {
                unsigned long long digit = res / divisor;
                *(out++) = digit + '0';
                res -= divisor * digit;
                divisor /= 10;
            }
        }
        *(out++) = '\n';
    }
}
Still 0.02 sec.
I would try to eliminate IO operations. Read one block of data (as big as you can), compute the outputs, write them to another string, then write that string out.
Use sscanf or stringstream equivalents to read/write from your memory blocks.
IO usually needs to go through the kernel, so there's a small chance that you would lose the CPU for a bit. There's also some cost (time) associated with it. It's small, but you are trying to run in less than 10 ms.
printf is a Swiss army knife. It knows many ways to format its arguments, and there can be any number of them. In this case, you want a single dedicated function, so you don't waste time scanning for the single occurrence of %d. (BTW, this is a speed benefit of std::cout << – the compiler sorts out the overloading at compile time.)
Once you have that single formatting function, make it output to a single char[] and call puts on that. As puts does no formatting of its own, it can be much faster than printf.
Here is my variant with assembler routines.
#include <iostream>
#include <string>
using namespace std;
int main()
{
    unsigned long long i, j;
    string outv;
    while(cin >> i >> j) {
        /* i = |j - i|: subtract, then negate (not + 1) if the result is negative */
        asm("movq %0, %%rax;"
            "movq %1, %%rdx;"
            "subq %%rax, %%rdx;"
            "jns 1f;"
            "notq %%rdx;"
            "addq $0b1, %%rdx;"
            "1: movq %%rdx, %0"
            : "+g"(i) : "g"(j) : "rax", "rdx", "cc");
        string str = to_string(i);
        outv += str + "\n";
    }
    cout << outv;
}
The trick is using:
Unsafe input: https://www.quora.com/What-is-the-fastest-input-output-method-in-C++ . On Windows, use the Microsoft thread-unsafe version https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/getchar-nolock-getwchar-nolock , as in this Codeforces submission: http://codeforces.com/contest/339/submission/27533017 .
On Linux and Mac OS, for GCC and Clang, use the POSIX-standard thread-unsafe versions (unlocked stdio): https://linux.die.net/man/3/unlocked_stdio .
Custom input, sometimes called naive input, is faster than the standard functions. The idea is getting characters from the input and converting them to an integer yourself. To optimize reading from the console, read: http://stackoverflow.com/questions/705303/faster-i-o-in-c/705378 . To optimize string-to-integer conversion, read the article http://tinodidriksen.com/2010/02/16/cpp-convert-string-to-int-speed/ and the code http://tinodidriksen.com/uploads/code/cpp/speed-string-to-int.cpp . For a speed comparison, read http://codeforces.com/blog/entry/5217 and the code https://bitbucket.org/andreyv/cppiotest/src/tip/iotest.cpp?fileviewer=file-view-default .
This solution, which runs in less than 0.001 seconds, is based on the UVa Online Judge submission http://ideone.com/ca8sDu by http://uhunt.felix-halim.net/id/779215 ; however, this solution is abridged and modified:
#include <stdio.h>
#define pll(n) printf("%lld ",(n))
#define plln(n) printf("%lld\n",(n))
typedef long long ll;
#if defined(_WINDOWS) // On Windows GCC, fall back to the slow thread-safe version
inline int getchar_unlocked() {
    return getchar();
}
#elif defined(_MSC_VER) // On Visual Studio
inline int getchar_unlocked() {
    return _getchar_nolock(); // use the Microsoft thread-unsafe version
}
#endif
inline int scn(ll & n) {
    n = 0;
    int c = getchar_unlocked(), t = 0;
    if (c == EOF)
        return 0;
    /* Skip non-digits, remembering a '-' sign on the way */
    while (c < '0' || c > '9') {
        if (c == '-')
            t = 1;
        if (c == EOF) /* guard against trailing whitespace before EOF */
            return 0;
        c = getchar_unlocked();
    }
    /* Accumulate digits */
    while (c >= '0' && c <= '9') {
        n = n * 10 + c - '0';
        c = getchar_unlocked();
    }
    if (t != 0)
        n *= -1;
    return 1;
}
int main() {
    ll n, m;
    while (scn(n) + scn(m) == 2) {
        if (n > m)
            plln(n - m);
        else
            plln(m - n);
    }
    return 0;
}
Source: https://stackoverflow.com/questions/44108745/how-to-optimize-printing-out-the-difference-between-the-greater-and-the-lesser-o