As mentioned in the title, I'm looking for something that can give me more performance than atoi. Presently, the fastest way I know is
atoi(mystring.c_str())
Many of the code examples here are more complex than they need to be and do unnecessary work, meaning the code could be slimmer and faster.
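Conversion loops are often written to do three different things with each character: check whether it is the end-of-string character, check whether it is a digit, and convert it to its digit value.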
First observation: there is no need to check for the end-of-string character separately, since it is not a digit. Hence the check for 'digitness' covers the EOS condition implicitly.
Second observation: double conditions for range testing, as in (c >= '0' && c <= '9'),
can be converted to a single test condition by using an unsigned type and anchoring the range at zero. That way there can be no unwanted values below the beginning of the range; all unwanted values are mapped to the range above the upper limit: (uint8_t(c - '0') <= 9)
It just so happens that c - '0' needs to be computed here anyway...
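To make the second observation concrete, here is a minimal sketch that puts the two-comparison test and the single unsigned comparison side by side. The function names and the self-check helper are made up for illustration; they are not from the code discussed in this answer.

```cpp
#include <cassert>
#include <cstdint>

// Two-comparison form: the obvious range test.
bool is_digit_two_tests(char c)
{
    return c >= '0' && c <= '9';
}

// Single-comparison form: anchor the range at zero and compare unsigned.
// Characters below '0' wrap around to large values, so they fail "<= 9" too.
bool is_digit_one_test(char c)
{
    return std::uint8_t(c - '0') <= 9;
}

// Hypothetical self-check: both forms agree for every 8-bit character value.
void check_forms_agree()
{
    for (int i = 0; i < 256; ++i)
    {
        char c = static_cast<char>(i);
        assert(is_digit_two_tests(c) == is_digit_one_test(c));
    }
}
```

Note that the end-of-string character fails the single test as well, which is the first observation in action.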
Hence the inner conversion loop can be slimmed to
uint64_t n = digit_value(*p);
unsigned d;
while ((d = digit_value(*++p)) <= 9)
{
    n = n * 10 + d;
}
The code here is called with the precondition that p be pointing at a digit, which is why the first digit is extracted without further ado (which also avoids a superfluous MUL).
That precondition is less outlandish than might appear at first, since p pointing at a digit is the reason why this code is called by the parser in the first place. In my code the whole shebang looks like this (assertions and other production-quality noise elided):
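#include <cstdint>  // uint64_t

// Maps '0'..'9' to 0..9; every other character (including NUL) maps to a value > 9.
unsigned digit_value (char c)
{
    return unsigned(c - '0');
}

bool is_digit (char c)
{
    return digit_value(c) <= 9;
}

// Precondition: **read_ptr is a digit. Advances *read_ptr past the number.
uint64_t extract_uint64 (char const **read_ptr)
{
    char const *p = *read_ptr;
    uint64_t n = digit_value(*p);
    unsigned d;
    while ((d = digit_value(*++p)) <= 9)
    {
        n = n * 10 + d;
    }
    *read_ptr = p;
    return n;
}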
The first call to digit_value() is often elided by the compiler, if the code gets inlined and the calling code has already computed that value by calling is_digit().
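For illustration, this is roughly how a calling parser loop might use extract_uint64(). The parse_all() function is a hypothetical caller made up for this sketch (my real parser is not shown here), and the assert simply spells out the precondition mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

unsigned digit_value(char c) { return unsigned(c - '0'); }
bool is_digit(char c) { return digit_value(c) <= 9; }

std::uint64_t extract_uint64(char const **read_ptr)
{
    char const *p = *read_ptr;
    assert(is_digit(*p));  // precondition: p points at a digit
    std::uint64_t n = digit_value(*p);
    unsigned d;
    while ((d = digit_value(*++p)) <= 9)
    {
        n = n * 10 + d;
    }
    *read_ptr = p;
    return n;
}

// Hypothetical caller: collects every unsigned number found in a
// NUL-terminated buffer and skips everything else.
std::vector<std::uint64_t> parse_all(char const *text)
{
    std::vector<std::uint64_t> out;
    char const *p = text;
    while (*p != '\0')
    {
        if (is_digit(*p))
            out.push_back(extract_uint64(&p));  // advances p past the digits
        else
            ++p;  // separator or other non-digit
    }
    return out;
}
```

The caller has already tested is_digit(*p), which is exactly the situation where the first digit_value() call inside extract_uint64() can be folded away after inlining.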
n * 10 happens to be faster than manual shifting (e.g. n = (n << 3) + (n << 1) + d), at least on my machine with gcc 4.8.1 and VC++ 2013. My guess is that both compilers use LEA with index scaling for adding up to three values in one go and scaling one of them by 2, 4, or 8.
In any case that's exactly how it should be: we write nice clean code in separate functions and express the desired logic (n * 10, x % CHAR_BIT, whatever) and the compiler converts it to shifting, masking, LEAing and so on, inlines everything into the big bad parser loop and takes care of all the required messiness under the hood to make things fast. We don't even have to stick inline in front of everything anymore. If anything, we have to do the opposite, by using __declspec(noinline) judiciously when compilers get over-eager.
I'm using the above code in a program that reads billions of numbers from text files and pipes; it converts 115 million uints per second if the length is 9..10 digits, and 60 million/s for length 19..20 digits (gcc 4.8.1). That's more than ten times as fast as strtoull() (and just barely enough for my purposes, but I digress...). That's the timing for converting text blobs containing 10 million numbers each (100..200 MB), meaning that memory timings make these numbers appear a bit worse than they would be in a synthetic benchmark running from cache.
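If you want to check numbers like these on your own machine, a timing harness could look something like the sketch below. This is not the benchmark I used; BenchResult, run_benchmark() and the generated test data are assumptions made up for illustration, and results will vary with compiler, flags and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

unsigned digit_value(char c) { return unsigned(c - '0'); }
bool is_digit(char c) { return digit_value(c) <= 9; }

std::uint64_t extract_uint64(char const **read_ptr)
{
    char const *p = *read_ptr;
    std::uint64_t n = digit_value(*p);
    unsigned d;
    while ((d = digit_value(*++p)) <= 9)
        n = n * 10 + d;
    *read_ptr = p;
    return n;
}

// Hypothetical result record for this sketch.
struct BenchResult
{
    double custom_seconds;
    double strtoull_seconds;
    std::uint64_t custom_sum;
    std::uint64_t strtoull_sum;
};

BenchResult run_benchmark(std::size_t count)
{
    // Build one text blob of newline-separated numbers (10 digits each
    // for moderate values of count).
    std::string blob;
    for (std::size_t i = 0; i < count; ++i)
    {
        blob += std::to_string(1000000000ULL + i * 7919ULL);
        blob += '\n';
    }

    using clock = std::chrono::steady_clock;
    BenchResult r = {0.0, 0.0, 0, 0};

    auto t0 = clock::now();
    for (char const *p = blob.c_str(); *p != '\0'; )
    {
        if (is_digit(*p))
            r.custom_sum += extract_uint64(&p);
        else
            ++p;
    }
    auto t1 = clock::now();
    for (char const *p = blob.c_str(); *p != '\0'; )
    {
        char *end = nullptr;
        r.strtoull_sum += std::strtoull(p, &end, 10);
        p = end + 1;  // skip the '\n' that ended the number
    }
    auto t2 = clock::now();

    r.custom_seconds = std::chrono::duration<double>(t1 - t0).count();
    r.strtoull_seconds = std::chrono::duration<double>(t2 - t1).count();
    assert(r.custom_sum == r.strtoull_sum);  // both parsers must agree
    return r;
}
```

Summing the parsed values keeps the compiler from discarding the conversions and doubles as a sanity check that both paths read the same numbers.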
Here's the entirety of the atoi function in gcc:
long atoi(const char *str)
{
    long num = 0;
    int neg = 0;
    while (isspace(*str)) str++;
    if (*str == '-')
    {
        neg=1;
        str++;
    }
    while (isdigit(*str))
    {
        num = 10*num + (*str - '0');
        str++;
    }
    if (neg)
        num = -num;
    return num;
}
The whitespace and negative checks are superfluous in your case, but they cost only nanoseconds.
isdigit is almost certainly inlined, so that's not costing you any time.
I really don't see room for improvement here.
Why not use a stringstream? I'm not sure of its particular overhead, but you could define:
int myInt;
string myString = "1561";
stringstream ss(myString);
ss >> myInt;
Of course, you'd need to
#include <sstream>
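As a slightly fuller sketch of the same idea, the extraction can be wrapped in a function that also reports whether it succeeded. The names to_int() and parse_example() are made up for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true and stores the parsed value on success,
// returns false if the text does not start with an integer.
bool to_int(const std::string &text, int &value)
{
    std::istringstream ss(text);
    return static_cast<bool>(ss >> value);
}

// Hypothetical usage, mirroring the snippet above.
int parse_example()
{
    int myInt = 0;
    bool ok = to_int("1561", myInt);
    assert(ok);
    (void)ok;  // silence the unused-variable warning when NDEBUG is defined
    return myInt;
}
```

Checking the stream state is the main thing this adds over atoi, which gives you no way to tell a failed conversion apart from a real 0.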
A faster conversion function for positive integers only, without error checking.
Multiplication can be slower than addition and shifting, so this version replaces the multiply with shifts.
int fast_atoi( const char * str )
{
    int val = 0;
    while( *str ) {
        val = (val << 3) + (val << 1) + (*str++ - '0');
    }
    return val;
}
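The shift expression is just n * 8 + n * 2, so it computes the same value as n * 10. Here is a small sketch that checks this while converting; times_ten_shift() and fast_atoi_unsigned() are names invented for this example, and unlike fast_atoi above this variant uses unsigned arithmetic and stops at the first non-digit.

```cpp
#include <cassert>

// Multiply by ten with two shifts and an add: 8n + 2n.
unsigned times_ten_shift(unsigned n)
{
    return (n << 3) + (n << 1);
}

// Hypothetical variant of fast_atoi: unsigned arithmetic keeps wrap-around
// well-defined, and the digit test also ends the loop at the terminating NUL.
unsigned fast_atoi_unsigned(const char *str)
{
    unsigned val = 0;
    unsigned d;
    while ((d = unsigned(*str++ - '0')) <= 9)
    {
        assert(times_ten_shift(val) == val * 10);  // both forms agree
        val = times_ten_shift(val) + d;
    }
    return val;
}
```

Whether the shift form is actually faster than the plain multiply depends on the compiler and target, so it is worth measuring both before committing to one.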
Here is mine. Atoi is the fastest I could come up with. I compiled with MSVC 2010; with other compilers it might be possible to combine both templates. In MSVC 2010, combining the templates made the case where you provide a cb argument slower.
Atoi handles nearly all the special atoi cases, and is as fast or faster than this:
int val = 0;
while( *str )
    val = val*10 + (*str++ - '0');
Here is the code:
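// Macros cannot be overloaded by argument count, so these are overloaded inline
// functions instead: true if a equals any of the candidates. BYTE is from <windows.h>.
inline bool EQ1(BYTE a, BYTE a1) { return a == a1; }
inline bool EQ1(BYTE a, BYTE a1, BYTE a2) { return a == a1 || EQ1(a, a2); }
inline bool EQ1(BYTE a, BYTE a1, BYTE a2, BYTE a3) { return a == a1 || EQ1(a, a2, a3); }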
// Atoi is 4x faster than atoi. There is also an overload that takes a cb argument.
template <typename T>
T Atoi(LPCSTR sz) {
    T n = 0;
    bool fNeg = false; // for unsigned T, this is removed by optimizer
    const BYTE* p = (const BYTE*)sz;
    BYTE ch;
    // test for most exceptions in the leading chars. Most of the time
    // this test is skipped. Note we skip over leading zeros to avoid the
    // useless math in the second loop. We expect leading 0 to be the most
    // likely case, so we test it first, however the cpu might reorder that.
    for ( ; (ch=*p-'1') >= 9 ; ++p) { // unsigned trick for range compare
        // ignore leading 0's, spaces, and '+'
        if (EQ1(ch, '0'-'1', ' '-'1', '+'-'1'))
            continue;
        // for unsigned T this is removed by optimizer
        if (!((T)-1 > 0) && ch==BYTE('-'-'1')) {
            fNeg = !fNeg;
            continue;
        }
        // atoi ignores these. Remove this code for a small perf increase.
        if (BYTE(*p-9) > 4) // \t, \n, 11, 12, \r. unsigned trick for range compare
            break;
    }
    // deal with rest of digits, stop loop on non digit.
    for ( ; (ch=*p-'0') <= 9 ; ++p) // unsigned trick for range compare
        n = n*10 + ch;
    // for unsigned T, (fNeg) test is removed by optimizer
    return (fNeg) ? -n : n;
}
// you could go with a single template that took a cb argument, but I could not
// get the optimizer to create good code when both the cb and !cb case were combined.
// above code contains the comments.
template <typename T>
T Atoi(LPCSTR sz, BYTE cb) {
    T n = 0;
    bool fNeg = false;
    const BYTE* p = (const BYTE*)sz;
    const BYTE* p1 = p + cb;
    BYTE ch;
    for ( ; p<p1 && (ch=*p-'1') >= 9 ; ++p) {
        if (EQ1(ch,BYTE('0'-'1'),BYTE(' '-'1'),BYTE('+'-'1')))
            continue;
        if (!((T)-1 > 0) && ch == BYTE('-'-'1')) {
            fNeg = !fNeg;
            continue;
        }
        if (BYTE(*p-9) > 4) // \t, \n, 11, 12, \r
            break;
    }
    for ( ; p<p1 && (ch=*p-'0') <= 9 ; ++p)
        n = n*10 + ch;
    return (fNeg) ? -n : n;
}
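The (T)-1 > 0 test used in both templates may look odd at first: it is a compile-time way of asking whether T is unsigned, which is what lets the optimizer drop the sign handling. A minimal standalone sketch follows; is_unsigned_by_cast() and apply_sign() are names invented here and are not part of the code above.

```cpp
#include <cassert>
#include <type_traits>

// Casting -1 to an unsigned type yields its maximum value, which is > 0;
// for a signed type the result stays -1, so the comparison is false.
template <typename T>
constexpr bool is_unsigned_by_cast()
{
    return (T)-1 > 0;
}

static_assert(is_unsigned_by_cast<unsigned>(), "unsigned is unsigned");
static_assert(!is_unsigned_by_cast<int>(), "int is signed");
static_assert(is_unsigned_by_cast<unsigned long long>() ==
              std::is_unsigned<unsigned long long>::value,
              "agrees with the standard trait");

// Hypothetical helper mirroring the final "return (fNeg) ? -n : n;" line.
template <typename T>
T apply_sign(T n, bool negative)
{
    // For unsigned T the parser never records a minus sign.
    assert(!(is_unsigned_by_cast<T>() && negative));
    return negative ? -n : n;
}
```

In newer code std::is_unsigned from <type_traits> expresses the same thing more readably.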
This page compares conversion speed between different string->int functions using different compilers. According to the results presented, the naive function, which does no error checking, runs roughly twice as fast as atoi().
// Taken from http://tinodidriksen.com/uploads/code/cpp/speed-string-to-int.cpp
int naive(const char *p) {
    int x = 0;
    bool neg = false;
    if (*p == '-') {
        neg = true;
        ++p;
    }
    while (*p >= '0' && *p <= '9') {
        x = (x*10) + (*p - '0');
        ++p;
    }
    if (neg) {
        x = -x;
    }
    return x;
}
Since you say "it is always positive", you can remove the negative checks in the above code for a micro optimization.
If you can guarantee the string will not have anything but numeric characters, you can micro optimize further by changing the loop
while (*p >= '0' && *p <= '9') {
to
while (*p != '\0' ) {
Which leaves you with
unsigned int naive(const char *p) {
    unsigned int x = 0;
    while (*p != '\0') {
        x = (x*10) + (*p - '0');
        ++p;
    }
    return x;
}
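Since you start from a std::string, here is a sketch of how the stripped-down version might be called, with the "digits only" guarantee spelled out as a debug-time check. The std::string overload and all_digits() are hypothetical additions for illustration; they are not part of the benchmark code.

```cpp
#include <cassert>
#include <string>

unsigned int naive(const char *p) {
    unsigned int x = 0;
    while (*p != '\0') {
        x = (x*10) + (*p - '0');
        ++p;
    }
    return x;
}

// Hypothetical validation helper: true only for a non-empty run of digits.
bool all_digits(const char *p) {
    if (*p == '\0') {
        return false;
    }
    for (; *p != '\0'; ++p) {
        if (unsigned(*p - '0') > 9) {
            return false;
        }
    }
    return true;
}

// Hypothetical convenience overload for std::string input.
unsigned int naive(const std::string &s) {
    assert(all_digits(s.c_str()));  // the guarantee the fast loop relies on
    return naive(s.c_str());
}
```

The assert costs nothing in release builds compiled with NDEBUG, while debug builds catch input that would otherwise silently produce garbage.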