I'm learning C, and am currently studying string handling. From where I'm studying, strcmp() is defined as:
This is a function which c
Here is a simple implementation of strcmp() in C, from Apple's libc:
    int
    strcmp(const char *s1, const char *s2)
    {
        for ( ; *s1 == *s2; s1++, s2++)
            if (*s1 == '\0')
                return 0;
        return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
    }
FreeBSD's libc implementation:
    int
    strcmp(const char *s1, const char *s2)
    {
        while (*s1 == *s2++)
            if (*s1++ == '\0')
                return (0);
        return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
    }
Here is the implementation from GNU libc, which returns the difference between characters:
    int
    strcmp (p1, p2)
         const char *p1;
         const char *p2;
    {
      const unsigned char *s1 = (const unsigned char *) p1;
      const unsigned char *s2 = (const unsigned char *) p2;
      unsigned char c1, c2;

      do
        {
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0')
            return c1 - c2;
        }
      while (c1 == c2);

      return c1 - c2;
    }
That's why most comparisons that I've read are written as < 0, == 0 and > 0 when the code does not need to know the exact difference between the characters in the strings.
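As a minimal sketch of that style, the helper below looks only at the sign of what strcmp() returns, never at its magnitude. The name describe_order is made up for this example and is not a library function.

```c
#include <string.h>

/* Hypothetical helper for illustration: classifies a relative to b
 * by looking only at the sign of strcmp()'s result. */
const char *describe_order(const char *a, const char *b)
{
    int r = strcmp(a, b);

    if (r < 0)
        return "less";      /* a sorts before b */
    if (r > 0)
        return "greater";   /* a sorts after b */
    return "equal";         /* r == 0: same contents */
}
```

Written this way, describe_order("apple", "banana") gives "less" whether the underlying strcmp() returned -1 or some other negative number.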
0, 1 and -1 are like standard values; however, you should think of them as zero, positive and negative.
In that case, the meanings are:
Zero (0) means that the strings are equal.
Negative (-1 or any other) means that the first string is less.
Positive (1 or any other) means that the first string is greater.

The C language specification is a document written in English.
The members of the standardization committee carefully chose their words to permit implementors to make their own implementation choices.
On some hardware (or in some implementations), returning any integer that respects the constraints of the specification could be faster (or simpler, or give smaller code) than returning only -1, 0, 1 (like the function proposed in dvm's answer). FWIW, musl-libc's strcmp.c is shorter and can return integers other than -1, 0, 1, yet it still conforms to the standard.
BTW, with GCC & GNU libc (e.g. on most Linux systems) the strcmp function may be handled, notably when optimizing, as a compiler builtin, __builtin_strcmp ... It can then sometimes be replaced by some very efficient code.
Try compiling the following function (in a file abc.c)

    #include <string.h>
    int isabc(const char *s) { return strcmp(s, "abc"); }

with optimizations enabled and look at the assembly code. On my Debian/Sid/x86-64 with GCC 4.9.1, compiling with gcc -fverbose-asm -S -O2 abc.c, I see no function calls at all in the produced abc.s (but isabc may return numbers other than -1, 0, 1).
You should care about portable code, hence you should not expect a particular value (as long as your vendor's strcmp obeys its imprecise and fuzzy specification).
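If some code really does need exactly -1, 0 or 1, one portable option is to normalize the result yourself instead of relying on what a particular libc happens to return. A small sketch, where compare_sign is a made-up name and not a standard function:

```c
#include <string.h>

/* Hypothetical wrapper: maps whatever strcmp() returns to -1, 0 or 1. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);

    /* In C, (r > 0) and (r < 0) each evaluate to 0 or 1. */
    return (r > 0) - (r < 0);
}
```

The wrapper depends only on the sign of strcmp()'s result, so it should behave the same on any conforming implementation.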
Read also about undefined behavior; it is a related idea: the language specification is deliberately imprecise, to let different implementors make different implementation choices.
Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.
And you write:
So, after reading all this, I'm inclined to think that 0, 1 or -1 are the only possible outcomes of the strcmp() function.
Why? The point is precisely that the actual value of the returned integer is not specified, only its sign.
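To make that concrete, here is a rough sketch with two comparison functions written in the style of the Apple and FreeBSD versions quoted earlier in this thread. The names cmp_sign_only, cmp_difference and same_sign are invented for the example. Both functions satisfy the specification, yet they can return different numbers for the same pair of strings.

```c
/* Hypothetical name: returns only -1, 0 or +1 (Apple-libc style). */
int cmp_sign_only(const char *s1, const char *s2)
{
    for (; *s1 == *s2; s1++, s2++)
        if (*s1 == '\0')
            return 0;
    return (*(const unsigned char *)s1 < *(const unsigned char *)s2) ? -1 : +1;
}

/* Hypothetical name: returns the difference of the first
 * mismatching bytes (FreeBSD-libc style). */
int cmp_difference(const char *s1, const char *s2)
{
    while (*s1 == *s2) {
        if (*s1 == '\0')
            return 0;
        s1++;
        s2++;
    }
    return *(const unsigned char *)s1 - *(const unsigned char *)s2;
}

/* Nonzero when x and y are both negative, both zero or both positive. */
int same_sign(int x, int y)
{
    return ((x < 0) == (y < 0)) && ((x > 0) == (y > 0));
}
```

On a system where 'a' and 'c' are two code points apart (ASCII, for instance), cmp_sign_only("a", "c") gives -1 while cmp_difference("a", "c") gives -2. The values differ, but same_sign() reports that they agree in sign, which is all the specification promises.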