Question
Please note that I have checked the related questions with similar titles, but as far as I can tell they do not address this one.
Initially I thought that program1 and program2 would give me the same result.
//Program 1
char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));
//Output: -4
//Program 2
printf("%d", strcmp("abcd", "efgh"));
//Output: -1
The only difference I can spot is that in program2 I passed string literals, while in program1 I passed char * variables as the arguments of the strcmp() function.
Why is there a difference between the behaviour of these seemingly identical programs?
Platform: Linux Mint; compiler: g++
Edit: Actually, program1 always prints the difference between the ASCII codes of the first mismatched characters, whereas program2 prints -1 if the ASCII code of the first mismatched character in string2 is greater than that in string1, and vice versa.
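The behaviour described in the edit for program1 can be reproduced with a hand-written comparison. This is a minimal sketch, not the C library's actual code: byte_diff_strcmp is a hypothetical helper that returns the difference between the first pair of differing bytes, which is one of several results the standard permits.

```c
/* Hypothetical helper for illustration; real strcmp implementations
   are usually far more optimized and need not return this value. */
int byte_diff_strcmp(const char *lhs, const char *rhs)
{
    /* Compare as unsigned char, as the C standard specifies for strcmp. */
    const unsigned char *p = (const unsigned char *)lhs;
    const unsigned char *q = (const unsigned char *)rhs;

    /* Advance while the bytes match and the first string has not ended. */
    while (*p != '\0' && *p == *q) {
        ++p;
        ++q;
    }

    /* Difference of the first mismatched bytes (0 if the strings are equal). */
    return (int)*p - (int)*q;
}
```

With ASCII, byte_diff_strcmp("abcd", "efgh") gives 'a' - 'e' = 97 - 101 = -4, matching the output of program1.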
Answer 1:
This is your C code:
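#include <stdio.h>
#include <string.h>

void x1(void)
{
    char *a = "abcd";
    char *b = "efgh";
    printf("%d", strcmp(a,b));             // arguments are pointer variables
}

void x2(void)
{
    printf("%d", strcmp("abcd", "efgh"));  // arguments are string literals
}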
And this is the generated assembly output for both functions:
.LC0:
.string "abcd"
.LC1:
.string "efgh"
.LC2:
.string "%d"
x1:
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], OFFSET FLAT:.LC0
mov QWORD PTR [rbp-16], OFFSET FLAT:.LC1
mov rdx, QWORD PTR [rbp-16]
mov rax, QWORD PTR [rbp-8]
mov rsi, rdx
mov rdi, rax
call strcmp // the strcmp function is actually called
mov esi, eax
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
nop
leave
ret
x2:
push rbp
mov rbp, rsp
mov esi, -1 // strcmp is never called, the compiler
// knows what the result will be and it just
// uses -1
mov edi, OFFSET FLAT:.LC2
mov eax, 0
call printf
nop
pop rbp
ret
When the compiler sees strcmp("abcd", "efgh") it knows the result beforehand, because it knows that "abcd" comes before "efgh".
But if it sees strcmp(a,b) it does not know and hence generates code that actually calls strcmp.
With another compiler or with different compiler settings things could be different. You really shouldn't care about such details at least at a beginner's level.
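If you want to see what the comparison returns at run time regardless of optimization settings, one option is to hide the pointers from the optimizer. The sketch below is an illustration, not part of the original code: runtime_strcmp is a hypothetical helper that reads both pointers through volatile variables, which typically stops the compiler from folding the comparison into a constant. The compiler may still expand strcmp inline instead of calling the library, so only the sign of the result should be relied upon.

```c
#include <string.h>

/* Hypothetical helper: the volatile qualifier forces the pointers to be
   loaded at run time, so the compiler cannot assume it knows which
   strings are being compared. */
int runtime_strcmp(const char *lhs, const char *rhs)
{
    const char *volatile a = lhs;
    const char *volatile b = rhs;
    return strcmp(a, b);
}
```

Calling runtime_strcmp("abcd", "efgh") returns some negative value; whether it is -1, -4, or something else depends on the compiler and library.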
Answer 2:
It is indeed surprising that strcmp returns two different values for these calls, but this is not incompatible with the C Standard: strcmp() returns a negative value if the first string is lexicographically before the second string. Both -4 and -1 are negative values.
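Since only the sign is guaranteed, portable code should test the result against zero rather than against a specific number. As a minimal sketch (compare_sign is a hypothetical helper name, not a standard function), the result can be collapsed to -1, 0, or 1 so that both programs print the same thing:

```c
#include <string.h>

/* Hypothetical helper: normalizes strcmp's result to -1, 0, or 1 so the
   outcome does not depend on the magnitude the implementation returns. */
int compare_sign(const char *lhs, const char *rhs)
{
    int r = strcmp(lhs, rhs);
    return (r > 0) - (r < 0);
}
```

Here compare_sign("abcd", "efgh") is -1 whether the underlying strcmp produced -4, -1, or any other negative value.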
As pointed out by others, the code generated for the two calls is different:
- the compiler generates a call to the library function in the first program
- the compiler is able to determine the result of the comparison and generates an explicit result of -1 for the second case, where both arguments are string literals.
In order to perform this compile-time evaluation, strcmp must be defined in a subtle way in <string.h> so the compiler can determine that the program refers to the C library's implementation and not an alternative that might behave differently. Tracing the corresponding prototype in recent GNU libc include files is a bit difficult, with a number of nested macros eventually leading to a hidden prototype.
Note that more recent versions of both gcc and clang will perform the optimisation in both cases, as can be tested on Godbolt Compiler Explorer, but neither combines this optimisation with that of printf to generate the even more compact code puts("-1");. They seem to convert printf to puts only for string literal formats without arguments.
Answer 3:
I believe (I would need to see and interpret the machine code to be sure) that one version works without calling code in the library, as if you had written printf("%d", -1);.
Source: https://stackoverflow.com/questions/60306258/ambiguous-behaviour-of-strcmp