I tried to write my own implementation of the strchr() function.
It now looks like this:
char *mystrchr(const char *s, int c) {
    while (*s != (char) c) {
        if (!*s++) {
            return NULL;
        }
    }
    return (char *)s;
}
The last line originally was
return s;
But this didn't work because s is const. I found out that there needs to be this cast (char *), but I honestly don't know what I am doing there :( Can someone explain?
I believe this is actually a flaw in the C Standard's definition of the strchr() function. (I'll be happy to be proven wrong.) (Replying to the comments, it's arguable whether it's really a flaw; IMHO it's still poor design. It can be used safely, but it's too easy to use it unsafely.)
Here's what the C standard says:
char *strchr(const char *s, int c);
The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s. The terminating null character is considered to be part of the string.
Which means that this program:
#include <stdio.h>
#include <string.h>
int main(void) {
    const char *s = "hello";
    char *p = strchr(s, 'l');
    *p = 'L';
    return 0;
}
even though it carefully defines the pointer to the string literal as a pointer to const char, has undefined behavior, since it modifies the string literal. gcc, at least, doesn't warn about this, and the program dies with a segmentation fault.
The problem is that strchr() takes a const char* argument, which means it promises not to modify the data that s points to -- but it returns a plain char*, which permits the caller to modify the same data.
Here's another example; it quietly modifies a const-qualified object without any casts (which, on further thought, I believe also has undefined behavior):
#include <stdio.h>
#include <string.h>
int main(void) {
    const char s[] = "hello";
    char *p = strchr(s, 'l');
    *p = 'L';
    printf("s = \"%s\"\n", s);
    return 0;
}
Which means, I think, (to answer your question) that a C implementation of strchr() has to cast its result to convert it from const char* to char*, or do something equivalent.
This is why C++, in one of the few changes it makes to the C standard library, replaces strchr() with two overloaded functions of the same name:
const char * strchr ( const char * str, int character );
char * strchr ( char * str, int character );
Of course C can't do this.
An alternative would have been to replace strchr by two functions, one taking a const char* and returning a const char*, and another taking a char* and returning a char*. Unlike in C++, the two functions would have to have different names, perhaps strchr and strcchr.
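A minimal sketch of that two-function alternative, for illustration only. The names my_strcchr and my_strchr are hypothetical, chosen to avoid clashing with the library, and having the mutable version forward to the const one is just one possible way to avoid duplicating the loop:

```c
#include <stddef.h>

/* Hypothetical const-preserving search: const in, const out. */
const char *my_strcchr(const char *s, int c) {
    for (;; ++s) {
        if (*s == (char) c) {
            return s;       /* also matches the terminator when c is '\0' */
        }
        if (*s == '\0') {
            return NULL;    /* reached the end without a match */
        }
    }
}

/* Hypothetical mutable counterpart: non-const in, non-const out. */
char *my_strchr(char *s, int c) {
    /* The cast only restores the non-const access the caller already had. */
    return (char *) my_strcchr(s, c);
}
```

With this split, passing a const char* to my_strchr draws a diagnostic from the compiler, and the result of my_strcchr cannot be written through without an explicit cast at the call site.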
(Historically, const was added to C after strchr() had already been defined. This was probably the only way to keep strchr() without breaking existing code.)
strchr() is not the only C standard library function that has this problem. The list of affected functions (I think this list is complete but I don't guarantee it) is:
void *memchr(const void *s, int c, size_t n);
char *strchr(const char *s, int c);
char *strpbrk(const char *s1, const char *s2);
char *strrchr(const char *s, int c);
char *strstr(const char *s1, const char *s2);
(all declared in <string.h>) and:
void *bsearch(const void *key, const void *base,
              size_t nmemb, size_t size,
              int (*compar)(const void *, const void *));
(declared in <stdlib.h>). All these functions take a pointer to const data that points to the initial element of an array, and return a non-const pointer to an element of that array.
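As a small illustration of the same pattern outside <string.h>, the sketch below searches a const array with bsearch() and immediately stores the void * result in a pointer to const, so the qualifier is not lost. The helper names compare_int and find_sorted are made up for this example:

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for bsearch(): orders two ints. */
static int compare_int(const void *a, const void *b) {
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Looks up key in a sorted const array; returns NULL if it is absent. */
const int *find_sorted(const int *base, size_t n, int key) {
    /* bsearch() returns a plain void *; converting it to const int *
       restores the constness of the array that was passed in. */
    return bsearch(&key, base, n, sizeof *base, compare_int);
}
```

Nothing forces the caller to do this; assigning the result to a plain int * would compile just as quietly, which is exactly the problem described above.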
The practice of returning non-const pointers to const data from non-modifying functions is actually an idiom rather widely used in the C language. It is not always pretty, but it is rather well established.
The rationale here is simple: strchr by itself is a non-modifying operation. Yet we need strchr functionality for both constant and non-constant strings, ideally in a form that also propagates the constness of the input to the constness of the output. Neither C nor C++ provides any elegant support for this concept, meaning that in both languages you will have to write two virtually identical functions in order to avoid taking any risks with const-correctness.
In C++ you would be able to use function overloading by declaring two functions with the same name
const char *strchr(const char *s, int c);
char *strchr(char *s, int c);
In C you have no function overloading, so in order to fully enforce const-correctness in this case you would have to provide two functions with different names, something like
const char *strchr_c(const char *s, int c);
char *strchr(char *s, int c);
Although in some cases this might be the right thing to do, it is typically (and rightfully) considered too cumbersome and involved by C standards. You can resolve this situation in a more compact (albeit more risky) way by implementing only one function
char *strchr(const char *s, int c);
which returns a non-const pointer into the input string (by using a cast at the exit, exactly as you did). Note that this approach does not violate any rules of the language, although it provides the caller with the means to violate them. By casting away the constness of the data, this approach simply shifts the responsibility for observing const-correctness from the function itself to the caller. As long as the caller is aware of what's going on and remembers to "play nice", i.e. uses a const-qualified pointer to point to const data, any temporary breaches in the wall of const-correctness created by such a function are repaired instantly.
I see this trick as a perfectly acceptable approach to reducing unnecessary code duplication (especially in the absence of function overloading). The standard library uses it. You have no reason to avoid it either, assuming you understand what you are doing.
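To make "playing nice" concrete, here is a minimal sketch of a caller that keeps the result of strchr in a pointer with the same constness as the data it searched. The function names count_char and replace_char are invented for this example:

```c
#include <stddef.h>
#include <string.h>

/* Read-only use: the input is const, so the result is kept const too. */
size_t count_char(const char *s, char ch) {
    size_t n = 0;
    if (ch == '\0') {
        return 0;   /* searching past the terminator would be out of bounds */
    }
    for (const char *p = strchr(s, ch); p != NULL; p = strchr(p + 1, ch)) {
        ++n;
    }
    return n;
}

/* Modifying use: the buffer is non-const, so a plain char * is appropriate. */
void replace_char(char *buf, char from, char to) {
    if (from == '\0') {
        return;     /* never overwrite or step past the terminator */
    }
    for (char *p = strchr(buf, from); p != NULL; p = strchr(p + 1, from)) {
        *p = to;
    }
}
```

In count_char the compiler would reject an accidental write through p, so the breach opened by strchr's return type is closed again at the point of assignment.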
Now, as for your implementation of strchr, it looks weird to me from a stylistic point of view. I would use the loop header to iterate over the full range we are operating on (the full string), and use the inner if to catch the early termination condition
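for (; *s != '\0'; ++s)
    if (*s == (char) c)
        return (char *) s;
/* The terminator is part of the string, so c == '\0' must match it. */
return (char) c == '\0' ? (char *) s : NULL;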
But things like that are always a matter of personal preference. Someone might prefer to just write
for (; *s != '\0' && *s != (char) c; ++s)
    ;
return *s == (char) c ? (char *) s : NULL;
Some might say that modifying a function parameter (s) inside the function is bad practice.
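For readers who would rather not modify the parameter, a sketch of the same search using a local pointer follows. The name mystrchr_local is hypothetical. The sketch converts c to char and treats the terminator as searchable, as the standard's description of strchr() requires:

```c
#include <stddef.h>

/* Hypothetical variant that leaves the parameter s untouched. */
char *mystrchr_local(const char *s, int c) {
    const char *p = s;          /* local cursor; s itself is never changed */
    const char ch = (char) c;   /* strchr() compares against c converted to char */

    for (; *p != ch; ++p) {
        if (*p == '\0') {
            return NULL;        /* hit the terminator without finding ch */
        }
    }
    return (char *) p;          /* cast away const only at the exit */
}
```

Whether this reads better than advancing s directly is, again, a matter of taste; the generated code is likely to be the same either way.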
The const keyword here means that the characters s points to cannot be modified through s (the pointer s itself can still be changed).
You couldn't return s directly because s is declared as const char *s and the return type of the function is char *. If the compiler allowed you to do that, it would be possible to override the const restriction.
Adding an explicit cast to char* tells the compiler that you know what you're doing (though as Eric explained, it would be better if you didn't do it).
UPDATE: For the sake of context I'm quoting Eric's answer, since he seems to have deleted it:
You should not be modifying s since it is a const char *.
Instead, define a local variable that represents the result of type char * and use that in place of s in the method body.
The Function Return Value should be a Constant Pointer to a Character:
strchr accepts a const char* and should return const char* also. You are returning a non-constant pointer, which is potentially dangerous since the return value points into the input character array (the caller might be expecting the constant argument to remain constant, but it is modifiable if any part of it is returned as a char * pointer).
The Function return Value should be NULL if No matching Character is Found:
Also strchr is supposed to return NULL if the sought character is not found. If it returns non-NULL when the character is not found, or s in this case, the caller (if he thinks the behavior is the same as strchr) might assume that the first character in the result actually matches (without the NULL return value there is no way to tell whether there was a match or not).
(I'm not sure if that is what you intended to do.)
Here is an Example of a Function that Does This:
I wrote and ran several tests on this function; I added a few really obvious sanity checks to avoid potential crashes:
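#include <stddef.h>
#include <string.h>

const char *mystrchr1(const char *s, int c) {
    size_t s_len;
    size_t i;

    /* Sanity check: the standard strchr() does not accept a null pointer. */
    if (s == NULL) {
        return NULL;
    }
    /* Extra restriction, not in the standard strchr(): reject values outside 0..255. */
    if ((c > 255) || (c < 0)) {
        return NULL;
    }
    s_len = strlen(s);
    /* Use <= so the terminating null character is searched as well. */
    for (i = 0; i <= s_len; i++) {
        if ((char) c == s[i]) {
            return &s[i];
        }
    }
    return NULL;
}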
Source: https://stackoverflow.com/questions/14367727/how-does-strchr-implementation-work