How does strchr implementation work

冷暖自知 submitted on 2019-11-27 01:21:35

Question


I tried to write my own implementation of the strchr() function.

It now looks like this:

#include <stddef.h>  /* for NULL */

char *mystrchr(const char *s, int c) {
    while (*s != (char) c) {
        if (!*s++) {
            return NULL;
        }
    }
    return (char *)s;
}

The last line originally was

return s;

But this didn't work because s is const. I found out that there needs to be this cast (char *), but I honestly don't know what I am doing there :( Can someone explain?


Answer 1:


I believe this is actually a flaw in the C Standard's definition of the strchr() function. (I'll be happy to be proven wrong.) (Replying to the comments, it's arguable whether it's really a flaw; IMHO it's still poor design. It can be used safely, but it's too easy to use it unsafely.)

Here's what the C standard says:

char *strchr(const char *s, int c);

The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s. The terminating null character is considered to be part of the string.

Which means that this program:

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s = "hello";
    char *p = strchr(s, 'l');
    *p = 'L';
    return 0;
}

even though it carefully defines the pointer to the string literal as a pointer to const char, has undefined behavior, since it modifies the string literal. gcc, at least, doesn't warn about this, and the program dies with a segmentation fault.

The problem is that strchr() takes a const char* argument, which means it promises not to modify the data that s points to -- but it returns a plain char*, which permits the caller to modify the same data.

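Here's another example. It quietly modifies a const-qualified object without any casts, and on further thought I believe it too has undefined behavior: the standard says that attempting to modify an object defined with a const-qualified type through an lvalue with non-const-qualified type is undefined (C11 6.7.3):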

#include <stdio.h>
#include <string.h>

int main(void) {
    const char s[] = "hello";
    char *p = strchr(s, 'l');
    *p = 'L';
    printf("s = \"%s\"\n", s);
    return 0;
}

Which means, I think (to answer your question), that a C implementation of strchr() has to cast its result to convert it from const char* to char*, or do something equivalent.

This is why C++, in one of the few changes it makes to the C standard library, replaces strchr() with two overloaded functions of the same name:

const char * strchr ( const char * str, int character );
      char * strchr (       char * str, int character );

Of course C can't do this.
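Strictly speaking C has no overloading, but since C11 a `_Generic` selection can get close to the C++ pair. A sketch, assuming a C11 compiler; the names `strchr_const`, `strchr_mut`, and `strchr_typed` are hypothetical. The macro picks a helper from the type of its first argument, so a `const char *` argument yields a `const char *` result and a `char *` argument yields a `char *` result:

```c
#include <string.h>

/* Hypothetical helper for read-only strings: const in, const out. */
static const char *strchr_const(const char *s, int c) {
    return strchr(s, c);    /* char * converts implicitly to const char * */
}

/* Hypothetical helper for modifiable strings: char * in, char * out. */
static char *strchr_mut(char *s, int c) {
    return strchr(s, c);    /* safe: the caller's buffer is modifiable */
}

/* Hypothetical type-generic macro. Arguments that are neither
   char * nor const char * fail to compile (there is no default). */
#define strchr_typed(s, c) _Generic((s),   \
        const char *: strchr_const,        \
        char *:       strchr_mut)((s), (c))
```

With `const char *ro = "world";`, the expression `strchr_typed(ro, 'o')` has type `const char *`, so assigning it to a plain `char *` draws a diagnostic instead of silently dropping the qualifier.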

An alternative would have been to replace strchr by two functions, one taking a const char* and returning a const char*, and another taking a char* and returning a char*. Unlike in C++, the two functions would have to have different names, perhaps strchr and strcchr.
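A sketch of that two-function design, using the hypothetical names `my_strcchr` (const in, const out) and `my_strchr` (non-const in, non-const out) to stay clear of the reserved `str...` names. The search is written once, and the only cast lives in the non-const wrapper, where it is harmless because the caller passed a modifiable string in the first place:

```c
#include <stddef.h>

/* Hypothetical: const in, const out. No cast is needed here. */
const char *my_strcchr(const char *s, int c) {
    const char ch = (char) c;       /* like strchr, convert c to char */
    for (;; ++s) {
        if (*s == ch)
            return s;               /* also matches the terminator when ch == '\0' */
        if (*s == '\0')
            return NULL;            /* end of string, no match */
    }
}

/* Hypothetical: non-const in, non-const out. The cast only restores
   the constness the caller's own argument already had. */
char *my_strchr(char *s, int c) {
    return (char *) my_strcchr(s, c);
}
```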

(Historically, const was added to C after strchr() had already been defined. This was probably the only way to keep strchr() without breaking existing code.)

strchr() is not the only C standard library function that has this problem. The list of affected functions (I think this list is complete but I don't guarantee it) is:

void *memchr(const void *s, int c, size_t n);
char *strchr(const char *s, int c);
char *strpbrk(const char *s1, const char *s2);
char *strrchr(const char *s, int c);
char *strstr(const char *s1, const char *s2);

(all declared in <string.h>) and:

void *bsearch(const void *key, const void *base,
    size_t nmemb, size_t size,
    int (*compar)(const void *, const void *));

(declared in <stdlib.h>). All these functions take a pointer to const data that points to the initial element of an array, and return a non-const pointer to an element of that array.
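The same pattern with bsearch, as a sketch: it searches a read-only, sorted table, and the names `names`, `cmp_name`, and `find_name` are hypothetical. bsearch hands back a plain `void *` even though it points into const data, so keeping the result const is left to the caller:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical read-only table, sorted for bsearch. */
static const char *const names[] = { "apple", "banana", "cherry" };

/* Comparator: both arguments point to a const char * string. */
static int cmp_name(const void *key, const void *elem) {
    const char *k = *(const char *const *) key;
    const char *e = *(const char *const *) elem;
    return strcmp(k, e);
}

/* Hypothetical lookup. The void * from bsearch is stored as a
   pointer to const, matching the constness of the table. */
const char *const *find_name(const char *name) {
    return bsearch(&name, names, sizeof names / sizeof names[0],
                   sizeof names[0], cmp_name);
}
```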




Answer 2:


The practice of returning non-const pointers to const data from non-modifying functions is actually an idiom rather widely used in the C language. It is not always pretty, but it is well established.

The rationale here is simple: strchr by itself is a non-modifying operation. Yet we need strchr functionality for both constant and non-constant strings, and ideally the constness of the input would propagate to the constness of the output. Neither C nor C++ provides any elegant support for this concept, meaning that in both languages you have to write two virtually identical functions in order to avoid taking any risks with const-correctness.

In C++ you can use function overloading by declaring two functions with the same name:

const char *strchr(const char *s, int c);
char *strchr(char *s, int c);

In C you have no function overloading, so in order to fully enforce const-correctness in this case you would have to provide two functions with different names, something like

const char *strchr_c(const char *s, int c);
char *strchr(char *s, int c);

Although in some cases this might be the right thing to do, it is typically (and rightfully) considered too cumbersome by C standards. You can resolve this situation in a more compact (albeit riskier) way by implementing only one function

char *strchr(const char *s, int c);

which returns a non-const pointer into the input string (by using a cast at the exit, exactly as you did). Note that this approach does not violate any rules of the language, although it gives the caller the means to violate them. By casting away the constness of the data, this approach simply delegates the responsibility for observing const-correctness from the function itself to the caller. As long as the caller is aware of what's going on and remembers to "play nice", i.e. uses a const-qualified pointer to point to const data, any temporary breach in the wall of const-correctness created by such a function is repaired instantly.
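A sketch of what "playing nice" can look like on the calling side; `prefix_len` and `cut_at` are hypothetical helpers. When the input is const, the result of strchr goes straight into a `const char *`; writing through the result happens only where the caller owns a modifiable buffer:

```c
#include <string.h>

/* Hypothetical: read-only input, so the result is kept const too.
   Returns the length of s up to (not including) the first delim. */
size_t prefix_len(const char *s, char delim) {
    const char *p = strchr(s, delim);   /* char * converts to const char * */
    return p != NULL ? (size_t) (p - s) : strlen(s);
}

/* Hypothetical: modifiable buffer, so writing through the result is fine. */
void cut_at(char *buf, char delim) {
    char *p = strchr(buf, delim);
    if (p != NULL)
        *p = '\0';                      /* truncate at the first delim */
}
```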

I see this trick as a perfectly acceptable approach to reducing unnecessary code duplication (especially in the absence of function overloading). The standard library uses it. You have no reason to avoid it either, assuming you understand what you are doing.

Now, as for your implementation of strchr, it looks odd to me from a stylistic point of view. I would use the loop header to iterate over the full range we are operating on (the whole string), and use the inner if to catch the early termination condition:

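for (; *s != '\0'; ++s)
  if (*s == (char) c)
    return (char *) s;

/* The terminating null character is part of the string, too */
return (char) c == '\0' ? (char *) s : NULL;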

But things like that are always a matter of personal preference. Someone might prefer to just write:

for (; *s != '\0' && *s != (char) c; ++s)
  ;

return *s == (char) c ? (char *) s : NULL;

Some might say that modifying a function parameter (s) inside the function is bad practice.
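Whichever style you pick, a quick check against the library function is useful. The sketch below compares a candidate with strchr for every character of a sample string, for the terminating '\0' (which the standard counts as part of the string), and for one character that does not occur. The names `candidate_fn` and `agrees_with_strchr` are hypothetical, and the candidate must have the standard signature `char *(const char *, int)`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the signature every candidate must have. */
typedef char *candidate_fn(const char *s, int c);

/* Returns true if f(s, c) equals strchr(s, c) for every character of s,
   for the terminator '\0', and for the value c_absent. */
bool agrees_with_strchr(candidate_fn *f, const char *s, int c_absent) {
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (f(s, s[i]) != strchr(s, s[i]))
            return false;
    }
    return f(s, '\0') == strchr(s, '\0')
        && f(s, c_absent) == strchr(s, c_absent);
}
```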




Answer 3:


The const keyword here means that the characters s points to cannot be modified through s. (The pointer s itself is not const, so it can still be incremented.)

You couldn't return s directly because s is declared as const char *s and the return type of the function is char *. If the compiler allowed you to do that, it would be possible to override the const restriction.

Adding an explicit cast to char* tells the compiler that you know what you're doing (though, as Eric explained, it would be better if you didn't do it).

UPDATE: For the sake of context I'm quoting Eric's answer, since he seems to have deleted it:

You should not be modifying s since it is a const char *.

Instead, define a local variable that represents the result of type char * and use that in place of s in the method body.
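Eric's suggestion, as a sketch: leave the parameter untouched, convert it once into a local `char *`, and walk that pointer instead. The name `mystrchr2` is hypothetical. Note that the cast does not go away; it simply moves to the single place where the const qualifier is dropped:

```c
#include <stddef.h>

char *mystrchr2(const char *s, int c) {
    char *p = (char *) s;               /* the single cast, done up front */
    const char ch = (char) c;           /* like strchr, convert c to char */
    while (*p != ch) {
        if (*p == '\0')
            return NULL;                /* reached the end without a match */
        ++p;
    }
    return p;                           /* points at the terminator when ch == '\0' */
}
```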




Answer 4:


The function's return value should be a pointer to const char:

strchr accepts a const char* and should return a const char* as well. You are returning a non-const pointer, which is potentially dangerous because the return value points into the input character array: the caller might expect the constant argument to remain constant, but any part of it that is returned as a char * pointer becomes modifiable.

The function's return value should be NULL if no matching character is found:

Also, strchr is supposed to return NULL if the sought character is not found. If a function returns non-NULL when the character is not found (s, in this case), a caller who assumes it behaves like strchr might conclude that the first character of the result actually matches; without the NULL return value there is no way to tell whether there was a match or not.

(I'm not sure if that is what you intended to do.)
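On the caller's side, the NULL return is what separates "found" from "not found". A sketch with a hypothetical `value_of` helper that returns the part of a `key=value` string after the `=`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: returns the text after the first '=' in s,
   or NULL if s contains no '='. */
const char *value_of(const char *s) {
    const char *eq = strchr(s, '=');
    if (eq == NULL)
        return NULL;        /* no match: there is no character to inspect */
    return eq + 1;          /* safe: eq points at a real '=' inside s */
}
```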

Here is an example of a function that does this:

I wrote and ran several tests on this function; I added a few really obvious sanity checks to avoid potential crashes:

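#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* strlen */

const char *mystrchr1(const char *s, int c) {
    if (s == NULL) {  /* sanity check; the standard strchr does not do this */
        return NULL;
    }
    /* Like strchr, convert c to char rather than rejecting values
       outside 0..255, so negative (signed) char values still match. */
    char ch = (char) c;
    size_t s_len = strlen(s);
    /* Use <= so the terminating '\0' is part of the search,
       as the standard requires. */
    for (size_t i = 0; i <= s_len; i++) {
        if (s[i] == ch) {
            return &s[i];
        }
    }
    return NULL;
}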
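A more compact variant of the same idea, as a sketch: the hypothetical `mystrchr1_short` keeps the `const char *` return type and the NULL check above, but walks the string once instead of calling strlen first, stopping at the first match or after examining the terminator:

```c
#include <stddef.h>

/* Hypothetical single-pass variant of mystrchr1. */
const char *mystrchr1_short(const char *s, int c) {
    if (s == NULL)                  /* same sanity check as above */
        return NULL;
    const char ch = (char) c;       /* like strchr, convert c to char */
    do {
        if (*s == ch)
            return s;               /* includes the terminator when ch == '\0' */
    } while (*s++ != '\0');         /* stop after examining the terminator */
    return NULL;
}
```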


Source: https://stackoverflow.com/questions/14367727/how-does-strchr-implementation-work
