Which string manipulation functions should I use?

暗喜 2021-02-13 00:01

In my Windows/Visual C environment there's a large number of alternatives for doing the same basic string manipulation tasks.

For example, for doing a string copy I coul

6 Answers
  •  忘掉有多难
    2021-02-13 00:22

    Among those choices, I would simply use strcpy. At least strcpy_s and lstrcpy are cruft that should never be used. It's possibly worthwhile to investigate those independently written library functions, but I'd be hesitant to throw around nonstandard library code as a panacea for string safety.

    If you're using strcpy, you need to be sure your string fits in the destination buffer. If you just allocated it with size at least strlen(source)+1, you're fine as long as the source string is not simultaneously subject to modification by another thread. Otherwise you need to test if it fits in the buffer. You can use interfaces like snprintf or strlcpy (nonstandard BSD function, but easy to copy an implementation) which will truncate strings that don't fit in your destination buffer, but then you really need to evaluate whether string truncation could lead to vulnerabilities in itself. I think a much better approach when testing whether the source string fits is to make a new allocation or return an error status rather than performing blind truncation.
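
    The "return an error status rather than truncate" idea can be sketched as below. The name copy_if_fits and its 0/-1 return convention are made up for illustration; they are not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits entirely,
 * including the terminating NUL. Returns 0 on success, -1 if it does
 * not fit (dst is left untouched in that case). */
static int copy_if_fits(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dstsize)
        return -1;              /* would not fit: report, do not truncate */
    memcpy(dst, src, len + 1);  /* length already known, copy NUL too */
    return 0;
}
```

    The caller decides what a failure means: allocate a larger buffer and retry, or propagate the error. Either way, no silently shortened string escapes into the rest of the program.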

    If you'll be doing a lot of string concatenation/assembly, you really should write all your code to manage the length and current position as you go. Instead of:

    strcpy(out, str1);
    strcat(out, str2);
    strcat(out, str3);
    ...
    

    You should be doing something like:

    size_t l, n = outsize;  /* n = bytes still free in out */
    char *s = out;          /* s = current write position */
    
    l = strlen(str1);
    if (l>=n) goto error;   /* need room for l bytes plus the NUL */
    strcpy(s, str1);
    s += l;
    n -= l;
    
    l = strlen(str2);
    if (l>=n) goto error;
    strcpy(s, str2);
    s += l;
    n -= l;
    
    ...
    

    Alternatively you could avoid modifying the pointer by keeping a current index i of type size_t and using out+i, or you could avoid the use of size variables by keeping a pointer to the end of the buffer and doing things like if (l>=end-s) goto error;.
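
    A rough sketch of the end-pointer variant, assuming a hypothetical helper named append_until (not a library function) that returns the new write position or NULL when the piece does not fit:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for the end-pointer style: s is the current write
 * position, end points one past the last byte of the buffer. Returns the
 * new write position, or NULL if src (plus its NUL) does not fit. */
static char *append_until(char *s, const char *end, const char *src)
{
    assert(s != NULL && end != NULL && s <= end && src != NULL);
    size_t len = strlen(src);
    if (len >= (size_t)(end - s))
        return NULL;            /* nothing is written on failure */
    memcpy(s, src, len + 1);
    return s + len;             /* points at the NUL just written */
}
```

    Usage would look like s = append_until(s, end, str1); if (!s) goto error; with end computed once as out + outsize, so no separate size variable has to be kept in sync.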

    Note that, whichever approach you choose, the redundancy can be condensed by writing your own (simple) functions that take pointers to the position/size variable and call the standard library, for instance something like:

    if (!my_strcpy(&s, &n, str1)) goto error;
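
    The answer does not show my_strcpy itself. One possible implementation, assuming it returns nonzero on success and takes the position and remaining-size variables by pointer as in the call above, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical my_strcpy: appends src at *pos if it fits in the *left
 * bytes remaining (terminator included), then advances *pos and shrinks
 * *left. Returns 1 on success, 0 if src does not fit (nothing is written). */
static int my_strcpy(char **pos, size_t *left, const char *src)
{
    assert(pos && *pos && left && src);
    size_t len = strlen(src);
    if (len >= *left)
        return 0;
    memcpy(*pos, src, len + 1);
    *pos += len;    /* now points at the NUL, ready for the next piece */
    *left -= len;
    return 1;
}
```

    Because the write position is carried forward, each piece is scanned once by strlen and never rescanned, which is the same property that makes this approach cheaper than repeated strcat.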
    

    Avoiding strcat also has performance benefits; see Schlemiel the Painter's algorithm.

    Finally, you should note that a good 75% of the string copying and assembly people perform in C is utterly useless. My theory is that the people doing it come from backgrounds in scripting languages, where putting together strings is what you do all the time, but in C it's not useful that often. In many cases, you can get by with never copying strings at all, using the original copies instead, and get much better performance and simpler code at the same time. I'm reminded of a recent SO question where OP was using regexec to match a regular expression, then copying out the result just to print it, something like:

    char *tmp = malloc(match.end-match.start+1);
    memcpy(tmp, src+match.start, match.end-match.start);
    tmp[match.end-match.start] = 0;
    printf("%s\n", tmp);
    free(tmp);
    

    The same thing can be accomplished with:

    printf("%.*s\n", (int)(match.end-match.start), src+match.start);
    

    No allocations, no cleanup, no error cases (the original code crashed if malloc failed).
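
    For a fuller picture, here is a sketch of the same trick with the POSIX <regex.h> interface, where the match offsets are the rm_so and rm_eo fields of regmatch_t (match.start and match.end above are shorthand). The function name print_first_match and its return convention are assumptions made for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: prints the first match of pattern in src to out,
 * followed by a newline, without copying the matched text. Returns the
 * fprintf result on a match, or -1 if the pattern fails to compile or
 * nothing matches. */
static int print_first_match(FILE *out, const char *pattern, const char *src)
{
    regex_t re;
    regmatch_t m;
    int written = -1;

    assert(out != NULL && pattern != NULL && src != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, src, 1, &m, 0) == 0) {
        /* %.*s takes an int precision, so the length is cast explicitly. */
        written = fprintf(out, "%.*s\n", (int)(m.rm_eo - m.rm_so),
                          src + m.rm_so);
    }
    regfree(&re);
    return written;
}
```

    The precision argument limits how many bytes printf reads from src + m.rm_so, so the substring never needs its own NUL terminator or its own buffer.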
