Which string manipulation functions should I use?

轮回少年 2021-02-13 00:07

In my Windows/Visual C environment there is a wide range of alternatives for performing the same basic string manipulation tasks.

For example, for doing a string copy I coul

6 Answers
  • 2021-02-13 00:13

    Among those choices, I would simply use strcpy. At least strcpy_s and lstrcpy are cruft that should never be used. It's possibly worthwhile to investigate those independently written library functions, but I'd be hesitant to throw around nonstandard library code as a panacea for string safety.

    If you're using strcpy, you need to be sure your string fits in the destination buffer. If you just allocated it with size at least strlen(source)+1, you're fine as long as the source string is not simultaneously subject to modification by another thread. Otherwise you need to test if it fits in the buffer. You can use interfaces like snprintf or strlcpy (nonstandard BSD function, but easy to copy an implementation) which will truncate strings that don't fit in your destination buffer, but then you really need to evaluate whether string truncation could lead to vulnerabilities in itself. I think a much better approach when testing whether the source string fits is to make a new allocation or return an error status rather than performing blind truncation.
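
    As a rough sketch of the "return an error status rather than truncate" approach: copy_checked below is a hypothetical helper name, not part of any standard or Windows library, and it assumes the caller knows the true size of the destination buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions for illustration).
 * Copies src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success, -1 if it does not fit; dst is untouched on failure. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (len >= dstsize)
        return -1;              /* report the problem instead of truncating */
    memcpy(dst, src, len + 1);  /* length already known, so memcpy suffices */
    return 0;
}
```

    The caller then decides what a failure means (allocate a larger buffer, reject the input, and so on) instead of silently continuing with a shortened string.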

    If you'll be doing a lot of string concatenation/assembly, you really should write all your code to manage the length and current position as you go. Instead of:

    strcpy(out, str1);
    strcat(out, str2);
    strcat(out, str3);
    ...
    

    You should be doing something like:

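    size_t l, n = outsize;   /* n = bytes still free at s */
    char *s = out;
    
    l = strlen(str1);
    if (l>=n) goto error;    /* must leave room for the terminating NUL */
    strcpy(s, str1);
    s += l;
    n -= l;
    
    l = strlen(str2);
    if (l>=n) goto error;    /* compare against what is left, not outsize */
    strcpy(s, str2);
    s += l;
    n -= l;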
    
    ...
    

    Alternatively you could avoid modifying the pointer by keeping a current index i of type size_t and using out+i, or you could avoid the use of size variables by keeping a pointer to the end of the buffer and doing things like if (l>=end-s) goto error;.
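
    A minimal sketch of the end-pointer variant, written as a loop over an array of strings. The function name join_strings and its parameters are assumptions made for this illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: concatenates count strings into out[0..outsize),
 * tracking the remaining space with an end pointer rather than a size
 * variable. Returns 0 on success, -1 if the result would not fit. */
int join_strings(char *out, size_t outsize,
                 const char *const *parts, size_t count)
{
    char *s = out;
    char *end = out + outsize;
    size_t i;

    if (outsize == 0)
        return -1;
    *s = '\0';                       /* result is a valid string even if count == 0 */
    for (i = 0; i < count; i++) {
        size_t l = strlen(parts[i]);
        if (l >= (size_t)(end - s))  /* keep one byte for the NUL */
            return -1;
        memcpy(s, parts[i], l + 1);
        s += l;
    }
    return 0;
}
```

    Each piece is measured once and written once, so the cost stays linear in the total length.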

    Note that, whichever approach you choose, the redundancy can be condensed by writing your own (simple) functions that take pointers to the position/size variable and call the standard library, for instance something like:

    if (!my_strcpy(&s, &n, str1)) goto error;
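
    The answer does not show my_strcpy itself. One possible shape for it, assuming it returns nonzero on success and takes pointers to the write position and the remaining size, might be:

```c
#include <stddef.h>
#include <string.h>

/* One possible my_strcpy (an assumption, not the answerer's actual code).
 * *pos is the current write position and *rem is the number of bytes still
 * free at *pos. Returns 1 on success, 0 if src does not fit. */
int my_strcpy(char **pos, size_t *rem, const char *src)
{
    size_t l = strlen(src);

    if (l >= *rem)
        return 0;               /* no room for the string plus its NUL */
    memcpy(*pos, src, l + 1);
    *pos += l;                  /* next write overwrites the NUL */
    *rem -= l;
    return 1;
}
```

    On failure both *pos and *rem are left unchanged, so the buffer still holds the last successfully assembled string.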
    

    Avoiding strcat also has performance benefits; see Schlemiel the Painter's algorithm.

    Finally, you should note that a good 75% of the string copying and assembly people perform in C is utterly useless. My theory is that the people doing it come from backgrounds in scripting languages, where putting strings together is what you do all the time, but in C it is not useful that often. In many cases you can get by without copying strings at all, using the original copies instead, and get much better performance and simpler code at the same time. I'm reminded of a recent SO question where the OP was using regexec to match a regular expression, then copying out the result just to print it, something like:

    char *tmp = malloc(match.end-match.start+1);
    memcpy(tmp, src+match.start, match.end-match.start);
    tmp[match.end-match.start] = 0;
    printf("%s\n", tmp);
    free(tmp);
    

    The same thing can be accomplished with:

    printf("%.*s\n", (int)(match.end-match.start), src+match.start);
    

    No allocations, no cleanup, no error cases (the original code crashed if malloc failed).
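
    To make the precision trick concrete, here is a small sketch that uses plain offsets instead of a regex match. format_span and its parameters are hypothetical names for this example; it writes through snprintf only so that the result can be inspected. Note that the precision argument for %.*s must be an int, hence the cast.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the bytes src[start..end) followed by a
 * newline into buf, with no temporary copy of the substring.
 * Returns whatever snprintf returns. Assumes start <= end and that
 * end - start fits in an int. */
int format_span(char *buf, size_t bufsize,
                const char *src, size_t start, size_t end)
{
    return snprintf(buf, bufsize, "%.*s\n", (int)(end - start), src + start);
}
```

    Calling format_span(buf, sizeof buf, "hello, world", 7, 12) with a large enough buf leaves "world\n" in it.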

  • 2021-02-13 00:15

    I would suggest using functions from the standard library, or functions from cross-platform libraries.

  • 2021-02-13 00:24

    First of all, let's review pros and cons of each function set:

    ANSI C standard library functions (CRT)

    Functions like strcpy are the one and only choice if you are developing portable C code. Even in a Windows-only project, it may be wise to keep portable code separate from OS-dependent code.
    These functions are often optimized at the assembly level and are therefore very fast.
    There are some drawbacks:

    • they have many limitations and therefore often you still have to call functions from other libraries or provide your own versions
    • there are some archaisms like the infamous strncpy

    Kernel32 string functions

    Functions like lstrcpy are exported by kernel32 and should be used only when trying to avoid any dependency on the CRT. You might want to do that for two reasons:

    • avoiding the CRT payload for an ultra lightweight executable (unusual these days but not 10 years ago!)
    • avoiding initialization issues (if you launch a thread with CreateThread instead of _beginthread).

    Moreover, the kernel32 functions could be better optimized than the CRT versions: if your executable one day runs on a Windows 9 optimized for a Core i13, kernel32 could supply an assembly-optimized version for it.

    Shell Lightweight Utility Functions

    The same considerations made for the kernel32 functions apply here, with the added value of some more complex functions. However, I doubt that they are actively maintained and I would just skip them.

    StrSafe Functions

    The StringCchCopy/StringCbCopy functions are usually my personal choice: they are very well designed, powerful, and surprisingly fast (I also remember a whitepaper that compared performance of these functions to the CRT equivalents).

    Security-Enhanced CRT functions

    These functions have the undoubted benefit of being very similar to their ANSI C equivalents, so porting legacy code is a piece of cake. I especially like the template-based versions (available, of course, only when compiling as C++). I really hope that they will eventually be standardized. Unfortunately they have a number of drawbacks:

    • although a proposed standard, they have been basically rejected by the non-Windows community (probably just because they came from Microsoft)
    • when they fail, they don't just return an error code but execute an invalid parameter handler

    Conclusions

    While my personal favorite for Windows development is the StrSafe library, my advice is to use the ANSI C functions whenever possible, as portable code is always a good thing.

    In real life, I developed a personalized portable library, with prototypes similar to the Security-Enhanced CRT functions (including the powerful template-based technique), that relies on the StrSafe library on Windows and on the ANSI C functions on other platforms.

  • 2021-02-13 00:26

    My personal preference, for both new and existing projects, is the StringCchCopy/StringCbCopy versions from the safe string library. I find these functions to be very consistent and flexible overall, and they were designed from the ground up with safety and security in mind.

  • 2021-02-13 00:34

    I'd answer this question slightly differently. Do you want portable code or not? If you want to be portable, you cannot rely on anything other than strcpy, strncpy, or the standard wide-character "string" handling functions.

    Then if your code just has to run under Windows you can use the "safe string" variants.

    If you want to be portable and still want some extra safety, then you should look at cross-platform libraries such as glib or libapr, or other "safe string libraries" such as SafeStrLibrary.

  • 2021-02-13 00:38

    I would stick to one set. I would pick whichever belongs to the most useful library, in case you need to use more of that library, and I would stay away from the kernel32.dll one as it's Windows-only.

    But these are just tips, it's a subjective question.
