strcpy() return value

攒了一身酷 2020-11-30 02:56

A lot of the functions from the standard C library, especially the ones for string manipulation, and most notably strcpy(), share the following prototype:

ch         


        
6 Answers
  • 2020-11-30 03:40

    char *stpcpy(char *dest, const char *src); returns a pointer to the end of the string, and is part of POSIX.1-2008. Before that, it was a GNU libc extension since 1992. It first appeared in Lattice C AmigaDOS in 1986.

    gcc -O3 will in some cases optimize strcpy + strcat to use stpcpy or strlen + inline copying, see below.
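
    As a minimal sketch of what stpcpy buys you, assuming a POSIX.1-2008 system (it is not ISO C): the helper name join3 is made up for illustration, and the caller is assumed to provide a large enough buffer.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX.1-2008 declarations such as stpcpy */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes a + sep + b into dst and returns the total
   length. The caller must make dst large enough; nothing is bounds-checked. */
size_t join3(char *dst, const char *a, const char *sep, const char *b)
{
    assert(dst && a && sep && b);
    char *end = stpcpy(dst, a);   /* end points at the terminating '\0' */
    end = stpcpy(end, sep);       /* append without rescanning dst */
    end = stpcpy(end, b);
    return (size_t)(end - dst);
}
```

    Each call continues from where the previous one stopped, so no character of dst is ever scanned again.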


    C's standard library was designed very early, and it's very easy to argue that the str* functions are not optimally designed. The I/O functions were definitely designed very early, in 1972 before C even had a preprocessor, which is why fopen(3) takes a mode string instead of a flag bitmap like Unix open(2).

    I haven't been able to find a list of functions included in Mike Lesk's "portable I/O package", so I don't know whether strcpy in its current form dates all the way back to there or if those functions were added later. (The only real source I've found is Dennis Ritchie's widely-known C History article, which is excellent but not that in depth. I didn't find any documentation or source code for the actual I/O package itself.)

    They do appear in their current form in K&R first edition, 1978.


    Functions should return the result of computation they do, if it's potentially useful to the caller, instead of throwing it away. Either as a pointer to the end of the string, or an integer length. (A pointer would be natural.)

    As @R says:

    We all wish these functions returned a pointer to the terminating null byte (which would reduce a lot of O(n) operations to O(1))

    e.g. calling strcat(bigstr, newstr[i]) in a loop to build up a long string from many short (O(1) length) strings has approximately O(n^2) complexity, but strlen/memcpy will only look at each character twice (once in strlen, once in memcpy).

    Using only the ANSI C standard library, there's no way to efficiently only look at every character once. You could manually write a byte-at-a-time loop, but for strings longer than a few bytes, that's worse than looking at each character twice with current compilers (which won't auto-vectorize a search loop) on modern HW, given efficient libc-provided SIMD strlen and memcpy. You could use length = sprintf(bigstr, "%s", newstr[i]); bigstr+=length;, but sprintf() has to parse its format string and is not fast.
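
    The strlen/memcpy approach can be sketched like this, using only ISO C. The name concat_all is hypothetical, and the caller is assumed to have sized dst for the whole result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenates count strings into dst and returns a
   pointer to the terminating '\0'. The caller guarantees dst has room. */
char *concat_all(char *dst, const char *const *parts, size_t count)
{
    assert(dst && (parts || count == 0));
    char *end = dst;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);  /* first pass over parts[i] */
        memcpy(end, parts[i], len);     /* second pass; dst is never rescanned */
        end += len;
    }
    *end = '\0';
    return end;
}
```

    Total work is proportional to the output length, instead of growing quadratically as it does with repeated strcat.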

    There isn't even a version of strcmp or memcmp that returns the position of the difference. If that's what you want, you have the same problem as in "Why is string comparison so fast in python?": an optimized library function runs faster than anything you can do with a compiled loop (unless you have hand-optimized asm for every target platform you care about), so you use it to get close to the differing byte, then fall back to a regular loop once you are close.
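
    A rough sketch of that "get close with memcmp, then loop" idea. The name first_mismatch and the 64-byte block size are assumptions for illustration, not tuned values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first differing byte in a[0..n) and
   b[0..n), or n if the two buffers are equal. */
size_t first_mismatch(const void *a, const void *b, size_t n)
{
    assert(n == 0 || (a && b));
    const unsigned char *pa = a, *pb = b;
    const size_t block = 64;  /* arbitrary chunk size (an assumption) */
    size_t i = 0;

    /* Skip whole blocks that compare equal, using the optimized memcmp. */
    while (n - i >= block && memcmp(pa + i, pb + i, block) == 0)
        i += block;

    /* Scan bytewise inside the first differing block, or the short tail. */
    while (i < n && pa[i] == pb[i])
        i++;
    return i;
}
```

    The byte loop never runs for more than one block, so nearly all of the work is done inside memcmp.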

    It seems that C's string library was designed without regard to the O(n) cost of any operation, not just finding the end of implicit-length strings, and strcpy's behaviour is definitely not the only example.

    They basically treat implicit-length strings as whole opaque objects, always returning pointers to the start, never to the end or to a position inside one after searching or appending.


    History guesswork

    In early C on a PDP-11, I suspect that strcpy was no more efficient than while(*dst++ = *src++) {} (and was probably implemented that way).

    In fact, K&R first edition (page 101) shows that implementation of strcpy and says:

    Although this may seem cryptic at first sight, the notational convenience is considerable, and the idiom should be mastered, if for no other reason than that you will see it frequently in C programs.

    This implies they fully expected programmers to write their own loops in cases where you wanted the final value of dst or src. And thus maybe they didn't see a need to redesign the standard library API until it was too late to expose more useful APIs for hand-optimized asm library functions.
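
    For example, a hand-written loop that keeps the final value of dst might look like this (copy_end is a made-up name, and this is only a sketch of what such caller-written code could have looked like):

```c
#include <assert.h>

/* Hypothetical name: a K&R-style copy loop that returns a pointer to the
   terminating '\0' it wrote into dst, instead of the start of dst. */
char *copy_end(char *dst, const char *src)
{
    assert(dst && src);
    while ((*dst = *src) != '\0') {  /* copy one char, stop after the '\0' */
        dst++;
        src++;
    }
    return dst;  /* points at the terminator, ready for the next append */
}
```

    Note that the one-line idiom while(*dst++ = *src++) leaves dst one past the terminator, which is why this version increments separately.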


    But does returning the original value of dst make any sense?

    strcpy(dst, src) returning dst is analogous to x=y evaluating to the x. So it makes strcpy work like a string assignment operator.

    As other answers point out, this allows nesting, like foo( strcpy(buf,input) );. Early computers were very memory-constrained. Keeping your source code compact was common practice. Punch cards and slow terminals were probably a factor in this. I don't know historical coding standards or style guides or what was considered too much to put on one line.

    Crusty old compilers may also have been a factor. With modern optimizing compilers, char *tmp = foo(); / bar(tmp); is no slower than bar(foo());, but it is with gcc -O0. I don't know if very early compilers could optimize variables away completely (not reserving stack space for them), but hopefully they could at least keep them in registers in simple cases. gcc -O0 isn't a good model for ancient compilers, because it deliberately spills/reloads everything for consistent debugging.


    Possible compiler-generated-asm motivation

    Given the lack of care about efficiency in the general API design of the C string library, this might be unlikely. But perhaps there was a code-size benefit. (On early computers, code-size was more of a hard limit than CPU time).

    I don't know much about the quality of early C compilers, but it's a safe bet that they were not awesome at optimizing, even for a nice simple / orthogonal architecture like PDP-11.

    It's common to want the string pointer after the function call. At an asm level, you (the compiler) probably have it in a register before the call. Depending on the calling convention, you either push it on the stack or copy it to the register where the calling convention says the first arg goes (i.e. where strcpy is expecting it). Or if you're planning ahead, you already had the pointer in the right register for the calling convention.

    But function calls clobber some registers, including all the arg-passing registers. (So when a function gets an arg in a register, it can increment it there instead of copying to a scratch register.)

    So as the caller, your code-gen options for keeping something across a function call include:

    • store/reload it to local stack memory. (Or just reload it if an up-to-date copy is still in memory).
    • save/restore a call-preserved register at the start/end of your whole function, and copy the pointer to one of those registers before the function call.
    • the function returns the value in a register for you. (Of course, this only works if the C source is written to use the return value instead of the input variable. e.g. dst = strcpy(dst, src); if you aren't nesting it).

    All calling conventions on all architectures I'm aware of return pointer-sized return values in a register, so having maybe one extra instruction in the library function can save code-size in all callers that want to use that return value.

    You probably got better asm from primitive early C compilers by using the return value of strcpy (already in a register) than by making the compiler save the pointer around the call in a call-preserved register or spill it to the stack. This may still be the case.

    BTW, on many ISAs, the return-value register is not the first arg-passing register. And unless you use base+index addressing modes, it does cost an extra instruction (and tie up another reg) for strcpy to copy the register for a pointer-increment loop.

    PDP-11 toolchains normally used some kind of stack-args calling convention, always pushing args on the stack. I'm not sure how many call-preserved vs. call-clobbered registers were normal, but only 5 or 6 GP regs were available (R7 being the program counter, R6 being the stack pointer, R5 often used as a frame pointer). So it's similar to but even more cramped than 32-bit x86.

    char *bar(char *dst, const char *str1, const char *str2)
    {
        //return strcat(strcat(strcpy(dst, str1), "separator"), str2);
    
        // more readable to modern eyes:
        dst = strcpy(dst, str1);
        dst = strcat(dst, "separator");
    //    dst = strcat(dst, str2);
    
        return dst;  // simulates further use of dst
    }
    
      # x86 32-bit gcc output, optimized for size (not speed)
      # gcc8.1 -Os  -fverbose-asm -m32
      # input args are on the stack, above the return address
    
        push    ebp     #
        mov     ebp, esp  #,      Create a stack frame.
    
        sub     esp, 16   #,      This looks like a missed optimization, wasted insn
        push    DWORD PTR [ebp+12]      # str1
        push    DWORD PTR [ebp+8]       # dst
        call    strcpy  #
        add     esp, 16   #,
    
        mov     DWORD PTR [ebp+12], OFFSET FLAT:.LC0      # store new args over our incoming args
        mov     DWORD PTR [ebp+8], eax    #  EAX = dst.
        leave   
        jmp     strcat                  # optimized tailcall of the last strcat
    

    This is significantly more compact than a version which doesn't use dst =, and instead reuses the input arg for the strcat. (See both on the Godbolt compiler explorer.)

    The -O3 output is very different: gcc for the version that doesn't use the return value uses stpcpy (returns a pointer to the tail) and then mov-immediate to store the literal string data directly to the right place.

    But unfortunately, the dst = strcpy(dst, src) -O3 version still uses regular strcpy, then inlines strcat as strlen + mov-immediate.


    To C-string or not to C-string

    C implicit-length strings aren't always inherently bad, and have interesting advantages (e.g. a suffix is also a valid string, without having to copy it).
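
    A small illustration of the suffix property (skip_prefix is a hypothetical helper, not a library function): the returned pointer is a complete string in its own right, and nothing is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if s starts with prefix, return the rest of s;
   otherwise return s unchanged. The result points into s itself. */
const char *skip_prefix(const char *s, const char *prefix)
{
    assert(s && prefix);
    size_t n = strlen(prefix);
    return strncmp(s, prefix, n) == 0 ? s + n : s;
}
```

    With an explicit-length representation, the same operation needs either a copy or a separate (pointer, length) view type.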

    But the C string library is not designed in a way that makes efficient code possible, because char-at-a-time loops typically don't auto-vectorize and the library functions throw away results of work they have to do.

    gcc and clang never auto-vectorize loops unless the iteration count is known before the first iteration, e.g. for(int i=0; i<n ;i++). ICC can vectorize search loops, but it's still unlikely to do as well as hand-written asm.


    strncpy and so on are basically a disaster. e.g. strncpy doesn't copy the terminating '\0' if it reaches the buffer size limit. It appears to have been designed for writing into the middle of larger strings, not for avoiding buffer overflows. Not returning a pointer to the end means you have to arr[n] = 0; before or afterwards, potentially touching a page of memory that never needed to be touched.
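
    To make the problem concrete: strncpy(buf, "abcdef", 4) fills a 4-byte buf with 'a', 'b', 'c', 'd' and no terminator. One way to get truncation that always terminates, using only ISO C, is sketched below. The name copy_truncate is an assumption, and note that it still runs strlen over the whole source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst[0..size), truncating if needed,
   and always writes a terminating '\0'. Returns 1 if src fit completely,
   0 if it was truncated. size must be at least 1. */
int copy_truncate(char *dst, size_t size, const char *src)
{
    assert(dst && src && size > 0);
    size_t len = strlen(src);
    int fits = len < size;
    if (!fits)
        len = size - 1;       /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';          /* touches only the byte right after the data */
    return fits;
}
```

    Unlike strncpy, this does not zero-fill the rest of the buffer, and the return value tells the caller whether truncation happened.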

    A few functions like snprintf are usable and do always nul-terminate. Remembering which does which is hard, and a huge risk if you remember wrong, so you have to check every time in cases where it matters for correctness.
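
    A sketch of checking snprintf for truncation, relying on the C99 behaviour that it returns the length the full output would have had. The helper name format_pair is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats "name=value" into dst. Returns 1 if the
   whole result fit, 0 if it was truncated or an encoding error occurred. */
int format_pair(char *dst, size_t size, const char *name, const char *value)
{
    assert(dst && name && value && size > 0);
    int n = snprintf(dst, size, "%s=%s", name, value);
    /* n is the untruncated length (excluding the '\0'), or negative on error */
    return n >= 0 && (size_t)n < size;
}
```

    Even when this returns 0 because of truncation, dst still holds a terminated string, which is the property strncpy lacks.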

    As Bruce Dawson says: "Stop using strncpy already!" Apparently some MSVC extensions like _snprintf are even worse.

  • 2020-11-30 03:45

    As Evan pointed out, it is possible to do something like

    char* s = strcpy(malloc(10), "test");
    

    i.e. assign a value to malloc()ed memory without using a helper variable.

    (this example isn't the best one, it will crash on out of memory conditions, but the idea is obvious)
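
    A variant that survives a failed allocation might look like the sketch below. The name dup_string is hypothetical, and it leans on memcpy's return value (also the destination pointer) in the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src, or NULL if
   malloc fails. The caller is responsible for calling free() on it. */
char *dup_string(const char *src)
{
    assert(src);
    size_t size = strlen(src) + 1;   /* include the terminating '\0' */
    char *copy = malloc(size);
    /* memcpy returns its destination, so the copy can be returned directly */
    return copy ? memcpy(copy, src, size) : NULL;
}
```

    Usage would be char *s = dup_string("test"); followed by a NULL check and, eventually, free(s);.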

  • 2020-11-30 03:45

    I don't think this is really set up this way for nesting purposes, but more for error checking. If memory serves, none of the C standard library functions do much error checking on their own, and therefore it makes more sense that this would be to determine if something went awry during the strcpy call.

    if(strcpy(dest, source) == NULL) {
      // Something went horribly wrong, now we deal with it
    }
    
  • 2020-11-30 04:00

    I believe that your guess is correct, it makes it easier to nest the call.

  • 2020-11-30 04:01

    It's also extremely easy to code.

    The return value is typically left in the AX register (it is not mandatory, but it is frequently the case). And the destination is put in the AX register when the function starts. To return the destination, the programmer needs to do.... exactly nothing! Just leave the value where it is.

    The programmer could declare the function as void. But that return value is already in the right spot, just waiting to be returned, and it doesn't even cost an extra instruction to return it! No matter how small the improvement, it is handy in some cases.

  • 2020-11-30 04:02

    Same concept as Fluent Interfaces. Just making code quicker/easier to read.
