What happens to memory after '\0' in a C string?

难免孤独 2020-12-23 11:25

Surprisingly simple/stupid/basic question, but I have no idea: Suppose I want to return the user of my function a C-string, whose length I do not know at the beginning of th

11 Answers
  • 2020-12-23 11:38

    You could do what some of the MS Windows APIs do where you (the caller) pass a pointer and the size of the memory you allocated. If the size isn't enough, you're told how many bytes to allocate. If it was enough, the memory is used and the result is the number of bytes used.

    Thus the decision about how to efficiently use memory is left to the caller. They can allocate a fixed 260 bytes (MAX_PATH, common when working with paths in Windows) and use the result from the function call to know whether more bytes are needed (rarely the case with paths, since MAX_PATH is 260 characters unless you bypass the usual Win32 path limits) or whether most of the bytes can be ignored. The caller could also pass zero as the memory size and be told exactly how much needs to be allocated: slower, since it takes two calls, but potentially more efficient space-wise.
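
    A minimal sketch of that calling convention in C. The function get_message, its fixed text, and its return convention are hypothetical and made up for illustration; this is not a real Windows API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical API: copies a message into the caller's buffer.
 * Returns the number of bytes required, including the terminating '\0'.
 * If the return value is greater than bufsize, nothing was written and
 * the caller should retry with a buffer of at least that size. */
size_t get_message(char *buf, size_t bufsize)
{
    static const char message[] = "hello from the callee";
    size_t needed = sizeof message;          /* includes the '\0' */

    if (buf != NULL && bufsize >= needed)
        memcpy(buf, message, needed);        /* fits: copy text and terminator */

    return needed;
}
```

    A caller might try a fixed stack buffer first and fall back to malloc only when the returned size is larger than the buffer it passed.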

  • 2020-12-23 11:39

    To elaborate on the use of the null terminator in C: you cannot allocate a "C string". You can allocate a char array and store a string in it, but malloc and free just see it as an array of the requested length.

    A C string is not a data type but a convention for using a char array where the null character '\0' is treated as the string terminator. This is a way to pass strings around without having to pass a length value as a separate argument. Some other programming languages have explicit string types that store a length along with the character data to allow passing strings in a single parameter.

    Functions that document their arguments as "C strings" are passed char arrays but have no way of knowing how big the array is without the null terminator, so if it is not there, things will go horribly wrong.

    You will notice that functions that expect char arrays not necessarily treated as strings generally require a buffer length parameter. For example, if you want to process char data where a zero byte is a valid value, you can't use '\0' as a terminator character.
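
    To illustrate the difference, here is a small sketch. The helper names count_byte and length_as_string are hypothetical, chosen only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Counts bytes equal to `value` in a raw buffer. The length is passed
 * explicitly because a zero byte is ordinary data here, not a terminator. */
size_t count_byte(const unsigned char *data, size_t len, unsigned char value)
{
    size_t count = 0;
    for (size_t i = 0; i < len; ++i)
        if (data[i] == value)
            ++count;
    return count;
}

/* For contrast: a string function sees only the bytes before the first '\0'. */
size_t length_as_string(const unsigned char *data)
{
    return strlen((const char *)data);
}
```

    For a buffer holding the four bytes 'a', 0, 'b', 0, count_byte can see both zero bytes because it is told the length, while length_as_string stops at the first one.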

  • 2020-12-23 11:40

    Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called?

    There's nothing magical about \0. You have to call realloc if you want to "shrink" the allocated memory. Otherwise the memory will just sit there until you call free.

    If I stick a '\0' into the middle of the allocated memory, does (a.) free() still work properly

    Whatever you do in that memory, free will always work properly if you pass it the exact same pointer returned by malloc. Of course, if you write outside it, all bets are off.
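
    A short sketch of both points, assuming a hypothetical helper named make_trimmed_copy: the '\0' lands in the middle of an over-allocated block, and only the explicit realloc gives the unused tail back.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates `capacity` bytes, stores `text` in them,
 * then shrinks the block to exactly fit the string.
 * Returns NULL if capacity is too small or allocation fails.
 * The caller releases the result with free(). */
char *make_trimmed_copy(const char *text, size_t capacity)
{
    size_t needed = strlen(text) + 1;        /* characters plus '\0' */
    if (capacity < needed) return NULL;

    char *buf = malloc(capacity);            /* over-allocated on purpose */
    if (buf == NULL) return NULL;

    memcpy(buf, text, needed);               /* '\0' lands mid-block; the rest
                                                of the block stays allocated */

    char *shrunk = realloc(buf, needed);     /* explicitly give back the tail */
    return shrunk != NULL ? shrunk : buf;    /* if realloc fails, the original
                                                block is still valid */
}
```

    Without the realloc call the function would still be correct; the block would simply stay at its full capacity until it is freed.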

  • 2020-12-23 11:40

    You can certainly preallocate to an upper bound, and use all or something less. Just make sure you actually use all or something less.

    Making two passes is also fine.

    You asked the right questions about the tradeoffs.

    How do you decide?

    Use two passes, initially, because:

    1. you'll know you aren't wasting memory.
    2. you're going to profile to find out where
       you need to optimize for speed anyway.
    3. upper bounds are hard to get right before
       you've written and tested and modified and
       used and updated the code in response to new
       requirements for a while.
    4. simplest thing that could possibly work.
    

    You might tighten up the code a little, too. Shorter is usually better. And the more the code takes advantage of known truths, the more comfortable I am that it does what it says.

    #include <stdlib.h>

    char* copyWithoutDuplicateChains(const char* str)
        {
        if (str == NULL) return NULL;

        const char* s = str;
        char prev = *s;               // first character of the current chain
        size_t outlen = 1;            // first character (or terminator) counted

        // Determine length necessary by mimicking processing

        while (*s)
            { while (*++s == prev);  // skip duplicates
              ++outlen;              // new character (or terminator) encountered
              prev = *s;             // restart chain
            }

        // Construct output

        char* outstr = (char*)malloc(outlen);
        if (outstr == NULL) return NULL;

        char* out = outstr;           // write cursor; outstr keeps the start
        s = str;
        prev = *s;                    // reset chain state for the second pass
        *out++ = *s;                  // first character copied
        while (*s)
            { while (*++s == prev);   // skip duplicates
              *out++ = *s;            // copy new character (or terminator)
              prev = *s;              // restart chain
            }

        // done

        return outstr;                // start of the buffer, not the cursor
        }
    
  • 2020-12-23 11:41

    Generally, memory is memory is memory. It doesn't care what you write into it. BUT it does have an origin, or if you prefer, a flavor (malloc, new, VirtualAlloc, HeapAlloc, etc.). This means that the party that allocates a piece of memory must also provide the means to deallocate it. If your API comes in a DLL, then it should provide a free function of some sort. This of course puts a burden on the caller, right? So why not put the WHOLE burden on the caller? The BEST way to deal with dynamically allocated memory is to NOT allocate it yourself. Have the caller allocate it and pass it on to you. They know what flavor they allocated, and they are responsible for freeing it whenever they are done using it.

    How does the caller know how much to allocate? Like many Windows APIs, have your function return the required number of bytes when called with, e.g., a NULL pointer, then do the job when provided with a non-NULL pointer. (You could double-check accessibility with IsBadWritePtr, but Microsoft documents that function as obsolete, so it is usually better to skip the check.)
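
    The C standard library has the same convention built in: snprintf called with a NULL buffer and size 0 only reports the length it would need. A minimal sketch of a caller using it, where the helper name format_pair is hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats "<name>=<value>" into a freshly allocated
 * string using the two-call idiom. The caller frees the result. */
char *format_pair(const char *name, int value)
{
    /* First call: a NULL buffer with size 0 writes nothing and returns the
     * required length, excluding the terminating '\0'. */
    int len = snprintf(NULL, 0, "%s=%d", name, value);
    if (len < 0) return NULL;

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) return NULL;

    /* Second call: same arguments, now with real storage. */
    snprintf(buf, (size_t)len + 1, "%s=%d", name, value);
    return buf;
}
```

    Here snprintf plays the role of the callee that never allocates, and format_pair is the caller that picks the allocator. It could just as well format into a stack buffer or a reused one.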

    This can also be much much more efficient. Memory allocations COST a lot. Too many memory allocations cause heap fragmentation and then the allocations cost even more. That's why in kernel mode we use the so called "look-aside lists". To minimize the number of memory allocations done, we reuse the blocks we have already allocated and "freed", using services that the NT Kernel provides to driver writers. If you pass on the responsibility for memory allocation to your caller, then he might be passing you cheap memory from the stack (_alloca), or passing you the same memory over and over again without any additional allocations. You don't care of course, but you DO allow your caller to be in charge of optimal memory handling.

  • 2020-12-23 11:43

    \0 is just one more character from the perspective of malloc and free; they don't care what data you put in the memory. So free will still work whether you add \0 in the middle or don't add \0 at all. The extra space allocated will still be there; it won't be handed back to the allocator as soon as you add \0 to the memory. I personally would prefer to allocate only the required amount of memory instead of allocating to some upper bound, as that just wastes the resource.
