Crash when handling char * init'd with string literal, but not with malloc

左心房为你撑大大i posted on 2020-01-04 05:04:36

Question


I was reading a book on C today, and it mentioned that the following is true. I was curious why, so I wrote this program to verify it, and I am posting it here so that someone who knows more than I do can explain why these two cases differ at runtime.

Specifically, the question is about how a (char *) is handled at runtime depending on whether it points to a string created as a literal or to one created with malloc and populated manually.

Why is the memory that holds the literal protected like this? Also, does the answer explain the meaning of "bus error"?

Here is a program that asks the user whether it should crash or not. It shows that the program compiles fine, and that in my head the code in both branches is conceptually identical. That is why I'm here: to understand why they are not.

// Demonstrate the difference between initializing a (char *)
// with a literal vs. malloc,
// and the mutability of the contents thereafter.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char cause_crash;
    char *myString;

    printf("Cause crash? ");
    if (scanf(" %c", &cause_crash) != 1) {
        return 1; // no input to act on
    }

    if (cause_crash == 'y') {
        myString = "ab"; // points at the literal itself
        printf("%s\n", myString); // ab
        *myString = 'x'; // CRASH! (writing to a string literal is undefined behavior)
        printf("%s\n", myString);
    } else {
        myString = malloc(3 * sizeof(char));
        if (myString == NULL) {
            return 1; // allocation failed
        }
        myString[0] = 'a';
        myString[1] = 'b';
        myString[2] = '\0';
        printf("%s\n", myString); // ab
        *myString = 'x'; // fine: heap memory is writable
        printf("%s\n", myString); // xb
        free(myString);
    }
    return 0;
}

edit: conclusions

There are several good answers below, but I want to summarize what I have come to understand succinctly here.

The basic answer seems to be this:

When a compiler sees a string literal being assigned to a (char *) variable, the pointer will point to static memory (perhaps part of the binary itself, and usually enforced as read-only by a lower-level system than your runtime). In other words, the memory is probably not dynamically allocated at that point in the program; instead, the pointer is simply set to point to an area of static memory that houses the contents of your literal.

There are a few things I want to call out about this resolution:

1. Optimization may be a possible motive: With my compiler, two different (char *) variables initialized with the same string literal actually point to the same address:

char *myString = "hello";
char *mySecond = "hello"; // the pointers are identical! This is a cool optimization.

2. Interestingly, if the variable is actually an array of chars (instead of a (char *)), #1 is not true. This was interesting to me because I was under the impression that (post-compilation) arrays were identical to pointers-to-chars.

char myArString[] = "hello";
char myArSecond[] = "hello"; // the two arrays do NOT share an address

3. To summarize what several answers hinted at: char *myString = "Hello, World!" does not allocate new memory. It just sets myString to point to memory that already existed, perhaps in the binary, perhaps in a special read-only block of memory, and so on.

4. I found through testing that char myString[] = "Hello, World!" does allocate new memory, I think. What I know for sure is that the string is mutable when created this way.


Answer 1:


When you set a pointer to a string literal, you are pointing it at a value stored in the read-only data section of the compiled program. These data items are constant, and attempts to modify them will most likely crash.

When you use malloc to get the memory, you are getting a pointer to read/write heap memory that you can do anything to.

There are a couple of reasons for this. For one thing, the actual type of "Hello, world" is char[13]: an array of 13 characters (12 plus the terminating null) that decays to a pointer to its first element. In C the elements are not const-qualified, for historical reasons, so assigning the literal to a plain char * compiles without complaint, but the characters should still be treated as constant. The compiler won't prevent you from changing the memory, but the C standard calls it undefined behavior. Undefined behavior can be anything, but here it is usually a crash.

If you want to assign a literal value to char* memory, do this:

char* data = malloc (42);
memcpy(data, "Hi!", 4);



Answer 2:


You really should have declared myString as a const char*. Literals are typically stored in read-only memory and must not be modified. Use a char[] if you need to modify the string.




Answer 3:


What

myString = "ab";

does is assign the address of the constant string literal, which lives in read-only memory, to the char pointer myString.

If you write to this memory now, you typically get a crash.

OTOH, you can, of course, happily write on malloc()ed memory, so that works.




Answer 4:


C standards specify that literal strings are static and that attempts to modify them result in undefined behavior. In other words they should be considered read-only.

The memory that you've allocated with malloc belongs to you and you can modify it in any way you like.

The actual differences can be implementation-dependent, but typically the two kinds of string live in different areas of memory:

  • the heap in the case of data obtained using malloc, and
  • the (read-only) data section in the case of string literals.



Answer 5:


What if you wrote this (it is not valid C, but it works as a thought experiment):

&mystring = &"ab";

What would that mean to you?

Would you think that you could then modify "ab" somehow? Where is &"ab"?

ANS: &"ab" is in read-only memory. When the compiler sees the quotes, it puts that string in immutable memory. Why? Probably because string data that should never change can then be shared between identical literals and kept in read-only pages.



Source: https://stackoverflow.com/questions/11379412/crash-when-handling-char-initd-with-string-literal-but-not-with-malloc
