C's strtok() and read-only string literals

感情败类 2020-11-27 08:27

char *strtok(char *s1, const char *s2)

Repeated calls to this function break string s1 into "tokens"--that is, the string is broken into substr

5 answers
  • 2020-11-27 08:56

    What did you initialize the char * to?

    If something like

    char *text = "foobar";
    

    then you have a pointer to the characters of a string literal, which must be treated as read-only.

    For

    char text[7] = "foobar";
    

    then you have a seven element array of characters that you can do what you like with.

    strtok writes into the string you give it: it overwrites the separator character that ends each token with a null character ('\0') and keeps a pointer to the rest of the string.

    Hence, if you pass it a read-only string, it will attempt to write to it. That is undefined behavior, and on most platforms it shows up as a segfault.
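A minimal sketch of the usual workaround, assuming the input may be a string literal: tokenize a writable copy rather than the original. The helper names `count_tokens` and `separator_was_overwritten` are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in src without modifying src.
 * strtok() writes '\0' over separators, so it is given a private,
 * writable copy instead of the (possibly read-only) original.
 * Returns 0 if the copy cannot be allocated. */
size_t count_tokens(const char *src, const char *delims)
{
    assert(src != NULL && delims != NULL);

    size_t len = strlen(src) + 1;          /* include the terminator */
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;
    memcpy(copy, src, len);

    size_t n = 0;
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims))
        n++;

    free(copy);
    return n;
}

/* Hypothetical demo of the in-place write: after the first call, the
 * separator that followed the first token has become '\0'.
 * Returns 1 if that is what happened. */
int separator_was_overwritten(void)
{
    char text[] = "foo bar";               /* array: writable storage */
    char *tok = strtok(text, " ");
    return tok == text && text[3] == '\0' && strcmp(tok, "foo") == 0;
}
```

Calling `count_tokens("a,b,,c", ",")` is safe even though the argument is a literal, because only the heap copy is ever written to.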

    Also, because strtok keeps a reference to the rest of the string, it's not reentrant: you can use it on only one string at a time. It's best avoided, really. Consider strsep(3) instead; see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string, so it has the same read-only/segfault issue).
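To illustrate the reentrancy point, here is a rough sketch of a tokenizer that keeps its scan position in a caller-owned pointer instead of hidden static storage. `next_token` and `count_words` are hypothetical names written for this example using only standard `strspn`/`strcspn`; this is not strsep itself, and unlike strsep it skips empty fields the way strtok does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a library function. Works like strtok(), but
 * the position is stored in *state, so several scans can be interleaved.
 * It still writes '\0' into the buffer, so the buffer must be writable. */
char *next_token(char **state, const char *delims)
{
    assert(state != NULL && delims != NULL);

    char *s = *state;
    if (s == NULL)
        return NULL;                       /* previous call hit the end */

    s += strspn(s, delims);                /* skip leading separators */
    if (*s == '\0') {
        *state = NULL;
        return NULL;
    }

    char *end = s + strcspn(s, delims);    /* first separator after token */
    if (*end != '\0') {
        *end = '\0';                       /* terminate the token in place */
        *state = end + 1;
    } else {
        *state = NULL;                     /* token ran to end of string */
    }
    return s;
}

/* Counts space-separated words across all ';'-separated fields by
 * nesting two independent scans, which plain strtok() cannot do. */
size_t count_words(char *buf)
{
    size_t n = 0;
    char *outer = buf;
    for (char *field = next_token(&outer, ";"); field != NULL;
         field = next_token(&outer, ";")) {
        char *inner = field;
        for (char *w = next_token(&inner, " "); w != NULL;
             w = next_token(&inner, " "))
            n++;
    }
    return n;
}
```

With plain strtok the inner loop would overwrite the saved position of the outer loop, so the outer scan would stop after the first field.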

  • 2020-11-27 09:07

    An important point that's inferred but not stated explicitly:

    Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn, mostly because of subtle misunderstandings of the underlying mechanisms, so I like to make things as plain as possible.

    As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:

    int x = 0;

    The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.

    When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.

    Note the subtle difference here: the memory location that the name x refers to is constant (and cannot be changed). However, the value stored at that location can be changed. You do it in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate commands to the compiler.

    When you have a statement like:

    char *name = "Tom";

    The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.

    But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and the terminating null character '\0'. Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.

    When your program is run, the final step occurs: setting the pointer to char's value (which isn't constant) to the memory address of that automatically created string (which is constant).

    So a char * is not read-only. Only a const char * is read-only. But your problem in this case isn't that char *s are read-only, it's that your pointer references a read-only region of memory.
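The distinction between the pointer and what it points at can be sketched roughly as follows. The function name `pointer_vs_pointee_demo` is made up for this example; the commented-out line is there only to show what a compiler would reject.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns 1 when each step behaves as the comments say. */
int pointer_vs_pointee_demo(void)
{
    /* The pointer variable itself is writable; the literal it points at
     * is not. Declaring the pointee const lets the compiler catch writes. */
    const char *name = "Tom";
    name = "Ann";                 /* OK: changes where name points */
    /* name[0] = 'B'; */          /* rejected at compile time: pointee is const */

    /* An array initialized from a literal is a private, writable copy. */
    char buf[] = "Tom";
    char *p = buf;                /* plain char *, but into writable storage */
    p[0] = 'R';                   /* OK: modifies buf, not the literal */
    assert(strcmp(buf, "Rom") == 0);

    return strcmp(name, "Ann") == 0 && strcmp(buf, "Rom") == 0;
}
```

The same `p[0] = 'R';` through a `char *` that points at the literal "Tom" would compile but is undefined behavior, which is the situation in the question.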

    I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.

    I hope this was helpful. ;)

  • 2020-11-27 09:07

    If you look at your compiler documentation, odds are there is an option you can set to make those strings writable.

  • 2020-11-27 09:14

    In brief:

    char *s = "HAPPY DAY";
    printf("\n %s ", s);
    
    s = "NEW YEAR"; /* Valid: s now points at a different literal */
    printf("\n %s ", s);
    
    s[0] = 'c'; /* Invalid: writes into a string literal (undefined behavior) */
    
  • 2020-11-27 09:17

    I blame the C standard.

    char *s = "abc";
    

    could have been defined to give the same error as

    const char *cs = "abc";
    char *s = cs;
    

    on grounds that string literals are unmodifiable. But it wasn't; it was defined to compile. Go figure. [Edit: Mike B has gone and figured it out: "const" didn't exist at all in K&R C. ISO C, plus every version of C and C++ since, has wanted to be backward-compatible. So it has to be valid.]

    If it had been defined to give an error, then you couldn't have got as far as the segfault, because strtok's first parameter is char*, so the compiler would have prevented you passing in the pointer generated from the literal.

    It may be of interest that there was at one time a plan in C++ for this to be deprecated (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc). But 12 years later I can't persuade either gcc or g++ to give me any kind of warning for assigning a literal to non-const char*, so it isn't all that loudly deprecated.

    [Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]
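As a rough sketch of code that stays quiet under -Wwrite-strings: keep literals behind `const char *` and use read-only scanning where no modification is needed. `first_token_len` and `const_correct_demo` are hypothetical names for this example; the commented-out lines mark what the compiler would complain about.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of the first token in s, found by scanning
 * only, so a const char * parameter is enough and literals are safe. */
size_t first_token_len(const char *s, const char *delims)
{
    assert(s != NULL && delims != NULL);
    s += strspn(s, delims);       /* skip leading separators */
    return strcspn(s, delims);    /* characters up to the next separator */
}

/* Hypothetical demo: returns 1 if the first token of the literal has length 3. */
int const_correct_demo(void)
{
    const char *s = "abc def";    /* no warning under -Wwrite-strings */
    /* char *bad = "abc def"; */  /* valid C, but flagged by -Wwrite-strings */
    /* strtok(s, " "); */         /* diagnosed: const char * passed as char * */
    return first_token_len(s, " ") == 3;
}
```

When the tokens themselves are needed, the literal still has to be copied into writable storage before strtok sees it.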
