In which data segment is the C string stored?

I'm wondering what the difference is between char s[] = "hello" and char *s = "hello".

As far as I know, there are five segments in memory: Text, BSS, Data, Stack and Heap.

From my understanding,

in case of char s[] = "hello":

  1. "hello" is in Text.
  2. s is in Data if it is a global variable or in Stack if it is a local variable.

  3. We also have a copy of "hello" where s is stored, so we can modify the value of this string via s.

in case of char *s = "hello":

  1. "hello" is in Text.
  2. s is in Data if it is a global variable or in Stack if it is a local variable.
  3. s just points to "hello" in Text and we don't have a copy of it, so modifying the string via this pointer should cause a "Segmentation Fault".

Am I right?


You are right that the string in the first case is mutable and the string in the second case is immutable. In both cases the characters start out as a read-only string literal before initialization.

In the first case the mutable memory is initialized by copying from the immutable string. In the second case the pointer refers directly to the immutable string.

For the first case, Wikipedia says:

The values for these variables are initially stored within the read-only memory (typically within .text) and are copied into the .data segment during the start-up routine of the program.

Let us examine a small test file, segment.c.

char*s = "hello"; // string
char sar[] = "hello"; // string array
char content[32];

int main(int argc, char*argv[]) {
        char psar[] = "parhello"; // local/private string array
        char*ps = "phello"; // private string
        content[0] = 1;
        sar[3] = 1; // OK
        // sar++; // not allowed
        // s[2] = 1; // segmentation fault
        s = sar;
        s[2] = 1; // OK
        psar[3] = 1; // OK
        // ps[2] = 1; // segmentation fault
        ps = psar;
        ps[2] = 1; // OK
        return 0;
}

Here is the assembly generated for segment.c. Note that both s and sar are globals and are placed in the .data segment. sar is not a pointer at all: it is an array, that is, mutable initialized memory at a fixed address. One consequence is that sizeof(sar) is 6 (five characters plus the terminating NUL), while sizeof(s) is 8 (the size of a pointer on this x86-64 target). The literals "hello" and "phello" are in the read-only .rodata section and are effectively immutable.

    .file   "segment.c"
    .globl  s
    .section    .rodata
.LC0:
    .string "hello"
    .data
    .align 8
    .type   s, @object
    .size   s, 8
s:
    .quad   .LC0
    .globl  sar
    .type   sar, @object
    .size   sar, 6
sar:
    .string "hello"
    .comm   content,32,32
    .section    .rodata
.LC1:
    .string "phello"
    .text
    .globl  main
    .type   main, @function
main:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $64, %rsp
    movl    %edi, -52(%rbp)
    movq    %rsi, -64(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    movl    $1752326512, -32(%rbp)
    movl    $1869376613, -28(%rbp)
    movb    $0, -24(%rbp)
    movq    $.LC1, -40(%rbp)
    movb    $1, content(%rip)
    movb    $1, sar+3(%rip)
    movq    $sar, s(%rip)
    movq    s(%rip), %rax
    addq    $2, %rax
    movb    $1, (%rax)
    movb    $1, -29(%rbp)
    leaq    -32(%rbp), %rax
    movq    %rax, -40(%rbp)
    movq    -40(%rbp), %rax
    addq    $2, %rax
    movb    $1, (%rax)
    movl    $0, %eax
    movq    -8(%rbp), %rdx
    xorq    %fs:40, %rdx
    je  .L2
    call    __stack_chk_fail
.L2:
    leave
    .cfi_def_cfa 7, 8
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits

Again, for the local variables in main the compiler does not bother to create a symbol name. It may keep them in registers or in stack memory.

Note that the initial value of the local array, "parhello", is turned into the two immediate constants 1752326512 and 1869376613 (the bytes "parh" and "ello" read as little-endian 32-bit integers). I discovered this by changing "parhello" to "parhellp". The diff of the assembly output is as follows:

<   movl    $1886153829, -28(%rbp)
>   movl    $1869376613, -28(%rbp)

So there is no separate immutable store for psar. Its initial value is encoded as integer immediates in the code segment.


Answer to your first question:

char s[] = "hello";

s is an array of char. An array is not a pointer, but it resembles a constant one in one respect: s is not a modifiable lvalue, so you cannot change s itself (for example, s++ does not compile). The characters are not const, though, so you can change them.
See this example C code:

#include <stdio.h>

void reverse(char *p){
    char c;
    char* q = p;
    if (!*p) return; // nothing to reverse in an empty string
    while (*q) q++;
    q--; // point to the last character
    while (p < q) {
        c = *p;
        *p++ = *q;
        *q-- = c;
    }
}

int main(){
    char s[]  = "DCBA";
    reverse( s);
    printf("%s\n", s); // ABCD
    return 0;
}

which reverses the text "DCBA" and produces "ABCD".

char *p = "hello";

p is a pointer to char. You can do pointer arithmetic on it (p++ compiles), but the string literal it points to is typically placed in a read-only part of memory (const data).
Writing through it, e.g. p[0]='a';, is undefined behavior and usually results in a runtime error:

#include <stdio.h>
int main(){
    char* s  = "DCBA";
    s[0]='D'; // compiles, but typically crashes at run time (undefined behavior)
    printf("%s\n", s); // usually never reached
    return 0;
}

This compiles, but typically crashes when run.

const char* const s = "DCBA";

With const char* const, you can change neither s nor the data it points to (i.e. "DCBA"), so both the pointer and the data are const:

#include <stdio.h>
int main(){
    const char* const s  = "DCBA";
    s[0]='D'; // compile error: assignment to a read-only location
    printf("%s\n", s); // never runs, because the program does not compile
    return 0;
}

The Text segment is normally where your code is stored, and it is read-only. In embedded systems it lives in ROM, PROM, or flash memory; on a desktop computer it is loaded into RAM.

The Stack is RAM used for local variables in functions.

The Heap is RAM used for dynamically allocated memory (for example, memory obtained from malloc).

The Data segment contains global and static variables that have a non-zero initializer.

BSS contains all global and static variables that are initialized to zero or not explicitly initialized.

As for s itself: when it is a local variable, the compiler decides where to put it (on the stack or in a CPU register).

This is a very broad topic, and ultimately the exact answers depend on your hardware and compiler.

