Why does Linux on x86 use different segments for user processes and the kernel?

旧时难觅i 2020-12-24 03:16

So, I know that Linux uses four default segments for an x86 processor (kernel code, kernel data, user code, user data), but they all have the same base and limit (0x00000000

4 Answers
  • 2020-12-24 03:28

    Kernel memory should not be readable from programs running in user space.

    Program data is often not executable (DEP, a processor feature that helps guard against executing an overflowed buffer and other malicious attacks; on x86 it is implemented with the NX bit in the page tables).

    It's all about access control: different segments have different rights. That's why accessing the wrong segment gives you a "segmentation fault".
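    To make "different segments have different rights" concrete, here is a simplified C model of the type-field check x86 applies to code and data segments in 32-bit protected mode (64-bit mode skips most of these checks, and page-level protections such as NX are separate). The enum and function names are hypothetical; the bit meanings follow the descriptor type encoding, where bit 3 marks a code segment and bit 1 means "readable" for code or "writable" for data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kinds of access a program might attempt through a segment (hypothetical). */
enum access { ACCESS_READ, ACCESS_WRITE, ACCESS_EXECUTE };

/* Return true if the 4-bit type field of a code/data descriptor (S bit = 1)
 * permits the access. Bit 3 set = code segment; bit 1 = readable (code)
 * or writable (data). */
bool type_permits(uint8_t type, enum access a)
{
    bool is_code = (type & 0x8u) != 0;
    bool bit1 = (type & 0x2u) != 0;

    switch (a) {
    case ACCESS_EXECUTE:
        return is_code;               /* only code segments are executable */
    case ACCESS_WRITE:
        return !is_code && bit1;      /* code segments are never writable */
    case ACCESS_READ:
        return is_code ? bit1 : true; /* data segments are always readable */
    }
    return false;
}
```

    With type 10 (code, execute/read) a write is refused; with type 2 (data, read/write) execution is refused.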

  • 2020-12-24 03:38

    On x86 Linux, segment registers are also used for the buffer overflow check (see the code snippet below, which defines some char arrays on the stack):

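    typedef unsigned int uint;   /* xv6-style shorthand used below */
    void my_putc(char c);        /* output routine defined elsewhere (signature assumed) */

    /* Print xx in the given base; if sgn is set, treat xx as signed. */
    static void
    printint(int xx, int base, int sgn)
    {
        char digits[] = "0123456789ABCDEF";  /* local char array: gcc adds the stack guard */
        char buf[16];                        /* enough for base 10 or 16; base 2 would need 33 */
        int i, neg;
        uint x;

        neg = 0;
        if(sgn && xx < 0){
            neg = 1;
            x = -(uint)xx;   /* negate in unsigned arithmetic so INT_MIN does not overflow */
        } else {
            x = xx;
        }

        /* Emit digits least-significant first. */
        i = 0;
        do{
            buf[i++] = digits[x % base];
        }while((x /= base) != 0);
        if(neg)
            buf[i++] = '-';

        /* Print the buffer in reverse to get the usual digit order. */
        while(--i >= 0)
            my_putc(buf[i]);
    }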
    

    Now look at the gcc-generated disassembly of this code:

    Dump of assembler code for function printint:
       0x00000000004005a6 <+0>:     push   %rbp
       0x00000000004005a7 <+1>:     mov    %rsp,%rbp
       0x00000000004005aa <+4>:     sub    $0x50,%rsp
       0x00000000004005ae <+8>:     mov    %edi,-0x44(%rbp)
       0x00000000004005b1 <+11>:    mov    %esi,-0x48(%rbp)
       0x00000000004005b4 <+14>:    mov    %edx,-0x4c(%rbp)
       0x00000000004005b7 <+17>:    mov    %fs:0x28,%rax           # load the 8-byte guard from fixed offset 0x28 from the fs segment base
       0x00000000004005c0 <+26>:    mov    %rax,-0x8(%rbp)         # store it in the first local slot, just below the saved %rbp
       0x00000000004005c4 <+30>:    xor    %eax,%eax               # clear the register so the guard value does not linger
       0x00000000004005c6 <+32>:    movl   $0x33323130,-0x20(%rbp) # digits[] = "0123..." initialized 4 bytes at a time
       0x00000000004005cd <+39>:    movl   $0x37363534,-0x1c(%rbp)
       0x00000000004005d4 <+46>:    movl   $0x42413938,-0x18(%rbp)
       0x00000000004005db <+53>:    movl   $0x46454443,-0x14(%rbp)
    ...
    ...
       # function end
       0x0000000000400686 <+224>:   jns    0x40066a <printint+196>
       0x0000000000400688 <+226>:   mov    -0x8(%rbp),%rax         # reload the guard from the stack to verify it was not smashed
       0x000000000040068c <+230>:   xor    %fs:0x28,%rax           # compare with the original fs-based value (zero if unchanged)
       0x0000000000400695 <+239>:   je     0x40069c <printint+246>
       0x0000000000400697 <+241>:   callq  0x400460 <__stack_chk_fail@plt>  # mismatch: the stack was smashed
       0x000000000040069c <+246>:   leaveq
       0x000000000040069d <+247>:   retq
    

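    If we remove the stack-based char arrays from this function, gcc does not generate this guard check: with plain -fstack-protector, only functions with vulnerable objects such as character arrays are instrumented.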
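    The prologue/epilogue pair in the listing can be mimicked in plain C to show why an overflow is caught. This is a simulation, not what gcc emits: the frame layout (a 16-byte buffer followed by an 8-byte guard slot), the function name guarded_copy, and the caller-supplied guard value are all hypothetical, whereas real code reads the guard from %fs:0x28 and lets the compiler place it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Simulated frame: a 16-byte local buffer followed by the 8-byte guard slot. */
struct demo_frame {
    unsigned char bytes[BUF_SIZE + sizeof(uint64_t)];
};

/* Returns true if the guard survived, false if the copy overwrote it
 * (the point where real code would call __stack_chk_fail). */
bool guarded_copy(uint64_t guard, const unsigned char *src, size_t len)
{
    struct demo_frame f;
    uint64_t saved;

    /* Prologue: store the guard in the slot right after the buffer. */
    memcpy(f.bytes + BUF_SIZE, &guard, sizeof guard);

    /* Body: an unchecked copy that may run past the 16-byte buffer.
     * It is capped at the whole simulated frame so the demo itself
     * never writes outside the array. */
    if (len > sizeof f.bytes)
        len = sizeof f.bytes;
    memcpy(f.bytes, src, len);

    /* Epilogue: reload the slot and compare, like "xor %fs:0x28,%rax; je". */
    memcpy(&saved, f.bytes + BUF_SIZE, sizeof saved);
    return (saved ^ guard) == 0;
}
```

    Copying 16 bytes leaves the guard intact, so guarded_copy returns true; copying 20 bytes overwrites part of the guard slot and it returns false.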

    I have seen gcc generate the same check even for kernel modules. I was seeing a crash while bootstrapping some kernel code: it faulted at virtual address 0x28. I later figured out that, although I had initialized the stack pointer and loaded the program correctly, I did not have the right entries in the GDT to translate the fs-based offset into a valid virtual address. With a segment base of 0, %fs:0x28 resolves to address 0x28 itself, which is exactly the faulting address.

    However, in the kernel code the failed check was simply ignored instead of jumping to something like __stack_chk_fail@plt.

    The gcc option that adds this guard is -fstack-protector. I think many distributions enable it by default when compiling user applications.

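    For the kernel, this gcc flag can be enabled via the CC_STACKPROTECTOR config option (renamed STACKPROTECTOR in later kernels). The entry quoted below depends on SUPERH32, so it is the SuperH variant; other architectures have similar entries.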

    config CC_STACKPROTECTOR
        bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)"
        depends on SUPERH32
        help
          This option turns on the -fstack-protector GCC feature. This
          feature puts, at the beginning of functions, a canary value on
          the stack just before the return address, and validates
          the value just before actually returning.  Stack based buffer
          overflows (that need to overwrite this return address) now also
          overwrite the canary, which gets detected and the attack is then
          neutralized via a kernel panic.

          This feature requires gcc version 4.2 or above.
    

    The relevant kernel file, where this gs/fs handling is set up, is linux/arch/x86/include/asm/stackprotector.h.

  • 2020-12-24 03:45

    The x86 architecture associates a type and a privilege level with each segment descriptor. The type of a descriptor allows segments to be made read-only, read/write, executable, etc., but the main reason for different segments having the same base and limit is to allow a different descriptor privilege level (DPL) to be used.

    The DPL is two bits, allowing the values 0 through 3 to be encoded. When the privilege level is 0, then it is said to be ring 0, which is the most privileged. The segment descriptors for the Linux kernel are ring 0 whereas the segment descriptors for user space are ring 3 (least privileged). This is true for most segmented operating systems; the core of the operating system is ring 0 and the rest is ring 3.
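    The privilege rule for data segments can be sketched in C. A selector's low two bits are its requested privilege level (RPL) and bits 15 to 3 index the descriptor table; loading a data segment register succeeds only when max(CPL, RPL) <= DPL, otherwise the CPU raises a general-protection fault. The function names are hypothetical, the selector values in the usage note (0x18 for __KERNEL_DS, 0x2b for __USER_DS) are the x86-64 Linux ones and serve only as illustration, and the different rules for loading SS and CS are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Requested privilege level: the selector's low two bits. */
unsigned selector_rpl(uint16_t selector) { return selector & 0x3u; }

/* Descriptor table index: selector bits 15..3. */
unsigned selector_index(uint16_t selector) { return selector >> 3; }

/* Data-segment check for DS/ES/FS/GS: allowed only if the less
 * privileged of CPL and RPL is still at least as privileged as DPL. */
bool may_load_data_segment(unsigned cpl, uint16_t selector, unsigned dpl)
{
    unsigned rpl = selector_rpl(selector);
    unsigned effective = cpl > rpl ? cpl : rpl;
    return effective <= dpl;
}
```

    Ring 3 code loading the kernel data selector 0x18 (DPL 0) is refused, while ring 0 code may load it, and both may load the user data selector 0x2b (DPL 3).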

    The Linux kernel sets up, as you mentioned, four segments:

    • __KERNEL_CS (Kernel code segment, base=0, limit=4GB, type=10, DPL=0)
    • __KERNEL_DS (Kernel data segment, base=0, limit=4GB, type=2, DPL=0)
    • __USER_CS (User code segment, base=0, limit=4GB, type=10, DPL=3)
    • __USER_DS (User data segment, base=0, limit=4GB, type=2, DPL=3)

    The base and limit of all four are the same, but the kernel segments are DPL 0, the user segments are DPL 3, the code segments are executable and readable (not writable), and the data segments are readable and writable (not executable).
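    The fields listed above can be recovered from a raw 8-byte descriptor. This sketch decodes a 32-bit code/data descriptor; the struct and function names are hypothetical, and the test constants are the textbook flat-model encodings of those fields (base 0, limit 0xfffff with 4 KB granularity). The kernel's actual GDT entries may differ in minor bits such as the accessed bit.

```c
#include <stdint.h>

/* Fields of a 32-bit code/data segment descriptor (S bit = 1). */
struct seg_desc {
    uint32_t base;     /* linear start address of the segment */
    uint32_t limit;    /* highest valid offset, after applying granularity */
    unsigned type;     /* 4-bit type: 10 = code exec/read, 2 = data read/write */
    unsigned dpl;      /* descriptor privilege level: 0 (kernel) to 3 (user) */
    unsigned present;  /* P bit */
};

struct seg_desc decode_descriptor(uint64_t d)
{
    struct seg_desc s;
    /* The limit is split across bits 0-15 and bits 48-51. */
    uint32_t raw_limit = (uint32_t)(d & 0xffffu) | (uint32_t)((d >> 32) & 0xf0000u);

    /* The base is split across bits 16-39 and bits 56-63. */
    s.base = (uint32_t)((d >> 16) & 0xffffffu) | (uint32_t)(((d >> 56) & 0xffu) << 24);
    s.type = (unsigned)((d >> 40) & 0xfu);
    s.dpl = (unsigned)((d >> 45) & 0x3u);
    s.present = (unsigned)((d >> 47) & 0x1u);

    /* G bit (bit 55): the limit counts 4 KB pages instead of bytes. */
    if ((d >> 55) & 0x1u)
        s.limit = (raw_limit << 12) | 0xfffu;
    else
        s.limit = raw_limit;
    return s;
}
```

    Decoding 0x00cf9a000000ffff gives base 0, limit 0xffffffff (4 GB), type 10 and DPL 0, matching __KERNEL_CS above; changing the access byte 0x9a to 0xf2 gives type 2 and DPL 3, matching __USER_DS.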

    See also:

    • Segmentation in Linux
    • x86 Segmentation for the 15-410 Student
    • 5.1.1 Descriptors
    • 6.3.1 Descriptors Store Protection Parameters
  • 2020-12-24 03:52

    The x86 memory management architecture uses both segmentation and paging. Very roughly speaking, a segment is a partition of a process's address space that has its own protection policy. So, in the x86 architecture, it is possible to split the range of memory addresses that a process sees into multiple contiguous segments, and assign different protection modes to each. Paging is a technique for mapping small (usually 4KB) regions of a process's address space to chunks of real, physical memory. Paging thus controls how regions inside a segment are mapped onto physical RAM.
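    The two translation steps can be written out for classic 32-bit paging (4 KB pages, no PAE). Step 1 adds the segment base; since Linux gives all four segments base 0, the linear address simply equals the offset. Step 2 splits the linear address into a page-directory index, a page-table index and an offset within the page; the table lookups themselves are left out. The function and struct names are hypothetical.

```c
#include <stdint.h>

/* Step 1 (segmentation): linear = segment base + offset, wrapping at 4 GB. */
uint32_t to_linear(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}

/* Step 2 (paging): the pieces used to walk the two-level page tables. */
struct la_parts {
    unsigned dir;    /* bits 31-22: page directory index */
    unsigned table;  /* bits 21-12: page table index */
    unsigned offset; /* bits 11-0: byte offset within the 4 KB page */
};

struct la_parts split_linear(uint32_t linear)
{
    struct la_parts p;
    p.dir = linear >> 22;
    p.table = (linear >> 12) & 0x3ffu;
    p.offset = linear & 0xfffu;
    return p;
}

/* After the walk yields a physical frame number, append the page offset. */
uint32_t physical_address(uint32_t frame, unsigned offset)
{
    return (frame << 12) | offset;
}
```

    For example, offset 0x0804a123 in a base-0 segment is linear address 0x0804a123, which splits into directory index 0x20, table index 0x4a and page offset 0x123.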

    On 32-bit x86 Linux with the default 3 GB/1 GB split, all processes have two segments:

    1. one segment (addresses 0x00000000 through 0xBFFFFFFF) for user-level, process-specific data such as the program's code, static data, heap, and stack. Every process has its own, independent user segment.

    2. one segment (addresses 0xC0000000 through 0xFFFFFFFF), which contains kernel-specific data such as the kernel instructions, data, some stacks on which kernel code can execute, and more interestingly, a region in this segment is directly mapped to physical memory, so that the kernel can directly access physical memory locations without having to worry about address translation. The same kernel segment is mapped into every process, but processes can access it only when executing in protected kernel mode.

    So, in user mode, the process may only access addresses below 0xC0000000; any access to a higher address results in a fault, because the kernel's pages are marked supervisor-only in the page tables. However, when a user-mode process begins executing in the kernel (for instance, after making a system call), the CPU's current privilege level switches from user mode (ring 3) to supervisor mode (ring 0) and the CS and SS registers are reloaded with kernel segment selectors, so the process is then able to access addresses above 0xC0000000.

    Referred from: HERE
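    A minimal sketch of the split described in this answer, assuming the default 32-bit PAGE_OFFSET of 0xC0000000. DEMO_PAGE_OFFSET, is_user_address, access_allowed, demo_phys_to_virt and demo_virt_to_phys are hypothetical names (the kernel's own helpers for the direct mapping are __va and __pa). The model ignores unmapped pages, SMAP and highmem, where the simple "physical + offset" rule no longer holds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary between user and kernel space (default 3 GB/1 GB split). */
#define DEMO_PAGE_OFFSET 0xC0000000u

/* Addresses below the boundary belong to the process's user part. */
bool is_user_address(uint32_t addr)
{
    return addr < DEMO_PAGE_OFFSET;
}

/* Supervisor-only pages: kernel mode may touch any mapped address,
 * user mode only the lower 3 GB. */
bool access_allowed(bool kernel_mode, uint32_t addr)
{
    return kernel_mode || is_user_address(addr);
}

/* Direct ("lowmem") mapping: kernel virtual = physical + PAGE_OFFSET. */
uint32_t demo_phys_to_virt(uint32_t phys) { return phys + DEMO_PAGE_OFFSET; }
uint32_t demo_virt_to_phys(uint32_t virt) { return virt - DEMO_PAGE_OFFSET; }
```

    Physical address 0x00100000 (1 MB) is therefore visible to the kernel at 0xC0100000, an address that user-mode code is not allowed to touch.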
