So, I know that Linux uses four default segments for an x86 processor (kernel code, kernel data, user code, user data), but they all have the same base and limit (0x00000000
Kernel memory should not be readable from programs running in user space.
Program data is often not executable (DEP, a processor feature, which helps guard against executing an overflowed buffer and other malicious attacks).
It's all about access control - different segments have different rights. That's why accessing the wrong segment will give you a "segmentation fault".
in X86 - linux segment registers are used for buffer overflow check [see the below code snippet which have defined some char arrays in stack] :
static void
printint(int xx, int base, int sgn)
{
char digits[] = "0123456789ABCDEF";
char buf[16];
int i, neg;
uint x;
neg = 0;
if(sgn && xx < 0){
neg = 1;
x = -xx;
} else {
x = xx;
}
i = 0;
do{
buf[i++] = digits[x % base];
}while((x /= base) != 0);
if(neg)
buf[i++] = '-';
while(--i >= 0)
my_putc(buf[i]);
}
Now if we see the dis-assembly of the code gcc-generated code.
Dump of assembler code for function printint:
0x00000000004005a6 <+0>: push %rbp
0x00000000004005a7 <+1>: mov %rsp,%rbp
0x00000000004005aa <+4>: sub $0x50,%rsp
0x00000000004005ae <+8>: mov %edi,-0x44(%rbp)
0x00000000004005b1 <+11>: mov %esi,-0x48(%rbp)
0x00000000004005b4 <+14>: mov %edx,-0x4c(%rbp)
0x00000000004005b7 <+17>: mov %fs:0x28,%rax ------> obtaining an 8 byte guard from based on a fixed offset from fs segment register [from the descriptor base in the corresponding gdt entry]
0x00000000004005c0 <+26>: mov %rax,-0x8(%rbp) -----> pushing it as the first local variable on to stack
0x00000000004005c4 <+30>: xor %eax,%eax
0x00000000004005c6 <+32>: movl $0x33323130,-0x20(%rbp)
0x00000000004005cd <+39>: movl $0x37363534,-0x1c(%rbp)
0x00000000004005d4 <+46>: movl $0x42413938,-0x18(%rbp)
0x00000000004005db <+53>: movl $0x46454443,-0x14(%rbp)
...
...
// function end
0x0000000000400686 <+224>: jns 0x40066a <printint+196>
0x0000000000400688 <+226>: mov -0x8(%rbp),%rax -------> verifying if the stack was smashed
0x000000000040068c <+230>: xor %fs:0x28,%rax --> checking the value on stack is matching the original one based on fs
0x0000000000400695 <+239>: je 0x40069c <printint+246>
0x0000000000400697 <+241>: callq 0x400460 <__stack_chk_fail@plt>
0x000000000040069c <+246>: leaveq
0x000000000040069d <+247>: retq
Now if we remove the stack based char arrays from this function , gcc won't generate this guard check .
I have seen the same generated by gcc even for kernel modules. Basically I was seeing a crash while botrapping some kernel code and it was faulting with virtual address 0x28. Later I figured that thought i had initialized the stack pointer correctly and loaded the program correctly, I am not having the right entries in gdt, which would translate the fs based offset into a valid virtual address.
However in case of kernel code it was simply ignoring , the error instead of jumping to something like __stack_chk_fail@plt>.
The relevant compiler option which adds this guard in gcc is -fstack-protector . I think this is enabled by default which compiling a user app.
For kernel , we can enable this gcc flag via config CC_STACKPROTECTOR option.
config CC_STACKPROTECTOR 699 bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)" 700 depends on SUPERH32 701 help 702 This option turns on the -fstack-protector GCC feature. This 703 feature puts, at the beginning of functions, a canary value on 704 the stack just before the return address, and validates 705 the value just before actually returning. Stack based buffer 706 overflows (that need to overwrite this return address) now also 707 overwrite the canary, which gets detected and the attack is then 708 neutralized via a kernel panic. 709 710 This feature requires gcc version 4.2 or above.
The relevant kernel file where this gs / fs is linux/arch/x86/include/asm/stackprotector.h
The x86 architecture associates a type and a privilege level with each segment descriptor. The type of a descriptor allows segments to be made read only, read/write, executable, etc., but the main reason for different segments having the same base and limit is to allow a different descriptor privilege level (DPL) to be used.
The DPL is two bits, allowing the values 0 through 3 to be encoded. When the privilege level is 0, then it is said to be ring 0, which is the most privileged. The segment descriptors for the Linux kernel are ring 0 whereas the segment descriptors for user space are ring 3 (least privileged). This is true for most segmented operating systems; the core of the operating system is ring 0 and the rest is ring 3.
The Linux kernel sets up, as you mentioned, four segments:
The base and limit of all four are the same, but the kernel segments are DPL 0, the user segments are DPL 3, the code segments are executable and readable (not writable), and the data segments are readable and writable (not executable).
See also:
The x86 memory management architecture uses both segmentation and paging. Very roughly speaking, a segment is a partition of a process's address space that has its own protection policy. So, in the x86 architecture, it is possible to split the range of memory addresses that a process sees into multiple contiguous segments, and assign different protection modes to each. Paging is a technique for mapping small (usually 4KB) regions of a process's address space to chunks of real, physical memory. Paging thus controls how regions inside a segment are mapped onto physical RAM.
All processes have two segments:
one segment (addresses 0x00000000 through 0xBFFFFFFF) for user-level, process-specific data such as the program's code, static data, heap, and stack. Every process has its own, independent user segment.
one segment (addresses 0xC0000000 through 0xFFFFFFFF), which contains kernel-specific data such as the kernel instructions, data, some stacks on which kernel code can execute, and more interestingly, a region in this segment is directly mapped to physical memory, so that the kernel can directly access physical memory locations without having to worry about address translation. The same kernel segment is mapped into every process, but processes can access it only when executing in protected kernel mode.
So, in user-mode, the process may only access addresses less than 0xC0000000; any access to an address higher than this results in a fault. However, when a user-mode process begins executing in the kernel (for instance, after having made a system call), the protection bit in the CPU is changed to supervisor mode (and some segmentation registers are changed), meaning that the process is thereby able to access addresses above 0xC0000000.
Refer ed from: HERE