What happens in OS when we dereference a NULL pointer in C?

后端 未结 5 863
天涯浪人
天涯浪人 2020-11-27 02:56

Let\'s say there is a pointer and we initialize it with NULL.

int* ptr = NULL;
*ptr = 10;

Now , the program will crash since ptr

相关标签:
5条回答
  • 2020-11-27 03:21

    As a side note, just to compel the differences in architectures, a certain OS developed and maintained by a company known for their three-letter acronym name and often referred to as a large primary color has a most-fasicnating NULL determination.

    They utilize a 128-bit linear address space for ALL data (memory AND disk) in one giant "thing". In accordance with their OS, a "valid" pointer must be placed on a 128-bit boundary within that address space. This, btw, causes fascinating side effects for structs, packed or not, that house pointers. Anyway, tucked away in a per-process dedicated page is a bitmap that assigns one bit for every valid location in a process address space where a valid pointer can lay. ALL opcodes on their hardware and OS that can generate and return a valid memory address and assign it to a pointer will set the bit that represents the memory address where that pointer (the target pointer) is located.

    So why should anyone care? For this simple reason:

    int a = 0;
    int *p = &a;
    int *q = p-1;
    
    if (p)
    {
    // p is valid, p's bit is lit, this code will run.
    }
    
    if (q)
    {
       // the address stored in q is not valid. q's bit is not lit. this will NOT run.
    }
    

    What is truly interesting is this.

    if (p == NULL)
    {
       // p is valid. this will NOT run.
    }
    
    if (q == NULL)
    {
       // q is not valid, and therefore treated as NULL, this WILL run.
    }
    
    if (!p)
    {
       // same as before. p is valid, therefore this won't run
    }
    
    if (!q)
    {
       // same as before, q is NOT valid, therefore this WILL run.
    }
    

    Its something you have to see to believe. I can't even imagine the housekeeping done to maintain that bit map, especially when copying pointer values or freeing dynamic memory.

    0 讨论(0)
  • 2020-11-27 03:23

    Short answer: it depends on a lot of factors, including the compiler, processor architecture, specific processor model, and the OS, among others.

    Long answer (x86 and x86-64): Let's go down to the lowest level: the CPU. On x86 and x86-64, that code will typically compile into an instruction or instruction sequence like this:

    movl $10, 0x00000000
    

    Which says to "store the constant integer 10 at virtual memory address 0". The Intel® 64 and IA-32 Architectures Software Developer Manuals describe in detail what happens when this instruction gets executed, so I'm going to summarize it for you.

    The CPU can operate in several different modes, several of which are for backwards compatibility with much older CPUs. Modern operating systems run user-level code in a mode called protected mode, which uses paging to convert virtual addresses into physical addresses.

    For each process, the OS keeps a page table which dictates how the addresses are mapped. The page table is stored in memory in a specific format (and protected so that they can not be modified by the user code) that the CPU understands. For every memory access that happens, the CPU translates it according to the page table. If the translation succeeds, it performs the corresponding read/write to the physical memory location.

    The interesting things happen when the address translation fails. Not all addresses are valid, and if any memory access generates an invalid address, the processor raises a page fault exception. This triggers a transition from user mode (aka current privilege level (CPL) 3 on x86/x86-64) into kernel mode (aka CPL 0) to a specific location in the kernel's code, as defined by the interrupt descriptor table (IDT).

    The kernel regains control and, based on the information from the exception and the process's page table, figures out what happened. In this case, it realizes that the user-level process accessed an invalid memory location, and then it reacts accordingly. On Windows, it will invoke structured exception handling to allow the user code to handle the exception. On POSIX systems, the OS will deliver a SIGSEGV signal to the process.

    In other cases, the OS will handle the page fault internally and restart the process from its current location as if nothing happened. For example, guard pages are placed at the bottom of the stack to allow the stack to grow on demand up to a limit, instead of preallocating a large amount of memory for the stack. Similar mechanisms are used for achieving copy-on-write memory.

    In modern OSes, the page tables are usually set up to make the address 0 an invalid virtual address. But sometimes it's possible to change that, e.g. on Linux by writing 0 to the pseudofile /proc/sys/vm/mmap_min_addr, after which it's possible to use mmap(2) to map the virtual address 0. In that case, dereferencing a null pointer would not cause a page fault.

    The above discussion is all about what happens when the original code is running in user space. But this could also happen inside the kernel. The kernel can (and is certainly much more likely than user code to) map the virtual address 0, so such a memory access would be normal. But if it's not mapped, then what happens then is largely similar: the CPU raises a page fault error which traps into a predefined point at the kernel, the kernel examines what happened, and reacts accordingly. If the kernel can't recover from the exception, it will typically panic in some fashion (kernel panic, kernel oops, or a BSOD on Windows, e.g.) by printing out some debug information to the console or serial port and then halting.

    See also Much ado about NULL: Exploiting a kernel NULL dereference for an example of how an attacker could exploit a null pointer dereference bug from inside the kernel in order to gain root privileges on a Linux machine.

    0 讨论(0)
  • 2020-11-27 03:31

    On CPU which support virtual mermory, a page fault exception will be usually issued if you try to read at memory address 0x0. The OS page fault handler will be invoked, the OS will then decide that the page is invalid and aborts your program.

    Note that on some CPU you can also safely access memory address 0x0.

    As the C Standard says dereferencing a null pointer is undefined, if the compiler is able to detect at compile time (or even runtime) that your are dereferencing a null pointer it can do whatever it wants, like aborting the program with a verbose error message.

    (C99, 6.5.3.2.p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.87)"

    87): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

    0 讨论(0)
  • 2020-11-27 03:41

    In a typical case, int *ptr = NULL; will set ptr to point to address 0. The C standard (and the C++ standard) is very careful to not require that, but it's extremely common nonetheless.

    When you do *ptr = 10;, the CPU would normally generate 0 on the address lines, and 10 on the data lines, while setting a R/W line to indicate a write (and, if the bus has such a thing, assert the memory vs. I/O line to indicate a write to memory, not I/O).

    Assuming the CPU supports memory protection (and you're using an OS that enables it), the CPU will check that (attempted) access before it happens though. For example, a modern Intel/AMD CPU will use paging tables that map virtual addresses to physical addresses. In a typical case, address 0 won't be mapped to any physical address. In this case, the CPU will generate an access violation exception. For one fairly typical example, Microsoft Windows leaves the first 4 megabytes un-mapped, so any address in that range will normally result in an access violation.

    On an older CPU (or an older operating system that doesn't enable the CPUs protection features) the attempted write will often succeed. For example, under MS-DOS, writing through a NULL pointer would simply write to address zero. In small or medium model (with 16-bit addresses for data) most compilers would write some known pattern to the first few bytes of the data segment, and when the program ended, they'd check to see if that pattern remained intact (and do something to indicate that you'd written via a NULL pointer if it failed). In compact or large model (20-bit data addresses) they'd generally just write to address zero without warning.

    0 讨论(0)
  • 2020-11-27 03:45

    I imagine that this is platform and compiler dependent. The NULL pointer could be implemented by using a NULL page, in which case you'd have a page fault, or it could be below the segment limit for an expand-down segment, in which case you'd have a segmentation fault.

    This is not a definitive answer, just my conjecture.

    0 讨论(0)
提交回复
热议问题