What is the simplest standard-conforming way to produce a segfault in C?

迷失自我 2020-11-29 21:59

I think the question says it all. An example covering most standards from C89 to C11 would be helpful. I thought of this one, but I guess it is just undefined behaviour:

10 answers
  • 2020-11-29 22:26

    On some platforms, a standard-conforming C program can fail with a segmentation fault if it requests too many resources from the system. For instance, allocating a large object with malloc can appear to succeed, but later, when the object is accessed, it will crash.

    Note that such a program is not strictly conforming; programs which meet that definition have to stay within each of the minimum implementation limits.

    A standard-conforming C program cannot produce a segmentation fault otherwise, because the only other ways are via undefined behavior.

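    The SIGSEGV signal can be raised explicitly with raise(SIGSEGV). The SIGSEGV macro is part of the standard <signal.h> header, but the default handling of the signal is implementation-defined, so the standard does not promise a segmentation fault even then.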

    (In this answer, "standard-conforming" means: "Uses only the features described in some version of the ISO C standard, avoiding unspecified, implementation-defined or undefined behavior, but not necessarily confined to the minimum implementation limits.")
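    To make the resource-exhaustion case from the start of this answer concrete, here is a minimal sketch. The helper name allocate_and_touch and its stride parameter are hypothetical and not part of the original answer. It shows the two outcomes: malloc can report failure up front (fully defined), or it can appear to succeed, in which case any crash would happen later, inside the loop that touches the memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (an illustration, not code from the answer).
 * Allocates `size` bytes and writes one byte every `stride` bytes so
 * that the memory is really touched, not just reserved.
 * Returns 0 on success, -1 if malloc reports failure immediately. */
int allocate_and_touch(size_t size, size_t stride)
{
    /* volatile keeps the compiler from optimizing the stores away */
    volatile unsigned char *p = malloc(size);
    size_t i = 0;

    if (p == NULL)
        return -1;              /* the well-defined failure path */
    if (stride == 0)
        stride = 1;

    while (i < size) {
        p[i] = 1;               /* on an overcommitting system, a crash would occur here */
        if (size - i <= stride) /* stop before i could wrap around */
            break;
        i += stride;
    }

    free((void *)p);
    return 0;
}
```

    With a modest size this simply returns 0. With a size far beyond the available memory, what happens is platform-specific: some systems return -1, others let malloc succeed and terminate the process while the loop runs, and the terminating signal is not necessarily SIGSEGV.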

  • 2020-11-29 22:27
     main;
    

    That's it.

    Really.

    Essentially, this defines main as a variable (implicitly of type int). To the linker, variables and functions are both just symbols, that is, names for addresses in memory, so it does not distinguish between them and the build does not fail with an error.

    However, the problem lies in how the system runs executables. In a nutshell, a typical hosted implementation links an environment-preparing entry point (here, _start) into every C executable, and that entry point basically boils down to "call main".

    In this particular case, however, main is a variable, so it is placed in a non-executable section of memory called .bss, intended for variables (as opposed to .text for the code). Trying to execute code in .bss violates that section's memory protection, so the system raises a segmentation fault.

    To illustrate, here's (part of) an objdump of the resulting file:

    # (unimportant)
    
    Disassembly of section .text:
    
    0000000000001020 <_start>:
        1020:   f3 0f 1e fa             endbr64 
        1024:   31 ed                   xor    %ebp,%ebp
        1026:   49 89 d1                mov    %rdx,%r9
        1029:   5e                      pop    %rsi
        102a:   48 89 e2                mov    %rsp,%rdx
        102d:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
        1031:   50                      push   %rax
        1032:   54                      push   %rsp
        1033:   4c 8d 05 56 01 00 00    lea    0x156(%rip),%r8        # 1190 <__libc_csu_fini>
        103a:   48 8d 0d df 00 00 00    lea    0xdf(%rip),%rcx        # 1120 <__libc_csu_init>
    
        # This is where the program should call main
        1041:   48 8d 3d e4 2f 00 00    lea    0x2fe4(%rip),%rdi      # 402c <main> 
        1048:   ff 15 92 2f 00 00       callq  *0x2f92(%rip)          # 3fe0 <__libc_start_main@GLIBC_2.2.5>
        104e:   f4                      hlt    
        104f:   90                      nop
    
    # (nice things we still don't care about)
    
    Disassembly of section .data:
    
    0000000000004018 <__data_start>:
        ...
    
    0000000000004020 <__dso_handle>:
        4020:   20 40 00                and    %al,0x0(%rax)
        4023:   00 00                   add    %al,(%rax)
        4025:   00 00                   add    %al,(%rax)
        ...
    
    Disassembly of section .bss:
    
    0000000000004028 <__bss_start>:
        4028:   00 00                   add    %al,(%rax)
        ...
    
    # main is in .bss (variables) instead of .text (code)
    
    000000000000402c <main>:
        402c:   00 00                   add    %al,(%rax)
        ...
    
    # aaand that's it! 
    

    PS: This won't work if you compile to a flat executable. Instead, you will cause undefined behaviour.

  • 2020-11-29 22:29

    A correct program doesn't produce a segfault. And you cannot describe deterministic behaviour of an incorrect program.

    A "segmentation fault" is a thing that an x86 CPU does. You get it by attempting to reference memory in an incorrect way. It can also refer to a situation where memory access causes a page fault (i.e. trying to access memory that's not loaded into the page tables) and the OS decides that you had no right to request that memory. To trigger those conditions, you need to program directly for your OS and your hardware. It is nothing that is specified by the C language.

  • 2020-11-29 22:32

    A segmentation fault is implementation-defined behavior. The standard does not define how the implementation should deal with undefined behavior, and in fact the implementation could optimize out undefined behavior and still be compliant. To be clear, implementation-defined behavior is behavior that the standard does not specify but that the implementation should document. Undefined behavior arises from code that is nonportable or erroneous; its behavior is unpredictable and therefore cannot be relied on.

    If we look at the C99 draft standard §3.4.3 undefined behavior which comes under the Terms, definitions and symbols section in paragraph 1 it says (emphasis mine going forward):

    behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

    and in paragraph 2 says:

    NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

    If, on the other hand, you simply want a method defined in the standard that will cause a segmentation fault on most Unix-like systems then raise(SIGSEGV) should accomplish that goal. Although, strictly speaking, SIGSEGV is defined as follows:

    SIGSEGV an invalid access to storage

    and §7.14 Signal handling <signal.h> says:

    An implementation need not generate any of these signals, except as a result of explicit calls to the raise function. Additional signals and pointers to undeclarable functions, with macro definitions beginning, respectively, with the letters SIG and an uppercase letter or with SIG_ and an uppercase letter, may also be specified by the implementation. The complete set of signals, their semantics, and their default handling is implementation-defined; all signal numbers shall be positive.
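    As an illustration of the raise(SIGSEGV) route, here is a minimal sketch that observes the result from a parent process instead of crashing the program you are running. It assumes a POSIX system: fork, waitpid and _exit are not part of ISO C, and the helper name signal_from_raise_sigsegv is hypothetical. Only the raise(SIGSEGV) call itself comes from the standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (POSIX assumed, not part of the original answer).
 * Forks a child that calls raise(SIGSEGV) and returns the number of the
 * signal that terminated it, 0 if the child exited normally, or -1 if
 * fork or waitpid failed. */
int signal_from_raise_sigsegv(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: restore the default action, then raise the signal. */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
        _exit(0);   /* reached only if the signal did not terminate us */
    }

    /* Parent: wait for the child and inspect how it ended. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

    On a typical Unix-like system the return value should equal SIGSEGV, which is what a shell reports as a segmentation fault. The standard itself only guarantees that the signal is raised; terminating the process is the implementation-defined default handling.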

  • 2020-11-29 22:32

    If we assume we are not raising a signal by calling raise, a segmentation fault is likely to come from undefined behavior. Undefined behavior is undefined, and a compiler is free to refuse to translate the program, so no answer that relies on undefined behavior is guaranteed to fail on all implementations. Moreover, a program that invokes undefined behavior is an erroneous program.

    But this one is the shortest I can get that segfaults on my system:

    main(){main();}
    

    (I compile with gcc and -std=c89 -O0).

    And by the way, does this program really invoke undefined behavior?

  • 2020-11-29 22:32

    Most of the answers to this question are talking around the key point, which is: The C standard does not include the concept of a segmentation fault. (It does define the signal number SIGSEGV in <signal.h>, but it does not define any circumstance where that signal is delivered, other than raise(SIGSEGV), which as discussed in other answers doesn't count.)

    Therefore, there is no "strictly conforming" program (i.e. program that uses only constructs whose behavior is fully defined by the C standard, alone) that is guaranteed to cause a segmentation fault.

    Segmentation faults are defined by a different standard, POSIX. This program is guaranteed to provoke either a segmentation fault, or the functionally equivalent "bus error" (SIGBUS), on any system that is fully conforming with POSIX.1-2008 including the Memory Protection and Advanced Realtime options, provided that the calls to sysconf, posix_memalign and mprotect succeed. My reading of C99 is that this program has implementation-defined (not undefined!) behavior considering only that standard, and therefore it is conforming but not strictly conforming.

    #define _XOPEN_SOURCE 700
    #include <sys/mman.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    
    int main(void)
    {
        size_t pagesize = sysconf(_SC_PAGESIZE);
        if (pagesize == (size_t)-1) {
            fprintf(stderr, "sysconf: %s\n", strerror(errno));
            return 1;
        }
        void *page;
        int err = posix_memalign(&page, pagesize, pagesize);
        if (err || !page) {
            fprintf(stderr, "posix_memalign: %s\n", strerror(err));
            return 1;
        }
        if (mprotect(page, pagesize, PROT_NONE)) {
            fprintf(stderr, "mprotect: %s\n", strerror(errno));
            return 1;
        }
        *(long *)page = 0xDEADBEEF;
        return 0;
    }
    