I think the question says it all. An example covering most standards from C89 to C11 would be helpful. I though of this one, but I guess it is just undefined behaviour:
On some platforms, a standard-conforming C program can fail with a segmentation fault if it requests too many resources from the system. For instance, allocating a large object with malloc
can appear to succeed, but later, when the object is accessed, it will crash.
Note that such a program is not strictly conforming; programs which meet that definition have to stay within each of the minimum implementation limits.
A standard-conforming C program cannot produce a segmentation fault otherwise, because the only other ways are via undefined behavior.
The SIGSEGV
signal can be raised explicitly, but there is no SIGSEGV
symbol in the standard C library.
(In this answer, "standard-conforming" means: "Uses only the features described in some version of the ISO C standard, avoiding unspecified, implementation-defined or undefined behavior, but not necessarily confined to the minimum implementation limits.")
main;
That's it.
Really.
Essentially, what this does is it defines main
as a variable.
In C, variables and functions are both symbols -- pointers in memory, so the compiler does not distinguish them, and this code does not throw an error.
However, the problem rests in how the system runs executables. In a nutshell, the C standard requires that all C executables have an environment-preparing entrypoint built into them, which basically boils down to "call main
".
In this particular case, however, main
is a variable, so it is placed in a non-executable section of memory called .bss
, intended for variables (as opposed to .text
for the code). Trying to execute code in .bss
violates its specific segmentation, so the system throws a segmentation fault.
To illustrate, here's (part of) an objdump
of the resulting file:
# (unimportant)
Disassembly of section .text:
0000000000001020 <_start>:
1020: f3 0f 1e fa endbr64
1024: 31 ed xor %ebp,%ebp
1026: 49 89 d1 mov %rdx,%r9
1029: 5e pop %rsi
102a: 48 89 e2 mov %rsp,%rdx
102d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
1031: 50 push %rax
1032: 54 push %rsp
1033: 4c 8d 05 56 01 00 00 lea 0x156(%rip),%r8 # 1190 <__libc_csu_fini>
103a: 48 8d 0d df 00 00 00 lea 0xdf(%rip),%rcx # 1120 <__libc_csu_init>
# This is where the program should call main
1041: 48 8d 3d e4 2f 00 00 lea 0x2fe4(%rip),%rdi # 402c <main>
1048: ff 15 92 2f 00 00 callq *0x2f92(%rip) # 3fe0 <__libc_start_main@GLIBC_2.2.5>
104e: f4 hlt
104f: 90 nop
# (nice things we still don't care about)
Disassembly of section .data:
0000000000004018 <__data_start>:
...
0000000000004020 <__dso_handle>:
4020: 20 40 00 and %al,0x0(%rax)
4023: 00 00 add %al,(%rax)
4025: 00 00 add %al,(%rax)
...
Disassembly of section .bss:
0000000000004028 <__bss_start>:
4028: 00 00 add %al,(%rax)
...
# main is in .bss (variables) instead of .text (code)
000000000000402c <main>:
402c: 00 00 add %al,(%rax)
...
# aaand that's it!
PS: This won't work if you compile to a flat executable. Instead, you will cause undefined behaviour.
A correct program doesn't produce a segfault. And you cannot describe deterministic behaviour of an incorrect program.
A "segmentation fault" is a thing that an x86 CPU does. You get it by attempting to reference memory in an incorrect way. It can also refer to a situation where memory access causes a page fault (i.e. trying to access memory that's not loaded into the page tables) and the OS decides that you had no right to request that memory. To trigger those conditions, you need to program directly for your OS and your hardware. It is nothing that is specified by the C language.
A segmentation fault is an implementation defined behavior. The standard does not define how the implementation should deal with undefined behavior and in fact the implementation could optimize out undefined behavior and still be compliant. To be clear, implementation defined behavior is behavior which is not specified by the standard but the implementation should document. Undefined behavior is code that is non-portable or erroneous and whose behavior is unpredictable and therefore can not be relied on.
If we look at the C99 draft standard §3.4.3 undefined behavior which comes under the Terms, definitions and symbols section in paragraph 1 it says (emphasis mine going forward):
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
and in paragraph 2 says:
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
If, on the other hand, you simply want a method defined in the standard that will cause a segmentation fault on most Unix-like systems then raise(SIGSEGV)
should accomplish that goal. Although, strictly speaking, SIGSEGV
is defined as follows:
SIGSEGV an invalid access to storage
and §7.14 Signal handling <signal.h>
says:
An implementation need not generate any of these signals, except as a result of explicit calls to the raise function. Additional signals and pointers to undeclarable functions, with macro definitions beginning, respectively, with the letters SIG and an uppercase letter or with SIG_ and an uppercase letter,219) may also be specified by the implementation. The complete set of signals, their semantics, and their default handling is implementation-defined; all signal numbers shall be positive.
If we assume we are not raising a signal calling raise
, segmentation fault is likely to come from undefined behavior. Undefined behavior is undefined and a compiler is free to refuse to translate so no answer with undefined is guaranteed to fail on all implementations. Moreover a program which invokes undefined behavior is an erroneous program.
But this one is the shortest I can get that segfault on my system:
main(){main();}
(I compile with gcc
and -std=c89 -O0
).
And by the way, does this program really invokes undefined bevahior?
Most of the answers to this question are talking around the key point, which is: The C standard does not include the concept of a segmentation fault. (Since C99 it includes the signal number SIGSEGV
, but it does not define any circumstance where that signal is delivered, other than raise(SIGSEGV)
, which as discussed in other answers doesn't count.)
Therefore, there is no "strictly conforming" program (i.e. program that uses only constructs whose behavior is fully defined by the C standard, alone) that is guaranteed to cause a segmentation fault.
Segmentation faults are defined by a different standard, POSIX. This program is guaranteed to provoke either a segmentation fault, or the functionally equivalent "bus error" (SIGBUS
), on any system that is fully conforming with POSIX.1-2008 including the Memory Protection and Advanced Realtime options, provided that the calls to sysconf
, posix_memalign
and mprotect
succeed. My reading of C99 is that this program has implementation-defined (not undefined!) behavior considering only that standard, and therefore it is conforming but not strictly conforming.
#define _XOPEN_SOURCE 700
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
int main(void)
{
size_t pagesize = sysconf(_SC_PAGESIZE);
if (pagesize == (size_t)-1) {
fprintf(stderr, "sysconf: %s\n", strerror(errno));
return 1;
}
void *page;
int err = posix_memalign(&page, pagesize, pagesize);
if (err || !page) {
fprintf(stderr, "posix_memalign: %s\n", strerror(err));
return 1;
}
if (mprotect(page, pagesize, PROT_NONE)) {
fprintf(stderr, "mprotect: %s\n", strerror(errno));
return 1;
}
*(long *)page = 0xDEADBEEF;
return 0;
}