When I try to research the return values of the kernel's system calls, I find tables that describe them and what I need to put in the different registers to let them w
See also this excellent LWN article about system calls which assumes C knowledge.
Also: The Definitive Guide to Linux System Calls (on x86), and related: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
C is the language of Unix systems programming, so all the documentation is in terms of C. And then there's documentation for the minor differences between the C interface and the asm on any given platform, usually in the Notes section of man pages.
`sys_read` means the raw system call (as opposed to the libc wrapper function). The kernel implementation of the `read` system call is a kernel function called `sys_read()`. You can't call it with a `call` instruction, because it's in the kernel, not a library. But people still talk about "calling `sys_read`" to distinguish it from the libc function call. However, it's ok to say `read` even when you mean the raw system call (especially when the libc wrapper doesn't do anything special), like I do in this answer.
Also note that `syscall.h` defines constants like `SYS_read` with the actual system call number (the value you put in EAX before an `int 0x80` or `syscall` instruction).
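As a rough illustration of those `SYS_` constants from C, here's a minimal sketch that goes through glibc's generic `syscall()` wrapper instead of `write()`. The helper name `raw_write` is made up for this example. Note that `syscall()` is still a libc function, so it reports errors libc-style (-1 plus `errno`), not as the raw negative value you'd see in RAX.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write and the other SYS_* call numbers */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: write a buffer using the call number directly.
 * syscall() puts SYS_write where the kernel expects the call number
 * and passes the remaining arguments through unchanged. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling `raw_write(1, "hi\n", 3)` should behave like `write(1, "hi\n", 3)`.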
Linux system call return values (in `EAX`/`RAX` on x86) are either a non-negative value for success, or a negative error code, e.g. `-EFAULT` if you pass an invalid pointer.
This behaviour is documented in the syscalls(2) man page.
Actually, -1 to -4095 means error, anything else means success. glibc's generic syscall(2) wrapper uses this sequence: `cmp rax, -4095` / `jae SYSCALL_ERROR_LABEL`, which is apparently guaranteed to be future-proof for all Linux system calls. Interesting cases include `mmap`, where valid addresses can have the sign bit set but must be page aligned, and `getpriority`, where the kernel ABI maps the -20..19 return-value range to 1..40, and libc decodes it. More details in a related answer about decoding syscall error return values.
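To make the `getpriority` case concrete, here's a small sketch of the decoding step. It assumes the formula from the NOTES section of the getpriority(2) man page (user nice = 20 - kernel value, so the kernel's 40..1 corresponds to -20..19). The helper names `nice_from_kernel` and `my_nice_raw` are invented for illustration, and I'm assuming `SYS_getpriority` is defined on your architecture.

```c
#define _GNU_SOURCE
#include <sys/resource.h>  /* PRIO_PROCESS */
#include <sys/syscall.h>   /* SYS_getpriority */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: undo the kernel's encoding.
 * The raw return value is always positive, so it can never collide
 * with the -4095..-1 error range. */
static int nice_from_kernel(long knice)
{
    return (int)(20 - knice);
}

/* Hypothetical helper: nice value of the calling process, fetched with
 * the raw system call. Assumes the call succeeds (who == 0 means "the
 * caller"), so there is no error handling here. */
static int my_nice_raw(void)
{
    long knice = syscall(SYS_getpriority, PRIO_PROCESS, 0);
    return nice_from_kernel(knice);
}
```

The result should match what the libc `getpriority(PRIO_PROCESS, 0)` wrapper returns.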
Update: yes, it's definitely guaranteed for all syscalls that `-4095` .. `-1` is the range of errors on all architectures Linux runs on. See AOSP non-obvious syscall() implementation for more details. (In the future, a different architecture could use a different value for MAX_ERRNO, but the value for existing arches like x86-64 is guaranteed to stay the same as part of Linux's don't-break-userspace policy of keeping kernel ABIs stable.)
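In C terms, the `cmp` / `jae` check is a single unsigned comparison. A minimal sketch, assuming you have the raw return value in a `long` (the function name `raw_ret_is_error` is mine, and `MAX_ERRNO` is defined locally here since the kernel's own definition isn't exported to user space):

```c
#include <stdbool.h>

#define MAX_ERRNO 4095  /* local copy of the value used on existing arches */

/* True if a raw system call return value is in the -4095..-1 error range.
 * Casting to unsigned makes those values the top 4095 values of the type,
 * so one unsigned >= does the job, like cmp rax, -4095 / jae. */
static bool raw_ret_is_error(long ret)
{
    return (unsigned long)ret >= (unsigned long)-MAX_ERRNO;
}
```

So `-1` and `-4095` count as errors, while `0`, positive values, and `-4096` (which could be a valid page-aligned `mmap` address) do not.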
To find the actual numeric values of constants for a specific platform, you need to find the C header file where they're `#define`d. See my answer on a question about that for details.
The meanings of return values for each system call are documented in the section 2 man pages, like read(2). (`sys_read` is the raw system call that the glibc `read()` function is a very thin wrapper for.) Most man pages have a whole section for the return value, e.g.
RETURN VALUE
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. See also NOTES.

On error, -1 is returned, and errno is set appropriately. In this case, it is left unspecified whether the file position (if any) changes.
Note that the last paragraph describes how the glibc wrapper decodes the value and sets errno to `-EAX` if the raw system call's return value is negative, so `errno=EFAULT` and `return -1` if the raw system call returned `-EFAULT`.
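Roughly, that decoding step looks like the following. This is only a sketch of the idea, not glibc's actual code (which is macros and asm), and `decode_raw_ret` is a name I made up:

```c
#include <errno.h>

/* Hypothetical stand-in for what a libc wrapper does with the raw value
 * left in EAX/RAX: map -4095..-1 to errno plus a -1 return, and pass
 * everything else through unchanged. */
static long decode_raw_ret(long raw)
{
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;   /* e.g. raw == -EFAULT gives errno == EFAULT */
        return -1;
    }
    return raw;              /* success: byte count, fd, etc. */
}
```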
And there's a whole section listing all the possible error codes that `read()` is allowed to return, and what they mean specifically for `read()`. (POSIX standardizes most of this behaviour.)
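Putting the documented return values to use from C, a caller typically has to handle three cases: a short read, end of file (0), and -1 with `errno`. Here's a minimal sketch using the libc `read()` wrapper. The helper name `read_fully` is hypothetical, and retrying on `EINTR` is a policy choice I'm assuming, not something the man page requires.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep reading until count bytes arrive, EOF is hit,
 * or a real error occurs. Returns the number of bytes read (possibly
 * fewer than count at EOF), or -1 with errno set by read(). */
static ssize_t read_fully(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n > 0)
            done += (size_t)n;   /* short read: not an error, keep going */
        else if (n == 0)
            break;               /* end of file */
        else if (errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        else
            return -1;           /* real error; errno says which one */
    }
    return (ssize_t)done;
}
```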
I'm not sure exactly where glibc decodes return values for mmap(2), where the return value is not a signed type. It probably uses the same method as the generic syscall wrapper (checking for unsigned value > `-4096UL`), but the specific wrappers for each system call don't have the overhead of actually shuffling the args between registers and calling that function.
I'm not seeing it in the glibc source tree; presumably it's buried under some layers of macros. e.g. in the x86-64 macro