I came across this paragraph from this answer by @zwol recently:
The __libc_ prefix on read is because there are actually three
Kaz and R.. have explained why a C library will, in general, need to have two names for functions such as read that are called both by applications and by other functions within the C library. One of those names will be the official, documented name (e.g. read) and the other will have a prefix that makes it a name reserved for the implementation (e.g. __read).
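To make the two-name idea concrete, here is a minimal sketch assuming GCC or Clang on an ELF platform, since it relies on the GCC alias attribute. The names mylib_count_impl, mylib_count and mylib_count_vowels are hypothetical: mylib_count_impl plays the role of __read and mylib_count plays the role of read. Application code must not define identifiers that start with two underscores, which is why the sketch uses an ordinary prefix instead.

```c
#include <stddef.h>

/* Implementation name: hypothetical stand-in for a reserved name such as
   __read. Other library functions call this name, so when the library is
   a shared object, an application that defines its own mylib_count does
   not redirect those internal calls. */
size_t mylib_count_impl(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* Official, documented name: a second symbol for the same code
   (GCC/Clang alias attribute; needs an ELF-style object format). */
size_t mylib_count(const char *s, char c)
    __attribute__((alias("mylib_count_impl")));

/* Another library function: it deliberately calls the implementation
   name rather than the public one. */
size_t mylib_count_vowels(const char *s)
{
    size_t total = 0;
    for (const char *v = "aeiou"; *v != '\0'; v++)
        total += mylib_count_impl(s, *v);
    return total;
}
```

Applications would call mylib_count("banana", 'a'); both names refer to the same machine code, so there is no wrapper overhead.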
The GNU C Library has three names for some of its functions: the official name (read) plus two different reserved names (e.g. both __read and __libc_read). This is not because of any requirements made by the C standard; it's a hack to squeeze a little extra performance out of some heavily used internal code paths.
The compiled code of GNU libc, on disk, is split into several shared objects: libc.so.6, ld.so.1, libpthread.so.0, libm.so.6, libdl.so.2, etc. (exact names may vary depending on the underlying CPU and OS). The functions in each shared object often need to call other functions defined within the same shared object; less often, they need to call functions defined within a different shared object.
Function calls within a single shared object are more efficient if the callee's name is hidden, that is, usable only by callers within that same shared object. This is because globally visible names can be interposed. Suppose that both the main executable and a shared object define the name __read. Which one will be used? The ELF specification says that the definition in the main executable wins, and all calls to that name from anywhere must resolve to that definition. (The ELF specification is language-agnostic and does not make any use of the C standard's distinction between reserved and non-reserved identifiers.)
Interposition is implemented by sending all calls to globally visible symbols through the procedure linkage table, which involves an extra layer of indirection and a runtime-variable final destination. Calls to hidden symbols, on the other hand, can be made directly.
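The hidden-versus-visible distinction maps onto the GCC/Clang visibility attribute. The sketch below assumes it is compiled into a shared object (for example with gcc -fPIC -shared); all function names are hypothetical. In that setting a call to the hidden function can typically be compiled as a direct call, while a call to a default-visibility function may be routed through the PLT so that the executable can interpose it. The functions compute the same results either way; the difference only shows up in the generated code (for example in objdump -d output).

```c
/* Assumed build: gcc -fPIC -shared vis.c -o libvis.so */

/* Hidden: not exported from the shared object, so it cannot be
   interposed and calls to it can be direct. */
__attribute__((visibility("hidden")))
int vis_scale_hidden(int x)
{
    return 3 * x;
}

/* Default visibility: exported, so the dynamic linker may bind
   references to it to a definition in the main executable instead. */
__attribute__((visibility("default")))
int vis_scale_public(int x)
{
    return vis_scale_hidden(x);
}

/* Internal caller using the hidden name: can be a direct call. */
int vis_add_one_hidden(int x)
{
    return vis_scale_hidden(x) + 1;
}

/* Internal caller using the exported name: because vis_scale_public
   can be interposed, this call may go through the PLT. */
int vis_add_one_public(int x)
{
    return vis_scale_public(x) + 1;
}
```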
read is defined in libc.so.6. It is called by other functions within libc.so.6; it's also called by functions within other shared objects that are also part of GNU libc; and finally, it's called by applications. So it is given three names:
- __libc_read, a hidden name used by callers from within libc.so.6. (nm --dynamic /lib/libc.so.6 | grep read will not show this name.)
- __read, a visible reserved name, used by callers from within libpthread.so.0 and other components of glibc.
- read, a visible normal name, used by callers from applications.

Sometimes the hidden name has a __libc prefix and the visible implementation name has just two underscores; sometimes it's the other way around. The difference doesn't mean anything. It's because GNU libc has been under continuous development since the 1990s and its developers have changed their minds about internal conventions several times, but haven't always bothered to fix up all the old-style code to match the new convention (sometimes compatibility requirements even prevent us from fixing up the old code).
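The three-name arrangement can be approximated with GCC/Clang attributes on an ELF target. This is a minimal sketch, not glibc's actual code (glibc generates its aliases with internal macros such as weak_alias); the names demo_hidden_sum, demo_reserved_sum, demo_sum and demo_sum_twice are hypothetical, with the first three standing in for __libc_read, __read and read respectively.

```c
#include <stddef.h>

/* 1. Hidden name (stand-in for __libc_read): holds the actual definition
      and is visible only inside the shared object that defines it. */
__attribute__((visibility("hidden")))
long demo_hidden_sum(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* 2. Visible reserved-style name (stand-in for __read): exported so that
      other components of the same library family can call it. */
extern __typeof__(demo_hidden_sum) demo_reserved_sum
    __attribute__((alias("demo_hidden_sum"), visibility("default")));

/* 3. Public name (stand-in for read): what applications call. Declared
      weak here, mirroring glibc's weak_alias pattern, so that a strong
      definition elsewhere in a static link takes precedence. */
extern __typeof__(demo_hidden_sum) demo_sum
    __attribute__((weak, alias("demo_hidden_sum"), visibility("default")));

/* Internal caller in the same shared object: uses the hidden name, so the
   call cannot be interposed and can be made directly. */
long demo_sum_twice(const int *v, size_t n)
{
    return 2 * demo_hidden_sum(v, n);
}
```

If this file is built with gcc -fPIC -shared, running nm --dynamic on the result should list demo_reserved_sum and demo_sum but not demo_hidden_sum, matching the nm check described for __libc_read.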