You actually don't need an ABI at all if--
- Your program doesn't have functions, and--
- Your program is a single executable that is running alone (e.g. on an embedded system) where it's literally the only thing running and it doesn't need to talk to anything else.
An oversimplified summary:
API: "Here are all the functions you may call."
ABI: "This is how to call a function."
The ABI is a set of rules that compilers and linkers adhere to in order to compile your program so that it will work properly. ABIs cover multiple topics:
- Arguably the biggest and most important part of an ABI is the procedure call standard sometimes known as the "calling convention". Calling conventions standardize how "functions" are translated to assembly code.
- ABIs also dictate how the names of exposed functions in libraries should be represented, so that other code can call those libraries and know what arguments should be passed. This is called "name mangling".
- ABIs also dictate which data types can be used, how they must be sized and aligned, and other low-level details.
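To make the data-type and alignment point concrete, here is a minimal sketch in C (C11 for _Alignof). The struct Sample and the helper sample_value_offset are hypothetical names made up for illustration. The exact numbers you get depend on your platform's ABI, which is the whole point.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical record, used only for illustration. */
struct Sample {
    char tag;    /* 1 byte */
    int  value;  /* commonly 4 bytes, so padding usually follows `tag` */
};

/* Returns the byte offset of `value` inside struct Sample, as laid out
   by the ABI this code is compiled for. */
size_t sample_value_offset(void) {
    size_t offset = offsetof(struct Sample, value);
    /* Whatever the offset is, it must respect the alignment of int. */
    assert(offset % _Alignof(int) == 0);
    return offset;
}
```

Two modules that disagree about this offset would read garbage from each other's structs, which is why the layout rules belong in the ABI rather than being left to each compiler.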
Taking a deeper look at calling conventions, which I consider to be the core of an ABI:
The machine itself has no concept of "functions". When you write a function in a high-level language like C, the compiler generates a line of assembly code like "_MyFunction1:". This is a label, which will eventually get resolved into an address by the assembler and linker. This label marks the "start" of your "function" in the assembly code. In high-level code, when you "call" that function, what you're really doing is causing the CPU to jump to the address of that label and continue executing there.
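You can see the "a function is just an address" idea from C itself. This is a small sketch, with my_function1 and call_through_address being hypothetical names: a function's name evaluates to the address of its label, and calling through a pointer makes the CPU jump there.

```c
#include <assert.h>

/* Hypothetical function. The compiler emits a label for it; the exact
   spelling (my_function1, _my_function1, ...) depends on the platform. */
int my_function1(int x) {
    return x + 1;
}

/* Calls whatever code lives at `target`. The pointer holds nothing more
   than the address the label was resolved to. */
int call_through_address(int (*target)(int), int arg) {
    assert(target != 0);
    return target(arg);
}
```

For example, call_through_address(my_function1, 41) gives 42, exactly as a direct call would. The indirect call only works because both sides agree on the same calling convention.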
In preparation for the jump, the compiler must do a bunch of important stuff. The calling convention is like a checklist that the compiler follows to do all this stuff:
- First, the compiler inserts a little bit of assembly code to save the current address, so that when your "function" is done, the CPU can jump back to the right place and continue executing.
- Next, the compiler generates assembly code to pass the arguments.
- Some calling conventions dictate that arguments should be put on the stack (in a particular order of course).
- Other conventions dictate that the arguments should be put in particular registers (depending on their data types of course).
- Still other conventions dictate that a specific combination of stack and registers should be used.
- Of course, if there was anything important in those registers before, those values are now overwritten and lost forever, so some calling conventions may dictate that the compiler should save some of those registers prior to putting the arguments in them.
- Now the compiler inserts a jump instruction telling the CPU to go to that label it made previously (_MyFunction1:). At this point, you can consider the CPU to be "in" your "function".
- At the end of the function, the compiler puts some assembly code that will make the CPU write the return value in the correct place. The calling convention will dictate whether the return value should be put into a particular register (depending on its type), or on the stack.
- Now it's time for clean-up. The calling convention will dictate where the compiler places the cleanup assembly code.
- Some conventions say that the caller must clean up the stack. This means that after the "function" is done and the CPU jumps back to where it was before, the very next code to be executed should be some very specific cleanup code.
- Other conventions say that some particular parts of the cleanup code should be at the end of the "function", before the jump back.
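The checklist above can be mapped onto an ordinary C call. This is only an annotated sketch: add_two and caller are hypothetical names, and the register names in the comments assume the System V x86-64 convention. Other conventions put things elsewhere, and an optimizing compiler may inline the call entirely.

```c
#include <assert.h>

/* Callee. Under System V x86-64, `a` arrives in EDI, `b` in ESI,
   and the result is handed back in EAX. */
int add_two(int a, int b) {
    return a + b;          /* write the return value where the convention says */
}

/* Caller. The comments mark where each checklist step would happen. */
int caller(void) {
    int kept = 10;         /* a live value the call must not destroy; the compiler
                              may keep it in a callee-saved register or on the stack */
    int sum = add_two(2, 3);
                           /* 1. arguments placed in registers (or on the stack)
                              2. return address saved, CPU jumps to the label
                              3. callee runs and leaves the result in place
                              4. CPU jumps back; any caller-side cleanup runs here */
    assert(sum == 5);
    return kept + sum;
}
```

Compiling this with your compiler's "emit assembly" option (for example -S with GCC or Clang) and optimizations off is a good way to see the steps for your own platform.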
There are many different ABIs / calling conventions. Some main ones are:
- For the x86 or x86-64 CPU (32-bit environment):
- CDECL
- STDCALL
- FASTCALL
- VECTORCALL
- THISCALL
- For the x86-64 CPU (64-bit environment):
- SYSTEMV
- MSNATIVE
- VECTORCALL
- For the ARM CPU (32-bit)
- For the ARM CPU (64-bit)
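On 32-bit x86, some compilers let you pick the convention per function. Here is a hedged sketch using GCC-style attributes; the CC_* macros and sum_* functions are hypothetical names of my own. The attributes are only applied when building for 32-bit x86 with a GCC-compatible compiler, and the macros expand to nothing everywhere else, so the platform default is used.

```c
#include <assert.h>

/* Select a calling convention only where the compiler supports it. */
#if defined(__GNUC__) && defined(__i386__)
#  define CC_CDECL    __attribute__((cdecl))
#  define CC_STDCALL  __attribute__((stdcall))
#  define CC_FASTCALL __attribute__((fastcall))
#else
#  define CC_CDECL
#  define CC_STDCALL
#  define CC_FASTCALL
#endif

/* Same C source, three potentially different sets of generated code. */
int CC_CDECL    sum_cdecl(int a, int b)    { return a + b; }
int CC_STDCALL  sum_stdcall(int a, int b)  { return a + b; }
int CC_FASTCALL sum_fastcall(int a, int b) { return a + b; }

/* The convention changes how the call is made, never what it computes. */
int sum_all_conventions(int a, int b) {
    int r = sum_cdecl(a, b);
    assert(r == sum_stdcall(a, b));
    assert(r == sum_fastcall(a, b));
    return r;
}
```

The results are identical by design. What differs is where the arguments travel and who cleans up the stack, and that only shows up in the generated assembly.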
Here is a great page that actually shows the differences in the assembly generated when compiling for different ABIs.
Another thing to mention is that an ABI isn't only relevant inside your program's executable module. It's also used by the linker to make sure your program calls library functions correctly. You have multiple shared libraries running on your computer, and as long as your compiler knows what ABI they each use, it can call functions from them properly without blowing up the stack.
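As a small illustration of calling into a library you did not compile yourself, here is a sketch that calls strlen from the C standard library. The wrapper text_length is a hypothetical name. Your code only sees the declaration in the header; the call works because your compiler and the library's compiler follow the same ABI.

```c
#include <assert.h>
#include <string.h>   /* declares strlen; the implementation lives in the C library */

/* Passes `text` to the library using the platform's calling convention
   and reads the result back from wherever that convention puts it. */
size_t text_length(const char *text) {
    assert(text != NULL);
    return strlen(text);
}
```

For example, text_length("hello") gives 5. Note that a compiler may replace a strlen call on a constant string with the answer at compile time, so look at a call with a runtime value if you want to see the actual library call in the assembly.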
Your compiler understanding how to call library functions is extremely important. On a hosted platform (that is, one where an OS loads programs), your program can't even blink without making a kernel call.