What is an application binary interface (ABI)?

情歌与酒 2020-11-22 11:41

I never clearly understood what an ABI is. Please don't point me to a Wikipedia article. If I could understand it, I wouldn't be here posting such a lengthy post.

16 answers
  • 2020-11-22 12:23

    You actually don't need an ABI at all if:

    • Your program doesn't have functions, and
    • Your program is a single executable running alone (e.g. on an embedded system), where it's literally the only thing running and it doesn't need to talk to anything else.

    An oversimplified summary:

    API: "Here are all the functions you may call."

    ABI: "This is how to call a function."

    The ABI is a set of rules that compilers and linkers adhere to when building your program, so that the compiled pieces work together properly. ABIs cover multiple topics:

    • Arguably the biggest and most important part of an ABI is the procedure call standard sometimes known as the "calling convention". Calling conventions standardize how "functions" are translated to assembly code.
    • ABIs also dictate how the names of exposed functions in libraries should be represented so that other code can call those libraries and know what arguments should be passed. This is called "name mangling".
    • ABIs also dictate which data types can be used, how they must be aligned, and other low-level details.
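
    To make the data-layout point concrete, here is a minimal C++ sketch. The struct `Sample` and the helper names are made up for illustration; the exact sizes and offsets depend on your platform's ABI, so the code only checks properties that hold everywhere.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical struct, used only to show that layout is an ABI matter.
struct Sample {
    char tag;    // 1 byte
    int  count;  // size and alignment of int are fixed by the platform ABI
    char flag;   // 1 byte
};

// Bytes the compiler added as padding to satisfy alignment rules.
// The value is platform dependent (commonly 6 when int is 4 bytes).
inline std::size_t padding_bytes() {
    return sizeof(Sample) - (sizeof(char) + sizeof(int) + sizeof(char));
}

// Properties that hold on every conforming implementation.
inline void check_layout() {
    assert(offsetof(Sample, tag) == 0);                   // first member at offset 0
    assert(offsetof(Sample, count) % alignof(int) == 0);  // int member is aligned
    assert(sizeof(Sample) % alignof(Sample) == 0);        // arrays of Sample stay aligned
}
```

    Two compilers that disagree on any of these numbers cannot safely pass a `Sample` to each other, even though the source code is identical.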

    Taking a deeper look at calling convention, which I consider to be the core of an ABI:

    The machine itself has no concept of "functions". When you write a function in a high-level language like C, the compiler generates a line of assembly code like _MyFunction1:. This is a label, which will eventually get resolved into an address by the assembler and linker. This label marks the "start" of your "function" in the assembly code. In high-level code, when you "call" that function, what you're really doing is causing the CPU to jump to the address of that label and continue executing there.
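
    You can see the "a function is just an address" idea from C++ itself. This is only a sketch with made-up names (`twice`, `call_at`, `demo_call`): it takes the address of a function and calls through it.

```cpp
#include <cassert>

// An ordinary function. The compiler emits a label for it, and that label
// ends up as an address in the final program.
int twice(int x) { return 2 * x; }

// Calls whatever code lives at the address held in fn.
int call_at(int (*fn)(int), int arg) { return fn(arg); }

int demo_call() {
    int (*address)(int) = &twice;  // the address behind the label
    assert(address != nullptr);
    return call_at(address, 21);   // jump there, run, come back
}
```

    `call_at` knows nothing about `twice` except its address and its signature. The calling convention is what lets that be enough.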

    In preparation for the jump, the compiler must do a bunch of important stuff. The calling convention is like a checklist that the compiler follows to do all this stuff:

    • First, the compiler inserts a little bit of assembly code to save the current address, so that when your "function" is done, the CPU can jump back to the right place and continue executing.
    • Next, the compiler generates assembly code to pass the arguments.
      • Some calling conventions dictate that arguments should be put on the stack (in a particular order of course).
      • Other conventions dictate that the arguments should be put in particular registers (depending on their data types of course).
      • Still other conventions dictate that a specific combination of stack and registers should be used.
    • Of course, if there was anything important in those registers before, those values are now overwritten and lost forever, so some calling conventions may dictate that the compiler should save some of those registers prior to putting the arguments in them.
    • Now the compiler inserts a jump instruction telling the CPU to go to that label it made previously (_MyFunction1:). At this point, you can consider the CPU to be "in" your "function".
    • At the end of the function, the compiler puts some assembly code that will make the CPU write the return value in the correct place. The calling convention will dictate whether the return value should be put into a particular register (depending on its type), or on the stack.
    • Now it's time for clean-up. The calling convention will dictate where the compiler places the cleanup assembly code.
      • Some conventions say that the caller must clean up the stack. This means that after the "function" is done and the CPU jumps back to where it was before, the very next code to be executed should be some very specific cleanup code.
      • Other conventions say that some particular parts of the cleanup code should be at the end of the "function" before the jump back.
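
    As a rough illustration of the checklist above, here is what it looks like for one specific convention. The comments assume the System V AMD64 convention (Linux, macOS, BSD on x86-64) and a call that is not inlined; other ABIs use different registers. The function names are made up.

```cpp
#include <cassert>

// Assumption: System V AMD64 calling convention.
//   a arrives in rdi, b in rsi, c in rdx (the first three integer arguments)
//   the return value leaves in rax
long add3(long a, long b, long c) {
    return a + b + c;
}

long caller() {
    // Conceptually the compiler emits:
    //   load 1, 2, 3 into rdi, rsi, rdx
    //   call add3   (pushes the return address, then jumps to the label)
    //   read the result from rax
    // An optimizer is free to inline the call and skip all of this.
    long result = add3(1, 2, 3);
    assert(result == 6);
    return result;
}
```

    Compiling this with `-S` (GCC or Clang) and optimizations off is an easy way to check the register usage on your own machine.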

    There are many different ABIs / calling conventions. Some main ones are:

    • For the x86 or x86-64 CPU (32-bit environment):
      • CDECL
      • STDCALL
      • FASTCALL
      • VECTORCALL
      • THISCALL
    • For the x86-64 CPU (64-bit environment):
      • SYSTEMV
      • MSNATIVE
      • VECTORCALL
    • For the ARM CPU (32-bit)
      • AAPCS
    • For the ARM CPU (64-bit)
      • AAPCS64
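
    On 32-bit x86 you can pick the convention per function with compiler-specific annotations. The sketch below wraps them in made-up macros (`DEMO_CDECL`, `DEMO_STDCALL`) that expand to nothing on other targets, where the annotations do not apply.

```cpp
#include <cassert>

// Hypothetical macros. They select a convention only on 32-bit x86.
#if defined(_MSC_VER) && defined(_M_IX86)
  #define DEMO_CDECL   __cdecl
  #define DEMO_STDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
  #define DEMO_CDECL   __attribute__((cdecl))
  #define DEMO_STDCALL __attribute__((stdcall))
#else
  #define DEMO_CDECL
  #define DEMO_STDCALL
#endif

// cdecl: the caller removes the arguments from the stack after the call.
int DEMO_CDECL sum_cdecl(int a, int b) { return a + b; }

// stdcall: the callee removes the arguments from the stack before returning.
int DEMO_STDCALL sum_stdcall(int a, int b) { return a + b; }

int demo_conventions() {
    int total = sum_cdecl(2, 3) + sum_stdcall(2, 3);
    assert(total == 10);
    return total;
}
```

    Both functions compute the same thing. Only the generated call and return sequences differ, which is why caller and callee must agree on the convention.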

    Here is a great page that actually shows the differences in the assembly generated when compiling for different ABIs.

    Another thing to mention is that an ABI isn't only relevant inside your program's executable module. It's also used by the linker to make sure your program calls library functions correctly. You have multiple shared libraries running on your computer, and as long as your compiler knows what ABI they each use, it can call functions from them properly without blowing up the stack.

    Your compiler understanding how to call library functions is extremely important. On a hosted platform (that is, one where an OS loads programs), your program can't even blink without making a kernel call.

  • 2020-11-22 12:23

    I was also trying to understand ABI and JesperE’s answer was very helpful.

    From a very simple perspective, we may try to understand ABI by considering binary compatibility.

    The KDE wiki defines a library as binary compatible “if a program linked dynamically to a former version of the library continues running with newer versions of the library without the need to recompile.” For more on dynamic linking, refer to Static linking vs dynamic linking.

    Now, let’s try to look at just the most basic aspects needed for a library to be binary compatible (assuming there are no source code changes to the library):

    1. Same/backward compatible instruction set architecture (processor instructions, register file structure, stack organization, memory access types, along with sizes, layout, and alignment of basic data types the processor can directly access)
    2. Same calling conventions
    3. Same name mangling convention (this might be needed if say a Fortran program needs to call some C++ library function).
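
    A small sketch of how binary compatibility can break even when these three points hold, assuming a library that changes a struct between versions. All struct and function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

// Struct as shipped in version 1 of a hypothetical library.
struct ConfigV1 {
    int width;
    int height;
};

// Version 2, with a new field inserted at the front.
struct ConfigV2 {
    int flags;   // new field
    int width;
    int height;
};

// Version 2 alternative, with the new field appended at the end.
struct ConfigV2Appended {
    int width;
    int height;
    int flags;   // new field
};

// A client compiled against V1 reads width at the V1 offset.
inline bool insert_keeps_width_offset() {
    return offsetof(ConfigV1, width) == offsetof(ConfigV2, width);
}

inline bool append_keeps_width_offset() {
    return offsetof(ConfigV1, width) == offsetof(ConfigV2Appended, width);
}

inline void check_versions() {
    assert(!insert_keeps_width_offset());  // old binaries would read flags as width
    assert(append_keeps_width_offset());   // existing fields stay where they were
}
```

    Appending keeps the old offsets, but `sizeof` still changes, so it is only safe when clients never allocate or copy the struct themselves.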

    Sure, there are many other details but this is mostly what the ABI also covers.

    More specifically to answer your question, from the above, we can deduce:

    ABI functionality: binary compatibility

    existing entities: existing program/libraries/OS

    consumer: libraries, OS

    Hope this helps!

  • 2020-11-22 12:26

    The best way to differentiate between an ABI and an API is to know what each is used for, and why:

    For x86-64 there is generally one ABI (and for x86 32-bit there is another set):

    http://www.x86-64.org/documentation/abi.pdf

    https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html

    http://people.freebsd.org/~obrien/amd64-elf-abi.pdf

    Linux, FreeBSD and Mac OS X follow it with some slight variations. Windows x64 has its own ABI:

    http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/

    If you know the ABI, and other compilers follow it as well, then binaries theoretically know how to call each other (library APIs in particular): how to pass parameters on the stack or in registers, which registers a call may change, and so on. Essentially, this knowledge is what lets separately built software integrate. Knowing the register order and stack layout, I can piece together different software written in assembly without much trouble.

    But APIs are different:

    An API is a set of high-level function names with defined arguments, such that different pieces of software built against that API MAY be able to call into one another. An additional requirement is that they adhere to the SAME ABI.

    For example, Windows used to be POSIX API compliant:

    https://en.wikipedia.org/wiki/Windows_Services_for_UNIX

    https://en.wikipedia.org/wiki/POSIX

    Linux is POSIX compliant as well, but binaries cannot simply be moved from one system to the other and run. Because both expose the same NAMES through the POSIX-compliant API, you can take the same C source, recompile it on the other OS, and get it running.

    APIs are meant to ease integration of software at the pre-compilation stage. After compilation, the software can look totally different if the ABIs are different.

    ABIs are meant to define the exact integration of software at the binary / assembly level.
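
    For instance, the sketch below uses only the POSIX functions `getpid` and `write`. The wrapper names are made up, and it assumes a POSIX system with `<unistd.h>` available. The source is the same on every such system, while the compiled binary is not.

```cpp
#include <cassert>
#include <cstring>    // std::strlen
#include <unistd.h>   // POSIX: getpid, write, STDOUT_FILENO

// Same source on any POSIX system. How the call reaches the kernel
// (registers, system call numbers, executable format) is decided by each
// platform's ABI when the code is compiled.
long current_pid() {
    long pid = static_cast<long>(getpid());
    assert(pid > 0);  // a running process always has a positive pid
    return pid;
}

// Returns true when the whole message was written to standard output.
bool say_hello() {
    const char msg[] = "hello from the POSIX API\n";
    ssize_t written = write(STDOUT_FILENO, msg, std::strlen(msg));
    return written == static_cast<ssize_t>(std::strlen(msg));
}
```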

  • 2020-11-22 12:28

    In order to call code in shared libraries, or call code between compilation units, the object file needs to contain labels for the calls. C++ mangles the names of those labels to encode the scope and parameter types of each function, which is what allows overloaded methods to coexist. The mangling scheme is part of the ABI, which is why you cannot mix object files from different C++ compilers unless they explicitly support the same ABI.
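
    A short sketch of what that means in practice. The function names are made up, and the mangled names in the comments assume the Itanium C++ ABI used by GCC and Clang; MSVC produces different ones.

```cpp
#include <cassert>

// Two overloads share one source-level name but need two distinct labels.
// Under the Itanium C++ ABI these become _Z5scalei and _Z5scaled.
int    scale(int x)    { return x * 2; }
double scale(double x) { return x * 2.0; }

// extern "C" switches mangling off, so the label is plain "scale_int".
// That is the usual way to expose a function to C or to other compilers.
extern "C" int scale_int(int x) { return scale(x); }

int demo_overloads() {
    assert(scale(3) == 6);       // resolves to the int overload
    assert(scale(1.5) == 3.0);   // resolves to the double overload
    return scale_int(4);
}
```

    Running `nm` on the resulting object file shows the mangled and unmangled labels side by side.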
