What is an application binary interface (ABI)?

情歌与酒 2020-11-22 11:41

I never clearly understood what an ABI is. Please don't point me to a Wikipedia article. If I could understand it, I wouldn't be here posting such a lengthy post.

16 Answers
  • 2020-11-22 12:12

    If you know assembly and how things work at the OS level, you are conforming to a certain ABI. The ABI governs things like how parameters are passed and where return values are placed. For many platforms there is only one ABI to choose from, and in those cases the ABI is just "how things work".

    However, the ABI also governs things like how classes/objects are laid out in C++. This is necessary if you want to be able to pass object references across module boundaries or if you want to mix code compiled with different compilers.

    Also, if you have a 64-bit OS which can execute 32-bit binaries, you will have different ABIs for 32- and 64-bit code.

    In general, any code you link into the same executable must conform to the same ABI. If you want to communicate between code using different ABIs, you must use some form of RPC or serialization protocols.

    I think you are trying too hard to squeeze different types of interfaces into a fixed set of characteristics. For example, an interface doesn't necessarily have to be split into consumers and producers. An interface is just a convention by which two entities interact.

    ABIs can be (partially) ISA-agnostic. Some aspects (such as calling conventions) depend on the ISA, while other aspects (such as C++ class layout) do not.

    A well defined ABI is very important for people writing compilers. Without a well defined ABI, it would be impossible to generate interoperable code.

    EDIT: Some notes to clarify:

    • "Binary" in ABI does not exclude the use of strings or text. If you want to link a DLL exporting a C++ class, somewhere in it the methods and type signatures must be encoded. That's where C++ name-mangling comes in.
    • The reason why you never provided an ABI is that the vast majority of programmers will never do it. ABIs are provided by the same people designing the platform (i.e. operating system), and very few programmers will ever have the privilege to design a widely-used ABI.
  • 2020-11-22 12:14

    Functionality: A set of contracts which affect the compiler, assembly writers, the linker, and the operating system. The contracts specify how functions are laid out, where parameters are passed, how parameters are passed, how function returns work. These are generally specific to a (processor architecture, operating system) tuple.

    Existing entities: parameter layout, function semantics, register allocation. For instance, the ARM architecture has numerous ABIs (APCS, EABI, GNU-EABI, never mind a bunch of historical cases), and mixing ABIs will result in your code simply not working when calling across boundaries.

    Consumer: The compiler, assembly writers, operating system, CPU specific architecture.

    Who needs these details? The compiler, assembly writers, linkers which do code generation (or alignment requirements), operating system (interrupt handling, syscall interface). If you did assembly programming, you were conforming to an ABI!

    C++ name mangling is a special case: it is centered on the linker and dynamic linker, and if name mangling is not standardized, dynamic linking will not work. This is why the C++ ABI is called just that, the C++ ABI. Beyond mangling, it is not a linker-level issue but a code generation issue. Once you have a C++ binary, it is not possible to make it compatible with another C++ ABI (name mangling, exception handling) without recompiling from source.

    ELF is a file format for the use of a loader and dynamic linker. ELF is a container format for binary code and data, and as such specifies the ABI of a piece of code. I would not consider ELF to be an ABI in the strict sense, just as PE executables are not an ABI.

    All ABIs are instruction set specific. An ARM ABI will not make sense on an MSP430 or x86_64 processor.

    Windows has several ABIs. For instance, fastcall and stdcall are two commonly used ABIs. The syscall ABI is different again.

  • 2020-11-22 12:14

    Application binary interface (ABI)

    Functionality:

    • Translation from the programmer's model to the underlying system's domain: data types, sizes, and alignment; the calling convention, which controls how functions' arguments are passed and return values retrieved; the system call numbers and how an application should make system calls to the operating system; and the high-level language compilers' name mangling scheme, exception propagation, and calling convention between compilers on the same platform. Cross-platform compatibility is not required.
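
    The "data type, size, alignment" part can be observed directly from C. The following is a minimal sketch, not taken from the answer: struct sample_record and print_sample_layout are hypothetical names chosen for illustration. The numbers it prints are whatever the platform ABI dictates, so they may differ between, say, a 32-bit and a 64-bit build of the same source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record type, used only for illustration. */
struct sample_record {
    char tag;
    int count;
    double value;
};

/* Print the size and member offsets that the current platform ABI
 * assigns to struct sample_record. */
void print_sample_layout(void) {
    /* The C language itself guarantees the first member sits at offset 0;
     * everything else (padding, total size) is decided by the ABI. */
    assert(offsetof(struct sample_record, tag) == 0);

    printf("sizeof(struct sample_record) = %lu\n",
           (unsigned long)sizeof(struct sample_record));
    printf("offsetof(tag)   = %lu\n",
           (unsigned long)offsetof(struct sample_record, tag));
    printf("offsetof(count) = %lu\n",
           (unsigned long)offsetof(struct sample_record, count));
    printf("offsetof(value) = %lu\n",
           (unsigned long)offsetof(struct sample_record, value));
}
```

    Calling print_sample_layout() from main on two different targets is a quick way to see that two modules must agree on these numbers before they can exchange such a struct.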

    Existing entities:

    • Logical blocks that directly participate in a program's execution: ALU, general-purpose registers, registers for memory-mapped or port-mapped I/O, etc.

    Consumer:

    • Language processors: linker, assembler...

    These are needed by whoever has to ensure that build tool-chains work as a whole. If you write one module in assembly language, another in Python, and instead of your own boot-loader want to use an operating system, then your "application" modules are working across "binary" boundaries and require agreement on such an "interface".

    C++ name mangling matters because object files from different high-level languages might need to be linked into your application. Consider using the GCC standard library to make system calls to a Windows built with Visual C++.

    ELF is one possible format that a linker may expect an object file to have, though the JVM might have some other idea.

    For a Windows RT Store app, try searching for ARM ABI if you really wish to make some build tool-chain work together.

  • 2020-11-22 12:15

    ABI (Application Binary Interface) is about machine-code communication at runtime between two binary parts such as an application, a library, or the OS. The ABI describes how objects are stored in memory, how functions are called (calling convention), name mangling, and so on.

    A good example of API and ABI is the iOS ecosystem with the Swift language.

    • Application layer - when you create an application using different languages. For example, you can create an application using Swift and Objective-C [Mixing Swift and Objective-C]

    • Application - OS layer - runtime - the Swift runtime and standard libraries are part of the OS and should not be included in each bundle (e.g. app, framework). This is the same approach Objective-C uses.

    • Library layer - Module Stability case - compile time - you can import a framework that was built with another version of the Swift compiler. This means it is safe to create a closed-source (pre-built) binary that will be consumed by a different version of the compiler (.swiftinterface is used together with .swiftmodule), and you will not get

      Module compiled with _ cannot be imported by the _ compiler
      
    • Library layer - Library Evolution case

    1. Compile time - if a dependency has changed, a client does not have to be recompiled.
    2. Runtime - a system library or a dynamic framework can be hot-swapped for a new one.

    [API vs ABI]
    [Swift Module and Library stability]

  • 2020-11-22 12:16

    Linux shared library minimal runnable ABI example

    In the context of shared libraries, the most important implication of "having a stable ABI" is that you don't need to recompile your programs after the library changes.

    So for example:

    • if you are selling a shared library, you save your users the annoyance of recompiling everything that depends on your library for every new release

    • if you are selling a closed source program that depends on a shared library present in the user's distribution, you could release and test fewer prebuilts if you are certain that the ABI is stable across certain versions of the target OS.

      This is especially important in the case of the C standard library, which many programs in your system link to.

    Now I want to provide a minimal concrete runnable example of this.

    main.c

    #include <assert.h>
    #include <stdlib.h>
    
    #include "mylib.h"
    
    int main(void) {
        mylib_mystruct *myobject = mylib_init(1);
        assert(myobject->old_field == 1);
        free(myobject);
        return EXIT_SUCCESS;
    }
    

    mylib.c

    #include <stdlib.h>
    
    #include "mylib.h"
    
    mylib_mystruct* mylib_init(int old_field) {
        mylib_mystruct *myobject;
        myobject = malloc(sizeof(mylib_mystruct));
        myobject->old_field = old_field;
        return myobject;
    }
    

    mylib.h

    #ifndef MYLIB_H
    #define MYLIB_H
    
    typedef struct {
        int old_field;
    } mylib_mystruct;
    
    mylib_mystruct* mylib_init(int old_field);
    
    #endif
    

    Compiles and runs fine with:

    cc='gcc -pedantic-errors -std=c89 -Wall -Wextra'
    $cc -fPIC -c -o mylib.o mylib.c
    $cc -L . -shared -o libmylib.so mylib.o
    $cc -L . -o main.out main.c -lmylib
    LD_LIBRARY_PATH=. ./main.out
    

    Now, suppose that for v2 of the library, we want to add a new field to mylib_mystruct called new_field.

    If we added the field before old_field as in:

    typedef struct {
        int new_field;
        int old_field;
    } mylib_mystruct;
    

    and rebuilt the library but not main.out, then the assert fails!

    This is because the line:

    myobject->old_field == 1
    

    was compiled into assembly that accesses the very first int of the struct, which is now new_field instead of the expected old_field.

    Therefore this change broke the ABI.

    If, however, we add new_field after old_field:

    typedef struct {
        int old_field;
        int new_field;
    } mylib_mystruct;
    

    then the old generated assembly still accesses the first int of the struct, and the program still works, because we kept the ABI stable.
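
    The effect can also be reproduced in a single file, without rebuilding a shared library. The sketch below is an illustration under assumptions: the three typedefs and the helper names are hypothetical stand-ins for the v1 and v2 headers, and read_like_old_client mimics what the stale main.out does, namely reading an int at the offset that old_field had in v1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout the old main.out was compiled against (v1). */
typedef struct { int old_field; } mystruct_v1;

/* Two hypothetical v2 layouts: new field first, or new field last. */
typedef struct { int new_field; int old_field; } mystruct_v2_prepend;
typedef struct { int old_field; int new_field; } mystruct_v2_append;

/* Mimic the stale client: read an int at the v1 offset of old_field,
 * regardless of how the object was actually laid out by the library. */
int read_like_old_client(const void *object) {
    int value;
    memcpy(&value,
           (const char *)object + offsetof(mystruct_v1, old_field),
           sizeof value);
    return value;
}

/* The library now uses the "prepend" layout: the client reads new_field. */
int old_client_sees_prepended(void) {
    mystruct_v2_prepend object;
    object.new_field = 0;
    object.old_field = 1;
    return read_like_old_client(&object);
}

/* The library now uses the "append" layout: the client still reads old_field. */
int old_client_sees_appended(void) {
    mystruct_v2_append object;
    object.old_field = 1;
    object.new_field = 0;
    return read_like_old_client(&object);
}
```

    With the prepended layout the old client reads 0 (the value of new_field) where it expected 1, while with the appended layout it still reads 1.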

    Here is a fully automated version of this example on GitHub.

    Another way to keep this ABI stable would have been to treat mylib_mystruct as an opaque struct, and only access its fields through accessor functions. This makes it easier to keep the ABI stable, but would incur a performance overhead as we'd do more function calls.
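
    A minimal sketch of the opaque struct approach follows. It is squeezed into one file for brevity, and the accessor names mylib_get_old_field and mylib_free are assumptions that do not exist in the original example. In a real library the first part would live in mylib.h and the struct definition would stay private in mylib.c, so client code could never depend on the field offsets.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What mylib.h would expose: an incomplete type plus accessors ---- */
typedef struct mylib_mystruct mylib_mystruct;
mylib_mystruct *mylib_init(int old_field);
int mylib_get_old_field(const mylib_mystruct *object);  /* hypothetical */
void mylib_free(mylib_mystruct *object);                /* hypothetical */

/* ---- What mylib.c would keep private ---- */
struct mylib_mystruct {
    int new_field;  /* may be added anywhere: clients never see offsets */
    int old_field;
};

mylib_mystruct *mylib_init(int old_field) {
    mylib_mystruct *object = malloc(sizeof *object);
    if (object == NULL) {
        return NULL;
    }
    object->new_field = 0;
    object->old_field = old_field;
    return object;
}

int mylib_get_old_field(const mylib_mystruct *object) {
    return object->old_field;
}

void mylib_free(mylib_mystruct *object) {
    free(object);
}

/* ---- Client code: only uses the functions declared in the "header" ---- */
int client_roundtrip(int value) {
    mylib_mystruct *object = mylib_init(value);
    int result;
    assert(object != NULL);
    result = mylib_get_old_field(object);
    mylib_free(object);
    return result;
}
```

    Because the client only holds a pointer and calls functions, the library can reorder or add fields freely. The cost is the extra call per access mentioned above, plus the fact that clients can no longer allocate the struct themselves.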

    API vs ABI

    In the previous example, it is interesting to note that adding new_field before old_field only broke the ABI, but not the API.

    What this means is that if we had recompiled our main.c program against the library, it would have worked regardless.

    We would also have broken the API, however, if we had changed, for example, the function signature:

    mylib_mystruct* mylib_init(int old_field, int new_field);
    

    since in that case, main.c would stop compiling altogether.

    Semantic API vs Programming API

    We can also classify API changes into a third type: semantic changes.

    The semantic API is usually a natural language description of what the API is supposed to do, usually included in the API documentation.

    It is therefore possible to break the semantic API without breaking the program build itself.

    For example, if we had modified

    myobject->old_field = old_field;
    

    to:

    myobject->old_field = old_field + 1;
    

    then this would have broken neither the programming API nor the ABI, but the semantic API that main.c relies on would break.

    There are two ways to programmatically check the semantic API contract:

    • test a bunch of corner cases. Easy to do, but you might always miss one.
    • formal verification. Harder to do, but produces mathematical proof of correctness, essentially unifying documentation and tests into a "human" / machine verifiable manner! As long as there isn't a bug in your formal description of course ;-)

      This concept is closely related to the formalization of Mathematics itself: https://math.stackexchange.com/questions/53969/what-does-formal-mean/3297537#3297537
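
    The first option, testing corner cases, might look like the sketch below. It assumes the documented contract is "mylib_init(x) returns an object whose old_field equals x", which is my reading of the example rather than something the library states. The helper names init_preserves_value and run_semantic_checks are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct {
    int old_field;
} mylib_mystruct;

/* Library function under test, same behavior as the original mylib.c. */
mylib_mystruct *mylib_init(int old_field) {
    mylib_mystruct *myobject = malloc(sizeof(mylib_mystruct));
    if (myobject == NULL) {
        return NULL;
    }
    myobject->old_field = old_field;
    return myobject;
}

/* Return 1 if mylib_init stores the input unchanged, 0 otherwise. */
int init_preserves_value(int input) {
    mylib_mystruct *myobject = mylib_init(input);
    int ok;
    assert(myobject != NULL);
    ok = (myobject->old_field == input);
    free(myobject);
    return ok;
}

/* Check the assumed contract on a handful of corner cases.
 * Returns 1 if all of them hold, 0 on the first failure. */
int run_semantic_checks(void) {
    static const int cases[] = {0, 1, -1, INT_MAX, INT_MIN};
    size_t i;
    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (!init_preserves_value(cases[i])) {
            return 0;
        }
    }
    return 1;
}
```

    A version of mylib_init that stored old_field + 1 would still compile and link against this test, but run_semantic_checks would report a failure on the first case.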

    List of everything that breaks C / C++ shared library ABIs

    TODO: find / create the ultimate list:

    • https://github.com/lvc/abi-compliance-checker automated tool to check it
    • https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B KDE C++ ABI guidelines
    • https://plan99.net/~mike/writing-shared-libraries.html

    Java minimal runnable example

    What is binary compatibility in Java?

    Tested in Ubuntu 18.10, GCC 8.2.0.

  • 2020-11-22 12:20

    Let me at least answer part of your question, with an example of how the Linux ABI affects system calls and why that is useful.

    A system call is a way for a userspace program to ask the kernel for something. It works by putting the numeric code for the call and its arguments in certain registers and triggering an interrupt. Then a switch to kernel space occurs, and the kernel looks up the numeric code and the arguments, handles the request, puts the result back into a register, and triggers a switch back to userspace. This is needed, for example, when the application wants to allocate memory or open a file (syscalls "brk" and "open").

    Now the syscalls have short names ("brk", etc.) and corresponding numeric codes (syscall numbers), which are defined in a system-specific header file. As long as these numbers stay the same, you can run the same compiled userland programs on different, updated kernels without having to recompile. So you have an interface used by precompiled binaries, hence ABI.
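
    On Linux with glibc, the numeric codes are visible from C through the SYS_* constants in <sys/syscall.h>, and the syscall() wrapper lets a program issue a call by number. The sketch below is Linux-specific and uses getpid instead of brk or open only because it takes no arguments. The function names raw_getpid and check_raw_getpid are hypothetical.

```c
#define _GNU_SOURCE /* exposes the syscall() declaration in glibc */
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our process ID by system call number,
 * bypassing the getpid() library wrapper. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}

/* The raw call and the libc wrapper should agree, because both end up
 * using the same syscall number defined by the kernel ABI. */
void check_raw_getpid(void) {
    assert(raw_getpid() == (long)getpid());
}
```

    The value of SYS_getpid is baked into the compiled binary, which is why the kernel has to keep that number stable for old executables to keep running.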
