What is an application binary interface (ABI)?

后端 未结 16 1497
情歌与酒
情歌与酒 2020-11-22 11:41

I never clearly understood what an ABI is. Please don\'t point me to a Wikipedia article. If I could understand it, I wouldn\'t be here posting such a lengthy post.

相关标签:
16条回答
  • 2020-11-22 12:04

    The ABI needs to be consistent between caller and callee to be certain that the call succeeds. Stack use, register use, end-of-routine stack pop. All these are the most important parts of the ABI.

    0 讨论(0)
  • 2020-11-22 12:07

    In short and in philosophy, only things of a kind can get along well, and the ABI could be seen as the kind of which software stuff work together.

    0 讨论(0)
  • 2020-11-22 12:08

    One easy way to understand "ABI" is to compare it to "API".

    You are already familiar with the concept of an API. If you want to use the features of, say, some library or your OS, you will program against an API. The API consists of data types/structures, constants, functions, etc that you can use in your code to access the functionality of that external component.

    An ABI is very similar. Think of it as the compiled version of an API (or as an API on the machine-language level). When you write source code, you access the library through an API. Once the code is compiled, your application accesses the binary data in the library through the ABI. The ABI defines the structures and methods that your compiled application will use to access the external library (just like the API did), only on a lower level. Your API defines the order in which you pass arguments to a function. Your ABI defines the mechanics of how these arguments are passed (registers, stack, etc.). Your API defines which functions are part of your library. Your ABI defines how your code is stored inside the library file, so that any program using your library can locate the desired function and execute it.

    ABIs are important when it comes to applications that use external libraries. Libraries are full of code and other resources, but your program has to know how to locate what it needs inside the library file. Your ABI defines how the contents of a library are stored inside the file, and your program uses the ABI to search through the file and find what it needs. If everything in your system conforms to the same ABI, then any program is able to work with any library file, no matter who created them. Linux and Windows use different ABIs, so a Windows program won't know how to access a library compiled for Linux.

    Sometimes, ABI changes are unavoidable. When this happens, any programs that use that library will not work unless they are re-compiled to use the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source compatible". This implies that while a program compiled for one library version will not work with the other, source code written for one will work for the other if re-compiled.

    For this reason, developers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type and number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If, for instance, your library uses 32-bit integers to indicate the offset of a function and you switch to 64-bit integers, then already-compiled code that uses that library will not be accessing that field (or any following it) correctly. Accessing data structure members gets converted into memory addresses and offsets during compilation and if the data structure changes, then these offsets will not point to what the code is expecting them to point to and the results are unpredictable at best.

    An ABI isn't necessarily something you will explicitly provide unless you are doing very low-level systems design work. It isn't language-specific either, since (for example) a C application and a Pascal application can use the same ABI after they are compiled.

    Edit: Regarding your question about the chapters regarding the ELF file format in the SysV ABI docs: The reason this information is included is because the ELF format defines the interface between operating system and application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and (for example) expects the first section of the binary to be an ELF header containing certain information at specific memory offsets. This is how the application communicates important information about itself to the operating system. If you build a program in a non-ELF binary format (such as a.out or PE), then an OS that expects ELF-formatted applications will not be able to interpret the binary file or run the application. This is one big reason why Windows apps cannot be run directly on a Linux machine (or vice versa) without being either re-compiled or run inside some type of emulation layer that can translate from one binary format to another.

    IIRC, Windows currently uses the Portable Executable (or, PE) format. There are links in the "external links" section of that Wikipedia page with more information about the PE format.

    Also, regarding your note about C++ name mangling: When locating a function in a library file, the function is typically looked up by name. C++ allows you to overload function names, so name alone is not sufficient to identify a function. C++ compilers have their own ways of dealing with this internally, called name mangling. An ABI can define a standard way of encoding the name of a function so that programs built with a different language or compiler can locate what they need. When you use extern "c" in a C++ program, you're instructing the compiler to use a standardized way of recording names that's understandable by other software.

    0 讨论(0)
  • 2020-11-22 12:09

    The term ABI is used to refer to two distinct but related concepts.

    When talking about compilers it refers to the rules used to translate from source-level constructs to binary constructs. How big are the data types? how does the stack work? how do I pass parameters to functions? which registers should be saved by the caller vs the callee?

    When talking about libraries it refers to the binary interface presented by a compiled library. This interface is the result of a number of factors including the source code of the library, the rules used by the compiler and in some cases definitions picked up from other libraries.

    Changes to a library can break the ABI without breaking the API. Consider for example a library with an interface like.

    void initfoo(FOO * foo)
    int usefoo(FOO * foo, int bar)
    void cleanupfoo(FOO * foo)
    

    and the application programmer writes code like

    int dostuffwithfoo(int bar) {
      FOO foo;
      initfoo(&foo);
      int result = usefoo(&foo,bar)
      cleanupfoo(&foo);
      return result;
    }
    

    The application programmer doesn't care about the size or layout of FOO, but the application binary ends up with a hardcoded size of foo. If the library programmer adds an extra field to foo and someone uses the new library binary with the old application binary then the library may make out of bounds memory accesses.

    OTOH if the library author had designed their API like.

    FOO * newfoo(void)
    int usefoo(FOO * foo, int bar)
    void deletefoo((FOO * foo, int bar))
    

    and the application programmer writes code like

    int dostuffwithfoo(int bar) {
      FOO * foo;
      foo = newfoo();
      int result = usefoo(foo,bar)
      deletefoo(foo);
      return result;
    }
    

    Then the application binary does not need to know anything about the structure of FOO, that can all be hidden inside the library. The price you pay for that though is that heap operations are involved.

    0 讨论(0)
  • 2020-11-22 12:10

    Summary

    There are various interpretation and strong opinions of the exact layer that define an ABI (application binary interface).

    In my view an ABI is a subjective convention of what is considered a given/platform for a specific API. The ABI is the "rest" of conventions that "will not change" for a specific API or that will be addressed by the runtime environment: executors, tools, linkers, compilers, jvm, and OS.

    Defining an Interface: ABI, API

    If you want to use a library like joda-time you must declare a dependency on joda-time-<major>.<minor>.<patch>.jar. The library follows best practices and use Semantic Versioning. This defines the API compatibility at three levels:

    1. Patch - You don't need to change at all your code. The library just fixes some bugs.
    2. Minor - You don't need to change your code since the additions
    3. Major - The interface (API) is changed and you might need to change your code.

    In order for you to use a new major release of the same library a lot of other conventions are still to be respected:

    • The binary language used for the libraries (in Java cases the JVM target version that defines the Java bytecode)
    • Calling conventions
    • JVM conventions
    • Linking conventions
    • Runtime conventions All these are defined and managed by the tools we use.

    Examples

    Java case study

    For example, Java standardized all these conventions, not in a tool, but in a formal JVM specification. The specification allowed other vendors to provide a different set of tools that can output compatible libraries.

    Java provides two other interesting case studies for ABI: Scala versions and Dalvik virtual machine.

    Dalvik virtual machine broke the ABI

    The Dalvik VM needs a different type of bytecode than the Java bytecode. The Dalvik libraries are obtained by converting the Java bytecode (with same API) for Dalvik. In this way you can get two versions of the same API: defined by the original joda-time-1.7.2.jar. We could call it joda-time-1.7.2.jar and joda-time-1.7.2-dalvik.jar. They use a different ABI one is for the stack-oriented standard Java vms: Oracle's one, IBM's one, open Java or any other; and the second ABI is the one around Dalvik.

    Scala successive releases are incompatible

    Scala doesn't have binary compatibility between minor Scala versions: 2.X . For this reason the same API "io.reactivex" %% "rxscala" % "0.26.5" has three versions (in the future more): for Scala 2.10, 2.11 and 2.12. What is changed? I don't know for now, but the binaries are not compatible. Probably the latest versions adds things that make the libraries unusable on the old virtual machines, probably things related to linking/naming/parameter conventions.

    Java successive releases are incompatible

    Java has problems with the major releases of the JVM too: 4,5,6,7,8,9. They offer only backward compatibility. Jvm9 knows how to run code compiled/targeted (javac's -target option) for all other versions, while JVM 4 doesn't know how to run code targeted for JVM 5. All these while you have one joda-library. This incompatibility flies bellow the radar thanks to different solutions:

    1. Semantic versioning: when libraries target higher JVM they usually change the major version.
    2. Use JVM 4 as the ABI, and you're safe.
    3. Java 9 adds a specification on how you can include bytecode for specific targeted JVM in the same library.

    Why did I start with the API definition?

    API and ABI are just conventions on how you define compatibility. The lower layers are generic in respect of a plethora of high level semantics. That's why it's easy to make some conventions. The first kind of conventions are about memory alignment, byte encoding, calling conventions, big and little endian encodings, etc. On top of them you get the executable conventions like others described, linking conventions, intermediate byte code like the one used by Java or LLVM IR used by GCC. Third you get conventions on how to find libraries, how to load them (see Java classloaders). As you go higher and higher in concepts you have new conventions that you consider as a given. That's why they didn't made it to the semantic versioning. They are implicit or collapsed in the major version. We could amend semantic versioning with <major>-<minor>-<patch>-<platform/ABI>. This is what is actually happening already: platform is already a rpm, dll, jar (JVM bytecode), war(jvm+web server), apk, 2.11 (specific Scala version) and so on. When you say APK you already talk about a specific ABI part of your API.

    API can be ported to different ABI

    The top level of an abstraction (the sources written against the highest API can be recompiled/ported to any other lower level abstraction.

    Let's say I have some sources for rxscala. If the Scala tools are changed I can recompile them to that. If the JVM changes I could have automatic conversions from the old machine to the new one without bothering with the high level concepts. While porting might be difficult will help any other client. If a new operating system is created using a totally different assembler code a translator can be created.

    APIs ported across languages

    There are APIs that are ported in multiple languages like reactive streams. In general they define mappings to specific languages/platforms. I would argue that the API is the master specification formally defined in human language or even a specific programming language. All the other "mappings" are ABI in a sense, else more API than the usual ABI. The same is happening with the REST interfaces.

    0 讨论(0)
  • 2020-11-22 12:11

    An application binary interface (ABI) is similar to an API, but the function is not accessible to the caller at source code level. Only a binary representation is accessible/available.

    ABIs may be defined at the processor-architecture level or at the OS level. The ABIs are standards to be followed by the code-generator phase of the compiler. The standard is fixed either by the OS or by the processor.

    Functionality: Define the mechanism/standard to make function calls independent of the implementation language or a specific compiler/linker/toolchain. Provide the mechanism which allows JNI, or a Python-C interface, etc.

    Existing entities: Functions in machine code form.

    Consumer: Another function (including one in another language, compiled by another compiler, or linked by another linker).

    0 讨论(0)
提交回复
热议问题