Determine source language from a binary?

傲寒 2020-12-05 20:26

I responded to another question about developing for the iPhone in non-Objective-C languages, and I made the assertion that using, say, C# to write for the iPhone would stri

8 answers
  • 2020-12-05 20:32

    I'm not a compiler hacker (someday, I hope), but I figure that you may be able to find telltale signs in a binary file that would indicate what compiler generated it and some of the compiler options used, such as the level of optimization specified.
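
    To illustrate the "telltale signs" idea, the Python sketch below scans raw bytes for version banners that some toolchains are known to leave behind. GCC, for example, typically writes a `GCC: (...) x.y.z` string into an ELF `.comment` section. The signature table and the names `COMPILER_SIGNATURES` and `guess_compilers` are hypothetical, and the patterns are heuristics: a stripped or deliberately scrubbed binary may lack them. A hit only shows which toolchain touched the file. The C runtime's start-up objects may carry a GCC banner even when the main program came from another compiler.

```python
import re

# Hypothetical signature table: banner strings some toolchains are known to embed.
# Heuristics only; stripping or custom builds can remove them.
COMPILER_SIGNATURES = {
    "GCC": re.compile(rb"GCC: \([^)]*\) [0-9][0-9.]*"),
    "Clang/LLVM": re.compile(rb"clang version [0-9][0-9.]*"),
    "Rust": re.compile(rb"rustc version [0-9][0-9.]*"),
    "Go": re.compile(rb"go1\.[0-9]+(?:\.[0-9]+)?"),
}


def guess_compilers(data: bytes) -> dict:
    """Map each toolchain whose banner appears in data to the distinct banners found."""
    hits = {}
    for name, pattern in COMPILER_SIGNATURES.items():
        banners = sorted({m.group().decode("ascii", "replace") for m in pattern.finditer(data)})
        if banners:
            hits[name] = banners
    return hits
```

    Usage: `with open(path, "rb") as f: print(guess_compilers(f.read()))`.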

    Strictly speaking, however, what you're asking is impossible. It could be that somebody sat down with a pen and paper and worked out the binary codes corresponding to the program that they wanted to write, and then typed that stuff out in a hex editor. Basically, they'd be programming in assembly without the assembler tool. Similarly, you may never be able to tell with certainty whether a native binary was written in straight assembler or in C with inline assembly.

    As for virtual machine environments such as the JVM and .NET, I would expect you to be able to identify the VM from the bytecode in the binary. However, you may not be able to tell what the source language was (C# versus Visual Basic, for example) unless particular compiler quirks tip you off.
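
    To make the VM point concrete, here is a minimal sketch that classifies a file by its leading magic bytes. It checks for `\x7fELF` (ELF), the Mach-O magics, `0xCAFEBABE` (Java class files) and `MZ` (PE). For PE files it also checks whether the CLI header data directory (entry 14) is populated, which marks a .NET assembly. It identifies only the container or VM, which is exactly the limitation described above: a C# assembly and a Visual Basic assembly look the same at this level. The function names are hypothetical. The `0xCAFEBABE` split between Java class files and universal Mach-O binaries is a common heuristic, not a guarantee.

```python
import struct

MACHO_MAGICS = {b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf",
                b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe"}


def has_cli_header(data: bytes) -> bool:
    """True if a PE image has a non-empty CLI header directory (entry 14)."""
    try:
        (pe,) = struct.unpack_from("<I", data, 0x3C)          # e_lfanew
        if data[pe:pe + 4] != b"PE\x00\x00":
            return False
        opt = pe + 4 + 20                                     # skip signature + COFF header
        (magic,) = struct.unpack_from("<H", data, opt)
        if magic == 0x10B:                                    # PE32
            count_off, dirs_off = opt + 92, opt + 96
        elif magic == 0x20B:                                  # PE32+
            count_off, dirs_off = opt + 108, opt + 112
        else:
            return False
        (count,) = struct.unpack_from("<I", data, count_off)  # NumberOfRvaAndSizes
        if count <= 14:
            return False
        rva, size = struct.unpack_from("<II", data, dirs_off + 14 * 8)
        return rva != 0 and size != 0
    except struct.error:                                      # truncated file
        return False


def identify_container(data: bytes) -> str:
    """Classify a file by its leading magic bytes (heuristic)."""
    head = data[:4]
    if head == b"\x7fELF":
        return "ELF native binary"
    if head in MACHO_MAGICS:
        return "Mach-O native binary"
    if head == b"\xca\xfe\xba\xbe" and len(data) >= 8:
        # Shared by Java class files and universal (fat) Mach-O files. A class
        # file continues with minor/major version (major >= 45); a fat Mach-O
        # continues with a small architecture count.
        (value,) = struct.unpack_from(">I", data, 4)
        return "Java class file (JVM bytecode)" if value >= 45 else "universal Mach-O binary"
    if data[:2] == b"MZ":
        return ".NET (CLI) assembly in a PE file" if has_cli_header(data) else "PE native binary"
    return "unknown"
```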

  • 2020-12-05 20:34

    Well, C is initially translated to assembly, so anything written in C could also have been written directly in assembly.

  • 2020-12-05 20:35

    No, the bytecode is language-agnostic. Different compilers can even take the same source code and generate different binaries. That's why you don't see general-purpose decompilers that work on arbitrary binaries.

  • 2020-12-05 20:36

    I expect you could if you disassemble the binary, or at least you might be able to identify the compiler: not all compilers use the same code for `printf`, for example, so Objective-C and GNU C should differ here.

    You have excluded all bytecode languages, so this issue is going to be less common than expected.
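
    The `printf` idea above amounts to matching byte patterns of known runtime routines. This is similar in spirit to the library-signature matching some disassemblers perform (IDA's FLIRT, for example). The sketch below compiles hex signatures with `??` wildcards into byte regexes and reports where they match. The two signatures and the toolchain names are made-up placeholders, not real fingerprints. A working tool would need signatures extracted from each compiler's actual libraries.

```python
import re


def compile_signature(pattern: str) -> re.Pattern:
    """Turn a hex signature such as '55 48 89 E5 ?? 8B' into a bytes regex.

    Each token is one byte: '??' matches any byte, anything else is parsed as hex.
    """
    parts = [b"." if tok == "??" else re.escape(bytes([int(tok, 16)]))
             for tok in pattern.split()]
    return re.compile(b"".join(parts), re.DOTALL)  # DOTALL so '.' also matches 0x0A


# Hypothetical placeholder signatures: NOT real fingerprints of any compiler's printf.
SIGNATURES = {
    "toolchain_A_printf": compile_signature("55 48 89 E5 ?? ?? 48 83 EC"),
    "toolchain_B_printf": compile_signature("53 48 83 EC ?? E8"),
}


def match_signatures(code: bytes) -> list:
    """Return (signature name, offset) for every hit, ordered by offset."""
    hits = [(name, m.start())
            for name, rx in SIGNATURES.items()
            for m in rx.finditer(code)]
    return sorted(hits, key=lambda hit: hit[1])
```

    The wildcards matter because call targets and stack-frame sizes change from one build to the next, while the surrounding instruction bytes tend to stay stable.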

  • 2020-12-05 20:41

    First, run `what` on some binaries and look at the output. CVS (and SVN) identifiers are scattered throughout the binary image, and most of those come from libraries.
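
    For reference, `what` reports the text that follows the SCCS marker `@(#)`. CVS/SVN-style `$Id: ... $` keywords are what RCS's `ident` command looks for. A minimal Python approximation that scans for both might look like this. The regexes and the `version_strings` name are assumptions, and the real tools differ in edge cases.

```python
import re

# SCCS marker used by `what`: text after "@(#)" up to '"', '>', newline, backslash or NUL.
WHAT_RE = re.compile(rb'@\(#\)([^">\n\\\x00]*)')
# RCS-style keyword reported by `ident` (also used by CVS/SVN): "$Keyword: value $".
IDENT_RE = re.compile(rb"\$[A-Za-z]+: [^$\n\x00]* \$")


def version_strings(data: bytes) -> list:
    """Collect `what`-style and `ident`-style identification strings."""
    found = [m.group(1) for m in WHAT_RE.finditer(data)]
    found += [m.group(0) for m in IDENT_RE.finditer(data)]
    return [s.decode("latin-1") for s in found]
```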

    Also, there's often a "map" to the various library functions. That's a big hint, too.

    When the libraries are linked into the executable, the binary file often includes a map of names and offsets. It's part of creating "position-independent code": you can't simply "hard-link" the various object files together. You need a map, and you have to do some lookups when loading the binary into memory.
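
    Whether such a map is present can be checked directly. The sketch below lists the section names of a little-endian ELF64 file using only `struct`, then flags a few suggestive names. `.symtab` and `.dynsym` are the symbol tables (the names-and-offsets maps), `.comment` often carries compiler banners, and `.gopclntab` or `.note.go.buildid` point to a Go toolchain. The sketch assumes ELF64 little-endian and ignores the extended-numbering case where `e_shnum` is 0. The hint table and function names are illustrative, not exhaustive.

```python
import struct

# Illustrative hints only; absence of a section proves nothing.
SECTION_HINTS = {
    ".symtab": "static symbol table (binary not stripped)",
    ".dynsym": "dynamic symbol table used at load time",
    ".comment": "often holds compiler version banners",
    ".gopclntab": "Go runtime metadata (suggests a Go toolchain)",
    ".note.go.buildid": "Go build ID (suggests a Go toolchain)",
}


def elf64_section_names(data: bytes) -> list:
    """Return the section names of a little-endian ELF64 image."""
    # e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def section(i: int) -> tuple:
        # Elf64_Shdr: sh_name at +0, sh_offset at +24, sh_size at +32
        base = e_shoff + i * e_shentsize
        (sh_name,) = struct.unpack_from("<I", data, base)
        sh_offset, sh_size = struct.unpack_from("<QQ", data, base + 24)
        return sh_name, sh_offset, sh_size

    # The section-name string table is itself a section, indexed by e_shstrndx.
    _, str_off, str_size = section(e_shstrndx)
    strtab = data[str_off:str_off + str_size]
    names = []
    for i in range(e_shnum):
        start = section(i)[0]
        names.append(strtab[start:strtab.index(b"\x00", start)].decode("ascii", "replace"))
    return names


def section_hints(data: bytes) -> dict:
    """Map each recognised section name to a short hint."""
    return {n: SECTION_HINTS[n] for n in elf64_section_names(data) if n in SECTION_HINTS}
```

    Usage on Linux: `with open("/bin/ls", "rb") as f: print(section_hints(f.read()))`.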

    Finally, the start-up module for C and C++ (and, I imagine, C#) is unique to each compiler's default set of libraries.

  • 2020-12-05 20:47

    The command `strings` can give some hints about which language was used. For instance, I just ran it on the stripped binary of a C application I wrote, and the first entries it found were the libraries linked by the executable.
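
    `strings` itself is simple to approximate: by default it prints runs of at least four printable characters. The sketch below does the same over a byte buffer, then filters for names that look like ELF shared libraries, which is the hint described above. The helper names and the `lib*.so*` pattern are assumptions. On macOS or Windows the library names would look different (`.dylib`, `.dll`).

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Approximate `strings`: runs of printable ASCII (or tab) at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical filter for ELF-style shared-library names such as "libc.so.6".
LIBRARY_RE = re.compile(r"lib[\w+.-]*\.so(?:\.\d+)*")


def linked_library_names(data: bytes) -> list:
    """Printable strings that look like shared-library file names."""
    return [s for s in printable_strings(data) if LIBRARY_RE.fullmatch(s)]
```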
