How to programmatically find the bytecode (CIL) in a .NET executable/DLL?

Question

I would like to open a PE file (which I know is a .NET assembly) and find where the .NET bytecode is, ideally starting at the entry point. I know that the entry point RVA in the PE header only takes me to a stub that calls _CorExeMain in mscoree.dll.
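For reference, the managed entry point is not reached through the PE entry point at all. Data directory 14 of the optional header (the COM descriptor) gives the RVA of the CLI header (IMAGE_COR20_HEADER, ECMA-335 Partition II, 25.3.3); its EntryPointToken field names the entry method and its MetaData field locates the metadata. A minimal sketch, assuming the caller has already read the optional header magic and mapped the directory's RVA to a file offset; the helper names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Little-endian readers; PE and CLI structures are always little-endian.
inline uint16_t readU16(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
inline uint32_t readU32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Offset of data directory 14 (CLI header / COM descriptor) from the start
// of the optional header. The directory array starts at byte 96 for PE32
// (magic 0x10B) and at byte 112 for PE32+ (magic 0x20B).
inline std::size_t cliDirectoryOffset(uint16_t optionalHeaderMagic) {
    const std::size_t dirStart = (optionalHeaderMagic == 0x20B) ? 112 : 96;
    return dirStart + 14 * 8;  // each entry is RVA (4 bytes) + size (4 bytes)
}

// The subset of the CLI header needed to start looking for IL.
struct CliHeader {
    uint16_t majorRuntimeVersion;
    uint16_t minorRuntimeVersion;
    uint32_t metadataRva;
    uint32_t metadataSize;
    uint32_t flags;
    uint32_t entryPointToken;  // usually a MethodDef token (0x06xxxxxx)
};

// p points at the CLI header in the file; returns nullopt if it is truncated.
inline std::optional<CliHeader> parseCliHeader(const uint8_t* p, std::size_t n) {
    if (n < 24 || readU32(p) < 24) return std::nullopt;  // cb = header size (72)
    CliHeader h{};
    h.majorRuntimeVersion = readU16(p + 4);
    h.minorRuntimeVersion = readU16(p + 6);
    h.metadataRva = readU32(p + 8);
    h.metadataSize = readU32(p + 12);
    h.flags = readU32(p + 16);
    h.entryPointToken = readU32(p + 20);
    return h;
}
```

If flag 0x10 (native entry point) is set, EntryPointToken holds an RVA rather than a token, so it is worth checking `flags` before treating it as a method.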

That is not what I'm looking for, though. I would like to find the IL bytecode that the runtime actually executes. How can I do that in C++ without external tools like ildasm or dumpbin? I can already parse the PE header and know what the image base and RVAs mean; I just cannot find enough information about where the IL bytecode is located.
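Every location involved (the CLI header, the metadata root, each method body) is given as an RVA, so when reading the file from disk rather than a loaded image, each RVA has to be mapped through the section table. A sketch, assuming the section headers have already been read into the hypothetical `Section` struct below:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// One entry of the PE section table (only the fields needed here).
struct Section {
    uint32_t virtualAddress;    // RVA where the section is mapped
    uint32_t virtualSize;       // size in memory
    uint32_t pointerToRawData;  // file offset of the section's data
    uint32_t sizeOfRawData;     // size of the data stored on disk
};

// Maps an RVA to a file offset by finding the section whose virtual range
// contains it. Returns nullopt if no section contains the RVA, or if it
// falls in the zero-filled tail of a section that is not stored on disk.
std::optional<uint32_t> rvaToFileOffset(const std::vector<Section>& sections, uint32_t rva) {
    for (const Section& s : sections) {
        const uint32_t span = s.virtualSize != 0 ? s.virtualSize : s.sizeOfRawData;
        if (rva >= s.virtualAddress && rva - s.virtualAddress < span) {
            const uint32_t delta = rva - s.virtualAddress;
            if (delta >= s.sizeOfRawData) return std::nullopt;
            return s.pointerToRawData + delta;
        }
    }
    return std::nullopt;
}
```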

Answer 1:

Have a look at ECMA-335: the details of the file format are in Partition II, sections 22 to 25. I seem to remember finding a few bugs in it when I tried to write a parser a while ago, but with a bit of perseverance it's all doable.
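As an illustration of the Partition II layout: the metadata located by the CLI header starts with a root whose signature is "BSJB" (II.24.2.1), followed by stream headers (II.24.2.2). The MethodDef table, whose first column is the RVA of each method body, lives in the `#~` stream. A minimal sketch that lists the streams, assuming `p` points at the metadata root already mapped from its RVA; the names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct StreamHeader {
    std::string name;  // "#~", "#Strings", "#US", "#GUID", "#Blob", ...
    uint32_t offset;   // relative to the start of the metadata root
    uint32_t size;
};

static uint16_t rd16(const uint8_t* p) { return static_cast<uint16_t>(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t* p) {
    return p[0] | (p[1] << 8) | (p[2] << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Returns the stream headers, or an empty vector on malformed input.
std::vector<StreamHeader> parseMetadataStreams(const uint8_t* p, std::size_t n) {
    std::vector<StreamHeader> out;
    if (n < 16 || rd32(p) != 0x424A5342) return out;  // "BSJB"
    const std::size_t versionLen = rd32(p + 12);      // already padded to 4
    if (versionLen > n - 16 || n - 16 - versionLen < 4) return out;
    std::size_t pos = 16 + versionLen;
    const uint16_t streamCount = rd16(p + pos + 2);    // skip the 2-byte Flags
    pos += 4;
    for (uint16_t i = 0; i < streamCount; ++i) {
        if (pos + 8 > n) return {};
        StreamHeader h;
        h.offset = rd32(p + pos);
        h.size = rd32(p + pos + 4);
        pos += 8;
        // Name: NUL-terminated ASCII, padded to the next 4-byte boundary.
        const std::size_t start = pos;
        while (pos < n && p[pos] != 0) ++pos;
        if (pos >= n) return {};
        h.name.assign(reinterpret_cast<const char*>(p + start), pos - start);
        pos = (pos + 4) & ~static_cast<std::size_t>(3);  // past the NUL, rounded up
        out.push_back(h);
    }
    return out;
}
```

Decoding the `#~` stream itself takes more work, because its column widths depend on the table row counts and the HeapSizes flags.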

Answer 2:

I would probably grab the code from Mono (cil-coff.h, pedump.c) rather than writing a parser from scratch.
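Whichever parser is used, the last step is the method body at the RVA taken from the MethodDef row: a 1-byte "tiny" header or a 12-byte "fat" header (ECMA-335 Partition II, 25.4), followed directly by the IL. The sketch below is not Mono's code, just an independent illustration of that decoding; the names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Decoded method body header.
struct MethodHeader {
    uint32_t headerSize;      // bytes before the first IL opcode
    uint32_t codeSize;        // length of the IL byte stream
    uint16_t maxStack;
    uint32_t localVarSigTok;  // StandAloneSig token, 0 if there are no locals
    bool moreSects;           // extra data sections (e.g. exception clauses) follow
    bool initLocals;
};

// p points at the method body (the file offset of the MethodDef row's RVA).
std::optional<MethodHeader> decodeMethodHeader(const uint8_t* p, std::size_t n) {
    if (n < 1) return std::nullopt;
    MethodHeader h{};
    switch (p[0] & 0x3) {
    case 0x2:  // tiny format: code size in the upper 6 bits
        h.headerSize = 1;
        h.codeSize = p[0] >> 2;
        h.maxStack = 8;  // implied for tiny headers
        return h;
    case 0x3: {  // fat format: 12 flag bits, then 4 bits of header size
        if (n < 12) return std::nullopt;
        const uint16_t flagsAndSize = static_cast<uint16_t>(p[0] | (p[1] << 8));
        h.headerSize = (flagsAndSize >> 12) * 4u;  // size is in 4-byte units
        if (h.headerSize < 12) return std::nullopt;
        h.maxStack = static_cast<uint16_t>(p[2] | (p[3] << 8));
        h.codeSize = p[4] | (p[5] << 8) | (p[6] << 16) | (static_cast<uint32_t>(p[7]) << 24);
        h.localVarSigTok = p[8] | (p[9] << 8) | (p[10] << 16) | (static_cast<uint32_t>(p[11]) << 24);
        h.moreSects = (flagsAndSize & 0x8) != 0;
        h.initLocals = (flagsAndSize & 0x10) != 0;
        return h;
    }
    default:
        return std::nullopt;  // not a valid method header
    }
}
```

The IL bytes then run from `p + headerSize` for `codeSize` bytes; if `moreSects` is set, exception-handling sections follow the code, aligned to 4 bytes.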

Answer 3:

On Windows, there's a COM API for this: IMetaDataImport (extended by IMetaDataImport2, which adds support for generics). Examples of its use are rather scarce, though. The open-source IL debugger/editor Dile, which seems to be not very actively maintained (its .NET 4 support, available only in weekly builds, is patchy), uses it, so you could look at its code. Dile's author also wrote a blog post on using the API, but it is rather long-winded. Link not given because of spam rules; Google for "Reading types from assembly".
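With IMetaDataImport, methods are identified by metadata tokens, and IMetaDataImport::GetMethodProps reports a method's code RVA through its `pulCodeRVA` out parameter; that RVA is again the start of the method body. The same token format is used by the CLI header's EntryPointToken. A small portable sketch of decoding a token and locating the matching MethodDef row when parsing by hand; the helper names, and the assumption that the table start and row size have already been computed from the `#~` stream, are hypothetical:

```cpp
#include <cstdint>

// A metadata token packs the table number in its high byte and a 1-based
// row index (RID) in the low 24 bits; table 0x06 is MethodDef.
struct Token {
    uint8_t table;
    uint32_t rid;
};

constexpr uint8_t kTableMethodDef = 0x06;

constexpr Token decodeToken(uint32_t token) {
    return Token{static_cast<uint8_t>(token >> 24), token & 0x00FFFFFFu};
}

constexpr bool isMethodDef(uint32_t token) {
    return decodeToken(token).table == kTableMethodDef && decodeToken(token).rid != 0;
}

// Byte offset of a MethodDef row (rid >= 1) inside the #~ stream. The first
// 4 bytes of the row are the RVA of the method body.
constexpr uint64_t methodDefRowOffset(uint64_t tableStart, uint32_t rowSize, uint32_t rid) {
    return tableStart + static_cast<uint64_t>(rid - 1) * rowSize;
}
```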

Microsoft's mdbg, which comes with the .NET SDK, also uses it. Unfortunately, the mdbg sources have only been released for the .NET 2.0 version. Google for mdbgSample21.EXE.

Source: https://stackoverflow.com/questions/3707295/how-to-programatically-find-the-bytecode-cil-in-a-net-executable-dll

Tags

.net

assemblies

bytecode

portable-executable