Compiled Haskell program to LLVM IR is missing main

悲&欢浪女 2021-01-23 10:31

Following this SO post regarding the compilation of Haskell programs to LLVM IR, I took the same Haskell program and tried to run its resulting LLVM IR code:

1 Answer
  • 2021-01-23 11:24

    tl;dr. The entry point is (probably) named ZCMain_main_closure, and it's a data structure that references a block of code, rather than a block of code itself. Still, it's interpretable by the Haskell runtime, and it corresponds directly to the Haskell "value" of the function main :: IO () in your main.hs program.

    The longer answer involves more than you ever wanted to know about linking programs, but here's the deal. When you take a C program like:

    #include <stdio.h>
    int main()
    {
            printf("I like C!\n");
    }
    

    compile it to an object file with gcc:

    $ gcc -Wall -c hello.c
    

    and inspect the object file's symbol table:

    $ nm hello.o
    0000000000000000 T main
                     U printf
    

    you will see that it contains a definition of the symbol main and an (undefined) reference to an external symbol printf.

    Now, you might imagine that main is the "entry point" of this program. Hah hah hah! What a naive and silly thing for you to think!

    In fact, real Linux gurus know that the entry point to your program isn't in the object file hello.o at all. Where is it? Well, it's in the "C runtime", a little file that gets linked in by gcc when you actually create your executable:

    $ nm /usr/lib/x86_64-linux-gnu/crt1.o
    0000000000000000 D __data_start
    0000000000000000 W data_start
    0000000000000000 R _IO_stdin_used
                     U __libc_csu_fini
                     U __libc_csu_init
                     U __libc_start_main
                     U main
    0000000000000000 T _start
    $
    

    Note that this object file has an undefined reference to main, which will be linked to your so-called entry point in hello.o. It's this little stub that defines the real entry point, namely _start. You can tell this is the actual entry point because, if you link the program into an executable, you'll see that the location of the _start symbol and the ELF entry point (the address to which the kernel first transfers control when you execve() your program) coincide:

    $ gcc -o hello hello.o
    $ nm hello | egrep 'T _start'
    0000000000400430 T _start
    $ readelf -h hello | egrep Entry
    Entry point address:               0x400430
    

    All this is to say, the "entry point" of a program is actually a pretty complex concept.

    When you compile and run a C program with the LLVM toolchain instead of GCC, the situation is all pretty similar. That's by design to keep everything compatible with GCC. The so-called entry point in your hello.ll file is just the C function main, and it's not the real entry point of your program. That's still provided by the crt1.o stub.

    Now, if we (finally) switch from talking about C to talking about Haskell, the Haskell runtime is, obviously, about a billion times more complicated than the C runtime, but it's been built on top of the C runtime. So, when you compile a Haskell program the normal way:

    $ ghc main.hs
    stack ghc -- main.hs
    [1 of 1] Compiling Main             ( main.hs, main.o )
    Linking main ...
    $
    

    you can see that the executable has an entry point named _start:

    $ nm main | egrep 'T _start'
    0000000000406560 T _start
    

    which is actually the same C runtime stub as before that calls the C entry point:

    $ nm main | egrep 'T main'
    0000000000406dc4 T main
    $ 
    

    but this main is not your Haskell main. This main is a C main function in a program dynamically created by GHC at link time. You can look at such a program by running:

    $ ghc -v -keep-tmp-files -fforce-recomp main.hs
    

    and rummaging around for a file named ghc_4.c somewhere in a /tmp subdirectory:

    $ cat /tmp/ghc10915_0/ghc_4.c
    #include "Rts.h"
    extern StgClosure ZCMain_main_closure;
    int main(int argc, char *argv[])
    {
     RtsConfig __conf = defaultRtsConfig;
     __conf.rts_opts_enabled = RtsOptsSafeOnly;
     __conf.rts_opts_suggestions = true;
     __conf.rts_hs_main = true;
     return hs_main(argc,argv,&ZCMain_main_closure,__conf);
    }
    

    Now, do you see that external reference to ZCMain_main_closure? That, believe it or not, is the Haskell entry point for your program, and you should find it in main.o, whether you compiled using the vanilla GHC pipeline or via the LLVM backend. In the LLVM case, it also shows up in the generated main.ll:

    $ egrep ZCMain_main_closure main.ll
    %ZCMain_main_closure_struct = type <{i64, i64, i64, i64}>
    ...
    

    Now, it's not a "function". It's a specially formatted data structure (a closure) that the Haskell runtime system understands. The hs_main() function above (yet another entry point!) is the main entry point into the Haskell runtime:

    $ nm ~/.stack/programs/x86_64-linux/ghc-8.4.3/lib/ghc-8.4.3/rts/libHSrts.a | egrep hs_main
    0000000000000000 T hs_main
    $
    

    and it accepts a closure for a Haskell main function as the Haskell entry point to begin executing your program.

    So, if you went through all this trouble in the hopes of isolating a Haskell program in an *.ll file that you could somehow run directly by jumping to its entry point, then I've got some bad news for you... ;)
