Compiled Haskell program to LLVM IR is missing main

Posted by 别说谁变了你拦得住时间么 on 2019-12-02 07:13:15

Question


Following this SO post regarding the compilation of Haskell programs to LLVM IR, I took the same Haskell program and tried to run its resulting LLVM IR code:

quicksort [] = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
  where
    lesser  = filter (<  p) xs
    greater = filter (>= p) xs

main = print(quicksort([5,2,1,0,8,3]))

I first compiled it to LLVM IR with

$ ghc -keep-llvm-files main.hs

Then I transformed it to bitcode with:

$ llvm-as main.ll

However, when I tried to run it with lli, I got the following error regarding a missing main:

$ lli main.bc
'main' function not found in module.

Am I doing something wrong? Thanks.

EDIT: (from answer by K. A. Buhr)

$ ls -l main*
main.hs
$ ghc -keep-llvm-files main.hs
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
$ ls -l main*
main
main.hi
main.hs
main.ll
main.o
$ rm main main.hi main.o
$ llvm-as main.ll
$ llc main.bc -filetype=obj -o main.o
$ ghc -o main main.o
$ ./main
[0,1,2,3,5,8]

Answer 1:


tl;dr. The entry point is (probably) named ZCMain_main_closure, and it's a data structure that references a block of code, rather than a block of code itself. Still, it's interpretable by the Haskell runtime, and it corresponds directly to the Haskell "value" of the function main :: IO () in your main.hs program.

The longer answer involves more than you ever wanted to know about linking programs, but here's the deal. When you take a C program like:

#include <stdio.h>
int main()
{
        printf("I like C!\n");
}

compile it to an object file with gcc:

$ gcc -Wall -c hello.c

and inspect the object file's symbol table:

$ nm hello.o
0000000000000000 T main
                 U printf

you will see that it contains a definition of the symbol main and an (undefined) reference to an external symbol printf.

Now, you might imagine that main is the "entry point" of this program. Hah hah hah! What a naive and silly thing for you to think!

In fact, real Linux gurus know that the entry point to your program isn't in the object file hello.o at all. Where is it? Well, it's in the "C runtime", a little file that gets linked in by gcc when you actually create your executable:

$ nm /usr/lib/x86_64-linux-gnu/crt1.o
0000000000000000 D __data_start
0000000000000000 W data_start
0000000000000000 R _IO_stdin_used
                 U __libc_csu_fini
                 U __libc_csu_init
                 U __libc_start_main
                 U main
0000000000000000 T _start
$

Note that this object file has an undefined reference to main, which will be linked to your so-called entry point in hello.o. It's this little stub that defines the real entry point, namely _start. You can tell this is the actual entry point because if you link the program into an executable, you'll see that the location of the _start symbol and the ELF entry point (which is the address to which the kernel actually first transfers control when you execve() your program) coincide:

$ gcc -o hello hello.o
$ nm hello | egrep 'T _start'
0000000000400430 T _start
$ readelf -h hello | egrep Entry
Entry point address:               0x400430

All this is to say, the "entry point" of a program is actually a pretty complex concept.

When you compile and run a C program with the LLVM toolchain instead of GCC, the situation is pretty similar. That's by design, to keep everything compatible with GCC. The so-called entry point in your hello.ll file is just the C function main, and it's not the real entry point of your program. That's still provided by the crt1.o stub.

Now, if we (finally) switch from talking about C to talking about Haskell, the Haskell runtime is, obviously, about a billion times more complicated than the C runtime, but it's been built on top of the C runtime. So, when you compile a Haskell program the normal way:

$ ghc main.hs
stack ghc -- main.hs
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
$

you can see that the executable has an entry point named _start:

$ nm main | egrep 'T _start'
0000000000406560 T _start

which is actually the same C runtime stub as before that calls the C entry point:

$ nm main | egrep 'T main'
0000000000406dc4 T main
$ 

but this main is not your Haskell main. This main is a C main function in a program dynamically created by GHC at link time. You can look at such a program by running:

$ ghc -v -keep-tmp-files -fforce-recomp main.hs

and rummaging around for a file named ghc_4.c somewhere in a /tmp subdirectory:

$ cat /tmp/ghc10915_0/ghc_4.c
#include "Rts.h"
extern StgClosure ZCMain_main_closure;
int main(int argc, char *argv[])
{
 RtsConfig __conf = defaultRtsConfig;
 __conf.rts_opts_enabled = RtsOptsSafeOnly;
 __conf.rts_opts_suggestions = true;
 __conf.rts_hs_main = true;
 return hs_main(argc,argv,&ZCMain_main_closure,__conf);
}

Now, do you see that external reference to ZCMain_main_closure? That, believe it or not, is the Haskell entry point for your program, and you should find it in main.o, whether you compiled using the vanilla GHC pipeline or via the LLVM backend:

$ egrep ZCMain_main_closure main.ll
%ZCMain_main_closure_struct = type <{i64, i64, i64, i64}>
...

Now, it's not a "function". It's a specially formatted data structure (a closure) that the Haskell runtime system understands. The hs_main() function above (yet another entry point!) is the main entry point into the Haskell runtime:

$ nm ~/.stack/programs/x86_64-linux/ghc-8.4.3/lib/ghc-8.4.3/rts/libHSrts.a | egrep hs_main
0000000000000000 T hs_main
$

and it accepts a closure for a Haskell main function as the Haskell entry point to begin executing your program.

So, if you went through all this trouble in the hopes of isolating a Haskell program in an *.ll file that you could somehow run directly by jumping to its entry point, then I've got some bad news for you... ;)



Source: https://stackoverflow.com/questions/52068999/compiled-haskell-program-to-llvm-ir-is-missing-main
