Using higher-order Haskell types in C#

刺人心 2021-02-11 20:48

How can I use and call Haskell functions with higher-order type signatures from C# (DLLImport), like...

double :: (Int -> Int) -> Int -> Int -- higher order

1 Answer
  •  死守一世寂寞
    2021-02-11 21:12

    I'll elaborate here on my comment on FUZxxl's post.
    All of the examples you posted are possible using the FFI. Once you export your functions via the FFI, you can, as you've already figured out, compile the program into a DLL.

    .NET was designed to interface easily with C, C++, COM, etc. This means that once you're able to compile your functions to a DLL, you can call them (relatively) easily from .NET. As I mentioned in my other post that you linked to, keep in mind which calling convention you specify when exporting your functions. The default in .NET is stdcall, while most Haskell FFI examples export using ccall.

    So far, the only limitation I've found on what can be exported through the FFI is polymorphic types, or types that are not fully applied, i.e. anything whose kind is not * (for instance, you can't export Maybe, but you can export Maybe Int).

    I've written a tool, Hs2lib, that automatically exports any of the functions in your example. It can also generate unsafe C# code, which makes it pretty much "plug and play". I chose unsafe code because pointers are easier to handle with it, which in turn makes marshalling data structures easier.

    For completeness, I'll describe how the tool handles your examples and how I plan on handling polymorphic types.

    • Higher-order functions

    When exporting higher-order functions, the function needs to be changed slightly: the higher-order arguments have to become FunPtr values. Basically, they're treated as explicit function pointers (or delegates in C#), which is how higher-order functions are typically expressed in imperative languages.
    Assuming we convert Int into CInt, the type of double is transformed from

    (Int -> Int) -> Int -> Int
    

    into

    FunPtr (CInt -> CInt) -> CInt -> IO CInt
    

    These types are generated for a wrapper function (doubleA in this case), which is exported instead of double itself. The wrapper function maps between the exported values and the input values the original function expects. The IO is needed because constructing a FunPtr is not a pure operation.
    One thing to remember is that the only way to turn a Haskell function into a FunPtr, or to call a FunPtr from Haskell, is through static foreign imports that instruct GHC to generate the necessary stubs:

    foreign import stdcall "wrapper" mkFunPtr  :: (CInt -> CInt) -> IO (FunPtr (CInt -> CInt))
    foreign import stdcall "dynamic" dynFunPtr :: FunPtr (CInt -> CInt) -> CInt -> CInt
    

    The "wrapper" import lets us create a FunPtr from a Haskell function, and the "dynamic" import lets us dereference (call) one.

    In C#, we declare the input as an IntPtr and use the marshalling helper Marshal.GetDelegateForFunctionPointer to turn it into a delegate we can call, or the inverse function, Marshal.GetFunctionPointerForDelegate, to create an IntPtr from a delegate.

    Also remember that the calling convention of the function passed as a FunPtr argument must match the calling convention of the function it is passed to. In other words, passing &foo to bar requires foo and bar to have the same calling convention.
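
    A minimal, self-contained sketch of such a wrapper follows. The body of double is an assumption (the question only shows its type), and the sketch uses ccall for both the stubs and the export so that the calling conventions match; on 32-bit Windows you would use stdcall instead, as discussed above. Only the names double, doubleA, mkFunPtr and dynFunPtr come from the text; this is illustrative, not Hs2lib's generated output.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CInt (..))
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Hypothetical body: the question only gives the type of double,
-- so we assume it applies its function argument twice.
double :: (Int -> Int) -> Int -> Int
double f = f . f

-- "wrapper" stub: turns a Haskell closure into a C function pointer.
foreign import ccall "wrapper"
  mkFunPtr :: (CInt -> CInt) -> IO (FunPtr (CInt -> CInt))

-- "dynamic" stub: turns a C function pointer into a callable Haskell function.
foreign import ccall "dynamic"
  dynFunPtr :: FunPtr (CInt -> CInt) -> CInt -> CInt

-- Exported wrapper: dereferences the FunPtr, converts between CInt and Int,
-- and delegates to the original double.
doubleA :: FunPtr (CInt -> CInt) -> CInt -> IO CInt
doubleA fp x = return (fromIntegral (double f (fromIntegral x)))
  where
    f = fromIntegral . dynFunPtr fp . fromIntegral

foreign export ccall doubleA :: FunPtr (CInt -> CInt) -> CInt -> IO CInt
```

    Under the assumed definition, calling doubleA with a pointer created by mkFunPtr (+ 1) and the value 5 yields 7; the pointer should be released with freeHaskellFunPtr once the caller no longer needs it.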

    • User datatypes

    Exporting a user datatype is actually quite straightforward. For every datatype that needs to be exported, a Storable instance has to be created. This instance specifies the marshalling information GHC needs in order to export/import the type. Among other things, you need to define the size and alignment of the type, along with how to read and write its values through a pointer. I partially use Hsc2hs for this task (hence the C macros in the file).

    Newtypes and datatypes with just one constructor are easy. They become a flat struct, since there's only one possible alternative when constructing/destructing these types. Types with multiple constructors become a union (a struct with the StructLayout attribute set to LayoutKind.Explicit in C#). However, we also need to include an enum to identify which constructor is being used.

    In general, the datatype Single, defined as

    data Single = Single  { sint   ::  Int
                          , schar  ::  Char
                          }
    

    creates the following Storable instance

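    -- hsc2hs expands the # directives using the C struct Single_t shown below.
    -- toNative/fromNative are conversion helpers that are not part of base
    -- (presumably supplied by Hs2lib's generated code).
    instance Storable Single where
        sizeOf    _ = (#size Single_t)
        alignment _ = (#alignment Single_t)
    
        -- Convert each field to its C type, then write it at its C offset.
        poke ptr (Single a1 a2) = do
            a1x <- toNative a1 :: IO CInt
            (#poke Single_t, sint) ptr a1x
            a2x <- toNative a2 :: IO CWchar
            (#poke Single_t, schar) ptr a2x
    
        -- Read each field at its C offset, then convert back to Haskell types.
        peek ptr = do 
            a1' <- (#peek Single_t, sint) ptr :: IO CInt
            a2' <- (#peek Single_t, schar) ptr :: IO CWchar
            x1 <- fromNative a1' :: IO Int
            x2 <- fromNative a2' :: IO Char
            return $ Single x1 x2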
    

    and the C struct

    typedef struct Single Single_t;
    
    struct Single {
         int sint;
         wchar_t schar;
    } ;
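
    If you'd rather not depend on hsc2hs, the same instance can be written by hand with explicit byte offsets. This is a sketch assuming a 4-byte int at offset 0 and the wchar_t at offset 4 (true on the usual Windows and Linux ABIs, where wchar_t is 2 and 4 bytes respectively); the roundTrip helper is hypothetical and only there to exercise the instance.

```haskell
import Foreign.C.Types (CInt, CWchar)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

data Single = Single
  { sint  :: Int
  , schar :: Char
  } deriving (Eq, Show)

-- Assumed offsets for struct Single { int sint; wchar_t schar; }.
offSint, offSchar :: Int
offSint  = 0
offSchar = 4

instance Storable Single where
  -- 4 + 2 padded to 8 on Windows, 4 + 4 = 8 on Linux.
  sizeOf _    = 8
  alignment _ = alignment (undefined :: CInt)

  poke ptr (Single i c) = do
    pokeByteOff ptr offSint  (fromIntegral i :: CInt)
    -- Characters outside the BMP do not fit a 2-byte wchar_t on Windows.
    pokeByteOff ptr offSchar (fromIntegral (fromEnum c) :: CWchar)

  peek ptr = do
    i <- peekByteOff ptr offSint  :: IO CInt
    c <- peekByteOff ptr offSchar :: IO CWchar
    return (Single (fromIntegral i) (toEnum (fromIntegral c)))

-- Hypothetical helper: write a value to unmanaged memory and read it back.
roundTrip :: Single -> IO Single
roundTrip s = alloca $ \p -> poke p s >> peek p
```

    Hard-coding offsets trades robustness for independence from hsc2hs; the #peek/#poke version stays correct if the C struct changes.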
    

    The function foo :: Int -> Single would be exported as foo :: CInt -> Ptr Single, while a datatype with multiple constructors, such as

    data Multi  = Demi  {  mints    ::  [Int]
                        ,  mstring  ::  String
                        }
                | Semi  {  semi :: [Single]
                        }
    

    generates the following C code:

    enum ListMulti {cMultiDemi, cMultiSemi};
    
    typedef struct Multi Multi_t;
    typedef struct Demi Demi_t;
    typedef struct Semi Semi_t;
    
    struct Multi {
        enum ListMulti tag;
        union MultiUnion* elt;
    } ;
    
    struct Demi {
         int* mints;
         int mints_Size;
         wchar_t* mstring;
    } ;
    
    struct Semi {
         Single_t** semi;
         int semi_Size;
    } ;
    
    union MultiUnion {
        struct Demi var_Demi;
        struct Semi var_Semi;
    } ;
    

    The Storable instance is relatively straightforward and follows easily from the C struct definition.
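
    As a sketch of that idea, here is a Storable instance for a small hypothetical two-constructor type, Value (not from the original post). It assumes the enum tag is stored as a 4-byte int at offset 0 and the union is stored inline at offset 4; note that the Multi struct generated above instead holds a pointer to the union (union MultiUnion* elt), which needs an extra allocation.

```haskell
import Foreign.C.Types (CInt, CWchar)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

-- Hypothetical two-constructor type, used only for illustration.
data Value = VInt Int | VChar Char
  deriving (Eq, Show)

-- Assumed C layout (tag and union both inline):
--   enum ValueTag { cValueVInt, cValueVChar };
--   struct Value { enum ValueTag tag; union { int vint; wchar_t vchar; } elt; };
tagVInt, tagVChar :: CInt
tagVInt  = 0
tagVChar = 1

instance Storable Value where
  sizeOf _    = 8
  alignment _ = alignment (undefined :: CInt)

  -- Write the tag first, then the payload of the active constructor.
  poke ptr (VInt i) = do
    pokeByteOff ptr 0 tagVInt
    pokeByteOff ptr 4 (fromIntegral i :: CInt)
  poke ptr (VChar c) = do
    pokeByteOff ptr 0 tagVChar
    pokeByteOff ptr 4 (fromIntegral (fromEnum c) :: CWchar)

  -- Read the tag to decide how to interpret the union bytes.
  peek ptr = do
    tag <- peekByteOff ptr 0 :: IO CInt
    case tag of
      t | t == tagVInt  -> VInt . fromIntegral <$> (peekByteOff ptr 4 :: IO CInt)
        | t == tagVChar -> VChar . toEnum . fromIntegral <$> (peekByteOff ptr 4 :: IO CWchar)
        | otherwise     -> ioError (userError ("unknown Value tag: " ++ show t))

-- Hypothetical helper: write a value to unmanaged memory and read it back.
roundTrip :: Value -> IO Value
roundTrip v = alloca $ \p -> poke p v >> peek p
```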

    • Applied types

    For the type Maybe Int, my dependency tracer would emit dependencies on both Int and Maybe. This means that when generating the Storable instance for Maybe Int, the head looks like

    instance Storable Int => Storable (Maybe Int) where
    

    That is, as long as there's a Storable instance for the arguments of the application, the type itself can also be exported.

    Since Maybe a is defined with a polymorphic argument (Just a), some type information is lost when creating the structs. The structs contain a void* field, which you have to convert to the right type manually. The alternative, creating specialized structs as well (e.g. struct MaybeInt), was too cumbersome in my opinion: the number of specialized structures that could be generated from a normal module can quickly explode this way. (I might add this as a flag later on.)
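
    The void* approach can be sketched as a generic Storable (Maybe a) instance: the struct holds an int flag and a void* to a separately malloc'd payload. The layout, the instance (an orphan, since base defines no Storable instance for Maybe) and the freeMaybe/roundTrip helpers are assumptions for illustration, not Hs2lib's actual output. Because poke allocates, whoever owns the struct must also free the payload.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca, free, malloc)
import Foreign.Ptr (Ptr, castPtr, nullPtr)
import Foreign.Storable (Storable (..))

-- Assumed C layout: struct Maybe { int isJust; void* value; };
-- The void* sits after the int, aligned for a pointer.
ptrOffset :: Int
ptrOffset = max (sizeOf (undefined :: CInt)) (alignment (undefined :: Ptr ()))

instance Storable a => Storable (Maybe a) where
  sizeOf _    = ptrOffset + sizeOf (undefined :: Ptr ())
  alignment _ = max (alignment (undefined :: CInt)) (alignment (undefined :: Ptr ()))

  poke ptr Nothing = do
    pokeByteOff ptr 0 (0 :: CInt)
    pokeByteOff ptr ptrOffset (nullPtr :: Ptr ())
  poke ptr (Just x) = do
    p <- malloc               -- payload memory; released by freeMaybe
    poke p x
    pokeByteOff ptr 0 (1 :: CInt)
    pokeByteOff ptr ptrOffset (castPtr p :: Ptr ())

  peek ptr = do
    tag <- peekByteOff ptr 0 :: IO CInt
    if tag == 0
      then return Nothing
      else do
        p <- peekByteOff ptr ptrOffset :: IO (Ptr ())
        -- The element type is recovered only from the Haskell side.
        Just <$> peek (castPtr p)

-- Hypothetical helper: free the payload allocated by poke, if any.
freeMaybe :: Ptr (Maybe a) -> IO ()
freeMaybe ptr = do
  tag <- peekByteOff ptr 0 :: IO CInt
  if tag == 0
    then return ()
    else do
      p <- peekByteOff ptr ptrOffset :: IO (Ptr ())
      free p

-- Hypothetical helper: write, read back, and clean up.
roundTrip :: Storable a => Maybe a -> IO (Maybe a)
roundTrip m = alloca $ \p -> do
  poke p m
  r <- peek p
  freeMaybe p
  return r
```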

    To mitigate this loss of information, my tool exports any Haddock documentation found for the function as comments in the generated include files, and also places the original Haskell type signature in those comments. An IDE can then present these as part of its IntelliSense (code completion).

    As with all of these examples, I've omitted the code for the .NET side of things. If you're interested in that, you can just view the output of Hs2lib.

    There are a few other types that need special treatment, in particular lists and tuples.

    1. Lists need to be passed along with the size of the array to marshal from, since we're interfacing with unmanaged languages where array sizes are not implicitly known. Conversely, when we return a list, we also need to return its size.
    2. Tuples are special built-in types. In order to export them, we first have to map them to a "normal" datatype and export that. The tool does this for tuples of up to 8 elements.
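
    For lists, the convention in point 1 can be sketched like this: the element count travels next to the pointer on the way in and comes back through an extra out-parameter on the way out. sumListA, evensA, callSum and callEvens are hypothetical names; the call* helpers simply replay from Haskell the calls a C# caller would make.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CInt (..))
import Foreign.Marshal.Alloc (alloca, free)
import Foreign.Marshal.Array (newArray, peekArray, withArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Hypothetical Haskell functions to export.
sumList :: [Int] -> Int
sumList = sum

evens :: Int -> [Int]
evens n = [0, 2 .. n]

-- List argument: the caller passes the element count alongside the pointer.
sumListA :: CInt -> Ptr CInt -> IO CInt
sumListA n arr = do
  xs <- peekArray (fromIntegral n) arr
  return (fromIntegral (sumList (map fromIntegral xs)))

-- List result: the size goes out through an extra pointer argument.
-- newArray allocates with malloc, so the caller must free the result.
evensA :: CInt -> Ptr CInt -> IO (Ptr CInt)
evensA n outSize = do
  let ys = map fromIntegral (evens (fromIntegral n))
  poke outSize (fromIntegral (length ys))
  newArray ys

foreign export ccall sumListA :: CInt -> Ptr CInt -> IO CInt
foreign export ccall evensA :: CInt -> Ptr CInt -> IO (Ptr CInt)

-- Haskell-side replay of the calling protocol.
callSum :: [Int] -> IO Int
callSum xs = withArray (map fromIntegral xs) $ \arr ->
  fromIntegral <$> sumListA (fromIntegral (length xs)) arr

callEvens :: Int -> IO [Int]
callEvens n = alloca $ \sizePtr -> do
  arr <- evensA (fromIntegral n) sizePtr
  len <- peek sizePtr
  ys  <- peekArray (fromIntegral len) arr
  free arr
  return (map fromIntegral ys)
```

    Because evensA allocates its result with newArray, the caller has to release it with the matching deallocator (here Foreign.Marshal.Alloc.free); the Update below shows why mismatched allocators are a real concern.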

    • Polymorphic types

    The problem with polymorphic types, e.g. map :: (a -> b) -> [a] -> [b], is that the sizes of a and b are not known. That is, there's no way to reserve space for the arguments and return value, since we don't know what they are. I plan to support this by letting you specify possible types for a and b and generating specialized wrapper functions for them. On the other side, in the imperative language, I would use overloading to present the types you've chosen to the user.

    As for classes, Haskell's open-world assumption is usually a problem (an instance can be added at any time). However, at compile time only a statically known list of instances is available. I intend to offer an option that automatically exports as many specialized instances as possible using this list, e.g. exporting (+) would export a specialized function for every Num instance known at compile time (Int, Double, etc.).
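
    A hand-written version of that specialization, assuming we pick Int and Double, could look like the following sketch; add, addInt and addDouble are hypothetical names. Each exported wrapper is monomorphic, so its argument sizes are known, and on the C# side the two entry points could be presented as overloads of a single method.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble (..), CInt (..))

-- Class-polymorphic function: not exportable as-is, because the sizes
-- of its arguments are unknown until a Num instance is chosen.
add :: Num a => a -> a -> a
add = (+)

-- Monomorphic wrappers, one per Num instance chosen for export.
addInt :: CInt -> CInt -> CInt
addInt x y = fromIntegral (add (fromIntegral x) (fromIntegral y :: Int))

addDouble :: CDouble -> CDouble -> CDouble
addDouble x y = realToFrac (add (realToFrac x) (realToFrac y :: Double))

foreign export ccall addInt :: CInt -> CInt -> CInt
foreign export ccall addDouble :: CDouble -> CDouble -> CDouble
```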

    The tool is also rather trusting. Since I can't really inspect the code for purity, I trust that the programmer is honest, e.g. that you don't pass a function with side effects to a function that expects a pure one. Be honest and mark the higher-order argument as impure to avoid problems.

    I hope this helps, and I hope this wasn't too long.

    Update: There's a rather big gotcha I've recently discovered. Remember that the String type in .NET is immutable, so when the marshaller sends it to our Haskell code, the CWString we get there is a copy of the original. We have to free this copy: when GC is performed in C#, it won't affect the CWString.

    The problem, however, is that when we free it in the Haskell code we can't use freeCWString, because the pointer was not allocated with the C runtime's (msvcrt.dll) allocator. There are three ways (that I know of) to solve this.

    • Use char* in your C# code instead of String when calling a Haskell function. You then have the pointer to free when your call returns, or you can obtain it using a fixed statement.
    • Import CoTaskMemFree in Haskell and free the pointer there.
    • Use StringBuilder instead of String. I'm not entirely sure about this one, but the idea is that since StringBuilder is implemented as a native pointer, the marshaller just passes this pointer to your Haskell code (which can also update it, by the way). When GC is performed after the call returns, the StringBuilder should be freed.
