GADTs for a DSL: swings and roundabouts?

Posted by 跟風遠走 on 2019-12-22 05:53:06

Question


The typical example given for the benefits of a GADT is representing the syntax of a DSL; see, for instance, the wiki or the PLDI 2005 paper.

I can see that if you have an AST that's type-correct by construction, writing an eval function is easy.

How do I build the GADT handling into a REPL? Or, more specifically, into a Read-Parse-Typecheck-Eval-Print-Loop? It looks to me as if you just push the complexity from the eval step into the earlier steps.

Does GHCi use a GADT internally to represent expressions it's evaluating? (The expressions are a lot chunkier than a typical DSL.)

  • For one thing, you can't derive Show for a GADT with an ordinary deriving clause, so for the Print step you either hand-roll Show instances or use standalone deriving, something like this:

    {-# LANGUAGE  GADTs, StandaloneDeriving  #-}
    
    data Term a where
      Lit :: Int -> Term Int
      Inc :: Term Int -> Term Int
      IsZ :: Term Int -> Term Bool
      If :: Term Bool -> Term a -> Term a -> Term a
      Pair :: (Show a, Show b) => Term a -> Term b -> Term (a,b)
      Fst :: (Show b) => Term (a,b) -> Term a
      Snd :: (Show a) => Term (a,b) -> Term b
    
    deriving instance (Show a) => Show (Term a)
    

    (It seems to me those Show constraints tangled in the constructors are already failing to separate concerns.)
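For comparison, here is a minimal sketch of the hand-rolled alternative, assuming the same Term syntax as above. A printer written by pattern matching needs no Show constraints on the constructors, and the eval function mentioned earlier is equally short. The names pretty and atom, and the concrete output syntax, are illustrative choices, not taken from any library.

```haskell
{-# LANGUAGE GADTs #-}

-- The same syntax as above, but with no Show constraints on the constructors.
data Term a where
  Lit  :: Int -> Term Int
  Inc  :: Term Int -> Term Int
  IsZ  :: Term Int -> Term Bool
  If   :: Term Bool -> Term a -> Term a -> Term a
  Pair :: Term a -> Term b -> Term (a, b)
  Fst  :: Term (a, b) -> Term a
  Snd  :: Term (a, b) -> Term b

-- Evaluation: each match refines the type index, so no tagging is needed.
eval :: Term a -> a
eval (Lit n)    = n
eval (Inc t)    = eval t + 1
eval (IsZ t)    = eval t == 0
eval (If c t e) = if eval c then eval t else eval e
eval (Pair a b) = (eval a, eval b)
eval (Fst p)    = fst (eval p)
eval (Snd p)    = snd (eval p)

-- Hand-written printer (hypothetical surface syntax). It only inspects the
-- shape of the term, so it places no constraint on the index type.
pretty :: Term a -> String
pretty (Lit n)    = show n
pretty (Inc t)    = "inc " ++ atom t
pretty (IsZ t)    = "isZ " ++ atom t
pretty (If c t e) = "if " ++ pretty c ++ " then " ++ pretty t ++ " else " ++ pretty e
pretty (Pair a b) = "(" ++ pretty a ++ ", " ++ pretty b ++ ")"
pretty (Fst p)    = "fst " ++ atom p
pretty (Snd p)    = "snd " ++ atom p

-- Parenthesise an argument unless it is already atomic.
atom :: Term a -> String
atom t@(Lit _)    = pretty t
atom t@(Pair _ _) = pretty t
atom t            = "(" ++ pretty t ++ ")"
```

Printing the result of eval still needs a Show instance for the result type, but that is a concern of the Print step rather than of the syntax tree.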

I'm more thinking about the user-experience for somebody entering DSL expressions, rather than the programmer's convenience of the eval function. Either:

  • The user enters expressions directly using the GADT constructors. It's easy to make a syntactically correct but ill-typed mistake (say, a misplaced paren). Then GHCi gives rather unfriendly rejection messages. Or
  • The REPL takes the input as text and parses it. But for a GADT like that, getting a Read instance is real hard work. So perhaps
  • The application has two data structures: one being type-mistake-tolerant; the other being the GADT; and the validate step constructs the GADT AST, if it can do so type-safely.

At that last bullet, I seem to be back with 'smart constructors', which GADTs are supposed to improve on(?) What's more, I've doubled the work somewhere.

I don't have a 'better way' to approach it. I'm wondering how to approach DSL applications in practice. (For context: I'm thinking about a database query environment, where type inference has to look at the types of the fields in the database to validate the operations on them.)

Addendum: after working through the answer from @Alec

I see the code for pretty printing in glambda involves several layers of classes and instances. Something feels wrong here compared with the claimed advantages of a GADT for an AST. The idea of a (well-typed) AST is that you can equally easily: eval it; or pretty-print it; or optimise it; or code-generate from it; etc.

glambda's design seems to be aimed at eval'ing (which is fair enough given the purpose of the exercise). I'm wondering ...

  • Why the need to express the whole syntax for the (E)DSL in one datatype? (The wikibook example starts its straw man doing that data Expr = ...; and rapidly runs into type trouble. Well of course it does; that's never going to work; almost anything would work better than that; I feel cheated.)

  • If we end up writing classes and instances anyway, why not make each syntax production a separate datatype: data Lit = Lit Int ... data If b a1 a2 = If b a1 a2 ... Then a class IsTerm a c | a -> c where ... (i.e. a FunDep or maybe a Type Family whose instances tell us the Term's result-type.)

  • Now the EDSL uses the same constructors (the user doesn't care they're from different datatypes); and they apply 'sloppy' type-checking. Pretty printing/error reporting also doesn't need tight typechecking. Eval does, and insists on the IsTerm instances all lining up.

I didn't suggest this approach before, because it seemed to involve too much crufty code. But actually it's no worse than glambda -- that is, when you consider the whole functionality, not just the eval step.

It seems to me a big advantage to express the syntax only once. Furthermore it seems more extensible: add a new datatype per syntax production, rather than breaking open an existing datatype. Oh, and because they're H98 datatypes (no existentials), deriving works fine.
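A rough sketch of what that proposal could look like, using an associated type family rather than a FunDep. The class layout and the name Result are assumptions made for illustration; only Lit, If and IsTerm come from the description above, and Inc and IsZ are borrowed from the Term example.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}

-- One plain datatype per syntax production; deriving works as usual.
data Lit        = Lit Int    deriving Show
data Inc t      = Inc t      deriving Show
data IsZ t      = IsZ t      deriving Show
data If c t e   = If c t e   deriving Show

-- Result is a hypothetical associated type giving a term's result type.
class IsTerm t where
  type Result t
  eval :: t -> Result t

instance IsTerm Lit where
  type Result Lit = Int
  eval (Lit n) = n

-- The equality constraints are where the "tight" type-checking lives.
instance (IsTerm t, Result t ~ Int) => IsTerm (Inc t) where
  type Result (Inc t) = Int
  eval (Inc t) = eval t + 1

instance (IsTerm t, Result t ~ Int) => IsTerm (IsZ t) where
  type Result (IsZ t) = Bool
  eval (IsZ t) = eval t == 0

instance (IsTerm c, Result c ~ Bool, IsTerm t, IsTerm e, Result t ~ Result e)
      => IsTerm (If c t e) where
  type Result (If c t e) = Result t
  eval (If c t e) = if eval c then eval t else eval e
```

With these definitions an ill-typed term such as If (Lit 1) (Lit 2) (Lit 3) can still be built and shown, while applying eval to it is rejected at compile time because Result Lit is not Bool. Note that the structure of each term now appears in its Haskell type, so the error messages for large terms are unlikely to be any friendlier than the GADT ones.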


Answer 1:


Note that GHCi does not use GADTs to represent expressions. Even GHC's internal core expression type Expr is not a GADT.

DSLs

For a larger, more fleshed-out example of your Term type, consider glambda. Its Exp type even tracks variables at the type level.

  • There is a second UExp data type which, as you observed yourself, is what actually gets parsed from the REPL. This type then gets typechecked into Exp and passed on to a continuation with:

    check :: (MonadError Doc m, MonadReader Globals m)
          => UExp -> (forall t. STy t -> Exp '[] t -> m r)
          -> m r
    
  • Pretty-printing of UExp and Exp is hand-written, but at least uses the same code (it does this via a PrettyExp class).

  • The evaluation code itself is beautiful, but I doubt I need to sell you on that. :)

EDSLs

As I understand it, GADTs are splendid for EDSLs (embedded DSLs), since these are just portions of code in a large Haskell program. Yes, type errors can be complicated (and will come from GHC directly), but that's the price you pay for being able to maintain type-level invariants in your code. Consider, for instance, hoopl's representation of basic blocks in a CFG:

    data Block n e x where
      BlockCO  :: n C O -> Block n O O          -> Block n C O
      BlockCC  :: n C O -> Block n O O -> n O C -> Block n C C
      BlockOC  ::          Block n O O -> n O C -> Block n O C

      BNil    :: Block n O O
      BMiddle :: n O O                      -> Block n O O
      BCat    :: Block n O O -> Block n O O -> Block n O O
      BSnoc   :: Block n O O -> n O O       -> Block n O O
      BCons   :: n O O       -> Block n O O -> Block n O O

Sure, you open yourself up to nasty type errors, but you also have the ability to track fallthrough information at the type-level. This makes it much easier to think about dataflow problems.

So what...?

The point I'm trying to make is: if your GADT is being constructed from a String (or a custom REPL), you'll have a rough time performing the translation. That's unavoidable because what you are doing is essentially re-implementing a simple type-checker. Your best bet is to confront this head on (as glambda does) and distinguish the parsing from the type-checking.
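To make the "simple type-checker" concrete, here is a minimal sketch for a cut-down version of the Term type from the question. It is not glambda's code: UTerm, Ty, SomeTerm, Equal, sameTy, checkAs and run are all names made up for this illustration, and the parser that would produce a UTerm from a String is left out.

```haskell
{-# LANGUAGE GADTs #-}

-- Untyped syntax, as a parser would produce it. Ill-typed terms are allowed.
data UTerm
  = ULit Int
  | UInc UTerm
  | UIsZ UTerm
  | UIf UTerm UTerm UTerm
  deriving Show

-- Typed syntax: well-typed by construction.
data Term a where
  Lit :: Int -> Term Int
  Inc :: Term Int -> Term Int
  IsZ :: Term Int -> Term Bool
  If  :: Term Bool -> Term a -> Term a -> Term a

-- Run-time representation of the DSL's types.
data Ty a where
  TInt  :: Ty Int
  TBool :: Ty Bool

-- A typed term whose type is only known at run time.
data SomeTerm where
  SomeTerm :: Ty a -> Term a -> SomeTerm

-- Evidence that two type indices are the same.
data Equal a b where
  Refl :: Equal a a

sameTy :: Ty a -> Ty b -> Maybe (Equal a b)
sameTy TInt  TInt  = Just Refl
sameTy TBool TBool = Just Refl
sameTy _     _     = Nothing

tyName :: Ty a -> String
tyName TInt  = "Int"
tyName TBool = "Bool"

-- Infer the type of an untyped term, building the typed term as we go.
check :: UTerm -> Either String SomeTerm
check (ULit n)    = Right (SomeTerm TInt (Lit n))
check (UInc u)    = SomeTerm TInt . Inc <$> checkAs TInt u
check (UIsZ u)    = SomeTerm TBool . IsZ <$> checkAs TInt u
check (UIf c t e) = do
  c'   <- checkAs TBool c
  some <- check t
  case some of
    SomeTerm ty t' -> do
      e' <- checkAs ty e          -- both branches must have the same type
      Right (SomeTerm ty (If c' t' e'))

-- Check an untyped term against an expected type.
checkAs :: Ty a -> UTerm -> Either String (Term a)
checkAs want u = do
  some <- check u
  case some of
    SomeTerm got t -> case sameTy got want of
      Just Refl -> Right t
      Nothing   -> Left ("expected " ++ tyName want ++ " but got "
                         ++ tyName got ++ " in " ++ show u)

eval :: Term a -> a
eval (Lit n)    = n
eval (Inc t)    = eval t + 1
eval (IsZ t)    = eval t == 0
eval (If c t e) = if eval c then eval t else eval e

-- Typecheck, evaluate, and render the result for the Print step.
run :: UTerm -> Either String String
run u = do
  some <- check u
  case some of
    SomeTerm TInt  t -> Right (show (eval t))
    SomeTerm TBool t -> Right (show (eval t))
```

The error messages come from checkAs, so they can be phrased for the DSL's user rather than inherited from GHC. The cost is the one the question points out: the syntax is written down twice, once untyped and once typed.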

However, if you can afford to stay within the bounds of Haskell code, you can just hand parsing and typechecking to GHC. IMHO, EDSLs are way cooler and more practical than non-embedded DSLs.



Source: https://stackoverflow.com/questions/54919548/gadts-for-a-dsl-swings-and-roundabouts
