Converting an untyped AST for a simple typed language into a GADT

问题

I have an ADT representing the AST for a simple language:

data UTerm = UTrue
      | UFalse
      | UIf UTerm UTerm UTerm
      | UZero
      | USucc UTerm
      | UIsZero UTerm

This data structure can represent invalid terms that don't follow the type rules of the language, like UIsZero UFalse, so I'd like to use a GADT that enforces well-typedness:

{-# LANGUAGE GADTs #-}

data TTerm a where
  TTrue :: TTerm Bool
  TFalse :: TTerm Bool
  TIf :: TTerm Bool -> TTerm a -> TTerm a -> TTerm a
  TZero :: TTerm Int
  TSucc :: TTerm Int -> TTerm Int
  TIsZero :: TTerm Int -> TTerm Bool

My problem is to type check a UTerm and convert it into a TTerm. My first thought was UTerm -> Maybe (TTerm a), but this of course doesn't work because it's not valid for all as. I don't even know what the type would be, because we don't know if a is going to be Int or Bool. Then I thought I could write a different type checking function for each of the possible values of a:

import Control.Applicative

typecheckbool :: UTerm -> Maybe (TTerm Bool)
typecheckbool UTrue = Just TTrue
typecheckbool UFalse = Just TFalse
typecheckbool (UIsZero a) = TIsZero <$> typecheckint a
typecheckbool _ = Nothing

typecheckint :: UTerm -> Maybe (TTerm Int)
typecheckint UZero = Just TZero
typecheckint (USucc a) = TSucc <$> typecheckint a
typecheckint (UIf a b c) = TIf <$> typecheckbool a <*> typecheckint b <*> typecheckint c
typecheckint UTrue = Nothing
typecheckint UFalse = Nothing
typecheckint (UIsZero _) = Nothing

This works for some cases, for a subset of the language where TIf requires its consequent and alternative are Ints (But TIf TTrue TFalse TTrue is actually totally valid), and where we know the target type of the expression we're typing.

What's the right way to convert from a UTerm to a TTerm?

回答1:

The standard technique is to define an existential type:

data ETerm_ where
    ETerm_ :: TTerm a -> ETerm

In this case, you may also want some term-level evidence of which type you have; e.g.

data Type a where
    TInt :: Type Int
    TBool :: Type Bool

then the real ETerm would look like this:

data ETerm where
    ETerm :: Type a -> TTerm a -> ETerm

The interesting case of type checking is then something like

typeCheck (UIf ucond ut uf) = do
    ETerm TBool tcond <- typeCheck ucond
    ETerm tyt tt <- typeCheck ut
    ETerm tyf tf <- typeCheck uf
    case (tyt, tyf) of
        (TBool, TBool) -> return (ETerm TBool (TIf tcond tt tf))
        (TInt , TInt ) -> return (ETerm TInt  (TIf tcond tt tf))
        _ -> fail "branches have different types"

回答2:

As a minor complement of @DanielWagner's answer, you might want to factorize type equality checks such as

...
case (tyt, tyf) of
        (TBool, TBool) -> return (ETerm TBool (TIf tcond tt tf))
        (TInt , TInt ) -> return (ETerm TInt  (TIf tcond tt tf))
        _ -> fail "branches have different types"

One way to do that is using equality witnesses:

import Data.Type.Equality

typeEq :: Type a -> Type b -> Maybe (a :~: b)
typeEq TInt  TInt  = Just Refl
typeEq TBool TBool = Just Refl
typeEq _     _     = Nothing

typeCheck :: UTerm -> Maybe ETerm
typeCheck (UIf ucond ut uf) = do
    ETerm TBool tcond <- typeCheck ucond
    ETerm tyt tt <- typeCheck ut
    ETerm tyf tf <- typeCheck uf
    case typeEq tyt tyf of
        Just Refl -> return (ETerm tyt (TIf tcond tt tf))
        _         -> fail "branches have different types"

This factorization is convenient if you need to check type equality in multiple parts of your type checking routine. It also allows for extending the language with pair types like (t1,t2), which require a structural recursive approach to check type equality.

One might even write a full decider for type equality

{-# LANGUAGE EmptyCase #-}
typeEq2 :: Type a -> Type b -> Either (a :~: b) ((a :~:b) -> Void)
typeEq2 TInt  TInt  = Left Refl
typeEq2 TInt  TBool = Right (\eq -> case eq of)
typeEq2 TBool TBool = Left Refl
typeEq2 TBool TInt  = Right (\eq -> case eq of)

but this, I guess, will probably not be needed unless you are trying to model quite advanced types (e.g. GADTs).

The code above uses and empty case to examine all the possible values eq can have. Since it has type e.g. Int :~: Bool, and there are no constructors matching with that type, we are left with no possible values for eq and so no case branch is needed. This will not trigger an exhaustiveness warning, since there are indeed no cases left unhandled (OT: I wish these warnings to be actual errors).

Instead of using EmptyCase you can also use something like case eq of _ -> undefined, but using bottoms in proof terms like the above one is fishy.

来源：https://stackoverflow.com/questions/30564967/converting-an-untyped-ast-for-a-simple-typed-language-into-a-gadt

标签

haskell

gadt