问题
Consider the simple definition of a length-indexed vector:
data Nat = Z | S Nat
infixr 5 :>
data Vec (n :: Nat) a where
V0 :: Vec 'Z a
(:>) :: a -> Vec n a -> Vec ('S n) a
Naturally I would at some point need the following function:
vec2list :: Vec n a -> [a]
However, this function is really just a fancy identity. I believe that the runtime representations of these two types are the same, so
vec2list :: Vec n a -> [a]
vec2list = unsafeCoerce
should work. Alas, it does not:
>vec2list ('a' :> 'b' :> 'c' :> V0)
""
Every input simply returns the empty list. So I assume my understand is lacking. To test it, I define the following:
data List a = Nil | Cons a (List a) deriving (Show)
vec2list' :: Vec n a -> List a
vec2list' = unsafeCoerce
test1 = vec2list' ('a' :> 'b' :> 'c' :> V0)
data SomeVec a = forall n . SomeVec (Vec n a)
list'2vec :: List a -> SomeVec a
list'2vec x = SomeVec (unsafeCoerce x)
Surprisingly this works! It certainly isn't an issue with the GADT then (my initial thought).
I think that the List
type is really identical at runtime to []
. I try to test this too:
list2list :: [a] -> List a
list2list = unsafeCoerce
test2 = list2list "abc"
and it works! Based on this fact, I have to conclude that [a]
and List a
must have the same runtime representation. And yet, the following
list2list' :: List a -> [a]
list2list' = unsafeCoerce
test3 = list2list' (Cons 'a' (Cons 'b' (Cons 'c' Nil)))
does not work. list2list'
again always returns the empty list. I believe that "having identical runtime representations" must be a symmetric relation, so this doesn't seem to make sense.
I began to think maybe there's something funny with "primitive" types - but I always believed that []
is only special syntactically, not semantically. It seems that's the case:
data Pair a b = Pair a b deriving (Show, Eq, Ord)
tup2pair :: (a,b) -> Pair a b
tup2pair = unsafeCoerce
pair2tup :: Pair a b -> (a,b)
pair2tup = unsafeCoerce
The first function works and the second does not - same as the in the case of List
and []
. Although in this case, pair2tup
segfaults as opposed to always returning the empty list.
It seems to be consistently asymmetric with respect to types which use "built-in" syntax. Back to the Vec
example, the following
list2vec :: [a] -> SomeVec a
list2vec x = SomeVec (unsafeCoerce x)
works just fine as well! The GADT really isn't special.
The question is: how do the runtime representations of types which use "built-in" syntax differ from those that do not?
Alternatively, how does one write a zero-cost coercion from Vec n a
to [a]
? This doesn't answer the question but solves the problem.
Testing was done with GHC 7.10.3.
As noted by a commenter, this behaviour is only present when interpreting. When compiled, all functions work as expected. The question still applies, just to runtime representation when interpreting.
回答1:
Now to answer your main question, this thread appears to have the answer: start ghci with -fobject-code
:
$ ghci /tmp/vec.hs
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Main ( /tmp/vec.hs, interpreted )
Ok, modules loaded: Main.
*Main> print $ vec2list ('a' :> 'b' :> 'c' :> V0)
""
With -fobject-code
:
$ ghci -fobject-code /tmp/vec.hs
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Main ( /tmp/vec.hs, /tmp/vec.o )
Ok, modules loaded: Main.
Prelude Main> print $ vec2list ('a' :> 'b' :> 'c' :> V0)
"abc"
The modules that contain []
and (,)
are all compiled, which causes their runtime representation to be different from isomorphic datatypes in interpreted modules. According to Simon Marlow on the thread I linked, interpreted modules add annotations for the debugger. I think this also explains why tup2pair
works and pair2tup
doesn't: missing annotations isn't a problem for interpreted modules, but the compiled modules choke on the extra annotations.
-fobject-code
has some downsides: longer compilation time, only brings exported functions in scope, but it has the additional advantage that running the code is much faster.
回答2:
To answer only your alternative question, you could create a newtype
with a non-exported constructor to give a list a type-level length and a zero-cost coercion to lists:
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
module Vec (Nat(..), Vec, v0, (>:>), vec2list) where
data Nat = Z | S Nat
newtype Vec (n :: Nat) a = Vec { unVec :: [a] }
v0 :: Vec Z a
v0 = Vec []
infixr 5 >:>
(>:>) :: a -> Vec n a -> Vec ('S n) a
a >:> (Vec as) = Vec (a : as)
vec2list :: Vec n a -> [a]
vec2list (Vec as) = as
As long as the Vec
constructor is not in scope (so only v0
and >:>
can be used to construct vectors) the invariant that the type-level number represents the length can't be violated.
(This approach definitely has my preference over unsafeCoerce
, as anything with unsafeCoerce
could break with every update of GHC or on different platforms.)
来源:https://stackoverflow.com/questions/36054141/special-runtime-representation-of-type