I would like to represent a \"tree\" of the following shape in Haskell:
/\\
/\\/\\
/\\/\\/\\
/\\/\\/\\/\\
` ` ` ` `
It is certainly possible to construct a tree with shared nodes. For example, we could just define:
data Tree a = Leaf a | Node (Tree a) (Tree a)
and then carefully construct a value of this type as in
tree :: Tree Int
tree = Node t1 t2
where
t1 = Node t3 t4
t2 = Node t4 t5
t3 = Leaf 2
t4 = Leaf 3
t5 = Leaf 5
to achieve sharing of subtrees (in this case t4
).
However, as this form of sharing is not observable in Haskell, it is very hard to maintain: for example if you traverse a tree to relabel its leaves
relabel :: (a -> b) -> Tree a -> Tree b
relabel f (Leaf x) = Leaf (f x)
relabel f (Node l r) = Node (relabel f l) (relabel f r)
you loose sharing. Also, when doing a bottom-up computation such as
sum :: Num a => Tree a -> a
sum (Leaf n) = n
sum (Node l r) = sum l + sum r
you end up not taking advantage of sharing and possibly duplicate work.
To overcome these problems, you can make sharing explicit (and hence observable) by encoding your trees in a graph-like manner:
type Ptr = Int
data Tree' a = Leaf a | Node Ptr Ptr
data Tree a = Tree {root :: Ptr, env :: Map Ptr (Tree' a)}
The tree from the example above can now be written as
tree :: Tree Int
tree = Tree {root = 0, env = fromList ts}
where
ts = [(0, Node 1 2), (1, Node 3 4), (2, Node 4 5),
(3, Leaf 2), (4, Leaf 3), (5, Leaf 5)]
The price to pay is that functions that traverse these structures are somewhat cumbersome to write, but we can now define for example a relabeling function that preserves sharing
relabel :: (a -> b) -> Tree a -> Tree b
relabel f (Tree root env) = Tree root (fmap g env)
where
g (Leaf x) = Leaf (f x)
g (Node l r) = Node l r
and a sum
function that doesn't duplicate work when the tree has shared nodes:
sum :: Num a => Tree a -> a
sum (Tree root env) = fromJust (lookup root env')
where
env' = fmap f env
f (Leaf n) = n
f (Node l r) = fromJust (lookup l env') + fromJust (lookup r env')