I'm having a hard time implementing Read for a tree structure. I want to take a left-associative string (with parens) like ABC(DE)F and convert it into a tree.
This is a situation where using a parsing library makes the code amazingly short and extremely expressive. (I was amazed that it was so neat when I was experimenting to answer this!)
I'm going to use Parsec (that article provides some links for more information), in "applicative mode" (rather than monadic), since we don't need the extra power/foot-shooting-ability of monads.
First the various imports and definitions:
import Text.Parsec
import Control.Applicative ((<*), (<$>))
data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)
paren, tree, unit :: Parsec String st Tree
Now, the basic unit of the tree is either a single character (that's not a parenthesis) or a parenthesised tree. The parenthesised tree is just a normal tree between ( and ). And a normal tree is just units put into branches left-associatively (it's extremely self-recursive). In Haskell with Parsec:
-- parenthesised tree or a single-character Leaf
unit = paren <|> (Leaf <$> noneOf "()") <?> "group or literal"
-- normal tree between ( and )
paren = between (char '(') (char ')') tree
-- all the units connected up left-associatively
tree = foldl1 Branch <$> many1 unit
-- attempt to parse the whole input (don't short-circuit on the first error)
onlyTree = tree <* eof
(Yes, that's the entire parser!)
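To see the left-association on its own, here is a minimal sketch without any parsing. The names leftAssoc and abc are made up for illustration. It applies foldl1 Branch to the units that "ABC" would produce. Note that foldl1 fails on an empty list, which is why the parser uses many1 rather than many.

```haskell
data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

-- Hypothetical helper: combine already-parsed units the same way `tree` does.
-- foldl1 nests to the left, so [a, b, c] becomes Branch (Branch a b) c.
leftAssoc :: [Tree] -> Tree
leftAssoc = foldl1 Branch

-- The three units that the input "ABC" would yield.
abc :: Tree
abc = leftAssoc [Leaf 'A', Leaf 'B', Leaf 'C']
-- abc == Branch (Branch (Leaf 'A') (Leaf 'B')) (Leaf 'C')
```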
If we wanted to, we could do without paren and unit, but the code above is very expressive, so we can leave it as is.
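For comparison, a sketch of what the inlined version might look like, with the bodies of paren and unit substituted directly into tree. It should accept the same inputs, but it is noticeably harder to read.

```haskell
import Text.Parsec
import Control.Applicative ((<$>))

data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

-- Same grammar as before, with `paren` and `unit` inlined into `tree`.
tree :: Parsec String st Tree
tree = foldl1 Branch <$> many1
  (   between (char '(') (char ')') tree  -- was `paren`
  <|> (Leaf <$> noneOf "()")              -- was the literal case of `unit`
  <?> "group or literal" )
```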
As a brief explanation: many1 parses one or more occurrences of its argument (rather than many, which parses zero or more).
We can use the parse function to run the parser (it returns Either ParseError Tree; Left is an error and Right is a correct parse).
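As a rough sketch of handling both outcomes without throwing, the following repeats the parser so that it stands alone and pattern matches on the Either. The helpers parseTree and describe are hypothetical names, not part of Parsec.

```haskell
import Text.Parsec
import Control.Applicative ((<*), (<$>))

data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

paren, tree, unit, onlyTree :: Parsec String st Tree
unit     = paren <|> (Leaf <$> noneOf "()") <?> "group or literal"
paren    = between (char '(') (char ')') tree
tree     = foldl1 Branch <$> many1 unit
onlyTree = tree <* eof

-- Hypothetical helper: run the parser; the "" is the source name used in errors.
parseTree :: String -> Either ParseError Tree
parseTree = parse onlyTree ""

-- Hypothetical helper: turn either outcome into a human-readable message.
describe :: String -> String
describe str = case parseTree str of
  Left err -> "parse error at " ++ show err
  Right tr -> "parsed: " ++ show tr
```

Returning the Either leaves the decision about what to do on failure to the caller.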
read
Using it as a read-like function could be something like:
read' :: String -> Tree
read' str = case parse onlyTree "" str of
  Right tr -> tr
  Left er  -> error (show er)
(I've used read' to avoid conflicting with Prelude.read; if you want a Read instance you'll have to do a bit more work to implement readPrec (or whatever is required), but it shouldn't be too hard with the actual parsing already complete.)
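One possible shape for such an instance, as a sketch only: it implements readsPrec rather than readPrec, ignores the precedence argument, and uses Parsec's getInput to hand back the unconsumed rest of the string. Two caveats under these assumptions: the derived Show output is not something this instance can read back, and whitespace is treated as an ordinary leaf character.

```haskell
import Text.Parsec
import Control.Applicative ((<$>), (<*>))

data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

paren, tree, unit :: Parsec String st Tree
unit  = paren <|> (Leaf <$> noneOf "()") <?> "group or literal"
paren = between (char '(') (char ')') tree
tree  = foldl1 Branch <$> many1 unit

-- Sketch of a Read instance built on the existing parser.
-- ReadS wants a list of (value, remaining input) pairs, so we pair the
-- parsed tree with whatever input `tree` left unconsumed.
instance Read Tree where
  readsPrec _ str =
    case parse ((,) <$> tree <*> getInput) "" str of
      Right result -> [result]  -- exactly one parse
      Left _       -> []        -- no parse
```

With this in place, read "A(BC)" :: Tree should behave like read' "A(BC)", while malformed input makes read fail with its usual "no parse" error instead of showing the Parsec message.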
Some basic examples:
*Tree> read' "A"
Leaf 'A'
*Tree> read' "AB"
Branch (Leaf 'A') (Leaf 'B')
*Tree> read' "ABC"
Branch (Branch (Leaf 'A') (Leaf 'B')) (Leaf 'C')
*Tree> read' "A(BC)"
Branch (Leaf 'A') (Branch (Leaf 'B') (Leaf 'C'))
*Tree> read' "ABC(DE)F" == example
True
*Tree> read' "ABC(DEF)" == example
False
*Tree> read' "ABCDEF" == example
False
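The example value used in the last three comparisons isn't defined in this answer. Assuming it is the left-associated tree for ABC(DE)F, it would look something like this:

```haskell
data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

-- Assumed definition: ABC(DE)F grouped as ((((A B) C) (D E)) F).
example :: Tree
example =
  Branch
    (Branch
      (Branch (Branch (Leaf 'A') (Leaf 'B')) (Leaf 'C'))
      (Branch (Leaf 'D') (Leaf 'E')))
    (Leaf 'F')
```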
Demonstrating errors:
*Tree> read' ""
*** Exception: (line 1, column 1):
unexpected end of input
expecting group or literal
*Tree> read' "A(B"
*** Exception: (line 1, column 4):
unexpected end of input
expecting group or literal or ")"
And finally, the difference between tree and onlyTree:
*Tree> parse tree "" "AB)CD" -- success: ignores ")CD"
Right (Branch (Leaf 'A') (Leaf 'B'))
*Tree> parse onlyTree "" "AB)CD" -- fail: can't parse the ")"
Left (line 1, column 3):
unexpected ')'
expecting group or literal or end of input
Parsec is amazing! This answer might be long but the core of it is just 5 or 6 lines of code which do all the work.