I would highly recommend the F# language as your language of choice for parsing on the .NET Platform. It's roots in the ML family of languages means it has excellent support for language-oriented programming.
Discriminated unions and pattern-matching allow for a very succinct and powerful specification of your AST. Higher-order functions allow for definition of parse operations and their composition. First-class support for monadic types allows for state management to be handled implicitly greatly simplifying the composition of parsers. Powerful type-inference greatly aides the definition of these (complex) types. And all of this can be specified and executed interactively allowing you to rapidly prototype.
Stephan Tolksdorf has put this into practice with his parser combinator library FParsec
From his examples we see how naturally an AST is specified:
type expr =
| Val of string
| Int of int
| Float of float
| Decr of expr
type stmt =
| Assign of string * expr
| While of expr * stmt
| Seq of stmt list
| IfThen of expr * stmt
| IfThenElse of expr * stmt * stmt
| Print of expr
type prog = Prog of stmt list
the implementation of the parser (partially elided) is just as succinct:
let stmt, stmtRef = createParserForwardedToRef()
let stmtList = sepBy1 stmt (ch ';')
let assign =
pipe2 id (str ":=" >>. expr) (fun id e -> Assign(id, e))
let print = str "print" >>. expr |>> Print
let pwhile =
pipe2 (str "while" >>. expr) (str "do" >>. stmt) (fun e s -> While(e, s))
let seq =
str "begin" >>. stmtList .>> str "end" |>> Seq
let ifthen =
pipe3 (str "if" >>. expr) (str "then" >>. stmt) (opt (str "else" >>. stmt))
(fun e s1 optS2 ->
match optS2 with
| None -> IfThen(e, s1)
| Some s2 -> IfThenElse(e, s1, s2))
do stmtRef:= choice [ifthen; pwhile; seq; print; assign]
let prog =
ws >>. stmtList .>> eof |>> Prog
On the second line, as an example, stmt
and ch
are parsers and sepBy1
is a monadic parser combinator that takes two simple parsers and returns a combination parser. In this case sepBy1 p sep
returns a parser that parses one or more occurrences of p
separated by sep
. You can thus see how quickly a powerful parser can be combined from simple parsers. F#'s support for overridden operators also allow for concise infix notation e.g. the sequencing combinator and the choice combinator can be specified as >>.
and <|>
.
Best of luck,
Danny