I need to write a compiler. It's homework at the university. The teacher told us that we can use any API we want to do the parsing of the code, as long as it is a good one. That way
I've used SableCC in my compiler course, though not by choice.
I remember finding it very bulky and heavyweight, with more emphasis on cleanliness than convenience (no operator precedence or anything; you have to state that in the grammar).
I'd probably want to use something else if I had the choice. My experiences with yacc (for C) and happy (for Haskell) have both been pleasant.
Regex is good to use in a compiler, but only for recognizing tokens (i.e. no recursive structures).
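To make the token point concrete, here is a minimal sketch of a regex-based tokenizer using java.util.regex. The class name RegexLexer, the token kinds (NUM, ID, OP) and the "KIND:text" output format are all made up for illustration. Note that it only produces a flat list of tokens; matching nested parentheses is left to the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: splits simple arithmetic input into flat tokens.
class RegexLexer {
    // Optional leading whitespace, then one named group per token kind.
    // The alternatives are tried left to right.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<NUM>\\d+)|(?<ID>[A-Za-z_]\\w*)|(?<OP>[-+*/()=]))");

    static List<String> tokenize(String input) {
        String text = input.trim(); // drop trailing whitespace so every loop step must match a token
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at or after offset " + pos);
            }
            if (m.group("NUM") != null) {
                tokens.add("NUM:" + m.group("NUM"));
            } else if (m.group("ID") != null) {
                tokens.add("ID:" + m.group("ID"));
            } else {
                tokens.add("OP:" + m.group("OP"));
            }
            pos = m.end(); // continue right after the token just matched
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = 3 + 42"));
    }
}
```

For the input "x = 3 + 42" this yields the tokens ID:x, OP:=, NUM:3, OP:+, NUM:42.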
The classic way of writing a compiler is having a lexical analyzer for recognizing tokens, a syntax analyzer for recognizing structure, a semantic analyzer for recognizing meaning, an intermediate code generator, an optimizer, and finally a target code generator. Any of those steps can be merged, or skipped entirely, if that makes the compiler easier to write.
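As a rough illustration of those phases, here is a hand-written sketch that compiles arithmetic expressions to instructions for an imaginary stack machine. Everything in it (the TinyCompiler class, the PUSH/ADD/SUB/MUL/DIV instruction names) is invented for the example, and it deliberately skips semantic analysis and optimization, which is the kind of merging and skipping mentioned above. Operator precedence is stated in the grammar itself, by splitting it into expr, term and factor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature compiler: arithmetic expressions -> stack-machine code.
class TinyCompiler {
    // Syntax tree produced by the parser.
    interface Node {}
    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
    }
    static final class BinOp implements Node {
        final char op; final Node left; final Node right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    }

    // Phase 1: lexical analysis (characters -> tokens).
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if ("+-*/()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Phase 2: syntax analysis (tokens -> tree) by recursive descent.
    //   expr   := term (('+' | '-') term)*
    //   term   := factor (('*' | '/') factor)*
    //   factor := NUMBER | '(' expr ')'
    private final List<String> tokens;
    private int pos = 0;
    TinyCompiler(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    Node parseExpr() {
        Node left = parseTerm();
        while (peek().equals("+") || peek().equals("-")) {
            char op = tokens.get(pos++).charAt(0);
            left = new BinOp(op, left, parseTerm());
        }
        return left;
    }

    Node parseTerm() {
        Node left = parseFactor();
        while (peek().equals("*") || peek().equals("/")) {
            char op = tokens.get(pos++).charAt(0);
            left = new BinOp(op, left, parseFactor());
        }
        return left;
    }

    Node parseFactor() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            Node inner = parseExpr();
            if (!peek().equals(")")) throw new IllegalArgumentException("Expected ')'");
            pos++;
            return inner;
        }
        if (!t.isEmpty() && Character.isDigit(t.charAt(0))) {
            pos++;
            return new Num(Integer.parseInt(t));
        }
        throw new IllegalArgumentException("Unexpected token: '" + t + "'");
    }

    // Phase 3: code generation (tree -> instructions), post-order walk.
    static void emit(Node n, List<String> out) {
        if (n instanceof Num) { out.add("PUSH " + ((Num) n).value); return; }
        BinOp b = (BinOp) n;
        emit(b.left, out);
        emit(b.right, out);
        switch (b.op) {
            case '+': out.add("ADD"); break;
            case '-': out.add("SUB"); break;
            case '*': out.add("MUL"); break;
            case '/': out.add("DIV"); break;
            default: throw new IllegalStateException("Unknown operator: " + b.op);
        }
    }

    // Runs all phases in order.
    static List<String> compile(String src) {
        TinyCompiler parser = new TinyCompiler(lex(src));
        Node tree = parser.parseExpr();
        if (parser.pos != parser.tokens.size()) throw new IllegalArgumentException("Trailing input");
        List<String> out = new ArrayList<>();
        emit(tree, out);
        return out;
    }
}
```

For example, TinyCompiler.compile("1 + 2 * 3") gives PUSH 1, PUSH 2, PUSH 3, MUL, ADD, so the multiplication binds tighter without any precedence table. A parser generator would replace the hand-written phase 2 and usually phase 1 as well.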
There have been many tools developed to help with this process. For Java, you can look at
I would recommend ANTLR, primarily because of its output generation capabilities via StringTemplate.
Better still, Terence Parr's book on ANTLR is one of the better books on writing compilers with a parser generator.
Then you have ANTLRWorks which enables you to study and debug your grammar on the fly.
To top it all off, the ANTLR wiki and documentation (although not comprehensive enough for my liking) are a good place to start for any beginner. They helped me refresh my knowledge of compiler writing in a week.
I'd recommend using either a metacompiler like ANTLR, or a simple parser combinator library. Functional Java has a parser combinator API. There's also JParsec. Both of these are based on the Parsec library for Haskell.
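In case the term is unfamiliar, here is a hand-rolled sketch of the parser combinator idea in plain Java 8. To be clear, this is not the JParsec or Functional Java API; the names Combinators, Parser, Result, satisfy, then, or and many are all made up for the illustration. The idea is that a parser is just a function from input and position to an optional result, and bigger parsers are built by combining smaller ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical hand-rolled combinators; NOT the JParsec or Functional Java API.
class Combinators {
    // A successful parse: the value produced and the input position after it.
    static final class Result<T> {
        final T value; final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    // A parser maps (input, position) to an optional result.
    interface Parser<T> {
        Optional<Result<T>> parse(String in, int pos);

        // Transform the parsed value.
        default <R> Parser<R> map(Function<T, R> f) {
            return (in, pos) -> parse(in, pos).map(r -> new Result<R>(f.apply(r.value), r.next));
        }

        // Run this parser, then `other`, and combine both values.
        default <U, R> Parser<R> then(Parser<U> other, BiFunction<T, U, R> combine) {
            return (in, pos) -> parse(in, pos).flatMap(a ->
                other.parse(in, a.next).map(b -> new Result<R>(combine.apply(a.value, b.value), b.next)));
        }

        // Try this parser; if it fails, try `other` at the same position.
        default Parser<T> or(Parser<T> other) {
            return (in, pos) -> {
                Optional<Result<T>> r = parse(in, pos);
                return r.isPresent() ? r : other.parse(in, pos);
            };
        }

        // Apply this parser zero or more times, collecting the values.
        default Parser<List<T>> many() {
            return (in, pos) -> {
                List<T> values = new ArrayList<>();
                int p = pos;
                Optional<Result<T>> r;
                // Stop on failure, or if a match consumed nothing (avoids looping forever).
                while ((r = parse(in, p)).isPresent() && r.get().next > p) {
                    values.add(r.get().value);
                    p = r.get().next;
                }
                return Optional.of(new Result<List<T>>(values, p));
            };
        }
    }

    // Primitive: match one character satisfying a predicate.
    static Parser<Character> satisfy(Predicate<Character> ok) {
        return (in, pos) -> {
            if (pos < in.length() && ok.test(in.charAt(pos))) {
                return Optional.of(new Result<Character>(in.charAt(pos), pos + 1));
            }
            return Optional.empty();
        };
    }

    // number := digit digit*
    static Parser<Integer> number() {
        Parser<Character> digit = satisfy(Character::isDigit);
        return digit.then(digit.many(), (first, rest) -> {
            int n = first - '0';
            for (char d : rest) n = n * 10 + (d - '0');
            return n;
        });
    }

    // sum := number ('+' number)*
    static Parser<Integer> sum() {
        Parser<Integer> plusNumber = satisfy(c -> c == '+').then(number(), (plus, n) -> n);
        return number().then(plusNumber.many(), (first, rest) -> {
            int total = first;
            for (int n : rest) total += n;
            return total;
        });
    }

    // Succeeds only if the whole input is a valid sum.
    static Optional<Integer> parseSum(String in) {
        return sum().parse(in, 0).filter(r -> r.next == in.length()).map(r -> r.value);
    }
}
```

With this sketch, Combinators.parseSum("12+30+4") returns Optional[46], while "12+" is rejected because the input is not fully consumed. A real library gives you many more combinators plus proper error reporting, which is why I would reach for one instead of rolling my own.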
Use a parser combinator, like JParsec. There's a good video tutorial on how to use it.
Parser combinators are a good choice. A popular Java implementation is JParsec.