I'm looking for a good parser generator that I can use to read a custom text-file format in our large commercial app. Currently this particular file format is read with a handm
Please have a look at the new C++ target I have posted for ANTLR. It also has the option to restrict the memory usage of the parser, and it exposes all the necessary memory management routines in the form of traits.
http://www.antlr.org/wiki/pages/viewpage.action?pageId=29130826
Then why don't you use flex/yacc? It generates C code, can be run from MSVC, was developed with efficiency in mind, and lets you override malloc (google for yymalloc). The tools themselves are GPL, but AFAIK the resulting code (the code you use in your project) is not.
Or use a hand-made parser.
ANTLR 3 doesn't support C++; it claims to generate straight C but the docs on getting it to actually work are sort of confusing.
It does generate C, and furthermore, it works with Visual Studio and C++. I know this because I've done it before and submitted a patch to get it to work with stdcall.
Memory is at a huge premium in our app and even tiny leaks are fatal. I need to be able to override the parser's memory allocator to use our custom malloc(), or at the very least I need to give it a contiguous pool from which it draws all its memory (and which I can deallocate en bloc afterwards). I can spare about 200kb for the parser executable itself, but whatever dynamic heap it allocates in parsing has to get freed afterwards.
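To make the "contiguous pool, deallocate en bloc" requirement concrete, here is a minimal sketch of a bump allocator in C. It is not taken from ANTLR, flex/yacc, or any other tool mentioned in this thread; the names `Arena`, `arena_init`, `arena_alloc`, `arena_reset`, and `ARENA_ALIGN` are all made up for illustration, and the sketch assumes the caller supplies a buffer that is already suitably aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator; every name here is illustrative only. */
#define ARENA_ALIGN ((size_t)16)   /* assumed alignment for all allocations */

typedef struct {
    unsigned char *base;   /* start of the caller-supplied pool */
    size_t capacity;       /* total bytes in the pool */
    size_t used;           /* bytes handed out so far */
} Arena;

/* Attach the arena to a fixed block of memory owned by the caller. */
static void arena_init(Arena *a, void *buffer, size_t capacity) {
    assert(a != NULL && buffer != NULL);
    a->base = buffer;
    a->capacity = capacity;
    a->used = 0;
}

/* Hand out the next aligned chunk, or NULL when the pool is exhausted,
 * so that the parser can fail cleanly instead of touching the heap. */
static void *arena_alloc(Arena *a, size_t size) {
    size_t start = (a->used + ARENA_ALIGN - 1) & ~(ARENA_ALIGN - 1);
    if (start > a->capacity || size > a->capacity - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}

/* Release everything at once; individual frees are never needed. */
static void arena_reset(Arena *a) {
    a->used = 0;
}
```

Whether a given generated parser can be pointed at something like `arena_alloc` depends on the tool, so treat this only as the shape of the allocator you would be plugging in. After a parse, a single `arena_reset` call returns the whole pool, which is what rules out leaks from the parsing machinery itself.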
The antlr3c runtime, last time I checked, does not have a memory leak, and it uses the memory-pool paradigm that you describe. However, it does have one shortcoming in the API that the author refuses to change: if you request the string of a node, it creates a new copy each time, and those copies are not released until you free the entire parser.
I have no comment on the ease of using a custom malloc, but it does have a macro to define what malloc function to use in the entire project.
As for the executable size, my compilation was about 100 kb in size including a small interpreter.
My suggestion to you is to keep learning ANTLR, because it still fits your requirements and you probably need to sacrifice a little more time before it will start working for you.
A hand-coded recursive descent parser is actually quite fast and can be very compact. The only downside is that you have to be careful to code essentially LL(1) grammars. (If you use ANTLR, you have similar restrictions, so this isn't that big a deal.)
You can hand code such parsers as plain recursive C code. (See this answer for complete details: Is there an alternative for flex/bison that is usable on 8-bit embedded systems?)
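As a rough illustration of what "plain recursive C code" can look like, here is a small sketch for a made-up arithmetic grammar (it is not the grammar from the question, and it is not copied from the linked answer). The grammar, the `Parser` struct, and all function names are assumptions chosen for the example; there is one C function per rule, one character of lookahead, and no heap allocation anywhere.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Illustrative LL(1) grammar (an assumption for this sketch):
 *   expr   = term   { ('+' | '-') term } ;
 *   term   = factor { ('*' | '/') factor } ;
 *   factor = NUMBER | '(' expr ')' ;
 * No overflow checking is done on the numbers. */
typedef struct {
    const char *p;   /* current position in the input */
    int ok;          /* cleared on the first syntax error */
} Parser;

static long parse_expr(Parser *ps);

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

static long parse_factor(Parser *ps) {
    long v = 0;
    skip_ws(ps);
    if (*ps->p == '(') {
        ps->p++;
        v = parse_expr(ps);
        skip_ws(ps);
        if (*ps->p == ')') ps->p++; else ps->ok = 0;
        return v;
    }
    if (!isdigit((unsigned char)*ps->p)) { ps->ok = 0; return 0; }
    while (isdigit((unsigned char)*ps->p)) {
        v = v * 10 + (*ps->p - '0');
        ps->p++;
    }
    return v;
}

static long parse_term(Parser *ps) {
    long v = parse_factor(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        if (op != '*' && op != '/') return v;
        ps->p++;
        long rhs = parse_factor(ps);
        if (op == '*') v *= rhs;
        else if (rhs != 0) v /= rhs;
        else ps->ok = 0;              /* division by zero is an error */
    }
}

static long parse_expr(Parser *ps) {
    long v = parse_term(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        if (op != '+' && op != '-') return v;
        ps->p++;
        long rhs = parse_term(ps);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Returns 1 and stores the value if the whole input is a valid expr. */
static int parse_all(const char *text, long *out) {
    assert(text != NULL && out != NULL);
    Parser ps = { text, 1 };
    long v = parse_expr(&ps);
    skip_ws(&ps);
    if (!ps.ok || *ps.p != '\0') return 0;
    *out = v;
    return 1;
}
```

For example, `parse_all("2 + 3 * (4 - 1)", &v)` succeeds and leaves 11 in `v`, while `parse_all("2 +", &v)` returns 0. The only memory used is the C call stack, so the recursion depth (here, the nesting of parentheses) is the thing to bound on a constrained target.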
If you are really tight on space, you can define a parsing virtual machine, and build a tiny C interpreter to run it. I used to build BASIC interpreters this way back in the early 70s.
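A parsing virtual machine of this kind can be sketched in a few dozen lines. The opcodes below are loosely modelled on the META II style of "test, then branch on a flag" instructions, but the exact instruction set, the `Instr` layout, the `vm_run` function, and the toy grammar are all assumptions made for this example rather than a reproduction of any historical design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set for a tiny parsing machine. */
enum { OP_CHAR, OP_BT, OP_BF, OP_B, OP_SET, OP_CALL, OP_RET, OP_HALT };

typedef struct { int op; int arg; } Instr;

/* Program for the toy grammar:
 *   list = item { ',' item } ;
 *   item = 'a' | 'b' ;            */
static const Instr LIST_PROGRAM[] = {
    /*  0 */ { OP_CALL, 9 },    /* list: first item                */
    /*  1 */ { OP_BF,   8 },    /* no item: halt with flag clear   */
    /*  2 */ { OP_CHAR, ',' },  /* loop: try to match a comma      */
    /*  3 */ { OP_BF,   7 },    /* no comma: the list is complete  */
    /*  4 */ { OP_CALL, 9 },    /* item after the comma            */
    /*  5 */ { OP_BF,   8 },    /* missing item: halt, flag clear  */
    /*  6 */ { OP_B,    2 },    /* back to the top of the loop     */
    /*  7 */ { OP_SET,  0 },    /* success: set the flag           */
    /*  8 */ { OP_HALT, 0 },
    /*  9 */ { OP_CHAR, 'a' },  /* item: 'a' ...                   */
    /* 10 */ { OP_BT,   12 },
    /* 11 */ { OP_CHAR, 'b' },  /* ... or 'b'                      */
    /* 12 */ { OP_RET,  0 },
};

/* Runs the program; returns 1 if the whole input is accepted.
 * State is a flag, an input position, and a fixed-size call stack,
 * so nothing is ever allocated on the heap. */
static int vm_run(const Instr *prog, const char *input) {
    int stack[16];
    int sp = 0, pc = 0, flag = 0;
    size_t pos = 0;
    assert(prog != NULL && input != NULL);
    for (;;) {
        Instr in = prog[pc++];
        switch (in.op) {
        case OP_CHAR:                       /* match one literal char */
            flag = (input[pos] == (char)in.arg);
            if (flag) pos++;
            break;
        case OP_BT:   if (flag)  pc = in.arg; break;
        case OP_BF:   if (!flag) pc = in.arg; break;
        case OP_B:    pc = in.arg; break;
        case OP_SET:  flag = 1; break;
        case OP_CALL:                       /* call a rule */
            if (sp == 16) return 0;         /* nesting too deep */
            stack[sp++] = pc;
            pc = in.arg;
            break;
        case OP_RET:
            if (sp == 0) return 0;
            pc = stack[--sp];
            break;
        case OP_HALT:
            return flag && input[pos] == '\0';
        default:
            return 0;                       /* unknown opcode */
        }
    }
}
```

With this program, `vm_run(LIST_PROGRAM, "a,b,a")` returns 1 and `vm_run(LIST_PROGRAM, "a,")` returns 0. The space saving comes from the grammar being a data table: the interpreter loop is written once, and each additional rule costs only a handful of small table entries.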
By sticking to the very simple conventions that make these parsers actually work, you can guarantee that there is no memory leak caused by the parsing machinery. (Of course, you may attach arbitrary actions to the parser where it recognizes items of interest; whether those actions leak is a matter of general programming, not the parser).
The ideas came from a 1964 paper on metacompilers by Val Schorre, who shows how to build complete compilers in 10 pages. Schorre's tiny parser generator produces pretty good recursive descent parsers. A site describing this paper and showing precisely how to build such parsers can be found at http://www.bayfronttechnologies.com/metaii.html
I used Schorre's methods to build Basic compilers in the late 70s, after I got tired of hand-coding complex grammars.
We use Boost Spirit successfully in our application. The Boost license is a very liberal one, so there is no problem using it in commercial applications.
Quote from the documentation:
Spirit is an object-oriented recursive-descent parser generator framework implemented using template meta-programming techniques. Expression templates allow us to approximate the syntax of Extended Backus-Normal Form (EBNF) completely in C++. The Spirit framework enables a target grammar to be written exclusively in C++. Inline EBNF grammar specifications can mix freely with other C++ code and, thanks to the generative power of C++ templates, are immediately executable. In retrospect, conventional compiler-compilers or parser-generators have to perform an additional translation step from the source EBNF code to C or C++ code.
ANTLR parsers, and in fact any parser built with something LALR or the like, tend to be big. Do you have an actual grammar for this? It looks like it might be most readily parsed with a hand-written recursive-descent parser, but it's not much of a sample.
Oops, my mistake, as ANTLR apparently generates recursive-descent. Still, I've had problems with ANTLR generating big parsers.