Most effective way to parse C-like definition strings?

滥情空心 2021-01-13 08:24

I've got a set of function definitions written in a C-like language with some additional keywords that can be put before some arguments (the same way as "unsigned" or "re

5 answers
  • 2021-01-13 08:48

    Actually, it depends on how complex your language is and whether it's really close to C or not.

    Still, you could use lex as a first step, even if you end up using regular expressions.

    I would go for lex + Menhir and OCaml.

    But any flex/yacc combination would be fine.

    The main problem with regular bison (the GNU implementation of yacc) stems from C's typing: you have to describe your whole tree (and all the manipulation functions) yourself. Using OCaml would be much easier.
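
    To make the "lexing as a first step" idea concrete, here is a minimal sketch of a regex-based tokenizer in Python. It is not lex itself, only an illustration of the same stage. The extra keywords `in` and `out` are hypothetical stand-ins for the additional keywords in the question, and the token classes are assumptions rather than a full C lexer.

```python
import re

# Hypothetical extra keywords for the C-like dialect (assumed for illustration).
EXTRA_KEYWORDS = {"in", "out"}
# A small, incomplete subset of C keywords, enough for simple declarations.
C_KEYWORDS = {"unsigned", "signed", "const", "int", "char", "void", "float", "double"}

# One named group per token class; exactly one group matches per token.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<punct>[()\[\],;*])
""", re.VERBOSE)


def tokenize(text):
    """Yield (kind, value) pairs; raise ValueError on an unexpected character."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == "ws":
            continue  # whitespace only separates tokens
        if kind == "ident" and value in EXTRA_KEYWORDS | C_KEYWORDS:
            kind = "keyword"  # reclassify reserved words after matching
        yield kind, value
```

    Calling `list(tokenize("int f(in unsigned char *p);"))` produces a flat token stream that a grammar-based parser (Menhir, yacc, or a hand-written one) can then consume.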

  • 2021-01-13 08:54

    There is also the Lemon Parser, which features a less restrictive grammar. The downside is that you're married to Lemon; rewriting a parser's grammar for something else when you discover some limitation sucks. The upside is that it's really easy to use and self-contained. You can drop it in-tree and not worry about checking for the presence of other tools.

    SQLite3 uses it, as do several other popular projects. I'm not saying use it because SQLite does, but perhaps give it a try if time permits.

  • 2021-01-13 08:54

    That entirely depends on your definition of "effective". If you have all the time in the world, the fastest parser would be a hand-written pull parser. These take a long time to develop and debug, but today no parser generator beats hand-written code in terms of runtime performance.

    If you want something that can parse valid C within a week or so, use a parser generator. The code will be fast enough and most parser generators come with a grammar for C already which you can use as a starting point (avoiding 90% of the common mistakes).

    Note that regexps are not suitable for parsing recursive structures. This approach would both be slower than using a generator and more error prone than a hand-written pull parser.
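
    As a rough illustration of the hand-written route, here is a small recursive-descent-style parser in Python with one method per grammar rule. It is a sketch under assumptions, not a C parser: it accepts only declarations of the form `type name(params);`, every parameter must be named, and the qualifiers `in` and `out` are hypothetical examples of the extra keywords. The regular expression is used only to split the input into tokens; the structure is handled by the methods.

```python
import re
from dataclasses import dataclass, field

# "in" and "out" are hypothetical extra keywords; the rest come from C.
QUALIFIERS = {"in", "out", "unsigned", "const"}


@dataclass
class Param:
    qualifiers: list
    type_name: str
    pointer_depth: int
    name: str


@dataclass
class FuncDecl:
    return_type: str
    name: str
    params: list = field(default_factory=list)


class Parser:
    def __init__(self, text):
        # Simplification: characters that match no token are silently skipped.
        self.tokens = re.findall(r"[A-Za-z_]\w*|[()*,;]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def ident(self):
        tok = self.next()
        if not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    def parse_param(self):
        # param := qualifier* type '*'* name
        qualifiers = []
        while self.peek() in QUALIFIERS:
            qualifiers.append(self.next())
        type_name = self.ident()
        depth = 0
        while self.peek() == "*":
            self.next()
            depth += 1
        return Param(qualifiers, type_name, depth, self.ident())

    def parse_decl(self):
        # decl := type name '(' [param (',' param)*] ')' ';'
        return_type = self.ident()
        name = self.ident()
        self.next("(")
        params = []
        if self.peek() != ")":
            params.append(self.parse_param())
            while self.peek() == ",":
                self.next()
                params.append(self.parse_param())
        self.next(")")
        self.next(";")
        return FuncDecl(return_type, name, params)
```

    For example, `Parser("int copy(in const char *src, out char **dst, unsigned int n);").parse_decl()` returns a `FuncDecl` with three `Param` entries, while an unnamed parameter such as in `int f(int);` raises `SyntaxError`. Supporting nested constructs (function pointers, array declarators) would mean adding rules that call each other recursively, which is exactly where plain regexps stop working.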

  • 2021-01-13 08:59

    ANTLR is commonly used (as are Lex/Yacc).

    ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

  • 2021-01-13 08:59

    For what you want to do, our DMS Software Reengineering Toolkit is likely a very effective solution.

    DMS is designed specifically to support custom analyzers/code generators of the type you are discussing. It provides very strong facilities for defining arbitrary language parsers/analyzers (tested on 30+ real languages including several complete dialects of C, C++, Java, C#, and COBOL).

    DMS automates the construction of ASTs (so you don't have to do anything but get the grammar right to have a usable AST), enables the construction of custom analyses of exactly the pattern-directed inspection you indicated, can construct new C-specific ASTs representing the code you want to generate, and can emit them as compilable C source text. The pre-existing definitions of C for DMS can likely be bent to cover your C-like language.
