Resources for lexing, tokenising and parsing in Python

鱼传尺愫 2020-12-04 06:41

Can people point me to resources on lexing, parsing and tokenising with Python?

I'm doing a little hacking on an open source project (hotwire) and wanted to do a fe

8 answers
  • 2020-12-04 07:21

    Federico Tomassetti has a good, concise write-up on everything from BNF to binary deciphering, covering:

    • lexers,
    • parsers,
    • abstract syntax trees (ASTs), and
    • Construct/code generators.

    He even covers the newer Parsing Expression Grammars (PEGs).

    https://tomassetti.me/parsing-in-python/
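
    To make the stages in that list concrete, here is a minimal sketch that runs one line of Python through a lexer, a parser and a code generator. It is not taken from the linked article; it assumes that Python's own standard-library `tokenize` and `ast` modules are an acceptable stand-in for a hand-written toolchain, and the sample source string is invented for illustration.

```python
import ast
import io
import tokenize

# Hypothetical one-line program used only for this illustration.
source = "total = 1 + 2 * 3\n"

# Stage 1, lexical analysis: split the text into (kind, text) tokens,
# dropping the bookkeeping tokens that mark the line end and end of input.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)
]

# Stage 2, parsing: build an abstract syntax tree from the same text.
tree = ast.parse(source)
assign = tree.body[0]  # the single assignment statement

# Stage 3, code generation: compile the tree to bytecode and run it.
namespace = {}
exec(compile(tree, "<example>", "exec"), namespace)

if __name__ == "__main__":
    print(tokens)
    print(ast.dump(assign))
    print(namespace["total"])
```

    The tree already encodes operator precedence: the multiplication ends up as the right-hand child of the addition, which is why the program computes 7 rather than 9.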

  • 2020-12-04 07:22

    Here are a few things to get you started (roughly from simplest to most complex, least to most powerful):

    http://en.wikipedia.org/wiki/Recursive_descent_parser

    http://en.wikipedia.org/wiki/Top-down_parsing

    http://en.wikipedia.org/wiki/LL_parser

    http://effbot.org/zone/simple-top-down-parsing.htm

    http://en.wikipedia.org/wiki/Bottom-up_parsing

    http://en.wikipedia.org/wiki/LR_parser

    http://en.wikipedia.org/wiki/GLR_parser

    When I learned this stuff, it was in a semester-long 400-level university course. We did a number of assignments where we did parsing by hand; if you want to really understand what's going on under the hood, I'd recommend the same approach.

    This isn't the book I used, but it's pretty good: Principles of Compiler Design.

    Hopefully that's enough to get you started :)
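
    As a taste of what parsing by hand looks like, here is a minimal recursive descent sketch for integer arithmetic. It assumes a small grammar chosen for illustration (`expr := term (('+' | '-') term)*`, `term := factor (('*' | '/') factor)*`, `factor := NUMBER | '(' expr ')'`); the names `lex`, `Parser` and `evaluate` are hypothetical and do not come from any of the linked pages.

```python
import re

# One token is a run of digits or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def lex(text):
    """Split the input into a list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each consumes the tokens it recognises."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(lex(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

    Calling `evaluate("1 + 2 * 3")` gives 7 and `evaluate("(1 + 2) * 3")` gives 9. Precedence falls out of the call structure: `expr` calls `term`, which calls `factor`, so the tighter-binding operators are handled deeper in the recursion.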

  • 2020-12-04 07:24

    Pygments is a source code syntax highlighter written in Python. It has lexers and formatters, and its source may be interesting to peek at.

  • 2020-12-04 07:27

    Have a look at the standard module shlex and modify a copy of it to match the syntax you use for your shell. It is a good starting point.
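
    Before copying the module, it may be worth seeing how far its built-in options go. The sketch below assumes a POSIX-like shell syntax; the two command strings are invented for illustration. It uses `shlex.split` for plain word splitting and `shlex.shlex` with `punctuation_chars=True` so that operators such as `|` and `>` become separate tokens even without surrounding spaces.

```python
import shlex

# Hypothetical command lines, used only to show the tokenisation.
spaced = 'grep -n "hello world" notes.txt | sort'
packed = 'ls -l|grep "my file">out.txt'

# shlex.split handles quoting, but only splits on whitespace.
words = shlex.split(spaced)

# punctuation_chars=True makes runs of ();<>|& their own tokens.
lexer = shlex.shlex(packed, posix=True, punctuation_chars=True)
operator_aware = list(lexer)

if __name__ == "__main__":
    print(words)
    print(operator_aware)
```

    If these options are not enough, the lexer's `wordchars`, `quotes` and `commenters` attributes can be adjusted on an instance before resorting to a modified copy of the module.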

    If you want all the power of a complete solution for lexing/parsing, ANTLR can generate Python parsers too.

  • 2020-12-04 07:33

    I suggest http://www.canonware.com/Parsing/, since it is pure Python and you don't need to learn a grammar, but it isn't widely used and has comparatively little documentation. The heavyweights are ANTLR and PyParsing. ANTLR can generate Java and C++ parsers too, as well as AST walkers, but you will have to learn what amounts to a new language.

  • 2020-12-04 07:37

    This question is pretty old, but maybe my answer will help someone who wants to learn the basics. I find this resource to be very good. It is a simple interpreter written in Python without any external libraries, so it will help anyone who would like to understand the inner workings of parsing, lexing, and tokenising:

    "A Simple Interpreter from Scratch in Python": Part 1, Part 2, Part 3, and Part 4.

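    For a feel of what a from-scratch lexer involves, here is a minimal sketch using only the standard `re` module. It is not code from the linked series: the token names, the `:=` assignment operator and the `lex` function are assumptions chosen for illustration.

```python
import re

# (token name, regular expression) pairs; earlier entries win on a tie.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[-+*/()]"),
    ("SKIP", r"\s+"),
]

# Combine the patterns into one regex with a named group per token kind.
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def lex(text):
    """Return a list of (kind, text) tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER_RE.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"illegal character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in lex("x := 10 + y2"):
        print(token)
```

    The resulting token list is what a parser would then consume; keeping the token kinds in a table makes it easy to add keywords or new operators later.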