Can people point me to resources on lexing, parsing and tokenising with Python?
I'm doing a little hacking on an open source project (hotwire) and wanted to do a fe
Frederico Tomassetti has a good, concise write-up covering everything from BNF to binary deciphering. It even mentions the newer Parsing Expression Grammar (PEG) approach:
https://tomassetti.me/parsing-in-python/
Here are a few things to get you started (roughly from simplest to most complex, least to most powerful):
http://en.wikipedia.org/wiki/Recursive_descent_parser
http://en.wikipedia.org/wiki/Top-down_parsing
http://en.wikipedia.org/wiki/LL_parser
http://effbot.org/zone/simple-top-down-parsing.htm
http://en.wikipedia.org/wiki/Bottom-up_parsing
http://en.wikipedia.org/wiki/LR_parser
http://en.wikipedia.org/wiki/GLR_parser
When I learned this stuff, it was in a semester-long 400-level university course. We did a number of assignments where we did parsing by hand; if you want to really understand what's going on under the hood, I'd recommend the same approach.
This isn't the book I used, but it's pretty good: Principles of Compiler Design.
Hopefully that's enough to get you started :)
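If you want to try the "by hand" approach, here is a minimal sketch of a regex tokenizer feeding a recursive descent parser for arithmetic expressions. The grammar and the names `tokenize`, `Parser` and `evaluate` are made up for illustration; this is just one way to lay it out, using only the standard library.

```python
import re

# One named group per token kind; BAD catches anything unrecognised.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<OP>[-+*/()])|(?P<WS>\s+)|(?P<BAD>.)")


def tokenize(text):
    """Lexer: turn a string into a list of (kind, text) pairs ending with END."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "WS":
            continue  # whitespace only separates tokens
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    tokens.append(("END", ""))
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule, evaluating as it goes."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    # expr := term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term := factor (('*' | '/') factor)*
    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    # factor := NUM | '(' expr ')' | '-' factor
    def factor(self):
        kind, text = self.advance()
        if kind == "NUM":
            return int(text)
        if (kind, text) == ("OP", "("):
            value = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        if (kind, text) == ("OP", "-"):
            return -self.factor()
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek()[0] != "END":
        raise SyntaxError("trailing input")
    return value
```

Calling `evaluate("(1 + 2) * 3")` walks `expr`, then `term`, then `factor`, mirroring the grammar comments above each method. Operator precedence falls out of which rule calls which, and the `while` loops give left-associativity.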
pygments is a source code syntax highlighter written in Python. It has lexers and formatters, and its source may be interesting to peek at.
Have a look at the standard module shlex and modify a copy of it to match the syntax you use for your shell; it is a good starting point.
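Before copying and modifying the module, it may be worth seeing how far the stock `shlex.shlex` class gets you. Below is a small sketch that splits a shell-like command line into pipeline stages. The function name `split_pipeline` and the idea of treating `|` as the only special character are my assumptions, not anything hotwire actually does.

```python
import shlex


def split_pipeline(command):
    """Split a shell-like command line into stages separated by '|'."""
    # posix=True strips quotes; punctuation_chars makes '|' its own token
    # even when it is not surrounded by whitespace (Python 3.6+).
    lexer = shlex.shlex(command, posix=True, punctuation_chars="|")
    stages, current = [], []
    for token in lexer:
        if token == "|":
            stages.append(current)
            current = []
        else:
            current.append(token)
    stages.append(current)
    return stages
```

For example, `split_pipeline('ls -l /tmp | grep "foo bar"')` gives one list of words per stage, with the quoted argument kept together. Note that this sketch cannot tell a quoted `"|"` argument from a real pipe, which is the kind of thing you would need a modified lexer for.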
If you want all the power of a complete solution for lexing/parsing, ANTLR can generate Python too.
I suggest http://www.canonware.com/Parsing/, since it is pure Python and you don't need to learn a grammar, but it isn't widely used and has comparatively little documentation. The heavyweights are ANTLR and PyParsing. ANTLR can also generate Java and C++ parsers, as well as AST walkers, but you will have to learn what amounts to a new language.
This question is pretty old, but maybe my answer will help someone who wants to learn the basics. I find this resource very good. It is a simple interpreter written in Python without any external libraries, so it should help anyone who wants to understand the inner workings of parsing, lexing, and tokenising:
"A Simple Interpreter from Scratch in Python": Part 1, Part 2, Part 3, and Part 4.