Implementing a parser for a markdown-like language

别等时光非礼了梦想. Submitted on 2019-12-03 03:16:19

If one construct can contain another, you normally treat the markers as separate tokens and then express the nesting in the grammar. Lepl (http://www.acooke.org/lepl, which I wrote) and PyParsing (which is probably the most popular pure-Python parser) both allow you to nest things recursively.

So in Lepl you could write code something like:

from lepl import *  # provides Token and Delayed

# these are tokens (defined as regexps)
stg_marker = Token(r'\*\*')
emp_marker = Token(r'\*') # tokens are longest match, so strong is preferred if possible
spo_marker = Token(r'%%')
....
# grammar rules combine tokens
contents = Delayed() # this will be defined later and lets us recurse
strong = stg_marker + contents + stg_marker
emphasis = emp_marker + contents + emp_marker
spoiler = spo_marker + contents + spo_marker
other_stuff = .....
contents += strong | emphasis | spoiler | other_stuff # this defines contents recursively

Then you can see, I hope, how contents will match nested use of strong, emphasis, etc.
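To make the recursion concrete without depending on Lepl, here is a minimal hand-written sketch of the same idea using only the standard library. It is not Lepl's API: the names TOKEN_RE, MARKERS, tokenize, parse_contents and parse are hypothetical, and the choice to treat an unmatched marker as plain text is an assumption made for illustration.

```python
import re

# Python's regex alternation is ordered, not longest-match, so '**' is
# listed before '*' to make the strong marker win when both could match.
TOKEN_RE = re.compile(r'\*\*|\*|%%|[^*%]+|%')

# Hypothetical mapping from marker token to node name.
MARKERS = {'**': 'strong', '*': 'emphasis', '%%': 'spoiler'}


def tokenize(text):
    """Split text into marker tokens and runs of plain text."""
    return TOKEN_RE.findall(text)


def parse_contents(tokens, pos, closing=None):
    """Parse nodes from tokens[pos:] until `closing` or the end of input.

    Returns (nodes, index). If the closing marker was found, index points
    at it; otherwise index equals len(tokens).
    """
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == closing:
            return nodes, pos  # the caller consumes the closing marker
        if tok in MARKERS:
            # Recurse: the contents of a marked span are again "contents".
            children, end = parse_contents(tokens, pos + 1, closing=tok)
            if end < len(tokens):
                nodes.append((MARKERS[tok], children))
                pos = end + 1
                continue
            # Assumption: an unmatched marker is kept as literal text.
        nodes.append(tok)
        pos += 1
    return nodes, pos


def parse(text):
    """Parse a whole string into a nested list of text and (name, children)."""
    nodes, _ = parse_contents(tokenize(text), 0)
    return nodes


print(parse('**bold *and em* text**'))
```

The recursive call in parse_contents plays the role that Delayed() plays in the grammar above: a marked span contains contents, which may in turn contain marked spans. Retrying after an unmatched marker can re-scan the rest of the input, so this sketch is not tuned for speed.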

There's much more than this to do for your final solution, and efficiency could be an issue in any pure-Python parser. There are some parsers that are implemented in C but callable from Python; these will be faster, but may be trickier to use. I can't recommend any because I haven't used them.
