Regex matching between two strings?

Posted by 余生颓废 on 2019-11-26 03:59:41

Question


I can't seem to find a way to extract all of the comments in the following example.

>>> import re
>>> string = '''
... <!-- one 
... -->
... <!-- two -- -- -->
... <!-- three -->
... '''
>>> m = re.findall('<!--([^\(-->)]+)-->', string, re.MULTILINE)
>>> m
[' one \n', ' three ']

The block containing two -- -- is not matched, most likely because my regex is wrong. Can someone please point me in the right direction for extracting everything between two delimiter strings?


Hi, I've tested what you suggested in the comments. Here is a working solution with a small improvement.

>>> m = re.findall('<!--(.*?)-->', string, re.MULTILINE)
>>> m
[' two -- -- ', ' three ']
>>> m = re.findall('<!--(.*\n?)-->', string, re.MULTILINE)
>>> m
[' one \n', ' two -- -- ', ' three ']

thanks!


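Answer 1:


This should do the trick. re.DOTALL makes . match newlines as well, so the non-greedy .*? can span a comment that runs over several lines: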

 m = re.findall ( '<!--(.*?)-->', string, re.DOTALL)
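As a quick check of the re.DOTALL suggestion, here is a minimal, self-contained sketch that runs it against the sample input from the question. The variable names `text` and `comments` are illustrative only, and the input is written with explicit `\n` escapes so the trailing space after "one" is preserved.

```python
import re

# Sample input from the question, written with explicit newlines.
text = "\n<!-- one \n-->\n<!-- two -- -- -->\n<!-- three -->\n"

# re.DOTALL lets "." match newlines; the lazy ".*?" stops at the first "-->".
comments = re.findall(r'<!--(.*?)-->', text, re.DOTALL)
print(comments)  # [' one \n', ' two -- -- ', ' three ']
```

Note that re.MULTILINE only changes how ^ and $ behave, so it has no effect on a pattern like this one, which uses neither anchor.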



Answer 2:


In general, a regular grammar cannot match arbitrary content between two delimiters.

Specifically, if you allow nesting,

<!-- how do you deal <!-- with nested --> comments? -->

you'll run into issues. So, while you may be able to solve this specific problem with a regular expression, any regular expression you write can be broken by some other, stranger nesting of comments.

To parse arbitrarily nested comments, you need a method for parsing context-free grammars. A simple one is a pushdown automaton.
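To make the pushdown idea concrete, here is a rough sketch, assuming comments really are allowed to nest (standard HTML and XML do not permit this). Since there is only one kind of delimiter pair, the stack reduces to a depth counter. The function name `extract_nested` and its parameters are hypothetical, introduced only for illustration.

```python
def extract_nested(text, open_tok="<!--", close_tok="-->"):
    """Return the bodies of top-level comments, tolerating nested delimiters.

    Hypothetical helper: the depth counter plays the role of the stack
    in a pushdown automaton with a single stack symbol.
    """
    results = []
    depth = 0   # how many open delimiters are currently unclosed
    start = 0   # index where the current top-level body begins
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            if depth == 0:
                start = i + len(open_tok)
            depth += 1
            i += len(open_tok)
        elif depth > 0 and text.startswith(close_tok, i):
            depth -= 1
            if depth == 0:
                results.append(text[start:i])
            i += len(close_tok)
        else:
            i += 1
    return results


nested = "<!-- how do you deal <!-- with nested --> comments? -->"
print(extract_nested(nested))
```

On the nested example this returns the whole outer body as one item, whereas the lazy regex '<!--(.*?)-->' would stop at the first "-->". An unclosed comment is silently dropped in this sketch; a real parser would probably want to report it.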



Source: https://stackoverflow.com/questions/12736074/regex-matching-between-two-strings
