regex for triple quote

我在风中等你 2020-12-19 06:32

What regex will find triple-quoted comments (possibly multi-line) in Python source code?

5 answers
  • 2020-12-19 07:01

    I find this to be working perfectly for me (used it with TextMate):

    "{3}([\s\S]*?"{3})
    

    I wanted to remove all comments from a library and this took care of the triple-quote comments (single or multi-line, regardless of where they started on the line).

    For hash comments (much easier), this works:

    #.*$
    

    I used these with TextMate, which uses the Oniguruma regular expression library by K. Kosako (http://manual.macromates.com/en/regular_expressions)
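
    For anyone trying the same two patterns outside TextMate, here is a minimal sketch using Python's own re module (an assumption on my part; the answer itself only used Oniguruma). The sample source and variable names are made up for illustration. Note that this is purely textual, so a # inside a string literal would be stripped as well.

```python
import re

# The two patterns from the answer, written as Python raw strings.
TRIPLE_DOUBLE = re.compile(r'"{3}([\s\S]*?"{3})')
# re.MULTILINE makes $ match at the end of every line, not only at the end of the text.
HASH_COMMENT = re.compile(r'#.*$', re.MULTILINE)

# Hypothetical sample input.
source = '''def f():
    """Docstring
    spanning lines."""
    return 1  # trailing comment
'''

# Remove triple-double-quoted blocks first, then hash comments.
without_triple = TRIPLE_DOUBLE.sub('', source)
without_both = HASH_COMMENT.sub('', without_triple)
print(without_both)
```

    The lazy [\s\S]*? is what keeps two separate docstrings from being merged into one match.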

  • 2020-12-19 07:01

    I found this one, from Tim Peters (I think):

    pat = """
        qqq
        [^\\q]*
        (
        (   \\\\[\000-\377]
            |   q
            (   \\\\[\000-\377]
            |   [^\\q]
            |   q
            (   \\\\[\000-\377]
                |   [^\\q]
            )
            )
        )
        [^\\q]*
        )*
        qqq
    """  
    pat = ''.join(pat.split())  # str.join takes a single iterable; this strips all whitespace from the pattern
    tripleQuotePat = pat.replace("q", "'") + "|" + pat.replace('q', '"')  
    

    But, as stated by bobince, regex alone doesn't seem to be the right tool for parsing Python code.
    So I went with tokenize from the standard library.
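
    A rough sketch of what the tokenize approach could look like. This is not the code the answerer actually used; the function name triple_quoted_strings and the sample input are hypothetical.

```python
import io
import tokenize

TRIPLE_QUOTES = ('"""', "'''")

def triple_quoted_strings(source):
    """Yield (start_line, text) for each triple-quoted string token in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type != tokenize.STRING:
            # Note: on Python 3.12+, f-strings are split into separate
            # FSTRING_* tokens, so they are not reported by this sketch.
            continue
        # Skip string prefix letters such as r, b, u (in any case).
        body = tok.string.lstrip('rRbBuUfF')
        if body.startswith(TRIPLE_QUOTES):
            yield tok.start[0], tok.string

# Hypothetical sample input: the '#' inside the plain string is not a comment.
sample = (
    'x = "plain # not a comment"\n'
    'def f():\n'
    '    """Doc\n'
    '    string."""\n'
    '    return 1\n'
)
found = list(triple_quoted_strings(sample))
print(found)
```

    Because the tokenizer already knows where strings start and end, quotes inside comments or inside other strings do not confuse it, which is the main weakness of the regex approaches.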

  • 2020-12-19 07:03

    Python is not a regular language and cannot reliably be parsed using regex.

    If you want a proper Python parser, look at the ast module. You may be looking for get_docstring.
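
    A small sketch of the ast route, assuming what you want are real docstrings (module, class and function level) rather than every triple-quoted string. The helper name collect_docstrings and the sample source are illustrative only.

```python
import ast

DOC_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)

def collect_docstrings(source):
    """Map each module/class/function name to its docstring, if it has one."""
    tree = ast.parse(source)
    found = {}
    for node in ast.walk(tree):
        if isinstance(node, DOC_NODES):
            # get_docstring strips indentation by default (clean=True).
            doc = ast.get_docstring(node)
            if doc is not None:
                # Module nodes have no name attribute.
                found[getattr(node, 'name', '<module>')] = doc
    return found

# Hypothetical sample input.
sample = '''"""Module doc."""

def f():
    """Function doc
    on two lines."""
    return 1
'''
docs = collect_docstrings(sample)
print(docs)
```

    Keying by bare name means same-named functions in different classes would overwrite each other; a real tool would probably track the enclosing scope.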

  • 2020-12-19 07:09
    re.findall('(?:\n[\t ]*)\"{3}(.*?)\"{3}', s, re.M | re.S)
    

    This captures only text within triple quotes that start at the beginning of a line, optionally preceded by spaces or tabs, as Python docstrings should.
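
    A quick check of that behaviour, as a sketch with made-up inputs. The pattern is the one above, rewritten as an equivalent raw string. One thing worth noticing: because it requires a preceding newline, a docstring on the very first line of the text is not matched.

```python
import re

# Same pattern and flags as in the answer, as a raw string.
DOCSTRING = re.compile(r'(?:\n[\t ]*)"{3}(.*?)"{3}', re.M | re.S)

# Hypothetical inputs.
s = 'def f():\n    """Doc."""\n    x = """inline"""\n'
matches = DOCSTRING.findall(s)
print(matches)

first_line = '"""Module doc."""\n'
first_line_matches = DOCSTRING.findall(first_line)
print(first_line_matches)
```

    The triple-quoted string assigned to x is skipped because other text precedes it on its line.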

  • 2020-12-19 07:13

    I don't know how well this will fare when scanning Python code, but it seems to match Python strings in isolation.

    ^(\"([^\"\n\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*\"|'([^'\n\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*'|\"\"\"((?!\"\"\")[^\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*\"\"\")$
    

    The escaping is not standard Python; this is something that I cut-n-pasted from a project. See it in action at regex101.com.
