Python re.sub multiline on string

终归单人心 2020-12-19 01:09

I am trying to use the re.MULTILINE flag.

I read these posts: Bug in Python Regex? (re.sub with re.MULTILINE), Python re.sub MULTILINE caret match but i

1 Answer
  • 2020-12-19 01:35

    You need to replace re.MULTILINE with re.DOTALL (re.S) and move the period outside the character class, because inside a character class the dot matches only a literal dot.

    Note that re.MULTILINE only changes the behavior of ^ and $, which then match at the start/end of each line rather than only at the start/end of the whole string. The re.DOTALL flag changes the behavior of . in the pattern (outside character classes only) so that it also matches a newline character.

    So, the regex you could use for the current example is /\*.*?\*/. It matches a literal /* with /\*, then .*? matches as few characters as possible, up to and including the first */ (matched with \*/).
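
    The difference between the two flags can be checked with a short sketch. The sample strings and variable names below are made up for illustration and are not taken from the question:

```python
import re

# Hypothetical two-line sample, used only for illustration.
text = "first line\nsecond line"

# Without flags, ^ matches only at the very start of the string.
default_caret = re.findall(r"^\w+", text)
# re.MULTILINE lets ^ also match right after each newline.
multiline_caret = re.findall(r"^\w+", text, flags=re.M)

# Without flags, . stops at the newline.
default_dot = re.findall(r"first.*", text)
# re.DOTALL lets . cross the newline.
dotall_dot = re.findall(r"first.*", text, flags=re.S)

# Inside a character class the dot is always literal, even with re.DOTALL.
class_dot = re.findall(r"[.]", "a.b\n", flags=re.S)

print(default_caret)    # ['first']
print(multiline_caret)  # ['first', 'second']
print(default_dot)      # ['first line']
print(dotall_dot)       # ['first line\nsecond line']
print(class_dot)        # ['.']
```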

    See the code demo:

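    import re

    # Sample PHP source containing a multi-line comment
    txt = """\n\
    <?php\n\
    /* Multi-line\n\
    comment */\n\
    $var = 1;\n"""
    # re.S (re.DOTALL) lets . match newlines, so the comment is matched across lines
    new_txt = re.sub(r'/\*.*?\*/', '', txt, flags=re.S)
    print("\n=========== TXT ============")
    print(txt)
    print("\n=========== NEW TXT ============")
    print(new_txt)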
    

    See IDEONE demo

    However, this is not the most efficient solution when comments are long: the lazy .*? advances one character at a time and checks for */ after each step. A better approach is the unrolling-the-loop technique. The regex above can be "unrolled" like this:

    /\*[^*]*(?:\*(?!/)[^*]*)*\*/
    

    See the regex demo
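
    As a rough check that the unrolled pattern behaves like the lazy one, here is a minimal sketch. The names LAZY, UNROLLED and sample are illustrative only. Note that the unrolled version does not need re.S, because the negated class [^*] already matches newlines:

```python
import re

# Lazy version: needs re.S so that . can match newlines.
LAZY = re.compile(r"/\*.*?\*/", flags=re.S)
# Unrolled version: [^*] already matches newlines, so no flag is required.
UNROLLED = re.compile(r"/\*[^*]*(?:\*(?!/)[^*]*)*\*/")

# Hypothetical input with stars inside the comment and a second comment.
sample = "<?php\n/* Multi-line\n * comment with * stars **/\n$var = 1; /* second */\n"

stripped_lazy = LAZY.sub("", sample)
stripped_unrolled = UNROLLED.sub("", sample)

print(stripped_lazy == stripped_unrolled)  # True
print(repr(stripped_unrolled))             # '<?php\n\n$var = 1; \n'
```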
