Print start of HTML tags

执笔经年 2021-01-28 21:30

I want to print the opening HTML tags that have attributes.

    

test

test2

2 answers
  •  说谎 (original poster)
    2021-01-28 21:42

    This is fairly complicated. You can try the expression below, but it will fail in some cases. Its first alternatives match the undesired lines without capturing anything, and the final alternative contains a capturing group for the desired ones.

    Regular expressions are probably not the best tool for this job.

    Test

    import re
    
    regex = r"^\s*<\S+>\s*$|^\s*<\S+\s.*test.*?>.*?<\/\S+>$|^\s*(<.*>)\s*$"
    
    test_str = """
    
    

    test

    test2

    test3

    test3

    """

    print(re.findall(regex, test_str, re.M))

    Output

    ['', '', '
    ', '', '', '', '']
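
The empty strings in the output come from how `re.findall` treats a pattern with exactly one capturing group: it returns the group's contents for every match, so a match made through an alternative that does not contain the group contributes `''`. The sketch below shows this with a simplified pattern and a made-up sample; both are assumptions for illustration, not the original expression or input.

```python
import re

# Hypothetical, simplified pattern (not the expression from the answer):
# the first alternative matches bare tags such as <div> without capturing,
# the second captures any other line that consists of a single tag.
pattern = r"^\s*<\w+>\s*$|^\s*(<.*>)\s*$"

# Hypothetical sample input, one tag per line.
sample = '<div>\n<p class="a">\n<span>\n<a href="#">'

matches = re.findall(pattern, sample, re.M)
print(matches)

# Drop the empty placeholders left by the non-capturing alternative.
wanted = [m for m in matches if m]
print(wanted)
```

Filtering out the empty strings afterwards leaves only the tags matched by the capturing alternative.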

    The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it, and in this link you can watch how it matches against some sample inputs.
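
Since a regular expression may not be the best fit here, one alternative is to let an HTML parser find the tags. The following is a minimal sketch using `html.parser` from the Python standard library. The class name `AttrTagCollector`, the helper `tags_with_attributes`, and the sample markup are all hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class AttrTagCollector(HTMLParser):
    """Collect the source text of every opening tag that has attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; it is empty for bare tags.
        if attrs:
            # get_starttag_text() returns the most recently opened start tag
            # as it appears in the source.
            self.found.append(self.get_starttag_text())


def tags_with_attributes(html):
    """Return the opening tags in `html` that carry at least one attribute."""
    parser = AttrTagCollector()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical sample markup.
sample = '<div class="test"><p>test</p><span id="x">test2</span></div>'
print(tags_with_attributes(sample))
```

Unlike the line-based regex, this does not depend on each tag sitting on its own line.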
