Processing an HTML file using Python

半阙折子戏 2021-01-26 08:10

I wanted to remove all the tags from an HTML file. For that, I used Python's re module. For example, consider the line

Hello World!

I want to retain
5 answers
  •  一生所求
    2021-01-26 09:02

    Make the pattern non-greedy: http://docs.python.org/release/2.6/howto/regex.html#greedy-versus-non-greedy
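A minimal sketch of the greedy versus non-greedy difference. The tags around "Hello World!" were lost from this page, so the `<h1>` wrapper below is a hypothetical stand-in:

```python
import re

# Hypothetical input: the original markup around "Hello World!" is not shown on this page.
line = "<h1>Hello World!</h1>"

# Greedy: '.*' runs on to the LAST '>', so the whole line is one match and is removed.
greedy = re.sub(r"<.*>", "", line)

# Non-greedy: '.*?' stops at the FIRST '>', so each tag is matched and removed separately.
non_greedy = re.sub(r"<.*?>", "", line)

# A common equivalent: '[^>]*' cannot cross a '>', so it also matches one tag at a time.
negated = re.sub(r"<[^>]*>", "", line)

print(repr(greedy))      # ''
print(repr(non_greedy))  # 'Hello World!'
print(repr(negated))     # 'Hello World!'
```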

    Off-topic: the regular-expression approach is error-prone. It cannot handle cases where angle brackets do not represent tags. I recommend http://lxml.de/
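To illustrate the caveat: a `>` inside a quoted attribute value is not a tag boundary, but the regex cannot tell. The sketch below swaps in the standard library's `html.parser.HTMLParser` instead of lxml so that it runs without third-party packages; the input string and the `strip_tags` / `TextExtractor` names are hypothetical. With lxml installed, `lxml.html.fromstring(s).text_content()` gives a similar text-only result.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text between tags, discarding the tags themselves."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; before handle_data sees them.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)


def strip_tags(html):
    """Hypothetical helper: return the text content of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text at the end of the input
    return parser.text()


# Hypothetical input: the '>' inside title="a > b" is part of an attribute value.
html = '<p title="a > b">Hello World!</p>'

regex_result = re.sub(r"<.*?>", "", html)  # stops at the '>' inside the attribute
parser_result = strip_tags(html)

print(repr(regex_result))   # ' b">Hello World!'
print(repr(parser_result))  # 'Hello World!'
```

Note that `handle_data` also receives the contents of `<script>` and `<style>` elements, so a real tag stripper would likely need to skip those explicitly.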
