Issue with regular expressions in Python

轮回少年 2021-01-21 20:59

OK, so I'm working on a regular expression to find all the header information in a site.

I've compiled the regular expression:

regex = re.compile         


        
6 answers
  •  伪装坚强ぢ
    2021-01-21 21:39

    Parsing things with regular expressions works for regular languages. HTML is not a regular language, and the stuff you find on web pages these days is absolute crap. BeautifulSoup deals with tag-soup HTML with browser-like heuristics so you get parsed HTML that resembles what a browser would display.

    The downside is that it's not very fast. There's lxml for parsing well-formed HTML, but you should really use BeautifulSoup if you're not 100% certain that your input will always be well-formed.
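As a rough sketch of the BeautifulSoup approach, here is one way to pull headings out of a page without a regular expression. It assumes "header information" means the `<h1>` to `<h6>` heading tags, which the question does not confirm. The sample markup and the `extract_headings` helper are hypothetical, made up for illustration, and the example uses the standard library's `html.parser` backend so that only `beautifulsoup4` needs to be installed.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

# Hypothetical sample page (not from the question): mixed-case tags,
# attributes, nested inline markup and line breaks inside a heading,
# all of which make a hand-written regex fragile.
SAMPLE_HTML = """
<html><body>
<h1>Main title</h1>
<p>Intro text</p>
<H2 class="section">Section <b>one</b></H2>
<p>More text</p>
<h3>
  Subsection
</h3>
</body></html>
"""

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def extract_headings(html):
    """Return (tag name, text) pairs for every heading, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    headings = []
    for tag in soup.find_all(HEADING_TAGS):
        # get_text joins the text of nested tags; strip=True trims whitespace.
        headings.append((tag.name, tag.get_text(" ", strip=True)))
    return headings


if __name__ == "__main__":
    for name, text in extract_headings(SAMPLE_HTML):
        print(name, text)
```

If the goal is actually something else, such as the contents of `<head>` or the HTTP response headers, the same parser-first idea applies but the lookup would differ.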
