Emulation of lex like functionality in Perl or Python

梦毁少年i 2021-01-13 23:46

Here's the deal. Is there a way to tokenize the strings in a line using multiple regexes?

One example:

I have to get all href tags, their corresponding

8 answers
  •  被撕碎了的回忆
    2021-01-14 00:15

    If your problem has anything at all to do with web scraping, I recommend looking at Web::Scraper, which provides easy element selection via XPath or CSS selectors. I have a (German) talk on Web::Scraper; running it through babelfish, or just looking at the code samples, should give you a quick overview of the syntax.

    Hand-parsing HTML is onerous and won't give you much over using one of the premade HTML parsers. If your HTML has very limited variation, you can get by with clever regular expressions, but if you're already breaking out hard-core parser tools, it sounds as if your HTML is far more irregular than what is sane to parse with regular expressions.
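
Since the question also allows Python, here is a rough sketch of the "premade parser" route using only the standard library's html.parser module. The names LinkCollector and extract_links are made up for this illustration, and collecting the link text next to each href is an assumption about what is wanted, not something the question spells out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="/a">First</a> and <A HREF="/b">Second <b>bold</b></A></p>'
    for href, text in extract_links(sample):
        print(href, text)
```

The parser copes with mixed-case tags and nested markup inside the link, which is exactly the kind of variation that tends to break a regex-only approach. Anchors without an href attribute are skipped.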
