Find word in HTML page fast algorithm

Submitted by 浪子不回头ぞ on 2019-12-19 08:17:12

Question


I need to write a boolean function that returns true if a word appears in the text of an HTML page and false otherwise.

I know it's easy to do this with the lxml library by walking the whole page tree until the word is found, but iterating through every HTML block to check whether the word is there seems inefficient.

Any suggestions for a faster algorithm? I need to run this search many times.
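For the repeated-search case, one option is to pay the parsing cost once per page and then answer every query from a prebuilt index. The following is a minimal sketch, not a definitive implementation. It assumes a "word" is a run of \w characters compared case-insensitively, and it swaps in the standard library's html.parser for lxml so it has no dependencies. The names TextExtractor, build_word_index and contains_word are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of a document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Attribute values never reach handle_data, so they are excluded too.
        if not self._skip_depth:
            self.parts.append(data)


def build_word_index(html):
    """Parse the page once and return the set of lowercased words in its text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps words from adjacent blocks apart, at the cost
    # of splitting a word that is broken up by an inline tag.
    text = " ".join(parser.parts)
    return set(re.findall(r"\w+", text.lower()))


def contains_word(index, word):
    """Average constant-time membership test against the prebuilt index."""
    return word.lower() in index


page = ('<html><body><p class="myWord">Hello <b>world</b></p>'
        '<script>var secret;</script></body></html>')
index = build_word_index(page)
print(contains_word(index, "world"))   # True
print(contains_word(index, "myWord"))  # False: only appears in an attribute
print(contains_word(index, "secret"))  # False: only appears inside <script>
```

Building the set is a single pass over the page. After that, each lookup costs about the same no matter how large the page is. Words containing hyphens or apostrophes would need a different tokenizing regex.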


Answer 1:


As long as you're not worried about accidentally matching the word inside a tag name, an attribute value, or other markup (and if you are, parsing the HTML with something like lxml is pretty much your only option), you can treat the entire HTML document as one big string and search it for your word:

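import requests

def checkForWord(url="http://example.com/somepage.html", word="myWord"):
    r = requests.get(url)
    # Substring test over the raw markup, so tags and attributes are searched too
    return word in r.text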
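The in test matches substrings, so "cat" is also found inside "category". If a "word" should mean a whole word, a regular expression with \b boundaries is one option. The sketch below works on the raw string. It uses a hypothetical helper, has_whole_word, and assumes the word begins and ends with a letter, digit or underscore, since \b behaves differently next to punctuation.

```python
import re


def has_whole_word(text, word):
    """Return True if word occurs in text as a whole word."""
    # re.escape keeps characters such as "." or "+" in word literal; the \b
    # anchors demand a word boundary on both sides (assumes word starts and
    # ends with a letter, digit or underscore).
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None


markup = "<p>Browse the category list</p>"
print("cat" in markup)                 # True: plain substring test
print(has_whole_word(markup, "cat"))   # False: "cat" only occurs inside "category"
print(has_whole_word(markup, "list"))  # True
print(has_whole_word(markup, "p"))     # True: the tag name <p> matches too
```

The last line shows that this still searches the markup itself, so the caveat about attributes and tags still applies.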



Answer 2:


I'd get the entire page as a string:

var markup = document.documentElement.innerHTML;

Then I'd search that string for the word with indexOf, which does a literal match:

var n = markup.indexOf("YourString");

You'll get the index of the first match, or -1 if there is no match.
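The choice between search and indexOf matters once the word contains regex metacharacters. JavaScript's String.prototype.search turns a string argument into a regular expression, so "." matches any character. indexOf compares literally. Python draws the same distinction between re.search and str.find. The illustration below uses Python, the question's language, and its sample markup is made up.

```python
import re

markup = "<p>Build 2019 released</p>"

# Literal search: index of the first match, or -1 if absent
# (the same contract as JavaScript's indexOf).
print(markup.find("2.19"))  # -1: the literal text "2.19" does not occur

# Regular-expression search: "." matches any character, so "2.19" matches "2019".
match = re.search("2.19", markup)
print(match.start() if match else -1)  # 9

# re.escape restores a literal match when a regex API is unavoidable.
print(re.search(re.escape("2.19"), markup))  # None
```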



Source: https://stackoverflow.com/questions/31881411/find-word-in-html-page-fast-algorithm
