Question
I need to write a boolean function that returns true if a word appears in the text of an HTML page and false if it does not.
I know this is easy to do by walking the whole page tree with the lxml library until the word is found,
but iterating through every HTML block to check whether the word is there seems inefficient.
Any suggestions for a faster algorithm? (I need to run this search many times.)
Answer 1:
As long as you're not worried about accidentally matching the word inside an element attribute, a tag name, or similar markup, you can treat the entire HTML document as one big string and search it for your word. If you are worried about that, parsing the HTML with something like lxml is pretty much your only option. Note that this is a plain substring test, so it also matches the word when it occurs inside a longer word:
import requests

def checkForWord():
    # Download the page and test the raw markup for the substring
    r = requests.get("http://example.com/somepage.html")
    return "myWord" in r.text
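Since the question mentions running the search many times, here is a minimal sketch of a middle ground between the raw string test and a full lxml tree walk. It is not part of the original answer: it assumes the same page is queried repeatedly, uses the standard library's html.parser to collect only text nodes (so attribute values and script/style contents are ignored), and builds a set of lowercased words once so that each later lookup is a set membership test. The names TextExtractor, build_word_set and sample_html are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element
        if self._skip_depth == 0:
            self.chunks.append(data)


def build_word_set(html):
    """Parse the page once and return the set of lowercased words in its text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return {word.lower() for word in re.findall(r"\w+", text)}


# Hypothetical sample page: "hidden" only occurs in an attribute,
# "secret" only occurs inside a script.
sample_html = (
    '<html><body><p title="hidden">Hello world</p>'
    "<script>var secret = 1;</script></body></html>"
)
words = build_word_set(sample_html)
print("hello" in words)   # True
print("hidden" in words)  # False
print("secret" in words)  # False
```

The parsing cost is paid once per page; after that, each lookup is an average constant-time set test. The trade-off is that this matches whole words case-insensitively, which is a different behaviour from the substring test above, so it only fits if that is what "the word is in the text" should mean.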
Answer 2:
I'd get the entire page as a string:
var markup = document.documentElement.innerHTML;
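Then I'd search that string for the word:
var n = markup.search("YourString");
You'll get the index of the first match, or -1 if no match is found. Note that search() treats its argument as a regular expression; for a literal word, markup.indexOf("YourString") returns the same kind of result without any regex interpretation.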
Source: https://stackoverflow.com/questions/31881411/find-word-in-html-page-fast-algorithm