JSOUP - How to get list of disallowed tags found in html?

烈酒焚心 submitted on 2019-12-23 01:12:20

Question


I use jsoup to protect rich text areas against harmful code. How do I get a list of all the disallowed tags and code found in the string passed to jsoup's parse, clean, or isValid functions?

I use ColdFusion and can parse the text with JSoup like this:

var jsoupDocument = application.jsoup.parse( this.Description );

How do I use jsoup's getErrors() function to get a list showing which HTML does not comply with my whitelist.relaxed()?


Answer 1:


I don't believe there's a direct function in jsoup to get a list of the invalid elements based on your whitelist. You'd have to roll your own.

It's not overly complicated. You can still work from a Document object: select all of the elements, then check each one against your whitelist with jsoup's isValid() function.

As an example, this could probably get you started...

<cfscript>

jsoup = createObject("java", "org.jsoup.Jsoup");
whitelist = createObject("java", "org.jsoup.safety.Whitelist").relaxed();
form.textarea = '<header>Hi</header><script>hello</script><nav><li>Links</li></nav></textarea>';

badTags = [];
content = jsoup.parse(form.textarea).body().select("*");
for(element in content) {
    // tagName() doesn't include the angle brackets, so add them back in.
    // Only the bare tag is tested here, so disallowed attributes are not detected.
    tag = chr(60) & element.tagName() & chr(62);
    if (!jsoup.isValid(tag, whitelist)) {
        arrayAppend(badTags, tag);
    }
}

writeDump(badTags);

</cfscript>
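
If the same check is needed in more than one place, the loop above could be wrapped in a function that also skips tags it has already reported. This is only a sketch under the same assumptions as the example above: getDisallowedTags is a hypothetical helper name, not part of jsoup, and it uses the same tag-name-only test, so it will not notice disallowed attributes on otherwise allowed tags.

<cfscript>

// Hypothetical helper, not part of jsoup: returns each distinct tag found in
// arguments.html that the given whitelist rejects.
function getDisallowedTags(required string html, required any whitelist) {
    var jsoup = createObject("java", "org.jsoup.Jsoup");
    var badTags = [];
    var elements = jsoup.parse(arguments.html).body().select("*");
    for (var element in elements) {
        // Same tag-name-only test as in the loop above
        var tag = chr(60) & element.tagName() & chr(62);
        // arrayFind() returns 0 when the tag has not been recorded yet
        if (!arrayFind(badTags, tag) && !jsoup.isValid(tag, arguments.whitelist)) {
            arrayAppend(badTags, tag);
        }
    }
    return badTags;
}

// Usage: the repeated header tag should appear only once in the dump
relaxed = createObject("java", "org.jsoup.safety.Whitelist").relaxed();
writeDump(getDisallowedTags('<header>Hi</header><header>Again</header><script>hello</script>', relaxed));

</cfscript>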


Source: https://stackoverflow.com/questions/30817745/jsoup-how-to-get-list-of-disallowed-tags-found-in-html
