How to make a Jsoup whitelist to accept certain attribute content

China☆狼群 提交于 2019-11-27 18:55:07

问题


I'm using Jsoup with relaxed whitelist. It seems perfect but I would like to keep the embedded images tags like <img alt="" src="data:;base64.

Is there a way to modify the whitelist to accept also those img?

Edit:

If I use Whitelist.relaxed().addProtocols("img","src","data") then those img tags are not removed. But it accepts anything after "data:" and I would like just to keep them if src content starts with "data:;base64". Is it possible with jsoup?


回答1:


You can extend Whitelist and override isSafeAttribute to perform custom checks. As there's no way to extend Whitelist.relaxed() directly, you'll have to copy some code to set up the same list:

public class RelaxedPlusDataBase64Images extends Whitelist {
    public RelaxedPlusDataBase64Images() {
        //copied from Whitelist.relaxed()
        addTags("a", "b", "blockquote", "br", "caption", "cite", "code", "col",
                "colgroup", "dd", "div", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6",
                "i", "img", "li", "ol", "p", "pre", "q", "small", "strike", "strong",
                "sub", "sup", "table", "tbody", "td", "tfoot", "th", "thead", "tr", "u",
                "ul");
        addAttributes("a", "href", "title");
        addAttributes("blockquote", "cite");
        addAttributes("col", "span", "width");
        addAttributes("colgroup", "span", "width");
        addAttributes("img", "align", "alt", "height", "src", "title", "width");
        addAttributes("ol", "start", "type");
        addAttributes("q", "cite");
        addAttributes("table", "summary", "width");
        addAttributes("td", "abbr", "axis", "colspan", "rowspan", "width");
        addAttributes("th", "abbr", "axis", "colspan", "rowspan", "scope", "width");
        addAttributes("ul", "type");
        addProtocols("a", "href", "ftp", "http", "https", "mailto");
        addProtocols("blockquote", "cite", "http", "https");
        addProtocols("cite", "cite", "http", "https");
        addProtocols("img", "src", "http", "https");
        addProtocols("q", "cite", "http", "https");
    }

    @Override
    protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
        return ("img".equals(tagName)
                && "src".equals(attr.getKey())
                && attr.getValue().startsWith("data:;base64")) ||
            super.isSafeAttribute(tagName, el, attr);
    }
}

As you haven't provided the code you're using to parse or the HTML you're sanitizing, I haven't tested this.



来源:https://stackoverflow.com/questions/22444156/how-to-make-a-jsoup-whitelist-to-accept-certain-attribute-content

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!