Recommended method for escaping HTML in Java

南旧 2020-11-22 04:30

Is there a recommended way to escape <, >, " and & characters when outputting HTML in plain Java code? (Other

11 Answers
  • 2020-11-22 04:54

    An alternative to Apache Commons: Use Spring's HtmlUtils.htmlEscape(String input) method.

  • 2020-11-22 04:56

    There is a newer version of the Apache Commons Lang library, and it uses a different package name (org.apache.commons.lang3). StringEscapeUtils now has separate static methods for escaping different types of documents (http://commons.apache.org/proper/commons-lang/javadocs/api-3.0/index.html). So, to escape a string for HTML 4.0:

    import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
    
    String output = escapeHtml4("The less than sign (<) and ampersand (&) must be escaped before using them in HTML");
    
  • 2020-11-22 04:58

    Nice short method:

    public static String escapeHTML(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        for (int i = 0; i < s.length(); ) {
            // Read whole code points so a surrogate pair yields one numeric reference.
            int c = s.codePointAt(i);
            i += Character.charCount(c);
            if (c > 127 || c == '"' || c == '\'' || c == '<' || c == '>' || c == '&') {
                // Non-ASCII and HTML-significant characters become decimal references.
                out.append("&#").append(c).append(';');
            } else {
                out.appendCodePoint(c);
            }
        }
        return out.toString();
    }
    

    Based on https://stackoverflow.com/a/8838023/1199155 (the ampersand is missing there). Of the characters checked in the if clause, ", &, < and > are the only ones below 128 that have named entities according to http://www.w3.org/TR/html4/sgml/entities.html; the apostrophe has none in HTML 4 but is escaped as well.
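
    For comparison, here is a minimal sketch of a variant that emits the named entities from that list instead of numeric references. It is not the method above: the class name NamedEntityEscaper is hypothetical, and the sketch assumes the page is served in an encoding such as UTF-8, so non-ASCII characters are passed through unchanged.

    ```java
    // Hypothetical helper, shown only to illustrate named-entity escaping.
    final class NamedEntityEscaper {
        private NamedEntityEscaper() {}

        /** Escapes the five HTML-significant ASCII characters; everything else passes through. */
        static String escape(String s) {
            StringBuilder out = new StringBuilder(Math.max(16, s.length()));
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '<':  out.append("&lt;");   break;
                    case '>':  out.append("&gt;");   break;
                    case '&':  out.append("&amp;");  break;
                    case '"':  out.append("&quot;"); break;
                    // HTML 4 defines no named entity for the apostrophe, so use a numeric reference.
                    case '\'': out.append("&#39;");  break;
                    default:   out.append(c);
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            // prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; &#39;Jerry&#39;&lt;/a&gt;
            System.out.println(escape("<a href=\"x\">Tom & 'Jerry'</a>"));
        }
    }
    ```

    The output is easier to read in page source than all-numeric references, at the cost of relying on the document encoding for characters above 127.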

  • 2020-11-22 04:58

    On Android (API 16 or greater) you can use:

    Html.escapeHtml(textToEscape);
    

    or, for lower API levels:

    TextUtils.htmlEncode(textToEscape);
    
  • 2020-11-22 05:00

    While @dfa's answer recommending org.apache.commons.lang.StringEscapeUtils.escapeHtml is nice, and I have used it in the past, it should not be used for escaping HTML (or XML) attributes; otherwise the whitespace will be normalized (meaning all adjacent whitespace characters become a single space).

    I know this because I have had bugs filed against my library (JATL) for attributes where whitespace was not preserved. Thus I have a drop-in (copy-and-paste) class (some of which I stole from JDOM) that differentiates between the escaping of attributes and of element content.

    While proper attribute escaping may not have mattered as much in the past, it is becoming increasingly important given the use of HTML5's data- attributes.
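
    A rough sketch of what separating the two contexts can look like. This is not the JATL or JDOM class mentioned above: the class and method names are hypothetical, and it assumes attribute values are written inside double quotes and that the consumer normalizes literal tabs and line breaks in attributes, so those are written as numeric character references.

    ```java
    // Hypothetical helper; illustrates context-specific escaping only.
    final class ContextualEscaper {
        private ContextualEscaper() {}

        /** Escapes text placed between tags; whitespace is left untouched. */
        static String escapeContent(String s) {
            StringBuilder out = new StringBuilder(s.length() + 16);
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '<': out.append("&lt;");  break;
                    case '>': out.append("&gt;");  break;
                    case '&': out.append("&amp;"); break;
                    default:  out.append(c);
                }
            }
            return out.toString();
        }

        /** Escapes an attribute value, encoding tab, LF and CR so they are not normalized away. */
        static String escapeAttribute(String s) {
            StringBuilder out = new StringBuilder(s.length() + 16);
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '<':  out.append("&lt;");   break;
                    case '>':  out.append("&gt;");   break;
                    case '&':  out.append("&amp;");  break;
                    case '"':  out.append("&quot;"); break;
                    case '\'': out.append("&#39;");  break;
                    case '\t': out.append("&#9;");   break;
                    case '\n': out.append("&#10;");  break;
                    case '\r': out.append("&#13;");  break;
                    default:   out.append(c);
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            String note = "line 1\n\tline \"2\" & more";
            // prints: <p data-note="line 1&#10;&#9;line &quot;2&quot; &amp; more">a &lt; b</p>
            System.out.println("<p data-note=\"" + escapeAttribute(note) + "\">"
                    + escapeContent("a < b") + "</p>");
        }
    }
    ```

    Keeping two methods makes the call site state which context the value is going into, which is where a single general-purpose escaper tends to go wrong.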
