Recommended method for escaping HTML in Java

南旧 2020-11-22 04:30

Is there a recommended way to escape <, >, " and & characters when outputting HTML in plain Java code? (Other

11 answers
  • 2020-11-22 04:35

    For those who use Google Guava:

    import com.google.common.html.HtmlEscapers;
    [...]
    String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
    String escaped = HtmlEscapers.htmlEscaper().escape(source);
    
  • 2020-11-22 04:36

    For some purposes, Spring's HtmlUtils:

    import org.springframework.web.util.HtmlUtils;
    [...]
    HtmlUtils.htmlEscapeDecimal("&"); //gives &#38;
    HtmlUtils.htmlEscape("&"); //gives &amp;
    
  • 2020-11-22 04:41

    Most libraries escape everything they can, including hundreds of symbols and thousands of non-ASCII characters, which is not what you want in a UTF-8 world.

    Also, as Jeff Williams noted, there is no single “escape HTML” option; there are several contexts.

    Assuming you never use unquoted attributes, and keeping in mind that different contexts exist, I've written my own version:

    private static final long BODY_ESCAPE =
            1L << '&' | 1L << '<' | 1L << '>';
    private static final long DOUBLE_QUOTED_ATTR_ESCAPE =
            1L << '"' | 1L << '&' | 1L << '<' | 1L << '>';
    private static final long SINGLE_QUOTED_ATTR_ESCAPE =
            1L << '"' | 1L << '&' | 1L << '\'' | 1L << '<' | 1L << '>';
    
    // 'quot' and 'apos' are 1 char longer than '#34' and '#39' which I've decided to use
    private static final String REPLACEMENTS = "&#34;&amp;&#39;&lt;&gt;";
    private static final int REPL_SLICES = /*  |0,   5,   10,  15, 19, 23*/
            5<<5 | 10<<10 | 15<<15 | 19<<20 | 23<<25;
    // These 5-bit numbers packed into a single int
    // are indices within REPLACEMENTS which is a 'flat' String[]
    
    private static void appendEscaped(
            StringBuilder builder,
            CharSequence content,
            long escapes // pass BODY_ESCAPE or *_QUOTED_ATTR_ESCAPE here
    ) {
        int startIdx = 0, len = content.length();
        for (int i = 0; i < len; i++) {
            char c = content.charAt(i);
            long one;
            if (((c & 63) == c) && ((one = 1L << c) & escapes) != 0) {
            // -^^^^^^^^^^^^^^^   -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            // |                  | take only dangerous characters
            // | java shifts longs by 6 least significant bits,
            // | e.g. << 0b110111111 is the same as << 0b111111.
            // | Filter out bigger characters
    
                int index = Long.bitCount(SINGLE_QUOTED_ATTR_ESCAPE & (one - 1));
                builder.append(content, startIdx, i /* exclusive */)
                        .append(REPLACEMENTS,
                                REPL_SLICES >>> 5*index & 31,
                                REPL_SLICES >>> 5*(index+1) & 31);
                startIdx = i + 1;
            }
        }
        builder.append(content, startIdx, len);
    }
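
    The shift behaviour described in the comments above can be checked in isolation. This is a minimal sketch (the class and method names are hypothetical, not part of the code above) showing that Java uses only the low 6 bits of the shift distance for a long, and why the `(c & 63) == c` guard is needed: without it, 'f' (code 102) would alias '&' (code 38), because 102 & 63 == 38.

```java
/** Hypothetical demo class; only illustrates the shift-distance masking. */
final class ShiftMaskDemo {

    private ShiftMaskDemo() {}

    /** Membership test WITHOUT the guard: wrong for chars >= 64. */
    static boolean inMaskUnguarded(char c, long mask) {
        return ((1L << c) & mask) != 0;
    }

    /** Membership test WITH the guard used by appendEscaped. */
    static boolean inMaskGuarded(char c, long mask) {
        return (c & 63) == c && ((1L << c) & mask) != 0;
    }

    public static void main(String[] args) {
        // Only the low 6 bits of the distance are used when shifting a long.
        System.out.println((1L << 64) == 1L);                        // true
        System.out.println((1L << 0b110111111) == (1L << 0b111111)); // true

        long ampersandOnly = 1L << '&';
        System.out.println(inMaskUnguarded('f', ampersandOnly)); // true (false positive)
        System.out.println(inMaskGuarded('f', ampersandOnly));   // false
        System.out.println(inMaskGuarded('&', ampersandOnly));   // true
    }
}
```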
    

    Consider copy-pasting appendEscaped from the Gist, which has no line length limit.
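
    For comparison, here is a plain switch-based sketch that should produce the same replacements as the bit-packed appendEscaped, trading some speed for readability. The class name, the Context enum and the escape method are hypothetical names chosen for this illustration; like the original, it assumes attributes are always quoted and that the page is served as UTF-8, so non-ASCII characters pass through untouched.

```java
/** Readable (non-bit-packed) sketch of context-aware escaping; names are hypothetical. */
final class SimpleHtmlEscaper {

    /** The three contexts distinguished by the bit masks in the answer. */
    enum Context { BODY, DOUBLE_QUOTED_ATTR, SINGLE_QUOTED_ATTR }

    private SimpleHtmlEscaper() {}

    static String escape(CharSequence content, Context context) {
        StringBuilder out = new StringBuilder(content.length() + 16);
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"':
                    // Escaped in both attribute contexts, left alone in element content.
                    out.append(context == Context.BODY ? "\"" : "&#34;");
                    break;
                case '\'':
                    // Only a single-quoted attribute value needs this one.
                    out.append(context == Context.SINGLE_QUOTED_ATTR ? "&#39;" : "'");
                    break;
                default:
                    out.append(c); // everything else, including non-ASCII, is copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "a < b & \"c\" 'd'";
        System.out.println(escape(s, Context.BODY));               // a &lt; b &amp; "c" 'd'
        System.out.println(escape(s, Context.DOUBLE_QUOTED_ATTR)); // a &lt; b &amp; &#34;c&#34; 'd'
        System.out.println(escape(s, Context.SINGLE_QUOTED_ATTR)); // a &lt; b &amp; &#34;c&#34; &#39;d&#39;
    }
}
```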

  • 2020-11-22 04:48

    StringEscapeUtils from Apache Commons Lang:

    import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
    // ...
    String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
    String escaped = escapeHtml(source);
    

    For version 3:

    import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
    // ...
    String escaped = escapeHtml4(source);
    
  • 2020-11-22 04:52

    Be careful with this. There are a number of different 'contexts' within an HTML document: inside an element, quoted attribute value, unquoted attribute value, URL attribute, JavaScript, CSS, etc. You'll need to use a different encoding method for each of these to prevent Cross-Site Scripting (XSS). Check the OWASP XSS Prevention Cheat Sheet for details on each of these contexts. You can find escaping methods for each of these contexts in the OWASP ESAPI library: https://github.com/ESAPI/esapi-java-legacy.
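
    To make the point about contexts concrete, here is a small JDK-only sketch in which the same user input needs two different encodings within one link: URL encoding for the query parameter, then attribute escaping for the whole href, and element-content escaping for the link text. The class, the helper methods and the /search URL are hypothetical, and the helpers cover only these two contexts; they are not a substitute for the ESAPI encoders or for the cheat sheet's rules on JavaScript, CSS and unquoted attributes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: one input, two contexts, two encodings. */
final class ContextDemo {

    private ContextDemo() {}

    /** Escapes text placed between tags (element content). */
    static String escapeBody(String s) {
        // '&' must be replaced first so the other entities are not double-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Escapes text placed inside a double-quoted attribute value. */
    static String escapeDoubleQuotedAttr(String s) {
        return escapeBody(s).replace("\"", "&#34;");
    }

    /** Builds a link to a made-up /search endpoint for the given query. */
    static String searchLink(String query) {
        // URL context: encode the parameter value (Charset overload needs Java 10+).
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String href = "/search?q=" + encoded + "&lang=en";
        // HTML contexts: the href is an attribute value, the label is element content.
        return "<a href=\"" + escapeDoubleQuotedAttr(href) + "\">"
                + escapeBody(query) + "</a>";
    }

    public static void main(String[] args) {
        System.out.println(searchLink("Tom & \"Jerry\" <3"));
        // <a href="/search?q=Tom+%26+%22Jerry%22+%3C3&amp;lang=en">Tom &amp; "Jerry" &lt;3</a>
    }
}
```

    Note that the separator `&` in the href is HTML-escaped to `&amp;`, while the `&` inside the user's query is URL-encoded to `%26`; applying only one of the two encodings would get one of them wrong.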

  • 2020-11-22 04:53

    org.apache.commons.lang3.StringEscapeUtils is now deprecated. Use org.apache.commons.text.StringEscapeUtils instead, which is available through this Maven dependency:

        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-text</artifactId>
            <version>${commons.text.version}</version>
        </dependency>
    