Fastest method to escape HTML tags as HTML entities?

-上瘾入骨i 2020-11-22 09:24

I'm writing a Chrome extension that involves doing a lot of the following job: sanitizing strings that might contain HTML tags, by converting

12 Answers
  • 2020-11-22 09:35

    I'll add XMLSerializer to the pile. It provides the fastest result without using any object caching (not on the serializer, nor on the Text node).

    function serializeTextNode(text) {
      return new XMLSerializer().serializeToString(document.createTextNode(text));
    }
    

    An added bonus is that it supports attributes, which are serialized differently from text nodes:

    function serializeAttributeValue(value) {
      const attr = document.createAttribute('a');
      attr.value = value;
      return new XMLSerializer().serializeToString(attr);
    }
    

    You can see what it's actually replacing by checking the spec, both for text nodes and for attribute values. The full documentation has more node types, but the concept is the same.
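
    As a rough illustration of those replacements, here is a pure-string sketch of what the serializer appears to do. It assumes the XML serialization rules in the DOM Parsing spec: text nodes escape &, < and >, while attribute values additionally escape the double quote. The names escapeXmlText and escapeXmlAttribute are hypothetical, and real browsers may escape further characters.

```javascript
// Hypothetical helpers, not part of any browser API.
// Assumed text-node rule: escape &, < and >.
function escapeXmlText(text) {
  return text
    .replace(/&/g, '&amp;') // runs first so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Assumed attribute-value rule: the same three characters plus the double quote.
function escapeXmlAttribute(value) {
  return escapeXmlText(value).replace(/"/g, '&quot;');
}

console.log(escapeXmlText('a < b && "c"'));      // a &lt; b &amp;&amp; "c"
console.log(escapeXmlAttribute('a < b && "c"')); // a &lt; b &amp;&amp; &quot;c&quot;
```

    Because these are chained string replacements, the ampersand has to be handled first; the DOM-based versions above do not have that ordering concern.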

    As for performance, it's the fastest when nothing is cached. When you do allow caching, reading innerHTML from an HTMLElement with a child Text node is fastest. Regex would be slowest (as shown in other comments). Of course, XMLSerializer could be faster in other browsers, but in my (limited) testing, innerHTML is fastest.


    Fastest single line:

    new XMLSerializer().serializeToString(document.createTextNode(text));

    Fastest with caching:

    const cachedElementParent = document.createElement('div');
    const cachedChildTextNode = document.createTextNode('');
    cachedElementParent.appendChild(cachedChildTextNode);
    
    function serializeTextNode(text) {
      cachedChildTextNode.nodeValue = text;
      return cachedElementParent.innerHTML;
    }
    

    https://jsperf.com/htmlentityencode/1

  • 2020-11-22 09:35

    A bit late to the show, but what's wrong with using encodeURIComponent() and decodeURIComponent()?
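
    One likely answer, sketched below with standard built-ins only: encodeURIComponent() percent-encodes characters for use in URLs, so its output contains no HTML entities, and a page that displays it would show the percent codes instead of the original text. The sample string is made up for illustration.

```javascript
// encodeURIComponent targets URLs: it emits %XX sequences, not HTML entities.
const sample = '<b>Tom & "Jerry"</b>';
const uriEncoded = encodeURIComponent(sample);

console.log(uriEncoded);
// %3Cb%3ETom%20%26%20%22Jerry%22%3C%2Fb%3E

// The round trip works, but the encoded form is not what an HTML page should display.
console.log(decodeURIComponent(uriEncoded) === sample); // true
```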

  • 2020-11-22 09:37

    Here's one way you can do this:

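    // Reusable <textarea>; named escapeEl to avoid shadowing the global escape().
    var escapeEl = document.createElement('textarea');

    // Setting textContent stores the raw text; reading innerHTML returns it escaped.
    function escapeHTML(html) {
        escapeEl.textContent = html;
        return escapeEl.innerHTML;
    }
    
    // Setting innerHTML decodes the entities; reading textContent returns plain text.
    function unescapeHTML(html) {
        escapeEl.innerHTML = html;
        return escapeEl.textContent;
    }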
    

    Here's a demo.

  • 2020-11-22 09:43

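    // Encode &, newline, <, >, " and ' as decimal numeric character references.
    function encode(r) {
      return r.replace(/[\x26\x0A\x3c\x3e\x22\x27]/g, function(r) {
        return "&#" + r.charCodeAt(0) + ";";
      });
    }
    
    // "test" is the <textarea id=test> below, reached through its id.
    test.value = encode('How to encode\nonly html tags &<>\'" nice & fast!');
    
    /*
     \x26 is & (ampersand),
     \x0A is newline,
     \x22 is ",
     \x27 is ',
     \x3c is <,
     \x3e is >
     The order inside the character class does not matter here,
     because every match is replaced in a single pass.
    */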
    <textarea id=test rows=11 cols=55>www.WHAK.com</textarea>
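
    To reverse this encoding without touching the DOM, a minimal sketch could look like the following. It assumes the input contains only decimal numeric references of the kind encode() emits. The name decode is hypothetical, and named entities such as &amp; are deliberately left alone. encode() is repeated so the snippet runs on its own.

```javascript
// Same encoder as above, repeated so this snippet is self-contained.
function encode(r) {
  return r.replace(/[\x26\x0A\x3c\x3e\x22\x27]/g, function (c) {
    return '&#' + c.charCodeAt(0) + ';';
  });
}

// Hypothetical counterpart: turns decimal references such as &#60; back into characters.
function decode(s) {
  return s.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCodePoint(Number(code));
  });
}

const originalText = 'tags &<>\'" and\nnewlines';
const numericEncoded = encode(originalText);

console.log(numericEncoded);
console.log(decode(numericEncoded) === originalText); // true
```

    Since the ampersand itself is encoded as &#38;, a reference that was already present in the input survives the round trip unchanged.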

  • 2020-11-22 09:45

    An even quicker/shorter solution is:

    var escaped = new Option(html).innerHTML;
    

    This relies on a legacy corner of the DOM: the Option constructor creates an option element whose text content is its first argument, so reading innerHTML returns that text with the escaping applied automatically.

    Credit to https://github.com/jasonmoo/t.js/blob/master/t.js

  • 2020-11-22 09:47

    Martijn's method as a prototype function:

    String.prototype.escape = function() {
        var tagsToReplace = {
            '&': '&amp;',
            '<': '&lt;',
            '>': '&gt;'
        };
        return this.replace(/[&<>]/g, function(tag) {
            return tagsToReplace[tag] || tag;
        });
    };
    
    var a = "<abc>";
    var b = a.escape(); // "&lt;abc&gt;"
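
    If extending String.prototype is undesirable, or the result is going into a quoted attribute value, a standalone variant might look like this. It is a sketch rather than Martijn's original code: the function name escapeHtml and the two extra quote mappings are assumptions added for illustration.

```javascript
// Hypothetical standalone variant that also covers both quote characters.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

function escapeHtml(text) {
  // One pass over the string; each matched character is looked up in the map.
  return String(text).replace(/[&<>"']/g, function (ch) {
    return HTML_ESCAPES[ch];
  });
}

console.log(escapeHtml('<a href="x">Tom & Jerry\'s</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```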
    