Strip HTML from Text JavaScript

前端 未结 30 3588
北荒
北荒 2020-11-21 05:08

Is there an easy way to take a string of html in JavaScript and strip out the html?

30条回答
  •  天涯浪人
    2020-11-21 05:16

    You can safely strip html tags using the iframe sandbox attribute.

    The idea here is that instead of trying to regex our string, we take advantage of the browser's native parser by injecting the text into a DOM element and then querying the textContent/innerText property of that element.

    The best suited element in which to inject our text is a sandboxed iframe, that way we can prevent any arbitrary code execution (Also known as XSS).

    The downside of this approach is that it only works in browsers.

    Here's what I came up with (Not battle-tested):

    const stripHtmlTags = (() => {
      const sandbox = document.createElement("iframe");
      sandbox.sandbox = "allow-same-origin"; // <--- This is the key
      sandbox.style.setProperty("display", "none", "important");
    
      // Inject the sanbox in the current document
      document.body.appendChild(sandbox);
    
      // Get the sandbox's context
      const sanboxContext = sandbox.contentWindow.document;
    
      return (untrustedString) => {
        if (typeof untrustedString !== "string") return ""; 
    
        // Write the untrusted string in the iframe's body
        sanboxContext.open();
        sanboxContext.write(untrustedString);
        sanboxContext.close();
    
        // Get the string without html
        return sanboxContext.body.textContent || sanboxContext.body.innerText || "";
      };
    })();
    

    Usage (demo):

    console.log(stripHtmlTags(`XSS injection :)`));
    console.log(stripHtmlTags(`
    
                                     
                  
提交回复
热议问题