Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?
You can safely strip HTML tags using the iframe sandbox attribute.
The idea here is that instead of trying to regex our string, we take advantage of the browser's native parser by injecting the text into a DOM element and then querying the textContent/innerText property of that element.
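The best-suited element to inject our text into is a sandboxed iframe. Because the sandbox is created without the allow-scripts token, any script or event handler in the untrusted string is parsed but never run, which prevents arbitrary code execution (also known as XSS).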
The downside of this approach is that it only works in browsers.
Here's what I came up with (Not battle-tested):
const stripHtmlTags = (() => {
  const sandbox = document.createElement("iframe");
  // <--- This is the key: "allow-same-origin" lets us read the iframe's
  // document, and leaving out "allow-scripts" keeps scripts from running
  sandbox.sandbox = "allow-same-origin";
  sandbox.style.setProperty("display", "none", "important");

  // Inject the sandbox in the current document (document.body must exist)
  document.body.appendChild(sandbox);

  // Get the sandbox's context
  const sandboxContext = sandbox.contentWindow.document;

  return (untrustedString) => {
    if (typeof untrustedString !== "string") return "";

    // Write the untrusted string in the iframe's body
    sandboxContext.open();
    sandboxContext.write(untrustedString);
    sandboxContext.close();

    // Get the string without html
    return sandboxContext.body.textContent || sandboxContext.body.innerText || "";
  };
})();
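Since this only works in browsers, and only protects you if the browser actually honors the sandbox attribute, it may be worth checking for both before relying on it. Here is a minimal sketch of such a check. Note that `canUseIframeSandbox` and its `globalObject` parameter are names I made up for illustration, not part of any library, and the check only detects that the attribute exists, not that every sandbox token behaves as expected:

```javascript
// Hypothetical helper (an assumption, not part of the code above):
// reports whether the sandboxed-iframe technique looks usable here.
// `globalObject` defaults to globalThis and is injectable for testing.
function canUseIframeSandbox(globalObject = globalThis) {
  const doc = globalObject.document;

  // No DOM at all (for example Node.js): the technique cannot work.
  if (!doc || typeof doc.createElement !== "function") return false;

  // Without support for the sandbox attribute the iframe would be a
  // regular one, so injected scripts would not be blocked.
  return "sandbox" in doc.createElement("iframe");
}
```

You could call `canUseIframeSandbox()` once at startup and fall back to something else (or refuse to process untrusted input) when it returns `false`.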
Usage (demo):
console.log(stripHtmlTags(`XSS injection :)`));
console.log(stripHtmlTags(`