I have some Javascript code that communicates with an XML-RPC backend. The XML-RPC returns strings of the form:
A more modern option for interpreting HTML (text and otherwise) from JavaScript is the HTML support in the DOMParser API (documented on MDN). This lets you use the browser's native HTML parser to convert a string into an HTML document. It has been supported in new versions of all major browsers since late 2014.
If we just want to decode some text content, we can put it as the sole content of a document body, parse the document, and pull out its .body.textContent.
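// An entity-encoded string, such as one returned by the backend
var encodedStr = 'hello &amp; world';
var parser = new DOMParser();
// Wrap the string in a minimal document so it becomes the body's only content
var dom = parser.parseFromString(
    '<!doctype html><body>' + encodedStr,
    'text/html');
// textContent yields the decoded text: "hello & world"
var decodedString = dom.body.textContent;
console.log(decodedString);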
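If you need this in more than one place, the same steps can be wrapped in a small helper. This is only a sketch: the name `decodeHtmlText` and the optional `ParserCtor` parameter are my own, not part of any API. The parameter exists so a DOMParser-compatible constructor can be passed in where no global `DOMParser` is available.

```javascript
// Hypothetical helper: `decodeHtmlText` and `ParserCtor` are illustrative names.
function decodeHtmlText(encodedStr, ParserCtor) {
  // Use the browser's DOMParser unless a compatible constructor is supplied.
  var Ctor = ParserCtor || globalThis.DOMParser;
  if (typeof Ctor !== 'function') {
    throw new Error('No DOMParser available in this environment');
  }
  // Same approach as above: make the string the sole content of the body.
  var doc = new Ctor().parseFromString(
    '<!doctype html><body>' + String(encodedStr),
    'text/html');
  // Only the text content is returned; no parsed nodes leave this function.
  return doc.body.textContent;
}
```

In a browser, `decodeHtmlText('hello &amp; world')` returns `hello & world`. Note that any markup in the input is parsed as HTML too, so tags are dropped and only their text survives.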
We can see in the draft specification for DOMParser that JavaScript is not enabled for the parsed document, so we can perform this text conversion without security concerns.
The parseFromString(str, type) method must run these steps, depending on type:
"text/html"
Parse str with an HTML parser, and return the newly created Document. The scripting flag must be set to "disabled".
NOTE: script elements get marked unexecutable and the contents of noscript get parsed as markup.
It's beyond the scope of this question, but please note that if you're taking the parsed DOM nodes themselves (not just their text content) and moving them to the live document DOM, it's possible that their scripting would be reenabled, and there could be security concerns. I haven't researched it, so please exercise caution.