I have some JavaScript code that communicates with an XML-RPC backend. The XML-RPC returns strings of the form:
I tried everything to remove &amp; from a JSON array. None of the examples above worked, but https://stackoverflow.com/users/2030321/chris gave a great solution that led me to fix my problem.
var stringtodecode = "&lt;B&gt;Hello&lt;/B&gt; world&lt;br&gt;";
document.getElementById("decodeIt").innerHTML = stringtodecode;
stringtodecode = document.getElementById("decodeIt").innerText;
I did not use it as written, because I did not understand how to apply it in a modal window that was pulling JSON data into an array, but I tried this variation based on the example, and it worked:
var modal = document.getElementById('demodal');
$('#ampersandcontent').text(replaceAll(data[0], "&amp;", "&"));
I like it because it is simple and it works, but I am not sure why it's not widely used. I searched high and low to find a simple solution. I am still trying to understand the syntax and whether there is any risk in using this; I have not found anything yet.
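The snippet above calls a replaceAll function without showing its definition. A minimal sketch of what such a helper might look like follows; the function body and the sample data array are assumptions for illustration, not the original poster's code. Note that it only handles the one entity it is asked to replace.

```javascript
// Hypothetical stand-in for the replaceAll helper called in the answer above.
function replaceAll(text, search, replacement) {
  // split/join replaces every literal occurrence, so no regex escaping is needed.
  return String(text).split(search).join(replacement);
}

// Assumed sample data: strings as they might arrive in a JSON response.
const data = ["Fish &amp; chips", "R&amp;D &amp; QA"];
const cleaned = data.map(item => replaceAll(item, "&amp;", "&"));

console.log(cleaned);
```

In engines that support ES2021, the built-in String.prototype.replaceAll does the same job for a literal search string.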
jQuery will encode and decode for you. However, you need to use a textarea tag, not a div.
var str1 = 'One & two & three';
var str2 = "One &amp; two &amp; three";
$(document).ready(function() {
  $("#encoded").text(htmlEncode(str1));
  $("#decoded").text(htmlDecode(str2));
});
function htmlDecode(value) {
  return $("<textarea/>").html(value).text();
}
function htmlEncode(value) {
  return $('<textarea/>').text(value).html();
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<div id="encoded"></div>
<div id="decoded"></div>
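Where no DOM or jQuery is available (for example in a server-side script), the encoding half can be approximated with a plain string replacement. This is only a sketch: the escapeHtml name and the choice of five escaped characters are assumptions, not part of the answer above.

```javascript
// Hypothetical DOM-free counterpart to htmlEncode.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;"
};

function escapeHtml(value) {
  // Each special character is looked up once, so "&" in the output is never re-escaped.
  return String(value).replace(/[&<>"']/g, ch => HTML_ESCAPES[ch]);
}

console.log(escapeHtml("One & two <three>"));
```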
The question doesn't specify the origin of x, but it makes sense to defend, if we can, against malicious (or just unexpected, from our own application) input. For example, suppose x has a value of &amp; <script>alert('hello');</script>. A safe and simple way to handle this in jQuery is:
var x = "&amp; <script>alert('hello');</script>";
var safe = $('<div />').html(x).text();
// => "& alert('hello');"
Found via https://gist.github.com/jmblog/3222899. I can't see many reasons to avoid using this solution given it is at least as short, if not shorter than some alternatives and provides defence against XSS.
(I originally posted this as a comment, but am adding it as an answer since a subsequent comment in the same thread requested that I do so).
This is the most comprehensive solution I've tried so far:
const STANDARD_HTML_ENTITIES = {
nbsp: String.fromCharCode(160),
amp: "&",
quot: '"',
lt: "<",
gt: ">"
};
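const replaceHtmlEntities = plainTextString =>
  // Decode numeric and named entities in a single pass, so that text such as
  // "&#38;lt;" is decoded once (to "&lt;") rather than twice (to "<").
  plainTextString.replace(
    /&(?:#(\d+)|(nbsp|amp|quot|lt|gt));/g,
    (match, dec, name) =>
      dec !== undefined
        ? String.fromCharCode(Number(dec))
        : STANDARD_HTML_ENTITIES[name]
  );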
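The function above covers decimal references and five named entities. A possible extension, sketched below, also accepts hexadecimal references such as &#x41; and leaves unknown names untouched. The decodeBasicEntities name and the NAMED_ENTITIES table are assumptions for illustration; this is still far from the full HTML entity list.

```javascript
// Hypothetical helper, not part of the answer above.
const NAMED_ENTITIES = {
  nbsp: "\u00A0",
  amp: "&",
  quot: '"',
  apos: "'",
  lt: "<",
  gt: ">"
};

function decodeBasicEntities(text) {
  // One pass over the input, so decoded output is never decoded again.
  return text.replace(
    /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z]+));/g,
    (match, dec, hex, name) => {
      if (name !== undefined) {
        // Unknown names are left as they are rather than guessed at.
        return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
          ? NAMED_ENTITIES[name]
          : match;
      }
      const codePoint = dec !== undefined ? parseInt(dec, 10) : parseInt(hex, 16);
      // String.fromCodePoint throws above 0x10FFFF, so keep such references as-is.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
  );
}

console.log(decodeBasicEntities("&lt;b&gt; &amp; &#65;&#x42;"));
```

Using String.fromCodePoint rather than String.fromCharCode means references above U+FFFF are not truncated to 16 bits.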
A more modern option for interpreting HTML (text and otherwise) from JavaScript is the HTML support in the DOMParser API (see the MDN documentation). This allows you to use the browser's native HTML parser to convert a string to an HTML document. It has been supported in new versions of all major browsers since late 2014.
If we just want to decode some text content, we can put it as the sole content in a document body, parse the document, and pull out its .body.textContent.
var encodedStr = 'hello &amp; world';
var parser = new DOMParser();
var dom = parser.parseFromString(
  '<!doctype html><body>' + encodedStr,
  'text/html');
var decodedString = dom.body.textContent;
console.log(decodedString);
We can see in the draft specification for DOMParser that JavaScript is not enabled for the parsed document, so we can perform this text conversion without security concerns.
The parseFromString(str, type) method must run these steps, depending on type:
"text/html"
Parse str with an HTML parser, and return the newly created Document.
The scripting flag must be set to "disabled".
NOTE: script elements get marked unexecutable and the contents of noscript get parsed as markup.
It's beyond the scope of this question, but please note that if you're taking the parsed DOM nodes themselves (not just their text content) and moving them to the live document DOM, it's possible that their scripting would be reenabled, and there could be security concerns. I haven't researched it, so please exercise caution.
Mathias Bynens has a library for this: https://github.com/mathiasbynens/he
Example:
console.log(
  he.decode("Jörg &amp; Jürgen rocked to &amp; fro")
);
// Logs "Jörg & Jürgen rocked to & fro"
I suggest favouring it over hacks involving setting an element's HTML content and then reading back its text content. Such approaches can work, but are deceptively dangerous and present XSS opportunities if used on untrusted user input.
If you really can't bear to load in a library, you can use the textarea hack described in this answer to a near-duplicate question, which, unlike various similar approaches that have been suggested, has no security holes that I know of:
function decodeEntities(encodedString) {
  var textArea = document.createElement('textarea');
  textArea.innerHTML = encodedString;
  return textArea.value;
}
console.log(decodeEntities('1 &amp; 2')); // '1 & 2'
But take note of the security issues, affecting similar approaches to this one, that I list in the linked answer! This approach is a hack, and future changes to the permissible content of a textarea (or bugs in particular browsers) could lead to code that relies upon it suddenly having an XSS hole one day.