I need to determine the length of a string which may contain HTML entities.
For example, "&darr;" (↓) would return length 6, which is correct, but I want these entities to be counted as one character.
<div id="foo">&darr;</div>
alert(document.getElementById("foo").innerHTML.length); // alerts 1
So, based on that rationale, create a div, set your mixed-up, entity-ridden string as its HTML, extract the HTML, and check the length.
var div = document.createElement("div");
div.innerHTML = "&darr;&darr;&darr;&darr;";
alert(div.innerHTML.length); // alerts 4
You might want to put that in a function for convenience, e.g.:
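function realLength(str) { // maybe there's a better name?
    var el = document.createElement("div");
    el.innerHTML = str; // the browser decodes the entities while parsing
    // Read textContent, not innerHTML: innerHTML re-escapes &, < and >,
    // so "&amp;" would be counted as 5 characters instead of 1.
    return el.textContent.length;
}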
If you are running the JavaScript in a browser, I would suggest letting the browser help you. You can create an element and set its innerHTML to your string containing HTML entities, then read the contents of that element back as text.
Here is an example (uses Mootools): http://jsfiddle.net/mqchen/H73EV/
You could for most purposes assume that an ampersand followed by letters, or a possible '#' and numbers, followed by a semicolon, is one character.
var strlen=string.replace(/&#?[a-zA-Z0-9]+;/g,' ').length;
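For reference, the regex approach above can be wrapped in a small function. This is only a minimal sketch: the name `entityAwareLength` is made up here, and the pattern only checks that something looks like an entity, not that it is a real one, so a stray sequence such as `&foo;` would also be counted as one character.

```javascript
// Hypothetical helper: counts each entity-shaped token as one character.
function entityAwareLength(str) {
  // Named (&darr;), decimal (&#8595;) and hex (&#x2193;) references all match.
  var entityPattern = /&#?[a-zA-Z0-9]+;/g;
  return str.replace(entityPattern, ' ').length;
}

console.log(entityAwareLength('&darr;'));       // 1
console.log(entityAwareLength('lol&amp;'));     // 4
console.log(entityAwareLength('a &#x2193; b')); // 5
```

Because it never touches the DOM, this also works outside a browser.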
Since there's no solution using jQuery yet:
var str = 'lol&amp;';
alert($('<span />').html(str).text().length); // alerts 4
This uses the same approach as karim79's answer, but it never adds the created element to the document.
Unfortunately, JavaScript does not natively support encoding or decoding of HTML entities, which is what you will need to do to get the 'real' string length. I found this third-party library, which can decode and encode HTML entities and appears to work well enough, but there's no guarantee of how complete it is.
http://www.strictly-software.com/htmlencode
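To illustrate what such a decoder does, here is a minimal sketch in plain JavaScript that runs without a DOM. The function names and the tiny `NAMED_ENTITIES` table are assumptions made for this example, not part of the library linked above; a real decoder needs the full list of HTML named entities. Names that are not in the table are left untouched.

```javascript
// Hypothetical, deliberately incomplete table of named entities.
var NAMED_ENTITIES = {
  amp: '&', lt: '<', gt: '>', quot: '"', nbsp: '\u00A0', darr: '\u2193'
};

// Replaces numeric and (known) named character references with their characters.
function decodeEntities(str) {
  var pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return str.replace(pattern, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references as they are.
      return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Length of the string after entity decoding.
function decodedLength(str) {
  return decodeEntities(str).length;
}

console.log(decodedLength('&darr;&darr;'));     // 2
console.log(decodedLength('lol&amp;'));         // 4
console.log(decodedLength('&#x2193; &bogus;')); // 9 (unknown name kept as-is)
```

Note that `.length` counts UTF-16 code units, so a decoded character outside the Basic Multilingual Plane still counts as 2.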