Given an arbitrary text file full of printable characters, how can it be converted to HTML that renders exactly the same (with the following requirements)?
Use a zero-width space (`&#8203;`) to preserve whitespace and allow the text to wrap. The basic idea is to pair each space, or each sequence of spaces, with a zero-width space, and then replace each space with a non-breaking space (`&nbsp;`). You'll also want to encode HTML special characters and add line breaks.
If you don't care about Unicode characters, it's trivial: you can just chain `string.replace()` calls:
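```javascript
function textToHTML(text)
{
    return ((text || "") + "")  // make sure it is a string
        .replace(/&/g, "&amp;")  // must come first, or the entities added below get escaped twice
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/\t/g, "    ")  // expand each tab to four spaces
        .replace(/ /g, "&nbsp;&#8203;")  // non-breaking space plus a wrap opportunity
        .replace(/\r\n|\r|\n/g, "<br />");
}
```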
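The order of the `replace()` calls matters: `&` has to be escaped before any call that inserts an entity, otherwise the `&` at the start of `&lt;` gets escaped a second time. A minimal sketch of the difference, using two hypothetical helpers (`escapeAmpFirst` and `escapeAmpLast` are illustrative names, not part of the answer):

```javascript
// Hypothetical helpers that differ only in the order of the replacements.
function escapeAmpFirst(s) {
  // Correct order: "&" is escaped before any entity is introduced.
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAmpLast(s) {
  // Wrong order: the "&" inside the freshly added "&lt;" is escaped again.
  return s.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/&/g, "&amp;");
}

console.log(escapeAmpFirst("a < b & c")); // a &lt; b &amp; c
console.log(escapeAmpLast("a < b & c"));  // a &amp;lt; b &amp; c
```

The second result would render literally as `&lt;` in the browser instead of as `<`.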
If it's OK for whitespace to wrap, pair each space with a zero-width space as above. Otherwise, to keep each run of whitespace together, pair each sequence of spaces with a zero-width space instead:
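```javascript
    // use these two calls in place of the single space replacement
    .replace(/ /g, "&nbsp;")
    .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")
```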
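To compare the two strategies, here is a minimal sketch that applies only the whitespace step to a string containing a run of two spaces. The helper names `pairEachSpace` and `pairEachRun` are hypothetical; each function body is just the corresponding replacement from above.

```javascript
// Wrap-anywhere: every space becomes a non-breaking space followed by a
// zero-width space, so the browser may break after any single space.
function pairEachSpace(s) {
  return s.replace(/ /g, "&nbsp;&#8203;");
}

// Keep-together: spaces become non-breaking spaces first, then each whole
// run is wrapped in zero-width spaces, so breaks can happen only at the
// edges of the run.
function pairEachRun(s) {
  return s.replace(/ /g, "&nbsp;").replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;");
}

console.log(pairEachSpace("a  b")); // a&nbsp;&#8203;&nbsp;&#8203;b
console.log(pairEachRun("a  b"));   // a&#8203;&nbsp;&nbsp;&#8203;b
```

The first output allows a line break between the two spaces; the second allows breaks only before or after the pair.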
Encoding Unicode characters is a little more complex, because you need to iterate over the string:
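```javascript
var charEncodings = {
    "\t": "&nbsp;&nbsp;&nbsp;&nbsp;",
    " ": "&nbsp;",
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\n": "<br />",
    "\r": "<br />"
};
var space = /[\t ]/;
var noWidthSpace = "&#8203;";
function textToHTML(text)
{
    text = (text || "") + "";  // make sure it is a string
    text = text.replace(/\r\n/g, "\n");  // avoid adding two <br /> tags for one CRLF
    var html = "";
    var lastChar = "";
    // for...of yields whole code points (for...in would yield string keys,
    // so "i + 1" would concatenate), and a character outside the BMP, such
    // as an emoji, is not split into two surrogate halves
    for (var ch of text)
    {
        var isSpace = space.test(ch);
        // put a zero-width space at both edges of each run of spaces/tabs,
        // so the browser can wrap there while the run itself stays together
        if (isSpace !== space.test(lastChar))
        {
            html += noWidthSpace;
        }
        var codePoint = ch.codePointAt(0);
        html += ch in charEncodings ? charEncodings[ch] :
            codePoint > 127 ? "&#" + codePoint + ";" : ch;
        lastChar = ch;
    }
    return html;
}
```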
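Why iterate by code point instead of calling `charCodeAt()` at each index: characters outside the Basic Multilingual Plane, such as emoji, are stored in JavaScript strings as two UTF-16 code units. Encoding each half separately produces two references to lone surrogates, which an HTML parser replaces with U+FFFD. Here is a small sketch of the difference; `encodeNonAscii` is a hypothetical helper, not part of the answer:

```javascript
// U+1F600 lies outside the BMP, so JavaScript stores it as a surrogate pair.
const emoji = "\u{1F600}";

console.log(emoji.length);         // 2
console.log(emoji.charCodeAt(0));  // 55357 (high surrogate 0xD83D)
console.log(emoji.codePointAt(0)); // 128512 (the real code point 0x1F600)

// Hypothetical helper: encode every non-ASCII code point as a numeric
// character reference. for...of yields whole code points, not code units.
function encodeNonAscii(s) {
  let out = "";
  for (const ch of s) {
    const cp = ch.codePointAt(0);
    out += cp > 127 ? "&#" + cp + ";" : ch;
  }
  return out;
}

console.log(encodeNonAscii("caf\u00e9 \u{1F600}")); // caf&#233; &#128512;
```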
One more comment: without a monospace font, you'll lose some formatting. Consider how these lines of text form columns in a monospace font:
```
ten       seven spaces
eleven    four spaces
```
Without the monospaced font, you will lose the columns:
ten seven spaces
eleven four spaces
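Restoring the columns without a monospace font would presumably require measuring the rendered width of each piece of text, so an algorithm to fix that would be much more complex.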
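The columns in the example exist only as character counts. Each first column is padded to ten characters, and that lines up visually only when every character has the same width. Here is a small sketch that builds the two example lines this way. `alignColumns` is a hypothetical helper, and the width of 10 is taken from the example.

```javascript
// Hypothetical helper: pad the first column to a fixed number of characters
// so the second column starts at the same character offset on every line.
function alignColumns(rows, width) {
  return rows.map(([first, second]) => first.padEnd(width, " ") + second);
}

const lines = alignColumns([["ten", "seven spaces"], ["eleven", "four spaces"]], 10);
console.log(lines.join("\n"));
// ten       seven spaces
// eleven    four spaces
```

Because `textToHTML` maps each space to a non-breaking space, it preserves these character counts exactly. Whether the result looks aligned then depends only on the font the HTML is rendered in.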