Given any arbitrary text file full of printable characters, how can this be converted to HTML that would be rendered exactly the same (with the following requirements)?
While this doesn't quite meet all your requirements (for one thing, it doesn't handle tabs), I've used the following gem, which adds a wordWrap() method to JavaScript Strings, on a couple of occasions to do something similar to what you're describing, so it might be a good starting point for something that also does the additional things you want.
//+ Jonas Raoni Soares Silva
//@ http://jsfromhell.com/string/wordwrap [rev. #2]
// String.wordWrap(maxLength: Integer,
// [breakWith: String = "\n"],
// [cutType: Integer = 0]): String
//
// Returns a string with the extra characters/words "broken".
//
// maxLength maximum amount of characters per line
// breakWith string that will be added whenever one is needed to
// break the line
// cutType 0 = words longer than "maxLength" will not be broken
// 1 = words will be broken when needed
// 2 = any word that trespasses the limit will be broken
String.prototype.wordWrap = function(m, b, c){
var i, j, l, s, r;
if(m < 1)
return this;
for(i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
for(s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s = s.slice(j)).length ? b : ""))
j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m : j.input.length - j[0].length
|| c == 1 && m || j.input.length + (j = s.slice(m).match(/^\S*/)).input.length;
return r.join("\n");
};
I'd also like to comment that, in general, you'd want to use a monospaced font if tabs are involved, because the width of words varies with the proportional font used (making the results of using tab stops very font dependent).
Update: Here's a slightly more readable version courtesy of an online JavaScript beautifier:
String.prototype.wordWrap = function(m, b, c) {
    var i, j, l, s, r;
    if (m < 1)
        return this;
    for (i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
        for (s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s =
                s.slice(j)).length ? b : ""))
            j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m :
                j.input.length - j[0].length || c == 1 && m || j.input.length +
                (j = s.slice(m).match(/^\S*/)).input.length;
    return r.join("\n");
};
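As a usage sketch (not part of the original answer), the snippet below repeats the method so it runs on its own and wraps a short sample string. As written, the code does not seem to apply the documented defaults for breakWith and cutType, so all three arguments are passed explicitly. The sample strings and the variable names are made up for illustration.

```javascript
// Same method as above, repeated so this snippet is self-contained.
String.prototype.wordWrap = function (m, b, c) {
    var i, j, l, s, r;
    if (m < 1)
        return this;
    for (i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
        for (s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s = s.slice(j)).length ? b : ""))
            j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m : j.input.length - j[0].length
                || c == 1 && m || j.input.length + (j = s.slice(m).match(/^\S*/)).input.length;
    return r.join("\n");
};

var sample = "The quick brown fox";

// cutType 0: break at whitespace only. The space before the break is kept,
// so this yields "The quick \nbrown fox".
var wrapped = sample.wordWrap(10, "\n", 0);

// Passing "<br />" as breakWith inserts HTML line breaks instead.
var wrappedHtml = sample.wordWrap(10, "<br />", 0);

// cutType 2: hard break every maxLength characters, ignoring word boundaries.
var hardWrapped = "abcdefghijkl".wordWrap(5, "\n", 2);
```

Note that the text still needs to be HTML-escaped separately, and that escaping before wrapping would make entities such as `&amp;` count toward the line length.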
It is very simple if you use the jQuery library in your project.
Just one line: add an asHtml extension to the String class, then:
var plain = '<a> i am text plain </a>';
plain.asHtml();
/* '&lt;a&gt; i am text plain &lt;/a&gt;' */
Demo: http://jsfiddle.net/abdennour/B6vGG/3/
Note: you do not need to access the DOM. Just use jQuery's builder pattern:
$('<tagName />')
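The answer does not show the definition of asHtml. One plausible sketch, assuming jQuery is loaded, relies on the fact that .text() stores a string as literal text and .html() reads it back with &, < and > escaped. The installAsHtml wrapper is a hypothetical name used here only to make the jQuery dependency explicit; the linked demo may define the method differently.

```javascript
// Hypothetical installer: pass in the jQuery function (usually `$`).
function installAsHtml(jq) {
    String.prototype.asHtml = function () {
        // Build a detached element, set the string as plain text,
        // then read back the escaped markup. Nothing is attached to the page.
        return jq("<div />").text(String(this)).html();
    };
}

// In a page with jQuery loaded:
//   installAsHtml($);
//   '<a> i am text plain </a>'.asHtml();
```

This only escapes markup characters; it does not preserve runs of spaces or line breaks, so it would need to be combined with one of the whitespace techniques in the other answers.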
The way to do that while still allowing the browser to wrap long lines is to replace each sequence of two spaces with a space and a non-breaking space.
The browser will correctly render all spaces (normal and non-breaking), while still wrapping long lines (thanks to the normal spaces).
JavaScript:
text = html_escape(text); // dummy function
text = text.replace(/\t/g, '    ') // tab width of 4 assumed
    .replace(/  /g, '&nbsp; ')
    .replace(/  /g, ' &nbsp;') // second pass
    // handles odd number of spaces, where we
    // end up with "&nbsp;" + " " + " "
    .replace(/\r\n|\n|\r/g, '<br />');
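Since html_escape is only a placeholder in the snippet above, here is a minimal runnable sketch of the same two-pass idea. The names escapeHtml and preserveSpaces, and the tabWidth parameter with its default of 4, are assumptions made for illustration.

```javascript
// Hypothetical stand-in for the html_escape() placeholder.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Escape the text, expand tabs, then alternate normal and non-breaking
// spaces so that runs of spaces survive but lines can still wrap.
function preserveSpaces(text, tabWidth) {
    var tab = " ".repeat(tabWidth || 4); // assumed default tab width
    return escapeHtml(text)
        .replace(/\t/g, tab)
        .replace(/  /g, "&nbsp; ")  // first pass: each pair of spaces
        .replace(/  /g, " &nbsp;")  // second pass: leftover pair after an odd run
        .replace(/\r\n|\n|\r/g, "<br />");
}

var twoSpaces = preserveSpaces("a  b");    // "a&nbsp; b"
var threeSpaces = preserveSpaces("a   b"); // "a&nbsp; &nbsp;b"
```

Every run ends up with at least one normal space between non-breaking ones, which is what gives the browser a place to wrap.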
Use a zero-width space (&#8203;) to preserve whitespace and allow the text to wrap. The basic idea is to pair each space or sequence of spaces with a zero-width space. Then replace each space with a non-breaking space. You'll also want to encode HTML and add line breaks.
If you don't care about Unicode characters, it's trivial. You can just use string.replace():
function textToHTML(text)
{
    return ((text || "") + "") // make sure it is a string
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/\t/g, "    ") // tab width of 4 assumed
        .replace(/ /g, "&#8203;&nbsp;&#8203;")
        .replace(/\r\n|\r|\n/g, "<br />");
}
If it's ok for the white space to wrap, pair each space with a zero-width space as above. Otherwise, to keep white space together, pair each sequence of spaces with a zero-width space:
        .replace(/ /g, "&nbsp;")
        .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")
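For completeness, the two replacement lines above can be dropped into a full function as sketched below. The name textToHtmlKeepRuns and the 4-space tab expansion are assumptions for illustration, not part of the original answer.

```javascript
// Variant that keeps each run of spaces together: the run becomes
// non-breaking spaces, with a zero-width space on either side so the
// browser may wrap before or after the run but not inside it.
function textToHtmlKeepRuns(text) {
    return String(text == null ? "" : text)
        .replace(/&/g, "&amp;")   // must come first, before entities are added
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/\t/g, "    ")   // assumed tab width of 4
        .replace(/ /g, "&nbsp;")
        .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")
        .replace(/\r\n|\r|\n/g, "<br />");
}

var runExample = textToHtmlKeepRuns("a  b");
// "a&#8203;&nbsp;&nbsp;&#8203;b"
```

Because ampersands are escaped before any `&nbsp;` is introduced, a literal "&nbsp;" typed in the input becomes "&amp;nbsp;" and is not mistaken for a space by the run-matching pattern.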
To encode Unicode characters, it's a little more complex. You need to iterate the string:
var charEncodings = {
    "\t": "&nbsp;&nbsp;&nbsp;&nbsp;", // tab width of 4 assumed
    " ": "&nbsp;",
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\n": "<br />",
    "\r": "<br />"
};
var space = /[\t ]/;
var noWidthSpace = "&#8203;";
function textToHTML(text)
{
    text = (text || "") + ""; // make sure it is a string
    text = text.replace(/\r\n/g, "\n"); // avoid adding two <br /> tags
    var html = "";
    var lastChar = "";
    // Use a numeric index: with for...in, i is a string, so text[i + 1]
    // would concatenate instead of looking at the next character.
    for (var i = 0; i < text.length; i++)
    {
        var char = text[i];
        var charCode = text.charCodeAt(i);
        // Add a zero-width space before the first space of a run of two or more.
        if (space.test(char) && !space.test(lastChar) && space.test(text[i + 1] || ""))
        {
            html += noWidthSpace;
        }
        html += char in charEncodings ? charEncodings[char] :
            charCode > 127 ? "&#" + charCode + ";" : char;
        lastChar = char;
    }
    return html;
}
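One caveat with the loop above: charCodeAt returns UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) would be emitted as two surrogate references rather than one. A possible variant, assuming an ES2015 environment with Array.from and codePointAt, iterates whole code points instead. The names ENTITIES, isBlank and textToHtmlByCodePoint are hypothetical, and the 4-column tab expansion is an assumption.

```javascript
// Entity table; the 4-column tab expansion is an assumption.
var ENTITIES = {
    "\t": "&nbsp;&nbsp;&nbsp;&nbsp;",
    " ": "&nbsp;",
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\n": "<br />"
};

function isBlank(ch) {
    return ch === " " || ch === "\t";
}

function textToHtmlByCodePoint(text) {
    // Normalize line endings, then split into whole code points so that
    // surrogate pairs stay together.
    var normalized = String(text == null ? "" : text).replace(/\r\n|\r/g, "\n");
    var chars = Array.from(normalized);
    var html = "";
    for (var i = 0; i < chars.length; i++) {
        var ch = chars[i];
        var prevBlank = i > 0 && isBlank(chars[i - 1]);
        var nextBlank = i + 1 < chars.length && isBlank(chars[i + 1]);
        // Zero-width space before the first blank of a run of two or more.
        if (isBlank(ch) && !prevBlank && nextBlank) {
            html += "&#8203;";
        }
        if (Object.prototype.hasOwnProperty.call(ENTITIES, ch)) {
            html += ENTITIES[ch];
        } else if (ch.codePointAt(0) > 127) {
            html += "&#" + ch.codePointAt(0) + ";"; // numeric character reference
        } else {
            html += ch;
        }
    }
    return html;
}

var accented = textToHtmlByCodePoint("caf\u00e9");   // "caf&#233;"
var emoji = textToHtmlByCodePoint("\uD83D\uDE00");   // "&#128512;"
```

If the page is served as UTF-8, encoding non-ASCII characters as numeric references is optional; the escaping of &, < and > and the whitespace handling are the parts that matter for rendering.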
Now, just a comment. Without using monospace fonts, you'll lose some formatting. Consider how these lines of text with a monospace font form columns:
ten       seven spaces
eleven    four spaces
Without the monospaced font, you will lose the columns:
ten       seven spaces
eleven    four spaces
It seems that the algorithm to fix that would be very complex.