Rendering Plaintext as HTML maintaining whitespace – without

情深已故 2021-02-14 08:02

Given any arbitrary text file full of printable characters, how can this be converted to HTML that would be rendered exactly the same (with the following requirements)?

4 answers
  •  面向向阳花
    2021-02-14 08:28

    Use a zero-width space (&#8203;) to preserve whitespace and still allow the text to wrap. The basic idea is to pair each space, or each sequence of spaces, with a zero-width space, then replace each space with a non-breaking space (&nbsp;). You'll also need to escape HTML special characters and convert line breaks to <br /> tags.

    If you don't need to encode non-ASCII (Unicode) characters, it's trivial. You can just chain string.replace() calls:

    function textToHTML(text)
    {
        return ((text || "") + "")  // make sure it is a string
            .replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/\t/g, "    ")
            .replace(/ /g, "&#8203;&nbsp;&#8203;")
            .replace(/\r\n|\r|\n/g, "<br />");
    }

    If it's ok for the white space to wrap, pair each space with a zero-width space as above. Otherwise, to keep white space together, pair each sequence of spaces with a zero-width space:

        .replace(/ /g, "&nbsp;")
        .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")
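
    Put together, the run-preserving variant might look like the sketch below. The name textToHTMLKeepRuns is made up for illustration; the escaping steps and entity strings are the same ones used in the function above.

```javascript
// Hypothetical name; a sketch of the variant that keeps each run of spaces together.
function textToHTMLKeepRuns(text)
{
    return ((text || "") + "")  // make sure it is a string
        .replace(/&/g, "&amp;")  // escape & first so later entities are not re-escaped
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/\t/g, "    ")
        .replace(/ /g, "&nbsp;")
        .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")  // one break opportunity on each side of a run
        .replace(/\r\n|\r|\n/g, "<br />");
}
```

    With this version a line can wrap before or after a run of spaces, but never in the middle of one.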
    

    To encode Unicode characters as numeric entities, it's a little more complex. You need to iterate over the string:

    var charEncodings = {
        "\t": "&nbsp;&nbsp;&nbsp;&nbsp;",
        " ": "&nbsp;",
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\n": "<br />",
        "\r": "<br />"
    };
    var space = /[\t ]/;
    var noWidthSpace = "&#8203;";

    function textToHTML(text)
    {
        text = (text || "") + "";  // make sure it is a string
        text = text.replace(/\r\n/g, "\n");  // avoid adding two <br /> tags
        var html = "";
        var lastChar = "";
        // Use a numeric index: for...in yields string keys, so text[i + 1] would concatenate
        for (var i = 0; i < text.length; i++)
        {
            var char = text[i];
            var charCode = text.charCodeAt(i);
            // At the start of a run of two or more spaces, allow a line break before the run
            if (space.test(char) && !space.test(lastChar) && space.test(text[i + 1] || ""))
            {
                html += noWidthSpace;
            }
            html += char in charEncodings ? charEncodings[char] :
                charCode > 127 ? "&#" + charCode + ";" : char;
            lastChar = char;
        }
        return html;
    }
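
    One caveat with charCodeAt: it returns UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) comes out as two surrogate values, and numeric references to surrogates are not valid HTML. A possible workaround, sketched below on the assumption that ES2015 features (for...of and codePointAt) are available, is to iterate by code point instead. The helper name encodeNonAscii is hypothetical, and it covers only the numeric-entity step, not the whitespace handling.

```javascript
// Hypothetical helper: emit a numeric character reference for every non-ASCII code point.
function encodeNonAscii(text)
{
    var out = "";
    // for...of walks the string by code point, so a surrogate pair arrives as one item
    for (var ch of text)
    {
        var codePoint = ch.codePointAt(0);
        out += codePoint > 127 ? "&#" + codePoint + ";" : ch;
    }
    return out;
}
```

    For example, encodeNonAscii("é") gives "&#233;", and a single emoji yields one reference rather than two.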

    Now, just a comment. Without using monospace fonts, you'll lose some formatting. Consider how these lines of text with a monospace font form columns:

    ten       seven spaces
    eleven    four spaces
    

    Without the monospaced font, you will lose the columns:

     ten       seven spaces
     eleven    four spaces

    It seems that the algorithm to fix that would be very complex.
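
    If a monospaced font is acceptable, the simplest way to keep the columns is to request one on the container that holds the converted markup. A minimal sketch, where wrapMonospace is a hypothetical helper and html is assumed to be output already escaped by textToHTML:

```javascript
// Hypothetical helper: wrap already-escaped HTML in a container that asks for a monospaced font.
function wrapMonospace(html)
{
    return '<div style="font-family: monospace;">' + html + "</div>";
}
```

    The text still wraps at the zero-width spaces; only the font choice changes.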
