Convert plain text to HTML text in Java

-上瘾入骨i 2020-12-05 10:53

I have a Java program that receives plain text from a server. The plain text may contain URLs. Is there any class in the Java library to convert plain text to HTML text? Or an

6 Answers
  • 2020-12-05 11:27

    You should do some replacements on the text programmatically. Here are some clues:

    • All newlines should be converted to "<br>\n" (the \n is only for better readability of the output).
    • All CRs should be dropped (who uses DOS encoding anyway).
    • All pairs of spaces should be replaced with " &nbsp;".
    • Replace "&" with "&amp;". If you chain replacements, do this one first; otherwise the "&" of an already inserted "&lt;" gets escaped a second time.
    • Replace "<" with "&lt;" and ">" with "&gt;".
    • All other characters < 128 should be left as they are.
    • All other characters >= 128 should be written as "&#"+((int)myChar)+";", to make them readable in every encoding.
    • To autodetect your links, you could use a regex like "http://[^ ]+" or "www\.[^ ]+" and convert each match, as JB Nizet said, to "<a href=\""+url+"\">"+url+"</a>", but only after having done all the other replacements.

    The code to do this looks something like this:

    public static String escape(String s) {
        StringBuilder builder = new StringBuilder();
        boolean previousWasASpace = false;
        // Walk code points rather than chars, so a character outside the BMP
        // becomes one numeric reference instead of two invalid surrogate ones.
        for( int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);
            i += Character.charCount(c);
            if( c == ' ' ) {
                if( previousWasASpace ) {
                    // Every second space of a run becomes &nbsp;
                    builder.append("&nbsp;");
                    previousWasASpace = false;
                    continue;
                }
                previousWasASpace = true;
            } else {
                previousWasASpace = false;
            }
            switch(c) {
                case '<': builder.append("&lt;"); break;
                case '>': builder.append("&gt;"); break;
                case '&': builder.append("&amp;"); break;
                case '"': builder.append("&quot;"); break;
                case '\n': builder.append("<br>"); break;
                case '\r': break; // Drop CRs
                // We need Tab support here, because we print StackTraces as HTML
                case '\t': builder.append("&nbsp; &nbsp; &nbsp;"); break;
                default:
                    if( c < 128 ) {
                        builder.append((char) c);
                    } else {
                        builder.append("&#").append(c).append(";");
                    }
            }
        }
        return builder.toString();
    }
    

    However, the link conversion has yet to be added. If someone does it, please update the code.
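
    One way the missing link conversion could look is sketched below. It is meant to run on the output of escape(), so it treats "&amp;" as part of a URL but stops at any other entity or at an inserted "<br>". The class name Linkifier, the method name linkify and the pattern itself are assumptions made for illustration; the pattern is a slightly extended form of the simple "http://[^ ]+" idea from the list above, not a complete URL grammar.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class Linkifier {
    // A scheme, then any run of non-space characters that are neither '<' nor '&'
    // (a literal "&amp;" is allowed), ending on a character that is not
    // trailing punctuation such as '.' or ','.
    private static final Pattern URL = Pattern.compile(
            "https?://(?:[^\\s<&]|&amp;)*[^\\s<&.,;:!?)]");

    /** Wraps every URL found in already-escaped HTML text in an anchor tag. */
    static String linkify(String escapedHtml) {
        // $0 is the whole match; it is inserted literally into the replacement.
        return URL.matcher(escapedHtml).replaceAll("<a href=\"$0\">$0</a>");
    }
}
```

    With this in place, the call would be Linkifier.linkify(escape(text)). Doing the escaping first means the URL placed in the href attribute can no longer contain a raw quote or angle bracket.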

  • 2020-12-05 11:33

    Use this:

    public static String stringToHTMLString(String string) {
        StringBuilder sb = new StringBuilder(string.length());
        // true if last char was blank
        boolean lastWasBlankChar = false;
        int len = string.length();
        char c;

        for (int i = 0; i < len; i++) {
            c = string.charAt(i);
            if (c == ' ') {
                // A blank gets extra work: replacing every blank with
                // &nbsp; would lose word breaking, so only every second
                // blank of a run is replaced.
                if (lastWasBlankChar) {
                    lastWasBlankChar = false;
                    sb.append("&nbsp;");
                } else {
                    lastWasBlankChar = true;
                    sb.append(' ');
                }
            } else {
                lastWasBlankChar = false;
                //
                // HTML Special Chars
                if (c == '"')
                    sb.append("&quot;");
                else if (c == '&')
                    sb.append("&amp;");
                else if (c == '<')
                    sb.append("&lt;");
                else if (c == '>')
                    sb.append("&gt;");
                else if (c == '\n')
                    // Handle Newline
                    sb.append("<br/>");
                else {
                    // Read a full code point so that a surrogate pair
                    // becomes a single numeric reference
                    int ci = string.codePointAt(i);
                    if (ci < 160)
                        // Below U+00A0 (ASCII and C1 controls): emit as is
                        sb.append(c);
                    else {
                        // Everything else: numeric character reference
                        sb.append("&#");
                        sb.append(ci);
                        sb.append(';');
                        // Skip the low surrogate if one was consumed
                        i += Character.charCount(ci) - 1;
                    }
                }
            }
        }
        return sb.toString();
    }
    
  • 2020-12-05 11:37

    I just combined the code from all the answers:

    // Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
    private static String txtToHtml(String s) {
        StringBuilder builder = new StringBuilder();
        boolean previousWasASpace = false;
        for (char c : s.toCharArray()) {
            if (c == ' ') {
                if (previousWasASpace) {
                    builder.append("&nbsp;");
                    previousWasASpace = false;
                    continue;
                }
                previousWasASpace = true;
            } else {
                previousWasASpace = false;
            }
            switch (c) {
                case '<':
                    builder.append("&lt;");
                    break;
                case '>':
                    builder.append("&gt;");
                    break;
                case '&':
                    builder.append("&amp;");
                    break;
                case '"':
                    builder.append("&quot;");
                    break;
                case '\n':
                    builder.append("<br>");
                    break;
                // We need Tab support here, because we print StackTraces as HTML
                case '\t':
                    builder.append("&nbsp; &nbsp; &nbsp;");
                    break;
                default:
                    builder.append(c);
            }
        }
        // Escape first, then turn URLs in the escaped text into links
        String converted = builder.toString();
        String str = "(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))";
        Pattern patt = Pattern.compile(str);
        Matcher matcher = patt.matcher(converted);
        converted = matcher.replaceAll("<a href=\"$1\">$1</a>");
        return converted;
    }
    
  • 2020-12-05 11:40

    In an Android application I just implemented HTMLifying of content (see https://github.com/andstatus/andstatus/issues/375 ). The actual transformation was done in literally 3 lines of code using Android system libraries. The advantage is that each new version of the Android libraries may bring a better implementation.

    // Needs: android.text.Html, android.text.SpannableString, android.text.util.Linkify
    private static String htmlifyPlain(String textIn) {
        SpannableString spannable = SpannableString.valueOf(textIn);
        Linkify.addLinks(spannable, Linkify.WEB_URLS);
        // Deprecated since API level 24 in favour of Html.toHtml(Spanned, int)
        return Html.toHtml(spannable);
    }
    
  • 2020-12-05 11:45

    If your plain text is a URL (which is different from containing a hyperlink, as you wrote in your question), then transforming it into a hyperlink in HTML is simply done by

    String hyperlink = "<a href='" + url + "'>" + url + "</a>";
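
    Plain concatenation breaks as soon as the URL contains a quote or an ampersand. A minimal sketch of a safer variant follows; it escapes the URL before placing it in the attribute and in the link text. The class name Hyperlinks and both method names are made up for this example, and the sketch does not check the URL scheme, so it should not be given untrusted input such as "javascript:" URLs.

```java
// Hypothetical helper, not part of the original answer.
final class Hyperlinks {
    /** Escapes the characters that are unsafe in HTML text and attribute values. */
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds an anchor whose href and visible text are both the escaped URL. */
    static String toHyperlink(String url) {
        String safe = escapeHtml(url);
        return "<a href=\"" + safe + "\">" + safe + "</a>";
    }
}
```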
    
  • 2020-12-05 11:50

    I found a solution using pattern matching. Here is my code:

    String str = "(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))";
    Pattern patt = Pattern.compile(str);
    Matcher matcher = patt.matcher(plain);
    plain = matcher.replaceAll("<a href=\"$1\">$1</a>");
    

    Here are the input and the output.

    Input text (the variable plain):

    some text and then the URL http://www.google.com and then some other text.
    

    Output:

    some text and then the URL <a href="http://www.google.com">http://www.google.com</a> and then some other text.
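
    One thing to watch with a plain replaceAll is a match that starts with "www.": its href has no scheme, so browsers treat it as a relative link. The sketch below shows one possible way around that using Matcher.appendReplacement, which lets the href differ from the visible text. The class name SchemeAwareLinkifier and its deliberately simplified pattern are assumptions for illustration, not the pattern from the code above, and the input is assumed to contain no characters that still need HTML escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class SchemeAwareLinkifier {
    // Simplified pattern: "http://", "https://" or "www." followed by
    // non-space characters, not ending in common trailing punctuation.
    private static final Pattern URL = Pattern.compile(
            "(?i)\\b(?:https?://|www\\.)[^\\s<>\"]*[^\\s<>\".,;:!?)]");

    static String linkify(String text) {
        Matcher m = URL.matcher(text);
        // StringBuffer is used because the StringBuilder overload of
        // appendReplacement only exists from Java 9 on.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String shown = m.group();
            // Add a scheme to bare "www." matches so the link is absolute.
            String href = shown.regionMatches(true, 0, "www.", 0, 4)
                    ? "http://" + shown
                    : shown;
            String link = "<a href=\"" + href + "\">" + shown + "</a>";
            // quoteReplacement keeps '$' and '\' in the URL from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(link));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

    For the input shown above this produces the same output, while "visit www.example.org." would get an href of "http://www.example.org" and keep the trailing period outside the link.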
    