How to remove hard spaces with Jsoup?

南旧 2021-01-11 11:13

I'm trying to remove hard spaces (from &nbsp; entities in the HTML). I can't remove it with .trim() or .replace(" ", ""), etc.

2 answers
  • 2021-01-11 11:39

    The question has been edited to reflect the true problem.

    New answer: the hard space, i.e. the &nbsp; entity (Unicode character NO-BREAK SPACE, U+00A0), can be written in Java as the character \u00a0. The code then becomes the following, where str is the string returned by the text() method:

    str = str.replaceAll("\u00a0", ""); // strings are immutable, so keep the returned value
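
    A minimal, standard-library-only sketch of why .trim() and .replace(" ", "") leave the character in place while replacing U+00A0 removes it. The class name NbspStripper, the helper stripNbsp, and the sample value are hypothetical; the sample merely stands in for whatever text() returns.

    ```java
    // Hypothetical demo class; none of these names come from Jsoup.
    class NbspStripper {

        // U+00A0 NO-BREAK SPACE, the character produced for an &nbsp; entity.
        static final String NBSP = "\u00a0";

        // Returns a copy of s with every no-break space removed.
        static String stripNbsp(String s) {
            return s.replace(NBSP, "");
        }

        public static void main(String[] args) {
            // Assumed stand-in for the string returned by text().
            String raw = "94,00" + NBSP;

            // trim() only strips characters up to U+0020, so U+00A0 survives.
            System.out.println(raw.trim().length());            // 6
            // An ordinary space (U+0020) is a different character, so nothing matches.
            System.out.println(raw.replace(" ", "").length());  // 6
            // Targeting U+00A0 directly removes it.
            System.out.println("'" + stripNbsp(raw) + "'");     // '94,00'
        }
    }
    ```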
    

    Old answer: using the Jsoup library,

    import org.jsoup.parser.Parser;

    // unescapeEntities decodes HTML entities; false means the text is not inside an attribute.
    String str1 = Parser.unescapeEntities("last week, Ovokerie Ogbeta", false);
    String str2 = Parser.unescapeEntities("Entered &raquo; Here", false);
    System.out.println(str1 + " " + str2);
    

    Prints out:

    last week, Ovokerie Ogbeta Entered » Here 
    
  • 2021-01-11 11:57

    Your first attempt was very nearly it; you're quite right that Jsoup maps &nbsp; to U+00A0. You just don't want the double backslash in your string:

    System.out.println( "'"+fields.get(6).text().replace("\u00a0", "")+"'" ); //'94,00'
    // Just one ------------------------------------------^
    

    replace doesn't use regular expressions, so you aren't trying to pass a literal backslash through to the regex level. You just want to specify character U+00A0 in the string.
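
    To make the two escape levels concrete, here is a small sketch using only the standard library. The class EscapeLevels and its method names are made up for illustration, and the sample string is an assumed stand-in for the value returned by text(). With a single backslash the compiler puts the character U+00A0 into the string; with a double backslash the string holds six literal characters, which only a regex engine would interpret as an escape.

    ```java
    // Hypothetical demo class comparing single and double backslashes.
    class EscapeLevels {

        // Single backslash: the compiler inserts U+00A0 itself; replace() matches it literally.
        static String singleWithReplace(String s) {
            return s.replace("\u00a0", "");
        }

        // Double backslash: the argument is six characters (backslash, u, 0, 0, a, 0).
        // replace() searches for that literal text and finds nothing.
        static String doubleWithReplace(String s) {
            return s.replace("\\u00a0", "");
        }

        // replaceAll() treats its first argument as a regex, and java.util.regex
        // understands the six-character escape, so this variant does match U+00A0.
        static String doubleWithReplaceAll(String s) {
            return s.replaceAll("\\u00a0", "");
        }

        public static void main(String[] args) {
            String raw = "94,00\u00a0"; // assumed sample input
            System.out.println("'" + singleWithReplace(raw) + "'");    // '94,00'
            System.out.println(doubleWithReplace(raw).length());       // 6, unchanged
            System.out.println("'" + doubleWithReplaceAll(raw) + "'"); // '94,00'
        }
    }
    ```

    Since no pattern matching is needed here, the single-backslash form with replace is the simpler choice.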
