How to remove control characters from a Java string?

前端未结


I have a string coming from the UI that may contain control characters, and I want to remove all control characters except carriage returns and line feeds.


7 Answers

既然无缘

2020-12-15 16:49

Use these:

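```java
// Removes every character outside 7-bit ASCII (U+0000 to U+007F).
public static String removeNonAscii(String str) {
    return str.replaceAll("[^\\x00-\\x7F]", "");
}

// Removes all control characters: the whole Unicode "Other" category \p{C}.
public static String removeNonPrintable(String str) {
    return str.replaceAll("[\\p{C}]", "");
}

// Removes some control characters: ASCII controls plus categories Cc, Cf, Co and Cn.
public static String removeSomeControlChar(String str) {
    return str.replaceAll("[\\p{Cntrl}\\p{Cc}\\p{Cf}\\p{Co}\\p{Cn}]", "");
}

// \r, \n and \t are already in \p{C}, so the second replaceAll is redundant.
public static String removeControlCharFull(String str) {
    return removeNonPrintable(str).replaceAll("[\\r\\n\\t]", "");
}
```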
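The character classes used in these helpers cover different sets. The following sketch compares them on one string; the sample string is an assumption, chosen to contain BEL (U+0007, an ASCII control), NEL (U+0085, a C1 control in category Cc) and ZERO WIDTH SPACE (U+200B, a format character in category Cf). The class and method names are hypothetical.

```java
class ControlClassDemo {
    // Hypothetical sample: BEL (ASCII control), NEL (C1 control), ZERO WIDTH SPACE (format, Cf).
    static final String SAMPLE = "a\u0007b\u0085c\u200Bd";

    // Removes every match of the given regex from SAMPLE.
    static String strip(String regex) {
        return SAMPLE.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Lengths are printed because the removed characters are invisible.
        System.out.println(strip("\\p{Cntrl}").length()); // 6: POSIX class, ASCII controls only (BEL)
        System.out.println(strip("\\p{Cc}").length());    // 5: Unicode controls (BEL and NEL)
        System.out.println(strip("\\p{C}").length());     // 4: all "Other" characters, including U+200B
    }
}
```

So \p{Cntrl} alone misses C1 controls, and \p{Cc} misses format characters such as U+200B. None of the three keeps CR/LF, which the question asks for.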


轮回少年

2020-12-15 16:50

One option is to use a combination of Guava CharMatchers:

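```java
import com.google.common.base.CharMatcher;

// Control characters that should survive the filtering.
CharMatcher charsToPreserve = CharMatcher.anyOf("\r\n\t");
CharMatcher allButPreserved = charsToPreserve.negate();
// ISO control characters, minus the preserved ones.
CharMatcher controlCharactersToRemove = CharMatcher.javaIsoControl().and(allButPreserved);
```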

Then pass your string to controlCharactersToRemove.removeFrom(...). I don't know how efficient it is, but it's at least simple.

As noted in edits, JAVA_ISO_CONTROL is now deprecated in Guava; the javaIsoControl() method is preferred.
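If Guava isn't on the classpath, the same rule can be expressed with the JDK alone. This is a swapped-in technique, not the answer's code: it uses Character.isISOControl, which covers U+0000 to U+001F and U+007F to U+009F (the same set as javaIsoControl()), and keeps \r, \n and \t. The class and method names are hypothetical.

```java
class IsoControlFilter {
    // Drops ISO control characters except CR, LF and TAB (same rule as the CharMatcher combination).
    static String removeControlChars(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean preserved = c == '\r' || c == '\n' || c == '\t';
            // ISO controls are all BMP chars, so checking individual chars is enough.
            if (preserved || !Character.isISOControl(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cleaned = removeControlChars("a\u0000b\tc\r\nd\u009fe");
        System.out.println(cleaned.equals("ab\tc\r\nde")); // true
    }
}
```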


佛祖请我去吃肉

2020-12-15 16:52

In Java regular expressions, you can exclude specific characters from a character class by intersecting it (&&) with a negated class. Here's a sample program that demonstrates the idea:

```java
class Test {
    public static void main(String[] args) {
        String testStr = "abcdefABCDEF";
        System.out.println(testStr);
        // Lowercase letters, except 'c' and 'd' (character-class intersection).
        System.out.println(testStr.replaceAll("[\\p{Lower}&&[^cd]]", ""));
    }
}
```

It will produce this output:

```
abcdefABCDEF
cdABCDEF
```
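Applied to the original question, the same intersection trick could look like the sketch below: match Unicode control characters (\p{Cc}) but subtract CR and LF. The class and method names are hypothetical; precompiling the Pattern just avoids recompiling the regex on every call.

```java
import java.util.regex.Pattern;

class CrLfPreservingFilter {
    // Unicode control characters (Cc) AND NOT (CR or LF).
    private static final Pattern CONTROL_EXCEPT_CR_LF =
            Pattern.compile("[\\p{Cc}&&[^\\r\\n]]");

    static String removeControlChars(String input) {
        return CONTROL_EXCEPT_CR_LF.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // NUL, BEL, TAB and NEL are removed; CR and LF stay.
        String cleaned = removeControlChars("a\u0000b\r\nc\u0007\td\u0085");
        System.out.println(cleaned.equals("ab\r\ncd")); // true
    }
}
```

Tabs are also Cc, so they are removed here; use [^\\r\\n\\t] in the inner class if they should be kept as well.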


猫巷女王i

2020-12-15 16:54

I'm using Selenium to test web screens. I use Hamcrest asserts and matchers to search the page source for different strings based on various conditions.

```java
String pageSource = browser.getPageSource();
assertThat("Text not found!", pageSource, containsString(text));
```

This works just fine with the IE or Firefox driver, but it fails with the HtmlUnitDriver, which formats the page source with tabs, carriage returns, and other control characters. I am using a riff on Nidhish Krishnan's ingenious answer elsewhere in this thread. If I use Nidhish's solution out of the box, I am left with extra spaces, so I added a private method named filterTextForComparison:

```java
String pageSource = filterTextForComparison(browser.getPageSource());
assertThat("Text not found!", pageSource, containsString(filterTextForComparison(text)));
```

And the function:

```java
/**
 * Filter out any characters embedded in the text that will interfere with
 * comparing Strings.
 *
 * @param text
 *            the text to filter.
 * @return the text with any extraneous character removed.
 */
private String filterTextForComparison(String text) {
    String filteredText = text;
    if (filteredText != null) {
        filteredText = filteredText.replaceAll("\\p{Cc}", " ").replaceAll("\\s{2,}", " ");
    }
    return filteredText;
}
```

First, the method replaces each control character with a space; then it collapses any run of two or more whitespace characters into a single space. I tried doing everything at once with "\p{Cc}+?", but it didn't turn "\t " into " ".
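The single-regex attempt fails because the space is not in \p{Cc}: replacing "\p{Cc}+?" (or even "\p{Cc}+") in "\t " only turns the tab into a space, leaving two spaces. With default flags, \s is [ \t\n\x0B\f\r], and every member except the plain space is itself Cc, so a class containing both \p{Cc} and the space should reproduce the two-pass result in one replace. This is a sketch under that assumption; the class name is hypothetical and the two-pass method is copied from the answer for comparison.

```java
import java.util.regex.Pattern;

class SinglePassFilter {
    // One or more characters that are Unicode controls (Cc) or a plain space.
    private static final Pattern CONTROL_OR_SPACE_RUN = Pattern.compile("[\\p{Cc} ]+");

    static String filterTextForComparison(String text) {
        if (text == null) {
            return null;
        }
        // Every maximal run, even a single control character, becomes exactly one space.
        return CONTROL_OR_SPACE_RUN.matcher(text).replaceAll(" ");
    }

    // The answer's two-pass version, kept for comparison.
    static String twoPass(String text) {
        if (text == null) {
            return null;
        }
        return text.replaceAll("\\p{Cc}", " ").replaceAll("\\s{2,}", " ");
    }

    public static void main(String[] args) {
        String sample = "Hello\t \r\n  world\u0000!";
        System.out.println("[" + filterTextForComparison(sample) + "]"); // [Hello world !]
        System.out.println(filterTextForComparison(sample).equals(twoPass(sample))); // true
    }
}
```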


有刺的猬

2020-12-15 16:55

If you want to delete all characters in the Unicode control category (Cc, "Other, control"), you can do something like this:

```java
System.out.println("a\u0000b\u0007c\u008fd".replaceAll("\\p{Cc}", "")); // abcd
```

Note: this removes (among others) the actual U+008F character from the string, not its escaped form "%8F".

Courtesy: polygenelubricants (Replace Unicode Control Characters)


走了就别回头了

2020-12-15 16:57

Use StringUtils.deleteWhitespace(text) from Apache Commons Lang.
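Commons Lang documents deleteWhitespace as deleting every character for which Character.isWhitespace is true. The JDK-only sketch below reimplements that documented rule to show its effect; it is an illustration under that assumption, not the library's source, and the class name is hypothetical. It removes CR, LF, tab and space, but leaves non-whitespace controls such as NUL in place.

```java
class WhitespaceDeletionSketch {
    // Drops every char for which Character.isWhitespace is true (the documented deleteWhitespace rule).
    static String deleteWhitespace(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String result = deleteWhitespace("a b\tc\r\nd\u0000e");
        // Space, tab, CR and LF are gone; the NUL control character is still there.
        System.out.println(result.equals("abcd\u0000e")); // true
    }
}
```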

