I am trying to check if a Java String is not null, not empty and not whitespace.
In my mind, this code should have been up to the job.
Is there a string that will make isEmpty and isBlank behave differently in a test case?
Note that Character.isWhitespace recognizes Unicode characters and returns true for Unicode whitespace characters. Its Javadoc says:
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
[...]
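A quick way to see these rules in action is to call Character.isWhitespace on a few characters. The class name WhitespaceCheck and the sample set are illustrative choices; the true/false values noted in the comments follow the Javadoc criteria quoted above.

```java
public class WhitespaceCheck {
    public static void main(String[] args) {
        // Characters chosen to exercise the different Javadoc criteria.
        char[] samples = {
            ' ',      // U+0020 SPACE (SPACE_SEPARATOR): true
            '\t',     // U+0009 TAB (explicitly listed control char): true
            '\u2008', // U+2008 PUNCTUATION SPACE (SPACE_SEPARATOR): true
            '\u2028', // U+2028 LINE SEPARATOR: true
            '\u00A0', // U+00A0 NO-BREAK SPACE (excluded non-breaking space): false
            '\u2007', // U+2007 FIGURE SPACE (excluded non-breaking space): false
            '\u202F', // U+202F NARROW NO-BREAK SPACE (excluded): false
            '\u0002'  // U+0002 control char, not in the whitespace list: false
        };
        for (char c : samples) {
            System.out.printf("U+%04X -> %b%n", (int) c, Character.isWhitespace(c));
        }
    }
}
```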
On the other hand, the trim() method removes all leading and trailing characters whose code points are less than or equal to U+0020, that is, the control characters below U+0020 and the space character (U+0020).
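The trim() rule can be checked directly with a small sketch. The helper trimsToEmpty is a hypothetical name, not a JDK method. The example shows that trimming stops at the first character above U+0020 from each end, and that U+007F DELETE, being above U+0020, is not removed.

```java
public class TrimDemo {
    // Hypothetical helper: true exactly when every char of s is <= U+0020,
    // because trim() then removes the whole string.
    static boolean trimsToEmpty(String s) {
        return s.trim().isEmpty();
    }

    public static void main(String[] args) {
        String s = " \t\u0002abc\u2008 ";
        // Leading space, tab and U+0002 are removed. At the end, the space is
        // removed, but U+2008 is above U+0020, so trimming stops there.
        String t = s.trim();
        System.out.println(t.length());                     // 4: "abc" followed by U+2008
        System.out.println(trimsToEmpty("\u0002\u001F ")); // true
        System.out.println(trimsToEmpty("\u007F"));        // false: DELETE is U+007F
        System.out.println(trimsToEmpty("\u2008"));        // false
    }
}
```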
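Therefore, the two methods behave differently in the presence of a Unicode whitespace character such as "\u2008": trim() leaves it in place, so isEmpty returns false, while isBlank returns true. They also differ when the string contains control characters that Character.isWhitespace does not consider whitespace, such as "\002": trim() removes it, so isEmpty returns true, while isBlank returns false.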
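Both counterexamples can be reproduced with a short sketch. The question's exact code is not reproduced here, so the isEmpty below is a hypothetical trim-based stand-in (null, or nothing left after trim()), and isBlank is a hypothetical loop that returns true when every character satisfies Character.isWhitespace, matching the behavior this answer describes. Neither is claimed to be the exact code from the question or from any library.

```java
public class EmptyVsBlank {
    // Hypothetical trim-based check: null, or nothing left after trim().
    static boolean isEmpty(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Hypothetical whitespace-based check: null, or every char satisfies
    // Character.isWhitespace.
    static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Renders each char as U+XXXX so invisible characters are readable.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[] samples = {"\u2008", "\002", " \t\n", "a"};
        for (String s : samples) {
            System.out.printf("%-20s isEmpty=%b isBlank=%b%n",
                    describe(s), isEmpty(s), isBlank(s));
        }
    }
}
```

For "\u2008" this prints isEmpty=false isBlank=true, and for "\002" it prints isEmpty=true isBlank=false; the plain whitespace sample and the letter sample make both checks agree.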
If you were to write a regular expression to do this (which is slower than looping through the string and checking each character):
isEmpty() would be equivalent to .matches("[\\x00-\\x20]*")
isBlank() would be equivalent to .matches("\\p{javaWhitespace}*")
(The isEmpty() and isBlank() methods both accept a null String reference, whereas calling matches() on null throws a NullPointerException, so they are not exactly equivalent to the regex solution; putting that aside, they are equivalent.)
Note that \p{javaWhitespace}, as its name implies, is Java-specific syntax for the character class defined by the Character.isWhitespace method.
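The regular-expression formulation can be sketched as follows. The class and method names are hypothetical; the two patterns are the ones given above, precompiled once with Pattern.compile, and the null check is added by hand because matches() cannot be called on a null reference.

```java
import java.util.regex.Pattern;

public class RegexEquivalents {
    // Matches strings made only of U+0000..U+0020, the characters trim() strips.
    private static final Pattern TRIMMABLE = Pattern.compile("[\\x00-\\x20]*");
    // Matches strings made only of characters accepted by Character.isWhitespace.
    private static final Pattern JAVA_WHITESPACE = Pattern.compile("\\p{javaWhitespace}*");

    static boolean isEmptyRegex(String s) {
        // Explicit null guard: calling a method on null would throw NullPointerException.
        return s == null || TRIMMABLE.matcher(s).matches();
    }

    static boolean isBlankRegex(String s) {
        return s == null || JAVA_WHITESPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmptyRegex("\u2008") + " " + isBlankRegex("\u2008")); // false true
        System.out.println(isEmptyRegex("\002") + " " + isBlankRegex("\002"));     // true false
        System.out.println(isEmptyRegex(null) + " " + isBlankRegex(null));         // true true
    }
}
```

Precompiling the patterns avoids the per-call compilation that String.matches performs internally.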
Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?
It depends. However, I think the explanation above should be sufficient for you to decide. To sum up the difference:
isEmpty() considers the string empty if it contains only control characters¹ below U+0020 and the space character (U+0020).
isBlank() considers the string empty if it contains only whitespace characters as defined by the Character.isWhitespace method, which includes Unicode whitespace characters.
¹ There is also the control character U+007F DELETE, which is not trimmed by the trim() method.
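The summary can be checked empirically for single characters by scanning every char value and comparing the trim()-based test with Character.isWhitespace. This is a sketch; the class and method names are hypothetical, and the exact list printed may vary between JDK versions, because Character.isWhitespace follows the Unicode version the JDK implements.

```java
import java.util.ArrayList;
import java.util.List;

public class DisagreementScan {
    // Returns every char value (U+0000..U+FFFF) for which a one-character
    // string trims to empty but the char is not Character.isWhitespace,
    // or vice versa.
    static List<Integer> disagreements() {
        List<Integer> result = new ArrayList<>();
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            String s = String.valueOf((char) c);
            boolean trimsAway = s.trim().isEmpty();
            boolean whitespace = Character.isWhitespace((char) c);
            if (trimsAway != whitespace) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int c : disagreements()) {
            // At or below U+0020 trim() removes the char, so a disagreement means
            // isWhitespace rejects it; above U+0020 it means isWhitespace accepts it.
            String side = c <= 0x20 ? "trimmed only" : "isWhitespace only";
            System.out.printf("U+%04X %s%n", c, side);
        }
    }
}
```

Every disagreement at or below U+0020 is a control character that only the trim()-based check accepts, and every disagreement above it is a Unicode space that only Character.isWhitespace accepts; U+007F appears in neither group.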