Say I have two strings:
String s1 = "AbBaCca";
String s2 = "bac";
I want to perform a check returning that s2 is contained within s1, ignoring case.
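For reference, here is a minimal sketch showing that String.contains() compares characters exactly, so the check on the two strings above fails until both sides are brought to the same case. The class name ContainsCaseDemo is only for illustration.

```java
class ContainsCaseDemo {
    public static void main(String[] args) {
        String s1 = "AbBaCca";
        String s2 = "bac";
        // contains() is case sensitive: s1 holds "BaC", not "bac"
        System.out.println(s1.contains(s2));                             // false
        // Lower-casing both sides first makes the check case insensitive
        System.out.println(s1.toLowerCase().contains(s2.toLowerCase())); // true
    }
}
```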
String.regionMatches()
Using regular expressions can be relatively slow. That doesn't matter if you only need to check a single case, but if you have an array or collection of thousands or hundreds of thousands of strings, things can get pretty slow.
The solution presented below uses neither regular expressions nor toLowerCase()
(which is also slow because it creates new strings that are thrown away right after the check).
The solution builds on the String.regionMatches() method, which seems to be little known. It checks whether two String
regions match, and, importantly, it also has an overload with a handy ignoreCase
parameter.
public static boolean containsIgnoreCase(String src, String what) {
    final int length = what.length();
    if (length == 0)
        return true; // Empty string is contained

    final char firstLo = Character.toLowerCase(what.charAt(0));
    final char firstUp = Character.toUpperCase(what.charAt(0));

    for (int i = src.length() - length; i >= 0; i--) {
        // Quick check before calling the more expensive regionMatches() method:
        final char ch = src.charAt(i);
        if (ch != firstLo && ch != firstUp)
            continue;

        if (src.regionMatches(true, i, what, 0, length))
            return true;
    }

    return false;
}
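To see the ignoreCase overload of regionMatches() and the method above in action on the strings from the question, here is a small self-contained sketch. The class name RegionMatchesDemo is made up for the example, and containsIgnoreCase() is copied from above so the file compiles on its own.

```java
class RegionMatchesDemo {
    // Copy of containsIgnoreCase() from above
    static boolean containsIgnoreCase(String src, String what) {
        final int length = what.length();
        if (length == 0)
            return true; // Empty string is contained
        final char firstLo = Character.toLowerCase(what.charAt(0));
        final char firstUp = Character.toUpperCase(what.charAt(0));
        for (int i = src.length() - length; i >= 0; i--) {
            final char ch = src.charAt(i);
            if (ch != firstLo && ch != firstUp)
                continue; // first character cannot match here
            if (src.regionMatches(true, i, what, 0, length))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String s1 = "AbBaCca";
        String s2 = "bac";
        // ignoreCase overload: 3 chars of s1 from index 2 ("BaC") vs s2 from index 0
        System.out.println(s1.regionMatches(true, 2, s2, 0, 3)); // true
        // The plain overload compares exactly, so the same region does not match
        System.out.println(s1.regionMatches(2, s2, 0, 3));       // false
        System.out.println(containsIgnoreCase(s1, s2));          // true
        System.out.println(containsIgnoreCase(s1, "xyz"));       // false
    }
}
```

Note that the loop scans from the last possible start position backwards; since the method only answers "is it contained anywhere", the scan direction does not affect the result.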
This speed analysis is not meant to be rocket science, just a rough picture of how fast the different methods are.
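I compare 5 methods:
1. Our containsIgnoreCase() method, built on String.regionMatches().
2. Converting both strings to lower case and calling String.contains().
3. Converting only the source string to lower case and calling String.contains() with the pre-cached, lower-cased substring. This solution is already less flexible because it tests a predefined substring.
4. Using a regular expression (Pattern.compile().matcher().find()...).
5. Using a regular expression with a pre-created and cached Pattern. This solution is also less flexible because it tests a predefined substring.
Results (by calling each method 10 million times):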
                                            RELATIVE SPEED    1/RELATIVE SPEED
 METHOD                        EXEC TIME    TO SLOWEST        TO FASTEST (#1)
------------------------------------------------------------------------------
 1. Using regionMatches()         670 ms        10.7x               1.0x
 2. 2x lowercase+contains        2829 ms         2.5x               4.2x
 3. 1x lowercase+contains cache  2446 ms         2.9x               3.7x
 4. Regexp                       7180 ms         1.0x              10.7x
 5. Regexp+cached pattern        1845 ms         3.9x               2.8x
Our method is about 4x faster than lower-casing and using contains(), about 10x faster than using regular expressions, and still about 3x faster even when the Pattern
is pre-cached (which also gives up the flexibility of checking for an arbitrary substring).
If you're interested in how the analysis was performed, here is the complete runnable application:
import java.util.regex.Pattern;

public class ContainsAnalysis {

    // Case 1 utilizing String.regionMatches()
    public static boolean containsIgnoreCase(String src, String what) {
        final int length = what.length();
        if (length == 0)
            return true; // Empty string is contained

        final char firstLo = Character.toLowerCase(what.charAt(0));
        final char firstUp = Character.toUpperCase(what.charAt(0));

        for (int i = src.length() - length; i >= 0; i--) {
            // Quick check before calling the more expensive regionMatches()
            // method:
            final char ch = src.charAt(i);
            if (ch != firstLo && ch != firstUp)
                continue;

            if (src.regionMatches(true, i, what, 0, length))
                return true;
        }

        return false;
    }

    // Case 2 with 2x toLowerCase() and contains()
    public static boolean containsConverting(String src, String what) {
        return src.toLowerCase().contains(what.toLowerCase());
    }

    // The cached substring for case 3
    private static final String S = "i am".toLowerCase();

    // Case 3 with pre-cached substring and 1x toLowerCase() and contains()
    public static boolean containsConverting(String src) {
        return src.toLowerCase().contains(S);
    }

    // Case 4 with regexp
    public static boolean containsIgnoreCaseRegexp(String src, String what) {
        return Pattern.compile(Pattern.quote(what), Pattern.CASE_INSENSITIVE)
                .matcher(src).find();
    }

    // The cached pattern for case 5
    private static final Pattern P = Pattern.compile(
            Pattern.quote("i am"), Pattern.CASE_INSENSITIVE);

    // Case 5 with pre-cached Pattern
    public static boolean containsIgnoreCaseRegexp(String src) {
        return P.matcher(src).find();
    }

    // Main method: performs speed analysis on different contains methods
    // (case ignored)
    public static void main(String[] args) throws Exception {
        final String src = "Hi, I am Adam";
        final String what = "i am";

        long start, end;
        final int N = 10_000_000;

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCase(src, what);
        end = System.nanoTime();
        System.out.println("Case 1 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsConverting(src, what);
        end = System.nanoTime();
        System.out.println("Case 2 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsConverting(src);
        end = System.nanoTime();
        System.out.println("Case 3 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCaseRegexp(src, what);
        end = System.nanoTime();
        System.out.println("Case 4 took " + ((end - start) / 1000000) + "ms");

        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCaseRegexp(src);
        end = System.nanoTime();
        System.out.println("Case 5 took " + ((end - start) / 1000000) + "ms");
    }
}
public class RContainsDemo {
    public static void main(String[] args) {
        String container = " Case SeNsitive ";
        String sub = "sen";
        if (rcontains(container, sub)) {
            System.out.println("no case");
        }
    }

    // Slides a window of sub.length() characters over container and
    // compares each window with sub, ignoring case.
    public static boolean rcontains(String container, String sub) {
        for (int a = 0; a < container.length() - sub.length() + 1; a++) {
            //System.out.println(sub + " to " + container.substring(a, a + sub.length()));
            if (sub.equalsIgnoreCase(container.substring(a, a + sub.length()))) {
                return true; // stop at the first match
            }
        }
        return false;
    }
}
Basically, it is a method that takes two strings and is meant to be a case-insensitive version of contains(), which checks whether one string is contained in the other.
The method compares the string "sub" with every substring of the container string that has the same length as "sub". If you look at the for
loop, you will see that it slides over the container string one character at a time, taking substrings the length of "sub".
Each iteration uses equalsIgnoreCase() to check whether the current substring of the container equals
sub.
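To make the sliding window visible, here is a small sketch that collects every window the loop above compares (the same information the commented-out println would print). The class name WindowTrace and the helper windows() are hypothetical, introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class WindowTrace {
    // Hypothetical helper: returns every substring of container that has
    // sub.length() characters, in the order the loop visits them.
    static List<String> windows(String container, String sub) {
        List<String> result = new ArrayList<>();
        for (int a = 0; a < container.length() - sub.length() + 1; a++) {
            result.add(container.substring(a, a + sub.length()));
        }
        return result;
    }

    public static void main(String[] args) {
        String sub = "sen";
        for (String w : windows(" Case SeNsitive ", sub)) {
            // equalsIgnoreCase() is the per-window test the method performs
            System.out.println("[" + w + "] equals " + sub + " ignoring case? "
                    + w.equalsIgnoreCase(sub));
        }
    }
}
```

For a container of length n and a sub of length m there are n - m + 1 windows, and each one is a freshly allocated substring, which is the allocation cost the regionMatches() approach avoids.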
With Java 8, we can use a stream with anyMatch() and contains():
import java.util.Arrays;

public class Test2 {
    public static void main(String[] args) {
        String a = "Gina Gini Protijayi Soudipta";
        String b = "Gini";
        System.out.println(WordPresentOrNot(a, b));
    }// main

    private static boolean WordPresentOrNot(String a, String b) {
        // contains is case sensitive. That's why we change both strings to lower case, then check.
        // Here we are using a stream with anyMatch: is b contained in any word of a?
        final String lowerB = b.toLowerCase();
        return Arrays.stream(a.toLowerCase().split(" ")).anyMatch(word -> word.contains(lowerB));
    }
}
One problem with the answer by Dave L. arises when s2 contains regex markup such as \d, etc.
You want to call Pattern.quote() on s2:
Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
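A small sketch of why the quoting matters. QuoteDemo, containsQuoted() and containsUnquoted() are names made up for this illustration, and the sample inputs are arbitrary.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Case-insensitive "contains" via regex, with s2 treated as literal text
    static boolean containsQuoted(String s1, String s2) {
        return Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
    }

    // Same check without quoting: s2 is interpreted as a regular expression
    static boolean containsUnquoted(String s1, String s2) {
        return Pattern.compile(s2, Pattern.CASE_INSENSITIVE).matcher(s1).find();
    }

    public static void main(String[] args) {
        // "." is a regex wildcard, so the unquoted version matches "ABC"
        System.out.println(containsUnquoted("ABC", "a.c"));     // true
        System.out.println(containsQuoted("ABC", "a.c"));       // false
        // "\\d" in Java source is the two characters \d
        System.out.println(containsUnquoted("Room 42", "\\d")); // true, matches a digit
        System.out.println(containsQuoted("Room 42", "\\d"));   // false, looks for a literal "\d"
        System.out.println(containsQuoted("AbBaCca", "bac"));   // true
    }
}
```

Without quoting, an s2 such as "(" does not even compile as a pattern: Pattern.compile() throws a PatternSyntaxException for the unclosed group.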
"AbCd".toLowerCase().contains("abcD".toLowerCase())
I'm not sure what your main question is here, but yes, .contains is case sensitive.
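One caveat with lower-casing both strings and calling contains() is that toLowerCase() with no arguments uses the JVM's default locale; in a Turkish locale, for example, "I" lower-cases to a dotless "ı", so "TITLE" becomes "tıtle". A minimal sketch, assuming you want locale-independent behavior, passes Locale.ROOT explicitly (the class LowerCaseContains and its method are illustrative names):

```java
import java.util.Locale;

class LowerCaseContains {
    // Lower-cases both strings with Locale.ROOT so the result does not
    // depend on the default locale of the machine running the code
    static boolean containsIgnoreCase(String s1, String s2) {
        return s1.toLowerCase(Locale.ROOT).contains(s2.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("AbCd", "abcD"));   // true
        System.out.println(containsIgnoreCase("AbBaCca", "bac")); // true
        // With Locale.ROOT, "TITLE" becomes "title" even if the default locale is Turkish
        System.out.println(containsIgnoreCase("TITLE", "title")); // true
    }
}
```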