I need a comparator in Java which has the same semantics as the SQL 'like' operator. For example:
myComparator.like("digital","%ital%");
myComparator.l
public static boolean like(String source, String exp) {
if (source == null || exp == null) {
return false;
}
int sourceLength = source.length();
int expLength = exp.length();
if (sourceLength == 0 || expLength == 0) {
return false;
}
boolean fuzzy = false;
char lastCharOfExp = 0;
int positionOfSource = 0;
for (int i = 0; i < expLength; i++) {
char ch = exp.charAt(i);
// whether the current character is escaped
boolean escape = false;
if (lastCharOfExp == '\\') {
if (ch == '%' || ch == '_') {
escape = true;
// System.out.println("escape " + ch);
}
}
if (!escape && ch == '%') {
fuzzy = true;
} else if (!escape && ch == '_') {
if (positionOfSource >= sourceLength) {
return false;
}
positionOfSource++; // advance one position in source
} else if (ch != '\\') { // any other character, excluding the escape character itself
if (positionOfSource >= sourceLength) { // already past the end of source
return false;
}
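if (lastCharOfExp == '%') { // the previous character was %, which needs special handling
// search from the current position so that earlier, already consumed occurrences are skipped
int tp = source.indexOf(ch, positionOfSource);
// System.out.println("previous char=%, current char=" + ch + ", position=" + positionOfSource + ", tp=" + tp);
if (tp == -1) { // this character cannot be matched, give up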
return false;
}
if (tp >= positionOfSource) {
positionOfSource = tp + 1; // continue after the matched character
if (i == expLength - 1 && positionOfSource < sourceLength) { // exp is at its last character, so source must be at its last character too
return false;
}
} else {
return false;
}
} else if (source.charAt(positionOfSource) == ch) { // found ch at this position
positionOfSource++;
} else {
return false;
}
}
lastCharOfExp = ch; // remember the character just processed
// System.out.println("current char=" + ch + ", position=" + positionOfSource);
}
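// all characters of exp have been processed; if the match is not fuzzy, check that the matched position has reached the end of source
if (!fuzzy && positionOfSource < sourceLength) {
// System.out.println("previous char=" + lastCharOfExp + ", position=" + positionOfSource);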
return false;
}
return true; // everything matched
}
Assert.assertEquals(true, like("abc_d", "abc\\_d"));
Assert.assertEquals(true, like("abc%d", "abc\\%%d"));
Assert.assertEquals(false, like("abcd", "abc\\_d"));
String source = "1abcd";
Assert.assertEquals(true, like(source, "_%d"));
Assert.assertEquals(false, like(source, "%%a"));
Assert.assertEquals(false, like(source, "1"));
Assert.assertEquals(true, like(source, "%d"));
Assert.assertEquals(true, like(source, "%%%%"));
Assert.assertEquals(true, like(source, "1%_"));
Assert.assertEquals(false, like(source, "1%_2"));
Assert.assertEquals(false, like(source, "1abcdef"));
Assert.assertEquals(true, like(source, "1abcd"));
Assert.assertEquals(false, like(source, "1abcde"));
// the following cases are particularly representative
Assert.assertEquals(true, like(source, "_%_"));
Assert.assertEquals(true, like(source, "_%____"));
Assert.assertEquals(true, like(source, "_____")); // 5 underscores
Assert.assertEquals(false, like(source, "___")); // 3 underscores
Assert.assertEquals(false, like(source, "__%____")); // 6 underscores
Assert.assertEquals(false, like(source, "1"));
Assert.assertEquals(false, like(source, "a_%b"));
Assert.assertEquals(true, like(source, "1%"));
Assert.assertEquals(false, like(source, "d%"));
Assert.assertEquals(true, like(source, "_%"));
Assert.assertEquals(true, like(source, "_abc%"));
Assert.assertEquals(true, like(source, "%d"));
Assert.assertEquals(true, like(source, "%abc%"));
Assert.assertEquals(false, like(source, "ab_%"));
Assert.assertEquals(true, like(source, "1ab__"));
Assert.assertEquals(true, like(source, "1ab__%"));
Assert.assertEquals(false, like(source, "1ab___"));
Assert.assertEquals(true, like(source, "%"));
Assert.assertEquals(false, like(null, "1ab___"));
Assert.assertEquals(false, like(source, null));
Assert.assertEquals(false, like(source, ""));
Check out https://github.com/hrakaroo/glob-library-java.
It's a zero dependency library in Java for doing glob (and sql like) type of comparisons. Over a large data set it is faster than translating to a regular expression.
Basic syntax
MatchingEngine m = GlobPattern.compile("dog%cat\\%goat_", '%', '_', GlobPattern.HANDLE_ESCAPES);
if (m.matches(str)) { ... }
The Comparator and Comparable interfaces are likely inapplicable here. They deal with sorting, and return integers of either sign, or 0. Your operation is about finding matches, and returning true/false. That's different.
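To make the distinction concrete, here is a minimal sketch that puts the two shapes side by side. The class name `MatchVsOrder` and the `CONTAINS` predicate are made up for illustration; `CONTAINS` is only a stand-in for a real LIKE matcher.

```java
import java.util.Comparator;
import java.util.function.BiPredicate;

final class MatchVsOrder {
    private MatchVsOrder() {}

    // A Comparator answers "which one sorts first?" with a negative, zero, or positive int.
    static final Comparator<String> ORDER = Comparator.naturalOrder();

    // A BiPredicate answers "do these two match?" with true/false.
    // Hypothetical stand-in: a plain substring test, not real LIKE semantics.
    static final BiPredicate<String, String> CONTAINS =
            (source, fragment) -> source.contains(fragment);
}
```

A LIKE-style matcher fits the second shape, so a `BiPredicate<String, String>` or a plain static `boolean like(String, String)` method is probably the more natural signature.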
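I don't know exactly about the greedy issue, but try this and see if it works for you:
public boolean like(final String str, String expr)
{
// String.split drops trailing empty strings, so a trailing '%' is remembered separately.
final String[] parts = expr.split("%");
final boolean trailingOp = expr.endsWith("%");
expr = "";
for (int i = 0, l = parts.length; i < l; ++i)
{
// Split each part on the two-character sequence \? (a backslash followed by a question mark).
final String[] p = parts[i].split("\\\\\\?");
if (p.length > 1)
{
for (int y = 0, l2 = p.length; y < l2; ++y)
{
expr += p[y];
// Put a separator between the pieces, but not after the last one.
if (y + 1 < l2) expr += ".";
}
}
else
{
expr += parts[i];
}
if (i + 1 < l) expr += "%";
}
if (trailingOp) expr += "%";
// Translate the wildcards to regex: '?' becomes '.', '%' becomes '.*'.
// Note that other regex metacharacters in expr are not quoted.
expr = expr.replace("?", ".");
expr = expr.replace("%", ".*");
return str.matches(expr);
}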
Java strings have .startsWith() and .contains() methods which will get you most of the way. For anything more complicated you'd have to use regex or write your own method.
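As a rough illustration of the regex route, the sketch below translates a LIKE pattern into a `java.util.regex.Pattern`. The class name `SqlLike` and its methods are made up for this example, and it assumes `\` is the escape character, `%` means any run of characters, and `_` means exactly one character. Literal text goes through `Pattern.quote`, so regex metacharacters in the pattern are not interpreted.

```java
import java.util.regex.Pattern;

final class SqlLike {
    private SqlLike() {}

    /** Translates a LIKE pattern to a regex. Assumes '\\' is the escape character. */
    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\' && i + 1 < like.length()) {
                // The character after the escape is taken literally.
                literal.append(like.charAt(++i));
            } else if (c == '%' || c == '_') {
                flush(literal, regex);
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        // DOTALL lets the wildcards match line terminators as well.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Appends the pending literal text, quoted, and clears the buffer. */
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    /** Returns false for null arguments; otherwise requires the whole source to match. */
    static boolean like(String source, String pattern) {
        if (source == null || pattern == null) {
            return false;
        }
        return toRegex(pattern).matcher(source).matches();
    }
}
```

With this, `SqlLike.like("digital", "%ital%")` should return true. If the same pattern is checked against many strings, compiling it once with `toRegex` and reusing the `Pattern` avoids recompiling on every call.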
OK, this is a bit of a weird solution, but I thought it should still be mentioned.
Instead of recreating the like mechanism, we can use the existing implementation already available in any database!
(The only requirement is that your application has access to a database.)
Just run a very simple query each time that returns true or false depending on the result of the like comparison. Then execute the query and read the answer directly from the database!
For Oracle db:
SELECT
CASE
WHEN 'StringToSearch' LIKE 'LikeSequence' THEN 'true'
ELSE 'false'
END test
FROM dual
For MS SQL Server:
SELECT
CASE
WHEN 'StringToSearch' LIKE 'LikeSequence' THEN 'true'
ELSE 'false'
END test
All you have to do is replace "StringToSearch" and "LikeSequence" with bind parameters and set the values you want to check.
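From Java, this could look roughly like the JDBC sketch below. The class name `DbLike` is made up for illustration, and the query text is the SQL Server form from above (for Oracle, `FROM dual` would be appended). It also assumes the driver accepts untyped parameters on both sides of LIKE; some databases may need an explicit cast around each `?`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class DbLike {
    private DbLike() {}

    // SQL Server form of the query; for Oracle, append " FROM dual".
    static final String QUERY =
            "SELECT CASE WHEN ? LIKE ? THEN 'true' ELSE 'false' END test";

    /** Asks the database whether source matches the LIKE pattern. */
    static boolean like(Connection conn, String source, String pattern) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, source);   // bound in place of 'StringToSearch'
            ps.setString(2, pattern);  // bound in place of 'LikeSequence'
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && "true".equals(rs.getString(1));
            }
        }
    }
}
```

Each call is a round trip to the database, so this is probably only reasonable when exact agreement with the database's own LIKE behavior matters more than speed.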