I am trying to get best-match (longest-prefix) string matching to work using existing Java data structures. It is quite slow, though; any suggestions to improve its performance are welcome.
I prefer the TreeMap answer, but for completeness, here is the same algorithm using a binary search over a sorted array.
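```java
import java.util.Arrays;
import java.util.Comparator;

public class PrefixSearch {
    public static void main(String[] args) {
        String[][] data = {
            { "0060175559138", "VIP" },           // <-- insertion point returned by binarySearch
            { "00601755511", "International" },   // <-- skipped
            { "00601755510", "International" },   // <-- skipped
            { "006017555", "National" },          // <-- final find
            { "006017", "Local" },
            { "0060", "X" },
        };
        // Order rows by their key (column 0); binarySearch must use the same comparator.
        Comparator<String[]> comparator = (lhs, rhs) -> lhs[0].compareTo(rhs[0]);
        Arrays.sort(data, comparator);

        String searchKey = "0060175552020";
        int ix = Arrays.binarySearch(data, new String[] { searchKey }, comparator);
        if (ix < 0) {
            ix = ~ix; // Not found: ~ix is the insertion point
            --ix;     // Start at the last entry that sorts before searchKey
            // Every prefix of searchKey sorts before it, and a longer prefix sorts
            // after a shorter one, so the first prefix met walking backwards is the
            // longest one. If there is none, the loop ends with ix == -1.
            while (ix >= 0 && !searchKey.startsWith(data[ix][0])) {
                --ix;
            }
        }
        if (ix == -1) {
            System.out.println("Not found");
        } else {
            System.out.printf("Found: %s - %s%n", data[ix][0], data[ix][1]);
        }
    }
}
```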
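The same backward scan maps directly onto java.util.TreeMap: floorEntry gives the greatest key not above the search key, and lowerEntry steps back one entry at a time. A minimal sketch of that, assuming the data fits in a TreeMap<String, String>; it is not necessarily the code from the TreeMap answer, and the class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class TreeMapPrefixSearch {
    // Hypothetical helper: returns the entry whose key is the longest
    // prefix of searchKey, or null if no key is a prefix.
    static Map.Entry<String, String> longestPrefix(TreeMap<String, String> map, String searchKey) {
        // Greatest key <= searchKey (this also covers an exact match)
        Map.Entry<String, String> e = map.floorEntry(searchKey);
        // Step backwards over keys that are not prefixes of searchKey
        while (e != null && !searchKey.startsWith(e.getKey())) {
            e = map.lowerEntry(e.getKey());
        }
        return e;
    }

    public static void main(String[] args) {
        TreeMap<String, String> map = new TreeMap<>();
        map.put("0060175559138", "VIP");
        map.put("00601755511", "International");
        map.put("00601755510", "International");
        map.put("006017555", "National");
        map.put("006017", "Local");
        map.put("0060", "X");

        Map.Entry<String, String> e = longestPrefix(map, "0060175552020");
        // prints "Found: 006017555 - National"
        System.out.println(e == null ? "Not found" : "Found: " + e.getKey() + " - " + e.getValue());
    }
}
```

Each lowerEntry call is itself logarithmic, so this variant has the same weakness as the array version when many entries must be skipped.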
The binary search itself is logarithmic, but it is followed by a linear backward scan. If no entries have to be skipped, the whole lookup runs in logarithmic time, which is fine. So the real question is how many entries need to be skipped.
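To get a feel for how bad the skipping can get, here is a small, hypothetical experiment (the class, method, and data are invented for illustration): a single short prefix "1" followed by many longer keys that share it but are not prefixes of the search key. The backward scan has to pass over all of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SkipCount {
    // Counts how many entries the backward scan passes over before it finds
    // the longest prefix of key (or runs off the start of the list).
    static int skipped(List<String> sortedKeys, String key) {
        int ix = Collections.binarySearch(sortedKeys, key);
        if (ix >= 0) {
            return 0; // exact match, nothing skipped
        }
        int start = ~ix - 1; // last entry sorting before key
        int i = start;
        while (i >= 0 && !key.startsWith(sortedKeys.get(i))) {
            --i;
        }
        return start - i;
    }

    public static void main(String[] args) {
        // Hypothetical worst case: "1" plus "1000" .. "1998"
        List<String> keys = new ArrayList<>();
        keys.add("1");
        for (int n = 1000; n <= 1998; n++) {
            keys.add(Integer.toString(n));
        }
        Collections.sort(keys);
        System.out.println(skipped(keys, "1999")); // prints 999
    }
}
```

With the six entries from the example above, the same count is only 2 ("00601755511" and "00601755510"), but nothing in the data structure bounds it.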
If you store with every entry a reference to the entry for its longest prefix, for example a link
from { "00601755511", "International" },
to { "006017555", "National" },
then you only need to follow these prefix back links instead of stepping over every skipped entry.
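A minimal sketch of the back-link idea, assuming unique keys and a table that is built once and then only queried (class and field names are hypothetical). The links are computed while scanning the sorted keys: the longest prefix of a key is always reachable from the link chain of the entry just before it, the same argument that makes the lookup correct. The lookup then follows links instead of decrementing the index, so its cost is bounded by how deeply prefixes nest rather than by how many entries sit in between.

```java
import java.util.Arrays;

public class PrefixLinks {
    final String[] keys;     // keys sorted ascending (assumed unique)
    final String[] values;
    final int[] prefixLink;  // index of the longest proper prefix in the table, or -1

    PrefixLinks(String[][] data) {
        String[][] rows = data.clone(); // do not reorder the caller's array
        Arrays.sort(rows, (a, b) -> a[0].compareTo(b[0]));
        int n = rows.length;
        keys = new String[n];
        values = new String[n];
        prefixLink = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = rows[i][0];
            values[i] = rows[i][1];
            // Walk the link chain of the previous entry until a prefix of keys[i] appears
            int j = i - 1;
            while (j >= 0 && !keys[i].startsWith(keys[j])) {
                j = prefixLink[j];
            }
            prefixLink[i] = j;
        }
    }

    // Index of the longest key that is a prefix of searchKey, or -1.
    int find(String searchKey) {
        int ix = Arrays.binarySearch(keys, searchKey);
        if (ix >= 0) {
            return ix; // exact match
        }
        ix = ~ix - 1; // last key sorting before searchKey
        while (ix >= 0 && !searchKey.startsWith(keys[ix])) {
            ix = prefixLink[ix]; // jump to the next shorter prefix instead of ix - 1
        }
        return ix;
    }

    public static void main(String[] args) {
        String[][] data = {
            { "0060175559138", "VIP" },
            { "00601755511", "International" },
            { "00601755510", "International" },
            { "006017555", "National" },
            { "006017", "Local" },
            { "0060", "X" },
        };
        PrefixLinks table = new PrefixLinks(data);
        int ix = table.find("0060175552020");
        // prints "Found: 006017555 - National"
        System.out.println(ix < 0 ? "Not found" : "Found: " + table.keys[ix] + " - " + table.values[ix]);
    }
}
```

Here the lookup lands on "00601755511" and reaches "006017555" in a single jump, where the plain scan needed two extra steps.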