Implementing a best match search in Java

北海茫月 2021-02-08 04:54

I am trying to get best-match string matching to work using existing Java data structures. It is quite slow, though, so any suggestions to improve its performance are welcome.

2 answers

逝去的感伤 (original poster)

2021-02-08 05:29

I prefer the TreeMap answer, but for completeness, here is the same algorithm implemented with binary search.

import java.util.Arrays;
import java.util.Comparator;

String[][] data = {
        { "0060175559138", "VIP" },           // <-- found insert position
        { "00601755511", "International" },   // <-- skipped
        { "00601755510", "International" },   // <-- skipped
        { "006017555", "National" },          // <-- final find
        { "006017", "Local" },
        { "0060", "X" },
};
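// Typed comparator: with a raw Comparator, lhs and rhs would be plain Objects and lhs[0] would not compile.
Comparator<String[]> comparator = (lhs, rhs) -> lhs[0].compareTo(rhs[0]);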
Arrays.sort(data, comparator);

String searchKey = "0060175552020";
int ix = Arrays.binarySearch(data, new String[] { searchKey }, comparator);
if (ix < 0) {
    ix = ~ix; // Not found, insert position
    --ix;
    while (ix >= 0) {
        if (searchKey.startsWith(data[ix][0])) {
            break;
        }
        if (searchKey.compareTo(data[ix][0]) < 0) {
            ix = -1; // Not found
            break;
        }
        --ix;
    }
}
if (ix == -1) {
    System.out.println("Not found");
} else {
    System.out.printf("Found: %s - %s%n", data[ix][0], data[ix][1]);
}

This algorithm starts with a logarithmic binary search and then loops backward linearly from the insert position. If no entries have to be skipped, the total time is logarithmic, which is fine. So the question is how many entries need to be skipped.
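The TreeMap answer mentioned at the top is not reproduced in this thread, so the following is only a guess at what such a variant might look like, not a copy of it. It performs the same backward walk with `floorEntry` and `lowerEntry`; the class name `TreeMapPrefixSearch` and the method `bestMatch` are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class TreeMapPrefixSearch {
    // Returns the entry with the longest key that is a prefix of searchKey, or null.
    // Hypothetical helper: a sketch of the same walk, not the original TreeMap answer.
    static Map.Entry<String, String> bestMatch(TreeMap<String, String> table, String searchKey) {
        // Greatest key <= searchKey: the counterpart of the binary search insert position.
        Map.Entry<String, String> e = table.floorEntry(searchKey);
        // Step to the next smaller key until one is a prefix; each step is one skipped entry.
        while (e != null && !searchKey.startsWith(e.getKey())) {
            e = table.lowerEntry(e.getKey());
        }
        return e;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("0060175559138", "VIP");
        table.put("00601755511", "International");
        table.put("00601755510", "International");
        table.put("006017555", "National");
        table.put("006017", "Local");
        table.put("0060", "X");

        Map.Entry<String, String> hit = bestMatch(table, "0060175552020");
        if (hit == null) {
            System.out.println("Not found");
        } else {
            System.out.printf("Found: %s - %s%n", hit.getKey(), hit.getValue());
        }
    }
}
```

Each `lowerEntry` call is itself a logarithmic lookup, so the number of skipped entries matters here just as it does in the array version.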

If every element stored a reference to its own longest prefix, for example from { "00601755511", "International" } to { "006017555", "National" }, then you would only need to follow those prefix back links instead of testing every entry in between.
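The back-link idea can be sketched roughly as follows. This is one possible reading of it, assuming unique keys; the names `PrefixLinkSearch`, `Entry`, `parent`, `build` and `find` are hypothetical and do not come from the answer. It relies on the fact that, in sorted order, the longest prefix of a key is either its predecessor or one of the predecessor's own prefixes.

```java
import java.util.Arrays;
import java.util.Comparator;

class PrefixLinkSearch {
    // One table row; parent is the index of the longest other key that is a prefix of key, or -1.
    static final class Entry {
        final String key;
        final String label;
        int parent = -1;
        Entry(String key, String label) { this.key = key; this.label = label; }
    }

    // Sorts the rows and computes the prefix back link for every entry (assumes unique keys).
    static Entry[] build(String[][] data) {
        Entry[] entries = new Entry[data.length];
        for (int i = 0; i < data.length; i++) {
            entries[i] = new Entry(data[i][0], data[i][1]);
        }
        Arrays.sort(entries, Comparator.comparing((Entry e) -> e.key));
        for (int i = 1; i < entries.length; i++) {
            int p = i - 1;
            // Walk the predecessor's prefix chain until a prefix of this key is found.
            while (p >= 0 && !entries[i].key.startsWith(entries[p].key)) {
                p = entries[p].parent;
            }
            entries[i].parent = p;
        }
        return entries;
    }

    // Returns the entry with the longest key that is a prefix of searchKey, or null.
    static Entry find(Entry[] entries, String searchKey) {
        int lo = 0, hi = entries.length - 1, pos = -1;
        // Binary search for the last entry whose key is <= searchKey.
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (entries[mid].key.compareTo(searchKey) <= 0) {
                pos = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Follow back links only; sibling entries that are not prefixes are never visited.
        while (pos >= 0 && !searchKey.startsWith(entries[pos].key)) {
            pos = entries[pos].parent;
        }
        return pos >= 0 ? entries[pos] : null;
    }

    public static void main(String[] args) {
        Entry[] table = build(new String[][] {
            { "0060175559138", "VIP" },
            { "00601755511", "International" },
            { "00601755510", "International" },
            { "006017555", "National" },
            { "006017", "Local" },
            { "0060", "X" },
        });
        Entry hit = find(table, "0060175552020");
        System.out.println(hit == null ? "Not found" : "Found: " + hit.key + " - " + hit.label);
    }
}
```

With the sample data, the lookup lands on "00601755511" and reaches "006017555" through a single back link rather than stepping over "00601755510" first. The walk after the binary search is then bounded by the depth of the prefix chain, not by the number of entries that share a prefix with the search key.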
