Question
I have a text file containing:
mariam amr sara john jessy salma mkkkkkaooooorllll
The user enters a word to search for, for example: maram
As you can see, it does not exist in my text file. I want to suggest similar words; here, the word similar to maram is mariam.
I used the longest common subsequence, but it returns both mariam and mkkkkkaooooorllll,
because both contain the longest common subsequence "mar".
I want to force the choice of mariam only. Any ideas?
Thanks in advance
/**
 ** Java Program to implement Longest Common Subsequence Algorithm
 **/
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

/** Class LongestCommonSubsequence **/
public class LongestCommonSubsequence
{
    /** function lcs **/
    public String lcs(String str1, String str2)
    {
        int l1 = str1.length();
        int l2 = str2.length();
        int[][] arr = new int[l1 + 1][l2 + 1];
        for (int i = l1 - 1; i >= 0; i--)
        {
            for (int j = l2 - 1; j >= 0; j--)
            {
                if (str1.charAt(i) == str2.charAt(j))
                    arr[i][j] = arr[i + 1][j + 1] + 1;
                else
                    arr[i][j] = Math.max(arr[i + 1][j], arr[i][j + 1]);
            }
        }
        int i = 0, j = 0;
        StringBuffer sb = new StringBuffer();
        while (i < l1 && j < l2)
        {
            if (str1.charAt(i) == str2.charAt(j))
            {
                sb.append(str1.charAt(i));
                i++;
                j++;
            }
            else if (arr[i + 1][j] >= arr[i][j + 1])
                i++;
            else
                j++;
        }
        return sb.toString();
        //read text file, if a word contains sb.toString() , print it
    }

    /** Main Function **/
    public static void main(String[] args) throws IOException
    {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Longest Common Subsequence Algorithm Test\n");
        System.out.println("\nEnter string 1");
        String str1 = br.readLine();
        System.out.println("\nEnter string 2");
        String str2 = br.readLine();
        LongestCommonSubsequence obj = new LongestCommonSubsequence();
        String result = obj.lcs(str1, str2);
        System.out.println("\nLongest Common Subsequence : " + result);
    }
}
Answer 1:
There are a few techniques for fuzzy matching like this. Apache Commons provides some excellent tools for measuring how similar two strings are to one another. Check out the javadoc for the Levenshtein Distance and Jaro-Winkler Distance calculation methods.
With Levenshtein Distance, the lower the score, the more similar the strings are:
StringUtils.getLevenshteinDistance("frog", "fog") == 1
StringUtils.getLevenshteinDistance("fly", "ant") == 3
You could also consider calculating the Double Metaphone for each string - this will allow you to determine how similar the strings 'sound' when spoken, even if they aren't necessarily spelt similarly.
Back to your question: using these tools, you could offer a word from your text file as a suggestion whenever its score against the user's search term falls within a certain threshold.
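To make the threshold idea concrete, here is a minimal sketch that avoids any library dependency by computing the Levenshtein distance directly. The class name `SuggestionSketch`, the helper `suggest`, and the `maxDistance` parameter are hypothetical names chosen for illustration, and the word list is simply the contents of the file from the question; treat it as one possible approach rather than a definitive implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not part of any library.
class SuggestionSketch {

    /** Levenshtein distance via the standard dynamic-programming recurrence (two rows). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the words within maxDistance of the query, closest first. */
    static List<String> suggest(String query, List<String> words, int maxDistance) {
        List<String> hits = new ArrayList<>();
        for (String word : words) {
            if (distance(query, word) <= maxDistance) {
                hits.add(word);
            }
        }
        // Stable sort: words at the same distance keep their file order.
        hits.sort(Comparator.comparingInt(w -> distance(query, w)));
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "mariam", "amr", "sara", "john", "jessy", "salma", "mkkkkkaooooorllll");
        System.out.println(suggest("maram", words, 1));
    }
}
```

Unlike the longest common subsequence, the edit distance penalises every extra character, so mkkkkkaooooorllll ends up far from maram while mariam is a single insertion away. The threshold is a judgment call: with `maxDistance` set to 1 only mariam qualifies, whereas 2 would also admit sara (one substitution plus one deletion).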
Source: https://stackoverflow.com/questions/31159227/search-suggestion-in-strings