Question
Good evening, I hope you can help me with this problem, as I'm struggling to find a solution.
I have a provider of words who gives me vowelled Hebrew words, for example:
Vowelled: בַּיִת, not vowelled: בית
Vowelled: הַבַּיְתָה, not vowelled: הביתה
Unlike my provider, my user can't normally enter Hebrew vowels (nor would I want him to). The user story is that the user searches for a word among the provided words. The problem is comparing the vowelled and the unvowelled words: because the two are different character sequences in memory, the equals method returns false.
I tried looking into how UTF-8 handles Hebrew vowels, and it seems they are just ordinary characters.
I do want to show the vowels to the user, so I want to keep the string as-is in memory, but I want to ignore them when comparing. Is there any simple way to solve this problem?
Answer 1:
You can use a Collator. I can't tell you exactly how it works, as it's new to me, but this appears to do the trick:
import java.text.Collator;
import java.util.Locale;

public class HebrewCompare {
    public static void main( String[] args ) {
        String withVowels = "בַּיִת";
        String withoutVowels = "בית";
        String withVowelsTwo = "הַבַּיְתָה";
        String withoutVowelsTwo = "הביתה";
        // String.equals compares the char sequences, so the vowel points make these differ
        System.out.println( "These two strings are " + (withVowels.equals( withoutVowels ) ? "" : "not ") + "equal" );
        System.out.println( "The second two strings are " + (withVowelsTwo.equals( withoutVowelsTwo ) ? "" : "not ") + "equal" );
        // With PRIMARY strength only primary (base-letter) differences count,
        // so weaker differences such as the vowel points are ignored
        Collator collator = Collator.getInstance( new Locale( "he" ) );
        collator.setStrength( Collator.PRIMARY );
        System.out.println( collator.equals( withVowels, withoutVowels ) );
        System.out.println( collator.equals( withVowelsTwo, withoutVowelsTwo ) );
    }
}
From that, I get the following output:
These two strings are not equal
The second two strings are not equal
true
true
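Since the user story is searching the provider's words rather than comparing a single pair, here is a minimal sketch of how the same collator could back a lookup. It assumes Java 9+ (for `List.of`) and that the Hebrew collator behaves as in the output above; the names `CollatorLookup` and `find` are hypothetical, and `Locale.forLanguageTag("he")` is used in place of the `Locale` constructor, which is deprecated in recent Java versions. It relies on `Collator` implementing `Comparator`, so a `TreeMap` built with it treats collator-equal strings as the same key. Each key holds a list because different vowelled words can share the same letters.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

public class CollatorLookup {

    // Keys are ordered by the collator, so strings it considers equal share
    // one entry; the list keeps every vowelled spelling filed under that entry.
    private final TreeMap<String, List<String>> index;

    public CollatorLookup(List<String> vowelledWords) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("he"));
        // PRIMARY strength: only base-letter differences are significant,
        // so the vowel points are expected to be ignored (as in the output above).
        collator.setStrength(Collator.PRIMARY);
        index = new TreeMap<>(collator);
        for (String word : vowelledWords) {
            index.computeIfAbsent(word, k -> new ArrayList<>()).add(word);
        }
    }

    // Returns the vowelled words matching the user's (unvowelled) query,
    // or an empty list when nothing matches.
    public List<String> find(String query) {
        return index.getOrDefault(query, List.of());
    }

    public static void main(String[] args) {
        CollatorLookup lookup = new CollatorLookup(List.of("בַּיִת", "הַבַּיְתָה"));
        System.out.println(lookup.find("בית"));
        System.out.println(lookup.find("הביתה"));
    }
}
```

The vowelled strings are stored untouched, so they can still be shown to the user. If the same words are compared very often, `Collator.getCollationKey` is the API the Java documentation suggests for repeated comparisons.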
Answer 2:
AFAIK there isn't. The vowels are characters in their own right, and some letter-and-dot combinations are even encoded as single characters. See the Wikipedia page:
http://en.wikipedia.org/wiki/Unicode_and_HTML_for_the_Hebrew_alphabet
You can store the search key for each word using only characters in the U+05Dx to U+05Ex range (the Hebrew letters), and add a separate field holding the word with its vowels.
Of course, you should expect the following:
- You will need to account for words whose meaning differs depending on the nikkud.
- You should take into account "misspellings" involving י and ו (an extra or missing letter), which are commonplace.
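A minimal sketch of the search-key approach, assuming the provider's words arrive as plain Java strings. The names `SearchKeyIndex`, `searchKey`, `add` and `find` are hypothetical. One step is an addition not in the answer: each word is first normalized to NFD with `java.text.Normalizer`, so precomposed letter-and-dot characters (for example U+FB2A, shin with shin dot) are split into a base letter plus a mark before the filter keeps only U+05D0 to U+05EF. Words with identical letters but different nikkud end up in one list, which addresses the first caveat; the second caveat (extra or missing י and ו) is not handled here.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SearchKeyIndex {

    private final Map<String, List<String>> index = new HashMap<>();

    // Builds a letters-only key: decompose to NFD (so precomposed dotted
    // letters become letter + mark), then keep only U+05D0..U+05EF,
    // which drops the vowel points and other marks.
    public static String searchKey(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        StringBuilder key = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (c >= '\u05D0' && c <= '\u05EF') {
                key.append(c);
            }
        }
        return key.toString();
    }

    // Files the vowelled word (kept as-is for display) under its search key.
    public void add(String vowelledWord) {
        index.computeIfAbsent(searchKey(vowelledWord), k -> new ArrayList<>()).add(vowelledWord);
    }

    // Looks up the user's input by the same key; returns all vowelled
    // words that share it, or an empty list.
    public List<String> find(String userInput) {
        return index.getOrDefault(searchKey(userInput), List.of());
    }

    public static void main(String[] args) {
        SearchKeyIndex words = new SearchKeyIndex();
        words.add("בַּיִת");
        words.add("הַבַּיְתָה");
        System.out.println(words.find("בית"));
        System.out.println(words.find("הביתה"));
    }
}
```

Because the key is computed once per word, lookups are plain hash-map hits; the trade-off compared with a collator is that the "what counts as a letter" rule is hard-coded to this one Unicode range.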
Source: https://stackoverflow.com/questions/12763476/ignoring-hebrew-vowels-when-comparing-strings