Ignoring Hebrew vowels when comparing strings

Posted by 谁都会走 on 2019-12-05 16:14:18

You can use a Collator. I can't tell you exactly how it works, as it's new to me, but setting its strength to PRIMARY (so that only differences between base letters count as significant) appears to do the trick:

import java.text.Collator;
import java.util.Locale;

public class HebrewCompare {
    public static void main( String[] args ) {
        String withVowels = "בַּיִת";
        String withoutVowels = "בית";

        String withVowelsTwo = "הַבַּיְתָה";
        String withoutVowelsTwo = "הביתה";

        // Plain String.equals() compares every character, vowel points included.
        System.out.println( "These two strings are " + (withVowels.equals( withoutVowels ) ? "" : "not ") + "equal" );
        System.out.println( "The second two strings are " + (withVowelsTwo.equals( withoutVowelsTwo ) ? "" : "not ") + "equal" );

        // A Hebrew collator at PRIMARY strength only looks at base letters.
        Collator collator = Collator.getInstance( new Locale( "he" ) );
        collator.setStrength( Collator.PRIMARY );

        System.out.println( collator.equals( withVowels, withoutVowels ) );
        System.out.println( collator.equals( withVowelsTwo, withoutVowelsTwo ) );
    }
}

From that, I get the following output:

These two strings are not equal
The second two strings are not equal
true
true

AFAIK there isn't. The vowel points (nikkud) are characters in their own right, and some letter-plus-dot combinations even exist as single precomposed characters. See the Wikipedia page:

http://en.wikipedia.org/wiki/Unicode_and_HTML_for_the_Hebrew_alphabet

You can store the search key for each word using only characters in the 05Dx-05Ex range (the Hebrew letters, U+05D0 to U+05EA), and add another field that holds the word with its vowels.
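
A minimal sketch of how such a search key could be built, assuming plain Java with no extra libraries. The class and method names (HebrewSearchKey, searchKey) are made up for illustration, and the NFKD normalization step is my own addition to handle the precomposed letter-plus-dot characters mentioned above. Note that this filter also drops spaces and punctuation, so it is meant for single words:

```java
import java.text.Normalizer;

class HebrewSearchKey {
    // Hypothetical helper: builds a vowel-free key by keeping only the
    // Hebrew letters U+05D0 (alef) through U+05EA (tav). Nikkud, cantillation
    // marks, spaces and punctuation are all dropped.
    static String searchKey(String word) {
        // Assumption: NFKD first, so precomposed forms such as U+FB31
        // (bet with dagesh) are split into a base letter plus a mark.
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFKD);
        StringBuilder key = new StringBuilder(decomposed.length());
        decomposed.codePoints()
                  .filter(cp -> cp >= 0x05D0 && cp <= 0x05EA)
                  .forEach(key::appendCodePoint);
        return key.toString();
    }

    public static void main(String[] args) {
        // Compare the keys of the pointed and unpointed spellings.
        System.out.println(searchKey("בַּיִת").equals(searchKey("בית")));
        System.out.println(searchKey("הַבַּיְתָה").equals(searchKey("הביתה")));
    }
}
```

The key would go in the indexed search field, while the original pointed word stays in the second field for display and for telling apart words that share a key.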

Of course, you should expect the following:

  • You will need to account for words that have different meanings depending on their nikkud.
  • You should take into account "misspellings" of י and ו, which are commonplace.
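
One crude way to tolerate the י/ו variation is a second, looser key that drops those two letters as well. This is only a sketch of one possible heuristic, not something the answer above prescribes: the names (HebrewLooseKey, looseKey) are hypothetical, and because the key discards consonantal י and ו too, it produces extra collisions. It is best treated as a way to find candidates, which are then checked against the stored word with its vowels:

```java
class HebrewLooseKey {
    private static final int VAV = 0x05D5; // ו
    private static final int YOD = 0x05D9; // י

    // Hypothetical helper: keeps only Hebrew letters (U+05D0..U+05EA) and
    // additionally drops every vav and yod, so full and defective spellings
    // of the same word map to the same key.
    static String looseKey(String word) {
        StringBuilder key = new StringBuilder(word.length());
        word.codePoints()
            .filter(cp -> cp >= 0x05D0 && cp <= 0x05EA)
            .filter(cp -> cp != VAV && cp != YOD)
            .forEach(key::appendCodePoint);
        return key.toString();
    }

    public static void main(String[] args) {
        // Two spellings of the same word, with and without ו.
        System.out.println(looseKey("שולחן").equals(looseKey("שלחן")));
        // The cost of the heuristic: two different words share a key as well.
        System.out.println(looseKey("בית").equals(looseKey("בת")));
    }
}
```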