Setting Turkish and English locale: translate Turkish characters to Latin equivalents

别跟我提以往 2021-02-04 02:29

I want to convert my Turkish strings to lowercase in both the English and Turkish locales. I'm doing this:

String myString="YAŞAT BAYRI";
Locale trlocale= new Lo         


        
5 answers
  • 2021-02-04 03:11

    I think this is the problem:

    Locale trlocale= new Locale("tr-TR");
    

    Try this instead:

    Locale trlocale= new Locale("tr", "TR");
    

    That's the constructor to use to specify country and language.
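
    To see why the single-argument form fails here, the following is a minimal sketch that lowercases the question's sample string with both locales. The class and method names (LocaleConstructorDemo, lower) are made up for illustration, and the Turkish letters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.Locale;

public class LocaleConstructorDemo {
    // The question's sample string; U+015E is S with cedilla.
    static final String SAMPLE = "YA\u015EAT BAYRI";

    // Lowercase the sample string using the rules of the given locale.
    static String lower(Locale locale) {
        return SAMPLE.toLowerCase(locale);
    }

    public static void main(String[] args) {
        // One argument: the whole string is stored in the language field.
        Locale wrong = new Locale("tr-TR");
        // Two arguments: language "tr", country "TR".
        Locale right = new Locale("tr", "TR");

        System.out.println("language of wrong: " + wrong.getLanguage());
        System.out.println("language of right: " + right.getLanguage());

        // Only a locale whose language is exactly "tr" maps I to the
        // dotless i (U+0131); otherwise I becomes the ordinary i.
        System.out.println(lower(wrong));
        System.out.println(lower(right));
    }
}
```

    Note that the Locale constructors are deprecated in recent Java releases (19 and later), so on a new JDK this compiles with a deprecation warning.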

  • 2021-02-04 03:11

    The characters ş and s are different characters. Changing the locale cannot translate one into the other. You have to create a Turkish-to-English character table and do the replacement yourself. I once did this for Vietnamese, which has a lot of such characters. You only have to deal with 4 or 5, right? So, good luck!
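
    A character table of this kind could look like the sketch below. The class name TurkishToLatin, the method toLatin and the table contents are assumptions for illustration; the table covers the Turkish-specific letters ç, ğ, ı, ö, ş, ü and their capitals, written as Unicode escapes. Map.ofEntries needs Java 9 or later.

```java
import java.util.Map;

public class TurkishToLatin {
    // Hypothetical lookup table: Turkish-specific letter -> Latin letter.
    private static final Map<Character, Character> TABLE = Map.ofEntries(
        Map.entry('\u00E7', 'c'), Map.entry('\u00C7', 'C'),   // c with cedilla
        Map.entry('\u011F', 'g'), Map.entry('\u011E', 'G'),   // g with breve
        Map.entry('\u0131', 'i'), Map.entry('\u0130', 'I'),   // dotless i, dotted capital I
        Map.entry('\u00F6', 'o'), Map.entry('\u00D6', 'O'),   // o with diaeresis
        Map.entry('\u015F', 's'), Map.entry('\u015E', 'S'),   // s with cedilla
        Map.entry('\u00FC', 'u'), Map.entry('\u00DC', 'U'));  // u with diaeresis

    // Replace every character found in the table; leave the rest untouched.
    public static String toLatin(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            char mapped = TABLE.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLatin("YA\u015EAT BAYRI"));
    }
}
```

    Lowercasing can then be done before or after the replacement, since the table handles both cases.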

  • 2021-02-04 03:27

    If you are using the Locale constructor, you must pass the language, country and variant as separate arguments:

    new Locale(language)
    new Locale(language, country)
    new Locale(language, country, variant)
    

    Therefore, your test program passes "tr-TR" and "en_US" as the language alone, so neither locale is recognized as Turkish or English. For your test program, you can use new Locale("tr", "TR") and new Locale("en", "US").

    If you are using Java 1.7+, then you can also parse a language tag using Locale.forLanguageTag:

    String myString="YAŞAT BAYRI";
    Locale trlocale= Locale.forLanguageTag("tr-TR");
    Locale enLocale = Locale.forLanguageTag("en-US");
    

    Passing these locales to toLowerCase then creates strings that have the appropriate lowercase for each language.
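
    As a rough illustration of the language-tag route, the sketch below lowercases the sample string via Locale.forLanguageTag. LanguageTagDemo and lower are hypothetical names. Language tags separate subtags with hyphens, so an underscore form such as "en_US" is not a well-formed tag; the last line checks what the parser makes of it.

```java
import java.util.Locale;

public class LanguageTagDemo {
    // Lowercase s using the locale parsed from an IETF BCP 47 language tag.
    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        String myString = "YA\u015EAT BAYRI"; // U+015E is S with cedilla

        // Turkish rules: I becomes the dotless i (U+0131).
        System.out.println(lower(myString, "tr-TR"));
        // English rules: I becomes the ordinary i.
        System.out.println(lower(myString, "en-US"));

        // "en_US" is not a well-formed tag, so no language is parsed from it.
        System.out.println(Locale.forLanguageTag("en_US").getLanguage().isEmpty());
    }
}
```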

  • 2021-02-04 03:27

    You can do this:

    Locale trlocale= new Locale("tr","TR");
    

    The first parameter is the language, and the second is the country.

  • 2021-02-04 03:32

    If you just want the string in ASCII, without accents, the following might do. First, normalization splits each accented character into an ASCII character and a combining diacritical mark (a zero-width accent). Then a regular expression replace removes only those marks.

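    import java.text.Normalizer;

    public static String withoutDiacritics(String s) {
        // NFD decomposes an accented letter such as ş into the base letter s
        // followed by a combining mark (here the combining cedilla, U+0327).
        String s2 = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove every character in the Combining Diacritical Marks block.
        return s2.replaceAll("\\p{InCombiningDiacriticalMarks}", "");
    }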
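
    One caveat for Turkish: the dotless ı (U+0131) has no canonical decomposition, so the function above leaves it in place. The sketch below shows the approach on the question's string and adds an explicit replacement for ı. DiacriticsDemo and toAscii are hypothetical names, and the extra replace step is an assumption about what "Latin equivalent" should mean here.

```java
import java.text.Normalizer;

public class DiacriticsDemo {
    // Same technique as above: decompose, then drop the combining marks.
    static String withoutDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Additionally map the dotless i (U+0131), which NFD does not decompose.
    static String toAscii(String s) {
        return withoutDiacritics(s).replace('\u0131', 'i');
    }

    public static void main(String[] args) {
        String upper = "YA\u015EAT BAYRI";        // U+015E is S with cedilla
        String lower = "ya\u015Fat bayr\u0131";   // ends in the dotless i

        System.out.println(withoutDiacritics(upper));
        System.out.println(withoutDiacritics(lower)); // dotless i survives
        System.out.println(toAscii(lower));
    }
}
```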
    