Alphabetize Arabic and Japanese text that is in Unicode?

逝去的感伤 2021-01-03 02:23

Does anyone have code for alphabetizing Arabic and Japanese text that is in Unicode? Ruby code would be ideal.

5 answers
  • 2021-01-03 03:06

    To ask the obvious question, what don't you like about mylist.sort?

  • 2021-01-03 03:12

    mylist.sort should work out of the box in Ruby 1.9, which has built-in Unicode support. In Ruby 1.8, where Unicode support isn't built in, I think you'd have to use the character-encodings gem to extend the String class with UTF-8 string comparisons (and then mylist.sort would work).
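
    What "works" means here is code-point order. As a rough analogue in Python (the only language with code in this thread), here is a minimal sketch showing that the default string sort compares code points, and that comparing UTF-8 bytes gives the same order, because UTF-8 preserves code point order. As far as I know, Ruby's built-in String comparison is byte-based, so for UTF-8 strings `sort` should yield this same order; that Ruby behavior is an assumption the sketch does not test. The word list is made up for illustration.

```python
# Sketch: Python's default string sort is code-point order, and sorting by
# UTF-8 bytes gives the same result (UTF-8 preserves code point order).
words = ["كتاب", "さくら", "Zebra", "apple", "باب"]

by_default = sorted(words)
# Compare each word as a list of integer code points.
by_code_points = sorted(words, key=lambda w: [ord(c) for c in w])
# Compare each word as raw UTF-8 bytes (roughly what a byte-based compare does).
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_default)  # ['Zebra', 'apple', 'باب', 'كتاب', 'さくら']
assert by_default == by_code_points == by_utf8_bytes
```

    Note that "Zebra" lands before "apple": code-point order is not case-insensitive alphabetical order.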

  • 2021-01-03 03:23

    I don't know Ruby, but Python has a built-in function, ord(), that returns the Unicode code point of a character. For example:

    >>> a = u'ل'
    >>> ord(a)
    1604
    >>> b = u'ع'
    >>> ord(b)
    1593

    Look for something similar in Ruby. I assume the Arabic letters are encoded in Unicode in alphabetical order.
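
    That assumption can be checked for the basic letters. Below is a minimal sketch that lists the 28 basic Arabic letters in the traditional (hijā'ī) order and confirms that their code points increase, so a plain sort of simple words follows the alphabet. Letters outside that basic set do not necessarily land where a given dictionary files them: for example ؤ (U+0624) sorts before ب, next to alif, rather than near و (U+0648). The sample words are illustrative only.

```python
# The 28 basic Arabic letters in traditional (hijā'ī) alphabetical order.
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"

codes = [ord(ch) for ch in ALPHABET]
# True if every letter's code point is larger than the one before it.
in_order = all(a < b for a, b in zip(codes, codes[1:]))
print(in_order)  # True

# A plain code-point sort of simple words therefore follows the alphabet.
print(sorted(["كتاب", "سلام", "باب"]))  # ['باب', 'سلام', 'كتاب']

# Hamza forms sit next to alif in Unicode, whatever a dictionary's convention.
print(sorted(["و", "ؤ", "ب"]))  # ['ؤ', 'ب', 'و']
```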

  • 2021-01-03 03:24

    Unicode code points are not in alphabetical order (Z < a, for example), although they try to be approximately in that order. There is a default Unicode ordering, defined by the Unicode Collation Algorithm, and there are also language-specific orderings (French order is not exactly the same as German or Czech order, even with the same alphabet), which can be specified in locale information. I think the ICU library contains the language-specific algorithms you are looking for.
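
    A minimal sketch of the ICU route from Python, assuming the third-party PyICU package (`pip install PyICU`) is available. It builds a locale-specific collator and sorts by its sort keys; if PyICU is not installed it falls back to plain code-point order, which shows the "Z < a" problem. The helper names `collation_key_for` and `sort_words` are hypothetical, made up for this example.

```python
def collation_key_for(locale_name):
    """Return an ICU sort-key function for a locale, or None without PyICU."""
    try:
        import icu  # third-party PyICU bindings (assumed installed)
    except ImportError:
        return None
    collator = icu.Collator.createInstance(icu.Locale(locale_name))
    return collator.getSortKey


def sort_words(words, locale_name):
    """Hypothetical helper: locale-aware sort if ICU is present."""
    key = collation_key_for(locale_name)
    if key is None:
        # No ICU: code-point order, so "Zebra" sorts before "apple".
        return sorted(words)
    return sorted(words, key=key)


print(sort_words(["apple", "Zebra", "banana"], "en"))
```

    With ICU present this prints `['apple', 'banana', 'Zebra']`; passing locale names such as `"ar"` or `"ja"` should select the Arabic or Japanese tailorings instead.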

  • 2021-01-03 03:30

    Depending on your needs, words.sort in Ruby will be fine for Japanese: the order in which characters appear in Unicode is a reasonably good sorting order. I can't vouch for Arabic, but my guess is that it's OK as well.
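
    "Depending on your needs" matters here. Hiragana and katakana are each laid out in gojūon order, but the whole katakana block comes after the hiragana block, so a plain sort puts every katakana word after every hiragana word. A minimal sketch, using a hypothetical `kana_fold` key that maps katakana U+30A1..U+30F6 onto hiragana U+3041..U+3096 (a fixed offset of 0x60), makes the two scripts interleave. It does nothing for kanji, which Unicode orders by radical and stroke count, not by reading, so kanji words cannot be alphabetized this way without their readings.

```python
def kana_fold(word):
    """Hypothetical key: map katakana U+30A1..U+30F6 onto hiragana U+3041..U+3096."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in word
    )


words = ["さくら", "カメ", "あお"]

# Plain code-point sort: the katakana word goes last.
print(sorted(words))                 # ['あお', 'さくら', 'カメ']
# Folded sort: カメ is compared as かめ, so it lands between あお and さくら.
print(sorted(words, key=kana_fold))  # ['あお', 'カメ', 'さくら']
```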
