Ruby 1.9: how can I properly upcase & downcase multibyte strings?

自闭症患者 2020-11-28 06:29

So Matz made the decision to keep upcase and downcase limited to /[A-Z]/i in Ruby 1.9.1.

ActiveSupport::Multibyte

3 answers
  • 2020-11-28 07:05

    Case conversion is complicated and locale-dependent. Fortunately, Martin Dürst added full Unicode case mapping in Ruby 2.4:

    puts RUBY_DESCRIPTION
    
    sd, su = "Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN"
    def ps(u, d, k); puts "%-30s:  %24s / %-24s" % [k, u, d] end 
    ps sd.upcase,              su.downcase,              "Ruby 2.4 (default)"
    ps sd.upcase(:ascii),      su.downcase(:ascii),      "Ruby 2.4 (ascii)"
    ps sd.upcase(:turkic),     su.downcase(:turkic),     "Ruby 2.4 (turkic)"
    ps sd.upcase(:lithuanian), su.downcase(:lithuanian), "Ruby 2.4 (lithuanian)"
    ps "-",                    su.downcase(:fold),       "Ruby 2.4 (fold)"
    

    Output:

    ruby 2.4.0dev (2016-06-24 trunk 55499) [x86_64-linux]
    Ruby 2.4 (default)            :      IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn
    Ruby 2.4 (ascii)              :      IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn
    Ruby 2.4 (turkic)             :      IÑTËRNÂTİÔNÀLİZÆTİØN / ıñtërnâtıônàlızætıøn
    Ruby 2.4 (lithuanian)         :      IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn
    Ruby 2.4 (fold)               :                         - / iñtërnâtiônàlizætiøn
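
    Since the right mapping depends on the language of the text, one way to use these options is to pick one from a locale tag. The following is only a sketch, assuming Ruby 2.4 or later; the table CASE_OPTION_BY_LOCALE and the helpers upcase_for and downcase_for are hypothetical names made up for illustration, not part of Ruby.

```ruby
# Hypothetical table: locale tag => option accepted by String#upcase / #downcase (Ruby 2.4+).
CASE_OPTION_BY_LOCALE = {
  tr: :turkic,     # Turkish
  az: :turkic,     # Azerbaijani
  lt: :lithuanian  # Lithuanian
}.freeze

# Upcase str with the locale-specific option if one is listed, else with the default mapping.
def upcase_for(str, locale)
  option = CASE_OPTION_BY_LOCALE[locale]
  option ? str.upcase(option) : str.upcase
end

# Same idea for downcasing.
def downcase_for(str, locale)
  option = CASE_OPTION_BY_LOCALE[locale]
  option ? str.downcase(option) : str.downcase
end

puts upcase_for("izmir", :tr)    # dotted capital I under the Turkic mapping
puts upcase_for("izmir", :en)    # plain ASCII I under the default mapping
puts downcase_for("IZMIR", :tr)  # dotless small i under the Turkic mapping
```

    Locales missing from the table fall back to the default full Unicode mapping, which matches the "default" row in the output above.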
    
  • 2020-11-28 07:08

    Case conversion is locale-dependent and doesn't always round-trip, which is why Ruby 1.9 doesn't cover it.

    The unicode_utils gem should address your needs.
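
    To illustrate the round-trip point, here is a small sketch using only core String methods. It assumes Ruby 2.4 or later, where upcase and downcase handle non-ASCII characters; round_trips? is a hypothetical helper name.

```ruby
# Hypothetical helper: does upcasing and then downcasing give back the original string?
def round_trips?(str)
  str.upcase.downcase == str
end

puts round_trips?("straße")  # false: "ß" upcases to "SS", which downcases to "ss"
puts round_trips?("ırmak")   # false: dotless "ı" upcases to "I", which downcases to dotted "i"
puts round_trips?("ñandú")   # true: each letter has a one-to-one case pair
```

    For caseless comparison, case folding is usually a better fit than a round trip: "Straße".downcase(:fold) yields "strasse".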

  • 2020-11-28 07:18

    For anybody coming here from Google after searching for "ruby upcase utf8":

    > "your problem chars here çöğıü Iñtërnâtiônàlizætiøn".mb_chars.upcase.to_s
    => "YOUR PROBLEM CHARS HERE ÇÖĞIÜ IÑTËRNÂTIÔNÀLIZÆTIØN"
    

    The solution is to use mb_chars, which ActiveSupport adds to String.
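
    Because mb_chars only exists once ActiveSupport is loaded, a script that may run with or without it can check for the method first. This is a sketch under that assumption; unicode_upcase is a hypothetical helper name, and the fallback branch only handles non-ASCII characters on Ruby 2.4 or later.

```ruby
# Outside Rails, mb_chars can be loaded with:
#   require "active_support/core_ext/string/multibyte"
# This sketch does not require it, so it also runs on plain Ruby.

# Hypothetical helper: use ActiveSupport's mb_chars when present, else String#upcase.
def unicode_upcase(str)
  if str.respond_to?(:mb_chars)
    str.mb_chars.upcase.to_s
  else
    str.upcase # full Unicode mapping on Ruby 2.4+, ASCII-only on older versions
  end
end

puts unicode_upcase("your problem chars here çöğıü")
```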

    Documentation:

    • https://www.rubydoc.info/gems/activesupport/String#mb_chars-instance_method
    • https://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html