Since Ruby 1.9, every string has an encoding attached. This lets Ruby handle multi-byte characters properly and convert between different encodings. Earlier versions of Ruby basically treated strings as byte arrays, which made it nearly impossible to handle multiple encodings properly.
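To make this concrete, here is a minimal sketch of what an attached encoding gives you. The sample string and the variable names are only illustrative, and the byte counts assume the file is saved as UTF-8.

```ruby
# encoding: UTF-8

text = "Grüße"                       # the literal takes the source encoding (UTF-8 here)

text_encoding = text.encoding        # Encoding::UTF_8
char_count    = text.length          # 5 characters
byte_count    = text.bytesize        # 7 bytes: "ü" and "ß" take two bytes each

# Transcode to another encoding: same characters, different bytes.
latin1       = text.encode("ISO-8859-1")
latin1_bytes = latin1.bytesize       # 5 bytes: one byte per character in Latin-1

# Relabel the bytes as binary without converting them.
raw        = text.dup.force_encoding("ASCII-8BIT")
raw_length = raw.length              # 7: each byte now counts as one character

puts text_encoding, char_count, byte_count, latin1_bytes, raw_length
```

Note the difference between encode, which converts the bytes, and force_encoding, which only changes the label on the existing bytes.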
By default, Ruby 1.9 assumes US-ASCII for source files, while Ruby 2.0 and newer assume UTF-8.
Generally, you only have to change anything if you are running Ruby 1.9. If your editor saves UTF-8 files and you are running Ruby >= 2.0, everything will be fine by default.
Still, in all Ruby versions since 1.9, you can change the encodings used. There are three different default encodings you can set. The source encoding defaults to US-ASCII on 1.9 and to UTF-8 on Ruby 2.0 and newer; the external encoding is normally taken from your system locale, and the internal encoding is unset unless you configure it:
- internal encoding: The encoding that data read from files and other IO is converted to, i.e. the encoding your strings end up in inside the program.
- external encoding: When reading files, assume them to be in this encoding.
- source encoding: Assume the Ruby source code to be written in this encoding.
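If you are unsure which values are active on your machine, you can simply ask Ruby. This is a small sketch; the variable names are mine, and the values printed depend on your Ruby version and environment.

```ruby
source_encoding   = __ENCODING__               # encoding of this source file
external_encoding = Encoding.default_external  # assumed for data read from IO
internal_encoding = Encoding.default_internal  # nil when no internal encoding is configured

puts "source:   #{source_encoding}"
puts "external: #{external_encoding}"
puts "internal: #{internal_encoding.inspect}"
```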
The former two encodings can be set like this:
Encoding.default_internal = 'UTF-8'
Encoding.default_external = 'UTF-8'
They are then used for all operations during the lifetime of the current Ruby process.
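As a rough illustration of how the two settings interact when reading a file, consider the sketch below. The temp file and its ISO-8859-1 content are made up for the demo; the point is only that data is interpreted as the external encoding and then converted to the internal one.

```ruby
require "tempfile"

# Illustrative setup: a file holding "cafe" with an accented e,
# stored as ISO-8859-1 (4 bytes, the last one being 0xE9).
file = Tempfile.new("encoding-demo")
file.binmode
file.write("caf\xE9".b)
file.close

# Treat files as ISO-8859-1 and convert whatever is read to UTF-8.
Encoding.default_external = "ISO-8859-1"
Encoding.default_internal = "UTF-8"

content          = File.read(file.path)
content_encoding = content.encoding   # UTF-8, because default_internal is set
content_bytes    = content.bytesize   # 5: the accented e now takes two bytes

puts content_encoding, content_bytes

file.unlink
```

Without the default_internal line, the string would keep the external encoding (ISO-8859-1) and stay at 4 bytes.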
The source encoding can be set using a "magic comment" on the first line of your Ruby file (or directly below the shebang) like so:
# encoding: UTF-8
or by starting your script using ruby -KU
which also sets the default external encoding to UTF-8. You can also put this option in your shebang line. In your specific case, you have to at least set the source encoding using one of the mechanisms above.
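For reference, a complete file with a shebang and the magic comment might look like the following sketch. The string is just an example; what matters is that the comment sits on the first line, or on the second line when a shebang is present.

```ruby
#!/usr/bin/env ruby
# encoding: UTF-8

source_encoding  = __ENCODING__      # the encoding declared by the magic comment
greeting         = "Grüße"           # non-ASCII literal, read using the source encoding
literal_encoding = greeting.encoding

puts source_encoding
puts literal_encoding
puts greeting.valid_encoding?
```

On Ruby 1.9, leaving out the magic comment in a file like this typically fails with an "invalid multibyte char (US-ASCII)" error as soon as the parser reaches the non-ASCII literal.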
See http://graysoftinc.com/character-encodings and especially http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings for some more information and background on String encodings in Ruby 1.9.