I have a Rails application that has survived migrations since Rails version 1, and I would like to ignore all invalid byte sequences in it, to keep the backwards c
If you can configure your database/page/whatever to give you strings in ASCII-8BIT, the approach below will recover their real encoding.
Use Ruby's stdlib encoding guessing library. Pass all your strings through something like this:
require 'nkf'
str = "- Men\xFC -"
str.force_encoding(NKF.guess(str))
The NKF library will guess the encoding (usually successfully), and force_encoding then tags the string with that encoding. Note that force_encoding only relabels the bytes; it does not convert them. If you don't want to trust NKF completely, build this safeguard around string operations too:
begin
  str.split
rescue ArgumentError
  str.force_encoding('BINARY')
  retry
end
This will fall back to BINARY if NKF didn't guess correctly. You can turn this into a method wrapper:
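def str_op(s)
  begin
    yield s
  rescue ArgumentError
    # Already BINARY: this is not an encoding problem, so re-raise instead of retrying forever.
    raise if s.encoding == Encoding::BINARY
    s.force_encoding('BINARY')
    retry
  end
end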
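For illustration, here is roughly how the wrapper might be called. Treat it as a sketch under a few assumptions: the sample string and the regex split are placeholders of mine, the string is explicitly tagged UTF-8 to mimic what a UTF-8 app would see, and str_op is redefined (with a guard so it retries at most once) only so the snippet runs on its own.

```ruby
# Wrapper redefined so this snippet is standalone. The `raise` line is a
# guard: once the string is already BINARY, an ArgumentError cannot be an
# encoding problem, so it is re-raised instead of retried.
def str_op(s)
  begin
    yield s
  rescue ArgumentError
    raise if s.encoding == Encoding::BINARY
    s.force_encoding('BINARY')
    retry
  end
end

# \xFC is "ü" in Latin-1, but a lone \xFC byte is not valid UTF-8.
str = "- Men\xFC -".dup.force_encoding('UTF-8')
puts str.valid_encoding?   # false: the UTF-8 tag does not match the bytes

# The first attempt raises ArgumentError (invalid byte sequence), the
# wrapper retags the string as BINARY, and the block runs again.
parts = str_op(str) { |s| s.split(/\s+/) }

puts parts.length                      # 3
puts str.encoding == Encoding::BINARY  # true: the fallback kicked in
puts str.valid_encoding?               # true: any byte sequence is valid BINARY
```

If you would rather not rely on rescue/retry, valid_encoding? gives you a cheap up-front check on whether a string is going to blow up in later operations.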