Postgres encoding error in Sidekiq app

白昼怎懂夜的黑 submitted on 2019-12-05 14:13:10

Just because the string is tagged as UTF-8 doesn't mean that its bytes are valid UTF-8. \xe9 is é in ISO-8859-1 (AKA Latin-1), but a lone \xe9 byte followed by an ASCII character is not valid UTF-8; similarly, \xf1 is ñ in ISO-8859-1 but invalid in UTF-8 in this position. That suggests that the string is actually encoded in ISO-8859-1 rather than UTF-8. You can fix it with a combination of force_encoding, which corrects Ruby's confusion about the current encoding without touching the bytes, and encode, which re-encodes those bytes as UTF-8:

> "Tweets en Ingl\xE9s y en Espa\xF1ol".force_encoding('iso-8859-1').encode('utf-8')
=> "Tweets en Inglés y en Español" 

So before sending that string to the database you want to:

name = name.force_encoding('iso-8859-1').encode('utf-8')
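If some of your strings really are valid UTF-8 already, blindly applying the line above would double-encode them. Here is a minimal sketch of a guard around it. The helper name to_utf8 and the choice of ISO-8859-1 as the fallback are assumptions for illustration, not something your app already has:

```ruby
# Hypothetical helper: the name and the ISO-8859-1 fallback are assumptions.
def to_utf8(str, fallback: Encoding::ISO_8859_1)
  # Work on copies so the caller's string (possibly frozen) is never mutated.
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # The bytes are not valid UTF-8: reinterpret them as the fallback
  # encoding and transcode to UTF-8.
  str.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

fixed      = to_utf8("Tweets en Ingl\xE9s y en Espa\xF1ol")
already_ok = to_utf8("Tweets en Ingl\u00E9s")
```

Note that the guard only tells you the bytes are not UTF-8; it does not tell you they really are ISO-8859-1. That part is still a guess on your side.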

Unfortunately, there is no way to reliably detect a string's real encoding. The various encodings overlap and there's no way to tell if è (\xe8 in ISO-8859-1) or č (\xe8 in ISO-8859-2) is the right character without manual sanity checking.
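To see the ambiguity for yourself, here is a small sketch that decodes the same raw byte under both encodings. The variable names are just for illustration:

```ruby
# One raw byte, tagged as binary so Ruby makes no assumption about it.
byte = "\xE8".b

latin1 = byte.dup.force_encoding(Encoding::ISO_8859_1)
latin2 = byte.dup.force_encoding(Encoding::ISO_8859_2)

# Both interpretations are "valid", so validity checks cannot pick one.
both_valid = latin1.valid_encoding? && latin2.valid_encoding?

as_latin1 = latin1.encode(Encoding::UTF_8) # U+00E8, e with grave
as_latin2 = latin2.encode(Encoding::UTF_8) # U+010D, c with caron

puts as_latin1
puts as_latin2
```

Every byte sequence is valid in a single-byte encoding like these, which is why you need to know (or sanity check) where the data came from.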
