Iconv::IllegalSequence when using WWW::Mechanize

Submitted by ε祈祈猫儿з on 2019-12-04 15:58:59

That page is most certainly UTF-8. However, Mechanize uses NKF (part of Ruby's standard library) to guess the encoding, and for some reason it guesses Shift JIS. The quickest workaround is to override Mechanize's encoding mapping, so that when it converts the body to UTF-8 with Iconv, it passes UTF-8 as the source encoding as well. You can do it like this:

WWW::Mechanize::Util::CODE_DIC[:SJIS] = "UTF-8"

Place that just after the line where you require the Mechanize library. You may want to set the value back immediately after, or even better, find the root cause of the problem and submit a patch if necessary.
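Setting the value back afterwards can be sketched as a small block helper. This is only an illustration: with_mapping_override is a hypothetical name, and code_dic is a plain Hash standing in for WWW::Mechanize::Util::CODE_DIC with placeholder values, on the assumption that the real constant is a mutable Hash as the assignment above suggests.

```ruby
# Hypothetical helper: temporarily override one entry of a Hash,
# run the block, then restore the previous state even if the block raises.
def with_mapping_override(mapping, key, value)
  had_key  = mapping.key?(key)
  original = mapping[key]
  mapping[key] = value
  yield
ensure
  if had_key
    mapping[key] = original
  else
    mapping.delete(key)
  end
end

# Stand-in for WWW::Mechanize::Util::CODE_DIC (placeholder values),
# so that this sketch runs without Mechanize installed.
code_dic = { :SJIS => "SHIFT_JIS", :UTF8 => "UTF-8" }

# Inside the block the override is in effect; afterwards it is undone.
seen = with_mapping_override(code_dic, :SJIS, "UTF-8") do
  code_dic[:SJIS]
end
```

With the real library you would pass WWW::Mechanize::Util::CODE_DIC as the first argument and make the failing get call inside the block.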

Note: I tracked this down by following the backtrace into the Mechanize library. The to_native_charset method calls detect_charset, which is where the problem was.

In my case, the get method returned a Mechanize::File, which doesn't handle encoding at all.
I was able to fix it by converting manually with Iconv, but this only works if you already know the encoding.

require 'iconv'

result = @agent.get uri
# A Mechanize::File is returned instead of a Mechanize::Page,
# so we have to convert manually
result = Iconv.conv("utf-8", "iso-8859-1", result.body)
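Iconv was removed from Ruby's standard library in Ruby 2.0, so on newer Rubies the same manual conversion can be sketched with String#force_encoding and String#encode. The helper name to_utf8 and the sample bytes are hypothetical, and as above this only works when the source encoding is already known.

```ruby
# Hypothetical helper: treat the raw bytes as source_encoding and
# transcode them to UTF-8. The input string is not modified.
def to_utf8(raw_body, source_encoding)
  raw_body.dup.force_encoding(source_encoding).encode("UTF-8")
end

# Sample ISO-8859-1 bytes standing in for result.body ("café", with é as 0xE9).
latin1_bytes = "caf\xE9".b

converted = to_utf8(latin1_bytes, "ISO-8859-1")
```

In the Mechanize case the call would be to_utf8(result.body, "ISO-8859-1"), mirroring the Iconv.conv arguments with the source encoding second.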