Iconv::IllegalSequence when using www::mechanize

北城以北 提交于 2019-12-06 09:32:21

问题


I'm trying to do a little bit of webscraping, but the WWW:Mechanize gem doesn't seem to like the encoding and crashes.
The post request results in a 302 redirect (which mechanize follows, so far so good) and the resulting page seems to crash it. I googled quite a bit, but nothing came up so far how to solve this. Any of you got an idea?

Code:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new

agent.user_agent_alias = 'Mac Safari'
answer = agent.post('https://www.budget.de/de/reservierung/privatkunden/step1/schnellbuchung',
{"Country" => "Deutschland",
"Abholstation" => "Aalen",
"Abgabestation" => "Aalen",
"Abholdatum" => "26.02.2009",
"Abholzeit_stunde" => "13",
"Abholzeit_minute" => "30",
"Abgabedatum" => "28.02.2009",
"Abgabezeit_stunde" => "13",
"Abgabezeit_minute" => "30",
"CountryID" => "DE",
"AbholstationID"=>"AA1",
"AbgabestationID"=>"AA1"
}
)
puts answer.body

Error:

D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/util.rb:29:in `iconv': "\204nderungen vorbe"... (Iconv::IllegalSequence)
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/util.rb:29:in `to_native_charset'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/response_header_handler.rb:29:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain.rb:30:in `pass'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/handler.rb:6:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/response_body_parser.rb:35:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain.rb:30:in `pass'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/handler.rb:6:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain/pre_connect_hook.rb:14:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize/chain.rb:25:in `handle'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize.rb:494:in `fetch_page'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize.rb:545:in `fetch_page'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize.rb:403:in `post_form'
from D:/Ruby/lib/ruby/gems/1.8/gems/mechanize-0.9.1/lib/www/mechanize.rb:322:in `post'
from test.rb:7

回答1:


That page is most certainly UTF-8, however Mechanize uses NKF (a core Ruby library) to guess the encoding and for some reason it comes up as Shift JIS. The quickest way to work around the problem is to override the encoding mapping for Mechanize, so that when it attempts to convert the body to UTF-8 using Iconv it passes in the source encoding as UTF-8 as well. You can do it like this:

WWW::Mechanize::Util::CODE_DIC[:SJIS] = "UTF-8"

Place that just after the line where you require the Mechanize library. You may want to set the value back immediately after, or even better, find the root cause of the problem and submit a patch if necessary.

Note: The way I solved this was by debugging the Mechanize library by using the backtrace. The to_native_charset method calls detect_charset which is where the problem was.




回答2:


In my case a Mechanize::File was returned by the get method which doesn't use encoding at all.
I was able to fix it by manually converting with Iconv, but this only works if you know the encoding already.

result = @agent.get uri
# Mechanize::File instead of Mechanize::Page is returned 
# so we have to convert manually
result = Iconv.conv("utf-8", "iso-8859-1", result.body)


来源:https://stackoverflow.com/questions/586163/iconvillegalsequence-when-using-wwwmechanize

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!