For the last month, we've had a bot scraping our site regularly, leading to a bunch of ArgumentError: invalid %-encoding
errors because the URLs are malformed. I'
If you don't mind monkeypatching Rack, create a file in config/initializers (for example rack.rb) with this content:
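module Rack
  module Utils
    # Fall back to re-escaping the raw string when it contains an invalid
    # %-sequence, instead of letting ArgumentError propagate.
    # URI.encode was removed in Ruby 3.0; URI::DEFAULT_PARSER.escape is
    # used here as its equivalent.
    if defined?(::Encoding)
      def unescape(s, encoding = Encoding::UTF_8)
        begin
          URI.decode_www_form_component(s, encoding)
        rescue ArgumentError
          URI.decode_www_form_component(URI::DEFAULT_PARSER.escape(s), encoding)
        end
      end
    else
      def unescape(s, encoding = nil)
        begin
          URI.decode_www_form_component(s, encoding)
        rescue ArgumentError
          URI.decode_www_form_component(URI::DEFAULT_PARSER.escape(s), encoding)
        end
      end
    end
    module_function :unescape
  end
end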
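To see what the rescue branch does without booting Rack, here is a minimal standalone sketch. safe_unescape is a hypothetical name used only for illustration, and it assumes URI::DEFAULT_PARSER.escape as a stand-in for the older URI.encode (which no longer exists on Ruby 3.0+):

```ruby
require 'uri'

# Hypothetical helper mirroring the patched unescape (illustration only).
def safe_unescape(s, encoding = Encoding::UTF_8)
  URI.decode_www_form_component(s, encoding)
rescue ArgumentError
  # Re-escape the raw string so a stray "%" becomes "%25", then decode again.
  URI.decode_www_form_component(URI::DEFAULT_PARSER.escape(s), encoding)
end

puts safe_unescape("a%20b")  # well-formed input: prints "a b"
puts safe_unescape("100%")   # malformed input: prints "100%" instead of raising
```

Note that when the fallback kicks in, the whole string is re-escaped, so any valid escapes in a malformed string (for example the %20 in "%zz%20") come back undecoded rather than as a space. That seems an acceptable trade-off for junk URLs from a scraper.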
P.S. This works with Passenger, but not with WEBrick or Thin. It looks like both WEBrick and Thin parse the request themselves, so the failure happens before the initializer takes effect. For example, with Thin the error happens in thin-1.6.2/lib/thin/request.rb:84.