For the last month, we've had a bot scraping our site regularly, leading to a bunch of ArgumentError: invalid %-encoding errors because the URLs are malformed. I'
If you don't mind monkeypatching Rack, create a file in config/initializers (for example rack.rb) with this content:
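module Rack
  module Utils
    # If the input contains invalid %-encoding, re-encode it and decode again
    # instead of raising. Note: URI.encode was removed in Ruby 3.0, so this
    # patch as written targets older Rubies.
    if defined?(::Encoding)
      def unescape(s, encoding = Encoding::UTF_8)
        begin
          URI.decode_www_form_component(s, encoding)
        rescue ArgumentError
          URI.decode_www_form_component(URI.encode(s), encoding)
        end
      end
    else
      # Rubies without the Encoding class (1.8) take this branch.
      def unescape(s, encoding = nil)
        begin
          URI.decode_www_form_component(s, encoding)
        rescue ArgumentError
          URI.decode_www_form_component(URI.encode(s), encoding)
        end
      end
    end
    module_function :unescape
  end
end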
P.S. This works with Passenger, but not with WEBrick or Thin. It looks like both WEBrick and Thin parse the request themselves, so the failure happens before the initializer is loaded. For example, with Thin the error happens in thin-1.6.2/lib/thin/request.rb:84.
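Since URI.encode is gone in Ruby 3.0 and later, the same fallback idea can be tried outside Rack with a small stdlib-only helper. This is only a sketch under assumptions: lenient_unescape is a hypothetical name, not a Rack method, and instead of re-encoding the whole string it escapes only the stray "%" characters. That differs from the patch above, where the fallback path re-encodes every "%", including those in valid sequences such as %20.

```ruby
require "uri"

# Hypothetical helper, not part of Rack: decode a form component, and if it
# contains a malformed percent sequence, escape the stray "%" and retry.
def lenient_unescape(s, encoding = Encoding::UTF_8)
  URI.decode_www_form_component(s, encoding)
rescue ArgumentError
  # A "%" not followed by two hex digits becomes "%25" (a literal percent sign).
  repaired = s.gsub(/%(?!\h\h)/, "%25")
  URI.decode_www_form_component(repaired, encoding)
end

puts lenient_unescape("a%20b")    # well-formed input decodes normally
puts lenient_unescape("100%")     # stray "%" is kept as a literal character
puts lenient_unescape("a%20b%zz") # valid and invalid sequences mixed
```

Whether silently repairing the input is preferable to rejecting the request depends on the application.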
You could inject a middleware designed to detect these and fail gracefully. The basic idea is to just try to parse the query string, and if it fails, bail out with an HTTP 400. Otherwise, just allow the request through.
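class RefuseInvalidRequest
  def initialize(app)
    @app = app
  end

  def call(env)
    # Parsing raises on a malformed query string; map any failure to a sentinel.
    query = Rack::Utils.parse_nested_query(env['QUERY_STRING'].to_s) rescue :bad_query
    if query == :bad_query
      # A Rack body must respond to #each, so wrap the string in an array.
      [400, {'Content-Type' => 'text/plain'}, ["Bad Request"]]
    else
      @app.call(env)
    end
  end
end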
I haven't tested this, but the concept should work.
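To try the concept without a Rack server, the check can be sketched with only the standard library. The assumptions here: RefuseMalformedQuery is a hypothetical name, the query string is validated with URI.decode_www_form_component rather than Rack::Utils.parse_nested_query (so it only catches malformed percent-encoding), and the downstream app is a stand-in lambda.

```ruby
require "uri"

# Hypothetical stand-in for the middleware above, using only the stdlib.
class RefuseMalformedQuery
  def initialize(app)
    @app = app
  end

  def call(env)
    return bad_request unless well_formed?(env["QUERY_STRING"].to_s)

    @app.call(env)
  end

  private

  # True when every "%" in the query is followed by two hex digits.
  def well_formed?(query)
    URI.decode_www_form_component(query)
    true
  rescue ArgumentError
    false
  end

  def bad_request
    [400, { "Content-Type" => "text/plain" }, ["Bad Request"]]
  end
end

# Stand-in downstream app that always answers 200.
ok_app = ->(_env) { [200, { "Content-Type" => "text/plain" }, ["OK"]] }
stack = RefuseMalformedQuery.new(ok_app)

puts stack.call("QUERY_STRING" => "q=ruby%20rack").first # request passes through
puts stack.call("QUERY_STRING" => "q=%").first           # request is refused
```

Keeping the rescue inside well_formed? means an ArgumentError raised by the application itself is not mistaken for a bad query. In a real application the middleware would have to sit ahead of whatever parses the query string, for example via config.middleware.insert_before in Rails.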