External links URL encoding leads to '%3F' and '%3D' on Nginx server

Submitted by 拥有回忆 on 2019-11-29 09:25:57

Exactly the same question was asked on the nginx-ru mailing list about a year ago:

http://mailman.nginx.org/pipermail/nginx-ru/2013-February/050200.html

The most helpful response came from Валентин Бартенев (Valentin Bartenev), an Nginx, Inc. employee and developer:

http://mailman.nginx.org/pipermail/nginx-ru/2013-February/050209.html

Если запрос приходит в таком виде, то это уже не параметры, а имя запрошенного файла. Другое дело, что location ищется по уже раскодированному адресу, о чем в документации написано.

Translation:

If the request arrives in this form, these are no longer arguments but part of the name of the requested file. A separate point is that, as the documentation states, location matching is performed against the already-decoded (normalised) URI.
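To see why the encoded form ends up as a file name rather than as arguments, here is a minimal Python sketch using only the standard library's urllib.parse (the request targets are the ones from the question). It splits the request target first and decodes afterwards: per RFC 3986 the query starts only at a literal "?", so "%3F" does not start one.

```python
from urllib.parse import urlsplit, unquote

# Malformed request target as sent by the client in the question.
target = "/default/Site%3Fid%3D13"

# Split first: the query begins only at a literal "?", so %3F does not count.
parts = urlsplit(target)
print(repr(parts.path))   # '/default/Site%3Fid%3D13'
print(repr(parts.query))  # ''

# Decoding afterwards turns %3F and %3D back into "?" and "=", but they are
# now part of the path (the "file name"), not query arguments.
decoded_path = unquote(parts.path)
print(decoded_path)       # /default/Site?id=13

# For comparison, the well-formed request has a real query string.
good = urlsplit("/default/Site?id=13")
print(good.path, good.query)  # /default/Site id=13
```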

His suggested solution, adapted to the example from the question here on SO, would be:

location /default/Site? {
    rewrite \?(.*)$ /default/Site?$1? last;
}

location = /default/Site {
    [...]
}

The following sample redirects all malformed requests (defined as having a ? in the requested file name, which arrives encoded as %3F in the request) to their corrected form, regardless of the URL path.

(Please note that, as rightly advised elsewhere, you should not be getting these malformed links in the first place. Use this only as a last resort: when you cannot correct the malformed links any other way, and you know that such requests are made by legitimate clients.)

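server {
    listen      [::]:80;
    server_name localhost;

    # Rewrites match the decoded URI, so "%3F" is seen here as a literal "?".
    # Everything before the first "?" becomes the path and the rest the
    # query; the trailing "?" keeps nginx from appending the original args.
    rewrite     ^/([^?]*)\?(.*)$    /$1?$2?     permanent;
    location / {
        return  200 "id is $arg_id\n";
    }
}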
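The server-level rewrite can be mimicked in Python to check which redirect target it would produce for a given request. This is a sketch under stated assumptions: it reuses the same regular expression, applies it to the percent-decoded path as nginx does, and ignores any escaping nginx may apply to the captured text when it builds the Location header. The function name `corrected_location` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Same pattern as the nginx rewrite: everything up to the first "?" in the
# decoded path becomes the new path, the rest becomes the query string.
PATTERN = re.compile(r"^/([^?]*)\?(.*)$")


def corrected_location(target: str) -> Optional[str]:
    """Return the redirect target for a malformed request, or None.

    Hypothetical helper that only approximates nginx: the regex is matched
    against the decoded path, and escaping of captures is ignored.
    """
    parts = urlsplit(target)
    decoded = unquote(parts.path)
    m = PATTERN.match(decoded)
    if m is None:
        return None  # no "?" in the decoded path: leave the request alone
    # The trailing "?" in the nginx replacement stops the original query
    # from being appended, so only the captured text is used here.
    return f"/{m.group(1)}?{m.group(2)}"


print(corrected_location("/default/Site%3Fid%3D13"))  # /default/Site?id=13
print(corrected_location("/default/Site?id=13"))      # None
```

As with the nginx rule, a request that already has a proper query string is passed through untouched.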

Here is an example of how it works. When a malformed request is encountered, nginx attempts a correction by answering with 301 Moved Permanently and a Location header pointing at the presumably correct URL, which makes the browser automatically re-issue the request to the new location:

opti# curl -6v "http://localhost/default/Site%3Fid%3D13"
* About to connect() to localhost port 80 (#0)
*   Trying ::1...
* connected
* Connected to localhost (::1) port 80 (#0)
> GET /default/Site%3Fid%3D13 HTTP/1.1
> User-Agent: curl/7.26.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.4.1
< Date: Wed, 15 Jan 2014 17:09:25 GMT
< Content-Type: text/html
< Content-Length: 184
< Location: http://localhost/default/Site?id=13
< Connection: keep-alive
<
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.4.1</center>
</body>
</html>
* Connection #0 to host localhost left intact
* Closing connection #0

Note that no correction is attempted for well-formed requests:

opti# curl -6v "http://localhost/default/Site?id=13"
* About to connect() to localhost port 80 (#0)
*   Trying ::1...
* connected
* Connected to localhost (::1) port 80 (#0)
> GET /default/Site?id=13 HTTP/1.1
> User-Agent: curl/7.26.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.4.1
< Date: Wed, 15 Jan 2014 17:09:30 GMT
< Content-Type: application/octet-stream
< Content-Length: 9
< Connection: keep-alive
<
id is 13
* Connection #0 to host localhost left intact
* Closing connection #0

The Surrican

The URL is perfectly valid. The characters it contains are simply escaped, which is perfectly fine.

Escaping exists so that you can have a request name (in most cases corresponding to a file name on disk) that is literally Site?id=13, rather than Site followed by a query string.

I would consider it bad practice to have characters in a file name that make this necessary. In URL arguments, however, it may very well be necessary.
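The difference between escaping a file name and encoding URL arguments can be sketched with Python's urllib.parse. The path and id are taken from the question; `q` is a made-up parameter name used only for illustration.

```python
from urllib.parse import quote, urlencode

# Encoding a whole "name?args" string as one path segment escapes the "?"
# and "=", producing exactly the kind of link seen in the question.
as_file_name = quote("Site?id=13")
print(as_file_name)  # Site%3Fid%3D13

# Building the link from a path plus separately encoded arguments keeps the
# "?" as a real query delimiter.
proper = "/default/Site?" + urlencode({"id": 13})
print(proper)  # /default/Site?id=13

# Inside argument values, escaping "?" and "=" is genuinely needed.
arg_with_specials = urlencode({"q": "a?b=c"})
print(arg_with_specials)  # q=a%3Fb%3Dc
```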

Nevertheless, although the request URL is valid, it is probably not what you intended. This suggests that you should fix the error wherever the wrong URL originated in the first place.
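If the wrong URLs come from your own content (templates, a database of links, and so on), a simple audit can locate them so they are fixed at the source. The sketch below is a hypothetical heuristic (`looks_misencoded` is not from any library): it flags links whose path contains an escaped "?". It will also flag legitimate file names that really contain "?", so treat its output as candidates, not verdicts.

```python
from urllib.parse import urlsplit


def looks_misencoded(url: str) -> bool:
    """Flag links whose path carries an escaped "?" (%3F or %3f).

    Hypothetical heuristic: real file names containing "?" are flagged too.
    """
    return "%3f" in urlsplit(url).path.lower()


links = [
    "http://localhost/default/Site%3Fid%3D13",
    "http://localhost/default/Site?id=13",
]
for link in links:
    print(link, looks_misencoded(link))
```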

I do not understand why you get a 400 error; I would expect a 404 instead. But that depends on your setup.

Such errors can also occur, especially with nginx, when whole URLs or URL parts are passed along through multiple levels (for example reverse proxies, or capturing parts of the URL with regular expressions and reusing them as variables). To verify and fix this, though, we would need to know more about your setup.
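A simplified illustration of how an intermediate layer can change what the next hop receives. This is generic Python using urllib.parse, not nginx's actual proxy code; it only shows the two usual failure modes of decoding without re-encoding and of encoding twice.

```python
from urllib.parse import quote, unquote, urlsplit

original = "/default/Site%3Fid%3D13"

# A layer that decodes the path and forwards it without re-encoding changes
# its meaning: the next hop now sees a real query string.
forwarded_decoded = unquote(original)
print(urlsplit(forwarded_decoded).query)  # id=13

# A layer that encodes an already-encoded path escapes the "%" itself,
# so the next hop still sees "%3F" literally after one decode.
forwarded_double = quote(original)
print(forwarded_double)           # /default/Site%253Fid%253D13
print(unquote(forwarded_double))  # /default/Site%3Fid%3D13
```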
