Nginx location match regex for special characters and encoded url characters

主宰稳场 submitted on 2019-12-10 11:26:24

Question


I've tried many things today and I'm getting nowhere. One file on my site was accidentally created with a special character in its name. As a result, Googlebot has not crawled the site for 3 weeks now, and Webmaster Tools / Search Console keeps notifying me and asking to retest the URL.

All I want to achieve is to configure Nginx to match the following requests and redirect them to the correct location but regex has me stumped on this one.

The unencoded URL string is:

/historical-rainfall-trends-south-africa-1921–2015.pdf

The encoded URL string is:

/historical-rainfall-trends-south-africa-1921%C3%A2%E2%82%AC%E2%80%9C2015.pdf

How can I get a location match for these?
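
A side note on how the two strings relate, as a minimal Python sketch. It assumes the dash in the unencoded name is an en dash (U+2013) and that the name was misread as Windows-1252 at some point; the question states neither, but the bytes are consistent with both. The constant and variable names are illustrative only.

```python
from urllib.parse import unquote

ENCODED = "/historical-rainfall-trends-south-africa-1921%C3%A2%E2%82%AC%E2%80%9C2015.pdf"

# Percent-decoding as UTF-8 yields three characters where one dash is expected.
decoded = unquote(ENCODED)
prefix = "/historical-rainfall-trends-south-africa-1921"
suffix = "2015.pdf"
garbled = decoded[len(prefix):-len(suffix)]

# Assumption: these three characters are the UTF-8 bytes of an en dash
# that were misread as Windows-1252 and then encoded as UTF-8 again.
repaired = garbled.encode("cp1252").decode("utf-8")

print([f"U+{ord(c):04X}" for c in garbled])
print(f"U+{ord(repaired):04X}")
```

If the assumption holds, the encoded URL is a double-encoded form of the unencoded one, so the two requests reach the server as different byte sequences.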

UPDATE:

Still losing my mind, nothing I have tried is working. I get a match with this regex here - https://regex101.com/r/3Lk2zr/3

but then using this

location ~ /.*[^\x00-\x7F]+.* { return 444; }

still gives me a 404 and not a 444

Likewise, I get a match with this - https://regex101.com/r/80KWJ8/1 - but then

location ~ /.*([^?]*)\%(.*)$ { return 444; }

Gives 404 and not 444 😭

I also tried the following, but it still does not work. Sourced from: https://serverfault.com/questions/656096/rewriting-ascii-percent-encoded-locations-to-their-utf-8-encoded-equivalent

location ~* (*UTF8).*([^?]*)\%(.*)$ { return 444; }

location ~* (*UTF8).*[^\x00-\x7F]+.* { return 444; }
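
One thing that may explain part of this: according to the nginx documentation, location matching is performed against the normalized URI, after "%XX" sequences have been decoded, whereas $request_uri holds the original, still-encoded URI. The Python sketch below only imitates that difference with urllib; it is not nginx code, and the variable names are made up for illustration.

```python
import re
from urllib.parse import unquote_to_bytes

raw_uri = b"/historical-rainfall-trends-south-africa-1921%C3%A2%E2%82%AC%E2%80%9C2015.pdf"

# Roughly what a location regex is tested against: the percent-decoded bytes.
normalized = unquote_to_bytes(raw_uri)

percent = re.compile(rb"%")
non_ascii = re.compile(rb"[^\x00-\x7F]")

# The raw form (like $request_uri) contains "%" and only ASCII bytes.
raw_has_percent = percent.search(raw_uri) is not None
raw_has_non_ascii = non_ascii.search(raw_uri) is not None

# The decoded form contains non-ASCII bytes and no "%".
norm_has_percent = percent.search(normalized) is not None
norm_has_non_ascii = non_ascii.search(normalized) is not None

print(raw_has_percent, raw_has_non_ascii)
print(norm_has_percent, norm_has_non_ascii)
```

Under that reading, a "%" pattern in a location regex has nothing left to match for this request. It does not by itself explain why the non-ASCII pattern also ended in a 404.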

Temporary Solution

Thanks to @funilrys and also to this question: "How do I redirect all requests that contains a certain string to 404 in nginx?"

This works now 100%

location /resources {
    expires 3h;
    add_header Cache-Control 'must-revalidate, proxy-revalidate, max-age=10800';
    location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
        expires 3h;
        add_header Cache-Control 'must-revalidate, proxy-revalidate, max-age=10800';
    }
    location ~* \.(pdf)$ {
        expires 30d;
        add_header Cache-Control 'must-revalidate, proxy-revalidate, max-age=2592000';
        # Redirect any PDF request whose raw URI contains a percent sign
        if ($request_uri ~ .*%.*) {
            return 301 https://example.com/resources/weather-documents/historical-rainfall-trends-south-africa_1921_2015.pdf;
        }
        # Redirect any PDF request whose raw URI contains a non-ASCII byte
        if ($request_uri ~ .*[^\x00-\x7F]+.*) {
            return 301 https://example.com/resources/weather-documents/historical-rainfall-trends-south-africa_1921_2015.pdf;
        }
    }
}


Answer 1:


Your solution is terrible, let me tell you why.

Every single request which matches this location block now has to be evaluated against two if conditions before being served.

Any request that matches is then redirected to the correct URL, which also matches this location block, so your server evaluates those two if conditions all over again.

Just for fun you are also making Nginx evaluate requests for image, css and js files against your if conditions too. None of them will match as you are worried about a pdf, but you are still adding an extra 200% overhead to the request processing.

A much more Nginx friendly solution is actually very simple.

Nginx checks regex locations in the order they are listed in your config and uses the first one that matches. If this file's URL would also match any of your other regex locations, place this block above them:

location ~* /historical-rainfall-trends-south-africa-1921([^_])*?2015\.pdf$ {
    return 301 https://example.com/resources/weather-documents/historical-rainfall-trends-south-africa_1921_2015.pdf;
}

Just tested it on one of my servers running Nginx 1.15.1, works a charm.
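
For anyone who wants to sanity-check the pattern outside Nginx, here is a minimal Python sketch. It assumes Python's re module behaves like Nginx's PCRE for this particular pattern, which uses only syntax common to both, and the helper name would_redirect is hypothetical.

```python
import re
from urllib.parse import unquote

# Pattern from the location block above; re.IGNORECASE stands in for "~*".
PATTERN = re.compile(
    r"/historical-rainfall-trends-south-africa-1921([^_])*?2015\.pdf$",
    re.IGNORECASE,
)

BAD_ENCODED = "/historical-rainfall-trends-south-africa-1921%C3%A2%E2%82%AC%E2%80%9C2015.pdf"
BAD_DECODED = unquote(BAD_ENCODED)
TARGET = "/resources/weather-documents/historical-rainfall-trends-south-africa_1921_2015.pdf"


def would_redirect(path: str) -> bool:
    """Return True if the pattern is found anywhere in the path."""
    return PATTERN.search(path) is not None


print(would_redirect(BAD_DECODED))  # broken name after percent-decoding
print(would_redirect(BAD_ENCODED))  # broken name in its raw, encoded form
print(would_redirect(TARGET))       # redirect target
```

The redirect target has an underscore where the pattern expects "-1921", so it does not match and the redirect cannot loop back into this block.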




Answer 2:


I don't know much about Nginx or the way it handles regex, but:

  • You could try to match the percent sign in the encoded URL with:

    %+

  • You could try to match the encoded special characters in the encoded URL with:

    (%([A-Z][0-9]|[0-9][A-Z]|[0-9]+|[A-Z]+))+

  • You could try to match non-ASCII characters in the unencoded URL with:

    [^\x00-\x7F]+

Proofs:

  • https://regex101.com/r/3Lk2zr/2
  • https://regex101.com/r/5c8PpH/2
  • https://regex101.com/r/lRyHgj/2
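
The three patterns can also be tried locally. The sketch below uses Python's re module rather than Nginx, and assumes the dash in the unencoded name is an en dash (U+2013); the helper first_match and the "/annual%20report.pdf" path are made up for illustration.

```python
import re

ENCODED = "/historical-rainfall-trends-south-africa-1921%C3%A2%E2%82%AC%E2%80%9C2015.pdf"
# Assumption: the dash in the unencoded name is an en dash (U+2013).
UNENCODED = "/historical-rainfall-trends-south-africa-1921\u20132015.pdf"

percent = re.compile(r"%+")
escapes = re.compile(r"(%([A-Z][0-9]|[0-9][A-Z]|[0-9]+|[A-Z]+))+")
non_ascii = re.compile(r"[^\x00-\x7F]+")


def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(percent, ENCODED))
print(first_match(escapes, ENCODED))
print(first_match(non_ascii, UNENCODED))
print(first_match(non_ascii, ENCODED))

# A harmless URL with an encoded space also trips the first two patterns.
print(first_match(percent, "/annual%20report.pdf"))
print(first_match(escapes, "/annual%20report.pdf"))
```

The last two lines show the trade-off: the percent-based patterns are not specific to this file and match any URL that contains a percent escape.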






Source: https://stackoverflow.com/questions/51747175/nginx-location-match-regex-for-special-characters-and-encoded-url-characters
