How to encode special characters using mod_rewrite & Apache?

逝去的感伤 2020-11-29 07:38

I would like to have pretty URLs for my tagging system along with all the special characters: +, &, #, %, and =

5 answers
  • 2020-11-29 07:44

    I finally made it work with the help of RewriteMap.

    I added the escape map to httpd.conf:

    RewriteMap es int:escape

    and used it in the rewrite rule:

    RewriteRule ([^?.]*) /abc?arg1=${es:$1}&country_sniff=true [L]
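
To illustrate what an escape step like this contributes, here is a rough Python analogue of percent-encoding a captured value before it is placed in a query string. It uses `urllib.parse.quote` as a stand-in; the exact set of characters that Apache's `int:escape` leaves untouched may differ, so treat this as a sketch and not as a description of Apache's behaviour. The helper name `escape_tag` is hypothetical.

```python
from urllib.parse import quote


def escape_tag(tag: str) -> str:
    """Percent-encode a tag for use as a query-string value (hypothetical helper)."""
    # safe="" asks quote() to also encode '/', '+', '&', '#', '%' and '='.
    return quote(tag, safe="")


for tag in ["c++", "q&a", "c#", "100%", "a=b"]:
    print(tag, "->", escape_tag(tag))
```

Once every special character is encoded, the `&` and `=` that the rule itself adds are the only separators left in the query string.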
    
  • 2020-11-29 07:45

    The underlying problem is that you are moving from a request that has one encoding (specifically, a plus sign is a plus sign) into a request that has different encoding (a plus sign represents a space). The solution is to bypass the decoding that mod_rewrite does and convert your path directly from the raw request to the query string.

    To bypass the normal flow of the rewrite rules, we’ll load the raw request string directly into an environment variable and modify the environment variable instead of the normal rewrite path. It will already be encoded, so we don't generally need to worry about encoding it when we move it to the query string. What we do want, however, is to percent-encode the plus signs so that they are properly relayed as plus signs and not spaces.

    The rules are incredibly simple:

    RewriteEngine On
    
    RewriteRule ^script\.php$ - [L]
    
    # Move the path from the raw request into _rq
    RewriteCond %{ENV:_rq} =""
    RewriteCond %{THE_REQUEST} "^[^ ]+ (/path/[^/]+/[^? ]+)"
    RewriteRule .* - [E=_rq:%1]
    
    # encode the plus signs (%2B)  (Loop with [N])
    RewriteCond %{ENV:_rq} "/path/([^/]+)/(.*)\+(.*)$"
    RewriteRule .* - [E=_rq:/path/%1/%2\%2B%3,N]
    
    # finally, move it from the path to the query string
    # ([NE] says to not re-code it)
    RewriteCond %{ENV:_rq} "/path/([^/]+)/(.*)$"
    RewriteRule .* /path/script.php?%1=%2 [NE]
    

    This trivial script.php confirms that it works:

    <input readonly type="text" value="<?php echo htmlspecialchars($_GET['tag'] ?? '', ENT_QUOTES); ?>" />
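
The three rewrite steps can also be traced outside Apache. The following Python sketch mimics them on a raw request line, assuming a request such as `GET /path/tag/c++ HTTP/1.1`; the function name `rewrite` is hypothetical, and `parse_qs` only approximates how PHP would populate `$_GET`.

```python
import re
from urllib.parse import parse_qs


def rewrite(request_line: str) -> str:
    """Mimic the rules: capture the raw path, encode '+', build the query string."""
    # Step 1: take the still-encoded path from the raw request line.
    match = re.match(r"^[^ ]+ (/path/[^/]+/[^? ]+)", request_line)
    if match is None:
        return ""
    raw = match.group(1)
    # Step 2: replace every literal '+' with %2B (the [N] loop in the rules).
    raw = raw.replace("+", "%2B")
    # Step 3: move the two path segments into the query string unchanged.
    name, value = re.match(r"/path/([^/]+)/(.*)$", raw).groups()
    return f"/path/script.php?{name}={value}"


target = rewrite("GET /path/tag/c++ HTTP/1.1")
params = parse_qs(target.split("?", 1)[1])
print(target)
print(params)
```

Because the path is never decoded along the way, sequences such as `%26` pass through untouched and only the plus signs need special handling.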
    
  • 2020-11-29 07:57

    I ran into a similar problem with mod_rewrite and a + sign in the URL. The scenario was as follows:

    We have a URL containing + signs that needs to be rewritten, such as http://deskdomain/2013/08/09/a+b+c.html

    RewriteRule ^/(.*) http://mobiledomain/do/urlRedirect?url=http://%{HTTP_HOST}/$1

    The Struts action urlRedirect reads the url parameter, makes some changes, and uses the result for another redirect. But in req.getParameter("url") the + signs become spaces: the parameter value is http://deskdomain/2013/08/09/a b c.html, which makes the redirect fail with 404 Not Found. To resolve it (with help from an earlier answer), we use the rewrite flags B (escape backreferences) and NE (noescape):

    RewriteRule ^/(.*) http://mobiledomain/do/urlRedirect?url=http://%{HTTP_HOST}/$1 [B,NE]

    The B flag escapes + to %2B, and NE prevents mod_rewrite from escaping %2B again to %252B (double-escaping the + sign), so req.getParameter("url") returns http://deskdomain/2013/08/09/a+b+c.html

    I think the reason is that req.getParameter("url") unescapes the value for us, and a + sign unescapes to a space. You can check this by unescaping %2B once to get +, then unescaping + again to get a space:

    "%2B" unescape-> "+" unescape-> " "
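
The escape and unescape chain can be reproduced in a few lines of Python. This is only a sketch: `quote` stands in for the escaping that the B flag applies to the backreference, and `unquote_plus` stands in for the form decoding behind `req.getParameter`; both substitutions are assumptions made for illustration.

```python
from urllib.parse import quote, unquote_plus

original = "a+b+c.html"

# Escaped once: roughly what reaches the application with [B,NE].
encoded_once = quote(original, safe="")
# Escaped twice: roughly what happens if the result is escaped again (no NE).
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)
print(encoded_twice)

# One round of form decoding on each variant.
decoded_unescaped = unquote_plus(original)     # '+' was never escaped
decoded_once = unquote_plus(encoded_once)      # escaped exactly once
decoded_twice = unquote_plus(encoded_twice)    # escaped twice

print(repr(decoded_unescaped))
print(repr(decoded_once))
print(repr(decoded_twice))
```

Only the value that was escaped exactly once comes back as the original file name; the unescaped one loses its plus signs, and the double-escaped one still contains %2B.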

  • 2020-11-29 08:02

    I'm not sure I understand what you're asking, but the NE (noescape) flag to Apache's RewriteRule directive might be of some interest to you. Basically, it prevents mod_rewrite from automatically escaping special characters in the substitution pattern you provide. The example given in the Apache 2.2 documentation is

    RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE]
    

    which will turn, for example, /foo/zed into a redirect to /bar/arg=P1%3dzed, so that the script /bar will then see a query parameter named arg with a value P1=zed, if it looks in its PATH_INFO (okay, that's not a real query parameter, so sue me ;-P).

    At least, I think that's how it works . . . I've never used that particular flag myself.

  • 2020-11-29 08:03

    "The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces."

    I don't think that's quite what's happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.

    So then mod_rewrite changes your request '/tag/c++' to 'script.php?tag=c++'. But in a query string component in the application/x-www-form-urlencoded format, the escaping rules are very slightly different to those that apply in path parts. In particular, '+' is a shorthand for space (which could just as well be encoded as '%20', but this is an old behaviour we'll never be able to change now).

    So PHP's form-reading code receives the 'c++' and dumps it in your $_GET as c-space-space.

    Looks like the way around this is to use the rewriteflag 'B'. See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags - curiously it uses more or less the same example!

    RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]
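
The difference between path decoding and query-string decoding is easy to see with Python's standard library. In this sketch `unquote` plays the role of Apache's path decoding and `parse_qs` plays the role of PHP's form-reading code; both are stand-ins chosen for illustration, not the actual implementations.

```python
from urllib.parse import parse_qs, unquote

# Path decoding: %2B becomes '+', and a literal '+' stays a '+'.
path = unquote("/tag/c%2B%2B")
print(path)

# Query decoding: a literal '+' is read as a space.
naive = parse_qs("tag=c++")
print(naive)

# With the plus signs re-escaped (the effect of the B flag), they survive.
escaped = parse_qs("tag=c%2B%2B")
print(escaped)
```

The same two characters therefore mean different things depending on which side of the `?` they end up on, which is why the backreference has to be re-escaped when it moves into the query string.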
    