trouble with utf-8 chars & apache2 rewrite rules

南方客 2021-02-19 23:46

I saw the post "validating utf-8 in htaccess rewrite rule" and I think it is great, but I have a more fundamental problem to solve first:

I needed to expand to handle utf-

6 answers
  •  广开言路
    2021-02-20 00:29

    I'd suggest you activate MultiViews and forget mod_rewrite. Add this to your Apache configuration in the relevant Directory/VirtualHost section:

    Options +MultiViews
    #should already be set to this, but it doesn't hurt:
    AcceptPathInfo Default
    

    Now you can always omit the extensions, as long as the client includes the corresponding MIME type in its Accept header.

    Now a request for /puzzle/whatever will map to /puzzle.php and $_SERVER['PATH_INFO'] will be filled with /whatever.


    If you want to do it with mod_rewrite it's also possible. The test string for RewriteRule is unescaped (the %xx portions are converted to the actual bytes they represent). You can get the original escaped string using %{REQUEST_URI} or %{THE_REQUEST} (the last one also contains the HTTP method and version).

    By convention, web browsers use UTF-8 encoding in URLs. This means that "México" will be urlencoded to M%C3%A9xico, not M%E9xico, which would be expected if the browsers used ISO-8859-1. Also, [a-zA-Z] will not match é. However, this should work:

    RewriteCond %{REQUEST_URI} ^/puzzle/[^/]*$
    RewriteRule ^/puzzle/(.*)$ /puzzle.php?q=$1 [B,L]
    

    You need B to escape the backreference because you're using it in a query string, in which the set of characters that are allowed is smaller than for the rest of the URI.
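
To make the escaping concrete, here is a small Python sketch (not Apache itself, so treat it only as an illustration). It shows how "México" is percent-encoded under UTF-8 versus ISO-8859-1, what the unescaped bytes look like, and why a captured value has to be escaped again before it goes into a query string. Using `urllib.parse.quote` as a stand-in for the B flag is my assumption; the exact set of characters Apache escapes may differ.

```python
from urllib.parse import quote, unquote_to_bytes

word = "México"

# Percent-encoding depends on the character encoding chosen.
utf8_escaped = quote(word, safe="")                        # browsers' convention
latin1_escaped = quote(word, safe="", encoding="latin-1")  # ISO-8859-1 for comparison

# The RewriteRule test string is unescaped, i.e. it holds the raw bytes.
raw_bytes = unquote_to_bytes(utf8_escaped)

# Re-escaping the captured bytes (roughly what B does for a backreference)
# restores a form that is safe to place in a query string.
re_escaped = quote(raw_bytes, safe="")

# Characters that are special in a query string show why this matters:
# without escaping, "a&b=c" would be read as two separate parameters.
tricky = quote("a&b=c", safe="")

print(utf8_escaped, latin1_escaped, raw_bytes, re_escaped, tricky)
```
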

    The thing you should be aware of is that RewriteRule is not Unicode-aware: it matches bytes, not characters. Anything other than .* can give (potentially) incorrect results. Even [^/] may not work if the / "character" (read: byte) can be part of a multi-byte character sequence (well-formed UTF-8 never uses the 0x2F byte inside a multi-byte sequence, so this caution is a conservative one). If RewriteRule were Unicode-aware, your solution with \w should work.

    Since you do not want to match subdirectories, and RewriteRule ^/puzzle/[^/]* is not an option, that check is deferred to a RewriteCond that uses the (escaped) %{REQUEST_URI}.
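
The byte-versus-character point can be reproduced outside Apache. The sketch below uses Python's `re` module on bytes as an analogy for a regex engine that is not Unicode-aware; it is not mod_rewrite's actual engine, and the path is just the example from this thread.

```python
import re

# The unescaped request path as raw UTF-8 bytes, as a byte-oriented matcher sees it.
path = "/puzzle/México".encode("utf-8")

# Byte patterns: é is two bytes (0xC3 0xA9), neither of which is an ASCII letter.
ascii_only = re.fullmatch(rb"/puzzle/[a-zA-Z]+", path)
byte_word = re.fullmatch(rb"/puzzle/\w+", path)   # \w is ASCII-only on bytes

# .* accepts any bytes, so it is the pattern that reliably matches.
any_bytes = re.fullmatch(rb"/puzzle/(.*)", path)

# A Unicode-aware match on decoded text does treat é as a word character.
text_word = re.fullmatch(r"/puzzle/\w+", path.decode("utf-8"))

print(ascii_only, byte_word, any_bytes.group(1), text_word.group(0))
```
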
