Question
We've run into an odd argument where I work, and I may be wrong on this, so this is why I am asking.
Our software outputs a directory to an Apache server that replaces an underscore with a %5F in the name of the directory.
For instance, if the name of the directory were listed as a string in our software it would be "andy_test", but when the software outputs the directory to the Apache server, it becomes "andy%5Ftest". Unfortunately, when you access the URL on the server it ends up becoming "andy%255Ftest".
Somehow this seems wrong to me, once again the progression is:
- andy_test <- (as a string in the software)
- andy%5Ftest <- (listed as a directory on the server)
- andy%255Ftest <- (must be used when calling the same directory as a URL on the server from a web browser.)
I'm assuming that "%5F" is the encoding for underscore, and that "%25" is the encoding for "%".
It would seem to me that the directory name should be listed on the server as just plain andy_test, and that if you were using an encoded URI you might then end up with "andy%5Ftest" to access the directory on the Apache server.
I asked the guys on the backend about it, and they said that they were just "encoding anything that was not a letter or a number."
So I guess I'm a bit confused on this. Can you tell me who is right, and direct me to some information on why?
Answer 1:
You should not encode the directory names as you create them (as you suggested). Encoding should only happen at the last stage where it is handed out to the browser. That's why you are ending up with 'double' encoding: %25 is % and 5F is the leftover from the first encoding of underscore.
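To illustrate, here is a minimal Python sketch of how the double encoding could arise. The helper `encode_non_alnum` is hypothetical: it only imitates the backend rule quoted in the question ("encoding anything that was not a letter or a number") and is not the actual backend code. `urllib.parse.quote` stands in for whatever encodes the path before it reaches the browser.

```python
from urllib.parse import quote, unquote


def encode_non_alnum(name: str) -> str:
    """Hypothetical stand-in for the backend rule: percent-encode every
    byte that is not an ASCII letter or digit."""
    out = []
    for b in name.encode("utf-8"):
        ch = chr(b)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"%{b:02X}")
    return "".join(out)


# First encoding: applied when the directory is created on the server.
dir_on_disk = encode_non_alnum("andy_test")

# Second encoding: the literal "%" in the directory name is itself
# encoded as "%25" when the name is turned into a URL path.
url_path = quote(dir_on_disk)

print(dir_on_disk)                 # andy%5Ftest
print(url_path)                    # andy%255Ftest
print(unquote(url_path))           # andy%5Ftest (matches the directory name)
print(unquote(unquote(url_path)))  # andy_test (the original string)
```

One decode of the URL yields the name stored on disk, which is why the server only finds the directory under the doubly encoded URL.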
Also, note that according to RFC 1738 you don't need to encode underscores at all:
2.2. URL Character Encoding Issues
...
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
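As a quick check, Python's standard `urllib.parse.quote` leaves underscores alone while still encoding characters such as spaces. This is only an illustration of one common encoder, not a statement about what the software in the question uses.

```python
from urllib.parse import quote

# Underscore is left as-is by the standard encoder.
plain = quote("andy_test")

# A space, by contrast, does need encoding.
spaced = quote("andy test")

print(plain)   # andy_test
print(spaced)  # andy%20test
```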
Answer 2:
There is double encoding happening in what you are showing. Two steps should be enough:
- andy_test is both the string in the software and the actual name of the directory or script in the filesystem (the resource the web server accesses).
- andy%5Ftest is andy_test URL encoded. This is the string the browser should use (encoding is not really needed in the underscore case, but may be in other cases).
- andy%255Ftest is just andy_test URL encoded twice, which makes no sense; there should be no need for it.
Just decide WHERE you will do the encoding. If you do it both at the code level and at the web server level, this is what can happen, and the result is broken links unless you also decode twice, which is neither needed nor sane.
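A sketch of the "encode in one place only" approach, in Python: the directory keeps its plain name, and encoding happens once, when the link is built. The function `build_url` and the host `http://example.com` are made up for illustration.

```python
from urllib.parse import quote, unquote


def build_url(base: str, dirname: str) -> str:
    """Hypothetical helper: encode the plain directory name exactly once,
    at the point where the URL is produced."""
    return base.rstrip("/") + "/" + quote(dirname, safe="")


# The directory on disk is named exactly like the string in the software.
link = build_url("http://example.com", "andy_test")
print(link)  # http://example.com/andy_test

# A name that really does need encoding is still encoded only once.
encoded = quote("andy test&co", safe="")
print(encoded)           # andy%20test%26co
print(unquote(encoded))  # andy test&co (a single decode recovers the name)
```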
Source: https://stackoverflow.com/questions/2222519/url-encoding-with-underscores-in-a-directory-name