If I only need to handle URL encoding, should I use EscapeUriString?
A simple example
var data = "example.com/abc?DEF=あいう\x20えお";
Console.WriteLine(Uri.EscapeUriString(data));
Console.WriteLine(Uri.EscapeDataString(data));
Console.WriteLine(System.Net.WebUtility.UrlEncode(data));
Console.WriteLine(System.Web.HttpUtility.UrlEncode(data));
/*
=>
example.com/abc?DEF=%E3%81%82%E3%81%84%E3%81%86%20%E3%81%88%E3%81%8A
example.com%2Fabc%3FDEF%3D%E3%81%82%E3%81%84%E3%81%86%20%E3%81%88%E3%81%8A
example.com%2Fabc%3FDEF%3D%E3%81%82%E3%81%84%E3%81%86+%E3%81%88%E3%81%8A
example.com%2fabc%3fDEF%3d%e3%81%82%e3%81%84%e3%81%86+%e3%81%88%e3%81%8a
*/
The plus (+) character reveals a lot about the difference between these methods. In the query string of a URI, a plus is conventionally interpreted as a space. Consider querying Google for "happy cat":
https://www.google.com/?q=happy+cat
That's a valid URI (try it), and EscapeUriString will not modify it.
Now consider querying Google for "happy c++":
https://www.google.com/?q=happy+c++
That's a valid URI (try it), but it produces a search for "happy c", because the two pluses are interpreted as spaces. To fix it, we can pass "happy c++" to EscapeDataString and voila*:
https://www.google.com/?q=happy+c%2B%2B
* The string EscapeDataString returns is actually "happy%20c%2B%2B"; %20 is the percent-encoding of the space character, and %2B is the percent-encoding of the plus character.
If you're using UriBuilder as you should be, then you'll only need EscapeDataString to properly escape some of the components of your entire URI. @Livven's answer to this question further proves that there really is no reason to use EscapeUriString.
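A minimal sketch of that approach, assuming the search phrase is the only component that needs escaping. The variable names are mine; the host is the one from the example above.

```csharp
using System;

// The raw search phrase, exactly as a user would type it.
string phrase = "happy c++";

// EscapeDataString percent-encodes the space and both plus signs.
string encoded = Uri.EscapeDataString(phrase);

// Only the value is escaped; UriBuilder assembles the rest of the URI.
var builder = new UriBuilder("https://www.google.com/")
{
    Query = "q=" + encoded
};

Console.WriteLine(encoded);
Console.WriteLine(builder.Uri.AbsoluteUri);
```

The point of the split is that the separators ("?" and "=") never pass through an escaping method; only the data does.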
Use EscapeDataString always (for more info about why, see Livven's answer below).
Comments in the source address the difference clearly. Why this info isn't brought forward via XML documentation comments is a mystery to me.
EscapeUriString:
This method will escape any character that is not a reserved or unreserved character, including percent signs. Note that EscapeUriString will also do not escape a '#' sign.
EscapeDataString:
This method will escape any character that is not an unreserved character, including percent signs.
So the difference is in how they handle reserved characters. EscapeDataString escapes them; EscapeUriString does not.
According to RFC 3986, the reserved characters are: :/?#[]@!$&'()*+,;=
For completeness, the unreserved characters are alphanumeric and -._~
Both methods escape characters that are neither reserved nor unreserved.
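To check the two character classes yourself, something like the following should do. It is a sketch that assumes a current .NET runtime; as far as I know, older .NET Framework versions followed RFC 2396 by default and left a few of the reserved characters alone.

```csharp
using System;

// Character sets as listed above (RFC 3986).
const string reserved = ":/?#[]@!$&'()*+,;=";
const string unreserved = "-._~";

// Each reserved character should come back as a %XX triplet.
foreach (char c in reserved)
{
    Console.WriteLine(c + " -> " + Uri.EscapeDataString(c.ToString()));
}

// The unreserved characters should come back untouched.
string escapedUnreserved = Uri.EscapeDataString(unreserved);
Console.WriteLine(unreserved + " -> " + escapedUnreserved);

// Escaping the whole reserved set at once triples its length.
string escapedReserved = Uri.EscapeDataString(reserved);
Console.WriteLine(escapedReserved);
```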
I disagree with the general notion that EscapeUriString is evil. I think a method that escapes only illegal characters (such as spaces) and not reserved characters is useful. But it does have a quirk in how it handles the % character. Percent-encoded characters (% followed by two hex digits) are legal in a URI. I think EscapeUriString would be far more useful if it detected this pattern and avoided encoding % when it's immediately followed by two hex digits.
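For illustration, here is a rough sketch of the behaviour I have in mind. EscapeIllegalOnly is a hypothetical helper of my own, not part of .NET, and it makes no attempt to handle malformed input such as unpaired surrogates.

```csharp
using System;
using System.Text;

// Hypothetical helper (not a .NET API): escapes only characters that are
// neither reserved nor unreserved, and leaves existing %XX triplets alone.
string EscapeIllegalOnly(string input)
{
    // Reserved and non-alphanumeric unreserved characters are copied as-is.
    const string keep = ":/?#[]@!$&'()*+,;=-._~";
    var sb = new StringBuilder();

    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        bool isAsciiAlnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

        if (isAsciiAlnum || keep.IndexOf(c) >= 0)
        {
            sb.Append(c);
        }
        else if (c == '%' && i + 2 < input.Length
                 && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
        {
            // Already percent-encoded: copy the whole triplet unchanged.
            sb.Append(input, i, 3);
            i += 2;
        }
        else
        {
            // Keep surrogate pairs together so they are encoded as one code point.
            int len = char.IsHighSurrogate(c) && i + 1 < input.Length
                      && char.IsLowSurrogate(input[i + 1]) ? 2 : 1;
            sb.Append(Uri.EscapeDataString(input.Substring(i, len)));
            i += len - 1;
        }
    }
    return sb.ToString();
}

Console.WriteLine(EscapeIllegalOnly("http://example.org/?parameter=father%26son&x=two words"));
Console.WriteLine(EscapeIllegalOnly("100%"));
```

A lone % is still escaped to %25, so only well-formed triplets pass through untouched.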
I didn't find the existing answers satisfactory so I decided to dig a little deeper to settle this issue. Surprisingly, the answer is very simple:
There is (almost) no valid reason to ever use Uri.EscapeUriString. If you need to percent-encode a string, always use Uri.EscapeDataString.*
* See the last paragraph for a valid use case.
Why is this? According to the documentation:
Use the EscapeUriString method to prepare an unescaped URI string to be a parameter to the Uri constructor.
This doesn't really make sense. According to RFC 2396:
A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics.
While the quoted RFC has been obsoleted by RFC 3986, the point still stands. Let's verify it by looking at some concrete examples:
You have a simple URI, like this:
http://example.org/
Uri.EscapeUriString won't change it.
You decide to manually edit the query string without regard for escaping:
http://example.org/?key=two words
Uri.EscapeUriString will (correctly) escape the space for you:
http://example.org/?key=two%20words
You decide to manually edit the query string even further:
http://example.org/?parameter=father&son
However, this string is not changed by Uri.EscapeUriString, since it assumes the ampersand signifies the start of another key-value pair. This may or may not be what you intended.
You decide that you in fact want the value of parameter to be father&son, so you fix the previous URL manually by escaping the ampersand:
http://example.org/?parameter=father%26son
However, Uri.EscapeUriString will escape the percent character too, leading to a double encoding:
http://example.org/?parameter=father%2526son
As you can see, using Uri.EscapeUriString for its intended purpose makes it impossible to use & as part of a key or value in a query string instead of as a separator between multiple key-value pairs.
This is because, in an attempt at making it suitable for escaping full URIs, it ignores reserved characters and only escapes characters that are neither reserved nor unreserved, which, BTW, is contrary to the documentation. This way you don't end up with something like http%3A%2F%2Fexample.org%2F, but you do end up with the issues illustrated above.
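The four cases above can be reproduced with a short program. This is just a sketch; the pragma is there because Uri.EscapeUriString is marked obsolete (SYSLIB0013) in .NET 6 and later.

```csharp
using System;

// Silences the obsoletion warning for Uri.EscapeUriString on .NET 6+.
#pragma warning disable SYSLIB0013

string[] inputs =
{
    "http://example.org/",                        // left unchanged
    "http://example.org/?key=two words",          // space becomes %20
    "http://example.org/?parameter=father&son",   // ampersand left alone
    "http://example.org/?parameter=father%26son", // percent sign escaped again
};

foreach (string input in inputs)
{
    Console.WriteLine(input + "  ->  " + Uri.EscapeUriString(input));
}
```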
In the end, if your URI is valid, it does not need to be escaped to be passed as a parameter to the Uri constructor, and if it's not valid then calling Uri.EscapeUriString isn't a magic solution either. Actually, it will work in many if not most cases, but it is by no means reliable.
You should always construct your URLs and query strings by gathering the key-value pairs, percent-encoding each key and value, and then joining them with the necessary separators. You can use Uri.EscapeDataString for this purpose, but not Uri.EscapeUriString, since it doesn't escape reserved characters, as mentioned above.
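As a rough sketch of that recipe, reusing the keys and values from the examples above (the list and variable names are my own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Unescaped key-value pairs, exactly as the application holds them.
var parameters = new List<KeyValuePair<string, string>>
{
    new KeyValuePair<string, string>("parameter", "father&son"),
    new KeyValuePair<string, string>("key", "two words"),
};

// Percent-encode every key and value on its own, then add the separators.
string query = string.Join("&", parameters.Select(p =>
    Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

string url = "http://example.org/?" + query;
Console.WriteLine(url);
// http://example.org/?parameter=father%26son&key=two%20words
```

Because the "&" and "=" separators are added after encoding, an ampersand inside a value can never be mistaken for one.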
Only if you cannot do that, e.g. when dealing with user-provided URIs, does it make sense to use Uri.EscapeUriString as a last resort. But the previously mentioned caveats apply: if the user-provided URI is ambiguous, the results may not be desirable.