Question
using System;

namespace UnicodeRlm
{
    class Program
    {
        static void Main(string[] args)
        {
            // \u200F is the invisible right-to-left mark (RLM) that follows the "!".
            var uri = new Uri(
                "https://example.com/attachments/The title is \"مفتاح معايير الويب!\u200F\" in Arabic.pdf");
            Console.WriteLine(uri.AbsolutePath);
            Console.WriteLine(uri.AbsolutePath.Length);
        }
    }
}
Under .NET 4.0, this produces
/attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%E2%80%8F%22%20in%20Arabic.pdf
168
Under .NET 4.5+, this produces
/attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%22%20in%20Arabic.pdf
159
.NET 4.5 drops the %E2%80%8F part, which is the percent-encoded UTF-8 form of the RLM (right-to-left mark, U+200F) character:
...!%E2%80%8F%22%20in%20Arabic.pdf
...!%22%20in%20Arabic.pdf
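As a quick sanity check that %E2%80%8F really is U+200F, the character can be UTF-8 encoded by hand. This is only an illustrative sketch; the class and method names (RlmEncodingCheck, PercentEncodeUtf8) are made up for this example and are not part of the original program. It deliberately avoids System.Uri so that the result does not depend on the framework's escaping behavior.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RlmEncodingCheck
{
    // Percent-encodes every UTF-8 byte of the input (hypothetical helper,
    // intended for inspection only, not as a general URL encoder).
    public static string PercentEncodeUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return string.Concat(bytes.Select(b => "%" + b.ToString("X2")));
    }

    public static void Main()
    {
        // U+200F RIGHT-TO-LEFT MARK encodes to the three bytes E2 80 8F.
        Console.WriteLine(PercentEncodeUtf8("\u200F")); // prints %E2%80%8F
    }
}
```

The nine characters of %E2%80%8F also account exactly for the difference between the two reported lengths (168 versus 159).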
My hypothesis is that this happens because System.Uri escaping now follows RFC 3986, but my RFC-fu and Unicode-fu are failing me: I can't tell whether the RFC requires the RLM to be dropped, or whether the RLM character is placed correctly in the original string at all.
I'm not entirely sure whether this is the correct behavior standards-wise, but it certainly isn't correct for my purposes: under .NET 4.5 I cannot download a file with an RLM character in its name, either with WebClient or with HttpWebRequest.
Is there any way to work around this quirk?
Answer 1:
In .NET 4.5, Internationalized Resource Identifier (IRI) support was enabled by default. When targeting .NET 4.7.2, the right-to-left mark seems to be honored again, which could indicate that the 4.5 behavior was a bug.
If the project needs to target .NET 4.5, the method ToggleIDNIRISupport in this post can help to overcome the issue.
Call the method like this:
ToggleIDNIRISupport(false);
A URI constructed after this method call retains the right-to-left mark.
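The referenced post is not reproduced here, so the following is only a rough sketch of how a method with that name could work, not the post's actual code. It assumes that System.Uri in the .NET Framework keeps its IDN/IRI configuration in private static fields named s_IdnScope and s_IriParsing. These are internal implementation details that may be absent or behave differently on other runtimes and versions, so the sketch returns false rather than throwing when they are not found.

```csharp
using System;
using System.Reflection;

public static class UriIriWorkaround
{
    // Sketch only: toggles IDN/IRI handling by writing to private static
    // fields of System.Uri. The field names are assumptions, not a public API.
    public static bool ToggleIDNIRISupport(bool enable)
    {
        // Touch Uri once first; this is assumed to make Uri load its own
        // configuration so that it does not later overwrite the values below.
        Uri.IsWellFormedUriString("http://example.com/", UriKind.Absolute);

        const BindingFlags flags = BindingFlags.Static | BindingFlags.NonPublic;

        FieldInfo idnScope = typeof(Uri).GetField("s_IdnScope", flags);
        FieldInfo iriParsing = typeof(Uri).GetField("s_IriParsing", flags);
        if (idnScope == null || iriParsing == null)
        {
            return false; // fields not present on this runtime
        }

        idnScope.SetValue(null, enable ? UriIdnScope.All : UriIdnScope.None);
        iriParsing.SetValue(null, enable);
        return true;
    }

    public static void Main()
    {
        bool applied = ToggleIDNIRISupport(false);
        Console.WriteLine("Toggle applied: " + applied);

        var uri = new Uri(
            "https://example.com/attachments/The title is \"\u200F\" in Arabic.pdf");
        Console.WriteLine(uri.AbsolutePath);
    }
}
```

Because the change is process-wide and relies on reflection over private members, it is probably best called once at application startup, before any other code constructs a Uri.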
Source: https://stackoverflow.com/questions/65805812/system-uri-drops-unicode-rlm-right-to-left-mark-u200f-character-in-net-4-5