Say I have a URL
and I have a query entered by the user such as:
random word £500 bank
import java.net.IDN;
import java.net.URI;
import java.net.URL;

URL url = new URL(" word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL = uri.toASCIIString();
What is happening here?
1. Split the URL into its structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode-encode the host name!
4. Use java.net.URI.toASCIIString() to percent-encode. It normalizes Unicode to NFC first (NFKC would be better!). For more info see: How to encode properly this URL
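The four steps can be sketched as one helper. This is only an illustration: the class name UrlEncodeSketch, the methods encode and nfkc, and the example.com inputs are made up for this sketch and are not part of the answer above. The nfkc helper shows one possible way to apply NFKC yourself before encoding, since toASCIIString() only applies NFC.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URL;
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class UrlEncodeSketch {

    // Steps 1-4: split with java.net.URL, Punycode the host, then let the
    // multi-argument URI constructor and toASCIIString() escape the rest.
    static String encode(String raw) {
        try {
            // new URL(String) is deprecated in recent JDKs but still available.
            URL url = new URL(raw);
            URI uri = new URI(
                    url.getProtocol(),
                    url.getUserInfo(),
                    IDN.toASCII(url.getHost()),
                    url.getPort(),
                    url.getPath(),
                    url.getQuery(),
                    url.getRef());
            return uri.toASCIIString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot encode: " + raw, e);
        }
    }

    // Optional pre-step (an assumption, not in the original snippet):
    // fold compatibility characters with NFKC before encoding.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        System.out.println(IDN.toASCII("b\u00FCcher.example"));             // xn--bcher-kva.example
        System.out.println(encode("http://example.com/a b"));               // http://example.com/a%20b
        System.out.println(encode("http://example.com/search?q=\u00A3500")); // http://example.com/search?q=%C2%A3500
        System.out.println(nfkc("\uFB01le"));                               // file
    }
}
```

The pound sign comes out as %C2%A3 because toASCIIString() percent-encodes the UTF-8 bytes of every non-ASCII character.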
In some cases it is advisable to check whether the URL is already encoded. Also replace '+'-encoded spaces with '%20'-encoded spaces.
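The check matters because the multi-argument URI constructors always quote '%', so an input that already contains %20 would come out as %2520. A rough sketch of such a pre-check follows. The class and method names are hypothetical, and the test is only a heuristic: raw text that happens to contain something like "%41" would be misread as encoded, and a literal '+' in the query would be rewritten too.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PreEncodedCheckSketch {

    // A percent sign followed by two hex digits looks like an existing escape.
    private static final Pattern PERCENT_ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

    // Heuristic: true if the string contains at least one percent escape.
    static boolean looksEncoded(String s) {
        return PERCENT_ESCAPE.matcher(s).find();
    }

    // Rewrites form-style '+' spaces as '%20'. Assumes every '+' stands for a space.
    static String plusToPercent20(String query) {
        return query.replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(looksEncoded("random%20word"));     // true
        System.out.println(looksEncoded("random word"));       // false
        System.out.println(plusToPercent20("random+word"));    // random%20word
    }
}
```

If looksEncoded returns true, one option is to skip the re-encoding step or decode the input first; which is right depends on where the input comes from.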
Here are some examples that will also work properly:
"in" : "http://نامهای.com/",
"out" : ""
"in" : "‥/foo",
"out" : ""
"in" : " book.pdf",
"out" : ""
}, {
"in" : " word £500 bank $",
"out" : "$"
The solution passes around 100 of the test cases provided by Web Platform Tests.