How could I check if string has already been encoded?
For example, if I encode TEST==
, I get TEST%3D%3D
. If I again encode last string, I
In order to avoid encoding twice and generating a bug (as the OP is stating) we unquote and than quote again, in Python this will be:
import urllib.parse
urllib.parse.unquote(str)
urllib.parse.quote(str)
Use regexp to check if your string contains illegal characters (i.e. characters which cannot be found in URL-encoded string, like whitespace).
If you want to be sure that string is encoded correctly (if it needs to be encoded) - just decode and encode it once again.
metacode:
100%_correctly_encoded_string = encode(decode(input_string))
already encoded string will remain untouched. Unencoded string will be encoded. String with only url-allowed characters will remain untouched too.
Decode, compare to original. If it does differ, original is encoded. If it doesn't differ, original isn't encoded. But still it says nothing about whether the newly decoded version isn't still encoded. A good task for recursion.
I hope one can't write a quine in urlencode, or this algorithm would get stuck.
Exception: When a string contains "+" character url decoder replaces it with a space even though the string is not url encoded
Using Spring UriComponentsBuilder:
import java.net.URI;
import org.springframework.web.util.UriComponentsBuilder;
private URI getProperlyEncodedUri(String uriString) {
try {
return URI.create(uriString);
} catch (IllegalArgumentException e) {
return UriComponentsBuilder.fromUriString(uriString).build().toUri();
}
}
Try decoding the url. If the resulting string is shorter than the original then the original URL was already encoded, else you can safely encode it (either it is not encoded, or even post encoding the url stays as is, so encoding again will not result in a wrong url). Below is sample pseudo (inspired by ruby) code:
# Returns encoded URL for any given URL after determining whether it is already encoded or not
def escape(url)
unescaped_url = URI.unescape(url)
if (unescaped_url.length < url.length)
return url
else
return URI.escape(url)
end
end