How to find out if string has already been URL encoded?

死守一世寂寞 2020-11-30 03:46

How can I check whether a string has already been URL encoded?

For example, if I encode TEST==, I get TEST%3D%3D. If I again encode last string, I

11 answers
  • 2020-11-30 04:19

    Check your URL for characters that should not appear in an encoded URL[1]. List of candidates:

    WHITE_SPACE, ", <, >, {, }, |, \, ^, ~, [, ] and `

    I use:

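    // Heuristic: the URL counts as already encoded when it contains none of the
    // unsafe characters listed above. '%' and '#' are deliberately not checked.
    private static boolean isAlreadyEncoded(String passedUrl) {
            // The character class mirrors the candidate list, including the backtick.
            return !passedUrl.matches(".*[\\ \"\\<\\>\\{\\}|\\\\^~\\[\\]`].*");
    }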
    

    For the actual encoding I proceed with:

    https://stackoverflow.com/a/49796882/1485527

    Note: Even if your URL doesn't contain unsafe characters, you might still want to apply, e.g., Punycode encoding to the host name, so there is plenty of room for additional checks.


    [1] A list of candidates can be found in the "Unsafe" section of the URL spec, on page 2. As I understand it, '%' and '#' should be left out of the encoding check, since these characters can also occur in encoded URLs.
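
    To see where this kind of check helps and where it misleads, here is a rough JavaScript port of the same idea. The names UNSAFE_CHARS and looksEncoded are hypothetical and only serve the illustration; the character class is assumed to match the candidate list above.

    ```javascript
    // Hypothetical port of the unsafe-character heuristic.
    // '%' and '#' are not checked, as explained in the footnote.
    const UNSAFE_CHARS = /[ "<>{}|\\^~\[\]`]/;

    function looksEncoded(url) {
        // "Encoded" here only means "contains no unsafe character".
        return !UNSAFE_CHARS.test(url);
    }

    console.log(looksEncoded("https://example.com/a b"));   // false: raw space
    console.log(looksEncoded("https://example.com/a%20b")); // true
    console.log(looksEncoded("TEST=="));                    // true, although it was never encoded
    ```

    The last call shows the limit of the approach: a string without unsafe characters is reported as encoded even if nobody encoded it.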

  • 2020-11-30 04:22

    You can't know for sure unless your strings conform to a certain pattern or you keep track of which strings you have encoded. As you noted yourself, an encoded string can itself be encoded again, so you can't be 100% sure just by looking at the string.
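
    A small sketch of the ambiguity, using the TEST== value from the question and assuming encodeURIComponent as the encoder (the question does not say which one was used):

    ```javascript
    // The same characters can be a raw value or an encoded one.
    const raw = "TEST==";
    const once = encodeURIComponent(raw);   // "TEST%3D%3D"
    const twice = encodeURIComponent(once); // "TEST%253D%253D"

    // Every level decodes without error, so decoding alone cannot tell
    // whether "TEST%3D%3D" is encoded data or a literal value.
    console.log(decodeURIComponent(twice) === once); // true
    console.log(decodeURIComponent(once) === raw);   // true
    ```

    Nothing in "TEST%3D%3D" itself says whether it is the original input or the result of one encoding pass.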

  • 2020-11-30 04:22

    Thanks to this answer, I wrote a JavaScript function that encodes a URL exactly once with encodeURI. You can call it without knowing whether the URL is already encoded, and the result is encoded only once.

    ES6:

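    // Decode until the string stops changing, then encode exactly once.
    // Note: decodeURI throws a URIError on a malformed sequence such as a stray '%'.
    var getUrlEncoded = sURL => {
        if (decodeURI(sURL) === sURL) return encodeURI(sURL)
        return getUrlEncoded(decodeURI(sURL))
    }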
    

    Pre ES6:

    var getUrlEncoded = function(sURL) {
        if (decodeURI(sURL) === sURL) return encodeURI(sURL)
        return getUrlEncoded(decodeURI(sURL))
    }
    

    Here are some tests so you can see the URL is only encoded once:

    getUrlEncoded("https://example.com/media/Screenshot27 UI Home.jpg")
    //"https://example.com/media/Screenshot27%20UI%20Home.jpg"
    getUrlEncoded(encodeURI("https://example.com/media/Screenshot27 UI Home.jpg"))
    //"https://example.com/media/Screenshot27%20UI%20Home.jpg"
    getUrlEncoded(encodeURI(encodeURI("https://example.com/media/Screenshot27 UI Home.jpg")))
    //"https://example.com/media/Screenshot27%20UI%20Home.jpg"
    getUrlEncoded(decodeURI("https://example.com/media/Screenshot27 UI Home.jpg"))
    //"https://example.com/media/Screenshot27%20UI%20Home.jpg"
    getUrlEncoded(decodeURI(decodeURI("https://example.com/media/Screenshot27 UI Home.jpg")))
    //"https://example.com/media/Screenshot27%20UI%20Home.jpg"
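
    One caveat with the recursive version: decodeURI throws a URIError on malformed percent sequences, for example the stray % in "50% off". Below is a minimal sketch of a tolerant variant, assuming such input should be treated as unencoded text. getUrlEncodedSafe is a hypothetical name and not part of the original function.

    ```javascript
    // Variant of getUrlEncoded that tolerates malformed percent sequences.
    function getUrlEncodedSafe(sURL) {
        let current = sURL;
        for (;;) {
            let decoded;
            try {
                decoded = decodeURI(current);
            } catch (e) {
                if (!(e instanceof URIError)) throw e;
                // A stray '%' cannot be decoded; treat the string as raw text.
                return encodeURI(current);
            }
            // Fixed point reached: nothing left to decode, so encode once.
            if (decoded === current) return encodeURI(current);
            current = decoded;
        }
    }

    console.log(getUrlEncodedSafe("https://example.com/50% off.jpg"));
    // "https://example.com/50%25%20off.jpg"
    ```

    Like the original, this cannot distinguish a raw string that literally contains %20 from an encoded one; such input is treated as already encoded.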
    
  • 2020-11-30 04:23

    According to the spec (https://tools.ietf.org/html/rfc3986), all URLs MUST start with a scheme followed by a colon (:).

    Since a colon is required as the delimiter between the scheme and the rest of the URI, any string that contains a literal colon has not been fully percent-encoded; a fully encoded URI would contain %3A instead.

    (This assumes you will not be given an incomplete URI with no scheme.)

    So you can test as follows. If the string contains a colon, it is not encoded. If it does not, urldecode it. If the decoded string contains a colon, the original string was URL encoded. If it still contains no colon, compare it with the string before decoding: if the two differ, urldecode again and repeat the check; if they are identical, the string is not a valid URI.

    You can make this loop simpler if you know what schemes you can expect.
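
    The loop described above could be sketched like this in JavaScript. It assumes the input is a complete URI and that any encoding was applied to the whole string with something like encodeURIComponent, so that the colon became %3A. countEncodingLayers is a hypothetical name, and returning -1 for "not a valid URI" is a choice made for this sketch.

    ```javascript
    // Counts how many times a complete URI was percent-encoded,
    // using the scheme's colon as the marker for "fully decoded".
    function countEncodingLayers(uri) {
        let current = uri;
        let layers = 0;
        while (!current.includes(":")) {
            let decoded;
            try {
                decoded = decodeURIComponent(current);
            } catch (e) {
                if (!(e instanceof URIError)) throw e;
                return -1; // malformed percent sequence
            }
            // Nothing left to decode and still no scheme: not a valid URI.
            if (decoded === current) return -1;
            current = decoded;
            layers += 1;
        }
        return layers;
    }

    const plain = "https://example.com/a b";
    console.log(countEncodingLayers(plain));                                         // 0
    console.log(countEncodingLayers(encodeURIComponent(plain)));                     // 1
    console.log(countEncodingLayers(encodeURIComponent(encodeURIComponent(plain)))); // 2
    console.log(countEncodingLayers("no-scheme-here"));                              // -1
    ```

    Note that encodeURI leaves the colon untouched, so a string encoded that way is reported as having zero layers; the test only works for encoders that escape the colon.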

  • 2020-11-30 04:24

    Joel on Software described an approach to this kind of problem some time ago: http://www.joelonsoftware.com/articles/Wrong.html
    Alternatively, you could add a prefix to the strings to mark the ones that are already encoded.
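
    A minimal sketch of the prefix idea, assuming you control every place where the strings are produced and consumed. The marker "enc:" and the function names are hypothetical; the marker must be one that cannot start a legitimate raw value in your data.

    ```javascript
    // Hypothetical marker that tags a string as already encoded.
    const ENCODED_PREFIX = "enc:";

    function markEncoded(rawValue) {
        return ENCODED_PREFIX + encodeURIComponent(rawValue);
    }

    // Encodes only when the marker is missing, so repeated calls are harmless.
    function ensureEncoded(value) {
        return value.startsWith(ENCODED_PREFIX) ? value : markEncoded(value);
    }

    // Strips the marker right before the value is placed into a URL.
    function unwrap(marked) {
        return marked.slice(ENCODED_PREFIX.length);
    }

    const tagged = ensureEncoded(ensureEncoded("TEST=="));
    console.log(tagged);         // "enc:TEST%3D%3D"
    console.log(unwrap(tagged)); // "TEST%3D%3D"
    ```

    The state is carried by the string instead of being guessed from its contents, which is the point of both suggestions in this answer.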
