Is array syntax using square brackets in URL query strings valid?

遥遥无期 2020-12-03 01:21

Is it actually safe/valid to use multidimensional array syntax in the URL query string?

http://example.com?abc[]=123&abc[]=456

It seem

6 Answers
  • 2020-12-03 01:26

    According to RFC 3986, the query component of a URL has the following grammar:

    *( pchar / "/" / "?" )
    

    From appendix A of the same RFC:

    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
    [...]
    pct-encoded   = "%" HEXDIG HEXDIG
    
    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
    [...]    
    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
    

    My interpretation of this is that anything that isn't:

     ALPHA / DIGIT / "-" / "." / "_" / "~" / 
         "!" / "$" / "&" / "'" / "(" / ")" / 
         "*" / "+" / "," / ";" / "=" / ":" / "@"
    

    ...should be pct-encoded, i.e., percent-encoded. Thus [ and ] should be percent-encoded to follow RFC 3986.
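
    As a rough illustration of that reading, the grammar can be turned into a small checker. This is only a sketch: the names is_rfc3986_query and disallowed_chars are made up for this example, and the character class is transcribed by hand from the ABNF quoted above.

```python
import re

# Characters allowed literally in a query: pchar plus "/" and "?".
# (unreserved + sub-delims + ":" + "@" + "/" + "?")
_ALLOWED = r"A-Za-z0-9\-._~!$&'()*+,;=:@/?"

# A query is any run of allowed characters or pct-encoded triplets.
_QUERY_RE = re.compile(rf"(?:[{_ALLOWED}]|%[0-9A-Fa-f]{{2}})*")
_PCT_RE = re.compile(r"%[0-9A-Fa-f]{2}")
_BAD_RE = re.compile(rf"[^{_ALLOWED}]")


def is_rfc3986_query(query: str) -> bool:
    """Return True if the string matches the RFC 3986 query grammar."""
    return _QUERY_RE.fullmatch(query) is not None


def disallowed_chars(query: str) -> list:
    """List the characters that would have to be percent-encoded."""
    without_triplets = _PCT_RE.sub("", query)
    return sorted(set(_BAD_RE.findall(without_triplets)))


print(is_rfc3986_query("abc[]=123&abc[]=456"))          # False
print(disallowed_chars("abc[]=123&abc[]=456"))          # ['[', ']']
print(is_rfc3986_query("abc%5B%5D=123&abc%5B%5D=456"))  # True
```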

  • 2020-12-03 01:29

    I'd ideally like to comment on Ethan's answer really, but don't have sufficient reputation to do it.

    I'm not sure that the relevant part of the WHATWG URL standard is being referenced here. I think the correct part might be in the definition of a valid URL-query string, which it describes as being composed of URL units that themselves are formed from URL code points and percent-encoded bytes. Square brackets are not listed within URL code points and thus fall into the percent-encoded bytes category.

    Thus, in answer to the original question, multidimensional array syntax (i.e. using square brackets to represent array indexing) within the query part of the URL is valid, provided the square brackets are percent encoded (as %5B for [ and %5D for ]).
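
    For example, a standard URL library will usually produce the percent-encoded form on its own. The sketch below uses Python's urllib.parse with the parameter name from the question; the variable names are only illustrative.

```python
from urllib.parse import urlencode, parse_qs

# Repeated "abc[]" keys, as in the question's URL.
params = [("abc[]", "123"), ("abc[]", "456")]

# urlencode percent-encodes the brackets in the key.
query = urlencode(params)
print(query)  # abc%5B%5D=123&abc%5B%5D=456

# Decoding restores the bracketed key and collects the repeated values.
decoded = parse_qs(query)
print(decoded)  # {'abc[]': ['123', '456']}
```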

  • 2020-12-03 01:37

    My understanding is that square brackets are not first-class citizens anyway. Here is the quote from RFC 1738 (http://tools.ietf.org/html/rfc1738):

    Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

  • 2020-12-03 01:44

    I always had a temptation to go for that sort of query when I had to pass an array, but I steered away from it. The reason being:

    • It is not clearly defined in the RFC.
    • Different languages may interpret it differently.

    You have a couple of options to pass an array:

    • Encode a string representation of the array (JSON, maybe?).
    • Use numbered parameters like "val1=blah&val2=blah&..", or something similar.

    And if you are sure about the language you are using, you can (safely) go for the kind of query string you have (you just need to percent-encode [ and ] as well).
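
    A minimal sketch of the two bracket-free options, assuming Python's urllib.parse and json modules; the parameter names abc and val1, val2 are just placeholders taken from the examples above.

```python
import json
from urllib.parse import urlencode, parse_qs

values = ["123", "456"]

# Option 1: send the whole array as one JSON-encoded parameter.
json_query = urlencode({"abc": json.dumps(values)})
restored = json.loads(parse_qs(json_query)["abc"][0])
print(restored)  # ['123', '456']

# Option 2: one numbered parameter per element (val1, val2, ...).
numbered_query = urlencode(
    {f"val{i}": v for i, v in enumerate(values, start=1)}
)
print(numbered_query)  # val1=123&val2=456
```

    Both forms avoid any dependence on how the receiving language treats brackets, at the cost of a decoding step (option 1) or a naming convention that the receiver has to know about (option 2).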

  • 2020-12-03 01:48

    The answer is not simple.

    The following is extracted from section 3.2.2 of RFC 3986 :

    A host identified by an Internet Protocol literal address, version 6
    [RFC3513] or later, is distinguished by enclosing the IP literal
    within square brackets ("[" and "]"). This is the only place where
    square bracket characters are allowed in the URI syntax.

    This seems to answer the question by flatly stating that square brackets are not allowed anywhere else in the URI. But there is a difference between a square bracket character and a percent encoded square bracket character.

    The following is extracted from the beginning of section 3 of RFC 3986 :

    3. Syntax Components

      The generic URI syntax consists of a hierarchical sequence of
      components referred to as the scheme, authority, path, query, and
      fragment.

      URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

    So the "query" is a component of the "URI".

    The following is extracted from section 2.2 of RFC 3986 :

    2.2. Reserved Characters

    URIs include components and subcomponents that are delimited by
    characters in the "reserved" set. These characters are called
    "reserved" because they may (or may not) be defined as delimiters by
    the generic syntax, by each scheme-specific syntax, or by the
    implementation-specific syntax of a URI's dereferencing algorithm.
    If data for a URI component would conflict with a reserved
    character's purpose as a delimiter, then the conflicting data must
    be percent-encoded before the URI is formed.

      reserved    = gen-delims / sub-delims
    
      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    
      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="
    

    So square brackets may appear in a query string, but only if they are percent encoded. Unless they aren't, as explained further down in section 2.2 :

    URI producing applications should percent-encode data octets that
    correspond to characters in the reserved set unless these characters
    are specifically allowed by the URI scheme to represent data in that
    component. If a reserved character is found in a URI component and
    no delimiting role is known for that character, then it must be
    interpreted as representing the data octet corresponding to that
    character's encoding in US-ASCII.

    So because square brackets are only allowed in the "host" subcomponent, they "should" be percent encoded in other components and subcomponents, and in this case in the "query" component, unless RFC 3986 explicitly allows unencoded square brackets to represent data in the query component, which it does not.

    However, if a "URI producing application" fails to do what it "should" do, by leaving square brackets unencoded in the query, then readers of the URI are not to reject the URI outright. Instead, the square brackets are to be considered as belonging to the data of the query component, since they are not used as delimiters in that component.

    This is why, for example, it is not a violation of RFC 3986 when PHP accepts both unencoded and percent encoded square brackets as valid characters in a query string, and even assigns to them a special purpose. However, it would appear that authors who try to take advantage of this loophole by not percent encoding square brackets are in violation of RFC 3986.
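
    The same lenient reading can be observed outside PHP. The sketch below uses Python's urllib.parse to show that raw and percent-encoded brackets decode to the same key; group_array_keys is a hypothetical helper that only loosely imitates the "special purpose" PHP gives to a trailing [], and it is not PHP's actual algorithm.

```python
from urllib.parse import parse_qs, urlsplit

raw_url = "http://example.com?abc[]=123&abc[]=456"
encoded_url = "http://example.com?abc%5B%5D=123&abc%5B%5D=456"

# A lenient reader treats the unencoded brackets as plain data.
raw_params = parse_qs(urlsplit(raw_url).query)
encoded_params = parse_qs(urlsplit(encoded_url).query)
print(raw_params == encoded_params)  # True


def group_array_keys(query: str) -> dict:
    """Collect 'name[]' keys into lists; keep the last value otherwise."""
    result = {}
    for key, values in parse_qs(query).items():
        if key.endswith("[]"):
            result[key[:-2]] = values
        else:
            result[key] = values[-1]
    return result


print(group_array_keys("abc[]=123&abc%5B%5D=456&x=1"))
# {'abc': ['123', '456'], 'x': '1'}
```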

  • 2020-12-03 01:49

    David N. Jafferian's answer is fantastic. I just want to add a couple updates and practical notes:

    1. For many years, every browser has left square brackets in query strings unencoded when submitting the request to the server. (Source: https://bugzilla.mozilla.org/show_bug.cgi?id=1152455#c6). As such, I imagine a huge portion of the web has come to rely on this behavior, which makes it extremely unlikely to change.

    2. My reading of the WHATWG URL standard which, at least for web purposes, can be seen as superseding RFC 3986, is that it codifies this behavior of not encoding [ and ] in query strings. I believe the relevant portion is: https://url.spec.whatwg.org/#query-state, which makes no mention of percent-encoding those characters.
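
    To make that concrete, here is a rough sketch of the percent-encoding the query state applies, as I understand it: C0 controls, bytes above 0x7E, and space, ", #, < and > are encoded, plus ' for special schemes such as http. The function name whatwg_query_encode is made up, and this is a simplification of the standard's parser, not a conforming implementation.

```python
# Extra code points in the query percent-encode set, on top of
# C0 controls and everything above "~" (my reading of the standard).
_QUERY_EXTRA = {" ", '"', "#", "<", ">"}


def whatwg_query_encode(query: str, special: bool = True) -> str:
    """Percent-encode a query roughly the way a browser's URL parser does."""
    extra = _QUERY_EXTRA | ({"'"} if special else set())
    out = []
    for byte in query.encode("utf-8"):
        ch = chr(byte)
        if byte <= 0x1F or byte > 0x7E or ch in extra:
            out.append(f"%{byte:02X}")
        else:
            out.append(ch)
    return "".join(out)


# Brackets pass through untouched; the space does not.
print(whatwg_query_encode("abc[]=1 2&abc[]=456"))  # abc[]=1%202&abc[]=456
```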
