I am interested in knowing why \'%20\' is used as a space in URLs, particularly why %20 was used and why we even need it in the first place.
It's called percent encoding. Some characters can't be in a URI (for example #
, as it denotes the URL fragment), so they are represented with characters that can be (#
becomes %23
)
Here's an excerpt from that same article:
When a character from the reserved set (a "reserved character") has special meaning (a "reserved purpose") in a certain context, and a URI scheme says that it is necessary to use that character for some other purpose, then the character must be percent-encoded. Percent-encoding a reserved character involves converting the character to its corresponding byte value in ASCII and then representing that value as a pair of hexadecimal digits. The digits, preceded by a percent sign ("%") which is used as an escape character, are then used in the URI in place of the reserved character. (For a non-ASCII character, it is typically converted to its byte sequence in UTF-8, and then each byte value is represented as above.)
The space character's character code is 32
:
> ' '.charCodeAt(0)
32
Which is 20
in base-16:
> ' '.charCodeAt(0).toString(16)
"20"
Tack a percent sign in front of it and you get %20
.
It uses percent encoding. You can see the Percent Encoding part of the RFC for Uniform Resource Identifier (URI): Generic Syntax
A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component. A percent-encoded octet is encoded as a character
triplet, consisting of the percent character "%" followed by the two
hexadecimal digits representing that octet's numeric value. For
example, "%20" is the percent-encoding for the binary octet
"00100000" (ABNF: %x20), which in US-ASCII corresponds to the space
character (SP).
Because URLs have strict syntactic rules, like /
being a special path separator character, spaces not being allowed in a URL and all characters having to be a certain subset of ASCII. To embed arbitrary characters in URLs regardless of these restrictions, bytes can be percent encoded. The byte x20
represents a space in the ASCII encoding (and most other encodings), hence %20
is the URL-encoded version of it.