Do certain characters take more bytes than others?

回眸只為那壹抹淺笑 submitted on 2019-11-28 11:36:36

It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):

  • In ASCII or ISO 8859, each character is represented by one byte
  • In UTF-32, each character is represented by 4 bytes
  • In UTF-8, each character uses between 1 and 4 bytes
  • In ISO 2022, it's much more complicated
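The differences in the list above can be checked directly. The following is a minimal sketch in Python, not part of the original answer; the helper name `encoded_length` and the sample characters are chosen here purely for illustration.

```python
def encoded_length(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


# Illustrative sample characters, written as escapes so that the
# encoding of this source file itself does not matter.
SAMPLES = {
    "#": "number sign (ASCII)",
    "\u00e9": "e with acute accent",
    "\u20ac": "euro sign",
    "\U0001F600": "grinning face emoji",
}

if __name__ == "__main__":
    for ch, name in SAMPLES.items():
        # "utf-32-le" is used rather than "utf-32" so that Python does not
        # prepend a 4-byte byte-order mark to the result.
        print(name, encoded_length(ch, "utf-8"), encoded_length(ch, "utf-32-le"))
```

With these samples, the UTF-8 length grows from 1 byte for `#` up to 4 bytes for the emoji, while the UTF-32 length stays at 4 bytes throughout. Encoding the accented letter as `"latin-1"` (ISO 8859-1) gives 1 byte, whereas encoding it as `"ascii"` raises `UnicodeEncodeError` because the character is outside that set.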

US-ASCII characters (of which # is one) take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters.

It depends on the encoding. In single-byte character sets such as Windows-1252 (often called "ANSI") and the various ISO 8859 character sets, it is one byte per character. Some encodings, such as UTF-8, are variable-width: the number of bytes needed to encode a character depends on the character being encoded.

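The answer, of course, is that it depends. If you are in a pure ASCII environment, then yes, every char takes 1 byte, but if you are in a Unicode environment (all of Windows, for example), then chars can range from 1 to 4 bytes in size depending on the encoding: 1 to 4 bytes in UTF-8, and 2 or 4 bytes in UTF-16, which is what Windows uses natively.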
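Since Windows APIs natively work with UTF-16, it can help to see what that means for byte counts. This is a small illustrative sketch in Python; the function name `utf16_length` is made up for this example.

```python
def utf16_length(text: str) -> int:
    """Bytes needed for `text` in UTF-16 (little-endian, no byte-order mark)."""
    return len(text.encode("utf-16-le"))


if __name__ == "__main__":
    # Characters in the Basic Multilingual Plane take one 16-bit code unit;
    # characters outside it take a surrogate pair, i.e. two code units.
    for ch in ["#", "\u20ac", "\U0001F600"]:
        print(repr(ch), utf16_length(ch))
```

Under UTF-16, even an ASCII character such as `#` takes 2 bytes, and characters outside the Basic Multilingual Plane take 4.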

If you choose a char from the ASCII set, then yes, your delimiter is as small as possible.

No, in a single-byte encoding all characters are 1 byte. They take more only if you're using a Unicode encoding or wide characters (needed for many accented letters and other symbols, for example).

In a single-byte encoding, a character is 1 byte, or 8 bits, long, which gives 256 possible combinations to form characters with. ASCII characters are 1-byte characters that use only 7 of those bits (values 0 to 127, with the 8th bit always 0), which is enough for the standard alphabet and the various symbols used when teletypes and typewriters were still common.
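To make the 7-bit limit concrete, here is a hedged sketch in Python. The helper names `is_ascii` and `try_ascii` are hypothetical and exist only for this illustration.

```python
from typing import Optional


def is_ascii(text: str) -> bool:
    """True if every character has a code point below 128 (fits in 7 bits)."""
    return all(ord(ch) < 128 for ch in text)


def try_ascii(text: str) -> Optional[bytes]:
    """Encode as ASCII, or return None if some character is outside the set."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    print(ord("#"), is_ascii("#"), try_ascii("#"))
    # U+00E9 (e with acute accent) has code point 233, which needs the 8th bit.
    print(ord("\u00e9"), is_ascii("\u00e9"), try_ascii("\u00e9"))
```

The accented letter still fits in a single byte in ISO 8859-1 (`"\u00e9".encode("latin-1")` is the one byte `0xE9`), which shows that "one byte per character" and "ASCII" are not the same thing.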

You can find an ASCII chart and what numbers correspond to what characters here.
