What's the difference between UTF-8 and UTF-8 without BOM?

佛祖请我去吃肉 2020-11-21 05:45

What's the difference between UTF-8 and UTF-8 without a BOM? Which is better?

21 answers
  •  予麋鹿 (OP)
    2020-11-21 06:15

    There are at least three problems with putting a BOM in UTF-8 encoded files.

    1. Files that hold no text are no longer empty because they always contain the BOM.
    2. Files that hold text within the ASCII subset of UTF-8 are no longer ASCII themselves, because the BOM is not ASCII. This breaks some existing tools, and it can be impossible for users to replace such legacy tools.
    3. It is no longer possible to simply concatenate several files, because each file now has a BOM at the beginning, so the result ends up with BOMs in the middle.
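
The three problems above can be reproduced in a few lines of Python. This is only a rough sketch: the helper names `with_bom` and `is_ascii` are made up for illustration, and the byte strings stand in for file contents.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def with_bom(text: str) -> bytes:
    """Hypothetical helper: encode text the way a BOM-writing editor would."""
    return BOM + text.encode("utf-8")


def is_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


# 1. A "file" that holds no text is still three bytes long.
empty_file = with_bom("")

# 2. Pure-ASCII text stops being pure ASCII once the BOM is prepended.
plain = "hello".encode("utf-8")
marked = with_bom("hello")

# 3. Naive concatenation leaves a BOM in the middle of the result.
joined = with_bom("a\n") + with_bom("b\n")
# "utf-8-sig" strips only the leading BOM; the second one survives as U+FEFF.
decoded = joined.decode("utf-8-sig")

print(len(empty_file), is_ascii(plain), is_ascii(marked), repr(decoded))
```

The stray U+FEFF left in `decoded` is what a tool would see after something like `cat a.txt b.txt` on two BOM-prefixed files.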

    And, as others have mentioned, it is neither sufficient nor necessary to have a BOM to detect that something is UTF-8:

    • It is not sufficient because an arbitrary byte sequence can happen to start with the exact sequence that constitutes the BOM.
    • It is not necessary because you can just read the bytes as if they were UTF-8; if that succeeds, it is, by definition, valid UTF-8.
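
Both points can be checked by attempting a strict decode. The following is a minimal sketch, assuming the whole input fits in memory; `looks_like_utf8` is a hypothetical name, not a standard library function.

```python
import codecs


def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Not sufficient: these bytes start with the BOM sequence,
# but 0xFF can never appear in valid UTF-8.
starts_with_bom = codecs.BOM_UTF8 + b"\xff\xfe"

# Not necessary: valid UTF-8 with no BOM at all.
no_bom = "h\u00e9llo".encode("utf-8")

# For contrast: the same text in Latin-1 fails the strict decode.
latin1 = "h\u00e9llo".encode("latin-1")

print(looks_like_utf8(starts_with_bom), looks_like_utf8(no_bom), looks_like_utf8(latin1))
```

A successful decode only shows that the bytes are valid UTF-8. It cannot prove that the author intended UTF-8, but the presence of a BOM cannot prove that either.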
