Does Java have methods to get the various byte order marks?

大兔子大兔子 submitted on 2019-12-01 05:22:28

Question


I am looking for a utility method or constant in Java that returns the bytes of the byte order mark for a given encoding, but I can't seem to find one. Is there one? I would really like to do something like:

byte[] bom = Charset.forName( CharEncoding.UTF_8 ).getByteOrderMark();

Where CharEncoding comes from Apache Commons.


Answer 1:


Java's UTF-8 decoder does not recognize byte order marks: a leading EF BB BF is decoded as the character U+FEFF instead of being skipped. See bugs 4508058 and 6378911.

The gist is that support was added, broke backwards compatibility, and was rolled back. You'll have to detect and skip a UTF-8 BOM yourself.
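A minimal JDK-only sketch of doing that yourself: wrap the stream in a PushbackInputStream, peek at the first three bytes, and push them back unless they are EF BB BF. The class and method names (Utf8BomSkipper, skipUtf8Bom, decodeUtf8) are illustrative, not an existing API. If Commons IO is on the classpath, its BOMInputStream covers the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Peeks at the first three bytes; consumes them only if they are a UTF-8 BOM,
    // otherwise pushes them back so the stream content is unchanged.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }

    // Decodes bytes as UTF-8 text, dropping a leading BOM if present.
    static String decodeUtf8(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // Plain decoding keeps the BOM as a leading U+FEFF character.
        System.out.println(new String(withBom, StandardCharsets.UTF_8).length()); // 3
        System.out.println(decodeUtf8(withBom));               // hi
        System.out.println(decodeUtf8(new byte[] {'h', 'i'})); // hi
    }
}
```

Pushing the bytes back, rather than discarding them, means input without a BOM reaches the reader untouched.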




Answer 2:


Apache Commons IO contains what you are looking for, see org.apache.commons.io.ByteOrderMark.




Answer 3:


You can generate the BOM like this:

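// Both calls declare java.io.UnsupportedEncodingException (a checked exception).
byte[] utf8_bom = "\uFEFF".getBytes("UTF-8");                      // EF BB BF
byte[] utf16le_bom = "\uFEFF".getBytes("UnicodeLittleUnmarked");  // FF FE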

If you want to create the BOMs for other encodings this way, make sure you use the variant of the encoding that does not automatically insert a BOM, or the mark will be repeated: "\uFEFF".getBytes("UTF-16") returns FE FF FE FF, whereas "UTF-16BE" returns just FE FF. This technique only applies to Unicode encodings and will not produce meaningful results for others (like Windows-1252).
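A self-contained sketch of the same technique for the five Unicode encoding schemes that have a BOM, assuming a JDK where UTF-32BE and UTF-32LE are available (current JDKs include them). The names BomBytes, bomFor and hex are illustrative. The last line shows the repetition problem with the byte-order-neutral UTF-16 encoder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomBytes {
    // Encodes U+FEFF (the BOM character) with the given charset.
    // Assumes an "unmarked" charset such as UTF-16LE, which adds no BOM of its own.
    static byte[] bomFor(Charset charset) {
        return "\uFEFF".getBytes(charset);
    }

    // Formats bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] unmarked = {"UTF-8", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE"};
        for (String name : unmarked) {
            System.out.println(name + ": " + hex(bomFor(Charset.forName(name))));
        }
        // The plain "UTF-16" encoder writes its own BOM, so the mark appears twice.
        System.out.println("UTF-16: " + hex(bomFor(StandardCharsets.UTF_16)));
    }
}
```

Running main prints EF BB BF, FE FF, FF FE, 00 00 FE FF and FF FE 00 00 for the five unmarked charsets, then FE FF FE FF for UTF-16.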

  • Unicode BOM FAQ
  • Sun Java 6 supported encodings
  • Sun Java 5 supported encodings



Answer 4:


There isn't anything in the JDK as far as I can see, nor in any of the Apache projects.

Eclipse EMF does, however, have an enum that provides this:

org.eclipse.emf.ecore.resource.ContentHandler.ByteOrderMark

I'm not sure whether that's of any help to you.

There's more information here on the BOM for each encoding type, which you could use to write a simple helper class or enum:

http://mindprod.com/jgloss/bom.html
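A minimal sketch of the helper enum suggested above, with the BOM bytes written out by hand. The enum name Bom and its methods are hypothetical; Commons IO's ByteOrderMark is the library equivalent. The constants are declared so that UTF-32LE is tested before UTF-16LE, because FF FE is a prefix of FF FE 00 00.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper enum; the names are illustrative, not from any library.
enum Bom {
    // Order matters for detect(): longer marks that share a prefix come first.
    UTF_32BE("UTF-32BE", 0x00, 0x00, 0xFE, 0xFF),
    UTF_32LE("UTF-32LE", 0xFF, 0xFE, 0x00, 0x00),
    UTF_8("UTF-8", 0xEF, 0xBB, 0xBF),
    UTF_16BE("UTF-16BE", 0xFE, 0xFF),
    UTF_16LE("UTF-16LE", 0xFF, 0xFE);

    private final String charsetName;
    private final byte[] bytes;

    Bom(String charsetName, int... values) {
        this.charsetName = charsetName;
        this.bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
    }

    // Returns a copy so callers cannot modify the constant's bytes.
    byte[] bytes() {
        return bytes.clone();
    }

    // The unmarked charset to decode the rest of the stream with.
    Charset charset() {
        return Charset.forName(charsetName);
    }

    // Returns the BOM found at the start of head, if any.
    static Optional<Bom> detect(byte[] head) {
        for (Bom bom : values()) {
            if (head.length >= bom.bytes.length
                    && Arrays.equals(Arrays.copyOf(head, bom.bytes.length), bom.bytes)) {
                return Optional.of(bom);
            }
        }
        return Optional.empty();
    }
}
```

The ordering is a heuristic: a UTF-16LE stream whose first character after the BOM is U+0000 would be reported as UTF-32LE.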

Hope that helps. I'm surprised this isn't in Commons I/O to be honest.




Answer 5:


It's worth noting that many encodings don't use a byte order mark at all. For example, an empty string encoded in UTF-8 is just an empty byte[]. Although a BOM is specified for UTF-8, it is rarely used in Java and is not always supported.
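A quick sketch for checking which JDK encoders write a BOM on their own (the class name WhoWritesABom is illustrative). Encoding a single "A" gives 41 for ISO-8859-1 and UTF-8 and 41 00 for UTF-16LE; only the byte-order-neutral UTF-16 encoder prepends FE FF, giving FE FF 00 41. The empty string encodes to zero bytes in UTF-8.

```java
import java.nio.charset.Charset;

class WhoWritesABom {
    // Formats bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only "UTF-16" adds a mark here; the others emit just the encoded character.
        for (String name : new String[] {"ISO-8859-1", "UTF-8", "UTF-16LE", "UTF-16"}) {
            System.out.println(name + ": " + hex("A".getBytes(Charset.forName(name))));
        }
        // No BOM is written for an empty string in UTF-8: the result has length 0.
        System.out.println("".getBytes(Charset.forName("UTF-8")).length);
    }
}
```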



Source: https://stackoverflow.com/questions/712004/does-java-have-methods-to-get-the-various-byte-order-marks
