The file in question is not under my control. Most byte sequences are valid UTF-8, it is not ISO-8859-1 (or an other encoding). I want to do my best do extract as much informat
One way would be to read the first few bytes to check Byte Order Mark (if exists). More information on BOM: http://en.wikipedia.org/wiki/Byte_order_mark In the given url, you will find a table of the BOM bytes. However, one problem is, UTF-8 does not require to use BOM in its' header. There is another way to solve the problem is by pattern recognition (read few bytes-8 bits each time). Anyway, this is the complicated solution..