I have a .URL file containing the following text, which includes a German umlaut character:
[InternetShortcut]
URL=http://edn.embarcadero.com/ar
Rule of thumb: to read data (a file, a stream, whatever) correctly, you must know its encoding! The best solution is to let the user choose the encoding, or to force one, e.g. UTF-8.
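As a rough illustration of that rule, here is a minimal Python sketch (not the asker's own language or code). The `read_text` helper and the sample URL are made up for this example; the point is only that the caller states the encoding instead of leaving it to the platform default.

```python
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Read a text file with an encoding the caller states explicitly."""
    # newline="" keeps line endings exactly as stored in the file.
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()

# Hypothetical sample content with an umlaut (not the file from the question).
sample = "[InternetShortcut]\nURL=http://example.invalid/M\u00fcller\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.url")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(sample)

    right = read_text(path, encoding="utf-8")   # same encoding as written
    wrong = read_text(path, encoding="cp1252")  # wrong guess: umlaut is garbled
```

Reading with the wrong encoding usually does not raise an error; it silently produces garbled text, which is why guessing is risky.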
Moreover, knowing only that the file is "ANSI" does not make things easier without the code page.
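To see why the code page matters, consider a single byte taken from an "ANSI" file. The sketch below (Python, chosen purely for illustration) decodes the same byte under two Windows code pages and gets two different characters; the variable names are arbitrary.

```python
raw = b"\xfc"  # one byte, as it might appear in an "ANSI" file

as_cp1252 = raw.decode("cp1252")  # Western European code page
as_cp1251 = raw.decode("cp1251")  # Cyrillic code page

# The same byte is not valid UTF-8 on its own.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

Under cp1252 the byte is the umlaut U+00FC, under cp1251 it is the Cyrillic letter U+044C, so "ANSI" alone does not say which one was meant.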
A must-read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Another approach is to try to detect the encoding (as browsers do for sites that don't specify one). Detecting a UTF encoding is relatively easy if a BOM exists, but more often than not it is omitted. Take a look at Mozilla's universalchardet or chsdet.
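The BOM part of that detection can be sketched in a few lines of Python. This is only the easy case: it does not do the statistical guessing that universalchardet or chsdet perform, and the helper names `detect_bom` and `decode_with_fallback`, as well as the cp1252 fallback, are assumptions made for this example.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def detect_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

def decode_with_fallback(data, fallback="cp1252"):
    """Decode using the BOM if present, otherwise the caller's fallback."""
    # The fallback is an assumption; without a BOM nothing here proves it right.
    encoding = detect_bom(data) or fallback
    return data.decode(encoding)
```

The codec names returned here ("utf-8-sig", "utf-16", "utf-32") consume the BOM while decoding, so it does not end up in the resulting string. When no BOM is found, the result is only as good as the fallback you pass in.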