Some legacy code relies on the platform's default charset for translations. For Windows and Linux installations in the "western world" I know what that means. But thinking about Russian or Asian platforms I am totally unsure what their platform's default charset is (just UTF-16?).
For Windows and Linux installations in the "western world" I know what that means.
Probably not as well as you think.
But thinking about Russian or Asian platforms I am totally unsure what their platform's default charset is
Usually it's whatever encoding is historically used in their country.
(just UTF-16?).
Most definitely not. Computer usage spread widely before the Unicode standard existed, and each language area developed one or more encodings that could support its language. Those who needed less than 128 characters outside ASCII typically developed an "extended ASCII", many of which were eventually standardized as ISO-8859, while others developed two-byte encodings, often several competing ones. For example, in Japan, emails typically use JIS, but webpages use Shift-JIS, and some applications use EUC-JP. Any of these might be encountered as the platform default encoding in Java.
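To make the point concrete, here is a minimal sketch (the byte values are an illustrative example, not from the question) showing that the very same bytes decode to Japanese text under Shift-JIS but to mojibake under a Western charset — which is exactly what happens when code silently trusts `Charset.defaultCharset()`:

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // Whatever this particular JVM's platform default happens to be
        System.out.println(Charset.defaultCharset());

        // The same six bytes are the word "テスト" ("test") in Shift-JIS...
        byte[] bytes = {(byte) 0x83, 0x65, (byte) 0x83, 0x58, (byte) 0x83, 0x67};
        System.out.println(new String(bytes, Charset.forName("Shift_JIS")));

        // ...but garbage when decoded as Latin-1 (or any wrong default):
        System.out.println(new String(bytes, Charset.forName("ISO-8859-1")));
    }
}
```

Run this on machines with different locales and the first line will differ — that variability is the whole problem.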
It's all a huge mess, which is exactly why Unicode was developed. But the mess has not yet disappeared, and we still have to deal with it: never assume you know which encoding a given bunch of bytes that represent text are in. There Ain't No Such Thing as Plain Text.
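The practical consequence is to always name the charset explicitly when converting bytes to text, instead of using the one-argument APIs that fall back to the platform default. A small sketch (`input.txt` is a placeholder path):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithExplicitCharset {
    public static void main(String[] args) throws IOException {
        // Naming the charset here means the program behaves the same on
        // every platform, regardless of the user's default encoding.
        Path file = Path.of("input.txt");
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            System.out.println(r.readLine());
        }
    }
}
```

Of course this only works if you actually know the file is UTF-8; if the encoding is genuinely unknown, no API call can rescue you.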
That's a user-specific setting. On many modern Linux systems it's UTF-8. On Macs (at least with older Java versions) it's MacRoman. On Windows it's typically CP1252 in the US and Western Europe, and CP1250 in Central and Eastern Europe. In mainland China you'll usually find a GB encoding (GBK or GB18030) for simplified Chinese, while Big5 covers traditional Chinese in Taiwan and Hong Kong.
But that's only the system default, which each user can change at any time. Which points to the solution: set the encoding explicitly when you start your app, using the system property file.encoding.
See this answer for how to do that. I suggest putting it into a small script that starts your app, so the user's default isn't tainted.
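Such a launcher script could look like this minimal sketch (`app.jar` is a placeholder for your application; note that on JDK 18 and later the default charset is UTF-8 anyway, per JEP 400):

```shell
#!/bin/sh
# Hypothetical launcher: pin the JVM's default charset so the app behaves
# the same regardless of the user's locale settings.
exec java -Dfile.encoding=UTF-8 -jar app.jar "$@"
```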