How important is it to save your source code in UTF-8 format?
Eclipse on Windows uses CP1252 character encoding by default. The CP1251 format means non UTF-8 charact
What is your goal? Balance your needs against the pros and cons of this choice.
UTF-8 Pros
\uHHHH
escapingUTF-8 Cons
\uHHHH
increases risk of character corruption
ASCII Pros
ASCII Cons
Note: ASCII is 7-bit, not "extended" and not to be confused with Windows-1252, ISO 8859-1, or anything else.
Important is at least that you need to be consistent with the encoding used to avoid herrings. Thus not, X here, Y there and Z elsewhere. Save source code in encoding X. Set code input to encoding X. Set code output to encoding X. Set characterbased FTP transfer to encoding X. Etcetera.
Nowadays UTF-8
is a good choice as it covers every character the human world is aware of and is pretty everywhere supported. So, yes, I would set workspace encoding to it as well. I also use it so.
Eclipse's default setting of using the platform default encoding is a poor decision IMHO. I found it necessary to change the default to UTF-8 shortly after installing it because some of my existing source files used it (probably from snippets copied/pasted from web pages.)
The Java Language and API specs require UTF-8 support so you're definitely okay as far as the standard tools go, and it's a long time since I've seen a decent editor that did not support UTF-8.
Even in projects that use JNI, your C sources will normally be in US-ASCII which is a subset of UTF-8 so having both open in the same IDE will not be a problem.
I don't think there's really a straight yes or no answer to this question. I would say that the following guidelines should be used to pick an encoding format, in order of priority listed (highest to lowest):
1) Pick an encoding your tool chain supports. This is a lot easier than it used to be. Even in recent memory a lot of compilers and languages basically only supported ASCII, which more or less forced developers into coding in Western European languages. These days, many of the newer languages support other encodings, and almost all decent editors and IDEs support a tremendously long list of encodings. Still... there are just enough holdouts that you need to double check before you settle on an encoding.
2) Pick an encoding that supports as many of the alphabets you wish to use as possible. I place this as a secondary priority because frankly, if your tools don't support it it doesn't really matter whether you like the encoding better or not.
UTF-8 is an excellent choice in many circumstances of today's world. It's an ugly, inelegant format, but it solves a whole host of problems (namely dealing with legacy code) that break other encodings, and it seems to becoming more and more the de facto standard of character encodings. It supports every major alphabet, darn near every editor on the planet supports it now, and a whole host of languages/compilers support it, too. But as I mentioned above, there are just enough legacy holdouts that you need to double check your tool chain from end to end before you settle on it definitively.
Yes, unless your compiler/interpreter is not able to work with UTF-8 files, it is definitely the way to go.