I am trying to convert a Shift_JIS formatted file into UTF-8 format. For this, below is my approach:
The answer @VicJordan posted is not correct. When you call getBytes(), you are getting the raw bytes of the string encoded under your system's native character encoding (which may or may not be UTF-8). Then, you are treating those bytes as if they were encoded in UTF-8, which they might not be.
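To make the pitfall concrete, here is a minimal sketch. The class and method names (GetBytesPitfall, decodeShiftJisBytesAsUtf8, roundTrip) are made up for illustration, and it assumes your JDK ships the Shift_JIS charset, which standard builds normally do. It encodes a short Japanese string as Shift_JIS and then decodes those bytes as UTF-8, which is the same kind of mismatch you get when getBytes() uses one encoding and the bytes are later read as another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetBytesPitfall {
    // Assumption: the running JDK provides the Shift_JIS charset.
    private static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Encodes text as Shift_JIS, then (wrongly) decodes the bytes as UTF-8.
    static String decodeShiftJisBytesAsUtf8(String text) {
        byte[] sjisBytes = text.getBytes(SHIFT_JIS);
        return new String(sjisBytes, StandardCharsets.UTF_8);
    }

    // Encodes and decodes with the same charset, so the text survives.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "Japanese language" written with Unicode escapes, so the result
        // does not depend on the encoding of this source file.
        String text = "\u65E5\u672C\u8A9E";

        System.out.println("mismatched decode equals original: "
                + decodeShiftJisBytesAsUtf8(text).equals(text));
        System.out.println("matching decode equals original:   "
                + roundTrip(text, SHIFT_JIS).equals(text));
    }
}
```

The point is that bytes carry no record of the encoding that produced them, so the charset has to be named explicitly both when encoding and when decoding.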
A more reliable approach would be to read the Shift_JIS file into a Java String. Then, write out the Java String using UTF-8 encoding.
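// Uses java.io: InputStream, InputStreamReader, OutputStream, OutputStreamWriter, Reader, Writer.

// Step 1: decode the Shift_JIS bytes into a Java String.
InputStream in = ...
Reader reader = new InputStreamReader(in, "Shift_JIS");
StringBuilder sb = new StringBuilder();
int read;
while ((read = reader.read()) != -1) {
    sb.append((char) read);
}
reader.close();
String string = sb.toString();

// Step 2: encode the String as UTF-8 while writing it out.
OutputStream out = ...
Writer writer = new OutputStreamWriter(out, "UTF-8");
writer.write(string);
writer.close();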
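If both ends are files, the same idea can be sketched as a self-contained program. This is only one possible way to do it: the class name ShiftJisToUtf8 and the convert method are hypothetical, the file paths come from the command line, and it again assumes the JDK provides the Shift_JIS charset. It copies through a small buffer instead of building the whole String in memory, and uses try-with-resources so both files are closed even if an exception is thrown.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShiftJisToUtf8 {
    // Assumption: the running JDK provides the Shift_JIS charset.
    private static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Reads source as Shift_JIS and writes the same text to target as UTF-8.
    static void convert(Path source, Path target) throws IOException {
        try (Reader reader = Files.newBufferedReader(source, SHIFT_JIS);
             Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int count;
            // Copy decoded characters in chunks until end of input.
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ShiftJisToUtf8 <source> <target>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

One difference worth knowing: a reader from Files.newBufferedReader throws an IOException when it hits a malformed or unmappable byte sequence, whereas InputStreamReader silently substitutes a replacement character. For a conversion job, failing loudly on bad input is often what you want, but that is a design choice.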