Java: how to convert a file to UTF-8

一向 2020-12-09 11:42

I have a file that contains some non-UTF-8 characters (for example, it is encoded in "ISO-8859-1"), so I want to convert that file to UTF-8 (or read it as UTF-8). How can I do it?

The code is:

4 answers
  • 2020-12-09 11:59

    Do you only want to read it as UTF-8? What I did recently, given a similar problem, was to start the JVM with -Dfile.encoding=UTF-8 and then read and print as normal. I don't know whether that applies in your case.

    With that option:

    System.out.println("á é í ó ú");
    

    prints the characters correctly. Otherwise it prints a ? symbol.
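
The flag above changes the JVM-wide default charset. As an alternative that does not depend on how the JVM was started, here is a minimal sketch that names the encoding explicitly when printing. The class name `ExplicitUtf8Print` and the helper `printUtf8` are hypothetical, introduced only for illustration; whether the characters display correctly still depends on the terminal's own encoding.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class ExplicitUtf8Print {
    // Hypothetical helper: encodes the text as UTF-8 regardless of the JVM default.
    static void printUtf8(OutputStream out, String text) throws UnsupportedEncodingException {
        PrintStream ps = new PrintStream(out, true, "UTF-8");
        ps.print(text);
        ps.flush(); // do not close: the caller may still need the stream (e.g. System.out)
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Shows which charset the JVM falls back to when none is given.
        System.out.println("Default charset: " + Charset.defaultCharset().name());
        // Unicode escapes keep the source file itself independent of its encoding.
        printUtf8(System.out, "\u00e1 \u00e9 \u00ed \u00f3 \u00fa\n");
    }
}
```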

  • 2020-12-09 12:04

    String charset = "ISO-8859-1"; // or whatever encoding the file actually uses
    BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), charset));
    String line;
    while ((line = in.readLine()) != null) {
        // ....
    }
    

    There you have the text decoded. You can then write it out with the symmetric Writer/OutputStream classes, using the encoding you prefer (e.g. UTF-8).
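
The write side of that idea might look like the sketch below. The class name `LineByLineRecode` and the method `recode` are hypothetical names chosen for illustration. Note that `readLine()` strips line terminators, so this version writes the platform line separator after every line; the output may therefore differ from the input in its line endings.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

class LineByLineRecode {
    // Hypothetical helper: decodes 'in' with inCharset and re-encodes it into 'out' with outCharset.
    static void recode(File in, String inCharset, File out, String outCharset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream(in), inCharset));
             BufferedWriter writer = new BufferedWriter(
                 new OutputStreamWriter(new FileOutputStream(out), outCharset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine(); // readLine() removed the terminator, so add one back
            }
        }
    }
}
```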

  • 2020-12-09 12:05

    The following code converts a file from srcEncoding to tgtEncoding:

    public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
        BufferedReader br = null;
        BufferedWriter bw = null;
        try{
            br = new BufferedReader(new InputStreamReader(new FileInputStream(source),srcEncoding));
            bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding));
            char[] buffer = new char[16384];
            int read;
            while ((read = br.read(buffer)) != -1)
                bw.write(buffer, 0, read);
        } finally {
            try {
                if (br != null)
                    br.close();
            } finally {
                if (bw != null)
                    bw.close();
            }
        }
    }
    

    --EDIT--

    Using try-with-resources (Java 7 and later):

    public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
        try (
          BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
          BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding)); ) {
              char[] buffer = new char[16384];
              int read;
              while ((read = br.read(buffer)) != -1)
                  bw.write(buffer, 0, read);
        } 
    }
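
For comparison, the same copy loop can be sketched with the `java.nio.file.Files` helpers, also available since Java 7. The class name `NioTransform` is a hypothetical name for illustration. One behavioural difference worth knowing: readers and writers created this way throw an `IOException` on malformed or unmappable input instead of silently substituting a replacement character.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class NioTransform {
    // Copies 'source' to 'target', decoding with srcCharset and encoding with tgtCharset.
    static void transform(Path source, Charset srcCharset, Path target, Charset tgtCharset)
            throws IOException {
        try (BufferedReader br = Files.newBufferedReader(source, srcCharset);
             BufferedWriter bw = Files.newBufferedWriter(target, tgtCharset)) {
            char[] buffer = new char[16384];
            int read;
            // Copy in chunks so line endings are preserved exactly.
            while ((read = br.read(buffer)) != -1) {
                bw.write(buffer, 0, read);
            }
        }
    }
}
```

A call might look like `NioTransform.transform(Paths.get("test.in"), StandardCharsets.ISO_8859_1, Paths.get("test.out"), StandardCharsets.UTF_8)`, reusing the file names from the answer below.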
    
  • 2020-12-09 12:24

    You need to know the encoding of the input file. For example, if the file is in Latin-1, you would do something like this:

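    // Decode the input bytes as Latin-1 (ISO-8859-1)
    FileInputStream fis = new FileInputStream("test.in");
    InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
    Reader in = new BufferedReader(isr);
    // Encode the output characters as UTF-8
    FileOutputStream fos = new FileOutputStream("test.out");
    OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
    Writer out = new BufferedWriter(osw);

    // Copy one character at a time; read() returns -1 at end of stream
    int ch;
    while ((ch = in.read()) > -1) {
        out.write(ch);
    }

    // Closing the writer also flushes any buffered output
    out.close();
    in.close();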
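
To see why the input encoding has to be known, it can help to look at the bytes themselves. The sketch below (the class name `EncodingBytesDemo` and the helper `hex` are hypothetical) encodes the character é in both charsets and then decodes the Latin-1 byte as if it were UTF-8.

```java
import java.nio.charset.StandardCharsets;

class EncodingBytesDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the character é
        System.out.println(hex(eAcute.getBytes(StandardCharsets.ISO_8859_1))); // E9
        System.out.println(hex(eAcute.getBytes(StandardCharsets.UTF_8)));      // C3 A9

        // The single Latin-1 byte is not valid UTF-8 on its own, so decoding it
        // as UTF-8 cannot recover the original character.
        String wrong = new String(new byte[] {(byte) 0xE9}, StandardCharsets.UTF_8);
        System.out.println(wrong.equals(eAcute)); // false
    }
}
```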
    