Question
Intro
I am using Runtime.exec() to execute some external command and I am using parameters that contain non-English characters. I simply want to run something like this:
python test.py шалом
It works correctly when run directly in cmd, but the argument is handled incorrectly when run via Runtime.getRuntime().exec("python test.py шалом")
On Windows my external program fails due to unknown symbols passed to it.
I remember a similar issue from the early 2010s (!), JDK-4947220, but I thought it had been fixed since Java 1.6.
Environment:
OS: Microsoft Windows 10 Pro (Version 10.0.18362 Build 18362)
Java: jdk1.8.0_221
Code
The easiest way to understand the question is to run the code snippet below:
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class MainClass {
    private static void foo(String filename) {
        try {
            BufferedReader input = new BufferedReader(
                new InputStreamReader(
                    Runtime.getRuntime().exec(filename).getInputStream()));
            String line;
            while ((line = input.readLine()) != null) {
                System.out.println(line);
            }
            input.close();
        } catch (Exception e) { /* ... */ }
    }

    public static void main(String[] args) {
        foo("你好.bat 你好");         // ??
        foo("привет.bat привет");   // ??????
        foo("hi.bat hi");           // hi
    }
}
Each .bat file contains only a simple @echo %1
The output will be:
??
??????
hi
PS: System.out.println("привет") works fine and prints everything correctly.
My questions are:
1) Is this issue related to the UTF-8 / UTF-16 encodings?
2) How can I fix this issue? I do not like this answer, as it looks like a very dangerous and ugly workaround.
3) Does anyone know why the name of the batch file is not broken (the file is found), while the argument is? Maybe it is a problem with @echo?
Answer 1:
Yes, the issue is related to UTF encodings. In theory, setting code page 65001 (UTF-8) for the cmd that executes the .bat files should solve the issue (along with setting UTF-8 as the default charset on the Java side). Unfortunately, there is a bug in Windows, mentioned in "Java, Unicode, UTF-8, and Windows Command Prompt".
So there is no simple and complete solution. What can be done is to set the same default language-specific encoding, such as cp1251 (Cyrillic), for both java and cmd. Not all languages are well covered by the Windows encodings; Chinese, for example, is one that is not.
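To see why a shared code page only helps for some languages, here is a small sketch that checks whether a string survives a given Windows code page. The class and method names (CodePageCheck, fits, roundTrip) are made up for illustration, and the source file is assumed to be saved and compiled as UTF-8. It only shows what a lossy code page does to text; it does not establish at which step of the exec pipeline the conversion actually happens.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original question or answer.
public class CodePageCheck {

    /** True if every character of text has a mapping in the given charset. */
    static boolean fits(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    /** Encodes and decodes through the charset; unmappable characters come back as '?'. */
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        for (String cp : new String[] {"windows-1252", "windows-1251"}) {
            for (String s : new String[] {"hi", "привет", "你好"}) {
                System.out.println(cp + ": " + s
                        + " fits=" + fits(s, cp)
                        + " roundTrip=" + roundTrip(s, cp));
            }
        }
    }
}
```

With windows-1251 the Cyrillic string survives, while the Chinese one turns into question marks in either code page, which matches the ?? pattern shown in the question.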
If some non-technical restriction on the Windows system prevents changing the default encoding to the language-specific one for all cmd processes, the Java code becomes more complicated. First, a new cmd process has to be created, and its stdin/stdout streams should be attached, from different threads, to a reader using UTF-16LE (for a 'cmd /U' process) and a writer using CP1251. The first command sent to stdin from Java should be 'chcp 1251', and the second is the name of the .bat file with its parameters.
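The steps above could look roughly like the sketch below. It is Windows-only and makes several assumptions of mine that are not in the answer: the class and method names are hypothetical, the /Q switch is added to suppress command echo, the output of chcp is redirected to nul (chcp is an external program, so its output would not be UTF-16LE), and an explicit exit ends the session. How cmd behaves with piped stdin may vary with the Windows version and console settings, so treat it as a starting point rather than a verified fix.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the "cmd /U + chcp 1251" approach.
public class CmdUnicodeSketch {

    // Assumption: the arguments are Cyrillic, so CP1251 can represent them.
    static final Charset INPUT_CHARSET = Charset.forName("windows-1251");

    /** Builds the bytes written to cmd's stdin: chcp first, then the batch call, then exit. */
    static byte[] buildStdinScript(String commandLine) {
        String script = "chcp 1251 >nul\r\n" + commandLine + "\r\n" + "exit\r\n";
        return script.getBytes(INPUT_CHARSET);
    }

    /** Windows only: feeds the command line to "cmd /U" and returns the decoded output lines. */
    static List<String> run(String commandLine) throws Exception {
        Process cmd = new ProcessBuilder("cmd", "/U", "/Q")
                .redirectErrorStream(true)
                .start();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Reader thread: with /U, internal commands such as echo write UTF-16LE to the pipe.
            Future<List<String>> output = pool.submit(() -> {
                List<String> lines = new ArrayList<>();
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        cmd.getInputStream(), StandardCharsets.UTF_16LE))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lines.add(line);
                    }
                }
                return lines;
            });
            // Writer side (calling thread): commands are sent as CP1251 bytes.
            try (OutputStream stdin = cmd.getOutputStream()) {
                stdin.write(buildStdinScript(commandLine));
            }
            cmd.waitFor();
            return output.get();
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as CmdUnicodeSketch.run("привет.bat привет") would then replace the Runtime.getRuntime().exec(...) call from the question. The returned lines may still contain the cmd banner or prompts, which the caller would need to filter out.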
A complete solution may still use UTF-16LE for reading the cmd output, but to pass text in, some other universal encoding should be used, for example Base64, which again increases complexity.
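As a minimal sketch of the Base64 idea: the argument is turned into plain ASCII on the Java side, so no code page can damage it on the way, and the receiving program decodes it back. The ArgCodec class is hypothetical, and the URL-safe alphabet without padding is my choice, to avoid characters such as / and = on the command line.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: makes an argument ASCII-only before it is passed to exec().
public class ArgCodec {

    /** Encodes the argument as URL-safe Base64 of its UTF-8 bytes, without '=' padding. */
    static String encode(String arg) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(arg.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(); the receiving side needs an equivalent of this step. */
    static String decode(String encoded) {
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("привет");
        System.out.println(encoded);          // ASCII only, safe in any code page
        System.out.println(decode(encoded));  // original text restored
    }
}
```

The catch is that the external program has to decode the argument itself. A script like the test.py from the question can do that, but a plain batch file that only runs @echo %1 would just print the encoded text.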
Source: https://stackoverflow.com/questions/59986078/java-runtime-exec-and-unicode-symbols-on-windows-how-to-make-it-work-with-no