Hello, we have a requirement to parse data sent from users all over the world. Our Linux systems have a de
When using a DirectoryStream, don't forget to close the stream (try-with-resources can help here).
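As a minimal sketch of that pattern (listing the current directory "." is just a placeholder path):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ListDir {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the DirectoryStream automatically,
        // even if the loop body throws.
        try (DirectoryStream<Path> paths = Files.newDirectoryStream(Paths.get("."))) {
            for (Path path : paths) {
                System.out.println(path.getFileName());
            }
        }
    }
}
```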
The Java system property file.encoding
should match the console's character encoding. The property must be set when starting java
on the command-line:
java -Dfile.encoding=UTF-8 …
Normally this happens automatically, because the console encoding is usually the platform default encoding, and Java will use the platform default encoding if you don't specify one explicitly.
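To see what your JVM actually picked up, you can print the property and the resolved default charset (the class name EncodingCheck is just for illustration):

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // The value passed via -Dfile.encoding (or the platform default)
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset the JVM actually resolved and uses by default
        System.out.println("default charset = " + Charset.defaultCharset().name());
    }
}
```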
First, the character encoding used is not directly related to the locale. So changing the locale won't help much.
Second, the � is typical for the Unicode replacement character U+FFFD (�)
being printed in ISO-8859-1 instead of UTF-8. Here's the evidence:
System.out.println(new String("\uFFFD".getBytes("UTF-8"), "ISO-8859-1")); // ï¿½
So there are two problems:
1. The platform default encoding is not UTF-8.
2. The console is not displaying output as UTF-8.
For a Sun JVM, the VM argument -Dfile.encoding=UTF-8
should fix the first problem. The second problem is to be fixed in the console settings. If you're using for example Eclipse, you can change it in Window > Preferences > General > Workspace > Text File Encoding. Set it to UTF-8 as well.
Update: As per your update:
byte[] textArray = f.getName().getBytes();
That should have been the following to exclude influence of platform default encoding:
byte[] textArray = f.getName().getBytes("UTF-8");
If that still displays the same, then the problem lies deeper. Which JVM exactly are you using? Run java -version
. As said before, the -Dfile.encoding
argument is Sun JVM specific. Some Linux machines ship with the GNU JVM or OpenJDK's JVM, and this argument may not work there.
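To see why the explicit charset matters, compare the bytes the same string produces under two encodings (the file name "résumé.txt" is just an example):

```java
import java.nio.charset.StandardCharsets;

public class BytesDemo {
    public static void main(String[] args) {
        String name = "résumé.txt"; // 10 characters, two of them non-ASCII
        // With an explicit charset, the result no longer depends on the
        // platform default encoding.
        byte[] utf8   = name.getBytes(StandardCharsets.UTF_8);      // é -> 2 bytes each
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1); // é -> 1 byte each
        System.out.println("UTF-8:      " + utf8.length + " bytes");   // 12
        System.out.println("ISO-8859-1: " + latin1.length + " bytes"); // 10
    }
}
```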
It's a bug in the old-school java.io File API, maybe just on a Mac? Anyway, the new java.nio API works much better. I have several files containing Unicode characters that failed to load using the java.io classes. After converting all my code to use java.nio.Path, everything started working. And I replaced Apache FileUtils (which has the same problem) with java.nio.Files.
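As a sketch of that migration (file name and content are hypothetical), the java.nio.file.Files equivalents of File.exists() and FileUtils.readLines() look like this:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class NioDemo {
    public static void main(String[] args) throws IOException {
        Path p = Paths.get("données.txt"); // hypothetical name with a non-ASCII character
        Files.write(p, Arrays.asList("bonjour"), StandardCharsets.UTF_8);
        System.out.println(Files.exists(p));                                 // replaces File.exists()
        List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);  // replaces FileUtils.readLines()
        System.out.println(lines);
        Files.delete(p);
    }
}
```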
Well, I was struggling with this issue all day! My previous (wrong) code was the same as yours:
for (File f : dir.listFiles()) {
    String filename = f.getName(); // The filename here is wrong!
    FileInputStream fis = new FileInputStream(filename);
}
and it did not work. (I'm using Oracle Java 1.7 on CentOS 6; LANG and LC_CTYPE are fr_FR.UTF-8 for all users except zimbra, which has LANG and LC_CTYPE=C. That is almost certainly the cause of this issue, but I can't change it without the risk that Zimbra stops working...)
So I decided to use the new classes of java.nio.file package (Files and Paths):
try (DirectoryStream<Path> paths = Files.newDirectoryStream(Paths.get(outputName))) {
    for (Path path : paths) {
        String filename = path.getFileName().toString(); // The filename here is correct
        ...
    }
}
So if you are using Java 1.7, you should give the new classes in the java.nio.file package a try: they saved my day!
Hope it helps
It is a bug in the JRE/JDK which has existed for years.
How to fix Java when it refuses to open a file with a special character in the filename?
File.exists() fails with unicode characters in name
I am now submitting a new bug report to them, as LC_ALL=en_US fixes some cases but fails in others.