I'm creating a simple word count program in Java that reads through a directory's text-based files.
However, I keep getting the error:
java.nio.c
The problem is that Files.newBufferedReader(Path path) is implemented like this:
public static BufferedReader newBufferedReader(Path path) throws IOException {
    return newBufferedReader(path, StandardCharsets.UTF_8);
}
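So there is no point in specifying UTF-8 explicitly unless you want your code to be descriptive.
If the file is not actually UTF-8, you could try a different charset such as StandardCharsets.UTF_16. It is not really "broader", since UTF-8 and UTF-16 both encode all of Unicode; what matters is whether the charset matches the bytes in the file. Unless you know the file's real encoding, you can't be 100% sure every character will be decoded correctly anyway.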
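As a rough illustration of trying a second charset, here is a minimal sketch that reads strictly as UTF-8 first and falls back to ISO-8859-1 only when the input is malformed. The class and method names (FallbackRead, readLinesWithFallback) are made up for this example, and the choice of ISO-8859-1 as the fallback is an assumption: it maps every byte to some character, so it never fails on malformed input, but the text will be wrong if the file is really in another encoding.

```java
import java.io.IOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FallbackRead {
    // Hypothetical helper: strict UTF-8 first, ISO-8859-1 as a fallback.
    static List<String> readLinesWithFallback(Path path) throws IOException {
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (MalformedInputException e) {
            // Assumption: ISO-8859-1 is an acceptable guess for this file.
            // It accepts any byte sequence, so no decoding error is possible,
            // but the characters may be wrong for other encodings.
            return Files.readAllLines(path, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("latin1", ".txt");
        // "caf" followed by 0xE9 (e-acute in ISO-8859-1); a lone 0xE9 is not valid UTF-8.
        Files.write(tmp, new byte[] {'c', 'a', 'f', (byte) 0xE9, '\n'});
        System.out.println(readLinesWithFallback(tmp));
        Files.delete(tmp);
    }
}
```

If you know which encoding your files were saved in, passing that charset directly is more reliable than any fallback.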
Creating a BufferedReader with Files.newBufferedReader:
Files.newBufferedReader(Paths.get("a.txt"), StandardCharsets.UTF_8);
When the application runs, this may throw the following exception:
java.nio.charset.MalformedInputException: Input length = 1
However,
new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"), "utf-8"));
works fine.
The difference is that the former uses the CharsetDecoder default action.
The default action for malformed-input and unmappable-character errors is to report them, which is what raises the exception,
while the latter uses the REPLACE action:
cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)
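If you want to keep using the java.nio.file API but get the lenient behaviour, one option is to build the decoder yourself. The following is a minimal sketch, not the JDK's own code; LenientReader and open are names invented for this example. Bad bytes are replaced with the decoder's default replacement, U+FFFD, instead of raising MalformedInputException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LenientReader {
    // Hypothetical helper: similar to Files.newBufferedReader, but malformed
    // or unmappable input is replaced rather than reported.
    static BufferedReader open(Path path, Charset cs) throws IOException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), decoder));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lenient", ".txt");
        // 0xFF can never appear in valid UTF-8.
        Files.write(tmp, new byte[] {'a', (byte) 0xFF, 'b', '\n'});
        try (BufferedReader reader = open(tmp, StandardCharsets.UTF_8)) {
            String line = reader.readLine();
            // The invalid byte shows up as the replacement character U+FFFD.
            System.out.println(line.indexOf('\uFFFD'));
        }
        Files.delete(tmp);
    }
}
```

Keep in mind that replacement hides the problem rather than fixing it: the original bytes are lost, so this is only appropriate when an occasional wrong character is acceptable, as in a word count.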
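You can try something like this, or just copy and paste the piece below. It tries each available charset in turn until one reads the file without an exception. Note that a charset that decodes without error is not necessarily the correct one for the file.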
// Assumes f (File), keyword (String) and values (List<String>) are defined by the surrounding code.
List<Charset> candidates = new ArrayList<>(Charset.availableCharsets().values());
Charset charset = Charset.defaultCharset(); // Try the default one first.
int index = 0;
boolean done = false;
while (!done) {
    try {
        List<String> lines = Files.readAllLines(f.toPath(), charset);
        for (String line : lines) {
            line = line.trim();
            if (line.contains(keyword))
                values.add(line);
        }
        // No exception: the file was decoded, so stop.
        done = true;
    } catch (IOException e) {
        // Try the next charset, or give up once every charset has been tried
        // (otherwise the loop would never terminate).
        if (index < candidates.size())
            charset = candidates.get(index++);
        else
            done = true;
    }
}
I wrote the following to print a list of results to standard out, one per available charset. Note that it also reports the 0-based line number at which reading fails, in case you are troubleshooting which character is causing issues.
public static void testCharset(String fileName) {
    SortedMap<String, Charset> charsets = Charset.availableCharsets();
    for (String k : charsets.keySet()) {
        int line = 0;
        boolean success = true;
        try (BufferedReader b = Files.newBufferedReader(Paths.get(fileName), charsets.get(k))) {
            // readLine() returns null at end of file; ready() is not a reliable EOF test.
            while (b.readLine() != null) {
                line++;
            }
        } catch (IOException e) {
            success = false;
            System.out.println(k + " failed on line " + line);
        }
        if (success)
            System.out.println("************************* Success " + k);
    }
}
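If a line number is not precise enough, a byte offset can help pinpoint the offending character. The sketch below is one possible approach, not part of the answer above: it runs a strict CharsetDecoder over the raw bytes and returns the position where decoding first fails. FirstBadByte and firstBadOffset are hypothetical names, and it assumes the whole file fits in memory (for a real file you would pass in the result of Files.readAllBytes).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class FirstBadByte {
    // Hypothetical helper: returns the byte offset of the first sequence that
    // cannot be decoded with cs, or -1 if the whole array decodes cleanly.
    static int firstBadOffset(byte[] bytes, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder(); // default action is REPORT
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // Sized for the worst case so the decoder never stops on overflow.
        int capacity = (int) Math.ceil(bytes.length * (double) decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the input buffer is positioned at the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte UTF-8 sequence, but 0x28 is not a valid continuation byte.
        byte[] data = {'o', 'k', (byte) 0xC3, 0x28};
        System.out.println(firstBadOffset(data, StandardCharsets.UTF_8));
        System.out.println(firstBadOffset("ok".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8));
    }
}
```

Opening the file in a hex viewer at the reported offset usually makes it clear which encoding the file was really saved in.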