Curly quotes causing Java Scanner hasNextLine() to be false — why?


I've been having an issue getting java.util.Scanner to read a text file I saved in Notepad, even though it works fine with others. Basically, when it tries to read the

3 Answers

鱼传尺愫

2021-01-17 12:24

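If you don't specify an encoding when you create the Scanner, it does not inspect the file for a byte order mark (BOM), which is the first few bytes of a file; it decodes with the platform's default charset. On Windows that default has traditionally been windows-1252 (Cp1252), while JDK 18 and later default to UTF-8 on every platform. If the bytes Notepad wrote are not valid in the charset the Scanner decodes with, reading fails. Note also that on a Western-locale Windows, Notepad's ANSI option writes windows-1252, which is similar to, but not the same as, ISO-8859-1: the curly quotes sit in the 0x80-0x9F range, where ISO-8859-1 has only control characters. See this link for more details:

http://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

When you save it as UTF-8, Notepad probably places the UTF-8 BOM at the beginning of the file. Scanner does not use the BOM to choose a charset, so if that version reads correctly, the likely reason is that your default charset is UTF-8 and the bytes now match it.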
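
Passing the charset explicitly takes the platform default out of the picture. The following is a minimal sketch, not the asker's code: the class name `ExplicitCharsetScanner`, the helper `readLines`, and the sample bytes are made up for illustration, and it assumes the file really is windows-1252.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ExplicitCharsetScanner {
    // Reads every line of the file, decoding with the charset the caller names
    // instead of the platform default.
    static List<String> readLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(file, charsetName)) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
            // Scanner records read errors instead of throwing them; rethrow if any.
            IOException swallowed = in.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: curly-quoted "hi" as windows-1252 bytes (0x93 ... 0x94).
        File tmp = Files.createTempFile("quotes", ".txt").toFile();
        Files.write(tmp.toPath(), new byte[] {(byte) 0x93, 'h', 'i', (byte) 0x94, '\n'});
        System.out.println(readLines(tmp, "windows-1252"));
        tmp.delete();
    }
}
```

If the charset name does not match the file, this version throws the recorded exception rather than silently returning an empty list.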

If you want to look more into the BOM, look it up on Wikipedia; the article is quite good. You can also download PSPad and open the text file in hex mode to see the individual bytes. Hope that helps :)
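
If you would rather not install a hex editor, the first few bytes can be dumped from Java as well. This is a rough sketch under my own naming (`BomCheck`, `hexHead`, `hasUtf8Bom` are hypothetical helpers); it relies on `InputStream.readNBytes(byte[], int, int)`, which needs Java 9 or later, and only checks for the UTF-8 BOM bytes EF BB BF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class BomCheck {
    // Returns up to `count` leading bytes of the file as space-separated hex pairs.
    static String hexHead(Path file, int count) throws IOException {
        byte[] head = new byte[count];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.readNBytes(head, 0, count); // Java 9+
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", head[i] & 0xFF));
        }
        return sb.toString();
    }

    // True if the file starts with the UTF-8 byte order mark.
    static boolean hasUtf8Bom(Path file) throws IOException {
        return hexHead(file, 3).equals("EF BB BF");
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bom", ".txt");
        Files.write(tmp, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'});
        System.out.println(hexHead(tmp, 4) + " bom=" + hasUtf8Bom(tmp));
        Files.delete(tmp);
    }
}
```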

既然无缘

2021-01-17 12:45
Scanner's hasNextLine() simply returns false if it hits an encoding error in the input file, without throwing any exception. This is frustrating and easy to miss: Scanner treats an IOException from the underlying source as the end of input, and the error is only visible if you call ioException() afterwards.
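
To see whether this is what is happening, you can ask the Scanner for the error it recorded. The sketch below is illustrative only: `ScannerErrorProbe` and `swallowedError` are names I made up, the sample bytes stand in for a Notepad ANSI file, and it uses the `Scanner(File, String)` constructor, which as far as I can tell decodes strictly.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Scanner;

final class ScannerErrorProbe {
    // Drains the file with a Scanner and returns the IOException it recorded,
    // or null if the whole file decoded cleanly.
    static IOException swallowedError(File file, String charsetName) throws IOException {
        try (Scanner in = new Scanner(file, charsetName)) {
            while (in.hasNextLine()) {
                in.nextLine();
            }
            return in.ioException();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: windows-1252 curly quotes, which are not valid UTF-8.
        File tmp = Files.createTempFile("quotes", ".txt").toFile();
        Files.write(tmp.toPath(), new byte[] {(byte) 0x93, 'h', 'i', (byte) 0x94, '\n'});
        System.out.println("as UTF-8:        " + swallowedError(tmp, "UTF-8"));
        System.out.println("as windows-1252: " + swallowedError(tmp, "windows-1252"));
        tmp.delete();
    }
}
```

A non-null result for the charset you expected is a strong hint that the file was saved in a different encoding.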

If you just want to read a file line-by-line, use this instead:
```
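import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

// "inputencoding" is a placeholder for the file's charset name, e.g. "UTF-8".
// try-with-resources closes the reader even if reading throws.
try (BufferedReader input = new BufferedReader(
        new InputStreamReader(new FileInputStream("inputfile.txt"), "inputencoding"))) {
    String line;
    while ((line = input.readLine()) != null) {
        // process line
    }
}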
```
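Make sure inputencoding above is replaced with the actual encoding of the file. Most likely it is UTF-8 or ASCII. Even if the encoding does not match, this will not terminate prematurely like Scanner: an InputStreamReader created with a charset name replaces bytes it cannot decode with U+FFFD instead of failing.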
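
The difference is easy to observe by deliberately reading with the wrong charset. This is a small sketch with made-up names (`LenientLineReader`, `readLines`) and made-up sample bytes; it only demonstrates the replacement behaviour, not a recommended way to handle unknown encodings.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

final class LenientLineReader {
    // Reads all lines; undecodable bytes become U+FFFD rather than stopping the read.
    static List<String> readLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: windows-1252 curly quotes read with the wrong charset.
        File tmp = Files.createTempFile("quotes", ".txt").toFile();
        Files.write(tmp.toPath(), new byte[] {(byte) 0x93, 'h', 'i', (byte) 0x94, '\n'});
        for (String line : readLines(tmp, "UTF-8")) {
            System.out.println(line.replace('\uFFFD', '?'));
        }
        tmp.delete();
    }
}
```

The line still arrives, but the quotes are lost, so a mismatch shows up as replacement characters in the data rather than as missing lines.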
醉酒成梦

2021-01-17 12:47
Some time ago I had a similar problem with a configuration file that was edited by the user. Because I never know what kind of editor the user will use, I tried this:
```
org.mozilla.universalchardet.UniversalDetector
```
available from here:
```
https://code.google.com/p/juniversalchardet/
```
Detecting a character encoding is not a simple thing, so I can't be sure this library works under all conditions, but for me it was sufficient. Have a look; maybe it will help you detect your encoding so that you can then pass it to Scanner.
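
For a sense of what "detect, then pass it to Scanner" can look like without any dependency, here is a much cruder stand-in. To be clear, this is not juniversalchardet and does not use its API: it is a plain-JDK heuristic that tries a caller-supplied list of candidate charsets with a strict decoder and returns the first one that decodes without error. The names `CharsetGuess` and `firstThatDecodes` are hypothetical, and the candidate order is an assumption you would need to tune.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CharsetGuess {
    // Returns the first candidate that decodes all bytes without error, or null.
    // Order matters: put strict charsets (UTF-8) before permissive single-byte ones.
    static Charset firstThatDecodes(byte[] data, Charset... candidates) {
        for (Charset cs : candidates) {
            try {
                cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data));
                return cs;
            } catch (CharacterCodingException e) {
                // Bytes are not valid in this charset; try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample: curly-quoted "hi" as saved by Notepad in ANSI mode.
        byte[] notepadAnsi = {(byte) 0x93, 'h', 'i', (byte) 0x94};
        Charset guess = firstThatDecodes(
                notepadAnsi, StandardCharsets.UTF_8, Charset.forName("windows-1252"));
        System.out.println(guess);
    }
}
```

The returned charset's `name()` can then be given to `new Scanner(file, name)`. A statistical detector such as the library above should do better on files where several single-byte charsets all decode cleanly.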