How to preserve newlines while reading a file using stream - java 8

拈花ヽ惹草 submitted on 2019-12-05 02:24:56

The problem is that Files.lines() is implemented on top of BufferedReader.readLine(), which reads up to the next line terminator, discards the terminator, and returns only the content of the line. When you then write the lines with something like Files.write(), it appends the system-specific line separator after each line, which might differ from the line terminator that was read in.
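
To make the effect concrete, here is a minimal sketch of such a round trip. The class name NormalizedRoundTrip and the helper roundTrip are made up for illustration, and the temp files stand in for whatever files you are actually using.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original question.
class NormalizedRoundTrip {

    /** Writes content to a temp file, reads it with Files.lines(), writes it back with Files.write(). */
    static String roundTrip(String content) throws IOException {
        Path source = Files.createTempFile("in", ".txt");
        Path target = Files.createTempFile("out", ".txt");
        try {
            Files.write(source, content.getBytes(StandardCharsets.UTF_8));

            // Files.lines() yields the lines WITHOUT their terminators.
            List<String> lines;
            try (Stream<String> stream = Files.lines(source)) {
                lines = stream.collect(Collectors.toList());
            }

            // Files.write() appends the platform line separator after every line.
            Files.write(target, lines);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }

    public static void main(String[] args) throws IOException {
        String sep = System.lineSeparator();
        String result = roundTrip("a\r\nb\nc");
        // The mixed terminators are gone, and the last line gained a terminator.
        System.out.println(result.equals("a" + sep + "b" + sep + "c" + sep)); // prints true
    }
}
```

Note that the output also gains a terminator after the last line even though the input had none.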

If you really want to preserve the line terminators exactly as they are, even if they're a mixture of different line terminators, you could use a regex and Scanner for that.

First define a pattern that matches a line including the valid line terminators or EOF:

Pattern pat = Pattern.compile(".*\\R|.+\\z");

The \\R is a linebreak matcher, new in Java 8, that matches the usual line terminators (treating CRLF as a single unit) plus a few Unicode line terminators that I've never heard of. :-) You could use something like (\\r\\n|\\r|\\n) if you want just the usual CRLF, CR, or LF terminators.

You have to include .+\\z in order to match a potential last "line" in the file that doesn't have a line terminator. Note that it uses .+ rather than .* so that the regex always matches at least one character. That way no match will be found when the Scanner reaches the end of the file, and the read loop can terminate.

Then, read lines using a Scanner until findWithinHorizon() returns null:

try (Scanner in = new Scanner(Paths.get(INFILE), "UTF-8")) {
    String line;
    // A horizon of 0 means the search is unbounded; each match is one line plus its terminator.
    while ((line = in.findWithinHorizon(pat, 0)) != null) {
        // Process the line, then write the output using something like
        // FileWriter.write(String) that doesn't add another line terminator.
    }
}
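
Put together, the approach above might look like the following self-contained sketch. The class name PreserveTerminators and the method copyLines are hypothetical, and the "processing" step is just a straight copy so that the result can be compared byte for byte with the input.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class built around the pattern from the answer.
class PreserveTerminators {

    // A line including its terminator, or a final line without one.
    private static final Pattern LINE = Pattern.compile(".*\\R|.+\\z");

    /** Copies source to target line by line, keeping every original terminator. */
    static void copyLines(Path source, Path target) throws IOException {
        try (Scanner in = new Scanner(source, "UTF-8");
             Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.findWithinHorizon(LINE, 0)) != null) {
                // 'line' still ends with its own terminator, so write it as-is.
                out.write(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("mixed", ".txt");
        Path target = Files.createTempFile("copy", ".txt");
        // CRLF, LF and CR mixed, with no terminator after the last line.
        byte[] original = "one\r\ntwo\nthree\rfour".getBytes(StandardCharsets.UTF_8);
        Files.write(source, original);

        copyLines(source, target);

        System.out.println(Arrays.equals(original, Files.readAllBytes(target))); // prints true
    }
}
```

Writing through a plain Writer matters here: anything that appends its own separator (println, Files.write with a list of lines) would double up the terminators.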

The lines in your stream do not include any newline character.

It would be nice if the method documentation for Files.lines() mentioned this. However, if you follow the implementation, it eventually leads to BufferedReader.readLine(). That method is documented to return the contents of the line, not including any line-termination characters.

You can add a newline character to the lines when you write them.

The Files.write() method you're calling uses a system-dependent line separator, as documented in its sibling overload (the one that also takes a Charset). You can also get this system-dependent line separator with System.lineSeparator().

If you want a different line separator, and know what it is, you can specify it. For example:

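    // 'lines' holds the lines read earlier; 'targetFile' is the output Path.
    try (PrintStream out = new PrintStream(Files.newOutputStream(targetFile))) {
        // Terminate every line with an explicit CRLF instead of the platform default.
        lines.forEach(line -> out.print(line + "\r\n"));
    }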

If you want the original file's line separators, you can't rely only on a method that strips those out. Options include:

  • Read the first line separator and assume that it's consistent throughout the file. This allows you to continue to use Files.lines() to read the lines.
  • Use an API that allows you to get lines with their separators.
  • Read character-by-character, rather than line-by-line, so that you can get the line separators.
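
As a rough sketch of the first option, assuming the file uses one separator throughout: the class FirstSeparatorSketch and its methods detectSeparator and rewrite are invented for this example, and the fallback to System.lineSeparator() for files without any terminator is my own choice.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper class; assumes a single, consistent separator per file.
class FirstSeparatorSketch {

    /** Returns the first line terminator found in the file, or the platform default if there is none. */
    static String detectSeparator(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\n') {
                    return "\n";
                }
                if (c == '\r') {
                    // CR followed by LF is one CRLF terminator; otherwise a lone CR.
                    return reader.read() == '\n' ? "\r\n" : "\r";
                }
            }
        }
        return System.lineSeparator();
    }

    /** Reads source with Files.lines() and writes each line followed by the detected separator. */
    static void rewrite(Path source, Path target) throws IOException {
        String separator = detectSeparator(source);
        try (Stream<String> lines = Files.lines(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            // Iterate instead of forEach so that IOException can propagate normally.
            for (String line : (Iterable<String>) lines::iterator) {
                out.write(line);      // process the line here if needed
                out.write(separator);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("crlf", ".txt");
        Path target = Files.createTempFile("crlf-out", ".txt");
        Files.write(source, "a\r\nb\r\nc\r\n".getBytes(StandardCharsets.UTF_8));

        rewrite(source, target);

        String written = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println(written.equals("a\r\nb\r\nc\r\n")); // prints true
    }
}
```

This version always writes a separator after the last line, so a file whose final line is unterminated will not round-trip exactly; mixed separators are not preserved either.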

WARNING: Your code reads and writes from the same file. You could lose your original data due to abnormal termination or bugs.
