How to find out which line separator BufferedReader#readLine() used to split the line?

Submitted by 人盡茶涼 on 2019-11-28 12:01:44

After reading the Java docs (I confess to being a Pythonista), it seems that there isn't a clean way to determine the line-ending convention used in a specific file.

The best thing I can recommend is to use BufferedReader.read() and iterate over every character in the file. Something like this:

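import java.io.BufferedReader;
import java.io.FileReader;

String filename = ...
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    StringBuilder l = new StringBuilder();
    int c; // read() returns an int; -1 signals end of stream
    while ((c = br.read()) != -1) {
        if (c == '\n') {
            // the line in l ended with "\n"
            // ... do stuff with the endl-free line and its separator
            l.setLength(0);
        } else if (c == '\r') {
            // peek at the next character to tell "\r\n" from a lone "\r"
            br.mark(1);
            if (br.read() == '\n') {
                // do extra stuff since you know that you've got a \r\n
            } else {
                br.reset(); // lone "\r": un-read the character we peeked at
            }
            // ... do stuff with the endl-free line and its separator
            l.setLength(0);
        } else {
            l.append((char) c);
        }
    }
    // anything still in l is a last line that had no terminator
}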

To stay consistent with the BufferedReader class, you can use the following method, which handles the \n, \r, \n\r and \r\n line separators:

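import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Returns the first line separator found in the file, or null if there is none.
// Works on raw bytes, so it assumes an ASCII-compatible encoding (not UTF-16).
public static String retrieveLineSeparator(File file) throws IOException {
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
        int current;
        // read() returns -1 at end of stream; available() is not a reliable EOF test
        while ((current = in.read()) != -1) {
            if ((current == '\n') || (current == '\r')) {
                String lineSeparator = String.valueOf((char) current);
                // a different terminator character right after the first forms a pair
                int next = in.read();
                if ((next != current)
                        && ((next == '\r') || (next == '\n'))) {
                    lineSeparator += (char) next;
                }
                return lineSeparator;
            }
        }
    }
    return null;
}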

Note that BufferedReader does not accept a FileInputStream directly; wrap the stream in an InputStreamReader, or use a FileReader.

No, you cannot find out the line terminator character that was used in the file being read by BufferedReader. That information is lost while reading the file.

Unfortunately, all answers below are incorrect.

Edit: And yes you can always extend BufferedReader to include the additional functionality you desire.

BufferedReader.readLine() does not provide any means of determining what the line break was. If you need to know, you'll need to read characters in yourself and find line breaks yourself.

You may be interested in the internal LineBuffer class from Guava (as well as the public LineReader class it's used in). LineBuffer provides a callback method void handleLine(String line, String end) where end is the line break characters. You could probably base something to do what you want on that. An API might look something like public Line readLine() where Line is an object that contains both the line text and the line end.
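As a rough illustration of the kind of API described above, here is a minimal sketch in plain Java. It does not use Guava; the names TerminatorAwareReader, Line and terminatorsOf are made up for this example. It assumes that only \n, \r and \r\n count as terminators, which matches what BufferedReader.readLine() recognizes.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader that returns each line together with its terminator. */
class TerminatorAwareReader implements AutoCloseable {

    /** Line text plus the terminator that ended it ("" at end of input). */
    static final class Line {
        final String text;
        final String end;

        Line(String text, String end) {
            this.text = text;
            this.end = end;
        }
    }

    private final PushbackReader in;

    TerminatorAwareReader(Reader reader) {
        // A one-character pushback buffer is enough to peek past a '\r'.
        this.in = new PushbackReader(reader, 1);
    }

    /** Returns the next line, or null once the input is exhausted. */
    Line readLine() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return new Line(text.toString(), "\n");
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    return new Line(text.toString(), "\r\n");
                }
                if (next != -1) {
                    in.unread(next); // not part of the terminator; put it back
                }
                return new Line(text.toString(), "\r");
            }
            text.append((char) c);
        }
        // End of input: return trailing text without a terminator, if any.
        return text.length() > 0 ? new Line(text.toString(), "") : null;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience helper for the example: the terminator of every line in the input. */
    static List<String> terminatorsOf(String input) {
        List<String> ends = new ArrayList<>();
        try (TerminatorAwareReader r = new TerminatorAwareReader(new StringReader(input))) {
            Line line;
            while ((line = r.readLine()) != null) {
                ends.add(line.end);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ends;
    }
}
```

When reading from a file, you would probably want to pass in a BufferedReader, since PushbackReader does no buffering of its own.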

The answer is: you can't find out what the line ending was.

I was looking into what counts as a line ending in the same function. After looking at the BufferedReader source code, I can say that BufferedReader.readLine ends a line on '\r' or '\n' and skips a leftover '\n' that directly follows a '\r'. This is hardcoded and does not depend on any settings.
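A small experiment can confirm this behaviour. The sketch below feeds strings with different terminators through readLine(); the class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineTerminatorDemo {

    /** Collects every line that BufferedReader.readLine() returns for the given text. */
    static List<String> readAllLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Three different terminators in one input: "\r", "\n" and "\r\n".
        System.out.println(readAllLines("a\rb\nc\r\nd")); // prints [a, b, c, d]
        // The result is identical when every terminator is "\n".
        System.out.println(readAllLines("a\nb\nc\nd"));   // prints [a, b, c, d]
    }
}
```

Both inputs produce the same list of lines, which is why the terminator cannot be recovered from the result of readLine() alone.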

If you happen to be reading this file into a Swing text component then you can just use the JTextComponent.read(...) method to load the file into the Document. Then you can use:

textComponent.getDocument().getProperty( DefaultEditorKit.EndOfLineStringProperty );

to get the actual EOL string that was used in the file.

Not sure if this is useful, but sometimes I need to find out the line delimiter much later, long after I've already read the file.

In this case I use this code:

/**
* <h1> Identify which line delimiter is used in a string </h1>
*
* This is useful when processing files that were created on different operating systems.
*
* @param str - the string with the mystery line delimiter.
* @return  the line delimiter for windows, {@code \r\n}, <br>
*           unix/linux {@code \n} or legacy mac {@code \r} <br>
*           if none can be identified, it falls back to unix {@code \n}
*/
public static String identifyLineDelimiter(String str) {
    if (str.matches("(?s).*(\\r\\n).*")) {     //Windows //$NON-NLS-1$
        return "\r\n"; //$NON-NLS-1$
    } else if (str.matches("(?s).*(\\n).*")) { //Unix/Linux //$NON-NLS-1$
        return "\n"; //$NON-NLS-1$
    } else if (str.matches("(?s).*(\\r).*")) { //Legacy mac os 9. Newer OS X use \n //$NON-NLS-1$
        return "\r"; //$NON-NLS-1$
    } else {
        return "\n";  //fallback onto '\n' if nothing matches. //$NON-NLS-1$
    }
}

If you are using groovy, you can simply do:

def lineSeparator = new File('path/to/file').text.contains('\r\n') ? '\r\n' : '\n'