Number of lines in a file in Java

抹茶落季 2020-11-22 05:31

I work with huge data files, and sometimes I only need to know the number of lines in them. Usually I open them and read them line by line until I reach the end of the file.

19 answers
  • 2020-11-22 06:03
    /**
     * Count file rows.
     *
     * @param file file
     * @return file row count
     * @throws IOException
     */
    public static long getLineCount(File file) throws IOException {
    
        try (Stream<String> lines = Files.lines(file.toPath())) {
            return lines.count();
        }
    }
    

    Tested on JDK 8u31. However, it is slow compared to this method:

    /**
     * Count file rows.
     *
     * @param file file
     * @return file row count
     * @throws IOException
     */
    public static long getLineCount(File file) throws IOException {
    
        try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
    
            byte[] c = new byte[1024];
            boolean empty = true,
                    lastEmpty = false;
            long count = 0;
            int read;
            while ((read = is.read(c)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (c[i] == '\n') {
                        count++;
                        lastEmpty = true;
                    } else if (lastEmpty) {
                        lastEmpty = false;
                    }
                }
                empty = false;
            }
    
            if (!empty) {
                if (count == 0) {
                    count = 1;
                } else if (!lastEmpty) {
                    count++;
                }
            }
    
            return count;
        }
    }
    

    Tested and very fast.

  • 2020-11-22 06:05

    Optimized code for multi-line files that have no newline ('\n') character at EOF.

    /**
     * Counts the lines in a file, including a last line without a trailing newline.
     *
     * @param filename path of the file to read
     * @return number of lines in the file
     * @throws IOException if the file cannot be opened or read
     */
    public static int countLines(String filename) throws IOException {
        int count = 0;
        boolean empty = true;
        // true while the current line contains at least one non-newline character
        boolean isLine = false;
        // try-with-resources closes the stream; an IOException now reaches the caller
        try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
            byte[] c = new byte[1024];
            int readChars;
            while ((readChars = is.read(c)) != -1) {
                empty = false;
                for (int i = 0; i < readChars; ++i) {
                    if (c[i] == '\n') {
                        isLine = false;
                        ++count;
                    } else if (!isLine && c[i] != '\r') {
                        // Handles a last line that has no newline character at EOF
                        isLine = true;
                    }
                }
            }
        }
        if (isLine) {
            ++count;
        }
        return (count == 0 && !empty) ? 1 : count;
    }
    
  • 2020-11-22 06:08

    I tested the methods above for counting lines. Here are my observations for the different methods, as measured on my system.

    File size: 1.6 GB

    Methods:

    1. Using Scanner: approx. 35 s
    2. Using BufferedReader: approx. 5 s
    3. Using Java 8: approx. 5 s
    4. Using LineNumberReader: approx. 5 s

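    Moreover, the Java 8 approach seems quite handy:

    long lineCount;
    try (Stream<String> lines = Files.lines(Paths.get(filePath), Charset.defaultCharset())) {
        lineCount = lines.count(); // count() returns a long; the try block closes the file
    }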
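
    The BufferedReader and LineNumberReader timings above are given without code. As a rough sketch only, those two approaches might look like the following. The class and method names are hypothetical, and this is not necessarily the code that was timed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class name; an illustration, not the benchmarked code.
final class ReaderLineCounters {

    /** Counts lines by calling readLine() until the end of the file. */
    static long countWithBufferedReader(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, Charset.defaultCharset())) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    /** Counts lines by letting LineNumberReader track the current line number. */
    static long countWithLineNumberReader(Path file) throws IOException {
        try (LineNumberReader reader =
                new LineNumberReader(Files.newBufferedReader(file, Charset.defaultCharset()))) {
            while (reader.readLine() != null) {
                // readLine() advances the line number as a side effect
            }
            return reader.getLineNumber();
        }
    }
}
```

    Both methods count a final line even when it has no trailing newline, and both return 0 for an empty file.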
    
  • 2020-11-22 06:08

    The only way to know how many lines there are in a file is to count them. You can of course derive a metric from your data that gives you the average length of one line, then get the file size and divide it by that average length, but the result won't be accurate.
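
    To make the estimate concrete, here is a minimal sketch that samples the start of the file, computes the average line length there, and divides the file size by it. The class name, method name, and the sampleBytes parameter are assumptions made for illustration. The result is only as good as the sample is representative of the whole file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class name; a sketch of the estimate, not an exact count.
final class LineCountEstimate {

    /**
     * Estimates the line count from the file size and the average line
     * length seen in the first sampleBytes bytes of the file.
     */
    static long estimateLines(Path file, int sampleBytes) throws IOException {
        long fileSize = Files.size(file);
        if (fileSize == 0) {
            return 0;
        }
        byte[] sample = new byte[(int) Math.min(sampleBytes, fileSize)];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() may return fewer bytes than requested, so loop until the sample is full
            while (filled < sample.length
                    && (n = in.read(sample, filled, sample.length - filled)) != -1) {
                filled += n;
            }
        }
        long newlines = 0;
        for (int i = 0; i < filled; i++) {
            if (sample[i] == '\n') {
                newlines++;
            }
        }
        if (newlines == 0) {
            // No line break in the sample: all we can say is "at least one line"
            return 1;
        }
        // Average bytes per line in the sample, including the newline itself
        double avgLineLength = (double) filled / newlines;
        return Math.round(fileSize / avgLineLength);
    }
}
```

    A call such as LineCountEstimate.estimateLines(Paths.get("data.txt"), 1 << 20) would sample the first mebibyte. Files whose line lengths vary a lot between the beginning and the rest will give a poor estimate.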

  • 2020-11-22 06:09

    How about using the Process class from within Java code? And then reading the output of the command.

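    // Assumes a Unix-like system where the wc command is available
    Process p = new ProcessBuilder("wc", "-l", yourfilename).start();

    int lineCount = 0;
    try (BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
        String line = b.readLine();
        if (line != null) {
            System.out.println(line);
            // wc prints the count followed by the file name, so parse only the first token
            lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
        }
    }
    p.waitFor();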
    

    I still need to try it, though. I will post the results.

  • 2020-11-22 06:10

    I concluded that the way wc -l counts newlines is fine, but it returns non-intuitive results on files where the last line doesn't end with a newline.

    And @er.vikas's solution, which is based on LineNumberReader but adds one to the line count, returns non-intuitive results on files where the last line does end with a newline.

    I therefore wrote an algorithm that behaves as follows:

    @Test
    public void empty() throws IOException {
        assertEquals(0, count(""));
    }
    
    @Test
    public void singleNewline() throws IOException {
        assertEquals(1, count("\n"));
    }
    
    @Test
    public void dataWithoutNewline() throws IOException {
        assertEquals(1, count("one"));
    }
    
    @Test
    public void oneCompleteLine() throws IOException {
        assertEquals(1, count("one\n"));
    }
    
    @Test
    public void twoCompleteLines() throws IOException {
        assertEquals(2, count("one\ntwo\n"));
    }
    
    @Test
    public void twoLinesWithoutNewlineAtEnd() throws IOException {
        assertEquals(2, count("one\ntwo"));
    }
    
    @Test
    public void aFewLines() throws IOException {
        assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
    }
    

    And it looks like this:

    static long countLines(InputStream is) throws IOException {
        try(LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
            char[] buf = new char[8192];
            int n, previousN = -1;
            // read() returns at least one char, so no extra buffering is needed
            while((n = lnr.read(buf)) != -1) {
                previousN = n;
            }
            int ln = lnr.getLineNumber();
            if (previousN == -1) {
                //No data read at all, i.e file was empty
                return 0;
            } else {
                char lastChar = buf[previousN - 1];
                if (lastChar == '\n' || lastChar == '\r') {
                    // Ends with a newline: the last line is already counted, so don't add one
                    return ln;
                }
            }
            // Last line has no trailing newline: return line number + 1
            return ln + 1;
        }
    }
    

    If you want intuitive results, you may use this. If you just want wc -l compatibility, simply use @er.vikas's solution, but don't add one to the result, and retry the skip:

    try(LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
        while(lnr.skip(Long.MAX_VALUE) > 0){};
        return lnr.getLineNumber();
    }
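
    The tests earlier in this answer call a count(String) helper that is not shown. A plausible version, given here as an assumption rather than the original helper, feeds the string's bytes to countLines through a ByteArrayInputStream. The class name is hypothetical, and countLines is repeated (with an explicit UTF-8 charset, also an assumption) only so that the sketch compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; illustrates how the tests could drive countLines.
final class LineCountHelper {

    // Same logic as the countLines method in this answer, repeated for self-containment
    static long countLines(InputStream is) throws IOException {
        try (LineNumberReader lnr =
                new LineNumberReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n, previousN = -1;
            while ((n = lnr.read(buf)) != -1) {
                previousN = n;
            }
            int ln = lnr.getLineNumber();
            if (previousN == -1) {
                // Nothing was read, so the input was empty
                return 0;
            }
            char lastChar = buf[previousN - 1];
            // A trailing newline means the last line is already counted
            return (lastChar == '\n' || lastChar == '\r') ? ln : ln + 1;
        }
    }

    // Assumed shape of the helper used by the tests: wraps the text in a stream
    static long count(String text) throws IOException {
        return countLines(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

    With this helper in place, an assertion such as assertEquals(2, count("one\ntwo")) exercises the same code path as reading a file with those contents.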
    