Java : Read last n lines of a HUGE file

穿精又带淫゛_ 提交于 2019-12-17 04:28:50

问题


I want to read the last n lines of a very big file without reading the whole file into any buffer/memory area using Java.

I looked around the JDK APIs and Apache Commons I/O and am not able to locate one which is suitable for this purpose.

I was thinking of the way tail or less does it in UNIX. I don't think they load the entire file and then show the last few lines of the file. There should be similar way to do the same in Java too.


回答1:


If you use a RandomAccessFile, you can use length and seek to get to a specific point near the end of the file and then read forward from there.

If you find there weren't enough lines, back up from that point and try again. Once you've figured out where the Nth last line begins, you can seek to there and just read-and-print.

An initial best-guess assumption can be made based on your data properties. For example, if it's a text file, it's possible the line lengths won't exceed an average of 132 so, to get the last five lines, start 660 characters before the end. Then, if you were wrong, try again at 1320 (you can even use what you learned from the last 660 characters to adjust that - example: if those 660 characters were just three lines, the next try could be 660 / 3 * 5, plus maybe a bit extra just in case).




回答2:


I found it the simplest way to do by using ReversedLinesFileReader from apache commons-io api. This method will give you the line from bottom to top of a file and you can specify n_lines value to specify the number of line.

import org.apache.commons.io.input.ReversedLinesFileReader;


File file = new File("D:\\file_name.xml");
int n_lines = 10;
int counter = 0; 
ReversedLinesFileReader object = new ReversedLinesFileReader(file);
while(counter < n_lines) {
    System.out.println(object.readLine());
    counter++;
}



回答3:


RandomAccessFile is a good place to start, as described by the other answers. There is one important caveat though.

If your file is not encoded with an one-byte-per-character encoding, the readLine() method is not going to work for you. And readUTF() won't work in any circumstances. (It reads a string preceded by a character count ...)

Instead, you will need to make sure that you look for end-of-line markers in a way that respects the encoding's character boundaries. For fixed length encodings (e.g. flavors of UTF-16 or UTF-32) you need to extract characters starting from byte positions that are divisible by the character size in bytes. For variable length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.

In the case of UTF-8, the first byte of a character will be 0xxxxxxx or 110xxxxx or 1110xxxx or 11110xxx. Anything else is either a second / third byte, or an illegal UTF-8 sequence. See The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7. This means, as the comment discussion points out, that any 0x0A and 0x0D bytes in a properly encoded UTF-8 stream will represent a LF or CR character. Thus, simply counting the 0x0A and 0x0D bytes is a valid implementation strategy (for UTF-8) if we can assume that the other kinds of Unicode line separator (0x2028, 0x2029 and 0x0085) are not used. You can't assume that, then the code would be more complicated.

Having identified a proper character boundary, you can then just call new String(...) passing the byte array, offset, count and encoding, and then repeatedly call String.lastIndexOf(...) to count end-of-lines.




回答4:


I found RandomAccessFile and other Buffer Reader classes too slow for me. Nothing can be faster than a tail -<#lines>. So this it was the best solution for me.

public String getLastNLogLines(File file, int nLines) {
    StringBuilder s = new StringBuilder();
    try {
        Process p = Runtime.getRuntime().exec("tail -"+nLines+" "+file);
        java.io.BufferedReader input = new java.io.BufferedReader(new java.io.InputStreamReader(p.getInputStream()));
        String line = null;
    //Here we first read the next line into the variable
    //line and then check for the EOF condition, which
    //is the return value of null
    while((line = input.readLine()) != null){
            s.append(line+'\n');
        }
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }
    return s.toString();
}



回答5:


CircularFifoBuffer from apache commons . answer from a similar question at How to read last 5 lines of a .txt file into java

Note that in Apache Commons Collections 4 this class seems to have been renamed to CircularFifoQueue




回答6:


A RandomAccessFile allows for seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length method will return the size of the file. The problem is determining number of lines. For this, you can seek to the end of the file and read backwards until you have hit the right number of lines.




回答7:


I had similar problem, but I don't understood to another solutions.

I used this. I hope thats simple code.

// String filePathName = (direction and file name).
File f = new File(filePathName);
long fileLength = f.length(); // Take size of file [bites].
long fileLength_toRead = 0;
if (fileLength > 2000) {
    // My file content is a table, I know one row has about e.g. 100 bites / characters. 
    // I used 1000 bites before file end to point where start read.
    // If you don't know line length, use @paxdiablo advice.
    fileLength_toRead = fileLength - 1000;
}
try (RandomAccessFile raf = new RandomAccessFile(filePathName, "r")) { // This row manage open and close file.
    raf.seek(fileLength_toRead); // File will begin read at this bite. 
    String rowInFile = raf.readLine(); // First readed line usualy is not whole, I needn't it.
    rowInFile = raf.readLine();
    while (rowInFile != null) {
        // Here I can readed lines (rowInFile) add to String[] array or ArriyList<String>.
        // Later I can work with rows from array - last row is sometimes empty, etc.
        rowInFile = raf.readLine();
    }
}
catch (IOException e) {
    //
}



回答8:


Here is the working for this.

    private static void printLastNLines(String filePath, int n) {
    File file = new File(filePath);
    StringBuilder builder = new StringBuilder();
    try {
        RandomAccessFile randomAccessFile = new RandomAccessFile(filePath, "r");
        long pos = file.length() - 1;
        randomAccessFile.seek(pos);

        for (long i = pos - 1; i >= 0; i--) {
            randomAccessFile.seek(i);
            char c = (char) randomAccessFile.read();
            if (c == '\n') {
                n--;
                if (n == 0) {
                    break;
                }
            }
            builder.append(c);
        }
        builder.reverse();
        System.out.println(builder.toString());
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}



回答9:


The ReversedLinesFileReader can be found in the Apache Commons IO java library.

    int n_lines = 1000;
    ReversedLinesFileReader object = new ReversedLinesFileReader(new File(path));
    String result="";
    for(int i=0;i<n_lines;i++){
        String line=object.readLine();
        if(line==null)
            break;
        result+=line;
    }
    return result;



回答10:


package com.uday;

import java.io.File;
import java.io.RandomAccessFile;

public class TailN {
    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();

        TailN tailN = new TailN();
        File file = new File("/Users/udakkuma/Documents/workspace/uday_cancel_feature/TestOOPS/src/file.txt");
        tailN.readFromLast(file);

        System.out.println("Execution Time : " + (System.currentTimeMillis() - startTime));

    }

    public void readFromLast(File file) throws Exception {
        int lines = 3;
        int readLines = 0;
        StringBuilder builder = new StringBuilder();
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
            long fileLength = file.length() - 1;
            // Set the pointer at the last of the file
            randomAccessFile.seek(fileLength);

            for (long pointer = fileLength; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                char c;
                // read from the last, one char at the time
                c = (char) randomAccessFile.read();
                // break when end of the line
                if (c == '\n') {
                    readLines++;
                    if (readLines == lines)
                        break;
                }
                builder.append(c);
                fileLength = fileLength - pointer;
            }
            // Since line is read from the last so it is in reverse order. Use reverse
            // method to make it correct order
            builder.reverse();
            System.out.println(builder.toString());
        }

    }
}



回答11:


Here is the best way I've found to do it. Simple and pretty fast and memory efficient.

public static void tail(File src, OutputStream out, int maxLines) throws FileNotFoundException, IOException {
    BufferedReader reader = new BufferedReader(new FileReader(src));
    String[] lines = new String[maxLines];
    int lastNdx = 0;
    for (String line=reader.readLine(); line != null; line=reader.readLine()) {
        if (lastNdx == lines.length) {
            lastNdx = 0;
        }
        lines[lastNdx++] = line;
    }

    OutputStreamWriter writer = new OutputStreamWriter(out);
    for (int ndx=lastNdx; ndx != lastNdx-1; ndx++) {
        if (ndx == lines.length) {
            ndx = 0;
        }
        writer.write(lines[ndx]);
        writer.write("\n");
    }

    writer.flush();
}


来源:https://stackoverflow.com/questions/4121678/java-read-last-n-lines-of-a-huge-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!