Get last n lines of a file, similar to tail

前端 未结 30 2520
挽巷
挽巷 2020-11-22 03:46

I\'m writing a log file viewer for a web application and for that I want to paginate through the lines of the log file. The items in the file are line based with the newest

相关标签:
30条回答
  • 2020-11-22 03:58

    An even cleaner python3 compatible version that doesn't insert but appends & reverses:

    def tail(f, window=1):
        """
        Returns the last `window` lines of file `f` as a list of bytes.
        """
        if window == 0:
            return b''
        BUFSIZE = 1024
        f.seek(0, 2)
        end = f.tell()
        nlines = window + 1
        data = []
        while nlines > 0 and end > 0:
            i = max(0, end - BUFSIZE)
            nread = min(end, BUFSIZE)
    
            f.seek(i)
            chunk = f.read(nread)
            data.append(chunk)
            nlines -= chunk.count(b'\n')
            end -= nread
        return b'\n'.join(b''.join(reversed(data)).splitlines()[-window:])
    

    use it like this:

    with open(path, 'rb') as f:
        last_lines = tail(f, 3).decode('utf-8')
    
    0 讨论(0)
  • 2020-11-22 03:59

    This may be quicker than yours. Makes no assumptions about line length. Backs through the file one block at a time till it's found the right number of '\n' characters.

    def tail( f, lines=20 ):
        total_lines_wanted = lines
    
        BLOCK_SIZE = 1024
        f.seek(0, 2)
        block_end_byte = f.tell()
        lines_to_go = total_lines_wanted
        block_number = -1
        blocks = [] # blocks of size BLOCK_SIZE, in reverse order starting
                    # from the end of the file
        while lines_to_go > 0 and block_end_byte > 0:
            if (block_end_byte - BLOCK_SIZE > 0):
                # read the last block we haven't yet read
                f.seek(block_number*BLOCK_SIZE, 2)
                blocks.append(f.read(BLOCK_SIZE))
            else:
                # file too small, start from begining
                f.seek(0,0)
                # only read what was not read
                blocks.append(f.read(block_end_byte))
            lines_found = blocks[-1].count('\n')
            lines_to_go -= lines_found
            block_end_byte -= BLOCK_SIZE
            block_number -= 1
        all_read_text = ''.join(reversed(blocks))
        return '\n'.join(all_read_text.splitlines()[-total_lines_wanted:])
    

    I don't like tricky assumptions about line length when -- as a practical matter -- you can never know things like that.

    Generally, this will locate the last 20 lines on the first or second pass through the loop. If your 74 character thing is actually accurate, you make the block size 2048 and you'll tail 20 lines almost immediately.

    Also, I don't burn a lot of brain calories trying to finesse alignment with physical OS blocks. Using these high-level I/O packages, I doubt you'll see any performance consequence of trying to align on OS block boundaries. If you use lower-level I/O, then you might see a speedup.


    UPDATE

    for Python 3.2 and up, follow the process on bytes as In text files (those opened without a "b" in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)).:

    eg: f = open('C:/.../../apache_logs.txt', 'rb')

     def tail(f, lines=20):
        total_lines_wanted = lines
    
        BLOCK_SIZE = 1024
        f.seek(0, 2)
        block_end_byte = f.tell()
        lines_to_go = total_lines_wanted
        block_number = -1
        blocks = []
        while lines_to_go > 0 and block_end_byte > 0:
            if (block_end_byte - BLOCK_SIZE > 0):
                f.seek(block_number*BLOCK_SIZE, 2)
                blocks.append(f.read(BLOCK_SIZE))
            else:
                f.seek(0,0)
                blocks.append(f.read(block_end_byte))
            lines_found = blocks[-1].count(b'\n')
            lines_to_go -= lines_found
            block_end_byte -= BLOCK_SIZE
            block_number -= 1
        all_read_text = b''.join(reversed(blocks))
        return b'\n'.join(all_read_text.splitlines()[-total_lines_wanted:])
    
    0 讨论(0)
  • 2020-11-22 04:00

    S.Lott's answer above almost works for me but ends up giving me partial lines. It turns out that it corrupts data on block boundaries because data holds the read blocks in reversed order. When ''.join(data) is called, the blocks are in the wrong order. This fixes that.

    def tail(f, window=20):
        """
        Returns the last `window` lines of file `f` as a list.
        f - a byte file-like object
        """
        if window == 0:
            return []
        BUFSIZ = 1024
        f.seek(0, 2)
        bytes = f.tell()
        size = window + 1
        block = -1
        data = []
        while size > 0 and bytes > 0:
            if bytes - BUFSIZ > 0:
                # Seek back one whole BUFSIZ
                f.seek(block * BUFSIZ, 2)
                # read BUFFER
                data.insert(0, f.read(BUFSIZ))
            else:
                # file too small, start from begining
                f.seek(0,0)
                # only read what was not read
                data.insert(0, f.read(bytes))
            linesFound = data[0].count('\n')
            size -= linesFound
            bytes -= BUFSIZ
            block -= 1
        return ''.join(data).splitlines()[-window:]
    
    0 讨论(0)
  • 2020-11-22 04:00

    The code I ended up using. I think this is the best so far:

    def tail(f, n, offset=None):
        """Reads a n lines from f with an offset of offset lines.  The return
        value is a tuple in the form ``(lines, has_more)`` where `has_more` is
        an indicator that is `True` if there are more lines in the file.
        """
        avg_line_length = 74
        to_read = n + (offset or 0)
    
        while 1:
            try:
                f.seek(-(avg_line_length * to_read), 2)
            except IOError:
                # woops.  apparently file is smaller than what we want
                # to step back, go to the beginning instead
                f.seek(0)
            pos = f.tell()
            lines = f.read().splitlines()
            if len(lines) >= to_read or pos == 0:
                return lines[-to_read:offset and -offset or None], \
                       len(lines) > to_read or pos > 0
            avg_line_length *= 1.3
    
    0 讨论(0)
  • 2020-11-22 04:02

    Simple and fast solution with mmap:

    import mmap
    import os
    
    def tail(filename, n):
        """Returns last n lines from the filename. No exception handling"""
        size = os.path.getsize(filename)
        with open(filename, "rb") as f:
            # for Windows the mmap parameters are different
            fm = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
            try:
                for i in xrange(size - 1, -1, -1):
                    if fm[i] == '\n':
                        n -= 1
                        if n == -1:
                            break
                return fm[i + 1 if i else 0:].splitlines()
            finally:
                fm.close()
    
    0 讨论(0)
  • 2020-11-22 04:02

    I found the Popen above to be the best solution. It's quick and dirty and it works For python 2.6 on Unix machine i used the following

    def GetLastNLines(self, n, fileName):
        """
        Name:           Get LastNLines
        Description:        Gets last n lines using Unix tail
        Output:         returns last n lines of a file
        Keyword argument:
        n -- number of last lines to return
        filename -- Name of the file you need to tail into
        """
        p = subprocess.Popen(['tail','-n',str(n),self.__fileName], stdout=subprocess.PIPE)
        soutput, sinput = p.communicate()
        return soutput
    

    soutput will have will contain last n lines of the code. to iterate through soutput line by line do:

    for line in GetLastNLines(50,'myfile.log').split('\n'):
        print line
    
    0 讨论(0)
提交回复
热议问题