Get last n lines of a file, similar to tail

挽巷 2020-11-22 03:46

I'm writing a log file viewer for a web application and for that I want to paginate through the lines of the log file. The items in the file are line based with the newest

30 Answers
  • 2020-11-22 04:19

    Based on Eyecue's answer (Jun 10 '10 at 21:28): this class adds head() and tail() methods to the file object. It targets Python 2, since it subclasses the built-in file type and uses xrange.

    class File(file):
        def head(self, lines_2find=1):
            self.seek(0)                            #Rewind file
            return [self.next() for x in xrange(lines_2find)]
    
        def tail(self, lines_2find=1):  
            self.seek(0, 2)                         #go to end of file
            bytes_in_file = self.tell()             
            lines_found, total_bytes_scanned = 0, 0
            while (lines_2find+1 > lines_found and
                   bytes_in_file > total_bytes_scanned): 
                byte_block = min(1024, bytes_in_file-total_bytes_scanned)
                self.seek(-(byte_block+total_bytes_scanned), 2)
                total_bytes_scanned += byte_block
                lines_found += self.read(byte_block).count('\n')  # count only the newly scanned block
            self.seek(-total_bytes_scanned, 2)
            line_list = list(self.readlines())
            return line_list[-lines_2find:]
    

    Usage:

    f = File('path/to/file', 'r')
    f.head(3)
    f.tail(3)
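
    The class above depends on Python 2 built-ins. As a rough Python 3 sketch of the same head/tail idea, the standalone functions below take any seekable binary file object. The names head, tail, and block_size are illustrative choices here, not part of the original answer.

```python
import io
import itertools
import os


def head(f, n=1):
    """Return the first n lines of a seekable binary file object."""
    f.seek(0)
    return list(itertools.islice(f, n))


def tail(f, n=1, block_size=1024):
    """Return the last n lines of a seekable binary file object."""
    if n <= 0:
        return []
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    newlines = 0
    # Walk backwards block by block until more than n newlines were seen,
    # so the first of the last n lines is guaranteed to be complete.
    while pos > 0 and newlines <= n:
        step = min(block_size, pos)
        pos -= step
        f.seek(pos)
        newlines += f.read(step).count(b"\n")
    f.seek(pos)
    return f.readlines()[-n:]


# Demo on an in-memory file; a file opened with open(path, "rb") works the same way.
demo = io.BytesIO(b"one\ntwo\nthree\nfour\n")
first_two = head(demo, 2)
last_two = tail(demo, 2)
print(first_two, last_two)
```

    Lines come back as bytes; decode them if text is needed.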
    
  • 2020-11-22 04:20

    If reading the whole file is acceptable then use a deque.

    from collections import deque
    deque(f, maxlen=n)
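
    A minimal sketch of how that one-liner might be wrapped and called. The helper name tail_deque and the in-memory sample are assumptions for illustration; any iterable of lines, such as an open file, can be passed instead.

```python
from collections import deque
import io


def tail_deque(lines, n):
    """Return the last n items of an iterable of lines.

    The whole iterable is consumed once, but only n lines are kept in memory.
    """
    return list(deque(lines, maxlen=n))


log = io.StringIO("a\nb\nc\nd\n")
last_two = tail_deque(log, 2)
print(last_two)
```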
    

    Prior to 2.6, deques didn't have a maxlen option, but it's easy enough to implement.

    import itertools
    def maxque(items, size):
        items = iter(items)
        q = deque(itertools.islice(items, size))
        for item in items:
            del q[0]
            q.append(item)
        return q
    

    If the file must be read from the end, use a galloping (a.k.a. exponential) search: seek back from the end by an offset that doubles on each pass until enough lines have been read.

    def tail(f, n):
        assert n >= 0
        pos, lines = n+1, []
        while len(lines) <= n:
            try:
                f.seek(-pos, 2)
            except IOError:
                f.seek(0)
                break
            finally:
                lines = list(f)
            pos *= 2
        return lines[-n:] if n else []  # lines[-0:] would return every line
    
  • 2020-11-22 04:22

    Posting an answer at the behest of commenters on my answer to a similar question where the same technique was used to mutate the last line of a file, not just get it.

    For a file of significant size, mmap is the best way to do this. To improve on the existing mmap answer, this version is portable between Windows and Linux and should run faster. It won't work without some modifications on 32-bit Python with files in the GB range; see the other answer for hints on handling that, and on modifying it to work on Python 2.

    import io  # Gets consistent version of open for both Py2.7 and Py3.x
    import itertools
    import mmap
    
    def skip_back_lines(mm, numlines, startidx):
        '''Factored out to simplify handling of n and offset'''
        for _ in itertools.repeat(None, numlines):
            startidx = mm.rfind(b'\n', 0, startidx)
            if startidx < 0:
                break
        return startidx
    
    def tail(f, n, offset=0):
        # Reopen file in binary mode
        with io.open(f.name, 'rb') as binf, mmap.mmap(binf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # len(mm) - 1 handles files ending w/newline by getting the prior line
            startofline = skip_back_lines(mm, offset, len(mm) - 1)
            if startofline < 0:
                return []  # Offset lines consumed whole file, nothing to return
                # If using a generator function (yield-ing, see below),
                # this should be a plain return, no empty list
    
            endoflines = startofline + 1  # Slice end to omit offset lines
    
            # Find start of lines to capture (add 1 to move from newline to beginning of following line)
            startofline = skip_back_lines(mm, n, startofline) + 1
    
            # Passing True to splitlines makes it return the list of lines without
            # removing the trailing newline (if any), so list mimics f.readlines()
            return mm[startofline:endoflines].splitlines(True)
            # If Windows style \r\n newlines need to be normalized to \n, and input
            # is ASCII compatible, can normalize newlines with:
            # return mm[startofline:endoflines].replace(os.linesep.encode('ascii'), b'\n').splitlines(True)
    

    This assumes the number of lines tailed is small enough you can safely read them all into memory at once; you could also make this a generator function and manually read a line at a time by replacing the final line with:

            mm.seek(startofline)
            # Call mm.readline n times, or until EOF, whichever comes first
            # Python 3.2 and earlier:
            for line in itertools.islice(iter(mm.readline, b''), n):
                yield line
    
            # 3.3+:
            yield from itertools.islice(iter(mm.readline, b''), n)
    

    Lastly, this reads in binary mode (necessary to use mmap), so it gives str lines (Py2) and bytes lines (Py3); if you want unicode (Py2) or str (Py3), the iterative approach could be tweaked to decode for you and/or fix newlines:

            lines = itertools.islice(iter(mm.readline, b''), n)
            if f.encoding:  # Decode if the passed file was opened with a specific encoding
                lines = (line.decode(f.encoding) for line in lines)
            if 'b' not in f.mode:  # Fix line breaks if passed file opened in text mode (needs import os)
                lines = (line.replace(os.linesep, '\n') for line in lines)
            # Python 3.2 and earlier:
            for line in lines:
                yield line
            # 3.3+:
            yield from lines
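
    Since the fragments above were posted untested, here is one way they might fit together as a single Python 3 generator. This is a sketch under assumptions: the name tail_lines, taking a path instead of a file object, the fixed encoding parameter, and dropping the offset argument are simplifications made here, and newline normalization is left out.

```python
import itertools
import mmap
import os
import tempfile


def tail_lines(path, n, encoding="utf-8"):
    """Yield the last n lines of the file at `path`, decoded to str."""
    if n <= 0 or os.path.getsize(path) == 0:
        return  # nothing to yield; mmap also cannot map an empty file
    with open(path, "rb") as binf, \
            mmap.mmap(binf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Start before the final byte so a trailing newline is not counted.
        idx = len(mm) - 1
        for _ in range(n):
            idx = mm.rfind(b"\n", 0, idx)
            if idx < 0:
                break  # fewer than n lines: start from the beginning
        mm.seek(idx + 1)  # idx is -1 when the whole file is wanted
        for line in itertools.islice(iter(mm.readline, b""), n):
            yield line.decode(encoding)


# Demo on a temporary file that is removed afterwards.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\n")
try:
    last_two = list(tail_lines(tmp.name, 2))
    everything = list(tail_lines(tmp.name, 10))
finally:
    os.remove(tmp.name)
print(last_two)
```

    The generator is consumed with list() before the file is removed, because the mapping stays open until iteration finishes.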
    

    Note: I typed this all up on a machine where I lack access to Python to test. Please let me know if I typoed anything; this was similar enough to my other answer that I think it should work, but the tweaks (e.g. handling an offset) could lead to subtle errors. Please let me know in the comments if there are any mistakes.

  • 2020-11-22 04:22

    Several of these solutions have issues if the file doesn't end in \n or in ensuring the complete first line is read.

    def tail(file, n=1, bs=1024):
        # Binary mode: Python 3 forbids end-relative seeks on text files,
        # so the returned lines are bytes there (str on Python 2).
        with open(file, 'rb') as f:
            f.seek(-1, 2)
            l = 1 - f.read(1).count(b'\n')  # If file doesn't end in \n, count it anyway.
            B = f.tell()
            while n >= l and B > 0:
                block = min(bs, B)
                B -= block
                f.seek(B, 0)
                l += f.read(block).count(b'\n')
            f.seek(B, 0)
            l = min(l, n)  # discard first (incomplete) line if l > n
            return f.readlines()[-l:]
    
  • 2020-11-22 04:23

    There are some existing implementations of tail on PyPI which you can install using pip:

    • mtFileUtil
    • multitail
    • log4tailer
    • ...

    Depending on your situation, there may be advantages to using one of these existing tools.

  • 2020-11-22 04:25

    An update of @papercrane's solution for Python 3. Open the file with open(filename, 'rb') and:

    def tail(f, window=20):
        """Returns the last `window` lines of file `f` as a list.
        """
        if window == 0:
            return []
    
        BUFSIZ = 1024
        f.seek(0, 2)
        remaining_bytes = f.tell()
        size = window + 1
        block = -1
        data = []
    
        while size > 0 and remaining_bytes > 0:
            if remaining_bytes - BUFSIZ > 0:
                # Seek back one whole BUFSIZ
                f.seek(block * BUFSIZ, 2)
                # read BUFFER
                bunch = f.read(BUFSIZ)
            else:
                # file too small, start from beginning
                f.seek(0, 0)
                # only read what was not read
                bunch = f.read(remaining_bytes)
    
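            data.insert(0, bunch)
            size -= bunch.count(b'\n')
            remaining_bytes -= BUFSIZ
            block -= 1
    
        # Decode once after joining, so a multi-byte UTF-8 character split
        # across two blocks is not decoded in pieces. errors='replace' covers
        # a first block that starts mid-character; that partial line is
        # dropped by the slice below.
        return b''.join(data).decode('utf-8', errors='replace').splitlines()[-window:]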
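
    A point worth keeping in mind with block-wise reads of UTF-8 data: a block boundary can fall in the middle of a multi-byte character, in which case decoding that block on its own fails, while decoding the joined bytes once does not. The short demonstration below uses a made-up sample string to show the effect.

```python
data = "naïve café\n".encode("utf-8")

# Cut in the middle of the two-byte sequence that encodes "ï".
cut = data.index("ï".encode("utf-8")) + 1
left, right = data[:cut], data[cut:]

try:
    left.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    # The first half ends with an incomplete character.
    split_ok = False

# Joining the raw bytes first and decoding once works.
joined = (left + right).decode("utf-8")
print(split_ok, joined == "naïve café\n")
```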
    