Get last n lines of a file, similar to tail

前端未结

关注

 30  2524

I\'m writing a log file viewer for a web application and for that I want to paginate through the lines of the log file. The items in the file are line based with the newest

相关标签:

30条回答

时光取名叫无心

2020-11-22 04:19

Based on Eyecue answer (Jun 10 '10 at 21:28): this class add head() and tail() method to file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)

0 讨论(0)

野趣味

2020-11-22 04:20

If reading the whole file is acceptable then use a deque.

from collections import deque
deque(f, maxlen=n)

Prior to 2.6, deques didn't have a maxlen option, but it's easy enough to implement.

import itertools
def maxque(items, size):
    items = iter(items)
    q = deque(itertools.islice(items, size))
    for item in items:
        del q[0]
        q.append(item)
    return q

If it's a requirement to read the file from the end, then use a gallop (a.k.a exponential) search.

def tail(f, n):
    assert n >= 0
    pos, lines = n+1, []
    while len(lines) <= n:
        try:
            f.seek(-pos, 2)
        except IOError:
            f.seek(0)
            break
        finally:
            lines = list(f)
        pos *= 2
    return lines[-n:]

0 讨论(0)

梦如初夏

2020-11-22 04:22

Posting an answer at the behest of commenters on my answer to a similar question where the same technique was used to mutate the last line of a file, not just get it.

For a file of significant size, mmap is the best way to do this. To improve on the existing mmap answer, this version is portable between Windows and Linux, and should run faster (though it won't work without some modifications on 32 bit Python with files in the GB range, see the other answer for hints on handling this, and for modifying to work on Python 2).

import io  # Gets consistent version of open for both Py2.7 and Py3.x
import itertools
import mmap

def skip_back_lines(mm, numlines, startidx):
    '''Factored out to simplify handling of n and offset'''
    for _ in itertools.repeat(None, numlines):
        startidx = mm.rfind(b'\n', 0, startidx)
        if startidx < 0:
            break
    return startidx

def tail(f, n, offset=0):
    # Reopen file in binary mode
    with io.open(f.name, 'rb') as binf, mmap.mmap(binf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # len(mm) - 1 handles files ending w/newline by getting the prior line
        startofline = skip_back_lines(mm, offset, len(mm) - 1)
        if startofline < 0:
            return []  # Offset lines consumed whole file, nothing to return
            # If using a generator function (yield-ing, see below),
            # this should be a plain return, no empty list

        endoflines = startofline + 1  # Slice end to omit offset lines

        # Find start of lines to capture (add 1 to move from newline to beginning of following line)
        startofline = skip_back_lines(mm, n, startofline) + 1

        # Passing True to splitlines makes it return the list of lines without
        # removing the trailing newline (if any), so list mimics f.readlines()
        return mm[startofline:endoflines].splitlines(True)
        # If Windows style \r\n newlines need to be normalized to \n, and input
        # is ASCII compatible, can normalize newlines with:
        # return mm[startofline:endoflines].replace(os.linesep.encode('ascii'), b'\n').splitlines(True)

This assumes the number of lines tailed is small enough you can safely read them all into memory at once; you could also make this a generator function and manually read a line at a time by replacing the final line with:

        mm.seek(startofline)
        # Call mm.readline n times, or until EOF, whichever comes first
        # Python 3.2 and earlier:
        for line in itertools.islice(iter(mm.readline, b''), n):
            yield line

        # 3.3+:
        yield from itertools.islice(iter(mm.readline, b''), n)

Lastly, this read in binary mode (necessary to use mmap) so it gives str lines (Py2) and bytes lines (Py3); if you want unicode (Py2) or str (Py3), the iterative approach could be tweaked to decode for you and/or fix newlines:

        lines = itertools.islice(iter(mm.readline, b''), n)
        if f.encoding:  # Decode if the passed file was opened with a specific encoding
            lines = (line.decode(f.encoding) for line in lines)
        if 'b' not in f.mode:  # Fix line breaks if passed file opened in text mode
            lines = (line.replace(os.linesep, '\n') for line in lines)
        # Python 3.2 and earlier:
        for line in lines:
            yield line
        # 3.3+:
        yield from lines

Note: I typed this all up on a machine where I lack access to Python to test. Please let me know if I typoed anything; this was similar enough to my other answer that I think it should work, but the tweaks (e.g. handling an offset) could lead to subtle errors. Please let me know in the comments if there are any mistakes.

0 讨论(0)

走了就别回头了

2020-11-22 04:22

Several of these solutions have issues if the file doesn't end in \n or in ensuring the complete first line is read.

def tail(file, n=1, bs=1024):
    f = open(file)
    f.seek(-1,2)
    l = 1-f.read(1).count('\n') # If file doesn't end in \n, count it anyway.
    B = f.tell()
    while n >= l and B > 0:
            block = min(bs, B)
            B -= block
            f.seek(B, 0)
            l += f.read(block).count('\n')
    f.seek(B, 0)
    l = min(l,n) # discard first (incomplete) line if l > n
    lines = f.readlines()[-l:]
    f.close()
    return lines

0 讨论(0)

予麋鹿

2020-11-22 04:23
There are some existing implementations of tail on pypi which you can install using pip:
- mtFileUtil
- multitail
- log4tailer
- ...
Depending on your situation, there may be advantages to using one of these existing tools.
0 讨论(0)
发布评论:

提交评论
- 加载中...

日久生厌

2020-11-22 04:25

Update @papercrane solution to python3. Open the file with open(filename, 'rb') and:

def tail(f, window=20):
    """Returns the last `window` lines of file `f` as a list.
    """
    if window == 0:
        return []

    BUFSIZ = 1024
    f.seek(0, 2)
    remaining_bytes = f.tell()
    size = window + 1
    block = -1
    data = []

    while size > 0 and remaining_bytes > 0:
        if remaining_bytes - BUFSIZ > 0:
            # Seek back one whole BUFSIZ
            f.seek(block * BUFSIZ, 2)
            # read BUFFER
            bunch = f.read(BUFSIZ)
        else:
            # file too small, start from beginning
            f.seek(0, 0)
            # only read what was not read
            bunch = f.read(remaining_bytes)

        bunch = bunch.decode('utf-8')
        data.insert(0, bunch)
        size -= bunch.count('\n')
        remaining_bytes -= BUFSIZ
        block -= 1

    return ''.join(data).splitlines()[-window:]

0 讨论(0)

上一页 1 2 3 4 5