How to read lines from a mmapped file?

孤独总比滥情好 2020-12-24 02:48

It seems that the mmap interface only supports readline(). If I try to iterate over the object, I get single characters instead of complete lines.

What would be the "python

4 Answers
  • 2020-12-24 03:10

    The most concise way to iterate over the lines of an mmap is

    import mmap

    with open(STAT_FILE, "r+b") as f:
        map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
        for line in iter(map_file.readline, b""):
            pass  # process each line (a bytes object) here
    

    Note that in Python 3 the sentinel parameter of iter() must be of type bytes, while in Python 2 it needs to be a str (i.e. "" instead of b"").
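
    As a rough Python 3 sketch of the same idea, the loop can be wrapped in a generator. The function name iter_mmap_lines and the temporary demo file are hypothetical, introduced only for illustration. It also swaps prot=mmap.PROT_READ for access=mmap.ACCESS_READ, which is accepted on both Unix and Windows and works with a file opened read-only.

```python
import mmap
import os
import tempfile


def iter_mmap_lines(path):
    """Yield each line of the file at `path` as bytes, newline included."""
    with open(path, "rb") as f:
        # access=ACCESS_READ is accepted on both Unix and Windows,
        # unlike prot=, which is Unix-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # readline() returns b"" at end of file, which stops iter().
            for line in iter(m.readline, b""):
                yield line


# Demo on a throwaway file (hypothetical content).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\n\nlast without newline")
    demo_path = tmp.name

demo_lines = list(iter_mmap_lines(demo_path))
os.remove(demo_path)
```

    Note that mapping an empty file with length 0 raises an error, so a real caller may want to check the file size first.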

  • 2020-12-24 03:16

    I modified your example like this:

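    import mmap

    # Python 2 code; on Python 3, compare against b'' and use print().
    with open(STAT_FILE, "r+b") as f:
        m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
        while True:
            line = m.readline()
            if line == '':  # readline() returns an empty string at end of file
                break
            print line.rstrip()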
    

    Suggestions:

    • Do not name a variable map; that shadows the built-in function.
    • Open the file in r+b mode, as in the Python example on the mmap help page, which states: "In either case you must provide a file descriptor for a file opened for update." See http://docs.python.org/library/mmap.html#mmap.mmap.
    • It's better to not use UPPER_CASE_WITH_UNDERSCORES global variable names, as mentioned in Global Variable Names at https://www.python.org/dev/peps/pep-0008/#global-variable-names. In other programming languages (like C), constants are often written all uppercase.

    Hope this helps.

    Edit: I did some timing tests on Linux because the comment made me curious. Here is a comparison of timings made on 5 sequential runs on a 137MB text file.

    Normal file access:

    real    2.410 2.414 2.428 2.478 2.490
    sys     0.052 0.052 0.064 0.080 0.152
    user    2.232 2.276 2.292 2.304 2.320
    

    mmap file access:

    real    1.885 1.899 1.925 1.940 1.954
    sys     0.088 0.108 0.108 0.116 0.120
    user    1.696 1.732 1.736 1.744 1.752
    

    Those timings do not include the print statement (I excluded it). Going by these numbers, memory-mapped file access is noticeably faster: roughly 1.9 s versus 2.4 s of wall-clock time, about 20% less.

    Edit 2: Using python -m cProfile test.py I got the following results:

    5432833    2.273    0.000    2.273    0.000 {method 'readline' of 'file' objects}
    5432833    1.451    0.000    1.451    0.000 {method 'readline' of 'mmap.mmap' objects}
    

    If I'm not mistaken then mmap is quite a bit faster.

    Additionally, it seems not len(line) performs worse than line == '', at least that's how I interpret the profiler output.
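
    For anyone who wants to repeat the comparison, here is a minimal Python 3 timing sketch. The helper names (count_lines, time_plain, time_mmap) and the generated sample file are assumptions made for illustration, not the script used for the numbers above, and the results will depend on the platform, the Python version, and the file size.

```python
import mmap
import os
import tempfile
import time


def count_lines(readline, sentinel):
    """Call readline() until it returns the sentinel; return the line count."""
    count = 0
    for _ in iter(readline, sentinel):
        count += 1
    return count


def time_plain(path):
    """Count lines through an ordinary binary file object."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        n = count_lines(f.readline, b"")
    return n, time.perf_counter() - start


def time_mmap(path):
    """Count lines through a read-only memory map of the same file."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            n = count_lines(m.readline, b"")
    return n, time.perf_counter() - start


# Build a small sample file; use a much larger one for meaningful numbers.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"some sample text\n" * 100000)
    sample_path = tmp.name

plain_count, plain_seconds = time_plain(sample_path)
mmap_count, mmap_seconds = time_mmap(sample_path)
os.remove(sample_path)

print("plain: %d lines in %.3f s" % (plain_count, plain_seconds))
print("mmap:  %d lines in %.3f s" % (mmap_count, mmap_seconds))
```

    Both functions share the same counting loop, so any difference in the printed times comes from the readline() implementation and the I/O path rather than from the loop itself.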

  • 2020-12-24 03:20

    Python 2.7 32bit on Windows is more than twice as fast on an mmapped file:

    Measured on a 27MB, 509k-line text file (my 'parse' function is not interesting; it mostly just calls readline() very rapidly):

    import mmap

    with open(someFile, "r") as f:
        if usemmap:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        else:
            m = f
        # Parse whichever object was chosen; both provide readline().
        e.parse(m)
    

    With MMAP:

    read in 0.308000087738
    

    Without MMAP:

    read in 0.680999994278
    
  • 2020-12-24 03:28

    The following is reasonably concise:

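    import mmap

    # Python 2 code; on Python 3, compare against b"" and use print().
    with open(STAT_FILE, "r") as f:
        m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
        while True:
            line = m.readline()
            if line == "":  # only true at end of file
                break
            print line
        m.close()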
    

    Note that line retains the newline, so you might like to remove it. It is also the reason why if line == "" does the right thing (an empty line is returned as "\n").

    The reason the original iteration works the way it does is that mmap tries to look like both a file and a string. It looks like a string for the purposes of iteration.
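
    The difference between the two views can be seen in a short Python 3 sketch. The sample data and temporary file are made up for illustration. Iterating the mmap object directly walks it like a sequence, one item per byte, while readline() treats it like a file.

```python
import mmap
import os
import tempfile

data = b"ab\ncd\n"
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Sequence-style iteration: one item per byte of the mapping.
        items = list(m)
        # Make the starting position for file-style reads explicit.
        m.seek(0)
        # File-style reading: readline() returns whole lines.
        lines = list(iter(m.readline, b""))
os.remove(path)

print(len(items), "items from direct iteration")
print(len(lines), "lines from readline()")
```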

    I have no idea why it can't (or chooses not to) provide readlines()/xreadlines().
