Python Random Access File

前端 未结 7 2052
借酒劲吻你
借酒劲吻你 2020-12-01 16:10

Is there a Python file type for accessing random lines without traversing the whole file? I need to search within a large file, reading the whole thing into memory wouldn\'t

相关标签:
7条回答
  • 2020-12-01 16:51

    You can use linecache:

    import linecache
    print linecache.getline(your_file.txt, randomLineNumber) # Note: first line is 1, not 0
    
    0 讨论(0)
  • 2020-12-01 16:53

    Since lines can be of arbitrary length, you really can't get at a random line (whether you mean "a line whose number is actually random" or "a line with an arbitrary number, selected by me") without traversing the whole file.

    If kinda-sorta-random is enough, you can seek to a random place in the file and then read forward until you hit a line terminator. But that's useless if you want to find (say) line number 1234, and will sample lines non-uniformly if you actually want a randomly chosen line.

    0 讨论(0)
  • 2020-12-01 16:55

    Has fixed-length records? If so, yes, you can implement a binary search algorithm using seeking.

    Otherwise, load your file into an SQLlite database. Query that.

    0 讨论(0)
  • 2020-12-01 17:03

    Yes, you can easily get a random line. Just seek to a random position in the file, then seek towards the beginning until you hit a \n or the beginning of the file, then read a line.

    Code:

    import sys,random
    with open(sys.argv[1],"r") as f:
        f.seek(0,2)                 # seek to end of file
        bytes = f.tell()
        f.seek(int(bytes*random.random()))
    
        # Now seek forward until beginning of file or we get a \n
        while True:
            f.seek(-2,1)
            ch = f.read(1)
            if ch=='\n': break
            if f.tell()==1: break
    
        # Now get a line
        print f.readline()
    
    0 讨论(0)
  • 2020-12-01 17:04

    This seems like just the sort of thing mmap was designed for. A mmap object creates a string-like interface to a file:

    >>> f = open("bonnie.txt", "wb")
    >>> f.write("My Bonnie lies over the ocean.")
    >>> f.close()
    >>> f.open("bonnie.txt", "r+b")
    >>> mm = mmap(f.fileno(), 0)
    >>> print mm[3:9]
    Bonnie
    

    In case you were wondering, mmap objects can also be assigned to:

    >>> print mm[24:]
    ocean.
    >>> mm[24:] = "sea.  "
    >>> print mm[:]
    My Bonnie lies over the sea.  
    
    0 讨论(0)
  • 2020-12-01 17:13

    file objects have a seek method which can take a value to particular byte within that file. For traversing through the large files, iterate over it and check for the value in each line. Iterating the file object does not load the whole file content into memory.

    0 讨论(0)
提交回复
热议问题