How do I read a random line from a file?

灰色年华 2020-12-04 20:03

Is there a built-in method to do this? If not, how can I do it without too much overhead?

11 answers
  • 2020-12-04 20:24

    This may be bulky, but I think it works (at least for text files):

    import random

    # Read every line of the file into a list (the with block closes the file)
    with open("yourfile.txt", "r") as choicefile:
        linelist = choicefile.readlines()
    # Pick one line uniformly at random
    choice = random.choice(linelist)
    print(choice)

    It reads each line of the file into a list, then chooses a random line from that list. If you want to remove the line once it's chosen, just do:

    linelist.remove(choice)
    

    Hope this helps. At least it needs no extra modules or imports (apart from random) and is simple, though it does keep the whole file in memory.
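    If the reason for removing each chosen line is to avoid picking it again over several draws, the standard-library `random.sample` can pick k lines at distinct positions in one call. A minimal sketch, assuming the file fits in memory; the helper name `sample_lines` is hypothetical:

```python
import random

def sample_lines(path, k):
    # Hypothetical helper: read all lines, then draw k of them at distinct positions
    with open(path, "r") as f:
        lines = f.readlines()
    # random.sample raises ValueError if k is larger than the number of lines
    return random.sample(lines, k)
```

    Sampling by position means duplicate lines in the file can still both be returned, exactly as if you had popped them one at a time.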

  • 2020-12-04 20:25

    It's not built in, but Algorithm R (section 3.4.2, Waterman's "Reservoir Algorithm") from Knuth's "The Art of Computer Programming" works well. Here it is in a very simplified version:

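    import random

    def random_line(afile):
        # Start with the first line as the current choice
        line = next(afile)
        # Line number num (counting from 2) replaces the choice with probability 1/num
        for num, aline in enumerate(afile, 2):
            if random.randrange(num):
                continue
            line = aline
        return line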
    

    The `num, ... in enumerate(..., 2)` iterator produces the sequence 2, 3, 4, and so on. `random.randrange(num)` is therefore 0 with probability 1.0/num, which is the probability with which we must replace the currently selected line, and exactly the probability with which we do so. (This is the sample-size-1 special case of the referenced algorithm; see Knuth's book for the proof of correctness. And of course we're also in the case of a small-enough "reservoir" to fit in memory ;-))
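    Why 1/num gives a uniform pick: line i becomes the current choice with probability 1/i, then survives each later line j with probability (j - 1)/j, so its overall chance is (1/i) × (i/(i+1)) × ... × ((n-1)/n) = 1/n for a file of n lines. Algorithm R in general keeps a reservoir of k items, and the function above is the k = 1 case. A sketch of the general form, assuming you want k lines rather than one (the name `reservoir_sample` is hypothetical):

```python
import random

def reservoir_sample(iterable, k):
    # Algorithm R with reservoir size k: keep the first k items, then item
    # number n (1-based, n > k) overwrites a random slot with probability k/n
    reservoir = []
    for n, item in enumerate(iterable, 1):
        if n <= k:
            reservoir.append(item)
        else:
            j = random.randrange(n)  # uniform over 0..n-1
            if j < k:                # true with probability k/n
                reservoir[j] = item
    return reservoir

# Usage: with open("file") as f: print(reservoir_sample(f, 3))
```

    With k = 1 the test `j < k` is just `j == 0`, which is the same 1/num replacement rule as `random_line`. If the file has fewer than k lines, all of them are returned.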

  • 2020-12-04 20:29

    It depends on what you mean by "too much" overhead. If storing the whole file in memory is acceptable, then something like

    import random
    
    # Load all lines, pick one at random; the with block closes the file afterwards
    with open("file") as f:
        random_line = random.choice(f.readlines())
    

    would do the trick.

  • 2020-12-04 20:29

    If you don't want to read through the entire file, you can seek to a random position in the file, scan backwards for the preceding newline, and then call readline.

    Here is a Python 3 function that does just this. One disadvantage of this method is that a line's chance of being picked is proportional to its length in bytes, so short lines are less likely to show up.

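    import os
    import random

    def read_random_line(path, chunk_size=16):
        with open(path, 'rb') as f_handle:
            # Find the file size and pick a random byte offset inside the file
            f_handle.seek(0, os.SEEK_END)
            size = f_handle.tell()
            if size == 0:
                return b''
            # size - 1, not size: an offset at EOF would make readline return b''
            i = random.randint(0, size - 1)
            # Scan backwards chunk by chunk for the newline that precedes offset i
            while True:
                i -= chunk_size
                if i < 0:
                    chunk_size += i
                    i = 0
                f_handle.seek(i, os.SEEK_SET)
                chunk = f_handle.read(chunk_size)
                i_newline = chunk.rfind(b'\n')
                if i_newline != -1:
                    # The selected line starts just after that newline
                    i += i_newline + 1
                    break
                if i == 0:
                    # Reached the start of the file: the offset is in the first line
                    break
            f_handle.seek(i, os.SEEK_SET)
            return f_handle.readline()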
    
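    To see the length bias concretely: when the starting offset is drawn uniformly from 0 to size - 1, every byte of a line (its newline included) leads back to that line's start, so a line is returned with probability len(line) / size. A small sketch that computes these exact probabilities for a byte string, assuming only b'\n' counts as a line break (as in the code above); the helper name `seek_pick_probabilities` is hypothetical:

```python
def seek_pick_probabilities(data: bytes):
    # Hypothetical helper: split on b'\n' only, keeping each newline with its line
    lines = []
    start = 0
    while start < len(data):
        end = data.find(b'\n', start)
        end = len(data) if end == -1 else end + 1
        lines.append(data[start:end])
        start = end
    # Each byte offset inside a line maps back to that line, so the chance is len/size
    return [(line, len(line) / len(data)) for line in lines]
```

    For b"a\nbbbbbbbbb\n" the one-character line gets 2/12 and the nine-character line gets 10/12, even though a uniform pick would give each 1/2.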
  • 2020-12-04 20:29

    Seek to a random position, read a line and discard it, then read another line. The distribution of lines won't be uniform, but that doesn't always matter.
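    A minimal sketch of that idea. The name `random_line_after_seek` is hypothetical, and wrapping around to the first line when the random position falls inside the last line is an added assumption (otherwise the second readline would return an empty result and the first line could never be picked). With the wrap-around, a line is returned exactly when the offset lands in the line before it, so lines that follow long lines are favored.

```python
import os
import random

def random_line_after_seek(path):
    # Hypothetical helper: returns bytes; raises ValueError on an empty file
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek returns the new absolute position
        if size == 0:
            raise ValueError("empty file")
        f.seek(random.randrange(size))
        f.readline()                   # discard the (probably partial) line we landed in
        line = f.readline()
        if not line:                   # we were in the last line: wrap to the first one
            f.seek(0)
            line = f.readline()
        return line
```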
