How do I read a random line from one file?

灰色年华

Is there a built-in method to do it? If not, how can I do this without incurring too much overhead?

11 answers

一向

2020-12-04 20:24
This may be bulky, but it works (at least for plain text files):
```
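import random

# Read every line into a list, then pick one at random.
# The with-block closes the file once the lines are read.
with open("yourfile.txt", "r") as choicefile:
    linelist = choicefile.readlines()
choice = random.choice(linelist)
print(choice)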
```
It reads every line of the file into a list, then chooses a random line from that list. If you want to remove the line once it has been chosen, just do
```
linelist.remove(choice)
```
Hope this helps. It needs no modules beyond random and is relatively lightweight, although it does hold the whole file in memory.
轻奢々

2020-12-04 20:25
Not built-in, but Algorithm R from section 3.4.2 of Knuth's "The Art of Computer Programming" (Waterman's "Reservoir Algorithm") is good. Here it is in a very simplified version:
```
import random

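def random_line(afile):
    line = next(afile)  # the first line is the initial pick
    for num, aline in enumerate(afile, 2):
        # randrange(num) is 0 with probability 1/num; only then replace the pick
        if random.randrange(num):
            continue
        line = aline
    return line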
```
The `num, ... in enumerate(..., 2)` iterator produces the sequence 2, 3, 4, ... so `randrange(num)` is 0 with a probability of 1.0/num, which is exactly the probability with which we must replace the currently selected line. This is the special case of sample size 1 of the referenced algorithm (see Knuth's book for the proof of correctness), and of course we're also in the case of a small-enough "reservoir" to fit in memory ;-)
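To make the 1/num argument concrete, here is a minimal sketch that runs a size-1 reservoir pick many times over a small in-memory "file" and prints how often each line was chosen. It is a variant of the function above, not the same code: it starts `enumerate` at 1 so that an empty input returns `None` instead of raising `StopIteration`. The sample text, the trial count and the fixed seed are assumptions made purely for illustration.

```python
import io
import random
from collections import Counter

def random_line(lines):
    """Size-1 reservoir sampling over any iterable of lines."""
    chosen = None
    for num, line in enumerate(lines, 1):
        # Replace the current pick with probability 1/num
        # (the first line is always taken, since randrange(1) is 0).
        if random.randrange(num) == 0:
            chosen = line
    return chosen  # None if the iterable was empty

# Empirical check: each of the four lines should come up about 25% of the time.
random.seed(0)  # fixed seed only so the run is repeatable
sample_text = "a\nb\nc\nd\n"
trials = 20000
counts = Counter(random_line(io.StringIO(sample_text)) for _ in range(trials))
for line, n in sorted(counts.items()):
    print(repr(line), n / trials)
```

The file is read exactly once and only one line is held at a time, which is the point of the reservoir approach.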
北荒

2020-12-04 20:29
It depends on what you mean by "too much" overhead. If storing the whole file in memory is possible, then something like
```
import random

with open("file") as f:
    random_line = random.choice(f.readlines())
```
would do the trick.

情歌与酒

2020-12-04 20:29

If you don't want to read over the entire file, you can seek into the middle of the file, then seek backwards for the newline, and call readline.

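Here is a Python 3 script which does just this. One disadvantage of this method is that short lines have a lower likelihood of showing up. Note that the file is opened in binary mode, so the line is returned as bytes.

```
import os
import random

def read_random_line(f, chunk_size=16):
    with open(f, 'rb') as f_handle:
        f_handle.seek(0, os.SEEK_END)
        size = f_handle.tell()
        if size == 0:
            return b''  # empty file: nothing to pick
        # Pick a byte inside the file. randrange excludes `size` itself, which
        # would land after a trailing newline and yield an empty line.
        i = random.randrange(size)
        while True:
            # Step back one chunk, clamping at the start of the file.
            i -= chunk_size
            if i < 0:
                chunk_size += i
                i = 0
            f_handle.seek(i, os.SEEK_SET)
            chunk = f_handle.read(chunk_size)
            i_newline = chunk.rfind(b'\n')
            if i_newline != -1:
                # The line starts just after the last newline found.
                i += i_newline + 1
                break
            if i == 0:
                break  # reached the start of the file: first line
        f_handle.seek(i, os.SEEK_SET)
        return f_handle.readline()
```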

轻奢々

2020-12-04 20:29

Seek to a random position, read a line and discard it, then read another line. The distribution of lines won't be uniform (a line is more likely to be picked when the line before it is long), but that doesn't always matter.
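A minimal sketch of that idea, under a few assumptions that are not in the answer itself: the function name `random_line_by_seek` is made up, the file is opened in binary mode (so the result is bytes), and when the random position falls inside the last line the code wraps around to the first line so that every line has some chance of being returned.

```python
import os
import random

def random_line_by_seek(path):
    """Return one line (as bytes) found by seeking to a random byte offset."""
    size = os.path.getsize(path)
    if size == 0:
        return b""  # empty file: nothing to pick
    with open(path, "rb") as f:
        f.seek(random.randrange(size))
        f.readline()         # discard the (probably partial) line we landed in
        line = f.readline()  # the next complete line
        if not line:
            # We landed in the last line, so wrap around to the first one.
            f.seek(0)
            line = f.readline()
        return line
```

This touches only a couple of lines regardless of file size, at the cost of the skewed distribution described above.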
