Is there a built-in method to pick a random line from a file? If not, how can I do this without too much overhead?
This may be bulky, but it seems to work (at least for text files):
```python
import random

linelist = []
with open("yourfile.txt", "r") as choicefile:
    for line in choicefile:
        linelist.append(line)  # keep every line in memory

choice = random.choice(linelist)  # each line is equally likely
print(choice)
```
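It reads each line of the file and appends it to a list, then chooses a random line from that list. If you want to remove the line once it's chosen, just do:

```python
linelist.remove(choice)
```

(Note that `list.remove` deletes the first line equal to `choice`, which only matters if the file contains duplicate lines.)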
Hope this helps. It needs no extra modules apart from `random` and is relatively lightweight, though it does keep the whole file in memory.
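If you plan to keep drawing lines and removing them, each `linelist.remove(choice)` call scans the list again. A minimal sketch of an alternative, assuming the file fits in memory: shuffle the list once and then take the lines in order, so every line comes out exactly once in random order. The function name `random_lines_without_repeats` is hypothetical.

```python
import random

def random_lines_without_repeats(path):
    """Hypothetical helper: yield every line of `path` once, in random order."""
    with open(path, "r") as fh:
        lines = fh.readlines()  # whole file in memory, as in the answer above
    random.shuffle(lines)  # one in-place shuffle instead of repeated list.remove calls
    yield from lines
```

If you only need a fixed number of distinct lines, `random.sample(linelist, k)` does the same job without shuffling the whole list.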
Not built-in, but Algorithm R (section 3.4.2, Waterman's "Reservoir Algorithm") from Knuth's "The Art of Computer Programming" is good (in a very simplified version):
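```python
import random

def random_line(afile):
    # Start with the first line as the current choice.
    line = next(afile)
    # num is the 1-based position of aline: 2 for the second line, 3 for the third, ...
    for num, aline in enumerate(afile, 2):
        # randrange(num) is 0 with probability 1/num; otherwise keep the current choice.
        if random.randrange(num):
            continue
        line = aline
    return line
```

Call it with an open file object, e.g. `with open("yourfile.txt") as f: print(random_line(f))`.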
The `num, ... in enumerate(..., 2)` iterator produces the sequence 2, 3, 4, ..., so `random.randrange(num)` will be 0 with a probability of 1.0/num. That is exactly the probability with which we must replace the currently selected line, and exactly the probability with which we do so. This is the special case of sample size 1 of the referenced algorithm (see Knuth's book for the proof of correctness; and of course we're also in the case of a small-enough "reservoir" to fit in memory ;-)). For a file of n lines, the k-th line is selected with probability 1/k and then survives each later step j with probability (j-1)/j, so it ends up as the result with probability (1/k) · (k/(k+1)) · ... · ((n-1)/n) = 1/n.
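The same idea extends to picking k distinct lines in a single pass, which is the general form of the reservoir algorithm. A minimal sketch of a common simplified formulation, assuming the k selected lines fit in memory; the name `random_sample_lines` is hypothetical. With k = 1 the replacement test `j < k` is the same as `randrange(num) == 0` in `random_line` above.

```python
import random

def random_sample_lines(afile, k):
    """Hypothetical helper: k lines chosen uniformly without replacement, one pass."""
    reservoir = []
    for num, aline in enumerate(afile, 1):
        if num <= k:
            reservoir.append(aline)  # fill the reservoir with the first k lines
        else:
            j = random.randrange(num)  # uniform in 0 .. num-1
            if j < k:  # happens with probability k/num
                reservoir[j] = aline  # overwrite a uniformly chosen slot
    return reservoir  # fewer than k lines in the file: returns all of them
```

The order of the returned lines is not random (early lines tend to keep their slots), so call `random.shuffle` on the result if order matters.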
It depends on what you mean by "too much" overhead. If storing the whole file in memory is possible, then something like

```python
import random

with open("file") as f:
    random_line = random.choice(f.readlines())
```

would do the trick.
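If holding the whole file in memory is the concern but reading it twice is acceptable, a different technique (not from the answer above) is to count the lines on a first pass, pick a uniform index, and read up to that index on a second pass. A minimal sketch; `random_line_two_pass` is a hypothetical name.

```python
import random
from itertools import islice

def random_line_two_pass(path):
    """Hypothetical helper: uniform random line, keeping one line in memory at a time."""
    with open(path) as fh:
        n = sum(1 for _ in fh)  # first pass: count the lines
        if n == 0:
            raise ValueError("empty file")
        target = random.randrange(n)  # uniform line index in 0 .. n-1
        fh.seek(0)  # rewind for the second pass
        return next(islice(fh, target, None))  # skip `target` lines, return the next
```

The trade-off versus the reservoir version is an extra read of the file in exchange for a single call to the random number generator.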
If you don't want to read the entire file, you can seek to a random position in the file, scan backwards for the preceding newline, and call `readline`.
Here is a Python 3 function that does just this. One disadvantage of this method is that a line's chance of being picked is proportional to its length in bytes, so short lines are less likely to show up.
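```python
import os
import random

def read_random_line(f, chunk_size=16):
    with open(f, 'rb') as f_handle:
        f_handle.seek(0, os.SEEK_END)
        size = f_handle.tell()
        # Random byte offset in 0 .. size-1 (raises ValueError on an empty file).
        # Using randint(0, size) could land at EOF and return b''.
        i = random.randrange(size)
        # Scan backwards in chunks for the newline that precedes offset i.
        while True:
            i -= chunk_size
            if i < 0:
                chunk_size += i  # shrink the last chunk so it starts at byte 0
                i = 0
            f_handle.seek(i, os.SEEK_SET)
            chunk = f_handle.read(chunk_size)
            i_newline = chunk.rfind(b'\n')
            if i_newline != -1:
                i += i_newline + 1  # the line starts just after that newline
                break
            if i == 0:
                break  # reached the start of the file: it's the first line
        f_handle.seek(i, os.SEEK_SET)
        return f_handle.readline()  # bytes, including the trailing b'\n' if present
```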
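Seek to a random position, read a line and discard it, then read another line. The distribution of lines won't be uniform (a line's chance is proportional to the length of the line before it, the first line is never returned, and landing in the last line gives an empty result), but that doesn't always matter.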
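A minimal sketch of that seek-and-discard approach in binary mode. Wrapping around to the first line when the offset lands in the last line is an added assumption (without it the second `readline` returns `b''`); `seek_random_line` is a hypothetical name.

```python
import os
import random

def seek_random_line(path):
    """Hypothetical helper: return the line after a random byte offset (as bytes)."""
    with open(path, 'rb') as fh:
        size = fh.seek(0, os.SEEK_END)  # seek returns the new absolute position
        fh.seek(random.randrange(size))  # ValueError on an empty file
        fh.readline()  # discard the (probably partial) current line
        line = fh.readline()  # the next full line
        if not line:  # assumption: landed in the last line, so wrap to the first
            fh.seek(0)
            line = fh.readline()
        return line
```

With the wrap-around every line can be returned, but each line's chance is still proportional to the length of the line before it (the first line's chance follows the last line's length).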