I have a bunch of files, and every file has a header of 5 lines. In the rest of the file, each pair of lines forms an entry. I need to randomly select entries from these files. How can
Two other ways to do this:
1- with generators (may still require a lot of memory): http://www.usrsb.in/Picking-Random-Items--Take-Two--Hacking-Python-s-Generators-.html
2- with clever seeking (the best method, actually): http://www.regexprn.com/2008/11/read-random-line-in-large-file-in.html
Here is the code from Jonathan Kupferman's post, lightly adapted for Python 3:
#!/usr/bin/env python3
import os
import random

filename = "averylargefile"
# Get the total file size in bytes
file_size = os.stat(filename).st_size

# Open in binary mode so that seeking to an arbitrary byte offset is safe
with open(filename, 'rb') as f:
    while True:
        # Seek to a place in the file which is a random distance away
        # Mod by file size so that it wraps around to the beginning
        f.seek((f.tell() + random.randint(0, file_size - 1)) % file_size)
        # Don't use the first readline since it may fall in the middle of a line
        f.readline()
        # This will return the next (complete) line from the file
        line = f.readline()
        if not line:
            # We landed in the last line: wrap around to the first line
            f.seek(0)
            line = f.readline()
        # Here is your random line in the file (the loop runs until interrupted)
        print(line.decode().rstrip("\r\n"))
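
Note that the seek-based code above returns a single line, not a two-line entry, and it knows nothing about the 5-line header from the question; it also tends to favor lines that follow long lines. As a rough sketch of the generator-style alternative (reservoir sampling with a reservoir of one), assuming each file has exactly the layout described in the question, something like the following could work. The names `random_entry` and `HEADER_LINES` are made up for this illustration and do not come from either linked post.

```python
import itertools
import random

HEADER_LINES = 5  # header length stated in the question


def random_entry(path, rng=random):
    """Return one uniformly chosen (line1, line2) entry, or None if there are none."""
    chosen = None
    with open(path) as f:
        # Skip the header lines, then iterate lazily over the rest of the file
        body = itertools.islice(f, HEADER_LINES, None)
        # zip(body, body) consumes two consecutive lines per step, i.e. one entry;
        # a trailing unpaired line, if any, is ignored
        for count, entry in enumerate(zip(body, body), start=1):
            # Keep the new entry with probability 1/count, so that every
            # entry ends up equally likely once the file has been read
            if rng.randrange(count) == 0:
                chosen = entry
    return chosen
```

Calling `random_entry("averylargefile")` would return a tuple of two lines (each still ending in its newline). This reads the whole file once but only ever holds one entry in memory, so it trades speed for uniform selection.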