Randomly selecting lines from files

轮回少年 2021-01-13 14:21

I have a bunch of files, and every file has a header of 5 lines. In the rest of the file, each pair of lines forms an entry. I need to randomly select entries from these files. How can

7 answers
  •  小鲜肉 (original poster)
    2021-01-13 15:13

    Two other ways to do this: 1- with generators (this may still require a lot of memory): http://www.usrsb.in/Picking-Random-Items--Take-Two--Hacking-Python-s-Generators-.html

    2- with clever seeking (actually the best method): http://www.regexprn.com/2008/11/read-random-line-in-large-file-in.html

    Here is Jonathan Kupferman's code, copied for convenience:

    #!/usr/bin/env python3

    import os
    import random

    filename = "averylargefile"

    # Get the total file size in bytes
    file_size = os.path.getsize(filename)

    # Open in binary mode so that seeking to an arbitrary byte offset is allowed
    with open(filename, "rb") as f:
        while True:
            # Seek to a random byte offset in the file
            f.seek(random.randrange(file_size))

            # Discard the first readline, since we may have landed mid-line
            f.readline()
            # This returns the next (complete) line from the file
            line = f.readline()
            if not line:
                # We landed in the last line: wrap around to the first line
                f.seek(0)
                line = f.readline()

            # Here is your random line from the file
            # (note: a line that follows a long line is picked more often)
            print(line.decode().rstrip("\n"))
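
    For the layout described in the question (a 5-line header, then entries of two lines each), the generator approach could be sketched roughly as below. This is only a minimal sketch and not the code from either link: the names iter_entries and sample_entries are made up for illustration, and it uses reservoir sampling, so it reads every file once but keeps only k entries in memory.

```python
import random
from itertools import islice

HEADER_LINES = 5  # from the question: every file starts with a 5-line header


def iter_entries(path):
    """Yield (first_line, second_line) entries from one file, skipping the header."""
    with open(path, "r") as f:
        body = islice(f, HEADER_LINES, None)
        # zip(body, body) pulls two consecutive lines per step;
        # a trailing unpaired line is silently dropped
        for first, second in zip(body, body):
            yield first.rstrip("\n"), second.rstrip("\n")


def sample_entries(paths, k, rng=random):
    """Reservoir-sample up to k entries uniformly from all files in one pass."""
    reservoir = []
    seen = 0
    for path in paths:
        for entry in iter_entries(path):
            seen += 1
            if len(reservoir) < k:
                # Fill the reservoir with the first k entries
                reservoir.append(entry)
            else:
                # Keep the new entry with probability k / seen
                j = rng.randrange(seen)
                if j < k:
                    reservoir[j] = entry
    return reservoir
```

    Calling sample_entries(["a.txt", "b.txt"], k=3) (hypothetical file names) would return up to three (line1, line2) tuples. Unlike the seeking trick, every entry has the same chance of being picked, at the cost of reading each file in full.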
    
