Python - How to parallel consume and operate on files in a directory

旧巷少年郎 · 2020-12-08 23:13

Current scenario: I have 900 files in a directory called directoryA. The files are named file0.txt through file899.txt, each 15 MB in size. I loop through e…

1 Answer
  • 2020-12-08 23:56

    To fully utilize your hardware cores, it's better to use the multiprocessing library.

    from multiprocessing import Pool
    from os import listdir
    from os.path import join
    import csv
    import re
    
    mypath = "some/path/"
    inputDir = mypath + 'dirA/'
    outputDir = mypath + 'dirB/'
    
    def process_file(filename):
        # load the text file as a list of rows using the csv module
        with open(join(inputDir, filename), newline='') as f:
            rows = list(csv.reader(f))
        # run a bunch of operations on rows here
        # regex the int from the filename, e.g. file1.txt returns 1, file42.txt returns 42
        index = re.search(r'\d+', filename).group()
        # write a corresponding csv file to dirB, e.g. input file99.txt is written as out99.csv
        with open(join(outputDir, 'out' + index + '.csv'), 'w', newline='') as f:
            csv.writer(f).writerows(rows)
    
    if __name__ == '__main__':
        p = Pool(12)
        p.map(process_file, listdir(inputDir))
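
    If the 900 files take uneven amounts of time to process, `Pool.imap_unordered` yields results as each worker finishes rather than waiting for submission order, which is handy for progress reporting. A minimal sketch, where `double` is a hypothetical stand-in for the per-file task:

    ```python
    from multiprocessing import Pool

    def double(n):
        # hypothetical stand-in for a per-file task of varying duration
        return n * 2

    if __name__ == '__main__':
        with Pool(4) as p:
            done = 0
            # results arrive in completion order, not submission order
            for result in p.imap_unordered(double, range(8)):
                done += 1
                print(f"finished {done}/8")
    ```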
    

    multiprocessing documentation: https://docs.python.org/2/library/multiprocessing.html
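
    Rather than hard-coding 12 workers, `multiprocessing.cpu_count` sizes the pool to the machine it runs on (calling `Pool()` with no argument does the same). A short sketch with a hypothetical stand-in task `square`:

    ```python
    from multiprocessing import Pool, cpu_count

    def square(n):
        # hypothetical stand-in for a per-file task
        return n * n

    if __name__ == '__main__':
        # one worker per core instead of a fixed count
        with Pool(cpu_count()) as p:
            # Pool.map preserves submission order
            print(p.map(square, range(5)))  # [0, 1, 4, 9, 16]
    ```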
