Python - How to parallel consume and operate on files in a directory

旧巷少年郎 · 2020-12-08 23:13

Current scenario: I have 900 files in a directory called directoryA. The files are named file0.txt through file899.txt, each 15 MB in size. I loop through e…

1 Answer
  • 2020-12-08 23:56

    To fully utilize your hardware cores, it's better to use the multiprocessing library.

    from multiprocessing import Pool
    from os import listdir
    from os.path import join
    import csv
    import re
    
    mypath = "some/path/"
    inputDir = mypath + 'dirA/'
    outputDir = mypath + 'dirB/'
    
    def process_file(filename):
        # load the text file as a list of rows using the csv module
        with open(join(inputDir, filename), newline='') as f:
            rows = list(csv.reader(f))
        # run a bunch of operations on rows here
        # regex the int from the filename, e.g. file1.txt returns 1, file42.txt returns 42
        index = re.search(r'\d+', filename).group()
        # write a corresponding csv file to dirB, e.g. input file99.txt is written as out99.csv
        with open(join(outputDir, 'out' + index + '.csv'), 'w', newline='') as f:
            csv.writer(f).writerows(rows)
    
    if __name__ == '__main__':
        p = Pool(12)
        p.map(process_file, listdir(inputDir))
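
    If the 900 files take uneven amounts of time to process, `Pool.imap_unordered` yields results as each worker finishes rather than waiting for submission order, which is handy for progress reporting. A minimal sketch, where `double` is a hypothetical stand-in for the per-file task:

    ```python
    from multiprocessing import Pool

    def double(n):
        # hypothetical stand-in for a per-file task of varying duration
        return n * 2

    if __name__ == '__main__':
        with Pool(4) as p:
            done = 0
            # results arrive in completion order, not submission order
            for result in p.imap_unordered(double, range(8)):
                done += 1
                print(f"finished {done}/8")
    ```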
    

    multiprocessing documentation: https://docs.python.org/2/library/multiprocessing.html
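
    Rather than hard-coding 12 workers, `multiprocessing.cpu_count` sizes the pool to the machine it runs on (calling `Pool()` with no argument does the same). A short sketch with a hypothetical stand-in task `square`:

    ```python
    from multiprocessing import Pool, cpu_count

    def square(n):
        # hypothetical stand-in for a per-file task
        return n * n

    if __name__ == '__main__':
        # one worker per core instead of a fixed count
        with Pool(cpu_count()) as p:
            # Pool.map preserves submission order
            print(p.map(square, range(5)))  # [0, 1, 4, 9, 16]
    ```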
