Question
I have 96 txt files that have to be processed. Right now I am using a for loop and doing them one at a time, and this process is very slow. The resulting 96 files do not need to be merged. Is there a way to make them run in parallel, à la Parallel.ForEach in C#? Current code:
for src_name in glob.glob(source_dir + '/*.txt'):
    outfile = open(...)
    with open(...) as infile:
        for line in infile:
            --PROCESS--
            for --condition--:
                outfile.write(...)
    outfile.close()
I want this process to run in parallel for all files in source_dir.
Answer 1:
Assuming that the limiting factor is indeed the processing and not the I/O, you can use joblib to easily run your loop on multiple CPUs.
A simple example from their documentation:
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
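Applied to the question's loop, the same Parallel/delayed pattern might look roughly like the sketch below. It assumes the per-file work is wrapped in a hypothetical process_file function, and that source_dir and dest_dir are already defined; the body of process_file just stands in for the original --PROCESS-- step and write condition.

import glob
import os
from joblib import Parallel, delayed

def process_file(src_name):
    # Hypothetical per-file worker: reads one input file, writes one output file.
    # dest_dir is assumed to be defined elsewhere.
    out_name = os.path.join(dest_dir, os.path.basename(src_name))
    with open(src_name) as infile, open(out_name, 'w') as outfile:
        for line in infile:
            # --PROCESS-- and the write condition from the question go here.
            outfile.write(line)

# n_jobs=-1 uses all available CPU cores; each file is handled independently.
Parallel(n_jobs=-1)(
    delayed(process_file)(src_name)
    for src_name in glob.glob(source_dir + '/*.txt')
)

Note that joblib runs the workers in separate processes by default, so process_file should be defined at module level (picklable) rather than inside another function.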
Source: https://stackoverflow.com/questions/29236642/c-sharp-parallel-foreach-equivalent-in-python