I have a huge dataset of videos that I process using a Python script called process.py. The problem is that processing the whole dataset takes a lot of time.
The multiprocessing documentation (https://docs.python.org/2/library/multiprocessing.html) is actually fairly easy to digest. This section (https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers) on using a pool of workers should be particularly relevant.
You definitely do not need multiple copies of the same script. Here is an approach you can adopt:
Assume this is the general structure of your existing script (process.py):
def convert_vid(fname):
    # do the heavy lifting
    ...

if __name__ == '__main__':
    # VIDEO_SET_1 to VIDEO_SET_4 exist, as mentioned in your question
    for fname in VIDEO_SET_1:
        convert_vid(fname)
With multiprocessing, you can run the function convert_vid in separate processes. Here is the general scheme:
from multiprocessing import Pool

def convert_vid(fname):
    # do the heavy lifting
    ...

if __name__ == '__main__':
    pool = Pool(processes=4)
    # Flatten the four sets so each worker call receives a single file name,
    # matching the signature of convert_vid
    pool.map(convert_vid, VIDEO_SET_1 + VIDEO_SET_2 + VIDEO_SET_3 + VIDEO_SET_4)
    pool.close()
    pool.join()
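For completeness, here is a minimal self-contained sketch of the same idea, assuming Python 3 (where Pool works as a context manager). The file lists are hypothetical placeholders for your real VIDEO_SET_1 to VIDEO_SET_4, and sizing the pool to os.cpu_count() is a reasonable default for CPU-bound work:

import os
from multiprocessing import Pool

def convert_vid(fname):
    # Stand-in for the real per-file work done by process.py
    print('converting', fname)

if __name__ == '__main__':
    # Hypothetical sample data; substitute your real VIDEO_SET_1 to VIDEO_SET_4
    VIDEO_SET_1 = ['a.mp4', 'b.mp4']
    VIDEO_SET_2 = ['c.mp4', 'd.mp4']
    VIDEO_SET_3 = ['e.mp4']
    VIDEO_SET_4 = ['f.mp4']
    all_files = VIDEO_SET_1 + VIDEO_SET_2 + VIDEO_SET_3 + VIDEO_SET_4

    # One worker per CPU core; the context manager closes and joins the pool
    with Pool(processes=os.cpu_count()) as pool:
        pool.map(convert_vid, all_files)

Note that pool.map blocks until every file has been processed, so the overall structure of your script stays the same; only the serial loop is replaced.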