Multithreading inside Multiprocessing in Python

Submitted by 谁说胖子不能爱 on 2020-06-29 06:41:57

Question


I am using the concurrent.futures module to do multiprocessing and multithreading. I am running it on an 8-core machine with 16 GB RAM and an Intel i7 8th Gen processor. I tried this on Python 3.7.2 and also on Python 3.8.2.

import concurrent.futures
import time

# Takes a list and multiplies each element by 2
def double_value(x):
  y = []
  for elem in x:
    y.append(2 * elem)
  return y

# Multiplies a single element by 2
def double_single_value(x):
  return 2 * x

# Define a: 100 rows of 1,000,000 integers each
import numpy as np
a = np.arange(100000000).reshape(100, 1000000)

# Runs a thread pool over one row and multiplies each element by 2
def get_double_value(x):
  with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(double_single_value, x)
  return list(results)

The code shown below ran in 115 seconds. It uses only multiprocessing. CPU utilization for this piece of code is 100%.

t = time.time()

with concurrent.futures.ProcessPoolExecutor() as executor:
  my_results = executor.map(double_value, a)
print(time.time()-t)

The code below took more than 9 minutes and consumed all of the system's RAM, after which the system killed all the processes. CPU utilization during this piece of code also did not reach 100% (~85%).

t = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
  my_results = executor.map(get_double_value, a)

print(time.time()-t)

I really want to understand:

1) Why is the code that first splits the work across multiple processes and then runs multithreading inside each process not faster than the code that uses only multiprocessing?

(I have gone through many posts that describe multiprocessing and multithreading, and one of the main points I took away is that multithreading is for I/O-bound work and multiprocessing is for CPU-bound work.)

2) Is there a better way of doing multithreading inside multiprocessing to get maximum utilization of the allotted cores (or CPUs)?

3) Why did that last piece of code consume all the RAM? Was it due to multithreading?


Answer 1:


As you say: "I have gone through many posts that describe multiprocessing and multithreading, and one of the main points I took away is that multithreading is for I/O-bound work and multiprocessing is for CPU-bound work."

You need to figure out whether your program is I/O-bound or CPU-bound, then apply the correct method to solve your problem. Applying various methods at random, or all of them at the same time, usually only makes things worse.
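One rough way to check which case applies is to compare the CPU time a call consumes with the wall-clock time it takes. The sketch below is only an illustration: `cpu_fraction` and `simulated_io` are hypothetical helper names, not part of the original post, and `time.process_time()` only counts CPU time used by the current process, so work done in child processes is not included. A ratio close to 1 suggests CPU-bound work, and a ratio close to 0 suggests the code mostly waits.

```python
import time


def cpu_fraction(func, *args):
    """Return CPU time divided by wall-clock time for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def double_value(x):
    """Pure-Python loop from the question: the interpreter is busy the whole time."""
    y = []
    for elem in x:
        y.append(2 * elem)
    return y


def simulated_io(seconds):
    """Hypothetical stand-in for waiting on a network or disk operation."""
    time.sleep(seconds)


if __name__ == "__main__":
    print("double_value:", cpu_fraction(double_value, list(range(1_000_000))))
    print("simulated_io:", cpu_fraction(simulated_io, 0.5))
```

If the ratio for the real workload is high, as it should be for a loop like `double_value`, threads are unlikely to help, because CPython's global interpreter lock lets only one thread execute Python bytecode at a time.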




Answer 2:


Using threading in pure Python for CPU-bound problems is a bad approach, regardless of whether you also use multiprocessing. Try to redesign your app to use only multiprocessing, or use third-party libraries such as Dask.
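A minimal sketch of the multiprocessing-only design, assuming each worker process handles one whole row and starts no threads of its own. `double_all_rows` is a hypothetical helper name, and the small list of lists stands in for the 100 x 1,000,000 array from the question. Note that `Executor.map` submits every item up front, so calling it with a million-element row, as `get_double_value` does, creates a million pending tasks inside each process; that is a plausible contributor to the memory growth described in the question, though it was not measured here.

```python
import concurrent.futures


def double_value(row):
    """Double every element of one row; runs entirely inside one worker process."""
    return [2 * elem for elem in row]


def double_all_rows(rows, max_workers=None):
    """Distribute whole rows across processes; workers start no threads."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        # One task per row keeps the number of pending tasks small.
        return list(executor.map(double_value, rows))


if __name__ == "__main__":
    # Small hypothetical input; the question used 100 rows of 1,000,000 integers.
    rows = [list(range(i, i + 5)) for i in range(4)]
    print(double_all_rows(rows))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes by spawning a fresh interpreter, since the module is re-imported in each worker.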



Source: https://stackoverflow.com/questions/62469183/multithreading-inside-multiprocessing-in-python
