Question
Below is the complete code, simplified for the question. ids_to_check returns a list of ids. For my testing, I used a list of 13 random strings.
    #!/usr/bin/env python3
    import time
    from multiprocessing.dummy import Pool as ThreadPool, current_process as threadpool_process
    import requests

    def ids_to_check():
        some_calls()  # placeholder for the calls that build id_list
        return id_list

    def execute_task(todo_id):
        # Named todo_id so that the builtin id() stays usable below.
        url = f"https://myserver.com/todos/{todo_id}"
        json_op = s.get(url, verify=False).json()
        value = json_op['id']
        # Print the returned id, the worker thread, and the identity of the shared session.
        print(str(value) + '-' + str(threadpool_process()) + str(id(s)))

    def main():
        pool = ThreadPool(processes=20)
        while True:
            pool.map(execute_task, ids_to_check())
            print("Let's wait for 10 seconds")
            time.sleep(10)

    if __name__ == "__main__":
        s = requests.Session()  # one session shared by all worker threads
        s.headers.update({
            'Accept': 'application/json'
        })
        main()
Output:
4-<DummyProcess(Thread-2, started daemon 140209222559488)>140209446508360
5-<DummyProcess(Thread-5, started daemon 140209123481344)>140209446508360
7-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
2-<DummyProcess(Thread-11, started daemon 140208527894272)>140209446508360
None-<DummyProcess(Thread-1, started daemon 140209230952192)>140209446508360
10-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
12-<DummyProcess(Thread-7, started daemon 140209106695936)>140209446508360
8-<DummyProcess(Thread-3, started daemon 140209140266752)>140209446508360
6-<DummyProcess(Thread-12, started daemon 140208519501568)>140209446508360
3-<DummyProcess(Thread-13, started daemon 140208511108864)>140209446508360
11-<DummyProcess(Thread-10, started daemon 140208536286976)>140209446508360
9-<DummyProcess(Thread-9, started daemon 140209089910528)>140209446508360
1-<DummyProcess(Thread-8, started daemon 140209098303232)>140209446508360
Let's wait for 10 seconds
None-<DummyProcess(Thread-14, started daemon 140208502716160)>140209446508360
3-<DummyProcess(Thread-20, started daemon 140208108455680)>140209446508360
1-<DummyProcess(Thread-19, started daemon 140208116848384)>140209446508360
7-<DummyProcess(Thread-17, started daemon 140208133633792)>140209446508360
6-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
4-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
9-<DummyProcess(Thread-16, started daemon 140208485930752)>140209446508360
5-<DummyProcess(Thread-15, started daemon 140208494323456)>140209446508360
2-<DummyProcess(Thread-2, started daemon 140209222559488)>140209446508360
8-<DummyProcess(Thread-18, started daemon 140208125241088)>140209446508360
11-<DummyProcess(Thread-1, started daemon 140209230952192)>140209446508360
10-<DummyProcess(Thread-11, started daemon 140208527894272)>140209446508360
12-<DummyProcess(Thread-5, started daemon 140209123481344)>140209446508360
Let's wait for 10 seconds
None-<DummyProcess(Thread-3, started daemon 140209140266752)>140209446508360
2-<DummyProcess(Thread-10, started daemon 140208536286976)>140209446508360
1-<DummyProcess(Thread-12, started daemon 140208519501568)>140209446508360
4-<DummyProcess(Thread-9, started daemon 140209089910528)>140209446508360
5-<DummyProcess(Thread-14, started daemon 140208502716160)>140209446508360
9-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
8-<DummyProcess(Thread-16, started daemon 140208485930752)>140209446508360
7-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
3-<DummyProcess(Thread-20, started daemon 140208108455680)>140209446508360
6-<DummyProcess(Thread-8, started daemon 140209098303232)>140209446508360
12-<DummyProcess(Thread-13, started daemon 140208511108864)>140209446508360
10-<DummyProcess(Thread-7, started daemon 140209106695936)>140209446508360
11-<DummyProcess(Thread-19, started daemon 140208116848384)>140209446508360
Let's wait for 10 seconds
.
.
My observations:
- Multiple connections are created (i.e., one connection per process), but the session object is the same throughout the execution of the code (the session object id never changes).
- Connections keep recycling, as seen from the ss output. I couldn't identify any particular pattern or timeout for the recycling.
- Connections are not recycled if I reduce the number of processes to a smaller value (for example, 5).
I do not understand how or why the connections are being recycled, or why they are not when I reduce the process count. I have tried disabling the garbage collector (import gc; gc.disable()), and the connections are still recycled.
I would like the created connections to stay alive until they reach a maximum number of requests. I think it would work without sessions, using a keep-alive connection header.
But I am curious to know what is causing the session's connections to keep recycling when the process pool size is high.
I can reproduce this issue with any server, so it is probably not server-dependent.
Answer 1:
I solved the same issue for myself by creating a session for each process and parallelizing the execution of the requests. At first I used multiprocessing.dummy too, but I ran into the same issue as you and switched to concurrent.futures.thread.ThreadPoolExecutor.
Here is my solution.
    from concurrent.futures.thread import ThreadPoolExecutor
    from functools import partial
    from requests import Session, Response
    from requests.adapters import HTTPAdapter

    def thread_pool_execute(iterables, method, pool_size=30) -> list:
        """Run requests in a thread pool, returns list of responses."""
        session = Session()
        # Size the connection pool to match the thread pool: that's it.
        session.mount('https://', HTTPAdapter(pool_maxsize=pool_size))
        session.mount('http://', HTTPAdapter(pool_maxsize=pool_size))
        worker = partial(method, session)
        with ThreadPoolExecutor(pool_size) as pool:
            results = pool.map(worker, iterables)
        # Leaving the with block waits for every request, so the session can be closed.
        session.close()
        return list(results)

    def simple_request(session, url) -> Response:
        return session.get(url)

    # list_of_urls is a placeholder for your own list of URLs.
    response_list = thread_pool_execute(list_of_urls, simple_request)
I have tested sitemaps with 200k URLs using it with pool_size=150 without any problems. It is limited only by the target host's configuration.
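A likely reason pool_maxsize matters here: requests' HTTPAdapter defaults to pool_maxsize=10 with pool_block=False, so when more than 10 threads hold connections at once, the extra connections are still opened, but they are closed rather than kept when they are handed back to an already full pool. The sketch below is only a toy model of that behaviour built on the standard library. ToyConnectionPool, run_round and simulate are hypothetical names made up for illustration, not requests or urllib3 APIs, and the barrier that forces every thread to hold a connection at the same moment is a simplifying assumption.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyConnectionPool:
    """Toy model of a non-blocking pool that keeps at most `maxsize` idle connections."""

    def __init__(self, maxsize):
        self._idle = queue.LifoQueue(maxsize)
        self._lock = threading.Lock()
        self.created = 0    # "connections" opened so far
        self.discarded = 0  # "connections" dropped because the pool was full

    def get(self):
        try:
            return self._idle.get(block=False)  # reuse an idle connection
        except queue.Empty:
            with self._lock:
                self.created += 1               # nothing idle: open a new one
                return self.created

    def put(self, conn):
        try:
            self._idle.put(conn, block=False)   # keep it for later reuse
        except queue.Full:
            with self._lock:
                self.discarded += 1             # pool full: drop the connection


def run_round(pool, workers):
    """Let `workers` threads each hold one connection at the same time."""
    barrier = threading.Barrier(workers)

    def task(_):
        conn = pool.get()
        barrier.wait()  # assumption: all threads are mid-request simultaneously
        pool.put(conn)

    with ThreadPoolExecutor(workers) as executor:
        list(executor.map(task, range(workers)))


def simulate(maxsize, workers, rounds=3):
    """Return (created, discarded) after several rounds of concurrent requests."""
    pool = ToyConnectionPool(maxsize)
    for _ in range(rounds):
        run_round(pool, workers)
    return pool.created, pool.discarded


if __name__ == "__main__":
    print(simulate(maxsize=10, workers=20))  # pool smaller than the thread count
    print(simulate(maxsize=10, workers=5))   # fewer threads than pool slots
    print(simulate(maxsize=20, workers=20))  # pool sized to the thread count
```

In this model, 20 threads against a pool of 10 discard 10 connections every round and have to open 10 new ones the next time, while 5 threads, or a pool sized to 20, discard none. That is consistent with the recycling described in the question and with why raising pool_maxsize to the thread count helps.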
Source: https://stackoverflow.com/questions/65365783/how-do-connections-recycle-in-a-multiprocess-pool-serving-requests-from-a-single