Django multiprocessing and database connections

Background:

I'm working on a project which uses Django with a Postgres database. We're also using mod_wsgi in case that matters, since some of my web searches have m

9 answers

醉酒成梦

2020-11-28 03:46
If you're also using connection pooling, the following worked for us: forcibly closing the connections in the child process after it has been forked. Closing them before the fork did not seem to help.
```
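from django.db import connections
from django.db.utils import DEFAULT_DB_ALIAS

# Run this in the child process, after the fork.
# dispose() appears to come from the pooling database backend rather than
# stock Django's connection wrapper, so check that your backend provides it.
connections[DEFAULT_DB_ALIAS].dispose()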
```
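One way to make "close after fork" automatic is the standard library's os.register_at_fork hook. The sketch below is only an illustration under assumptions: FakeConnection is a hypothetical stand-in for a pooled connection, and in a real project the callback would call whatever close or dispose method your backend offers. It needs a platform that supports os.fork (Linux, macOS).

```python
import os


class FakeConnection:
    """Hypothetical stand-in for a pooled database connection."""

    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


conn = FakeConnection()

# Run conn.close() in every child right after fork(); the parent's copy is untouched.
os.register_at_fork(after_in_child=conn.close)


def child_sees_open_connection():
    """Fork once and report whether the child's copy of conn is still open."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the state of its own copy of conn, then exit immediately.
        os.close(read_fd)
        os.write(write_fd, b"1" if conn.open else b"0")
        os._exit(0)
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"
```

The hook stays registered for the lifetime of the process, so every later fork gets a child with a closed copy, which then has to open a fresh connection of its own.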
情话喂你

2020-11-28 03:48
You could give more resources to Postgres. On Debian/Ubuntu you can edit:
```
nano /etc/postgresql/9.4/main/postgresql.conf
```
replacing 9.4 with your Postgres version.

Here are some useful settings to update, shown with example values; the names speak for themselves:
```
max_connections = 100
shared_buffers = 3000MB
temp_buffers = 800MB
effective_io_concurrency = 300
max_worker_processes = 80
```
Be careful not to raise these parameters too much, as Postgres may then try to take more resources than are available. The examples above run fine on a Debian machine with 8 GB of RAM and 4 cores.
醉话见心

2020-11-28 03:51
When using multiple databases, you should close all connections.
```
from django import db
for connection_name in db.connections.databases:
    db.connections[connection_name].close()
```
EDIT

Please use the same approach @lechup mentioned to close all connections (not sure in which Django version this method was added):
```
from django import db
db.connections.close_all()
```
爱一瞬间的悲伤

2020-11-28 03:51
If all you need is I/O parallelism and not processing parallelism, you can avoid this problem by switching your processes to threads. Replace
```
from multiprocessing import Process
```
with
```
from threading import Thread
```
The Thread object has largely the same interface as Process (target, args, start(), join()).
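A minimal sketch of the thread-based version, using only the standard library. The fetch function is a hypothetical placeholder for an I/O-bound task; in a Django project it would run a query instead. Django keeps database connections per thread, so each worker thread opens its own connection rather than sharing an inherited one.

```python
from threading import Thread


def fetch(item, results):
    """Hypothetical I/O-bound task; a real one would hit the database or network."""
    results[item] = item * 2


def run_in_threads(items):
    """Start one thread per item, wait for all of them, and return the results."""
    results = {}
    workers = [Thread(target=fetch, args=(item, results)) for item in items]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Calling run_in_threads([1, 2, 3]) returns a dict mapping each item to its result. Because threads share memory, the results dict can be filled in place, which is not possible with separate processes.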
故里飘歌

2020-11-28 03:56

(not a great solution, but a possible workaround)

If you can't use Celery, maybe you could implement your own queueing system: add tasks to a task table and have a regular cron job (via a management command) that picks them off and processes them.
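A rough sketch of that idea, using sqlite3 from the standard library as a stand-in for a Django model and management command. The table name, column names, and function names are all made up for illustration; in a real project the table would be a model and process_pending would be the body of the command that cron runs.

```python
import sqlite3


def make_queue(conn):
    """Create the hypothetical task table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task ("
        "id INTEGER PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def enqueue(conn, payload):
    """Called from the web request: record the work instead of doing it."""
    conn.execute("INSERT INTO task (payload) VALUES (?)", (payload,))
    conn.commit()


def process_pending(conn, handler):
    """Called from cron: handle each pending task once and mark it done."""
    rows = conn.execute(
        "SELECT id, payload FROM task WHERE done = 0 ORDER BY id"
    ).fetchall()
    for task_id, payload in rows:
        handler(payload)
        conn.execute("UPDATE task SET done = 1 WHERE id = ?", (task_id,))
        conn.commit()
    return len(rows)
```

Since the cron-driven command is a separate process with its own connection, no connection is ever shared with the web workers. This sketch assumes a single consumer; running several at once would need row locking or a claim step.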

既然无缘

2020-11-28 04:01
Multiprocessing copies connection objects between processes because it forks processes, and therefore copies all the file descriptors of the parent process. A connection to the SQL server is just a file; you can see it on Linux under /proc/<pid>/fd/.... Any open file will be shared between forked processes. You can find more about forking here.
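The sharing can be seen with an ordinary file, as in the small sketch below (standard library only, requires a platform with os.fork). The child reads a few bytes through the inherited descriptor, and the parent then finds the read position already moved. A database socket shared this way interleaves traffic in the same manner, which is why it breaks.

```python
import os
import tempfile


def shared_offset_demo():
    """Fork a child that reads from an inherited descriptor; return what the parent reads next."""
    with tempfile.TemporaryFile() as f:
        f.write(b"hello world")
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)  # rewind before forking
        pid = os.fork()
        if pid == 0:
            # Child: consume 6 bytes ("hello ") through the inherited descriptor.
            os.read(fd, 6)
            os._exit(0)
        os.waitpid(pid, 0)
        # Parent: the offset has moved, because both processes share one open file.
        return os.read(fd, 100)
```

The parent never read the first six bytes itself, yet it only gets the remainder of the file.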

My solution was simply to close the db connections just before launching the processes; each process then recreates its own connection when it needs one (tested in Django 1.4):
```
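from multiprocessing import Process
from django import db

db.connections.close_all()  # close before forking; each child reconnects on demand

def db_worker():
    some_parallel_code()  # placeholder for your own database work

Process(target=db_worker, args=()).start()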
```
PgBouncer/Pgpool is not related to threads in the multiprocessing sense. Rather, it is a way to avoid closing the connection on each request, which speeds up connecting to Postgres under high load.

Update:

To avoid database connection problems entirely, move all database-related logic into db_worker. I originally wanted to pass a QuerySet as an argument, but a better idea is to pass a plain list of ids. See QuerySet and values_list('id', flat=True), and do not forget to turn the result into a list with list(queryset) before passing it to db_worker. That way we do not copy the models' database connection into the workers.
```
from multiprocessing import Process
from django import db

def db_worker(model_ids):
    # PartModelWorkerClass is your own class; inside it you do
    # Model.objects.filter(id__in=model_ids)
    obj = PartModelWorkerClass(model_ids)
    obj.run()


model_ids = Model.objects.all().values_list('id', flat=True)
model_ids = list(model_ids)  # cast to list so no QuerySet reaches the workers
process_count = 5
delta = (len(model_ids) // process_count) + 1  # chunk size (integer division)

# do all the db stuff here ...

# close the db connections here, right before forking
db.connections.close_all()

for it in range(process_count):
    # args must be a tuple, hence the trailing comma
    chunk = model_ids[it * delta:(it + 1) * delta]
    Process(target=db_worker, args=(chunk,)).start()
```
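The slicing arithmetic can be checked on its own without Django. The helper below is a hypothetical extraction of the delta computation from the snippet above, written with integer division. It shows that the chunks together cover every id, and that the last chunk can come out empty, in which case that worker simply has nothing to do.

```python
def split_ids(ids, process_count):
    """Split ids into process_count consecutive slices of at most delta items."""
    delta = len(ids) // process_count + 1
    return [ids[i * delta:(i + 1) * delta] for i in range(process_count)]


chunks = split_ids(list(range(12)), 5)
# With 12 ids and 5 processes, delta is 3, so the fifth slice is empty.
non_empty = [chunk for chunk in chunks if chunk]
```

Skipping the empty slices before starting processes avoids launching workers that would open a connection only to run a query with no ids.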