Django multiprocessing and database connections

Asked by 挽巷 · 2020-11-28 03:16 · 9 answers · 819 views

Background:

I'm working on a project which uses Django with a Postgres database. We're also using mod_wsgi, in case that matters, since some of my web searches have made mention of it.

9 Answers
  • 2020-11-28 03:46

    If you're also using connection pooling, the following worked for us: forcibly closing the connections after being forked. Closing them before the fork did not seem to help.

    from django.db import connections
    from django.db.utils import DEFAULT_DB_ALIAS
    
    connections[DEFAULT_DB_ALIAS].dispose()
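
    Note that dispose() here comes from the connection-pooling layer (a SQLAlchemy-style engine or pool), not from Django itself; a plain Django connection object only exposes close().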
    
  • 2020-11-28 03:48

    You could give more resources to Postgres. On Debian/Ubuntu you can edit:

    nano /etc/postgresql/9.4/main/postgresql.conf

    replacing 9.4 with your Postgres version.

    Here are some useful settings, shown with example values; the names speak for themselves:

    max_connections = 100
    shared_buffers = 3000MB
    temp_buffers = 800MB
    effective_io_concurrency = 300
    max_worker_processes = 80

    Be careful not to boost these parameters too much, as that might lead to errors with Postgres trying to take more resources than are available. The examples above run fine on a Debian machine with 8 GB of RAM and 4 cores.
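
    After editing, restart Postgres so the changes take effect (on Debian/Ubuntu, for example, with sudo systemctl restart postgresql); max_connections and shared_buffers in particular cannot be changed by a mere reload.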

  • 2020-11-28 03:51

    When using multiple databases, you should close all connections.

    from django import db
    for connection_name in db.connections.databases:
        db.connections[connection_name].close()
    

    EDIT

    Please use close_all(), as @lechup mentioned, to close all connections (I'm not sure in which Django version this method was added):

    from django import db
    db.connections.close_all()
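
    For instance, here is a minimal sketch of that pattern together with multiprocessing (the Article model is a hypothetical stand-in):

    from multiprocessing import Pool
    from django import db
    from myapp.models import Article  # hypothetical model

    def fetch_title(article_id):
        # each worker process opens its own connection on first query
        return Article.objects.get(pk=article_id).title

    db.connections.close_all()  # drop inherited connections before forking
    with Pool(processes=4) as pool:
        titles = pool.map(fetch_title, [1, 2, 3, 4])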
    
  • 2020-11-28 03:51

    If all you need is I/O parallelism and not processing parallelism, you can avoid this problem by switching your processes to threads. Replace

    from multiprocessing import Process
    

    with

    from threading import Thread
    

    The Thread object has the same interface as Process.
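
    For instance, a minimal sketch of the threaded variant (MyModel and handle_row are hypothetical names):

    from threading import Thread
    from myapp.models import MyModel  # hypothetical model

    def worker(ids):
        # Django gives each thread its own database connection automatically
        for obj in MyModel.objects.filter(id__in=ids):
            handle_row(obj)  # hypothetical per-object work

    threads = [Thread(target=worker, args=(chunk,)) for chunk in ([1, 2], [3, 4])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()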

  • 2020-11-28 03:56

    (not a great solution, but a possible workaround)

    If you can't use Celery, you could implement your own queueing system: basically, add tasks to a task table and have a regular cron job pick them up and process them via a management command, as sketched below.
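
    A minimal sketch of such a management command (the Task model, its status field, and its run() method are hypothetical):

    # myapp/management/commands/process_tasks.py
    from django.core.management.base import BaseCommand
    from myapp.models import Task  # hypothetical task-table model

    class Command(BaseCommand):
        help = "Process pending tasks; intended to be run from cron."

        def handle(self, *args, **options):
            for task in Task.objects.filter(status="pending"):
                try:
                    task.run()  # hypothetical: execute the stored job
                    task.status = "done"
                except Exception:
                    task.status = "failed"
                task.save()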

  • 2020-11-28 04:01

    Multiprocessing copies connection objects between processes because it forks processes, and therefore copies all the file descriptors of the parent process. That being said, a connection to the SQL server is just a file descriptor; you can see it on Linux under /proc/<pid>/fd/. Any open file descriptor will be shared between forked processes.

    My solution was to simply close the DB connections just before launching the processes; each process then recreates a connection itself when it needs one (tested in Django 1.4):

    from multiprocessing import Process
    from django import db

    def db_worker():
        some_parallel_code()  # placeholder for the work done in the child

    # close inherited connections right before forking;
    # each child re-opens its own connection when it first needs one
    db.connections.close_all()
    Process(target=db_worker, args=()).start()
    

    Pgbouncer/pgpool is not related to threads in the multiprocessing sense. It is rather a solution for not closing the connection on each request, i.e. for speeding up connecting to Postgres under high load.

    Update:

    To completely remove problems with database connections, simply move all the database logic into db_worker. I wanted to pass a QuerySet as an argument... A better idea is to simply pass a list of ids: see values_list('id', flat=True), and do not forget to turn it into a list (list(qs)) before passing it to db_worker. That way we do not copy the model's database connection.

    from multiprocessing import Process
    from django import db

    def db_worker(model_ids):
        # the worker class internally does Model.objects.filter(id__in=model_ids)
        obj = PartModelWorkerClass(model_ids)
        obj.run()


    model_ids = list(Model.objects.values_list('id', flat=True))  # cast to list
    process_count = 5
    delta = (len(model_ids) // process_count) + 1  # chunk size per process

    # do all the db stuff here ...

    # close db connections right before forking
    db.connections.close_all()

    for it in range(process_count):
        chunk = model_ids[it * delta:(it + 1) * delta]
        Process(target=db_worker, args=(chunk,)).start()
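
    Passing plain ids instead of a lazy QuerySet also means that no ORM object, and hence no reference to the parent's connection, crosses the fork boundary: each child builds its own queryset from the ids and, because the connections were closed before forking, opens its own fresh connection.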
    