django/celery: Best practices to run tasks on 150k Django objects?

南方客 2021-02-03 12:43

I have to run tasks on approximately 150k Django objects. What is the best way to do this? I am using the Django ORM as the broker. The database backend is MySQL and chokes and dies during the task.delay() of all the tasks.

3 Answers
  •  -上瘾入骨i
    2021-02-03 13:03

    I would also consider using something other than the database as the "broker". It really isn't suitable for this kind of workload.
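
    For example, switching the broker to RabbitMQ (or Redis) is usually just a settings change. A minimal sketch, assuming a django-celery / Celery 2.x-3.x style setup (the connection URLs are illustrative; newer Celery versions set broker_url on the app object instead):

    # settings.py -- point Celery at RabbitMQ instead of the Django ORM
    BROKER_URL = "amqp://guest:guest@localhost:5672//"
    
    # ...or at Redis (requires the redis client library):
    # BROKER_URL = "redis://localhost:6379/0"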

    Though, you can move some of this overhead out of the request/response cycle by launching a task to create the other tasks:

    from celery.task import TaskSet, task
    
    from myapp.models import MyModel
    
    @task
    def process_object(pk):
        obj = MyModel.objects.get(pk=pk)  # get() needs a keyword lookup
        # do something with obj
    
    @task
    def process_lots_of_items(ids_to_process):
        # One subtask per id, dispatched as a single TaskSet, so the
        # expensive fan-out runs in a worker instead of the web request.
        return TaskSet(process_object.subtask((id, ))
                           for id in ids_to_process).apply_async()
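
    From the web request you then make a single process_lots_of_items.delay(...) call, and the fan-out into individual tasks happens inside a worker. A small usage sketch (the view code and the values_list query are illustrative, not part of the original answer):

    # e.g. in a view -- one .delay() call per request
    ids = list(MyModel.objects.values_list("pk", flat=True))
    process_lots_of_items.delay(ids)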
    

    Also, since you probably don't have 150,000 processors to process all of these objects in parallel, you could split the objects into chunks of, say, 100 or 1000:

    from itertools import islice
    from celery.task import TaskSet, task
    from myapp.models import MyModel
    
    def chunks(it, n):
        # Yield successive lists of at most n items from the iterator.
        for first in it:
            yield [first] + list(islice(it, n - 1))
    
    @task
    def process_chunk(pks):
        # One query per chunk instead of one query per object.
        objs = MyModel.objects.filter(pk__in=pks)
        for obj in objs:
            # do something with obj
            pass
    
    @task
    def process_lots_of_items(ids_to_process):
        return TaskSet(process_chunk.subtask((chunk, ))
                           for chunk in chunks(iter(ids_to_process),
                                               1000)).apply_async()
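
    As a side note, TaskSet was later deprecated in favor of group; a rough equivalent of the chunked dispatch, assuming Celery 3.x or newer and the same chunks/process_chunk definitions as above:

    from celery import group
    
    @task
    def process_lots_of_items(ids_to_process):
        # group replaces TaskSet; each chunk still becomes one process_chunk task
        return group(process_chunk.s(chunk)
                         for chunk in chunks(iter(ids_to_process),
                                             1000)).apply_async()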
    
