How do I run a long-running job in the background in Python?

情话喂你 2020-12-02 23:56

I have a web-service that runs long-running jobs (in the order of several hours). I am developing this using Flask, Gunicorn, and nginx.

What I am thinking of doing

4 Answers
  • 2020-12-03 00:03

    Celery and RQ are overkill for a simple task. Take a look at the concurrent.futures docs: https://docs.python.org/3/library/concurrent.futures.html

    Also see this example of running long-running jobs in the background of a Flask app: https://stackoverflow.com/a/39008301/5569578
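
    As a rough illustration of the concurrent.futures route, here is a minimal sketch that submits a job to a thread pool and lets the caller poll for completion. The names start_job, job_status, long_running_job and the in-memory jobs dict are hypothetical, chosen only for this example; in a Flask app they would be called from view functions.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# One shared pool; max_workers caps how many jobs run at the same time.
executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job_id -> Future (hypothetical in-memory registry)


def long_running_job(seconds):
    """Stand-in for the real multi-hour work."""
    time.sleep(seconds)
    return f"slept {seconds}s"


def start_job(seconds):
    """Submit the job and return immediately with an ID the client can poll."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(long_running_job, seconds)
    return job_id


def job_status(job_id):
    """Report 'pending' (queued or running), 'done', or 'failed'."""
    future = jobs[job_id]
    if not future.done():
        return "pending"
    return "failed" if future.exception() else "done"
```

    Note that under Gunicorn each worker process gets its own pool and its own jobs dict, and anything still running is lost when that process restarts, so this only suits jobs you can afford to lose.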

  • 2020-12-03 00:04

    The more common approach to this kind of problem is to extract the work from the main application and run it outside, using a task queue such as Celery.

    Using this tutorial you can create your task and trigger it from your web application.

    from flask import Flask
    
    app = Flask(__name__)
    app.config.update(
        CELERY_BROKER_URL='redis://localhost:6379',
        CELERY_RESULT_BACKEND='redis://localhost:6379'
    )
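    # make_celery is not defined in this snippet: it is the factory helper
    # from the tutorial, not something imported from Flask or Celery.
    celery = make_celery(app)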
    
    
    @celery.task()
    def add_together(a, b):
        return a + b
    

    Then you can run:

    >>> result = add_together.delay(23, 42)
    >>> result.wait()
    65
    

    Just remember that you need to run the worker separately:

    celery -A your_application worker
    
  • 2020-12-03 00:18

    Your approach is fine and will work, but why reinvent the background worker for Python web applications when a widely accepted solution, Celery, already exists?

    I'd need to see a lot of tests before I trusted any home-rolled code for such an important task.

    Celery also gives you features like task persistence and the ability to distribute workers across multiple machines.

  • 2020-12-03 00:20

    Although your approach is not incorrect, it may cause your program to run out of available threads. As Ali mentioned, the general approach is to use a job queue like RQ or Celery. However, you don't need to extract your functions to use those libraries. For Flask, I recommend Flask-RQ. It's simple to get started:

    RQ

    pip install flask-rq
    

    Just remember to install Redis before using it in your Flask app.

    Then simply use the @job decorator on the functions you want to run in the background:

    from flask_rq import job  # the flask.ext.* import path was removed in Flask 1.0
    
    
    @job
    def process(i):
        # Long stuff to process
        pass
    
    
    process.delay(3)
    

    Finally, run rqworker to start the worker:

    rqworker

    See the RQ docs for more info. RQ is designed for simple long-running processes.
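
    To make the queue-and-worker split concrete without installing Redis, here is a stdlib-only sketch of the same pattern: the web side enqueues a job and returns, and a separate worker loop picks jobs up one at a time. This is not RQ's API; enqueue, worker, job_queue and results are hypothetical names, and an in-process queue.Queue stands in for the Redis-backed queue.

```python
import queue
import threading

job_queue = queue.Queue()  # stands in for the Redis-backed queue
results = {}               # job_id -> return value (or the exception raised)


def worker():
    """Pull jobs off the queue one at a time until a None sentinel arrives."""
    while True:
        item = job_queue.get()
        if item is None:
            job_queue.task_done()
            break
        job_id, func, args = item
        try:
            results[job_id] = func(*args)
        except Exception as exc:  # keep the worker alive if a job fails
            results[job_id] = exc
        finally:
            job_queue.task_done()


def enqueue(job_id, func, *args):
    """Producer side: put the job on the queue and return without waiting."""
    job_queue.put((job_id, func, args))


def process(i):
    return i * 2  # placeholder for the long-running work


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

    Because a single worker drains the queue, the number of threads stays fixed no matter how many jobs are submitted. RQ applies the same idea across processes, with Redis holding the queue so that jobs survive a restart of the web app.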

    Celery

    Celery is more complicated and has a huge list of features. I would not recommend it if you are new to job queues and distributed processing.

    Greenlets

    Greenlets are switched explicitly: each greenlet runs until it calls switch() to hand control to another one, which lets you interleave long-running functions. The benefit is that you don't need Redis or a separate worker; the cost is that you have to redesign your functions to be compatible:

    from greenlet import greenlet
    
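    def test1():
        print(12)
        gr2.switch()  # hand control to test2
        print(34)
    
    def test2():
        print(56)
        gr1.switch()  # hand control back to test1
        print(78)     # never reached: test1 finishes first
    
    gr1 = greenlet(test1)
    gr2 = greenlet(test2)
    gr1.switch()  # prints 12, 56, 34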
    