Question
I want to use Celery for a URL grabber.
I have a list of URLs, and I must make an HTTP request for every URL and write the results to a file (the same file for the whole list).
My first idea was to put this code in the task that Celery beat calls every n minutes:
@app.task(bind=True)  # bind=True is needed because the task takes `self`
def get_urls(self):
    results = [get_url_content.si(url=url) for url in urls]
    ch = chain(
        group(*results),
        write_result_on_disk.s()
    )
    return ch()
This code works pretty well, but there is one problem: I have a thousand URLs to grab, and if one of the get_url_content tasks fails, write_result_on_disk is not called and we lose all the previously grabbed contents.
What I want to do is chunk the tasks by splitting the URLs, grab their results, and write them to disk. For example, the contents of 20 URLs are written to disk at a time.
Do you have an idea, please? I tried the chunks() function but did not get really useful results.
Answer 1:
Using CeleryBeat for cron-like tasks is a good idea.
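I would try to catch exceptions in your get_url_content micro-tasks. Instead of letting the exception propagate, return a placeholder value (for example, None or an error record) when you catch one. This way, a single failure no longer prevents the final task from running, and you can evaluate the failures (e.g. count, list, inspect them) in a summarize task.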
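As a minimal sketch of that idea, the function below returns a record instead of raising. It is plain Python rather than a Celery task so that it runs on its own; in a real project the body of `safe_fetch` would live inside the `get_url_content` task. The names `safe_fetch`, `split_by_outcome`, `fake_fetch`, and the record keys are assumptions made for illustration, not part of the original code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def safe_fetch(url: str, fetch: Callable[[str], str]) -> Record:
    """Fetch one URL and return a record; never raise to the caller.

    `fetch` is a hypothetical stand-in for the real HTTP call.
    """
    try:
        return {"url": url, "content": fetch(url), "error": None}
    except Exception as exc:  # broad on purpose: one bad URL must not abort the batch
        return {"url": url, "content": None, "error": repr(exc)}


def split_by_outcome(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate successful records from failed ones for later inspection."""
    succeeded = [r for r in records if r["error"] is None]
    failed = [r for r in records if r["error"] is not None]
    return succeeded, failed


def fake_fetch(url: str) -> str:
    """Hypothetical fetcher used only for this demonstration."""
    if "bad" in url:
        raise ValueError("simulated failure")
    return "content of " + url


records = [safe_fetch(u, fake_fetch) for u in ["http://a.example", "http://bad.example"]]
succeeded, failed = split_by_outcome(records)
```

Catching `Exception` broadly is a deliberate simplification here; in practice you would probably narrow it to the errors your HTTP client raises.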
How to use chunks and chain chunks with another task:
Step 1: Convert the chunks to a group:
As described in http://docs.celeryproject.org/en/latest/userguide/canvas.html#chunks, .group() transforms an object of type celery.canvas.chunks into a group, which is a much more common type in Celery.
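To make the data shape concrete, here is a small Celery-free sketch of how I understand chunking to slice its input: the iterable holds one tuple of positional arguments per call, and it is cut into consecutive parts of at most n items. The helper `split_into_chunks` and the example URLs are hypothetical; this mimics the slicing only and is not Celery's implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def split_into_chunks(items: Iterable[Tuple], size: int) -> Iterator[List[Tuple]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        part = list(islice(iterator, size))
        if not part:
            return
        yield part


urls = ["http://a.example", "http://b.example", "http://c.example"]  # hypothetical URLs
argument_tuples = [(url,) for url in urls]  # one tuple of positional arguments per call
parts = list(split_into_chunks(argument_tuples, 2))
```

Each part corresponds to one task in the resulting group, and that task returns a list with one result per argument tuple.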
Step 2: Chain a group and a task
The "Blow your mind by combining" section in http://docs.celeryproject.org/en/latest/userguide/canvas.html#the-primitives mentions:
Chaining a group together with another task will automatically upgrade it to be a chord
Here is some code with the two tasks and how I usually call them:
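from typing import List

from celery import chain

@app.task
def solve_micro_task(arg: str) -> dict:
    ...

@app.task
def summarize(items: List[List[dict]]):
    # Each chunk returns a list of results, so flatten the list of lists.
    flat_list = [item for sublist in items for item in sublist]
    for report in flat_list:
        ...

# chunks() expects an iterable of argument tuples, e.g. [(arg,), ...]
chunk_tasks = solve_micro_task.chunks(<your iterator, e.g., a list>, 10)  # type: celery.canvas.chunks
summarize_task = summarize.s()
chain(chunk_tasks.group(), summarize_task)()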
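The body of the summarizing step might look roughly like the sketch below, which flattens the nested results and appends the successful ones to a single file. It assumes each micro-task returns a dict with a hypothetical `error` key that is None on success; `flatten`, `append_records`, the JSON Lines format, and the sample data are all illustrative choices, not something the original code prescribes.

```python
import json
import os
import tempfile
from typing import Dict, List


def flatten(nested: List[List[Dict]]) -> List[Dict]:
    """Turn the list of per-chunk result lists into one flat list."""
    return [item for sublist in nested for item in sublist]


def append_records(path: str, records: List[Dict]) -> int:
    """Append successful records to `path` as JSON lines; return how many were written."""
    written = 0
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            if record.get("error") is None:  # skip records that carry an error
                handle.write(json.dumps(record) + "\n")
                written += 1
    return written


# Hypothetical nested results, shaped like the output of two chunks.
nested_results = [
    [{"url": "http://a.example", "content": "A", "error": None}],
    [{"url": "http://b.example", "content": None, "error": "timeout"}],
]
output_path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
written_count = append_records(output_path, flatten(nested_results))
```

Opening the file in append mode means repeated runs add to the same file, which matches the question's requirement of one file for the whole list.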
Source: https://stackoverflow.com/questions/45082707/combining-chains-groups-and-chunks-with-celery