Question
Let's say I have a very simple task like this:
@celery.task(ignore_result=True)
def print_page(page):
    with open('path/to/page', 'w') as f:
        f.write(page)
(Please ignore the potential race condition in the above code... this is a simplified example)
My question is whether the following two code samples would produce identical results, or if one is better than the other:
Choice A:
@celery.task(ignore_result=True)
def print_pages(page_generator):
    for page in page_generator:
        print_page.s(page).apply_async()
Choice B:
@celery.task(ignore_result=True)
def print_pages(page_generator):
    g = group(print_page.s(page) for page in page_generator)
    g.apply_async()
And in general, I am curious whether the above is the correct way to do what I'm doing. Essentially, I have another task that parses some data and returns a generator which emits all of the pages of a document. For each page, I want to output it separately.
So, my chain looks something like this (also simplified):
chain = fetch.s(url) | parse.s() | print_pages.s()
chain()
I think it would make more sense if I could somehow emit the generator inside that chain and form the group there (outside of an actual task). But I am not sure if that is practical or ideal. I would really appreciate any help. Thanks!
Answer 1:
Your first choice seems like the better one. You have no need to join the results of the fanned-out print_page tasks (given that ignore_result=True), so a group adds unnecessary overhead/complexity. Just invoke the tasks individually as in choice A and you're fine.
One further note, though: Python generators cannot be pickled, so you cannot pass them asynchronously to Celery tasks.
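A minimal sketch of one way around that, assuming you can change the upstream parse task (the list() call and the generate_pages helper below are my own illustrations, not part of the original code): have parse return a concrete list of pages, which serializes fine, and fan the pages out inside print_pages exactly as in choice A.

@celery.task
def parse(document):
    # Materialize the pages into a list so the result can be serialized
    # and passed along the chain; generate_pages() is a hypothetical helper
    # standing in for whatever currently builds the generator.
    return list(generate_pages(document))

@celery.task(ignore_result=True)
def print_pages(pages):
    # pages now arrives as a plain list, so firing one print_page task
    # per page works without any pickling problems.
    for page in pages:
        print_page.s(page).apply_async()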
Answer 2:
Both solutions are correct. In your case there is no dependency between the page tasks. But suppose you had a task divided into sub-tasks, and all of those sub-tasks depended on each other sequentially; in that case you should group them, i.e. choose B.
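For reference, a minimal sketch of what grouping buys you in choice B (the page list here is made-up sample data; since print_page sets ignore_result=True, the main practical benefit is getting a single handle for the whole batch):

from celery import group

pages = ['page one', 'page two']                    # made-up sample data
job = group(print_page.s(page) for page in pages)
result = job.apply_async()                          # one handle for the whole batch
# With ignore_result=True on print_page you can still revoke the batch as a
# unit via result.revoke(); waiting on result.get() would need results enabled.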
Source: https://stackoverflow.com/questions/14614441/in-celery-task-queue-is-running-tasks-in-a-group-any-different-than-multiple-as