Currently, I have a bunch of luigi tasks queued together, with a simple dependency chain (`a -> b -> c -> d`). `d` gets executed first, and
First, a comment: Luigi tasks are idempotent. If you run a task with the same parameter values, it must always produce the same outputs, no matter how many times you run it, so it doesn't make sense to run it more than once. This is what makes Luigi powerful: if you have one big task that does a lot of things and takes a long time, and it fails somewhere, you'll have to run it again from the beginning. If you split it into smaller tasks and the run fails, you'll only have to run the remaining tasks in the pipeline.
When you run a task, Luigi checks the outputs of that task to see if they exist. If they don't, Luigi checks the outputs of the tasks it depends on. If those exist, it will only run the current task and generate its output `Target`. If the dependencies' outputs don't exist, it will run those tasks first.
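The decision described above can be sketched in plain Python. This is a simplified model, not Luigi's actual scheduler code: `tasks_to_run`, the `requires` dict and the `done` set are hypothetical names, and I'm assuming the arrows in `a -> b -> c -> d` mean "depends on" (which matches `d` running first).

```python
def tasks_to_run(task, requires, done):
    """Return the tasks that would run, in execution order, to build `task`.

    requires: dict mapping a task name to the names it depends on (assumed shape).
    done:     set of task names whose outputs already exist.
    """
    order = []

    def visit(name):
        # A task whose output exists is skipped, and so is everything upstream of it.
        if name in done or name in order:
            return
        for dep in requires.get(name, []):
            visit(dep)
        order.append(name)  # dependencies first, then the task itself

    visit(task)
    return order


# Hypothetical chain: a depends on b, b on c, c on d.
requires = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}

print(tasks_to_run("a", requires, done=set()))       # ['d', 'c', 'b', 'a']
print(tasks_to_run("a", requires, done={"c", "d"}))  # ['b', 'a']
print(tasks_to_run("a", requires, done={"a"}))       # []
```

Note the last case: once the output of `a` exists, nothing upstream is even looked at, which is why rerunning requires removing outputs.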
So, if you want to rerun a task, you must delete its `Target` outputs. And if you want to rerun the whole pipeline, you must delete, in cascade, the outputs of all the tasks that the task depends on.
There's an ongoing discussion in this issue in the Luigi repository. Take a look at this comment, since it points to some scripts for getting the output targets of a given task and removing them.