Question
Consider a situation where a task depends on another through a dynamic dependency:
import luigi
from luigi import Task, TaskParameter, IntParameter

class TaskA(Task):
    parent = TaskParameter()
    arg = IntParameter(default=0)

    def requires(self):
        return self.parent()

    def run(self):
        print(f"task A arg = {self.arg}")

class TaskB(Task):
    arg = IntParameter(default=0)

    def run(self):
        print(f"task B arg = {self.arg}")

if __name__ == "__main__":
    luigi.run(["TaskA", "--parent", "TaskB", "--arg", "1", "--TaskB-arg", "2"])
(Notice the default arg=0 parameter.)
Using the luigi.run() interface, this works. As you can see, TaskA is given two arguments: parent=TaskB and arg=1. Furthermore, TaskB is also given the argument arg=2 by using the syntax --TaskB-arg.
Scheduled 2 tasks of which:
* 1 ran successfully:
- 1 TaskB(arg=2)
* 1 failed:
- 1 TaskA(parent=TaskB, arg=1)
This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====
(In this example TaskA fails because TaskB does not write its output to a file that TaskA can read, but that's just to keep the example short. The important point is that both TaskA and TaskB are passed the correct arg.)
My problem now is: how do I do the exact same thing using the luigi.build() interface? There are two reasons why I want to do this. First, the source code says that luigi.run() shouldn't be used. Second, I can't call luigi.run() more than once per process, but I can do so with luigi.build(). This is important because I want to do something like:
if __name__ == "__main__":
    for i in range(3):
        luigi.run(["TaskA", "--parent", "TaskB", "--arg", f"{i}", "--TaskB-arg", f"{i}"])
However if you try this you get the error:
Pid(s) {10084} already running
So, with the luigi.build() interface you're supposed to pass a list of tasks instantiated with their parameters:
if __name__ == "__main__":
    for i in range(3):
        luigi.build([TaskA(parent=TaskB, arg=i)])
This does what's expected with regard to TaskA, but TaskB takes the default arg=0.
So the question is: how do I pass arguments to dependencies using the luigi.build() interface?
Here are the things I've tried that don't work:
A)
if __name__ == "__main__":
    for i in range(3):
        luigi.build([TaskA(parent=TaskB, arg=i), TaskB(arg=i)])
This doesn't work because two instances of TaskB are run: one with the default (wrong) arg, which TaskA depends on, and one with the correct arg, which TaskA doesn't depend on.
B)
if __name__ == "__main__":
    for i in range(3):
        luigi.build([TaskA(parent=TaskB(arg=i), arg=i)])
TypeError: 'TaskB' object is not callable
C)
if __name__ == "__main__":
    for i in range(3):
        luigi.build([TaskA(parent=TaskB, arg=i)], "--TaskB-arg", f"{i}")
Getting desperate, I tried something like the old interface, but it doesn't work either:
AttributeError: 'str' object has no attribute 'create_remote_scheduler'
Answer 1:
I believe that your problem is that you are passing the parent as a class and not a Task object. Try to pass it like this:
luigi.build([TaskA(parent=TaskB(arg=i), ...)])
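Edit: You may then need to modify TaskA, because you have
    def requires(self):
        return self.parent()
which constructs the parent as a TaskB object with default params. Once parent is already a task instance, requires() should return self.parent directly rather than calling it.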
Edit 2: This design is actually not encouraged. If you are running with multiple workers, the task instance will not pickle and unpickle correctly. I would recommend creating a new ParameterizedTaskParameter (or some better name) that pickles a task instance and stores it the way an object parameter does.
Source: https://stackoverflow.com/questions/64837259/luigi-how-to-pass-arguments-to-dependencies-using-luigi-build-interface