Question
Using Luigi, I want to define a workflow with two "stages":
- The first one reads data from PostgreSQL.
- The second one does something with the data.
So I started by subclassing luigi.contrib.postgres.PostgresQuery
and overriding host, database, user, etc., as stated in the docs.
After that, how do I pass the query result to the next task in the workflow? That next task already declares, in its requires
method, that the above class must be instantiated and returned.
My code:
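import luigi
import luigi.contrib.postgres


class MyData(luigi.contrib.postgres.PostgresQuery):
    host = 'my_host'
    database = 'my_db'
    user = 'my_user'
    password = 'my_pass'
    table = 'my_table'
    # A bare 'select *' is not valid in PostgreSQL; a FROM clause is needed.
    query = 'select * from my_table'


class DoWhateverWithMyData(luigi.Task):
    def requires(self):
        return MyData()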
What else is needed?
Thanks in advance!
EDIT 1
Looking at Luigi's code, it seems that the run method of PostgresQuery
does nothing with the result of the query; the query is executed and that's all:
class PostgresQuery(rdbms.Query):
    """
    Template task for querying a Postgres compatible database

    Usage:
    Subclass and override the required `host`, `database`, `user`, `password`, `table`, and `query` attributes.
    Optionally one can override the `autocommit` attribute to put the connection for the query in autocommit mode.

    Override the `run` method if your use case requires some action with the query result.

    Task instances require a dynamic `update_id`, e.g. via parameter(s), otherwise the query will only execute once

    To customize the query signature as recorded in the database marker table, override the `update_id` property.
    """

    def run(self):
        connection = self.output().connect()
        connection.autocommit = self.autocommit
        cursor = connection.cursor()
        sql = self.query

        logger.info('Executing query from task: {name}'.format(name=self.__class__))
        cursor.execute(sql)

        # Update marker table
        self.output().touch(connection)

        # commit and close connection
        connection.commit()
        connection.close()

    def output(self):
        """
        Returns a PostgresTarget representing the executed query.

        Normally you don't override this.
        """
        return PostgresTarget(
            host=self.host,
            database=self.database,
            user=self.user,
            password=self.password,
            table=self.table,
            update_id=self.update_id
        )
I think I'll have to extend this class and override run with my own implementation.
EDIT 2
I found this link, which explains the same thing as my edit above.
Source: https://stackoverflow.com/questions/53869425/using-luigi-how-to-read-postgresql-data-and-then-pass-such-data-to-the-next-tas