Using Luigi, how to read PostgreSQL data and then pass such data to the next task in the workflow?

无人久伴 submitted on 2019-12-10 15:42:16

Question


Using Luigi, I want to define a workflow with two "stages":

  • The first one reads data from PostgreSQL.
  • The second one does something with the data.

So I started by subclassing luigi.contrib.postgres.PostgresQuery and overriding host, database, user, etc., as described in the docs.

After that, how do I pass the query result to the next task in the workflow? That next task's requires method already instantiates and returns the class above.

My code:

import luigi
import luigi.contrib.postgres


class MyData(luigi.contrib.postgres.PostgresQuery):

    host = 'my_host'
    database = 'my_db'
    user = 'my_user'
    password = 'my_pass'
    table = 'my_table'
    query = 'SELECT * FROM my_table'  # a bare "select *" has no FROM clause

class DoWhateverWithMyData(luigi.Task):

    def requires(self):
        return MyData()

What else is needed?

Thanks in advance!

EDIT 1

Looking at Luigi's code, it seems the run method of PostgresQuery does nothing with the query result; the query is executed and that's all:

class PostgresQuery(rdbms.Query):
    """
    Template task for querying a Postgres compatible database

    Usage:
    Subclass and override the required `host`, `database`, `user`, `password`, `table`, and `query` attributes.
    Optionally one can override the `autocommit` attribute to put the connection for the query in autocommit mode.

    Override the `run` method if your use case requires some action with the query result.

    Task instances require a dynamic `update_id`, e.g. via parameter(s), otherwise the query will only execute once

    To customize the query signature as recorded in the database marker table, override the `update_id` property.
    """

    def run(self):
        connection = self.output().connect()
        connection.autocommit = self.autocommit
        cursor = connection.cursor()
        sql = self.query

        logger.info('Executing query from task: {name}'.format(name=self.__class__))
        cursor.execute(sql)

        # Update marker table
        self.output().touch(connection)

        # commit and close connection
        connection.commit()
        connection.close()


    def output(self):
        """
        Returns a PostgresTarget representing the executed query.

        Normally you don't override this.
        """
        return PostgresTarget(
            host=self.host,
            database=self.database,
            user=self.user,
            password=self.password,
            table=self.table,
            update_id=self.update_id
        )

I think I'll have to extend this class with my own implementation.

EDIT 2

I found this link, which explains the same thing as my edit above.

Source: https://stackoverflow.com/questions/53869425/using-luigi-how-to-read-postgresql-data-and-then-pass-such-data-to-the-next-tas
