Apache Airflow or Apache Beam for data processing and job scheduling

难免孤独 2021-01-30 10:57

I'm trying to give useful information, but I am far from being a data engineer.

I am currently using the Python library pandas to execute a long series of transformations

4 answers
  •  轻奢々 (original poster)
    2021-01-30 11:16

    Apache Airflow is not a data processing engine.

    Airflow is a platform to programmatically author, schedule, and monitor workflows.

    Cloud Dataflow is a fully managed service on Google Cloud that can be used for data processing. You can write your Dataflow code and then use Airflow to schedule and monitor the Dataflow job. Airflow can also retry the job if it fails (the number of retries is configurable), and you can configure it to send an alert by Slack or email when the Dataflow pipeline fails.
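    As a rough illustration of that split, here is a minimal sketch of an Airflow DAG that submits a Beam pipeline to Dataflow once a day, with retries and an email alert on failure. It assumes Airflow 2.4 or later with SMTP already configured. The project ID, region, file path, bucket, email address, DAG ID, and the helper names `dataflow_command` and `build_dag` are all hypothetical placeholders, not anything from the question. It uses a plain `BashOperator` for simplicity; the Google provider package also ships dedicated Dataflow operators.

```python
from datetime import datetime, timedelta

# Hypothetical values for illustration only.
PROJECT_ID = "my-gcp-project"
REGION = "us-central1"
PIPELINE_FILE = "/opt/pipelines/transform.py"

# Settings applied to every task in the DAG.
DEFAULT_ARGS = {
    "retries": 3,                         # retry a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait between attempts
    "email": ["data-team@example.com"],   # hypothetical address
    "email_on_failure": True,             # requires SMTP set up in Airflow
}


def dataflow_command() -> str:
    """Build the shell command that submits the Beam pipeline to Dataflow."""
    return (
        f"python {PIPELINE_FILE} "
        "--runner=DataflowRunner "
        f"--project={PROJECT_ID} "
        f"--region={REGION} "
        f"--temp_location=gs://{PROJECT_ID}-tmp/dataflow"
    )


def build_dag():
    """Create the DAG. Airflow is imported here so the helpers above
    can be used and tested on a machine without Airflow installed."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_dataflow_transform",
        default_args=DEFAULT_ARGS,
        start_date=datetime(2021, 1, 1),
        schedule="@daily",   # Airflow only decides when to run
        catchup=False,       # do not backfill past dates
    ) as dag:
        # The data processing itself happens on Dataflow, not in Airflow.
        BashOperator(
            task_id="run_dataflow_job",
            bash_command=dataflow_command(),
        )
    return dag
```

    In an actual DAG file you would add `dag = build_dag()` at module level so the scheduler can discover it. The point of the sketch is the division of labour: the Beam script does the transformations, while Airflow only handles when it runs, how often it retries, and who gets notified.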
