What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?

孤街浪徒 2021-01-31 01:43

I am using Google Cloud Dataflow to implement an ETL data warehouse solution.

Looking into the Google Cloud offerings, it seems Dataproc can also do the same thing.

It

  •  旧时难觅i
    2021-01-31 02:41

    Here are three main points to consider when choosing between Dataproc and Dataflow:

    • Provisioning
      Dataproc - You create and configure the clusters yourself (machine types, cluster size).
      Dataflow - Serverless. The worker VMs for each job are provisioned and managed automatically.

    • Hadoop Dependencies
      Dataproc should be used if the processing depends on tools from the Hadoop ecosystem (e.g. existing Spark, Hive, or Pig jobs).

    • Portability
      Dataflow runs Apache Beam pipelines, and Beam provides a clear separation between the processing logic and the underlying execution engine. This helps with portability across the execution engines (runners) that Beam supports, i.e. the same pipeline code can run on Dataflow, Spark, or Flink with little or no change.

    This flowchart from the Google Cloud website explains how to choose one over the other:

    https://cloud.google.com/dataflow/images/flow-vs-proc-flowchart.svg
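As a rough textual companion, the three criteria from this answer can be encoded in a small helper. This is a hypothetical sketch: `Workload`, `suggest_service`, and their fields are made-up names, and the logic only mirrors the points listed above rather than every branch of Google's flowchart.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical description of a job; field names are illustrative only.
    uses_hadoop_tools: bool         # e.g. existing Spark, Hive, or Pig jobs or libraries
    wants_serverless: bool          # prefer not to create and size clusters yourself
    needs_engine_portability: bool  # want the same code to run on Spark/Flink as well


def suggest_service(w: Workload) -> str:
    """Apply the answer's criteria, checking Hadoop dependencies first."""
    if w.uses_hadoop_tools:
        return "Dataproc"
    if w.wants_serverless or w.needs_engine_portability:
        return "Dataflow"
    return "Either"


print(suggest_service(Workload(False, True, False)))  # Dataflow
```

Hadoop dependencies are checked first because the answer treats them as the deciding factor for Dataproc; the other two criteria only tip the choice toward Dataflow when that dependency is absent.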

    Further details are available at the link below:
    https://cloud.google.com/dataproc/#fast--scalable-data-processing
