What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?

后端 未结 5 1351
孤街浪徒
孤街浪徒 2021-01-31 01:43

I am using Google Data Flow to implement an ETL data ware house solution.

Looking into google cloud offering, it seems DataProc can also do the same thing.

It

5条回答
  •  囚心锁ツ
    2021-01-31 02:32

    Same reason as why Dataproc offers both Hadoop and Spark: sometimes one programming model is the best fit for the job, sometimes the other. Likewise, in some cases the best fit for the job is the Apache Beam programming model, offered by Dataflow.

    In many cases, a big consideration is that one already has a codebase written against a particular framework, and one just wants to deploy it on the Google Cloud, so even if, say, the Beam programming model is superior to Hadoop, someone with a lot of Hadoop code might still choose Dataproc for the time being, rather than rewriting their code on Beam to run on Dataflow.

    The differences between Spark and Beam programming models are quite large, and there are a lot of use cases where each one has a big advantage over the other. See https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison .

提交回复
热议问题