What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?

孤街浪徒 2021-01-31 01:43

I am using Google Cloud Dataflow to implement an ETL data warehouse solution.

Looking into Google Cloud's offerings, it seems Dataproc can also do the same thing.

It

5 answers

旧时难觅i (OP)

2021-01-31 02:41
Here are three main points to consider when choosing between Dataproc and Dataflow:
- Provisioning
  Dataproc - Clusters are provisioned manually.
  Dataflow - Serverless. Worker resources are provisioned automatically.
- Hadoop Dependencies
  Dataproc should be used if the processing depends on tools in the Hadoop ecosystem.
- Portability
  Dataflow/Beam provides a clear separation between the processing logic and the underlying execution engine. This helps with portability across execution engines that support the Beam model, i.e. the same pipeline code can run on Dataflow, Spark, or Flink with little or no change.
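
To make the portability point concrete, here is a minimal sketch using the Apache Beam Python SDK. It is not from the original answer: the helper names `build_pipeline_args` and `run_word_count`, the sample input, and any project, region, or bucket values are illustrative assumptions. The pipeline body never mentions an execution engine; only the `--runner` flag (plus runner-specific options) changes.

```python
def build_pipeline_args(runner, **options):
    """Build Beam-style command-line flags for a runner (hypothetical helper)."""
    args = [f"--runner={runner}"]
    # Sort the extra options so the resulting flag order is deterministic.
    args += [f"--{name}={value}" for name, value in sorted(options.items())]
    return args


def run_word_count(pipeline_args):
    """Run a tiny word count; the engine is chosen only by pipeline_args."""
    # Imported here so build_pipeline_args works without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["dataflow dataproc", "dataflow beam"])
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

Assuming `apache-beam` is installed, `run_word_count(build_pipeline_args("DirectRunner"))` runs the pipeline locally. Passing `"DataflowRunner"` together with options such as `project`, `region`, and `temp_location` (values are yours to supply) would submit the same code to Dataflow, and `"FlinkRunner"` or `"SparkRunner"` would target those engines, given a suitably configured environment.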
This flowchart from the Google Cloud website explains how to choose one over the other:

https://cloud.google.com/dataflow/images/flow-vs-proc-flowchart.svg

Further details are available at the link below:
https://cloud.google.com/dataproc/#fast--scalable-data-processing