What are the differences between Cloud Dataflow and Dataprep

走远了吗. Submitted on 2019-12-13 03:04:26

Question


Both Dataprep and Dataflow can be used for ETL tasks. In fact, Dataprep seems to use Dataflow jobs. Is the only difference that Dataprep provides a user interface for building Dataflow jobs?


Answer 1:


Both Dataflow and Dataprep can transform data. The main difference is who uses the technology. Does your project need self-service data transformation, whether by technical users such as data engineers or by business users such as analysts and data scientists? Then Dataprep is the choice: it requires no coding, and it ultimately generates Dataflow jobs. Cloud Dataprep offers advanced transformations such as pivoting, unpivoting, aggregations, time series, joins, unions, standardization, and hundreds of other data functions, exposed through an intuitive visual interface. The data needs to be in GCS or BigQuery, though.




Answer 2:


Dataprep is a tool for performing ETL on file sources through a UI. It is convenient but relatively limited. Dataflow is a managed service for deploying ETL pipelines written with the Apache Beam programming model. It is useful for both batch and streaming data, and can potentially be used with whatever data sources you want (e.g. Kafka, Pub/Sub, Datastore, JDBC...). Dataprep is more limited, to GCS and BigQuery.
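
To give a feel for what "written with the Apache Beam programming model" means in practice, here is a minimal sketch of a batch ETL pipeline in Python. It is only an illustration, not something taken from the answers above: the two-column `region,amount` CSV layout, the function names `parse_sales` and `run`, and the input/output paths are all hypothetical. It assumes the `apache-beam` package is installed; the import is kept inside `run` so the parsing step can be read and exercised on its own.

```python
import csv
from typing import Iterable, List, Optional, Tuple


def parse_sales(line: str) -> Iterable[Tuple[str, float]]:
    """Yield (region, amount) for a well-formed 'region,amount' CSV line.

    Malformed lines yield nothing, so they are silently dropped.
    The two-column layout is an assumption made for this sketch.
    """
    fields = next(csv.reader([line]), [])
    if len(fields) != 2:
        return
    region = fields[0].strip().lower()
    if not region:
        return
    try:
        amount = float(fields[1].strip())
    except ValueError:
        return
    yield region, amount


def run(input_path: str, output_path: str,
        pipeline_args: Optional[List[str]] = None) -> None:
    """Read CSV lines, clean them, sum amounts per region, write the totals."""
    # Imported here so parse_sales stays usable without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args or [])
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText(input_path)
            | "ParseAndClean" >> beam.FlatMap(parse_sales)
            | "SumPerRegion" >> beam.CombinePerKey(sum)
            | "Format" >> beam.MapTuple(lambda region, total: f"{region},{total}")
            | "Write" >> beam.io.WriteToText(output_path)
        )
```

Calling `run("sales.csv", "totals")` with no extra arguments should execute the pipeline locally. To submit the same code as a Dataflow job, you would pass pipeline arguments such as `--runner=DataflowRunner`, `--project`, `--region` and `--temp_location` with values for your own environment. A pipeline like this is roughly what you would otherwise build visually in Dataprep, which is where the trade-off between convenience and flexibility shows up.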



Source: https://stackoverflow.com/questions/56329619/what-are-the-differences-between-cloud-dataflow-and-dataprep
