Saving the Data from Spark Streaming Workers to a Database

Submitted by 萝らか妹 on 2019-12-11 05:07:54

Question


In Spark Streaming, should we offload the saving step to another layer? The streaming context is not available on the workers when we use the Spark Cassandra Connector with Cassandra as our database. Moreover, even if we use some other database to save our data, we need to create a connection on the worker every time we process a batch of RDDs, because connection objects are not serializable.
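The serialization constraint mentioned above can be demonstrated without a cluster. Here a raw TCP socket stands in for a database connection; trying to pickle it fails, which is exactly why Spark cannot ship a driver-side connection object to the workers inside a closure (this is a minimal sketch, not Spark code):

```python
import pickle
import socket

# A live connection object (a raw socket here stands in for a DB
# connection) cannot be pickled. Spark serializes closures with a
# pickling mechanism, so a driver-side connection can never be
# shipped to a worker -- it must be created on the worker itself.
conn = socket.socket()
try:
    pickle.dumps(conn)
    picklable = True
except TypeError:
    picklable = False  # reached: sockets refuse to be pickled
conn.close()
```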

  1. Is it recommended to create/close connections on the workers?

  2. Wouldn't this make our system tightly coupled to the existing database? Tomorrow we may change the database.


Answer 1:


To answer your questions:

  1. Yes, it is absolutely fine to create/close connections on the workers. But make sure you don't do it for each and every record; it is recommended to do it at the partition level, or at some level where a single connection is created/closed for a group of records.
  2. You can decouple the storage layer by passing a configuration variable and deciding which type of DB connection to create at runtime.
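The decoupling in point 2 can be done with a small factory that selects the writer implementation from a runtime setting. The `CassandraWriter` and `JdbcWriter` names below are hypothetical stand-ins; a real version would open actual driver sessions. This is a minimal sketch of the pattern, not a full DAO layer:

```python
from abc import ABC, abstractmethod

class RecordWriter(ABC):
    """Interface the streaming job codes against; backends can vary."""
    @abstractmethod
    def write(self, records):
        ...

class CassandraWriter(RecordWriter):
    # Hypothetical: a real version would open a Cassandra session.
    def write(self, records):
        return f"cassandra:{len(records)}"

class JdbcWriter(RecordWriter):
    # Hypothetical: a real version would use a JDBC/DB-API connection.
    def write(self, records):
        return f"jdbc:{len(records)}"

def make_writer(backend: str) -> RecordWriter:
    """Pick the DB backend from a runtime config value, so swapping
    databases later only changes configuration, not job code."""
    writers = {"cassandra": CassandraWriter, "jdbc": JdbcWriter}
    if backend not in writers:
        raise ValueError(f"unknown backend: {backend}")
    return writers[backend]()
```

The streaming job then only ever calls `make_writer(conf_value).write(batch)`, and changing the database tomorrow means registering a new `RecordWriter` subclass.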

Possible duplicate of: Handle database connection inside spark streaming

Read this link; it should clarify some of your questions: Design Patterns for using foreachRDD
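The core idea from that design-patterns section, one connection per partition rather than per record, can be sketched without a cluster by treating plain Python lists as partitions. `FakeConnection` is a stand-in for a real driver connection so the sketch is runnable anywhere:

```python
class FakeConnection:
    """Stand-in for a real DB connection so the sketch runs anywhere."""
    opened = 0  # counts how many connections were created in total

    def __init__(self):
        FakeConnection.opened += 1
        self.saved = []

    def save(self, record):
        self.saved.append(record)

    def close(self):
        pass

def save_partition(records):
    """The function you'd pass to rdd.foreachPartition: it creates one
    connection per partition and reuses it for every record."""
    conn = FakeConnection()        # created on the worker, not the driver
    try:
        for record in records:     # reuse the connection across the batch
            conn.save(record)
    finally:
        conn.close()               # always release the connection

# Simulate an RDD with three partitions; in Spark this would be
# rdd.foreachPartition(save_partition) inside foreachRDD.
partitions = [[1, 2], [3, 4, 5], [6]]
for part in partitions:
    save_partition(part)
# FakeConnection.opened is now 3: one per partition, not one per record.
```

Had the connection been created inside the inner loop instead, six connections would have been opened for the same six records.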

Hope this helps!



Source: https://stackoverflow.com/questions/39576295/saving-the-data-from-sparkstreaming-workers-to-database
