Question
Can MongoDB be used as a data source for Apache Flink when processing streaming data?
Does Apache Flink have a native way to use a NoSQL database as a data source?
Answer 1:
Currently, Flink does not have a dedicated connector to read from MongoDB. What you can do is the following:
- Use StreamExecutionEnvironment.createInput and provide a Hadoop input format for MongoDB using Flink's wrapper input format
- Implement your own MongoDB source by implementing SourceFunction/ParallelSourceFunction
The former should give you at-least-once processing guarantees since the MongoDB collection is completely re-read in case of a recovery. Depending on the functionality of the MongoDB client, you might be able to implement exactly-once processing guarantees with the latter approach.
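To make the difference between the two guarantees concrete, here is a minimal sketch of the second approach. It is an illustration under stated assumptions, not a real connector: SketchSourceFunction and SketchSourceContext are simplified stand-ins that only mirror the run/cancel and collect/getCheckpointLock shape of Flink's SourceFunction, so the code compiles without Flink or a MongoDB driver on the classpath. The in-memory list stands in for a collection read in a stable order (for example sorted by _id), and MongoLikeSource, snapshotOffset and runWithFailure are hypothetical names. The simulated failure happens exactly at the last recorded offset, which a real job cannot rely on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for Flink's SourceContext / SourceFunction (NOT the real Flink types).
interface SketchSourceContext<T> {
    void collect(T element);
    Object getCheckpointLock();
}

interface SketchSourceFunction<T> {
    void run(SketchSourceContext<T> ctx) throws Exception;
    void cancel();
}

// Hypothetical source that reads documents in a stable order and tracks its position.
class MongoLikeSource implements SketchSourceFunction<String> {
    private final List<String> documents; // stands in for a MongoDB collection
    private volatile boolean running = true;
    private int offset;                   // the state a checkpoint would have to capture

    MongoLikeSource(List<String> documents, int restoredOffset) {
        this.documents = documents;
        this.offset = restoredOffset;
    }

    @Override
    public void run(SketchSourceContext<String> ctx) {
        while (running && offset < documents.size()) {
            // Emit the record and advance the offset atomically with respect to checkpoints.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(documents.get(offset));
                offset++;
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    int snapshotOffset() {
        return offset;
    }
}

class MongoSourceSketch {
    // Runs the source, stops it after two records to imitate a failure, then restarts it.
    static List<String> runWithFailure(boolean resumeFromOffset) {
        final List<String> docs = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4");
        final List<String> sink = new ArrayList<>();
        final Object lock = new Object();

        final MongoLikeSource first = new MongoLikeSource(docs, 0);
        first.run(new SketchSourceContext<String>() {
            public void collect(String element) {
                sink.add(element);
                if (sink.size() == 2) {
                    first.cancel(); // simulated failure after two records
                }
            }
            public Object getCheckpointLock() { return lock; }
        });

        // Recovery: either resume from the recorded offset or re-read everything.
        int restartAt = resumeFromOffset ? first.snapshotOffset() : 0;
        MongoLikeSource second = new MongoLikeSource(docs, restartAt);
        second.run(new SketchSourceContext<String>() {
            public void collect(String element) { sink.add(element); }
            public Object getCheckpointLock() { return lock; }
        });
        return sink;
    }

    public static void main(String[] args) {
        System.out.println("resume from offset: " + runWithFailure(true));
        System.out.println("re-read collection: " + runWithFailure(false));
    }
}
```

Re-reading from the start emits the first two documents twice, which is the at-least-once behaviour described above, while resuming from the recorded offset emits each document once. In an actual Flink job the offset would have to live in checkpointed state (for instance through the CheckpointedFunction interface), and whether a stable resume position is available at all depends on what the MongoDB client offers.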
Source: https://stackoverflow.com/questions/44153519/mongodb-as-datasource-to-flink