If we have to read from and write to HBase in a streaming application, how can we do that? For writes we open a connection in the open() method; how can we open a connection for reads?
object test {
    def main(args: Array[String]): Unit = {
        if (args.length != 11) {
            //print args
        }
        val Array() = args
        println("Parameters Passed " + ...);
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        val properties = new Properties()
        properties.setProperty("bootstrap.servers", metadataBrokerList)
        properties.setProperty("zookeeper.connect", zkQuorum)
        properties.setProperty("group.id", group)
        val messageStream = env.addSource(new FlinkKafkaConsumer08[String](topics, new SimpleStringSchema(), properties))
        messageStream.map { x => getheader(x) }
        messageStream.writeUsingOutputFormat(new HBaseOutputFormat());
    }

    def getheader(a: String) {
        //Get header and parse and split the headers
        if (metadata not available hit HBASE) { //Device Level send(Just JSON)
            //How to read from HBASE here .
            //If the resultset is not available in Map fetch from phoenix
        } else {
            //fetch from cache
        }
    }
}
Now, inside the method getheader, if I want to read from HBase in the branch if (metadata not available hit HBASE), how can I do that? I don't want to open a connection there. The idea is to maintain a single connection per thread and reuse it, as Flink does in the HBase sink with the open() method, or as Spark does with foreachPartition. I tried this, but I cannot pass the StreamExecutionEnvironment to methods. How can I achieve this? Could someone provide a snippet?
You want to read from and write to Apache HBase from a streaming user function. The HBaseReadExample that you linked does something different: it reads an HBase table into a DataSet (Flink's batch-processing abstraction). Using that code in a user function would mean starting a Flink program from within a Flink program.
For your use case, you need to create an HBase client directly in your user function and interact with it. The best way to do this is to use a RichFlatMapFunction and create the connection to HBase in its open() method.
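A minimal sketch of that approach follows, assuming the Flink 1.x RichFunction API (open(Configuration)) and the HBase 1.x+ client API (ConnectionFactory, Table, Get). The class name MetadataEnricher, the table "device_metadata", the column family "m", the qualifier "header" and the "|"-separated message layout are all hypothetical placeholders, not taken from your code. Each parallel task instance opens one connection in open(), reuses it for every record, and releases it in close().

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Get, Table}
import org.apache.hadoop.hbase.util.Bytes

import scala.collection.mutable

// Hypothetical names throughout: table "device_metadata", family "m", qualifier "header".
class MetadataEnricher extends RichFlatMapFunction[String, String] {

  // Initialized in open(), so these are never serialized with the function.
  @transient private var connection: Connection = _
  @transient private var table: Table = _
  @transient private var cache: mutable.Map[String, String] = _

  // Called once per parallel task instance, before any record is processed.
  override def open(parameters: Configuration): Unit = {
    val conf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
    connection = ConnectionFactory.createConnection(conf)
    table = connection.getTable(TableName.valueOf("device_metadata"))
    cache = mutable.Map.empty[String, String]
  }

  override def flatMap(message: String, out: Collector[String]): Unit = {
    val key = MetadataEnricher.rowKey(message)
    // Try the local cache first; fall back to HBase and remember the result.
    val metadata = cache.get(key).orElse {
      val fetched = lookup(key)
      fetched.foreach(m => cache.put(key, m))
      fetched
    }
    // In this sketch, records without metadata are dropped.
    metadata.foreach(m => out.collect(message + "|" + m))
  }

  // Single-row read over the connection opened in open().
  private def lookup(key: String): Option[String] = {
    val result = table.get(new Get(Bytes.toBytes(key)))
    Option(result.getValue(Bytes.toBytes("m"), Bytes.toBytes("header")))
      .map(bytes => Bytes.toString(bytes))
  }

  // Called once when the task shuts down.
  override def close(): Unit = {
    if (table != null) table.close()
    if (connection != null) connection.close()
  }
}

object MetadataEnricher {
  // Assumed message layout: "<deviceId>|<payload>".
  def rowKey(message: String): String = message.takeWhile(_ != '|')
}
```

With import org.apache.flink.streaming.api.scala._ in scope, this could replace the map call in your job as messageStream.flatMap(new MetadataEnricher). The cache here is a plain per-task map that is never evicted and is lost on restart, so a bounded cache or Flink state may be a better fit for a large key space.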
The next version of Flink (1.2.0) will feature support for asynchronous I/O operations in user functions, which should significantly improve the throughput of such applications.