How to read and write to HBase in flink streaming job

Submitted by 强颜欢笑 on 2019-12-24 09:04:36

Question


If we have to read from and write to HBase in a streaming application, how could we do that? We open a connection via the open() method for writes; how could we open a connection for reads?

object test {

  def main(args: Array[String]): Unit = {

    if (args.length != 11) {
      //print args
      System.exit(1)
    }

    val Array() = args
    println("Parameters Passed " + ...)

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", metadataBrokerList)
    properties.setProperty("zookeeper.connect", zkQuorum)
    properties.setProperty("group.id", group)

    val messageStream = env.addSource(new FlinkKafkaConsumer08[String](topics, new SimpleStringSchema(), properties))

    messageStream.map { x => getheader(x) }

    messageStream.writeUsingOutputFormat(new HBaseOutputFormat())
    env.execute()
  }

  def getheader(a: String): Unit = {

    //Get header, parse and split the headers
    if (metadata not available hit HBASE) { //Device level send (just JSON)

      //How to read from HBASE here

    }
    //If the result set is not available in the map, fetch from Phoenix
    else {
      //fetch from cache
    }
  }

}

Now, inside the method getheader, if I want to read from HBase inside if (metadata not available hit HBASE), how could I do that? I don't want to open a connection here; the idea is to maintain a single connection per thread and reuse it, like Flink does with the HBase sink's open() method, or how Spark does with foreachPartition. I tried this, but I cannot pass StreamExecutionEnvironment to methods. How could I achieve this? Could someone provide a snippet?


Answer 1:


You want to read from / write to Apache HBase from a streaming user function. The HBaseReadExample that you linked does something different: it reads an HBase table into a DataSet (Flink's batch-processing abstraction). Using this code in a user function would mean starting a Flink program from within a Flink program.

For your use case, you need to directly create an HBase client in your user function and interact with it. The best way to do this is to use a RichFlatMapFunction and create the connection to HBase in the open() method.
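A minimal sketch of that pattern, assuming the HBase 1.x client API. The table name `metadata`, the column family `cf`, the qualifier `json`, and using the incoming string itself as the row key are all placeholders, not anything prescribed by the question:

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Get, Table}
import org.apache.hadoop.hbase.util.Bytes

class HBaseEnrichFunction extends RichFlatMapFunction[String, String] {

  // One connection per parallel task instance, created once in open()
  // and reused for every record, which is what the question asks for.
  @transient private var connection: Connection = _
  @transient private var table: Table = _

  override def open(parameters: Configuration): Unit = {
    connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    table = connection.getTable(TableName.valueOf("metadata")) // placeholder table name
  }

  override def flatMap(value: String, out: Collector[String]): Unit = {
    // Point read against the long-lived table handle; no connection churn per record.
    val result = table.get(new Get(Bytes.toBytes(value)))
    if (!result.isEmpty) {
      out.collect(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("json"))))
    }
  }

  override def close(): Unit = {
    if (table != null) table.close()
    if (connection != null) connection.close()
  }
}
```

Wired into the stream from the question, this would replace the plain map: `messageStream.flatMap(new HBaseEnrichFunction())`.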

The next version of Flink (1.2.0) will feature support for asynchronous I/O operations in user functions which should improve the throughput of applications significantly.
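That async I/O support did ship in later Flink releases as the AsyncDataStream / RichAsyncFunction API. A rough sketch of its shape (the HBase read itself is a hypothetical blocking helper here, run on a separate thread pool; a real deployment would use a non-blocking client):

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}
import org.apache.flink.streaming.api.scala.async.{ResultFuture, RichAsyncFunction}

class AsyncHBaseLookup extends RichAsyncFunction[String, String] {

  @transient implicit private lazy val ec: ExecutionContext = ExecutionContext.global

  override def asyncInvoke(key: String, resultFuture: ResultFuture[String]): Unit = {
    // Run the (blocking) HBase lookup off the task thread so the
    // operator can have many requests in flight at once.
    Future(lookupInHBase(key)).onComplete {
      case Success(v) => resultFuture.complete(Iterable(v))
      case Failure(t) => resultFuture.completeExceptionally(t)
    }
  }

  // Hypothetical helper: a point read like the one in the flatMap sketch above.
  private def lookupInHBase(key: String): String = ???
}

// Applied to the stream with a timeout and a cap on in-flight requests:
// val enriched = AsyncDataStream.unorderedWait(
//   messageStream, new AsyncHBaseLookup, 1000, TimeUnit.MILLISECONDS, 100)
```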



Source: https://stackoverflow.com/questions/40262790/how-to-read-and-write-to-hbase-in-flink-streaming-job
