Spark RDD write to HBase

Submitted by 烂漫一生 on 2019-12-24 01:24:22

Question


I am able to read the messages from Kafka using the below code:

val ssc = new StreamingContext(sc, Seconds(50)) 
val topicmap = Map("test" -> 1)
val lines = KafkaUtils.createStream(ssc,"127.0.0.1:2181", "test-consumer-group",topicmap)

However, when I try to write each message from Kafka into HBase, it does not work. This is my code for writing into HBase:

lines.foreachRDD(rdd => {
  rdd.foreach(record => {
    val i = +1
    val hConf = new HBaseConfiguration() 
    val hTable = new HTable(hConf, "test") 
    val thePut = new Put(Bytes.toBytes(i)) 
    thePut.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes(record)) 
  })
})

Answer 1:


Well, you are not actually executing the Put; you are merely creating a Put request and adding data to it. What you are missing is:

hTable.put(thePut);
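To put that one-line fix in context, the asker's loop body would look roughly like the sketch below. Note an additional problem in the original: `val i = +1` is a unary plus, so `i` is always 1 and every record would overwrite the same row; a unique row key (here, hypothetically, the record's hash code) is assumed instead. This requires a running HBase with a "test" table and a "cf" column family:

```scala
lines.foreachRDD(rdd => {
  rdd.foreach(record => {
    val hConf = new HBaseConfiguration()
    val hTable = new HTable(hConf, "test")
    // Use a per-record row key; `val i = +1` would always be 1
    val thePut = new Put(Bytes.toBytes(record.hashCode))
    thePut.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes(record))
    // The missing step: actually send the Put to HBase
    hTable.put(thePut)
    hTable.close()
  })
})
```

Opening and closing a connection per record is still expensive; the next answer addresses that.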



Answer 2:


Adding another answer!

You can use foreachPartition to establish the connection once per partition on the executor, which is more efficient than creating it for each row, a costly operation.

lines.foreachRDD(rdd => {

    rdd.foreachPartition(iter => {

      // One connection per partition instead of per record
      val hConf = new HBaseConfiguration()
      val hTable = new HTable(hConf, "test")

      iter.foreach(record => {
        // Note: `val i = +1` is a unary plus (always 1), so every record
        // would overwrite the same row; use a unique row key instead
        val thePut = new Put(Bytes.toBytes(record.hashCode))
        thePut.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes(record))

        // missing part in your code
        hTable.put(thePut)
      })
      hTable.close()
    })
})
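For completeness, another common pattern for writing an RDD to HBase is to skip the manual HTable handling and hand the Puts to Hadoop's TableOutputFormat via saveAsNewAPIHadoopDataset. The sketch below assumes the same "test" table and "cf" column family, and uses the record's hash code as a stand-in row key; it is not from the original thread:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// Configure the output table once on the driver
val hConf = HBaseConfiguration.create()
hConf.set(TableOutputFormat.OUTPUT_TABLE, "test")
val job = Job.getInstance(hConf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

lines.foreachRDD(rdd => {
  // Map each record to a (key, Put) pair expected by TableOutputFormat
  val puts = rdd.map(record => {
    val put = new Put(Bytes.toBytes(record.hashCode))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes(record))
    (new ImmutableBytesWritable, put)
  })
  puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
})
```

This lets Spark manage the HBase connections and batching per task, so there is no table handle to open or close in user code.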


Source: https://stackoverflow.com/questions/27246386/spark-rdd-write-to-hbase
