Spark to HBase writing

Submitted by 霸气de小男生 on 2019-12-08 06:25:02

Question


The flow in my Spark program is as follows:

Driver --> HBase connection created --> HBase handle broadcast. The executors then fetch this handle and try to write to HBase.

In the driver program, I create the HBase configuration object and the Connection object, then broadcast the connection through the JavaSparkContext as follows:

    SparkConf sparkConf = JobConfigHelper.getSparkConfig();

    Configuration conf = new Configuration();
    UserGroupInformation.setConfiguration(conf);

    jsc = new JavaStreamingContext(sparkConf,
            Durations.milliseconds(Long.parseLong(batchDuration)));

    Configuration hconf = HBaseConfiguration.create();
    hconf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
    hconf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
    UserGroupInformation.setConfiguration(hconf);

    JavaSparkContext js = jsc.sparkContext();
    Connection connection = ConnectionFactory.createConnection(hconf);
    connectionbroadcast = js.broadcast(connection);

Inside the call() method that runs on the executor:

    Table table = connectionbroadcast.getValue().getTable(TableName.valueOf("gfttsdgn:FRESHHBaseRushi"));

    Put p = new Put(Bytes.toBytes("row1"));
    p.add(Bytes.toBytes("c1"), Bytes.toBytes("output"), Bytes.toBytes("rohan"));
    table.put(p);

I get the following exception when running in yarn-client mode:

17/03/02 09:19:38 ERROR yarn.ApplicationMaster: User class threw exception: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
    Serialization trace:
    classes (sun.misc.Launcher$AppClassLoader)
    classLoader (org.apache.hadoop.conf.Configuration)
    conf (org.apache.hadoop.hbase.client.RpcRetryingCallerFactory)
    rpcCallerFactory (org.apache.hadoop.hbase.client.AsyncProcess)
    asyncProcess (org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation)
    com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
    Serialization trace:
    classes (sun.misc.Launcher$AppClassLoader)
    classLoader (org.apache.hadoop.conf.Configuration)
    conf (org.apache.hadoop.hbase.client.RpcRetryingCallerFactory)
    rpcCallerFactory (org.apache.hadoop.hbase.client.AsyncProcess)
    asyncProcess (org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
        at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
        at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
        at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
        at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
        at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:194)
        at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:203)
        at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
        at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
        at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
        at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
        at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1337)
        at org.apache.spark.api.java.JavaSparkContext.broadcast(JavaSparkContext.scala:639)
        at com.citi.fresh.core.driver.FreshDriver.main(FreshDriver.java:178)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
    Caused by: java.util.ConcurrentModificationException
        at java.util.Vector$Itr.checkForComodification(Vector.java:1156)
        at java.util.Vector$Itr.next(Vector.java:1133)
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:67)
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
        at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
        ... 28 more

Answer 1:


It looks like you are trying to bulk-put data into HBase using Spark. As @jojo_Berlin explained, your HBase Conf is not thread safe. However, you can easily achieve this by using SparkOnHBase.

Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
hbaseContext.bulkPut(rdd, TableName.valueOf("gfttsdgn:FRESHHBaseRushi"), new PutFunction(), true);

Where your 'put' function is:

public static class PutFunction implements Function<String, Put> {
    public Put call(String v) throws Exception {
        Put put = new Put(Bytes.toBytes(v));
        put.add(Bytes.toBytes("c1"), Bytes.toBytes("output"),
                Bytes.toBytes("rohan"));
        return put;
    }
}
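
The stack trace in the question fails inside JavaSparkContext.broadcast, that is, while the live Connection object is being serialized. A common way around this kind of failure is to ship only plain settings and open the connection lazily in each JVM that needs it. The sketch below illustrates that idea with the JDK alone. FakeConnection, ConnectionSettings, and the "quorum" key are hypothetical stand-ins, not HBase or Spark classes, and the sketch uses Java serialization rather than Kryo, so the exact exception differs from the one in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

public class ShipSettingsNotConnections {

    /** Hypothetical stand-in for a live client connection; deliberately NOT Serializable. */
    static final class FakeConnection {
        final String quorum;

        FakeConnection(String quorum) {
            this.quorum = quorum;
        }
    }

    /** Plain settings (hypothetical): safe to serialize and ship to another JVM. */
    static final class ConnectionSettings implements Serializable {
        private static final long serialVersionUID = 1L;

        private final HashMap<String, String> values = new HashMap<>();

        // Static, so it is never serialized: each JVM builds its own connection.
        private static FakeConnection shared;

        ConnectionSettings set(String key, String value) {
            values.put(key, value);
            return this;
        }

        /** Opens the connection on first use in this JVM and reuses it afterwards. */
        static synchronized FakeConnection connect(ConnectionSettings settings) {
            if (shared == null) {
                shared = new FakeConnection(settings.values.get("quorum"));
            }
            return shared;
        }
    }

    /** Serializes an object to bytes with standard Java serialization. */
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    /** Reads back an object written by serialize(). */
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    /** Returns false if the object cannot be written because it is not Serializable. */
    static boolean canSerialize(Object o) throws IOException {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ConnectionSettings settings = new ConnectionSettings().set("quorum", "zk1.example");

        System.out.println("connection serializable: " + canSerialize(new FakeConnection("zk1.example")));
        System.out.println("settings serializable:   " + canSerialize(settings));

        // Simulate shipping the settings to a worker, then connecting there.
        ConnectionSettings received = (ConnectionSettings) deserialize(serialize(settings));
        System.out.println("worker connects to: " + ConnectionSettings.connect(received).quorum);
    }
}
```

In a real job, the analogous step would presumably be to create the HBase connection from the configuration inside the code that runs on the executor (for example within foreachPartition) instead of broadcasting a connection built on the driver. SparkOnHBase's bulkPut, as used above, takes the configuration rather than a connection, which fits the same pattern.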


Source: https://stackoverflow.com/questions/42558798/spark-to-hbase-writing
