Use elephant-bird with hive to read protobuf data

-上瘾入骨i 2021-01-22 01:23

I have a problem similar to this one.

This is what I used:

  1. CDH4.4 (hive 0.10)
  2. protobuf-java-2.4.1.jar
  3. elephant-bird-hive-4.6
1 answer
  • 2021-01-22 02:16

    The problem has been solved.

    At first I put the raw protobuf binary data directly into HDFS, and queries returned no results.

    It simply doesn't work that way.

    Some senior colleagues explained that protobuf binary data has to be written into a container file format, such as a Hadoop SequenceFile.

    The elephant-bird page says this too, but I didn't fully understand it at first.

    After writing the protobuf binary data into a SequenceFile, I could read it with Hive.

    Because I use the SequenceFile format, the create table statement specifies:

    inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
    outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
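
    For context, a complete statement might look like the sketch below. It is only an illustration under stated assumptions: the table name, the jar paths, the generated message class, and the HDFS location are all hypothetical, and the serde class name is the one I recall from the elephant-bird Hive documentation, so check it against your version. As far as I understand, the deserializer expects the value of each SequenceFile record to be a BytesWritable holding one serialized message.

    ```sql
    -- Jar paths are placeholders; point them at your own copies.
    ADD JAR /path/to/protobuf-java-2.4.1.jar;
    ADD JAR /path/to/elephant-bird-hive-4.6.jar;
    -- Hypothetical jar holding the protoc-generated Java classes.
    ADD JAR /path/to/my-protos.jar;

    -- Table name, message class, and location are hypothetical.
    -- No column list is given: the serde derives the columns from the message class.
    CREATE EXTERNAL TABLE addressbook
      ROW FORMAT SERDE 'com.twitter.elephantbird.hive.serde.ProtobufDeserializer'
      WITH SERDEPROPERTIES (
        'serialization.class' = 'com.example.tutorial.AddressBookProtos$Person')
      STORED AS
        INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
      LOCATION '/user/hive/protobuf/addressbook';
    ```

    Other elephant-bird jars (for example the core module) may also need to be added, depending on how your cluster is set up.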
    

    I hope this helps others who are new to Hadoop, Hive, and elephant-bird too.
