Concat Avro files using avro-tools

醉酒成梦 2021-02-06 03:18

I'm trying to merge several Avro files into one big file. The problem is that the concat command does not accept a wildcard:

hadoop jar avro-tools.jar concat /input/         


        
2 Answers
  • 2021-02-06 03:31

    Instead of hadoop jar avro-tools.jar, you can run java -jar avro-tools.jar, since you don't need Hadoop for this operation.
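
For files on the local filesystem, the shell itself expands the wildcard before avro-tools sees the arguments, so no separate listing step is needed. A minimal sketch, assuming the inputs are local files named part* in a single directory; the function name concat_local and the DRY_RUN switch are hypothetical helpers for illustration, not part of avro-tools:

```shell
# Concatenate local Avro files whose names start with "part".
# Hypothetical helper: arguments are the jar path, the input directory, the output file.
concat_local() {
  jar=$1
  in_dir=$2
  out=$3
  # Let the shell expand the glob into the positional parameters.
  set -- "$in_dir"/part*
  # If nothing matched, the pattern stays literal and does not exist on disk.
  [ -e "$1" ] || { echo "no input files in $in_dir" >&2; return 1; }
  # With DRY_RUN set, print the command instead of running it.
  ${DRY_RUN:+echo} java -jar "$jar" concat "$@" "$out"
}
```

Running DRY_RUN=1 concat_local avro-tools.jar /input /output/bigfile.avro prints the full command line, which is a cheap way to check the expanded file list before the real run. This only covers local paths; for files on HDFS the glob has to be resolved by Hadoop.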

  • 2021-02-06 03:47

    I quickly checked Avro's source code (1.7.7) and it seems that concat does not support glob patterns (basically, they call FileSystem.open() on each argument except the last one).

    This means that you have to provide every filename explicitly as an argument. It is cumbersome, but the following commands should do what you want:

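    # Collect the matching HDFS paths (last column of the listing), space-separated.
    IN=$(hadoop fs -ls '/input/part*' | awk '{printf "%s ", $NF}')
    # ${IN} is deliberately unquoted so that each path becomes its own argument.
    hadoop jar avro-tools.jar concat ${IN} /output/bigfile.avro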
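
One caveat with taking the last column of the listing: when hadoop fs -ls is given a directory instead of a glob, it prints a "Found N items" header and may list subdirectories, and both would end up in the argument list. A hedged sketch of a filter that keeps only regular files; the function name hdfs_paths is hypothetical, and it assumes the usual listing layout where the permission string is the first column and the path is the last:

```shell
# Read `hadoop fs -ls` output on stdin and print the paths of regular files,
# separated by spaces. Header lines and directories are skipped because
# their first column does not start with "-".
# Assumption: paths contain no whitespace.
hdfs_paths() {
  awk '$1 ~ /^-/ { printf "%s ", $NF }'
}

# Intended use (not executed here):
#   IN=$(hadoop fs -ls /input/ | hdfs_paths)
#   hadoop jar avro-tools.jar concat ${IN} /output/bigfile.avro
```

Marker files such as _SUCCESS are still regular files, so a directory that contains them would need an additional name filter.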
    

    Support for glob patterns would be a nice addition to this command.
