How to load XML files into Hive

故里飘歌 2021-01-06 17:43

I'm working on Hive tables and I'm having the following problem. I have more than 1 billion XML files in my HDFS. What I want to do is, each XML file has 4 differe

2 answers
  •  -上瘾入骨i
    2021-01-06 17:53

    You have several options:

    • Load the XML into a Hive table with a string column, one document per row (e.g. CREATE TABLE xmlfiles (id int, xmlfile string)). Then use an XPath UDF to work on the XML.
    • Since you know the XPaths of what you want (e.g. //section1), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.
    • Map your XML to Avro as described here because a SerDe exists for seamless Avro-to-Hive mapping.
    • Use XPath to store your data in a regular text file in HDFS and then ingest that into Hive.

    It depends on your level of experience and comfort with these approaches.
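As a rough illustration of the last option (extract with XPath into a plain text file, then ingest that into Hive), here is a minimal Python sketch. It is not taken from the answer: the element names section2 to section4, the function name xml_to_row, and the tab-separated layout are all assumptions made for the example; only //section1 appears in the original text. It uses the limited XPath support in the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names: only "section1" appears in the answer above,
# the other three are placeholders for illustration.
SECTIONS = ["section1", "section2", "section3", "section4"]


def xml_to_row(file_id, xml_text, sections=SECTIONS):
    """Return one tab-separated line: the id followed by each section's text."""
    root = ET.fromstring(xml_text)
    fields = [str(file_id)]
    for name in sections:
        # ".//name" is ElementTree's form of "//name"; it searches the
        # descendants of the root element and returns the first match.
        node = root.find(f".//{name}")
        text = "".join(node.itertext()) if node is not None else ""
        # Collapse tabs and newlines so each document stays on a single row.
        fields.append(" ".join(text.split()))
    return "\t".join(fields)


sample = (
    "<doc><section1>alpha</section1><section2>beta\n two</section2>"
    "<section3/><section4>delta</section4></doc>"
)
row = xml_to_row(1, sample)
print(row)
```

Rows produced this way could be written to a file in HDFS and read by a Hive table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'. With around a billion small files, the same per-file function would more likely be run inside a distributed job than in a single script.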
