How to load XML files into Hive

故里飘歌 2021-01-06 17:43

I'm working on Hive tables and I'm having the following problem. I have more than 1 billion XML files in my HDFS. What I want to do is this: each XML file has the 4 differe…

2 Answers
  • 2021-01-06 17:53

    You have several options:

    • Load the XML into a Hive table with a string column, one file per row (e.g. CREATE TABLE xmlfiles (id int, xmlfile string)). Each document must fit on a single line, because Hive's default text format splits rows on newlines. Then use an XPath UDF to process the XML.
    • Since you know the XPaths of what you want (e.g. //section1), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.
    • Map your XML to Avro as described here, since a SerDe exists for seamless Avro-to-Hive mapping.
    • Use XPath to extract your data into a regular text file in HDFS, then load that file into Hive.

    Which option to choose depends on your level of experience and comfort with these approaches.
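The last option (extract the fields with XPath into a plain text file, then load that into Hive) can be sketched in Python with the standard library's xml.etree.ElementTree, which supports a subset of XPath. This is a minimal single-process sketch under stated assumptions: the element names section1 to section4 are hypothetical stand-ins for the four elements in the question, the XML files are read from a local directory, and the output is one tab-separated line per file. For a billion files the same per-file logic would have to run inside a distributed job rather than a single loop.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical element names standing in for the four elements in the question.
FIELDS = ["section1", "section2", "section3", "section4"]


def xml_to_row(xml_doc, fields=FIELDS):
    """Turn one XML document (str or bytes) into a single tab-separated line."""
    root = ET.fromstring(xml_doc)
    values = []
    for name in fields:
        # ".//name" matches the first descendant with that tag at any depth,
        # ElementTree's closest equivalent of //name; a missing element gives "".
        text = root.findtext(f".//{name}", default="")
        # Collapse all whitespace so tabs/newlines cannot break Hive's row format.
        values.append(" ".join(text.split()))
    return "\t".join(values)


def convert_dir(src_dir, out_file):
    """Write one row per *.xml file in src_dir to out_file; return the row count."""
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.xml")):
            # Read bytes so documents with an encoding declaration parse cleanly.
            out.write(xml_to_row(path.read_bytes()) + "\n")
            count += 1
    return count
```

After copying the output into HDFS (e.g. with hdfs dfs -put), the Hive table over it would need ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t', because Hive's default field delimiter is \001, not a tab.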

  • 2021-01-06 17:56

    Use this:

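    -- One row per line of every file under LOCATION; the first and last line
    -- of each file (e.g. an enclosing root tag) are skipped.
    CREATE EXTERNAL TABLE test (name STRING)
    LOCATION '/user/sornalingam/zipped/output/Tagged/t1'
    TBLPROPERTIES ("skip.header.line.count"="1", "skip.footer.line.count"="1");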

    Then query the column with Hive's XPath UDFs (e.g. xpath_string).
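To show what this table gives you, here is a small local Python simulation (not Hive itself): each line of a file becomes one row of the name column, the header and footer lines are dropped, and an XPath-style lookup is then applied to each row on its own. It assumes a hypothetical file layout with one complete <record> element per line between an opening and a closing root tag; the function names hive_rows and extract are illustrative only.

```python
import xml.etree.ElementTree as ET


def hive_rows(file_text, header=1, footer=1):
    """Mimic a Hive text table with skip.header/footer.line.count:
    one row per line, minus the skipped lines at each end of the file."""
    lines = file_text.splitlines()
    return lines[header:len(lines) - footer]


def extract(rows, tag):
    """Parse each row independently and return the text of its direct child `tag`.
    A row that is not a complete XML element on its own raises ET.ParseError."""
    return [ET.fromstring(row).findtext(tag, default="") for row in rows]


sample = (
    "<records>\n"
    "<record><id>1</id><name>a</name></record>\n"
    "<record><id>2</id><name>b</name></record>\n"
    "</records>\n"
)
rows = hive_rows(sample)
names = extract(rows, "name")
```

In Hive the per-row step would presumably look like SELECT xpath_string(name, '/record/name') FROM test. The approach relies on one-element-per-line input: if a record spans several lines, each line becomes its own row and no single row is well-formed XML.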
