Loading XML data into a Hive table: org.apache.hadoop.hive.ql.metadata.HiveException

借酒劲吻你 2021-02-06 14:48

I'm trying to load XML data into Hive, but I'm getting an error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runt

6 answers
  • 2021-02-06 14:59

    Find the JAR here: Brickhouse

    A sample example is here: Example

    A similar example on Stack Overflow: here

    Solution:

    -- Load the XML data into a table (one row per line of the input file)
    DROP TABLE IF EXISTS xmltable;
    CREATE TABLE xmltable(xmldata STRING) STORED AS TEXTFILE;
    LOAD DATA LOCAL INPATH '/home/vijay/data-input.xml' OVERWRITE INTO TABLE xmltable;
    
    -- Check the contents
    SELECT * FROM xmltable;
    
    -- Create a view; each xpath() call returns an array<string>
    DROP VIEW IF EXISTS MyxmlView;
    CREATE VIEW MyxmlView(id, genre, price) AS
    SELECT
     xpath(xmldata, 'catalog/book/id/text()'),
     xpath(xmldata, 'catalog/book/genre/text()'),
     xpath(xmldata, 'catalog/book/price/text()')
    FROM xmltable;
    
    -- Check the view
    SELECT id, genre, price FROM MyxmlView;
    
    
    -- Add the Brickhouse JAR and register its UDFs
    ADD JAR /home/vijay/brickhouse-0.7.0-SNAPSHOT.jar;
    
    CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
    CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';
    
    SELECT 
       array_index( id, n ) as my_id,
       array_index( genre, n ) as my_genre,
       array_index( price, n ) as my_price
    from MyxmlView
    lateral view numeric_range( size( id )) MyxmlView as n;
    

    Output:

    hive > SELECT
         >    array_index( id, n ) as my_id,
         >    array_index( genre, n ) as my_genre,
         >    array_index( price, n ) as my_price
         > from MyxmlView
         > lateral view numeric_range( size( id )) MyxmlView as n;
    Automatically selecting local only mode for query
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Execution log at: /tmp/vijay/.log
    Job running in-process (local Hadoop)
    Hadoop job information for null: number of mappers: 0; number of reducers: 0
    2014-07-09 05:36:45,220 null map = 0%,  reduce = 0%
    2014-07-09 05:36:48,226 null map = 100%,  reduce = 0%
    Ended Job = job_local_0001
    Execution completed successfully
    Mapred Local Task Succeeded . Convert the Join into MapJoin
    OK
    my_id      my_genre      my_price
    11      Computer        44
    44      Fantasy 5
    

    Time taken: 8.541 seconds, Fetched: 2 row(s)

  • 2021-02-06 14:59

    Oracle XML Extensions for Hive can be used to create Hive tables over XML like this. https://docs.oracle.com/cd/E54130_01/doc.26/e54142/oxh_hive.htm#BDCUG691

  • 2021-02-06 15:00

    Also make sure the XML file has no trailing whitespace or empty lines after the last closing tag. In my case the source file had some, and whenever I loaded the file into Hive the resulting table contained NULLs. Whenever I applied an xpath function, the result included a few empty arrays like these: [] [] [] [] [] []

    The xpath_string function worked, but the xpath_double and xpath_int functions never did. They kept throwing this exception:

    Diagnostic Messages for this Task:
    java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":""}
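
    The exception above reports an empty row ({"line":""}), which is consistent with blank lines reaching the table. A minimal sketch of cleaning the file before loading it, assuming the stray whitespace sits at line ends or on otherwise empty lines (the helper name strip_blank_lines is hypothetical, not part of Hive):

```python
def strip_blank_lines(text):
    """Drop whitespace-only lines and trailing whitespace on every line."""
    kept = [line.rstrip() for line in text.splitlines() if line.strip()]
    return "\n".join(kept)


# Illustrative input: one XML record followed by trailing spaces and blank lines.
raw = "<catalog><book><id>11</id></book></catalog>  \n\n   \n"
cleaned = strip_blank_lines(raw)
print(repr(cleaned))
```

    Writing the cleaned text back to a file and loading that file should leave no whitespace-only rows for the numeric xpath functions to trip over.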
    
  • 2021-02-06 15:07

    First try adding the file with add file path-to-file; that solved the problem in my case.

  • 2021-02-06 15:08

    Reason for the error:

    1) Case 1 (your case): the XML content is being fed to Hive line by line.

    Input XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <catalog>
    <book>
      <id>11</id>
      <genre>Computer</genre>
      <price>44</price>
    </book>
    <book>
      <id>44</id>
      <genre>Fantasy</genre>
      <price>5</price>
    </book>
    </catalog>  
    

    Check in Hive:

    select count(*) from xmltable;  -- returns 13 rows: each line is its own row in column xmldata
    

    Reason for the error:

    The XML is read as 13 separate pieces rather than as one document, so no single row contains the complete, valid XML.
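
    To see why line-by-line rows cannot work, here is a small illustration using Python's standard XML parser as a stand-in. Hive's XPath UDFs run on a Java parser, so this only mirrors the idea; the helper name is_well_formed and the shortened sample are assumptions made for the demonstration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as a complete XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


document = """<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
</catalog>"""

# A TEXTFILE table stores each line of the file as a separate row.
rows = document.splitlines()
per_row = [is_well_formed(row) for row in rows]

print(is_well_formed(document))  # the whole document parses
print(per_row)                   # rows holding only an opening or closing tag do not
```

    Even the rows that happen to parse on their own, such as a lone id element, no longer sit under catalog/book, so the XPath expressions from the question cannot match them.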

    2) Case 2: the XML content should be fed to Hive as a single string, and then the XPath UDFs work. For the syntax, the source says: all functions follow the form xpath_*(xml_string, xpath_expression_string).

    input.xml

    <?xml version="1.0" encoding="UTF-8"?><catalog><book><id>11</id><genre>Computer</genre><price>44</price></book><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>
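
    One way to produce such a single-line file is to parse the document and drop the whitespace between tags. This is a sketch, not the answerer's method; the function name flatten_xml is hypothetical, and it assumes the file holds one well-formed document that fits in memory.

```python
import xml.etree.ElementTree as ET


def flatten_xml(text):
    """Serialize an XML document onto one line, dropping whitespace between tags."""
    root = ET.fromstring(text)
    for node in root.iter():
        # Whitespace-only text and tails are just indentation and line breaks.
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    flat = ET.tostring(root, encoding="unicode")
    # Replace any line breaks left inside element text so the result stays on one row.
    return " ".join(flat.splitlines())


pretty = """<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>"""

flat = flatten_xml(pretty)
print(flat)
```

    The flattened string carries no XML declaration, which the xpath queries in this thread do not need. Writing it to a file and loading that file gives the table a single row.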
    

    Check in Hive:

    select count(*) from xmltable; -- returns 1 row: the XML is read as one complete document
    

    This means:

    xmldata   = <?xml version="1.0" encoding="UTF-8"?><catalog><book> ...... </catalog>
    

    Then apply your XPath UDF like this:

    select xpath(xmldata, 'xpath_expression_string') from xmltable;
    
  • 2021-02-06 15:12

    To get the result you want, follow the steps below. First change the source data to this, with one complete XML document per line:

    <catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog>
    <catalog><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>
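
    If the source file is a single catalog holding many books, the reshaping above can be scripted. A minimal sketch, assuming the record element is book and the root is catalog as in this thread (the function names compact and one_record_per_line are hypothetical):

```python
import xml.etree.ElementTree as ET


def compact(node):
    """Remove whitespace-only text between tags, in place."""
    for item in node.iter():
        if item.text is not None and not item.text.strip():
            item.text = None
        if item.tail is not None and not item.tail.strip():
            item.tail = None


def one_record_per_line(text, record_tag="book"):
    """Return one single-line XML document per record element."""
    root = ET.fromstring(text)
    lines = []
    for record in root.findall(record_tag):
        compact(record)
        wrapper = ET.Element(root.tag)  # fresh root element around each record
        wrapper.append(record)
        lines.append(ET.tostring(wrapper, encoding="unicode"))
    return lines


source = """<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>"""

lines = one_record_per_line(source)
print("\n".join(lines))
```

    Each output line is a complete document, so Hive stores one book per row and the xpath calls return one value per row instead of an array covering the whole catalog.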
    

    Now run the query below:

    select xpath(xmldata, '/catalog/book/id/text()') as id,
    xpath(xmldata, '/catalog/book/genre/text()') as genre,
    xpath(xmldata, '/catalog/book/price/text()') as price FROM xmltable;
    

    You will now get output like this:

    ["11"] ["Computer"] ["44"]

    ["44"] ["Fantasy"] ["5"]

    If you apply the xpath_string and xpath_int UDFs instead, you will get output like this:

    11 Computer 44

    44 Fantasy 5

    Thanks
