I'm trying to load XML data into Hive but I'm getting an error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runt
Find the jar here --> Brickhouse
Sample example here --> Example
Similar example on Stack Overflow --> here
Solution:
--Load xml data to table
DROP TABLE IF EXISTS xmltable;
CREATE TABLE xmltable(xmldata STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/vijay/data-input.xml' OVERWRITE INTO TABLE xmltable;
-- check contents
SELECT * FROM xmltable;
-- create view
DROP VIEW IF EXISTS MyxmlView;
CREATE VIEW MyxmlView(id, genre, price) AS
SELECT
xpath(xmldata, 'catalog/book/id/text()'),
xpath(xmldata, 'catalog/book/genre/text()'),
xpath(xmldata, 'catalog/book/price/text()')
FROM xmltable;
-- check view
SELECT id, genre, price FROM MyxmlView;
-- add the Brickhouse jar and register its UDFs
ADD JAR /home/vijay/brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';
SELECT
array_index( id, n ) as my_id,
array_index( genre, n ) as my_genre,
array_index( price, n ) as my_price
from MyxmlView
lateral view numeric_range( size( id )) MyxmlView as n;
Output:
hive > SELECT
> array_index( id, n ) as my_id,
> array_index( genre, n ) as my_genre,
> array_index( price, n ) as my_price
> from MyxmlView
> lateral view numeric_range( size( id )) MyxmlView as n;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/vijay/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-07-09 05:36:45,220 null map = 0%, reduce = 0%
2014-07-09 05:36:48,226 null map = 100%, reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
my_id my_genre my_price
11 Computer 44
44 Fantasy 5
Time taken: 8.541 seconds, Fetched: 2 row(s)
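To make the last query easier to follow, here is a small Python sketch of what it appears to compute. It assumes that numeric_range(size(id)) yields the indexes 0 through size - 1 and that array_index(arr, n) picks the n-th element, which is what the output above suggests; the function name rows_from_arrays is made up for this illustration and is not part of Hive or Brickhouse.

```python
def rows_from_arrays(ids, genres, prices):
    """Turn three parallel arrays into one row per index.

    Mirrors the LATERAL VIEW above under the stated assumption:
    n runs over 0 .. len(ids) - 1 and each column takes its n-th element.
    """
    rows = []
    for n in range(len(ids)):
        rows.append((ids[n], genres[n], prices[n]))
    return rows


# The view returns one row holding three arrays; the query splits it into rows.
for row in rows_from_arrays(["11", "44"], ["Computer", "Fantasy"], ["44", "5"]):
    print(*row)
```

The point is only that the three xpath() arrays are walked in lockstep by a shared index, so the n-th id, genre and price end up in the same row.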
Adding more info as requested by the question owner:
Oracle XML Extensions for Hive can be used to create Hive tables over XML like this. https://docs.oracle.com/cd/E54130_01/doc.26/e54142/oxh_hive.htm#BDCUG691
Also ensure that the XML file doesn't contain any trailing whitespace after the last closing tag. In my case the source file had some, and whenever I loaded the file into Hive, the resulting table contained NULLs. So whenever I applied an xpath function, the result would have a few of these: [] [] [] [] [] []
Although the xpath_string function worked, the xpath_double and xpath_int functions never did. They kept throwing this exception:
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":""}
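The empty row in that message ({"line":""}) is consistent with a blank or whitespace-only line in the input file. A minimal sketch of cleaning the file before loading it, assuming the cleanup is done outside Hive with Python; the helper name drop_blank_lines is hypothetical:

```python
def drop_blank_lines(text: str) -> str:
    """Remove whitespace-only lines and trailing whitespace on every line.

    The result has no trailing newline, so nothing follows the last closing tag.
    """
    kept = [line.rstrip() for line in text.splitlines() if line.strip()]
    return "\n".join(kept)


raw = "<catalog><book><id>11</id></book></catalog>   \n\n   \n"
print(repr(drop_blank_lines(raw)))
```

Write the cleaned text back to a file and run LOAD DATA on that file instead of the original.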
First try to load the file with add file path-to-file; that should solve your problem, as it did in my case.
Reason for the error:
1) Case 1 (your case): the XML content is being fed to Hive line by line.
input xml:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
<id>11</id>
<genre>Computer</genre>
<price>44</price>
</book>
<book>
<id>44</id>
<genre>Fantasy</genre>
<price>5</price>
</book>
</catalog>
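check in Hive:
select count(*) from xmltable; -- returns 13 rows, i.e. each line of the file is an individual row in column xmldata
Reason for the error:
The XML is read as 13 separate pieces, not as one unified document. No single row holds the complete XML, and rows such as <catalog> or </book> are not valid XML on their own.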
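The same effect can be reproduced outside Hive. The sketch below uses Python's standard xml.etree.ElementTree parser as a stand-in for the XML parser behind the XPath UDFs (an assumption made only for illustration) and checks each line of the sample file separately:

```python
import xml.etree.ElementTree as ET

# The sample file from above, one list entry per line (13 lines).
LINES = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<catalog>",
    "<book>", "<id>11</id>", "<genre>Computer</genre>", "<price>44</price>", "</book>",
    "<book>", "<id>44</id>", "<genre>Fantasy</genre>", "<price>5</price>", "</book>",
    "</catalog>",
]


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a complete XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


for line in LINES:
    print(is_well_formed(line), line)

# The whole file, taken as one string, parses without error.
print(is_well_formed("\n".join(LINES)))
```

Lines such as <catalog> fail to parse at all, and the lines that do parse (for example <id>11</id>) no longer sit under catalog/book, so the XPath expressions used in the view cannot match them.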
2) Case 2: the XML content should be fed to Hive as a single string; then the XPath UDFs work. For the syntax, all functions follow the form xpath_*(xml_string, xpath_expression_string) (source).
input xml:
<?xml version="1.0" encoding="UTF-8"?><catalog><book><id>11</id><genre>Computer</genre><price>44</price></book><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>
check in Hive:
select count(*) from xmltable; -- returns 1 row, i.e. the XML is read as one complete document
This means:
xmldata = <?xml version="1.0" encoding="UTF-8"?><catalog><book> ...... </catalog>
Then apply your XPath UDF like this:
select xpath(xmldata, 'xpath_expression_string') from xmltable;
To get the output in the shape you want, change the source data so that each book is a complete XML document on its own line, like this:
<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog>
<catalog><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>
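If the source file is in the multi-line form shown earlier, this reshaping can be scripted. A minimal sketch, assuming Python with the standard xml.etree.ElementTree module and a file small enough to parse in memory; the function name one_book_per_line is hypothetical:

```python
import xml.etree.ElementTree as ET


def one_book_per_line(xml_text: str) -> str:
    """Emit one <catalog><book>...</book></catalog> document per line."""
    root = ET.fromstring(xml_text)
    lines = []
    for book in root.findall("book"):
        # Drop whitespace-only text between tags so each record stays on one line.
        for el in book.iter():
            if el.text is not None and not el.text.strip():
                el.text = None
            if el.tail is not None and not el.tail.strip():
                el.tail = None
        wrapper = ET.Element(root.tag)  # a fresh <catalog> around each book
        wrapper.append(book)
        lines.append(ET.tostring(wrapper, encoding="unicode"))
    return "\n".join(lines)


sample = """<catalog>
<book>
<id>11</id>
<genre>Computer</genre>
<price>44</price>
</book>
<book>
<id>44</id>
<genre>Fantasy</genre>
<price>5</price>
</book>
</catalog>"""
print(one_book_per_line(sample))
```

Each output line is a small but complete XML document, so Hive's line-by-line reading gives one valid row per book.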
Now try the query below:
select xpath(xmldata, '/catalog/book/id/text()') as id,
xpath(xmldata, '/catalog/book/genre/text()') as genre,
xpath(xmldata, '/catalog/book/price/text()') as price FROM xmltable;
Now you will get a result like this:
["11"] ["Computer"] ["44"]
["44"] ["Fantasy"] ["5"]
If you apply the xpath_string and xpath_int UDFs instead, you will get a result like this:
11 Computer 44
44 Fantasy 5
Thanks