I want to store xml data into hive table, XML data :
1266
I came out through the same Problem when dealing with XML Serde. After some struggle, I fixed it by using the "Load data" statement separately and avoiding addition of "LOCATION" property in "CREATE" statement. the following is my XML data.
200000
F
1
1
2
2
0
1
1
1
4
0
2
2
18
1.003392
2.740608
0
CREATE TABLE xml_bank(customer_id STRING, income BIGINT, demographics map, financial map)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.customer_id"="/record/@customer_id",
"column.xpath.income"="/record/income/text()",
"column.xpath.demographics"="/record/demographics/*",
"column.xpath.financial"="/record/financial/*"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"=""
);
OK
Time taken: 0.925 seconds
hive>
for the above create statement, I used the following "LOAD DATA" statement to load the data contained in an XML file in to the above created table.
hive> load data local inpath '/home/mahesh/hive_input_datasets/XMLdata/XMLdatafile.xml' overwrite into table xml_bank6;
Copying data from file:/home/mahesh/hive_input_datasets/XMLdata/XMLdatafile.xml
Copying file: file:/home/mahesh/hive_input_datasets/XMLdata/XMLdatafile.xml
Loading data to table default.xml_bank6
Table default.xml_bank6 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 500, raw_data_size: 0]
OK
Time taken: 0.879 seconds
hive>
And finally,
hive> select * from xml_bank6;
OK
0000-JTALA 200000 {"empcat":"2","jobcat":"2","residecat":"4","retire":"0","hometype":"2","addresscat":"2","homeown":"0","spousedcat":"1","gender":"F","jobsat":"1","edcat":"1","marital":"1","agecat":"1"} {"default":"0","income":"18","othdebt":"2.740608","creddebt":"1.003392"}
Time taken: 0.149 seconds, Fetched: 1 row(s)
hive>
And in the above query i would suggest the value for "xmlinput.start"
as "
"
.I believe this would be helpful for you.