Hive XML SerDe: table is empty

我在风中等你 2021-01-21 12:38

I want to store XML data in a Hive table. XML data:


   1266 
     

        
5 answers
  •  一整个雨季
    2021-01-21 13:14

    I ran into the same problem when working with the XML SerDe. After some struggle, I fixed it by loading the data with a separate LOAD DATA statement and leaving the LOCATION clause out of the CREATE TABLE statement. The following is my XML data:

    
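        <record customer_id="0000-JTALA">
            <income>200000</income>
            <demographics>
                <gender>F</gender>
                <agecat>1</agecat>
                <edcat>1</edcat>
                <jobcat>2</jobcat>
                <empcat>2</empcat>
                <retire>0</retire>
                <jobsat>1</jobsat>
                <marital>1</marital>
                <spousedcat>1</spousedcat>
                <residecat>4</residecat>
                <homeown>0</homeown>
                <hometype>2</hometype>
                <addresscat>2</addresscat>
            </demographics>
            <financial>
                <income>18</income>
                <creddebt>1.003392</creddebt>
                <othdebt>2.740608</othdebt>
                <default>0</default>
            </financial>
        </record>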

    CREATE TABLE Statement:

    CREATE TABLE xml_bank(customer_id STRING, income BIGINT, demographics map<string,string>, financial map<string,string>)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
    "column.xpath.customer_id"="/record/@customer_id",
    "column.xpath.income"="/record/income/text()",
    "column.xpath.demographics"="/record/demographics/*",
    "column.xpath.financial"="/record/financial/*"
    )
    STORED AS
    INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    TBLPROPERTIES (
    "xmlinput.start"="<record customer",
    "xmlinput.end"="</record>"
    );

    CREATE Query Result:

    OK
    Time taken: 0.925 seconds
    hive>
    

    With the table created, I used the following LOAD DATA statement to load the XML file into it. (In the session output below, the table is named xml_bank6 rather than xml_bank.)

    hive> load data local inpath '/home/mahesh/hive_input_datasets/XMLdata/XMLdatafile.xml' overwrite into table xml_bank6;
    

    LOAD Query Result:

    Copying data from file:/home/mahesh/hive_input_datasets/XMLdata/XMLdatafile.xml
    Copying file: file:/home/mahesh/hive_input_datasets/XMLdata/XMLdatafile.xml
    Loading data to table default.xml_bank6
    Table default.xml_bank6 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 500, raw_data_size: 0]
    OK
    Time taken: 0.879 seconds
    hive>
    

    And finally,

    SELECT Query and Result:

    hive> select * from xml_bank6;
    OK
    0000-JTALA  200000  {"empcat":"2","jobcat":"2","residecat":"4","retire":"0","hometype":"2","addresscat":"2","homeown":"0","spousedcat":"1","gender":"F","jobsat":"1","edcat":"1","marital":"1","agecat":"1"}    {"default":"0","income":"18","othdebt":"2.740608","creddebt":"1.003392"}
    Time taken: 0.149 seconds, Fetched: 1 row(s)
    hive>
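    To see how the column.xpath properties turn one XML record into that row, here is a minimal Python sketch. It is not the SerDe's code: it uses Python's xml.etree.ElementTree lookups as stand-ins for the XPath expressions, the helper names record_to_row and children_as_map are hypothetical, and the sample record is rebuilt from the SELECT output above (the order of the child elements is illustrative, since they end up in a map anyway).

```python
import xml.etree.ElementTree as ET

# Sample record rebuilt from the SELECT output (assumption: child element
# order is illustrative; it does not change the resulting maps).
RECORD = """<record customer_id="0000-JTALA">
    <income>200000</income>
    <demographics>
        <gender>F</gender>
        <agecat>1</agecat>
        <edcat>1</edcat>
        <jobcat>2</jobcat>
        <empcat>2</empcat>
        <retire>0</retire>
        <jobsat>1</jobsat>
        <marital>1</marital>
        <spousedcat>1</spousedcat>
        <residecat>4</residecat>
        <homeown>0</homeown>
        <hometype>2</hometype>
        <addresscat>2</addresscat>
    </demographics>
    <financial>
        <income>18</income>
        <creddebt>1.003392</creddebt>
        <othdebt>2.740608</othdebt>
        <default>0</default>
    </financial>
</record>"""


def children_as_map(element):
    """Model of a '/parent/*' XPath mapped to map<string,string>:
    each child's tag becomes a key and its text becomes the value."""
    return {child.tag: child.text for child in element}


def record_to_row(xml_text):
    """Hypothetical helper mirroring the column.xpath.* SERDEPROPERTIES."""
    root = ET.fromstring(xml_text)  # root is the <record> element
    return {
        # /record/@customer_id -> attribute value
        "customer_id": root.get("customer_id"),
        # /record/income/text() -> BIGINT (direct child only, not financial/income)
        "income": int(root.findtext("income")),
        # /record/demographics/* and /record/financial/* -> maps
        "demographics": children_as_map(root.find("demographics")),
        "financial": children_as_map(root.find("financial")),
    }


if __name__ == "__main__":
    print(record_to_row(RECORD))
```

    Each "/*" XPath yields a map keyed by child tag name, which is why a Hive map column needs explicit key and value types such as map<string,string>.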
    

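    In the CREATE TABLE statement above, I would suggest setting "xmlinput.start" to "<record customer" instead of "<record>", because the start tag in the data follows the pattern <record customer_id="0000-JTALA">. XmlInputFormat looks for the xmlinput.start string literally, so "<record>" never matches a start tag that carries an attribute, no records are read, and the table stays empty. I hope this helps.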
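    The effect of the start marker can be illustrated with a small Python model of a start/end-marker record reader. This is a simplified sketch, not XmlInputFormat itself (the real reader works on byte streams across HDFS splits); the function split_records and the SAMPLE string are hypothetical. It only shows that the markers are matched as literal text, so "<record>" finds nothing in a file whose records start with "<record customer_id=...>".

```python
from typing import List


def split_records(data: str, start_tag: str, end_tag: str) -> List[str]:
    """Simplified model of a start/end-marker record reader.

    Scans for the literal start_tag, then for the next literal end_tag, and
    returns each span between them (markers included). No XML parsing is done.
    """
    records = []
    pos = 0
    while True:
        begin = data.find(start_tag, pos)
        if begin == -1:
            break  # no further start marker: stop scanning
        end = data.find(end_tag, begin + len(start_tag))
        if end == -1:
            break  # unterminated record: drop it
        stop = end + len(end_tag)
        records.append(data[begin:stop])
        pos = stop  # continue scanning after this record
    return records


SAMPLE = '<record customer_id="0000-JTALA">\n  <income>200000</income>\n</record>\n'

if __name__ == "__main__":
    print(len(split_records(SAMPLE, "<record>", "</record>")))          # 0
    print(len(split_records(SAMPLE, "<record customer", "</record>")))  # 1
```

    A shorter prefix such as "<record" would also match here, but in this model it would additionally match any other element whose name starts with "record"; the longer "<record customer" avoids that.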
