In Hive, how to read through NULL / empty tags present within an XML using explode(XPATH(..)) function?

杀马特。学长 韩版系。学妹 提交于 2021-01-28 11:51:40

问题


In below Hive-query, I need to read the null / empty "string" tags as well, from the XML content. Only the non-null "string" tags are getting considered within the XPATH() list now.

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>

        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select Name, Value 
  from your_data d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as Value;

Expected output from query:

Name    Value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555

回答1:


The problem here is that XPATH returns NodeList and if it contains empty node, it is not included in the list.

Concatenation with some string (in XPATH): concat(/Value/string/text()," ") does not work here:

Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!

at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)

Easy solution is to replace <string></string> and <string/> with <string>NULL</string> and then you can convert 'NULL' string to null.

Demo:

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select name, case when value='NULL' then null else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value

Result:

name    value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555


来源:https://stackoverflow.com/questions/65594410/in-hive-how-to-read-through-null-empty-tags-present-within-an-xml-using-explo

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!