问题
In below Hive-query, I need to read the null / empty "string" tags as well, from the XML content. Only the non-null "string" tags are getting considered within the XPATH()
list now.
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
<string>222</string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>EFGH</Name>
<Value>
<string/>
<string>444</string>
<string></string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select Name, Value
from your_data d
lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as Value;
Expected output from query:
Name Value
ABCD 111
ABCD
ABCD 222
EFGH
EFGH 444
EFGH
EFGH 555
回答1:
The problem here is that XPATH
returns NodeList and if it contains empty node, it is not included in the list.
Concatenation with some string (in XPATH): concat(/Value/string/text()," ")
does not work here:
Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)
Easy solution is to replace <string></string>
and <string/>
with <string>NULL</string>
and then you can convert 'NULL' string to null.
Demo:
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
<string>222</string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>EFGH</Name>
<Value>
<string/>
<string>444</string>
<string></string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, case when value='NULL' then null else value end value
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data d
) d
lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value
Result:
name value
ABCD 111
ABCD
ABCD 222
EFGH
EFGH 444
EFGH
EFGH 555
来源:https://stackoverflow.com/questions/65594410/in-hive-how-to-read-through-null-empty-tags-present-within-an-xml-using-explo