Question
I am using Cloudera Hadoop 2.6 and Pig 0.15.
I am trying to extract data from an XML file. Part of the file is shown below.
<product productID="MICROLITEMX1600LAMP">
<basicInfo>
<category lang="NL" id="OT1006">Output Accessoires</category>
</basicInfo>
</product>
I can dump node values using the XPath() function, but not attribute values. The code below returns empty tuples instead of the productID.
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
allProducts = LOAD '/pathtofile/sample.xml' USING org.apache.pig.piggybank.storage.XMLLoader('product') AS (data:chararray);
productsOneByOne = FOREACH allProducts GENERATE XPath(data, 'product/@productID') AS productid:chararray;
dump productsOneByOne;
Please help me resolve this issue.
Answer 1:
This adds to the answer in "How to extract xml attributes using Xpath in Pig?"
There is a bug in XPath.java: it ignores the fourth parameter, so the ignoreNamespace flag can never be changed from the Pig script.
Adding the following code to XPath.java and recompiling resolves the issue. http://svn.apache.org/repos/asf/pig/branches/branch-0.15/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java
// Read the optional fourth argument; Tuple.get() returns Object, so cast it.
if (input.size() > 3) {
    ignoreNamespace = (Boolean) input.get(3);
}
The code above should be added before:
if (ignoreNamespace) {
    xpathString = createNameSpaceIgnoreXpathString(xpathString);
}
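To see why the ignoreNamespace flag matters for attributes, here is a minimal standalone sketch that uses only the JDK's javax.xml.xpath, not Pig. The class name XPathAttributeDemo and the helper ignoreNamespaces are hypothetical; the helper is only an assumption about what a local-name() based rewrite such as createNameSpaceIgnoreXpathString might produce, not a copy of the Pig source. Under that assumption, the attribute step is turned into an element test that can never match, which would explain the empty result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathAttributeDemo {
    // The sample record from the question, as a single string.
    static final String XML =
        "<product productID=\"MICROLITEMX1600LAMP\">"
        + "<basicInfo><category lang=\"NL\" id=\"OT1006\">Output Accessoires</category></basicInfo>"
        + "</product>";

    // Evaluate an XPath expression against an XML string and return its string value.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical stand-in (an assumption, not Pig's code): wrap every
    // path step in an element test on local-name().
    static String ignoreNamespaces(String expression) {
        StringBuilder rewritten = new StringBuilder();
        for (String step : expression.split("/")) {
            rewritten.append("//*[local-name()='").append(step).append("']");
        }
        return rewritten.toString();
    }

    public static void main(String[] args) {
        String attribute = "product/@productID";
        String element = "product/basicInfo/category";

        // Plain expression: the attribute is found.
        System.out.println(evaluate(XML, attribute));
        // Rewritten expression: '@productID' is treated as an element name, so nothing matches.
        System.out.println("[" + evaluate(XML, ignoreNamespaces(attribute)) + "]");
        // Element paths survive the rewrite, matching the behaviour seen in the question.
        System.out.println(evaluate(XML, ignoreNamespaces(element)));
    }
}
```

With the patched UDF, passing false as the fourth argument, for example XPath(data, 'product/@productID', true, false), should skip the rewrite so that the attribute path is evaluated as written.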
Source: https://stackoverflow.com/questions/35887260/hadoop-pig-xpath-returning-empty-attribute-value