Hadoop pig XPath returning empty attribute value

回眸只為那壹抹淺笑 submitted on 2019-12-11 16:50:48

Question


I am using Cloudera Hadoop 2.6 and Pig 0.15.

I am trying to extract data from an XML file. Part of the file is shown below:

    <product productID="MICROLITEMX1600LAMP">
      <basicInfo>
        <category lang="NL" id="OT1006">Output Accessoires</category>
      </basicInfo>
    </product>

With the XPath() function I can dump node values, but not attribute values. The code below returns empty tuples instead of the productID value:

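    -- Register the piggybank XPath UDF under a short alias
    DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
    -- XMLLoader emits one <product>...</product> element per record
    allProducts = LOAD '/pathtofile/sample.xml' USING org.apache.pig.piggybank.storage.XMLLoader('product') AS (data:chararray);
    -- Attribute lookup: this is the expression that comes back empty
    productsOneByOne = FOREACH allProducts GENERATE XPath(data, 'product/@productID') AS productid:chararray;
    dump productsOneByOne;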
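As a quick check outside Pig, the JDK's own javax.xml.xpath API can evaluate the same expressions against the sample record. This is a minimal sketch; the class name AttributeXPathCheck is made up for illustration. If the attribute value comes back here, the expression product/@productID is itself valid, which points at how the piggybank UDF processes it rather than at the XPath.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class AttributeXPathCheck {
    // The sample record from the question, as a single <product> element
    static final String PRODUCT_XML =
        "<product productID=\"MICROLITEMX1600LAMP\">"
        + "<basicInfo><category lang=\"NL\" id=\"OT1006\">Output Accessoires</category></basicInfo>"
        + "</product>";

    // Evaluates an XPath expression against an XML string and returns its string value
    static String evaluate(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Attribute step: prints MICROLITEMX1600LAMP
        System.out.println(evaluate(PRODUCT_XML, "product/@productID"));
        // Element step: prints Output Accessoires
        System.out.println(evaluate(PRODUCT_XML, "product/basicInfo/category"));
    }
}
```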

Please help me resolve this issue.


Answer 1:


Adding to the answers on "How to extract xml attributes using Xpath in Pig?":

There is a bug in XPath.java: it ignores the fourth parameter (ignoreNamespace).

Adding the following code to XPath.java and recompiling resolved the issue. The source file for branch 0.15 is here: http://svn.apache.org/repos/asf/pig/branches/branch-0.15/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java

    // Read the optional 4th argument instead of ignoring it
    if (input.size() > 3) {
        ignoreNamespace = (Boolean) input.get(3);
    }

The code above must be added just before this existing block:

    if (ignoreNamespace) {
        xpathString = createNameSpaceIgnoreXpathString(xpathString);
    }
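To see why the ignored fourth argument matters, here is a standalone sketch of the patched argument handling. The class IgnoreNamespaceSketch and the method ignoreNamespaceRewrite are hypothetical: the rewrite only approximates what createNameSpaceIgnoreXpathString might do (turning every path step into an element test on local-name()), and the default of true for ignoreNamespace is an assumption. A List<Object> stands in for Pig's Tuple, whose get(int) also returns Object, which is why the cast to Boolean is needed.

```java
import java.util.Arrays;
import java.util.List;

public class IgnoreNamespaceSketch {
    // HYPOTHETICAL approximation of the namespace-ignoring rewrite: every path step
    // becomes an element test on local-name(). Not the actual piggybank implementation.
    static String ignoreNamespaceRewrite(String xpathString) {
        StringBuilder sb = new StringBuilder();
        for (String step : xpathString.split("/")) {
            sb.append("//*[local-name()='").append(step).append("']");
        }
        return sb.toString();
    }

    // Patched argument handling: read the optional 4th argument before deciding
    // whether to rewrite. List<Object> stands in for Pig's Tuple.
    static String effectiveXPath(List<Object> input) {
        boolean ignoreNamespace = true; // ASSUMED default when no 4th argument is given
        if (input.size() > 3) {
            ignoreNamespace = (Boolean) input.get(3); // get() returns Object, so cast
        }
        String xpathString = (String) input.get(1);
        if (ignoreNamespace) {
            xpathString = ignoreNamespaceRewrite(xpathString);
        }
        return xpathString;
    }

    public static void main(String[] args) {
        String xml = "<product productID=\"MICROLITEMX1600LAMP\"/>";
        // Arguments: xml, xpath, cache flag (assumed), ignoreNamespace
        List<Object> withFlag = Arrays.asList(xml, "product/@productID", true, false);
        List<Object> withoutFlag = Arrays.asList(xml, "product/@productID", true);

        // Flag honoured, expression used as written: prints product/@productID
        System.out.println(effectiveXPath(withFlag));
        // No flag, rewrite applied: prints //*[local-name()='product']//*[local-name()='@productID']
        System.out.println(effectiveXPath(withoutFlag));
    }
}
```

Under these assumptions, the rewritten expression only tests element names, so an attribute step such as @productID can never match and the UDF returns an empty value. With the patched jar, the Pig call would then pass false explicitly, for example XPath(data, 'product/@productID', true, false), assuming the third argument is the cache flag.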


Source: https://stackoverflow.com/questions/35887260/hadoop-pig-xpath-returning-empty-attribute-value
