Android SAX parser not getting full text from between tags

醉酒成梦 2020-11-30 02:29

I've created my own DefaultHandler to parse RSS feeds, and for most feeds it's working fine. However, for ESPN, it is cutting off part of the article URL due to the way ESP

3 Answers
  • 2020-11-30 03:00

    As you can see, it's cutting off everything in the URL from the ampersand escape code onward.

    From the documentation of the characters() method:

    The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

    When I write SAX parsers, I use a StringBuilder to append everything passed to characters():

    public void characters(char[] ch, int start, int length) {
        // Skip the text when no buffer is active
        if (buf != null) {
            // Append the whole chunk; the parser may deliver one element's text in several calls
            buf.append(ch, start, length);
        }
    }
    

    Then in endElement(), I take the contents of the StringBuilder and do something with it. That way, if the parser calls characters() several times, I don't miss anything.
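The approach described above can be sketched end to end as follows. This is a minimal, self-contained illustration rather than the original poster's handler: the class name `LinkCollector`, the helper `parseLinks`, the `<link>` element, and the example.com URL are all assumptions made for the demo. It uses the JDK's default parser, which is not namespace-aware, so it matches on `qName`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collects the text of every <link> element.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();
    private StringBuilder buf; // non-null only while inside a <link> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("link".equals(qName)) {
            buf = new StringBuilder(); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buf != null) {
            // May be called several times for one element, so always append
            buf.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("link".equals(qName) && buf != null) {
            links.add(buf.toString().trim()); // the complete text is available only here
            buf = null;
        }
    }

    // Hypothetical helper: parses an XML string and returns the collected link texts.
    static List<String> parseLinks(String xml) {
        LinkCollector handler = new LinkCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.links;
    }

    public static void main(String[] args) {
        String xml = "<item><link>http://example.com/story?id=1&amp;source=rss</link></item>";
        System.out.println(parseLinks(xml));
    }
}
```

Because the buffer is read only in endElement(), it does not matter how many chunks the parser chooses to deliver; the text after the `&amp;` ends up in the same string as the text before it.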

  • 2020-11-30 03:09
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        if (localName.equals("icon"))
        {
            // Start a fresh buffer only for the element we care about
            sb = new StringBuilder();
            iconflag = true;
        }
    }
    
    @Override
    public void characters(char[] ch, int start, int length) {
        if (sb != null && iconflag) {
            // Append every chunk; the text may arrive in several calls
            sb.append(ch, start, length);
        }
    }
    
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (iconflag && localName.equals("icon"))
        {
            // The full text is available only once the element has ended
            info.setIcon(sb.toString().trim());
            iconflag = false;
        }
    }
    

    So I figured it out: the code above is the solution.

  • 2020-11-30 03:19

    I ran into this problem the other day. It turns out the characters() method is called multiple times when the value contains any of these characters:

    "   &quot;
    '   &apos;
    <   &lt;
    >   &gt;
    &   &amp;
    

    Also be careful about line breaks within the value! If the XML is line-wrapped outside your control, the characters() method may also be called separately for each line of the value, and the text it reports includes the line break, which you then need to strip out yourself.

    A sample Handler taking care of all these problems is this one:

     DefaultHandler handler = new DefaultHandler() {
       private boolean isInMyTag = false;
       private String localName;
       private StringBuilder elementContent;
    
       @Override
       public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
         if (qName.equalsIgnoreCase("myfield")) {
           isInMyTag = true;
           this.localName = localName;
           this.elementContent = new StringBuilder();
         }
       }
    
       @Override
       public void characters(char[] ch, int start, int length) {
         if (isInMyTag) {
           String content = new String(ch, start, length);
           // startsWith is safe even if the chunk is empty
           if (content.startsWith("\n")) {
             // remove leading newline
             elementContent.append(content.substring(1));
           } else {
             elementContent.append(content);
           }
         }
       }
    
       @Override
       public void endElement(String uri, String localName, String qName) throws SAXException {
         if (qName.equalsIgnoreCase("myfield")) {
           isInMyTag = false;
           // do something with elementContent.toString()
           System.out.println(elementContent.toString());
           this.localName = "";
         }
       }
     };
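An alternative to stripping newlines chunk by chunk is to append every chunk unchanged and normalize whitespace once, in endElement(). The sketch below assumes that collapsing each run of whitespace into a single space is acceptable for the value; that would not be right for every field. The class `WrappedTextDemo` and the helper `parseField` are hypothetical names; `myfield` is the element name used in the handler above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: reads the text of <myfield> and tidies its whitespace.
class WrappedTextDemo extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inField;
    private String value; // stays null if no <myfield> element is seen

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("myfield".equalsIgnoreCase(qName)) {
            inField = true;
            text.setLength(0); // reset the buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inField) {
            text.append(ch, start, length); // keep chunks as-is, including line breaks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("myfield".equalsIgnoreCase(qName)) {
            inField = false;
            // Normalize once, on the complete text: trim the ends, collapse inner runs
            value = text.toString().trim().replaceAll("\\s+", " ");
        }
    }

    // Hypothetical helper: returns the normalized text of the last <myfield> in the input.
    static String parseField(String xml) {
        WrappedTextDemo handler = new WrappedTextDemo();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.value;
    }

    public static void main(String[] args) {
        System.out.println(parseField("<myfield>\n  Tom &amp;\n  Jerry\n</myfield>"));
    }
}
```

Normalizing at the end means the result does not depend on where the parser happens to split the chunks, which a per-chunk check for a leading newline cannot guarantee.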
    

    I hope this helps.
