SAX parsing and encoding

醉酒成梦 2020-12-10 18:05

I have a contact who is experiencing trouble with SAX when parsing RSS and Atom files. According to him, it's as if text coming from the Item elements is truncated at an a

3 answers
  • 2020-12-10 18:11

    The characters() method is not guaranteed to deliver the complete text content of an element in a single call: the text may span the parser's buffer boundaries and arrive in several chunks. You need to buffer the characters yourself between the start and end element events.

    e.g.

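    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class TextHandler extends DefaultHandler {
       // Accumulates text across however many characters() calls the parser makes.
       private StringBuilder builder;

       @Override
       public void startElement(String uri, String localName, String qName, Attributes atts) {
          builder = new StringBuilder(); // fresh buffer for each start tag
       }

       @Override
       public void characters(char[] ch, int start, int length) {
          builder.append(ch, start, length); // may be only a fragment of the text
       }

       @Override
       public void endElement(String uri, String localName, String qName) {
          // All text seen since the most recent start tag; use it here.
          String theFullText = builder.toString();
       }
    }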
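For a feed-like document, the same buffering idea can be narrowed to a single element name. The following is only a minimal sketch: the class name `TitleCollector`, the helper `titlesOf`, and the choice of the `title` element are assumptions made for illustration, not part of the original question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <title> element.
final class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder builder; // non-null only while inside a <title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            builder = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may call this several times for one text node.
        if (builder != null) {
            builder.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && builder != null) {
            titles.add(builder.toString());
            builder = null;
        }
    }

    // Parses the given XML text (encoded here as UTF-8) and returns all titles.
    static List<String> titlesOf(String xml) {
        try {
            TitleCollector handler = new TitleCollector();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `TitleCollector.titlesOf(...)` on a snippet whose title contains `&amp;` returns the whole title, regardless of how many characters() calls the parser used to deliver it.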
    
  • 2020-12-10 18:29

    XML entities generate special events in SAX. You can catch them with a LexicalHandler, though it's generally not necessary. But this explains why you can't assume that you will receive only one characters event per tag. Use a buffer as explained in the other answers.

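    For instance, hello&amp;world can generate the following sequence (whether startEntity and endEntity are actually reported for a predefined entity such as &amp; depends on the parser):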

    • startElement
    • characters hello
    • startEntity
    • characters &
    • endEntity
    • characters world

    Have a look at the Auxiliary SAX interfaces if you want some more examples. Other special events include external entities, comments, CDATA sections, etc.
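To see these events on a given parser, one option is to log them. This is a rough sketch rather than a reference implementation: `EventLogger` and its fields are hypothetical names, and the sample document with its `w` entity is made up. `DefaultHandler2` from `org.xml.sax.ext` implements both ContentHandler and LexicalHandler, and the lexical handler is registered through the standard SAX property `http://xml.org/sax/properties/lexical-handler`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical logger: records content and entity events in arrival order.
final class EventLogger extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String chunk = new String(ch, start, length);
        events.add("characters " + chunk);
        text.append(chunk); // buffering makes the chunk boundaries irrelevant
    }

    @Override
    public void startEntity(String name) {
        events.add("startEntity " + name);
    }

    @Override
    public void endEntity(String name) {
        events.add("endEntity " + name);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    static EventLogger parse(String xml) {
        try {
            EventLogger logger = new EventLogger();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(logger);
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", logger);
            reader.parse(new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
            return logger;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Printing `events` for a document such as `<!DOCTYPE r [<!ENTITY w "world">]><r>hello &amp; &w;</r>` shows how the text is split on that particular parser; the exact chunking and which entity events appear are parser-dependent, while the buffered `text` is the same either way.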

  • 2020-12-10 18:32

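    How are you passing the input to SAX? As an InputStream (recommended) or as a Reader? With an InputStream the parser can detect the encoding itself from the XML declaration, whereas a Reader has already decoded the bytes with whatever charset it was given. So, starting from your byte[], try using a ByteArrayInputStream.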
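As an illustration of the InputStream route, the sketch below feeds raw bytes to the parser and lets it honor the encoding named in the XML declaration. The class `EncodingDemo`, its methods, and the ISO-8859-1 sample document are assumptions chosen for the example, not taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: parse a byte[] directly so the parser picks the charset.
final class EncodingDemo {
    // Returns all character data in the document, buffered across callbacks.
    static String textOf(byte[] xmlBytes) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xmlBytes), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    // Sample document whose bytes really are ISO-8859-1, as its declaration says.
    static byte[] latin1Sample() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><t>caf\u00e9</t>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

`EncodingDemo.textOf(EncodingDemo.latin1Sample())` yields the accented text intact, because the bytes were never decoded with a guessed charset before reaching the parser.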
