Best way to convert XML to have CDATA around text (in Java)

[愿得一人] 2021-01-23 02:14

I have a weird requirement where I need to take some XML and rewrite it so that the text nodes are wrapped in CDATA (this is for a client that won't allow normal escaping).

4 answers
  • 2021-01-23 02:40

    Thanks for all of your answers. I found a way to do this using dom4j. My implementation does not work if elements have "mixed" content (i.e. text alongside child elements), but in my case this isn't a problem. It works because dom4j will output a CDATA section for each CDATA node you add:

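        import java.util.List;
        import org.dom4j.Document;
        import org.dom4j.Element;

        // Entry point: rewrites text that needs escaping as CDATA.
        // Not safe for elements with mixed content (text plus child elements).
        public void replaceTextWithCdataNoMixedText(Document doc) {
            if (doc == null)
                return;
            replaceTextWithCdata(doc.content());
        }

        private void replaceTextWithCdata(List<?> content) {
            if (content == null)
                return;
            for (Object o : content) {
                if (o instanceof Element) {
                    Element e = (Element) o;
                    // Note: getTextTrim() also trims and normalizes whitespace.
                    String t = e.getTextTrim();
                    if (textNeedsEscaping(t)) {
                        // Drops ALL existing content of e, including child elements.
                        e.clearContent();
                        e.addCDATA(t);
                    } else {
                        // Nothing to escape here, so recurse into the children.
                        replaceTextWithCdata(e.content());
                    }
                }
            }
        }

        // True if the text contains a character that would otherwise be escaped.
        private boolean textNeedsEscaping(String t) {
            if (t == null)
                return false;
            for (int i = 0; i < t.length(); i++) {
                char c = t.charAt(i);
                if (c == '<' || c == '>' || c == '&') {
                    return true;
                }
            }
            return false;
        }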
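
For readers without dom4j, the same idea (swap text nodes for CDATA nodes and let the serializer write them out) can be sketched with the JDK's built-in DOM and identity `Transformer`. This is not the answer's dom4j code, only a rough equivalent under a few assumptions: the class name `CdataWrapper` and its method names are made up for illustration, it wraps every non-blank text node rather than only text that needs escaping, and it relies on the JDK's default serializer writing `CDATASection` nodes as CDATA sections. Because it replaces individual text nodes, it should also cope with mixed content.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class CdataWrapper {

    // Recursively replaces every non-blank Text node below 'node'
    // with a CDATASection node holding the same characters.
    static void wrapTextInCdata(Node node) {
        Document doc = node.getOwnerDocument();
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before replacing
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue();
                if (!text.trim().isEmpty()) {
                    node.replaceChild(doc.createCDATASection(text), child);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                wrapTextInCdata(child);
            }
            child = next;
        }
    }

    // Parses the XML, wraps its text nodes, and serializes it back to a string.
    static String toCdataXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        wrapTextInCdata(doc.getDocumentElement());

        // Identity transform: copies the DOM to the output unchanged.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toCdataXml("<a><b>x &amp; y</b><c>1 &lt; 2</c></a>"));
    }
}
```

Text that itself contains `]]>` needs extra care in any CDATA-based approach, since that sequence ends a CDATA section; this sketch does not handle that case explicitly.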
    
  • 2021-01-23 02:41

    I think it could work with an XSLT transformation, but I am not sure about the performance of the transformation. Take a look at CDATA Sections and XSLT; it may help you.

  • 2021-01-23 02:48

    Taking premade XML and parsing it with an XML parser is just going to make the parser choke on the unescaped characters. The only solution I can think of is to write your own tag-soup parser to parse it, modify it, and dump it back out as XML.

  • 2021-01-23 02:49

    You can use XSLT to accomplish this, as long as a) all of the text you need to output is in elements, b) you only care about text nodes, c) you know the names of all the elements that contain text, and d) it's okay to emit any text in all of those output elements as CDATA. If all of those conditions hold, you could write an identity transform and add this element to it:

    <xsl:output method="xml" cdata-section-elements="elm1 elm2 elm3..."/>
    

    See the W3C XSLT recommendation on this subject.
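
A minimal sketch of how that might look when run from Java with the JDK's `javax.xml.transform` API. The class name `CdataXslt` and the element names `title` and `body` are placeholders chosen for illustration; substitute the names of the elements whose text should come out as CDATA.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of any library.
class CdataXslt {

    // Identity transform plus an xsl:output element that lists the
    // (assumed) element names whose text is written as CDATA.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\""
        + " cdata-section-elements=\"title body\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML and returns the result.
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(
            "<doc><title>a &lt; b</title><note>x &amp; y</note></doc>"));
    }
}
```

In this sketch only `title` is listed and present, so its text is emitted as a CDATA section, while `note` is not listed and keeps normal escaping. This shows condition c) in practice: elements you do not name are left alone.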
