Normalization in DOM parsing with Java - how does it work?

一向 2020-11-22 04:48

I saw the line below in the code for a DOM parser in this tutorial.

doc.getDocumentElement().normalize();

Why do we do this normalization?

3 answers
  •  名媛妹妹
    2020-11-22 05:13

    As an extension to @JBNizet's answer, for more technical readers: here is how normalize() from the org.w3c.dom.Node interface is implemented in com.sun.org.apache.xerces.internal.dom.ParentNode. It gives you an idea of how it actually works.

    public void normalize() {
        // No need to normalize if already normalized.
        if (isNormalized()) {
            return;
        }
        if (needsSyncChildren()) {
            synchronizeChildren();
        }
        ChildNode kid;
        for (kid = firstChild; kid != null; kid = kid.nextSibling) {
             kid.normalize();
        }
        isNormalized(true);
    }
    

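    It walks the child nodes and calls kid.normalize() on each one, so the whole subtree is processed recursively.
    This mechanism is overridden in org.apache.xerces.dom.ElementImpl, which is where adjacent text nodes are actually merged and empty ones removed: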

    public void normalize() {
         // No need to normalize if already normalized.
         if (isNormalized()) {
             return;
         }
         if (needsSyncChildren()) {
             synchronizeChildren();
         }
         ChildNode kid, next;
         for (kid = firstChild; kid != null; kid = next) {
             next = kid.nextSibling;
    
             // If kid is a text node, we need to check for one of two
             // conditions:
             //   1) There is an adjacent text node
             //   2) There is no adjacent text node, but kid is
             //      an empty text node.
             if ( kid.getNodeType() == Node.TEXT_NODE )
             {
                 // If an adjacent text node, merge it with kid
                 if ( next!=null && next.getNodeType() == Node.TEXT_NODE )
                 {
                     ((Text)kid).appendData(next.getNodeValue());
                     removeChild( next );
                     next = kid; // Don't advance; there might be another.
                 }
                 else
                 {
                     // If kid is empty, remove it
                     if ( kid.getNodeValue() == null || kid.getNodeValue().length() == 0 ) {
                         removeChild( kid );
                     }
                 }
             }
    
             // Otherwise it might be an Element, which is handled recursively
             else if (kid.getNodeType() == Node.ELEMENT_NODE) {
                 kid.normalize();
             }
         }
    
         // We must also normalize all of the attributes
         if ( attributes!=null )
         {
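             for( int i=0; i<attributes.getLength(); ++i )
             {
                 Node attr = attributes.item(i);
                 attr.normalize();
             }
         }
    
         // changed() will have occurred when the removeChild() was done,
         // so does not have to be reissued.
    
         isNormalized(true);
    }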
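
To see the effect from the caller's side, here is a minimal sketch (not taken from the tutorial or from Xerces) that builds a small tree by hand with adjacent and empty text nodes and then normalizes it. The class name NormalizeDemo, the helper buildUnnormalizedTree, and the element names are made up for illustration; only the standard javax.xml.parsers and org.w3c.dom APIs are used.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {

    // Hypothetical helper: builds <root> with two adjacent text nodes,
    // a <child> holding two more adjacent text nodes, and a trailing
    // empty text node. Nothing is normalized yet.
    static Element buildUnnormalizedTree() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createTextNode("world"));

        Element child = doc.createElement("child");
        child.appendChild(doc.createTextNode("a"));
        child.appendChild(doc.createTextNode("b"));
        root.appendChild(child);

        root.appendChild(doc.createTextNode(""));
        return root;
    }

    public static void main(String[] args) {
        Element root = buildUnnormalizedTree();
        // Four children: two text nodes, <child>, one empty text node.
        System.out.println("before: " + root.getChildNodes().getLength());

        // Same call as in the question.
        root.getOwnerDocument().getDocumentElement().normalize();

        // Two children: the merged text node and <child>.
        System.out.println("after:  " + root.getChildNodes().getLength());
        System.out.println("text:   " + root.getFirstChild().getNodeValue());
        // The nested element was normalized recursively as well.
        System.out.println("child:  " + root.getLastChild().getChildNodes().getLength());
    }
}
```

The two adjacent text nodes under root collapse into one, the empty text node disappears, and the text nodes inside child are merged too, which matches the two conditions described in the comments of ElementImpl.normalize() above.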

    Hope this saves you some time.
