Node.getTextContent(): is there a way to get the text content of the current node only, not the descendants' text?

眼角桃花

Node.getTextContent() returns the text content of the current node and its descendants.

Is there a way to get the text content of the current node only, not the descendants' text?

4 Answers

旧时难觅i

2021-02-02 15:24

If you replace the last for loop with the following one, it behaves the way you want:

for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
    String tagname = ((Element) n).getTagName();
    StringBuilder content = new StringBuilder();
    // Look only at the direct children of the current element.
    NodeList children = n.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        // Text nodes report "#text" as their node name; skip everything else.
        if (child.getNodeName().equals("#text")) {
            content.append(child.getTextContent());
        }
    }
    System.out.println(tagname + "=" + content);
}


悲哀的现实

2021-02-02 15:28

I do this with Java 8 streams and a helper class:

import java.util.*;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NodeLists
{
    /** converts a NodeList to java.util.List of Node */
    static List<Node> list(NodeList nodeList)
    {
        List<Node> list = new ArrayList<>();
        for (int i = 0; i < nodeList.getLength(); i++) { list.add(nodeList.item(i)); }
        return list;
    }
}

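And then:

// Pass the node's children; the lambda parameter must not reuse the name "node".
String text = NodeLists.list(node.getChildNodes())
        .stream()
        .filter(child -> child.getNodeType() == Node.TEXT_NODE)
        .map(Node::getTextContent)
        .reduce("", (s, t) -> s + t);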
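The same idea can be sketched without the helper class by streaming over the child indices. This is only a minimal sketch: the class name DirectTextStreams and the method directTextOfRoot are made up for illustration and are not part of the answer above.

```java
import java.io.StringReader;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, chosen for this example only.
class DirectTextStreams {

    /** Concatenates the values of the node's direct TEXT_NODE children. */
    static String directText(Node node) {
        NodeList children = node.getChildNodes();
        return IntStream.range(0, children.getLength())
                .mapToObj(children::item)
                .filter(child -> child.getNodeType() == Node.TEXT_NODE)
                .map(Node::getNodeValue)
                .collect(Collectors.joining());
    }

    /** Parses an XML string and returns the direct text of its root element. */
    static String directTextOfRoot(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return directText(root);
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<paragraph><link>XML</link> is a <strong>editor</strong>.</paragraph>";
        System.out.println("paragraph=" + directTextOfRoot(xml));
    }
}
```

Collectors.joining() is used here in place of reduce; both concatenate the pieces in document order.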


耶瑟儿～

2021-02-02 15:37

What you want is to filter the children of your <paragraph> node and keep only those with node type Node.TEXT_NODE.

Here is an example of a method that returns the desired content:

public static String getFirstLevelTextContent(Node node) {
    NodeList list = node.getChildNodes();
    StringBuilder textContent = new StringBuilder();
    for (int i = 0; i < list.getLength(); ++i) {
        Node child = list.item(i);
        if (child.getNodeType() == Node.TEXT_NODE)
            textContent.append(child.getTextContent());
    }
    return textContent.toString();
}

Within your example it means (imports from java.io, javax.xml.parsers, org.w3c.dom and org.w3c.dom.traversal omitted):

String str = "<paragraph>" + //
        "<link>XML</link>" + //
        " is a " + //
        "<strong>browser based XML editor</strong>" + //
        "editor allows users to edit XML data in an intuitive word processor." + //
        "</paragraph>";
Document domDoc = null;
try {
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
    ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
    domDoc = docBuilder.parse(bis);
} catch (Exception e) {
    e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
    String tagname = ((Element) n).getTagName();
    System.out.println(tagname + "=" + getFirstLevelTextContent(n));
}

Output:

paragraph= is a editor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor

It iterates over all the direct children of a Node, keeps only the text nodes (thus excluding comments, elements and so on) and accumulates their text content.

There is no direct method in Node or Element that returns only the first-level text content.
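One case the TEXT_NODE filter leaves out is CDATA: with a default DocumentBuilderFactory (coalescing off), a CDATA section is a separate child of type Node.CDATA_SECTION_NODE. If that text should count as well, the check can be widened. The following is a sketch under that assumption; the class name DirectTextWithCData and the method directTextOfRoot are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, chosen for this example only.
class DirectTextWithCData {

    /** Concatenates direct text and CDATA children; comments and elements are skipped. */
    static String directText(Node node) {
        NodeList children = node.getChildNodes();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    /** Parses an XML string and returns the direct text of its root element. */
    static String directTextOfRoot(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return directText(root);
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(directTextOfRoot("<p>a<![CDATA[<raw>]]><b>x</b>c</p>"));
    }
}
```

Alternatively, calling setCoalescing(true) on the DocumentBuilderFactory asks the parser to convert CDATA sections to text nodes and merge them with adjacent text, in which case the plain TEXT_NODE check is enough.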


无人及你

2021-02-02 15:41

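There is no built-in function that returns only the node's own text, but you can work around it with a simple trick: check whether node.getTextContent() contains "\n". If it does, the node itself does not have any text. Note that this is only a heuristic: it assumes the XML is indented, so that an element with child elements has line breaks in its content.

Hope this helps.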
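If the "\n" check turns out to be too fragile (for example when the node's own text spans several lines), a more direct test is to look for a non-blank text child. The sketch below is not the trick described above but a swapped-in alternative; the class name OwnTextCheck and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, chosen for this example only.
class OwnTextCheck {

    /** Returns true if the node has at least one direct text child that is not just whitespace. */
    static boolean hasOwnText(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && !child.getNodeValue().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    /** Parses an XML string and applies hasOwnText to its root element. */
    static boolean rootHasOwnText(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return hasOwnText(root);
        } catch (Exception e) {
            // Wrapped so that callers do not have to handle checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Indented element with only child elements: no text of its own.
        System.out.println(rootHasOwnText("<a>\n  <b>text</b>\n</a>"));
        // Own text that happens to contain a line break.
        System.out.println(rootHasOwnText("<a>line one\nline two<b>x</b></a>"));
    }
}
```

In the second call getTextContent() contains "\n" even though the element does have text of its own, which is the case where the two checks disagree.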
