JTidy Node.findBody() — How to use?

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-07 07:31:15

Question


I'm trying to do XHTML DOM parsing with JTidy, and it is proving to be a rather counterintuitive task. In particular, there's a method to parse HTML:

Node Tidy.parse(Reader, Writer)

And to get the <body /> of that Node, I assume, I should use

Node Node.findBody(TagTable)

Where do I get an instance of that TagTable? (The constructor is protected, and I haven't found a factory that produces one.)

I use JTidy 8.0-SNAPSHOT.


Answer 1:


I found there's a much simpler way to extract the body:

Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.setPrintBodyOnly(true);

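Then call tidy.parse(reader, writer) on the Reader-Writer pair; with setPrintBodyOnly(true), only the contents of the body are written to the Writer.

As simple as it should be.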




Answer 2:


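You could use the parseDOM method instead, which returns an org.w3c.dom.Document. Note that parseDOM is an instance method, so it has to be called on a Tidy object:

Tidy tidy = new Tidy();
Document document = tidy.parseDOM(reader, writer);
// Node here is org.w3c.dom.Node, not org.w3c.tidy.Node
Node body = document.getElementsByTagName("body").item(0);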
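
The body lookup itself is plain org.w3c.dom API and does not depend on JTidy. Here is a minimal sketch of just that step, assuming well-formed XHTML input and using the JDK's built-in XML parser as a stand-in for the Document that parseDOM returns. The class name BodyLookup and the helpers findBody and parseXhtml are hypothetical names for illustration, not part of JTidy.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class BodyLookup {

    // Returns the first <body> element of the document, or null if there is none
    // (NodeList.item returns null for an out-of-range index).
    static Node findBody(Document document) {
        return document.getElementsByTagName("body").item(0);
    }

    // Stand-in for the JTidy parse step (an assumption for this sketch):
    // parses already well-formed XHTML with the JDK parser instead of JTidy.
    static Document parseXhtml(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document document = parseXhtml(
            "<html><head><title>t</title></head><body><p>Hello</p></body></html>");
        Node body = findBody(document);
        System.out.println(body.getNodeName());
        System.out.println(body.getTextContent());
    }
}
```

Since item(0) returns null when no body element exists, it is worth checking the result before using it.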


Source: https://stackoverflow.com/questions/221277/jtidy-node-findbody-how-to-use
