Jericho-html: is it possible to extract text with reference to positions in source file?

Submitted by 不问归期 on 2019-12-24 07:26:33

Question


I use Jericho HTML Parser 3.1.

I need to extract text from HTML, process it, and then, based on the result, insert tags into the original HTML.

For this I need a mapping between the extracted text and the source HTML.

net.htmlparser.jericho.TextExtractor extracts text pretty well, but I was not able to find out how to get the location of that text in the original file.

Is it possible to do so with Jericho-html?


Answer 1:


You can't do this with the TextExtractor as is, but I've needed to do similar things in the past, and the simplest solution is to copy Jericho's TextExtractor implementation and edit it to add your own custom behaviour. It's a pretty simple class, so you'll be able to easily see where to add your own hooks.
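
To illustrate the kind of hook meant here, the following is a minimal, library-independent sketch of the underlying idea: while building the extracted text, record the source index of every character that is kept. It deliberately does not use Jericho's API. The class name `OffsetTrackingExtractor`, its `Result` type, and the `wrap` helper are hypothetical names made up for this example, and the naive `<`...`>` scanner merely stands in for a real parser (it does not decode entities, collapse whitespace, or handle comments and scripts). In a copy of TextExtractor, the analogous change would presumably be to record each segment's position in the source as its text is appended.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetTrackingExtractor {

    /** Extracted text plus, for each extracted character, its index in the source HTML. */
    public static final class Result {
        public final String text;
        public final int[] sourceOffsets;

        Result(String text, int[] sourceOffsets) {
            this.text = text;
            this.sourceOffsets = sourceOffsets;
        }
    }

    /** Naive extraction: drops everything between '<' and '>' and remembers where each kept char came from. */
    public static Result extract(String html) {
        StringBuilder text = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
                offsets.add(i); // hook: position of this character in the source
            }
        }
        int[] sourceOffsets = new int[offsets.size()];
        for (int k = 0; k < sourceOffsets.length; k++) {
            sourceOffsets[k] = offsets.get(k);
        }
        return new Result(text.toString(), sourceOffsets);
    }

    /** Wraps the extracted-text range [start, end) in the given tags, inserted into the original HTML. */
    public static String wrap(String html, Result r, int start, int end, String open, String close) {
        int srcStart = r.sourceOffsets[start];
        int srcEnd = r.sourceOffsets[end - 1] + 1;
        // Insert the closing tag first so the opening position is not shifted.
        return new StringBuilder(html).insert(srcEnd, close).insert(srcStart, open).toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>big</b> world</p>";
        Result r = extract(html);
        int start = r.text.indexOf("world");
        System.out.println(wrap(html, r, start, start + "world".length(), "<mark>", "</mark>"));
    }
}
```

Note that this sketch does nothing to keep the markup well nested: if the wrapped range crosses an element boundary (for example "big world" above), the inserted tags will overlap the existing ones, so a real implementation would need to split the range at tag boundaries.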



Source: https://stackoverflow.com/questions/5579392/jericho-html-is-it-possible-to-extract-text-with-reference-to-positions-in-sour