HTML Parsing and removing anchor tags while preserving inner html using Jsoup

Posted by 我只是一个虾纸丫 on 2021-01-27 21:14:49

Question


I have to parse some HTML and remove the anchor tags, but I need to preserve the inner HTML of those anchor tags.

For example, if my html text is:

String inputHtml = "<div> <p> some text <a href=\"#\"> some link text </a> </p> </div>";

Now I can parse the above html and select for a tag in jsoup like this,

Document doc = Jsoup.parse(inputHtml);

// select every anchor (<a>) element in the document
Elements elements = doc.select("a");

and I can remove all of them with

elements.remove();

But that removes the complete anchor tag, from the opening bracket to the closing bracket, and the inner HTML is lost. How can I preserve the inner HTML while removing only the start and end tags?

Also, please note: I know the element has methods such as outerHtml() and html(), but those only retrieve the markup as a string, while remove() deletes the whole element. Is there any way to remove only the outer tags and preserve the inner HTML?

Thanks a lot in advance and appreciate your help.

--Rajesh


Answer 1:


Use unwrap(). It removes the matched elements from the DOM but keeps their children in place, so the inner HTML is preserved:

doc.select("a").unwrap();

Check the API docs for more info:
http://jsoup.org/apidocs/org/jsoup/select/Elements.html#unwrap%28%29
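
For anyone who wants to try this end to end, here is a minimal sketch of the unwrap approach. It assumes jsoup is on the classpath; the class name AnchorUnwrapExample and the method name unwrapAnchors are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class AnchorUnwrapExample {

    // Hypothetical helper: parses the HTML and strips every <a> tag,
    // leaving the children of each anchor where the anchor used to be.
    static Document unwrapAnchors(String html) {
        Document doc = Jsoup.parse(html);
        // Elements.unwrap() removes each matched element but keeps its child nodes
        doc.select("a").unwrap();
        return doc;
    }

    public static void main(String[] args) {
        String inputHtml =
            "<div> <p> some text <a href=\"#\"> some <b>link</b> text </a> </p> </div>";
        Document doc = unwrapAnchors(inputHtml);
        // The <a> tags are gone, while the nested <b> and the link text remain
        System.out.println(doc.body().html());
    }
}
```

Because unwrap() moves the child nodes rather than copying text, markup nested inside the link (the <b> above) survives as real elements.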




Answer 2:


How about extracting the inner HTML first, adding it to the DOM and then removing your tags? This code is untested, but should do the trick:

Edit:

I updated the code to use replaceWith(), making the code more intuitive and probably more efficient; thanks to A.J.'s hint in the comments.

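import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

Document doc = Jsoup.parse(inputHtml);
for (Element link : doc.select("a")) {
    // link.text() is the plain text of the link; a TextNode escapes markup,
    // so passing link.html() here would show nested tags as literal text.
    // (Older jsoup versions also took a baseUri: new TextNode(text, baseUri).)
    TextNode linkText = new TextNode(link.text());
    // optionally wrap the inner HTML in a tag instead, which keeps nested markup:
    // Element linkText = doc.createElement("span");
    // linkText.html(link.html());
    link.replaceWith(linkText);
}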

Instead of using a text node, you can wrap the inner HTML in any element you want; you will have to if your links contain more than plain text, because a text node escapes any markup it is given.
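
A rough sketch of that wrapping variant, assuming jsoup is on the classpath. The class name AnchorToSpan and the method name replaceAnchorsWithSpans are hypothetical, and <span> is just one possible wrapper element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AnchorToSpan {

    // Hypothetical helper: swaps every <a> for a <span> that holds the same inner HTML.
    static Document replaceAnchorsWithSpans(String html) {
        Document doc = Jsoup.parse(html);
        for (Element link : doc.select("a")) {
            // New element with no attributes, so href and friends are dropped
            Element span = doc.createElement("span");
            // Re-parse the link's inner HTML into the span, keeping nested markup
            span.html(link.html());
            link.replaceWith(span);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = replaceAnchorsWithSpans(
            "<p>see <a href=\"#\">the <b>docs</b></a></p>");
        System.out.println(doc.body().html());
    }
}
```

Compared with unwrap(), this leaves a wrapper element behind, which may be handy if you still want to style or locate the former links.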



Source: https://stackoverflow.com/questions/17032677/html-parsing-and-removing-anchor-tags-while-preserving-inner-html-using-jsoup
