How to get the href value of an <a> tag in nested HTML elements using Jericho?

倾然丶 夕夏残阳落幕 submitted on 2020-01-06 12:49:47

Question


I have some HTML code like this:

<div class="itm hasOverlay lastrow">
<a id="3:LE343SPABGLIANID" class="itm-link itm-drk trackingOnClick" title="League Sepatu Casual Geof S/L LO - Hitam/Biru" href="league-sepatu-casual-geof-sl-lo-hitambiru-68166.html" rel="-standard|">
</a>
<div class="itm-overlay itm-group-mainbox-with-group"></div>
</div>

What should I do to get the text league-sepatu-casual-geof-sl-lo-hitambiru-68166.html from the href attribute of

<a href="league-sepatu-casual-geof-sl-lo-hitambiru-68166.html">?


Answer 1:


That should be rather simple...

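import java.io.StringReader;
import net.htmlparser.jericho.*;

Source source = new Source(new StringReader(inputString)); // this constructor throws IOException
Element aElement = source.getFirstElement(HTMLElementName.A); // first <a> in the document, or null if none
String href = aElement.getAttributeValue("href");
System.out.println(href);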

... although this makes some assumptions, of course: namely, that inputString contains only the snippet you posted (and that this part is not enclosed in other tags), and that this snippet contains only a single link (a element).

(If these assumptions do not hold, one has to identify this particular div and the correct a tag some other way: for example, by searching for a div with class="itm hasOverlay lastrow" and, inside it, for an a with class="itm-link itm-drk trackingOnClick". In any case, one has to know more about the actual structure of the document from which the information is to be extracted.)
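The class-based approach suggested above can be sketched as follows. Note that this sketch swaps Jericho for the JDK's built-in javax.swing.text.html.parser so it runs without any extra jar; it is not Jericho's API (Jericho's Segment class has its own class-based lookups, such as getAllElementsByClass, so check the Javadoc of the version you use). The class names come from the question, while ClassFilteredHref, findHrefs and hasClass are hypothetical names chosen for illustration. Class names are matched as whitespace-separated tokens, so the div is found by hasOverlay alone, which is less brittle than comparing the whole class="itm hasOverlay lastrow" string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ClassFilteredHref {

    // Hypothetical helper: true if a whitespace-separated class attribute value
    // contains the given class token (e.g. "hasOverlay" in "itm hasOverlay lastrow").
    public static boolean hasClass(Object classAttr, String token) {
        if (classAttr == null) {
            return false;
        }
        return Arrays.asList(classAttr.toString().trim().split("\\s+")).contains(token);
    }

    // Hypothetical helper: collects the href of every <a> that has linkClass and
    // sits inside a <div> that has divClass, at any nesting depth.
    public static List<String> findHrefs(String html, String divClass, String linkClass) {
        List<String> hrefs = new ArrayList<>();
        // One entry per currently open <div>: true if that div carries divClass.
        Deque<Boolean> openDivs = new ArrayDeque<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                Object classAttr = attrs.getAttribute(HTML.Attribute.CLASS);
                if (tag == HTML.Tag.DIV) {
                    openDivs.push(hasClass(classAttr, divClass));
                } else if (tag == HTML.Tag.A
                        && openDivs.contains(Boolean.TRUE)
                        && hasClass(classAttr, linkClass)) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        hrefs.add(href.toString());
                    }
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                // Leaving a <div>: drop its entry so later links are not matched by it.
                if (tag == HTML.Tag.DIV && !openDivs.isEmpty()) {
                    openDivs.pop();
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected for an in-memory StringReader, but parse() declares it.
            throw new UncheckedIOException(e);
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Trimmed copy of the snippet from the question (id, title and rel omitted).
        String html = "<div class=\"itm hasOverlay lastrow\">"
                + "<a class=\"itm-link itm-drk trackingOnClick\""
                + " href=\"league-sepatu-casual-geof-sl-lo-hitambiru-68166.html\"></a>"
                + "<div class=\"itm-overlay itm-group-mainbox-with-group\"></div>"
                + "</div>";
        System.out.println(findHrefs(html, "hasOverlay", "itm-link"));
    }
}
```

Keeping the open divs on a stack is what enforces the nesting requirement: a link only counts while some enclosing div with the wanted class is still open, so a link with class itm-link elsewhere in the page is ignored.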



Source: https://stackoverflow.com/questions/21904177/how-to-get-text-from-a-href-in-nested-html-elements-using-jericho
