I am using the Jsoup library to read a URL. The page at this URL has text within a few <script> tags. Is it possible for me to obtain the text within each script tag?
Yes. You can use Element#getElementsByTag() to get all the script tags. The contents of each script tag are represented by DataNode objects.
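import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://stackoverflow.com/questions/16780517/java-obtain-text-within-script-tag-using-jsoup").timeout(10000).get();
Elements scriptElements = doc.getElementsByTag("script");
for (Element element : scriptElements) {
    // Script contents are stored as DataNodes, not as regular text nodes
    for (DataNode node : element.dataNodes()) {
        System.out.println(node.getWholeData());
    }
    System.out.println("-------------------");
}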
// html is a String that holds the page source
Document doc = Jsoup.parse(html);
Elements scripts = doc.getElementsByTag("script");
for (Element script : scripts) {
    System.out.println(script.data());
}
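To try this without a network call, here is a minimal sketch that parses an inline HTML string and collects the contents of every script tag. The class name ScriptDataDemo, the helper scriptContents, and the sample HTML are made up for illustration, and the sketch assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScriptDataDemo {

    // Hypothetical helper: returns the raw contents of each <script> element, in document order.
    static List<String> scriptContents(String html) {
        Document doc = Jsoup.parse(html);
        List<String> contents = new ArrayList<>();
        for (Element script : doc.getElementsByTag("script")) {
            // data() concatenates the element's DataNodes
            contents.add(script.data());
        }
        return contents;
    }

    public static void main(String[] args) {
        // Sample page invented for this example
        String html = "<html><head><script>var a = 1;</script></head>"
                + "<body><p>Hello</p><script>var b = 2;</script></body></html>";
        for (String content : scriptContents(html)) {
            System.out.println(content);
            System.out.println("-------------------");
        }
    }
}
```

Note that Element#text() is not a substitute here: jsoup does not treat the contents of data nodes such as script tags as text, so data() or html() is the way to read them.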
Alternatively, you could use the Element#html() method, which returns the inner HTML of an element.
Since 1.11.1: Use the efficient Element#selectFirst() method to find the script element.
Document doc = Jsoup.connect("http://www.example.com").timeout(10000).get();
Element scriptElement = doc.selectFirst("script");
// Don't forget to check that scriptElement is not null...
String jsCode = scriptElement.html();
Up to Jsoup 1.10.3: Combine Element#select() and Elements#first() calls to find the script element.
Document doc = Jsoup.connect("http://www.example.com").timeout(10000).get();
Element scriptElement = doc.select("script").first();
// Don't forget to check that scriptElement is not null...
String jsCode = scriptElement.html();
For your case, the solution would look like this:
Document doc = Jsoup.connect("http://www.example.com").timeout(10000).get();
Elements scripts = doc.select("script");
String scriptData = null; // declared outside the loop so it can be used afterwards
for (Element script : scripts) {
    String type = script.attr("type");
    if ("text/javascript".equals(type)) {
        scriptData = script.data(); // your text from the script
        break;
    }
}
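As a possible shortcut for the loop above, the type check can be moved into the selector itself. The following is only a sketch: the class name FirstJavaScriptBlock, the helper firstJavaScript, and the sample HTML are hypothetical, and it assumes jsoup 1.11.1 or later (for selectFirst()) on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstJavaScriptBlock {

    // Hypothetical helper: returns the contents of the first
    // <script type="text/javascript"> element, or null if there is none.
    static String firstJavaScript(String html) {
        Document doc = Jsoup.parse(html);
        // The attribute selector keeps only script tags with the wanted type
        Element script = doc.selectFirst("script[type=text/javascript]");
        return script == null ? null : script.data();
    }

    public static void main(String[] args) {
        // Sample markup invented for this example
        String html = "<script type=\"application/json\">{\"k\": 1}</script>"
                + "<script type=\"text/javascript\">var x = 42;</script>";
        System.out.println(firstJavaScript(html));
    }
}
```

Like the loop, this only matches script tags that declare the type attribute explicitly; a script tag without a type attribute is skipped.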