Using XmlSlurper: How to select sub-elements while iterating over a GPathResult

╄→尐↘猪︶ㄣ 提交于 2019-12-05 03:56:05

Replace grep with find:

html.'**'.find { it.@class == 'divclass' }.ol.li.each { linkItem ->
    def link = linkItem.h3.a.@href
    def address = linkItem.address.text()
    println "$link: $address\n"
}

then you'll get

#href1: Here is the addressTelephone number: telephone

#href2: Here is another addressAnother telephone: 0845 1111111

grep returns an ArrayList but find returns a NodeChild class:

println html.'**'.grep { it.@class == 'divclass' }.getClass()
println html.'**'.find { it.@class == 'divclass' }.getClass()

results in:

class java.util.ArrayList
class groovy.util.slurpersupport.NodeChild

thus if you wanted to use grep you could then nest another each like this for it to work

html.'**'.grep { it.@class == 'divclass' }.ol.li.each {
    it.each { linkItem ->
        def link = linkItem.h3.a.@href
        def address = linkItem.address.text()
        println "$link: $address\n"
    }
}

Long story short, in your case, use find rather than grep.

This was is a tricky one. When there is just one element with class='divclass' the previous answer sure is fine. If there were multiple results from grep, then a find() for a single result is not the answer. Pointing out that the result is an ArrayList is correct. Inserting an outer nested .each() loop provides a GPathResult in the closure parameter div. From here the drill down can continue with the expected result.

html."**".grep { it.@class == 'divclass' }.each { div -> div.ol.li.each { linkItem ->
   def link = linkItem.h3.a.@href
   def address = linkItem.address.text()
   println "$link: $address\n"
}}

The behavior of the original code can use a bit more of an explanation as well. When a property is accessed on a List in Groovy, you'll get a new list (same size) with the property of each element in the list. The list found by grep() has just one entry. Then we get one entry for property ol, which is fine. Next we get the result of ol.it for that entry. It is a list of size() == 1 again, but this time with an entry of size() == 2. We could apply the outer loop there and get the same result, if we wanted to:

html."**".grep { it.@class == 'divclass' }.ol.li.each { it.each { linkItem ->
   def link = linkItem.h3.a.@href
   def address = linkItem.address
   println "$link: $address\n"
}}

On any GPathResult representing multiple nodes, we get the concatenation of all text. That is the original result, first for @href, then for address.

I believe the previous answers are all correct at the time of writing, for the version used. But I am using HTTPBuilder 0.7.1 and Grails 2.4.4 with Groovy 2.3.7 and there is a big issue - HTML elements are transformed to uppercase. It appears this is due to NekoHTML used under the hood:

http://nekohtml.sourceforge.net/faq.html#uppercase

Because of this, the solution in the accepted answer must be written as:

html.'**'.find { it.@class == 'divclass' }.OL.LI.each { linkItem ->
    def link = linkItem.H3.A.@href
    def address = linkItem.ADDRESS.text()
    println "$link: $address\n"
}

This was very frustrating to debug, hope it helps someone.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!