How to select the nth element of a particular type in Enlive?

一生所求 2021-02-10 20:23

I am trying to scrape some data from a page with a table-based layout. To get some of the data, I need to select something like the 3rd table inside the 2nd table inside the 5th table insid

1 answer
  • 2021-02-10 21:11

    For nth-of-type, does the following example help?

    user> (require '[net.cgrand.enlive-html :as html])
    user> (def test-html
               "<html><head></head><body><p>first</p><p>second</p><p>third</p></body></html>")
    #'user/test-html
    user> (html/select (html/html-resource (java.io.StringReader. test-html))
                       [[:p (html/nth-of-type 2)]])
    ({:tag :p, :attrs nil, :content ["second"]})
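
    One thing that may matter for nested layouts: nth-of-type counts position among siblings with the same tag, not among all matches in the document. The sketch below is only an illustration; the names nested-html and nodes, and the markup itself, are made up for this example. It shows the same step matching once per parent, and then being narrowed by putting another step in front of it.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical markup: two <div>s, each holding two <p>s.
(def nested-html
  "<html><head></head><body><div><p>a1</p><p>a2</p></div><div><p>b1</p><p>b2</p></div></body></html>")

(def nodes (html/html-resource (java.io.StringReader. nested-html)))

;; On its own, the step matches the second <p> under every parent.
(map html/text (html/select nodes [[:p (html/nth-of-type 2)]]))
;; => ("a2" "b2")

;; Prefixing a step for the parent restricts the match to one branch.
(map html/text
     (html/select nodes [[:div (html/nth-of-type 2)] :> [:p (html/nth-of-type 2)]]))
;; => ("b2")
```

    So for a "3rd X inside 2nd Y" kind of path, each level probably needs its own [tag (html/nth-of-type n)] step.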
    

    No idea about the second issue. Your approach seems to work with a naive test:

    user> (def test-html "<html><head></head><body><div><p>in div</p></div><p>not in div</p></body></html>")
    #'user/test-html
    user> (html/select (html/html-resource (java.io.StringReader. test-html)) [:body :> :p])
    ({:tag :p, :attrs nil, :content ["not in div"]})
    

    Any chance of looking at your actual HTML?

    Update: (in response to the comment)

    Here's another example where "the second <p> inside the <div> inside the second <div> inside whatever" is returned:

    user> (def test-html "<html><head></head><body><div><p>this is not the one</p><p>nor this</p><div><p>or for that matter this</p><p>skip this one too</p></div></div><span><p>definitely not this one</p></span><div><p>not this one</p><p>not this one either</p><div><p>not this one, but almost</p><p>this one</p></div></div><p>certainly not this one</p></body></html>")
    #'user/test-html
    user> (html/select (html/html-resource (java.io.StringReader. test-html))
                       [[:div (html/nth-of-type 2)] :> :div :> [:p (html/nth-of-type 2)]])
    ({:tag :p, :attrs nil, :content ["this one"]})
    