R: How to Check if XPath Exists

旧巷少年郎 2021-01-24 17:17

Hoping someone more knowledgeable than me can throw some light here.

As part of a larger web scraper, I want to pull metadata out of a set of pages. When I ran this it

2 Answers
  • 2021-01-24 17:39

    Assuming the error occurs when you try to process the empty list:

    > parsed <- htmlParse("http://www.coindesk.com/information")
    > meta <- xpathApply(parsed, "//meta[starts-with(@property, \"og:description\")]", xmlGetAttr,"content")
    > meta
    list()
    > length(meta)==0
    [1] TRUE
    

    Then test for length(meta)==0, which is TRUE if the element is missing. Otherwise it's FALSE, as in this example of extracting the title property:

    > meta <- xpathApply(parsed, "//meta[starts-with(@property, \"og:title\")]", xmlGetAttr,"content")
    > meta
    [[1]]
    [1] "Beginner's guide to bitcoin - CoinDesk's Information Center"
    
    > length(meta)==0
    [1] FALSE
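    A minimal sketch, assuming you want to reuse that length check for several tags: it wraps the query in a function that returns NA when nothing matches. The helper name metaContent and the small inline HTML page (used instead of the live URL so the example runs offline) are hypothetical, not part of the original answer.

```r
library(XML)

# Hypothetical helper: return the "content" attribute of the first <meta>
# whose property attribute starts with `prop`, or NA when no node matches.
metaContent <- function(doc, prop) {
  xp  <- sprintf("//meta[starts-with(@property, '%s')]", prop)
  res <- xpathSApply(doc, xp, xmlGetAttr, "content")
  # A zero-length result means the XPath matched nothing
  if (length(res) == 0) NA_character_ else as.character(res[[1]])
}

# Hypothetical inline page standing in for the scraped URL
page <- htmlParse('<html><head>
  <meta property="og:title" content="Example title">
</head><body></body></html>', asText = TRUE)

metaContent(page, "og:title")        # returns "Example title"
metaContent(page, "og:description")  # returns NA
```

    Returning NA rather than an empty list keeps the result a plain character value, so it can be dropped straight into a data frame row.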
    
  • 2021-01-24 17:42

    The answer to this has been hard to nail down. Whilst there are a couple of custom implementations of xpathApply knocking around that handle NULL results, the solution to the question as posed did lie in Spacedman's suggestion.

    The condition of the if statement runs the XPath query and checks whether the returned result has length 0. If it does, a custom placeholder ("Title NA" or "Description NA") is stored in the list; if the length isn't 0 (i.e. there is a match), the result of the XPath query is stored instead.

    Simples.

        require(XML)
        require(RCurl)
        parsed <- htmlParse("http://www.coindesk.com/information")

        meta    <- list()
        # "} else {" must stay on one line: at the top level R treats the if
        # as complete after the closing brace and fails on a lone "else"
        meta[1] <- if (length(xpathSApply(parsed, "//meta[starts-with(@property, \"og:title\")]", xmlGetAttr, "content")) == 0) {
                     "Title NA"
                   } else {
                     xpathSApply(parsed, "//meta[starts-with(@property, \"og:title\")]", xmlGetAttr, "content")
                   }
        meta[2] <- if (length(xpathApply(parsed, "//meta[starts-with(@property, \"og:description\")]", xmlGetAttr, "content")) == 0) {
                     "Description NA"
                   } else {
                     xpathApply(parsed, "//meta[starts-with(@property, \"og:description\")]", xmlGetAttr, "content")
                   }
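    A sketch of the same if/else fallback pattern generalized to a list of tags, assuming each XPath should be evaluated only once (the version above runs each query twice when it matches). The props lookup table and the inline HTML page are hypothetical stand-ins for the real page and the tags you care about.

```r
library(XML)

# Hypothetical inline page: only og:title is present
page <- htmlParse('<html><head>
  <meta property="og:title" content="Example title">
</head><body></body></html>', asText = TRUE)

# Hypothetical lookup table: property prefix -> placeholder used when missing
props <- c("og:title" = "Title NA", "og:description" = "Description NA")

meta <- lapply(names(props), function(p) {
  xp  <- sprintf("//meta[starts-with(@property, '%s')]", p)
  # Run the query once and reuse the result for both the test and the value
  res <- xpathSApply(page, xp, xmlGetAttr, "content")
  if (length(res) == 0) props[[p]] else as.character(res[[1]])
})
names(meta) <- names(props)

str(meta)
```

    Keeping the placeholders in a named vector means adding another tag is a one-line change instead of another copy of the if/else block.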
    