How to return NA when nothing is found in an xpath?

孤独总比滥情好 2021-01-28 22:33

It is difficult to formulate the question, but with an example, it is simple to understand.

I use R to parse HTML.

In the following, I have a html code call

2 Answers
  • 2021-01-28 23:01

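    It's easier to select the enclosing tag (the div here) for each record and then look for each child tag inside it. When applied to a set of nodes, html_node() returns one result per input node, with a missing node where nothing matches, and html_text() turns a missing node into NA. With rvest and purrr, which I find simpler: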

    library(rvest)
    library(purrr)
    
    html %>% read_html() %>% 
        html_nodes('.line') %>% 
        map_df(~list(number = .x %>% html_node('.number') %>% html_text(), 
                     surface = .x %>% html_node('.surface') %>% html_text()))
    
    #> # A tibble: 2 × 2
    #>     number   surface
    #>      <chr>     <chr>
    #> 1 Number 1 Surface 1
    #> 2     <NA> Surface 2
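
    Since the question asks about XPath specifically, here is a minimal sketch of the same per-record idea using xml2 directly (the package rvest is built on). It assumes the two-record sample HTML given under "Data:" in the other answer, written here on one line; the names `lines` and `result` are only illustrative.

    ```r
    library(xml2)

    # Sample data (assumed): same structure as the "Data:" block in this thread
    html <- paste0(
      '<div class="line">',
      '<span class="number">Number 1</span>',
      '<span class="surface">Surface 1</span>',
      '</div>',
      '<div class="line">',
      '<span class="surface">Surface 2</span>',
      '</div>'
    )

    doc <- read_html(html)

    # One node per record
    lines <- xml_find_all(doc, "//div[@class='line']")

    # xml_find_first() returns exactly one result per input node
    # (a missing node when the XPath matches nothing), and
    # xml_text() converts a missing node to NA
    result <- data.frame(
      number  = xml_text(xml_find_first(lines, "./span[@class='number']")),
      surface = xml_text(xml_find_first(lines, "./span[@class='surface']")),
      stringsAsFactors = FALSE
    )

    print(result)
    ```

    The relative path (starting with `./`) matters: an expression starting with `//` would search the whole document rather than the current div.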
    
  • 2021-01-28 23:10
    library( 'XML' )  # load library
    doc = htmlParse( html )  # parse html
    # define xpath expression. div contains class = line, within which span has classes number and surface
    xpexpr <- '//div[ @class = "line" ]'  
    
    a1 <- lapply( getNodeSet( doc, xpexpr ), function( x ) { # loop through nodeset
          y <- xmlSApply( x, xmlValue, trim = TRUE )  # get xmlvalue
          names(y) <- xmlApply( x, xmlAttrs ) # get xmlattributes and assign it as names to y
          y   # return y
        } )
    

    Loop through a1 and extract the number and surface values by name (indexing by a name that is absent yields NA), set the names accordingly, then column-bind the results:

    nm <- c( 'number', 'surface' )
    do.call( 'cbind', lapply( a1, function( x ) setNames( x[ nm ], nm ) ) )
    #                [,1]        [,2]       
    # number  "Number 1"  NA         
    # surface "Surface 1" "Surface 2"
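
    An alternative sketch with the same XML package, querying each div with a relative XPath and falling back to NA explicitly when nothing is found. The helper `get_or_na` is a hypothetical name introduced here, not part of the package, and the sample HTML is assumed to match the data below.

    ```r
    library(XML)

    # Sample data (assumed): same structure as the "Data:" block in this answer
    html <- paste0(
      '<div class="line">',
      '<span class="number">Number 1</span>',
      '<span class="surface">Surface 1</span>',
      '</div>',
      '<div class="line">',
      '<span class="surface">Surface 2</span>',
      '</div>'
    )

    doc <- htmlParse(html, asText = TRUE)

    # Hypothetical helper: first match of a relative XPath, or NA if none
    get_or_na <- function(node, xp) {
      v <- xpathSApply(node, xp, xmlValue)
      if (length(v) == 0) NA_character_ else v[[1]]
    }

    lines <- getNodeSet(doc, '//div[@class="line"]')

    # One value per div, so both vectors stay aligned
    number  <- vapply(lines, get_or_na, character(1), xp = './span[@class="number"]')
    surface <- vapply(lines, get_or_na, character(1), xp = './span[@class="surface"]')

    res <- data.frame(number = number, surface = surface, stringsAsFactors = FALSE)
    print(res)
    ```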
    

    Data:

    html <- '<div class="line">
    <span class="number">Number 1</span>
    <span class="surface">Surface 1</span>
    </div>
    <div class="line">
    <span class="surface">Surface 2</span>
    </div>' 
    