Parsing XML file with known structure and repeating elements

孤街浪徒 2021-01-29 03:11

I'm trying to parse information from an XML file that contains a lot of elements with repeating names.

Here is an example of the type of file I am trying to parse, conta

2 answers
  •  孤街浪徒
    2021-01-29 04:13

    New answer after the significant edit to the question.

    I stored OP's XML in a file, but duplicated the single record provided so that there are two records to work with. I'm letting myself use %>% now. I get 16 elements per record where OP gets 18, because the XML actually posted contains no evidence of HT_CAPS_IE and HT_IE. Given the way we're doing this now, it's more about computation on lists than about XML, which seems unavoidable. The link between keys and data is based on adjacency rather than on structure.

    library(magrittr)
    library(xml2)
    
    ## ugly workaround: xml2 does not seem to ignore insignificant whitespace?
    x <- "so.xml" %>%
      scan(what = character(), sep = "\n", strip.white = TRUE) %>%
      paste0(collapse = "") %>% 
      read_xml
    
    ## isolate each record
    (records <- x %>%
      xml_children() %>%
      xml_children())
    #> {xml_nodeset (2)}
    #> [1] \n  80211D_IE\n  \n    IE_KEY_80211D_CHA ...
    #> [2] \n  80211D_IE\n  \n    IE_KEY_80211D_CHA ...
    
    ## turn each record into a list
    records_list <- records %>% lapply(as_list)
    str(records_list, max.level = 1)
    #> List of 2
    #>  $ :List of 32
    #>  $ :List of 32
    
    ## IRL here's where I check that ...
    ##  we have key, THINGY, key, THINGY, etc. within each record
    ##  we have THINGY1, THINGY2, etc. across all records
    
    ## store item names from record 1
    keys <- records_list[[1]][c(TRUE, FALSE)] %>% unlist
    
    ## isolate the data, do obvious simplifications, apply item names
    jfun <- function(x) if(is.list(x) && length(x) > 1) x else unlist(x)
    z <- records_list %>%
      lapply(`[`, c(FALSE, TRUE)) %>% 
      lapply(`names<-`, keys) %>% 
      lapply(lapply, jfun)
    
    ## done!
    str(z[[1]], max.level = 1)
    #> List of 16
    #>  $ 80211D_IE    :List of 4
    #>  $ AGE          : chr "0"
    #>  $ AP_MODE      : chr "2"
    #>  $ BEACON_INT   : chr "100"
    #>  $ BSSID        : chr "ac:5d:10:73:c3:11"
    #>  $ CAPABILITIES : chr "1073"
    #>  $ CHANNEL      : chr "2"
    #>  $ CHANNEL_FLAGS: chr "10"
    #>  $ IE           : chr "AAZPbGl2ZXIBCIKEiwwSlhgkAwECBwZVUyABCxswGAEAAA+sAgIAAA+sBAAPrAIBAAAPrAIAAN0aAFDyAQEAAFDyAgIAAFDyBABQ8gIBAABQ8gIqAQAyBDBIYGw="
    #>  $ NOISE        : chr "0"
    #>  $ RATES        :List of 12
    #>  $ RSN_IE       :List of 8
    #>  $ RSSI         : chr "-74"
    #>  $ SSID         : chr "T2xpdmVy"
    #>  $ SSID_STR     : chr "Oliver"
    #>  $ WPA_IE       :List of 8
    
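The sanity checks that the code comments only mention ("key, THINGY, key, THINGY" within each record, and the same keys across all records) could be sketched roughly as below. This is an illustration on toy data, not part of the original answer: `toy_records`, `is_alternating` and `record_keys` are hypothetical names, and the assumption that key elements are literally named `key` is a guess about the file, so adjust it to the real tag names.

```r
## Toy stand-in for records_list (hypothetical data): each record
## alternates key, value, key, value, as as_list() would return it.
toy_records <- list(
  list(key = list("AGE"), string = list("0"),
       key = list("SSID_STR"), string = list("Oliver")),
  list(key = list("AGE"), string = list("3"),
       key = list("SSID_STR"), string = list("Other"))
)

## TRUE if element names go "key", <not key>, "key", <not key>, ...
## (assumes key elements are named "key"; change to match the real XML)
is_alternating <- function(rec) {
  nms <- names(rec)
  is_key_slot <- rep_len(c(TRUE, FALSE), length(nms))
  length(nms) %% 2 == 0 &&
    all(nms[is_key_slot] == "key") &&
    all(nms[!is_key_slot] != "key")
}

## Keys of one record (the odd positions), as a plain character vector
record_keys <- function(rec) unname(unlist(rec[c(TRUE, FALSE)]))

## Check 1: every record alternates key / value
alternating_ok <- all(vapply(toy_records, is_alternating, logical(1)))

## Check 2: every record has the same keys, in the same order, as record 1
all_keys <- lapply(toy_records, record_keys)
same_keys_ok <- all(vapply(all_keys, identical, logical(1), all_keys[[1]]))

## Stop early if either assumption fails
stopifnot(alternating_ok, same_keys_ok)
```

If both checks pass, reusing the keys from the first record as names for every record, as done above, is safe. If they fail, each record's keys would have to be extracted separately.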
