I'm trying to parse information from an XML file that contains a lot of elements with repeating names.
Here is an example of the type of file I am trying to parse, conta
New answer after the significant edit to the question.
I stored OP's XML in a file BUT DUPLICATED THE SINGLE RECORD PROVIDED! I'm letting myself use %>% now. I get 16 elements per record where OP gets 18 because the actual XML posted contains no evidence of HT_CAPS_IE and HT_IE. Given the way we're doing this now, it's more about computation on lists than XML, which seems unavoidable. The link between keys and data is based more on adjacency than on structure.
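To make the adjacency point concrete, here is a minimal base-R sketch on a made-up flat record (the `record` list below is hypothetical toy data, not OP's file). It assumes names sit in the odd positions and their values in the even positions, which is the same assumption the real code relies on.

```r
## Hypothetical flattened record: key, value, key, value, ...
record <- list("SSID_STR", "Oliver", "CHANNEL", "2", "RSSI", "-74")

## c(TRUE, FALSE) is recycled along the list, so it picks positions 1, 3, 5, ...
keys   <- unlist(record[c(TRUE, FALSE)])
## c(FALSE, TRUE) picks positions 2, 4, 6, ...
values <- record[c(FALSE, TRUE)]

## pair each value with the key that preceded it
named <- setNames(values, keys)
str(named)
```

Nothing in the list itself ties "Oliver" to "SSID_STR"; only their positions do, which is why the order of the children must be trusted or checked.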
library(magrittr)
library(xml2)
## ugly workaround: xml2 does not seem to ignore insignificant whitespace?
x <- "so.xml" %>%
  scan(what = character(), sep = "\n", strip.white = TRUE) %>%
  paste0(collapse = "") %>%
  read_xml()
## isolate each record
(records <- x %>%
  xml_children() %>%
  xml_children())
#> {xml_nodeset (2)}
#> [1] \n 80211D_IE \n \n IE_KEY_80211D_CHA ...
#> [2] \n 80211D_IE \n \n IE_KEY_80211D_CHA ...
## turn each record into a list
records_list <- records %>% lapply(as_list)
str(records_list, max.level = 1)
#> List of 2
#> $ :List of 32
#> $ :List of 32
## IRL here's where I check that ...
## we have key, THINGY, key, THINGY, etc. within each record
## we have THINGY1, THINGY2, etc. across all records
## store item names from record 1
keys <- records_list[[1]][c(TRUE, FALSE)] %>% unlist
## isolate the data, do obvious simplifications, apply item names
jfun <- function(x) if(is.list(x) && length(x) > 1) x else unlist(x)
z <- records_list %>%
  lapply(`[`, c(FALSE, TRUE)) %>%
  lapply(`names<-`, keys) %>%
  lapply(lapply, jfun)
## done!
str(z[[1]], max.level = 1)
#> List of 16
#> $ 80211D_IE :List of 4
#> $ AGE : chr "0"
#> $ AP_MODE : chr "2"
#> $ BEACON_INT : chr "100"
#> $ BSSID : chr "ac:5d:10:73:c3:11"
#> $ CAPABILITIES : chr "1073"
#> $ CHANNEL : chr "2"
#> $ CHANNEL_FLAGS: chr "10"
#> $ IE : chr "AAZPbGl2ZXIBCIKEiwwSlhgkAwECBwZVUyABCxswGAEAAA+sAgIAAA+sBAAPrAIBAAAPrAIAAN0aAFDyAQEAAFDyAgIAAFDyBABQ8gIBAABQ8gIqAQAyBDBIYGw="
#> $ NOISE : chr "0"
#> $ RATES :List of 12
#> $ RSN_IE :List of 8
#> $ RSSI : chr "-74"
#> $ SSID : chr "T2xpdmVy"
#> $ SSID_STR : chr "Oliver"
#> $ WPA_IE :List of 8
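The checks hinted at in the "IRL here's where I check that ..." comments could look roughly like the sketch below. It is only one possible way to do it: `check_records` is a hypothetical helper name, and the toy records are made-up stand-ins for `records_list`. It assumes every record should have an even number of children and the same keys, in the same order, as the first record.

```r
## Sketch of a structural check; check_records is a hypothetical helper name.
check_records <- function(records_list) {
  stopifnot(length(records_list) > 0)
  ## each record must have an even number of children (key, value pairs)
  even_len <- vapply(records_list, function(r) length(r) %% 2 == 0, logical(1))
  ## keys sit in the odd positions; flatten them to a character vector
  key_sets <- lapply(records_list,
                     function(r) unlist(r[c(TRUE, FALSE)], use.names = FALSE))
  ## every record must carry the same keys, in the same order, as record 1
  same_keys <- vapply(key_sets, identical, logical(1), key_sets[[1]])
  all(even_len) && all(same_keys)
}

## toy records standing in for records_list (made-up values)
good <- list(list("AGE", "0", "CHANNEL", "2"),
             list("AGE", "5", "CHANNEL", "11"))
bad  <- list(list("AGE", "0", "CHANNEL", "2"),
             list("AGE", "5", "NOISE", "0"))

check_records(good)
check_records(bad)
```

If the check fails, taking the item names from record 1 only (as done with `keys` above) would silently mislabel data in the other records, so it is worth running before applying the names.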