Parse an XML file with R

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-11 08:25:12

Question


I'm starting a project in R and I have to parse an XML file. I'm using the XML package and functions such as xmlToDataFrame and xmlParse. I want to store the information in a data frame in a structured way, but I've run into a problem: I can't get the variables inside a node separated out, each in its own column. With the functions mentioned above, all the values of the variables end up together in a single cell of a single row of the data frame.

The XML I use is as follows:

<?xml version="1.0" encoding="UTF-8"?>
-<rest-response>
    <type>rest-response</type>
    <time-stamp>1392217780000</time-stamp>
    <status>OK</status>
    <msg-version>1.0.0</msg-version>
    <op>inventory</op>
    -<response>
        <inventorySize>3</inventorySize>
        <inventoryMode>SYNCHRONOUS</inventoryMode>
        <time>4952</time>
        -<items>
            -<item>
                <epc>00000000000000000000A195</epc>
                <ts>1392217779060</ts>
                <location-id>adtr</location-id>
                <location-pos>0,0,0</location-pos>
                <device-id>adtr@1</device-id>
                <device-reader>192.168.1.224</device-reader>
                <device-readerPort>1</device-readerPort>
                <device-readerMuxPort>0</device-readerMuxPort>
                <device-readerMuxPort2>0</device-readerMuxPort2>
                <tag-rssi>-49.0</tag-rssi>
                <tag-readcount>36.0</tag-readcount>
                <tag-phase>168.0</tag-phase>
            </item>
            -<item>
                <epc>00000000000000000000A263</epc>
                <ts>1392217779065</ts>
                <location-id>adtr</location-id>
                <location-pos>0,0,0</location-pos>
                <device-id>adtr@1</device-id>
                <device-reader>192.168.1.224</device-reader>
                <device-readerPort>1</device-readerPort>
                <device-readerMuxPort>0</device-readerMuxPort>
                <device-readerMuxPort2>0</device-readerMuxPort2>
                <tag-rssi>-49.0</tag-rssi>
                <tag-readcount>36.0</tag-readcount>
                <tag-phase>0.0</tag-phase>
            </item>
            -<item>
                <epc>B00000000000001101080802</epc>
                <ts>1392217779323</ts>
                <location-id>adtr</location-id>
                <location-pos>0,0,0</location-pos>
                <device-id>adtr@1</device-id>
                <device-reader>192.168.1.224</device-reader>
                <device-readerPort>1</device-readerPort>
                <device-readerMuxPort>0</device-readerMuxPort>
                <device-readerMuxPort2>0</device-readerMuxPort2>
                <tag-rssi>-72.0</tag-rssi>
                <tag-readcount>27.0</tag-readcount>
                <tag-phase>157.0</tag-phase>
            </item>
        </items>
    </response>
</rest-response>

Everything inside an <item> comes back as a single value, and I want to split it into its separate fields.

Another important point: the content of the XML may change, but its structure will always be the same, although there may be more items.

Any ideas?


Answer 1:


So I assume you want the <item> elements under <items> in a data frame. Assuming your XML is in the variable xml.text, this will work:

library(XML)
xml   <- xmlInternalTreeParse(xml.text)  # assumes your xml in variable xml.text
items <- getNodeSet(xml,"//items/item")
df    <- xmlToDataFrame(items)
df
#                        epc            ts location-id location-pos device-id device-reader device-readerPort device-readerMuxPort device-readerMuxPort2 tag-rssi tag-readcount tag-phase
# 1 00000000000000000000A195 1392217779060        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -49.0          36.0     168.0
# 2 00000000000000000000A263 1392217779065        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -49.0          36.0       0.0
# 3 B00000000000001101080802 1392217779323        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -72.0          27.0     157.0
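
The values in that data frame are still text, so fields such as tag-rssi usually need converting before any analysis. Below is a minimal sketch of one way to do that, not part of the original answer. The shortened xml.text is a stand-in for the question's document. Renaming the columns with underscores, the choice of which columns are numeric, and the reading of ts as milliseconds since the Unix epoch are all assumptions made for illustration.

```r
library(XML)

# Shortened stand-in for the question's document (two items, fewer fields).
xml.text <- '<?xml version="1.0" encoding="UTF-8"?>
<rest-response>
  <response>
    <items>
      <item>
        <epc>00000000000000000000A195</epc>
        <ts>1392217779060</ts>
        <tag-rssi>-49.0</tag-rssi>
        <tag-readcount>36.0</tag-readcount>
      </item>
      <item>
        <epc>B00000000000001101080802</epc>
        <ts>1392217779323</ts>
        <tag-rssi>-72.0</tag-rssi>
        <tag-readcount>27.0</tag-readcount>
      </item>
    </items>
  </response>
</rest-response>'

doc   <- xmlParse(xml.text, asText = TRUE)
items <- getNodeSet(doc, "//items/item")

# Read every field as character so nothing is turned into a factor.
df <- xmlToDataFrame(nodes = items, stringsAsFactors = FALSE)

# Replace "-" (or ".") in the column names so that df$tag_rssi works.
names(df) <- gsub("[^A-Za-z0-9]+", "_", names(df))

# Assumption: these columns hold numbers; epc stays a character identifier.
num_cols <- c("ts", "tag_rssi", "tag_readcount")
df[num_cols] <- lapply(df[num_cols], as.numeric)

# Assumption: ts is milliseconds since the Unix epoch.
df$time <- as.POSIXct(df$ts / 1000, origin = "1970-01-01", tz = "UTC")

str(df)
```

Because the XPath expression selects every <item>, the same code applies unchanged when the document contains more items.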

I also assumed that you displayed this XML in a browser and copied and pasted it (which would explain the -<tag> markers). Otherwise, your XML is not well-formed.
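
If the text really does contain those leading "-" markers, they can be stripped before parsing. The following is a rough sketch under that assumption, not something from the original answer. The variable raw.text and the regular expression are illustrative, and the pattern only removes a "-" that opens a line and is immediately followed by "<".

```r
library(XML)

# Text as copied from a browser's XML tree view: some lines start with "-".
raw.text <- '<?xml version="1.0" encoding="UTF-8"?>
-<rest-response>
<status>OK</status>
-<response>
-<items>
-<item>
<epc>00000000000000000000A195</epc>
<tag-rssi>-49.0</tag-rssi>
</item>
</items>
</response>
</rest-response>'

# Drop a "-" only when it starts a line (after optional indentation) and is
# directly followed by "<". A value such as -49.0 sits inside an element,
# not at the start of a line, so it is left alone.
xml.text <- gsub("(?m)^[ \\t]*-(?=<)", "", raw.text, perl = TRUE)

doc   <- xmlParse(xml.text, asText = TRUE)
items <- getNodeSet(doc, "//items/item")
df    <- xmlToDataFrame(nodes = items, stringsAsFactors = FALSE)
df
```

Reading the XML directly from its original file or URL, rather than from the browser view, avoids the problem altogether.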



Source: https://stackoverflow.com/questions/21736927/parse-an-xml-with-r
