Rails nokogiri parse XML file

心已入冬 submitted on 2019-12-25 03:12:36

Question


I'm a little confused: I couldn't find good examples on the web of parsing XML with Nokogiri...

Example of my data:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <rows SessionGUID="6448680D1">
        <row>
            <AnalogueCode>0451103079</AnalogueCode>
            <AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs>
            <AnalogueManufacturerName>BOSCH</AnalogueManufacturerName>
            <AnalogueWeight>0.000</AnalogueWeight>
            <CodeAsIs>OC90</CodeAsIs>
            <DeliveryVariantPriceAKiloForClientDescription />
            <DeliveryVariantPriceAKiloForClientPrice>0.00</DeliveryVariantPriceAKiloForClientPrice>
            <DeliveryVariantPriceNote />
            <PriceListItemDescription />
            <PriceListItemNote />
            <IsAvailability>1</IsAvailability>
            <IsCross>1</IsCross>
            <LotBase>1</LotBase>
            <LotType>1</LotType>
            <ManufacturerName>KNECHT/MAHLE</ManufacturerName>
            <OfferName>MSC-STC-58</OfferName>
            <PeriodMin>2</PeriodMin>
            <PeriodMax>4</PeriodMax>
            <PriceListDiscountCode>31087</PriceListDiscountCode>
            <ProductName>Фильтр масляный</ProductName>
            <Quantity>41</Quantity>
            <SupplierID>30</SupplierID>
            <GroupTitle>Замена</GroupTitle>
            <Price>203.35</Price>
        </row>
        <row>
            <AnalogueCode>0451103079</AnalogueCode>
            <AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs>
            <AnalogueManufacturerName>BOSCH</AnalogueManufacturerName>
            <AnalogueWeight>0.000</AnalogueWeight>
            <CodeAsIs>OC90</CodeAsIs>
            <DeliveryVariantPriceAKiloForClientDescription />
            <DeliveryVariantPriceAKiloForClientPrice>0.00</DeliveryVariantPriceAKiloForClientPrice>
            <DeliveryVariantPriceNote />
            <PriceListItemDescription />
            <PriceListItemNote>[0451103079] Bosch,MTGC@0451103079</PriceListItemNote>
            <IsAvailability>1</IsAvailability>
            <IsCross>1</IsCross>
            <LotBase>1</LotBase>
            <LotType>0</LotType>
            <ManufacturerName>KNECHT/MAHLE</ManufacturerName>
            <OfferName>MSC-STC-1303</OfferName>
            <PeriodMin>3</PeriodMin>
            <PeriodMax>5</PeriodMax>
            <PriceListDiscountCode>102134</PriceListDiscountCode>
            <ProductName>Фильтр масляный</ProductName>
            <Quantity>5</Quantity>
            <SupplierID>666</SupplierID>
            <GroupTitle>Замена</GroupTitle>
            <Price>172.99</Price>
        </row>
    </rows>
</root>

And my Ruby code:

...
xml_doc  = Nokogiri::XML(response.body)
parts = xml_doc.xpath('/root/rows/row')

Can I do this with the help of XPath? Also, how to get this parts object (row)?


Answer 1:


You're on the right track. parts = xml_doc.xpath('/root/rows/row') gives you back a NodeSet, i.e. a list of the <row> elements.

You can loop through these using each, or use indexes like parts[0] and parts[1] to access specific rows. You can then get the values of child nodes by calling xpath on the individual rows.

For example, you could build a list of the AnalogueCode values for all the parts with:

codes = []
parts.each do |row|
  codes << row.xpath('AnalogueCode').text
end

Looking at the full example of the XML you're processing, there are two issues preventing your XPath from matching:

  1. The <root> tag isn't actually the root element of the XML, so /root/rows/row doesn't match.

  2. The XML uses namespaces, so you need to include them in your XPaths.

So there are a few possible solutions:

  1. Use CSS selectors rather than XPath (i.e. pass a CSS selector to search), as suggested by the Tin Man.

  2. After xml_doc = Nokogiri::XML(response.body), call xml_doc.remove_namespaces!, then use parts = xml_doc.xpath('//root/rows/row'), where the leading double slash is XPath syntax for finding the <root> element anywhere in the document.

  3. Specify the namespaces:

For example:

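xml_doc  = Nokogiri::XML(response.body)
# collect_namespaces maps the default namespace to the "xmlns" prefix
ns = xml_doc.collect_namespaces
parts = xml_doc.xpath('//xmlns:rows/xmlns:row', ns)

codes = []
parts.each do |row|
  # query relative to the current row, not the whole document
  codes << row.xpath('xmlns:AnalogueCode', ns).text
end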

I would go with 1. or 2. :-)




Answer 2:


First, Nokogiri supports both XPath AND CSS. I recommend using CSS because it's easier to read:

doc.search('row')

will return a NodeSet of every <row> in the document.

The equivalent XPath is:

doc.search('//row')

...how to get this parts object (row)?

I'm not sure what that means, but if you want to access individual elements inside a <row>, it's easily done several ways.

If you only want one particular child node from each of the row nodes:

doc.search('row Price').map(&:to_xml)
# => ["<Price>203.35</Price>", "<Price>172.99</Price>"]

doc.search('//row/Price').map(&:to_xml)
# => ["<Price>203.35</Price>", "<Price>172.99</Price>"]

If you only want the first such occurrence, use at, which is the equivalent of search(...).first:

doc.at('row Price').to_xml
# => "<Price>203.35</Price>"

Typically we want to iterate over a number of blocks and return an array of hashes of the data found:

row_hash = doc.search('row').map{ |row|
  {
    AnalogueCode: row.at('AnalogueCode').text,
    Price: row.at('Price').text,
  }
}
row_hash 
# => [{:AnalogueCode=>"0451103079", :Price=>"203.35"},
#     {:AnalogueCode=>"0451103079", :Price=>"172.99"}]

These are ALL covered in Nokogiri's tutorials and have been answered many times here on Stack Overflow, so take the time to read and search.



Source: https://stackoverflow.com/questions/28886363/rails-nokogiri-parse-xml-file
