Converting XML with namespaces to CSV using PowerShell

Posted by 荒凉一梦 on 2020-01-13 19:54:27

Question


I have this XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:BOX xmlns="urn:loc.gov:item" 
         xmlns:ns2="urn:loc.gov:box" 
         xmlns:ns3="http://www.example.com/inverter" 
         xmlns:ns4="urn:loc.gov:xyz">
    <ns3:Item>
        <Description>ITEM1</Description>
        <PackSizeNumeric>6</PackSizeNumeric>
        <ns2:BuyersItemIdentification>
            <ID>75847589</ID>
        </ns2:BuyersItemIdentification>
        <ns2:CommodityClassification>
            <CommodityCode>856952</CommodityCode>
        </ns2:CommodityClassification>
        <ns2:AdditionalItemProperty>
            <Name>Weight</Name>
            <Value>0</Value>
        </ns2:AdditionalItemProperty>
        <ns2:AdditionalItemProperty>
            <Name>Tare</Name>
            <Value>0</Value>
        </ns2:AdditionalItemProperty>
        <ns2:ManufacturerParty>
            <ns2:PartyIdentification>
                <ID>847532</ID>
            </ns2:PartyIdentification>
        </ns2:ManufacturerParty>
    </ns3:Item>
    <ns3:Item>
        <Description>ITEM2</Description>
        <PackSizeNumeric>10</PackSizeNumeric>
        <ns2:BuyersItemIdentification>
            <ID>9568475</ID>
        </ns2:BuyersItemIdentification>
        <ns2:CommodityClassification>
            <CommodityCode>348454</CommodityCode>
        </ns2:CommodityClassification>
        <ns2:AdditionalItemProperty>
            <Name>Weight</Name>
            <Value>0</Value>
        </ns2:AdditionalItemProperty>
        <ns2:AdditionalItemProperty>
            <Name>Tare</Name>
            <Value>0</Value>
        </ns2:AdditionalItemProperty>
        <ns2:ManufacturerParty>
            <ns2:PartyIdentification>
                <ID>7542125</ID>
            </ns2:PartyIdentification>
        </ns2:ManufacturerParty>
    </ns3:Item>
</ns3:BOX>

I'm trying to convert it to a CSV file.

I get the content:

[xml]$inputFile = Get-Content test.xml

Then I export to CSV:

$inputfile.BOX.childnodes | Export-Csv "Stsadm-EnumSites.csv" -NoTypeInformation -Delimiter:";" -Encoding:UTF8

I get the Description and PackSizeNumeric fields, but not the other fields, which are in other namespaces:

"Description";"PackSizeNumeric";"BuyersItemIdentification";"CommodityClassification";"AdditionalItemProperty";"ManufacturerParty"
"ITEM1";"6";"System.Xml.XmlElement";"System.Xml.XmlElement";"System.Object[]";"System.Xml.XmlElement"
"ITEM2";"10";"System.Xml.XmlElement";"System.Xml.XmlElement";"System.Object[]";"System.Xml.XmlElement"

Which is the best way to obtain the fields that are contained in other namespaces?

I would like to get this

"Description";"PackSizeNumeric";"BuyersItemIdentification";"CommodityClassification";"Weight";"Tare";PartyIdentification
"ITEM1";"6";"75847589";"856952";"0";"0";"847532"
"ITEM2";"10";"9568475";"348454";"0";"0";"7542125"

Answer 1:


A combination of Select-Object and Select-Xml seems to work pretty well:

$ns = @{
    item="urn:loc.gov:item"
    ns2="urn:loc.gov:box"
    ns3="http://www.example.com/inverter"
    ns4="urn:loc.gov:xyz"
}

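$doc = New-Object xml
# XmlDocument.Load resolves relative paths against the process working directory,
# which may differ from the current PowerShell location, so pass a full path.
$doc.Load((Resolve-Path "test.xml").Path)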

$doc.BOX.ChildNodes | Select-Object -Property Description, PackSizeNumeric,
    @{Name="BuyersItemIdentification_ID"; Expression={$_.BuyersItemIdentification.ID}},
    @{Name="CommodityClassification_CommodityCode"; Expression={$_.CommodityClassification.CommodityCode}},
    @{Name="Weight"; Expression={Select-Xml -Namespace $ns -Xml $_ -XPath "./ns2:AdditionalItemProperty[item:Name = 'Weight']/item:Value"}},
    @{Name="Tare"; Expression={Select-Xml -Namespace $ns -Xml $_ -XPath "./ns2:AdditionalItemProperty[item:Name = 'Tare']/item:Value"}},
    @{Name="ManufacturerParty_ID"; Expression={$_.ManufacturerParty.PartyIdentification.ID}} |
    Export-Csv "Stsadm-EnumSites.csv" -NoTypeInformation -Delimiter ";" -Encoding UTF8

Result (Stsadm-EnumSites.csv):

"Description";"PackSizeNumeric";"BuyersItemIdentification_ID";"CommodityClassification_CommodityCode";"Weight";"Tare";"ManufacturerParty_ID"
"ITEM1";"6";"75847589";"856952";"0";"0";"847532"
"ITEM2";"10";"9568475";"348454";"0";"0";"7542125"

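The Weight and Tare columns above rely on the objects returned by Select-Xml being converted to strings when the CSV is written. A minimal sketch that reads the node text explicitly might look like the following. The helper name Get-AdditionalProperty and the reduced one-item sample are assumptions made for illustration; they are not part of the answer.

```powershell
# One-item sample using the same namespaces as the question's document.
$doc = [xml]@"
<ns3:BOX xmlns="urn:loc.gov:item" xmlns:ns2="urn:loc.gov:box" xmlns:ns3="http://www.example.com/inverter">
  <ns3:Item>
    <Description>ITEM1</Description>
    <ns2:AdditionalItemProperty><Name>Weight</Name><Value>0</Value></ns2:AdditionalItemProperty>
    <ns2:AdditionalItemProperty><Name>Tare</Name><Value>0</Value></ns2:AdditionalItemProperty>
  </ns3:Item>
</ns3:BOX>
"@

# "item" is an arbitrary prefix chosen for the document's default namespace.
$ns = @{ item = "urn:loc.gov:item"; ns2 = "urn:loc.gov:box" }

# Hypothetical helper: returns the <Value> text of the named property as a plain string.
function Get-AdditionalProperty {
    param([System.Xml.XmlNode]$Item, [string]$Name)
    $xpath = "./ns2:AdditionalItemProperty[item:Name = '$Name']/item:Value"
    $hit = Select-Xml -Xml $Item -XPath $xpath -Namespace $ns | Select-Object -First 1
    if ($hit) { $hit.Node.InnerText }
}

$rows = $doc.BOX.ChildNodes | Select-Object -Property Description,
    @{Name = "Weight"; Expression = { Get-AdditionalProperty -Item $_ -Name "Weight" }},
    @{Name = "Tare";   Expression = { Get-AdditionalProperty -Item $_ -Name "Tare" }}

$rows | ConvertTo-Csv -NoTypeInformation -Delimiter ";"
```

Returning strings from the calculated properties keeps the rows usable for sorting or filtering before export, not only for writing the CSV.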


Answer 2:


Tomalak's answer is succinct and seems to be the best solution for the problem at hand.

I tried to make something generic, but the result is not in the requested format: the list of additional properties is hard to convert generically, and the field names are clunky. Still, the solution below walks down the XML tree and flattens the data. It is not tied to specific element names, except in the initial XPath selection.

After finishing this generic answer, I now wonder whether it would be better to write and apply an XSLT transformation.

#[xml]$xml = Get-Content test.xml
#xml to process
$xml = [xml]@"
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns3:BOX xmlns="urn:loc.gov:item" 
         xmlns:ns2="urn:loc.gov:box" 
         xmlns:ns3="http://www.example.com/inverter" 
         xmlns:ns4="urn:loc.gov:xyz">
    <ns3:Item>
        <Description>ITEM1</Description>
        <PackSizeNumeric>6</PackSizeNumeric>
        <ns2:BuyersItemIdentification>
            <ID>75847589</ID>
        </ns2:BuyersItemIdentification>
        <ns2:CommodityClassification>
            <CommodityCode>856952</CommodityCode>
        </ns2:CommodityClassification>
        <ns2:AdditionalItemProperty>
            <Name>Weight</Name>
            <Value>0</Value>
        </ns2:AdditionalItemProperty>
        <ns2:AdditionalItemProperty>
            <Name>Tare</Name>
            <Value>0</Value>
        </ns2:AdditionalItemProperty>
        <ns2:ManufacturerParty>
            <ns2:PartyIdentification>
                <ID>847532</ID>
            </ns2:PartyIdentification>
        </ns2:ManufacturerParty>
    </ns3:Item>
    <ns3:Item>
        <Description>ITEM2</Description>
        <PackSizeNumeric>10</PackSizeNumeric>
        <ns2:BuyersItemIdentification>
            <ID>9568475</ID>
        </ns2:BuyersItemIdentification>
        <ns2:CommodityClassification>
            <CommodityCode>348454</CommodityCode>
        </ns2:CommodityClassification>
        <ns2:AdditionalItemProperty>
            <Name>Weight</Name>
            <Value>0</Value>
        </ns2:AdditionalItemProperty>
        <ns2:AdditionalItemProperty>
            <Name>Tare</Name>
            <Value>0</Value>
        </ns2:AdditionalItemProperty>
        <ns2:ManufacturerParty>
            <ns2:PartyIdentification>
                <ID>7542125</ID>
            </ns2:PartyIdentification>
        </ns2:ManufacturerParty>
    </ns3:Item>
</ns3:BOX>
"@

$nsm = [Xml.XmlNamespaceManager]$xml.NameTable

$nsm.AddNamespace("ns1","urn:loc.gov:item")
$nsm.AddNamespace("ns2","urn:loc.gov:box")
$nsm.AddNamespace("ns3","http://www.example.com/inverter")
$nsm.AddNamespace("ns4","urn:loc.gov:xyz")

#function to recursively flatten xml subtree into a hashtable (passed in)
function flatten-xml {
  param (
    $Parent,
    $Element,
    $Fieldname,
    $HashTable
  )

  if ($parent -eq "") {
    $label = $fieldname
  } else {
    $label = $parent + "_" + $fieldname 
  }

  #write-host "$label is $($element.GetType())"

  if ($element.GetType() -eq [System.Xml.XmlElement]) { 
    #get property fields

    $element | Get-Member | ? { $_.MemberType -eq "Property" } | % {
      #write-host "moving from $label to $($_.Name)"
      flatten-xml -Parent $label -Element $element.($_.Name) -FieldName $_.Name -HashTable $HashTable
    }
  }elseif($element.GetType() -eq [System.Object[]]) { 
    #write-host "$label is an array"
    $i = 0
    $element | % { flatten-xml -Parent $label -Element $_ -FieldName "item$i" -HashTable $HashTable; $i++ }
  }else {
    $HashTable[$label] = $element
  }
 }

#convert the nodecollection returned by xpath query into hashtables and write them out to CSV
$xml.SelectNodes("//ns3:BOX/ns3:Item",$nsm) | % { 
    $element = $_
    $ht = [ordered]@{}   # ordered, so the CSV columns keep the order in which fields are discovered
    $element | Get-Member | ? { $_.MemberType -eq "Property" } | % {
      flatten-xml -Parent "" -Element $element.($_.Name) -FieldName $_.Name -HashTable $ht 
    }

    [PSCustomObject]$ht
}  | Export-Csv "test2.csv" -NoTypeInformation -Delimiter:";" -Encoding:UTF8

Result:

> gc .\test2.csv

"AdditionalItemProperty_item0_Name";"AdditionalItemProperty_item0_Value";"AdditionalItemProperty_item1_Name";"AdditionalItemProperty_item1_Value";"BuyersItemIdentification_ID";"CommodityClassification_CommodityCode";"Description";"ManufacturerParty_PartyIdentification_ID";"PackSizeNumeric"
"Weight"                            ;"0"                                  ;"Tare"                              ;"0"                                  ;"75847589"                   ;"856952"                               ;"ITEM1"      ;"847532"                                  ;"6"
"Weight"                            ;"0"                                  ;"Tare"                              ;"0"                                  ;"9568475"                    ;"348454"                               ;"ITEM2"      ;"7542125"                                 ;"10"

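The clunky item0/item1 column names come from treating the AdditionalItemProperty entries as an anonymous list. One possible refinement, sketched below under the assumption that every such entry has exactly one Name and one Value child, is to use each Name as the column header. The reduced sample and the variable names are illustrative only.

```powershell
# Reduced sample with the same namespaces as the question's document.
$xml = [xml]@"
<ns3:BOX xmlns="urn:loc.gov:item" xmlns:ns2="urn:loc.gov:box" xmlns:ns3="http://www.example.com/inverter">
  <ns3:Item>
    <Description>ITEM1</Description>
    <ns2:AdditionalItemProperty><Name>Weight</Name><Value>0</Value></ns2:AdditionalItemProperty>
    <ns2:AdditionalItemProperty><Name>Tare</Name><Value>0</Value></ns2:AdditionalItemProperty>
  </ns3:Item>
  <ns3:Item>
    <Description>ITEM2</Description>
    <ns2:AdditionalItemProperty><Name>Weight</Name><Value>0</Value></ns2:AdditionalItemProperty>
    <ns2:AdditionalItemProperty><Name>Tare</Name><Value>0</Value></ns2:AdditionalItemProperty>
  </ns3:Item>
</ns3:BOX>
"@

$records = foreach ($item in $xml.BOX.Item) {
    # An ordered dictionary keeps the columns in insertion order.
    $row = [ordered]@{ Description = $item.Description }
    foreach ($prop in $item.AdditionalItemProperty) {
        # Relies on PowerShell's XML adapter exposing the <Name> and <Value>
        # child elements as properties of the parent element.
        $row[$prop.Name] = $prop.Value
    }
    [PSCustomObject]$row
}

$lines = $records | ConvertTo-Csv -NoTypeInformation -Delimiter ";"
$lines
```

Note that the CSV cmdlets take the column set from the first object, so this only works cleanly when every item carries the same property names.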
References:

  • Powershell loop through xml to create a jagged array
  • flatten xml structure
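
The XSLT route that this answer wonders about could look roughly like the sketch below, which runs an XSLT 1.0 stylesheet through .NET's System.Xml.Xsl.XslCompiledTransform. The stylesheet itself, the prefix "i" for the default namespace, and the one-item sample are assumptions for illustration; the column names follow the output requested in the question.

```powershell
# One-item sample using the same namespaces as the question's document.
$xml = [xml]@"
<ns3:BOX xmlns="urn:loc.gov:item" xmlns:ns2="urn:loc.gov:box" xmlns:ns3="http://www.example.com/inverter">
  <ns3:Item>
    <Description>ITEM1</Description>
    <PackSizeNumeric>6</PackSizeNumeric>
    <ns2:BuyersItemIdentification><ID>75847589</ID></ns2:BuyersItemIdentification>
    <ns2:CommodityClassification><CommodityCode>856952</CommodityCode></ns2:CommodityClassification>
    <ns2:AdditionalItemProperty><Name>Weight</Name><Value>0</Value></ns2:AdditionalItemProperty>
    <ns2:AdditionalItemProperty><Name>Tare</Name><Value>0</Value></ns2:AdditionalItemProperty>
    <ns2:ManufacturerParty>
      <ns2:PartyIdentification><ID>847532</ID></ns2:PartyIdentification>
    </ns2:ManufacturerParty>
  </ns3:Item>
</ns3:BOX>
"@

# Illustrative stylesheet: emits one quoted, semicolon-separated line per ns3:Item.
$xslt = @'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:i="urn:loc.gov:item"
    xmlns:ns2="urn:loc.gov:box"
    xmlns:ns3="http://www.example.com/inverter">
  <xsl:output method="text"/>
  <xsl:variable name="q">"</xsl:variable>
  <xsl:variable name="sep">";"</xsl:variable>
  <xsl:template match="/ns3:BOX">
    <xsl:text>"Description";"PackSizeNumeric";"BuyersItemIdentification";"CommodityClassification";"Weight";"Tare";"PartyIdentification"&#10;</xsl:text>
    <xsl:for-each select="ns3:Item">
      <xsl:value-of select="concat($q, i:Description, $sep, i:PackSizeNumeric, $sep,
          ns2:BuyersItemIdentification/i:ID, $sep,
          ns2:CommodityClassification/i:CommodityCode, $sep,
          ns2:AdditionalItemProperty[i:Name = 'Weight']/i:Value, $sep,
          ns2:AdditionalItemProperty[i:Name = 'Tare']/i:Value, $sep,
          ns2:ManufacturerParty/ns2:PartyIdentification/i:ID, $q, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
'@

# Compile the stylesheet from the string.
$transform = New-Object System.Xml.Xsl.XslCompiledTransform
$styleReader = [System.Xml.XmlReader]::Create([System.IO.TextReader](New-Object System.IO.StringReader $xslt))
$transform.Load($styleReader)

# Run the transformation and capture the text output.
$writer = New-Object System.IO.StringWriter
$transform.Transform($xml, (New-Object System.Xml.Xsl.XsltArgumentList), [System.IO.TextWriter]$writer)
$csv = $writer.ToString()
$csv
```

This sketch does no CSV escaping, so values containing quotes or semicolons would need extra handling in the stylesheet.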


Source: https://stackoverflow.com/questions/30575274/converting-xml-with-namespaces-to-csv-using-powershell
