Question
<Filer>
<ID>123456789</ID>
<Name>
<BusinessNameLine1>Stackoverflow</BusinessNameLine1>
</Name>
<NameControl>stack</NameControl>
<USAddress>
<AddressLine1>123 CHERRY HILL LANE</AddressLine1>
<City>LA</City>
<State>CA</State>
<ZIPCode>90210</ZIPCode>
</USAddress>
</Filer>
Above is a sample of the XML I was given. I need to pull one particular element out of it.
I simply need to extract every <BusinessNameLine1> from the file. The issue is that this tag appears multiple times throughout the file, but I only need it when it falls inside the <Filer> tag.
I would do this with PHP, but I am at work and cannot run PHP code because I am not able to install software on my computer. I can execute bash scripts, however. The file is also extremely large, so I cannot load it into Excel. I have no idea how to do this and would appreciate some help or guidance on where to start.
Answer 1:
Use a proper XML parser. For example, xsh:
open file.xml ;
ls //Filer//BusinessNameLine1 ;
Answer 2:
XPath is your friend. The xmllint tool can evaluate XPath expressions:
xmllint --xpath '//Filer//BusinessNameLine1/text()' yourXML
output:
Stackoverflow
Test on an example with a <BusinessNameLine1> tag outside <Filer>:
kent$ cat t.xml
<root>
<Trash>
<BusinessNameLine1>trash</BusinessNameLine1>
</Trash>
<Filer>
<ID>123456789</ID>
<Name>
<BusinessNameLine1>Stackoverflow</BusinessNameLine1>
</Name>
<NameControl>stack</NameControl>
<USAddress>
<AddressLine1>123 CHERRY HILL LANE</AddressLine1>
<City>LA</City>
<State>CA</State>
<ZIPCode>90210</ZIPCode>
</USAddress>
</Filer>
</root>
kent$ xmllint --xpath '//Filer//BusinessNameLine1/text()' t.xml
Stackoverflow
Answer 3:
You could try this combination of awk and sed:
$ awk -v RS='</Filer>' '/^<Filer>/ {gsub (/\n/," "); print}' file | sed -r 's/.*<BusinessNameLine1>([^<]*)<\/BusinessNameLine1>.*/\1/g'
Stackoverflow
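The pipeline above depends on a multi-character RS, which is an extension found in gawk and mawk rather than POSIX awk, and it prints at most one name per <Filer> record. If that is a concern, a line-oriented variant might look like the sketch below. It assumes each element sits on its own line, as in the sample, so it is not a real XML parser; the function name extract_filer_names is made up for illustration.

```shell
#!/bin/sh
# Sketch: print BusinessNameLine1 values that occur between <Filer> and </Filer>.
# Assumption: one element per line, as in the sample. Not a general XML parser.
# extract_filer_names is a hypothetical helper name.
extract_filer_names() {
  awk '
    /<Filer>/   { inside = 1 }   # entering a Filer block
    /<\/Filer>/ { inside = 0 }   # leaving it
    inside && /<BusinessNameLine1>/ {
      line = $0
      sub(/.*<BusinessNameLine1>/, "", line)    # drop everything up to the open tag
      sub(/<\/BusinessNameLine1>.*/, "", line)  # drop the close tag and the rest
      print line
    }
  ' "$1"
}
```

It would be called as `extract_filer_names file.xml`. Because awk reads the input one line at a time, memory use should stay flat even on a very large file.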
Source: https://stackoverflow.com/questions/23998986/xml-data-extraction