How do I print the contents of an XML element - from the starting tag to the closing tag - using AWK?
For example, consider the following XML:
Solutions that parse XML with tools like awk and sed are imperfect. You cannot rely on XML always having a human-readable layout. For example, some web services omit newlines, so the entire XML document arrives on a single line.
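To illustrate the point, here is a minimal sketch of the usual awk range-pattern approach, run against a hypothetical sample (the `root`, `name`, and `other` elements and the `extract_city` helper are made up for this example). It behaves as hoped when each tag sits on its own line, but awk can only print whole records, so once the same document is collapsed onto one line it returns everything rather than just the element:

```shell
# Hypothetical sample: the same document laid out two ways.
multi='<root>
<city id="AT">
<name>Athens</name>
</city>
<other>x</other>
</root>'
single='<root><city id="AT"><name>Athens</name></city><other>x</other></root>'

# Range pattern: print from the line matching <city through the line matching </city>.
extract_city() { awk '/<city/,/<\/city>/'; }

printf '%s\n' "$multi"  | extract_city   # one tag per line
printf '%s\n' "$single" | extract_city   # everything on one line
```

The first call prints only the three `city` lines. The second prints the whole one-line document, because the start and end patterns both match the same record.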
I would recommend using xmllint, which can select nodes using XPath, a query language designed for XML.
The following command will select the city tags:
xmllint --xpath "//city" data.xml
XPath is extremely useful. It makes every part of the XML document addressable. For example, this selects the id attribute of the first city element:
xmllint --xpath "string(//city[1]/@id)" data.xml
This returns the string "AT".
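As a rough sketch of what "addressable" means in practice, the commands below run a few more XPath expressions against a small, hypothetical file. Only the `city` element and the `id="AT"` attribute are taken from the example above; the `travel`, `name`, and `state` elements and the `DUB` id are assumptions made for illustration. The first `city` is deliberately written on a single line:

```shell
# Hypothetical sample file. The first city is on one line on purpose.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<travel><city id="AT"><name>Athens</name><state>GA</state></city>
<city id="DUB">
  <name>Dublin</name>
</city></travel>
EOF

# id attribute of the first city element
first_id=$(xmllint --xpath 'string(//city[1]/@id)' "$sample")
# text of a child element, with the city selected by attribute value
dub_name=$(xmllint --xpath 'string(//city[@id="DUB"]/name)' "$sample")
# number of city elements in the document
city_count=$(xmllint --xpath 'count(//city)' "$sample")

# Prints AT, Dublin and 2 on separate lines
printf '%s\n' "$first_id" "$dub_name" "$city_count"
rm -f "$sample"
```

The layout of the file makes no difference here, since xmllint parses the document before evaluating the expression.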
The following command returns only the first "city" element. Piping the result through xmllint a second time pretty-prints it:
$ xmllint --xpath "//city[1]" data.xml | xmllint --format -
Athens
GA
Home of the University of Georgia
100,000
Located about 60 miles Northeast of Atlanta
33 57' 39" N
83 22' 42" W
In the sample data below, the first "city" element appears entirely on one line, which is still valid XML.
Delta
22
Atlanta
Paris
5:40pm
8:10am
Athens GA Home of the University of Georgia 100,000 Located about 60 miles Northeast of Atlanta 33 57' 39" N 83 22' 42" W
Dublin
Dub
Dublin
1,500,000
Ireland
NA
NA