Print XML element with AWK

情书的邮戳 2021-01-21 10:03

How do I print the contents of an XML element - from the starting tag to the closing tag - using AWK?

For example, consider the following XML:



        
2 Answers
  • 2021-01-21 10:36
    $ awk -v tag='city' '$0~"^<"tag"\\>"{inTag=1} inTag; $0~"^</"tag">"{inTag=0}' file
    <city id="AT">
           <cityname>Athens</cityname>
           <state>GA</state>
           <description> Home of the University of Georgia</description>
           <population>100,000</population>
           <location>Located about 60 miles Northeast of Atlanta</location>
           <latitude>33 57' 39" N</latitude>
           <longitude>83 22' 42" W</longitude>
    </city>
    

    The command above relies on GNU awk's \> word-boundary operator. With other awks, use [^[:alnum:]_] or similar.
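
    For awks without \>, the same range logic might be written with a bracket expression, as in the sketch below. Like the commands above, it assumes the opening and closing tags each start a line. The file name cities.xml and its contents are made up for illustration, and the "|$" alternative is an addition so that a bare <city at the end of a line would also match.

    ```shell
    # Hypothetical sample input; every tag starts at the beginning of a line.
    printf '%s\n' '<cities>' '<city id="AT">' '<cityname>Athens</cityname>' \
        '</city>' '<citystate>GA</citystate>' '</cities>' > cities.xml

    # Require a non-word character (or end of line) after "<city", so that
    # <cities>, <cityname> and <citystate> are not mistaken for <city>.
    out=$(awk -v tag='city' '
        $0 ~ "^<" tag "([^[:alnum:]_]|$)" { inTag = 1 }   # opening tag: start printing
        inTag                                              # print while inside the element
        $0 ~ "^</" tag ">"                { inTag = 0 }   # closing tag: stop printing
    ' cities.xml)
    printf '%s\n' "$out"
    ```

    With this input, only the three lines from <city id="AT"> through </city> should be printed.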

    To only print the first occurrence:

    $ awk -v tag='city' '$0~"^<"tag"\\>"{inTag=1} inTag{print; if ($0~"^</"tag">") exit}' file
    <city id="AT">
           <cityname>Athens</cityname>
           <state>GA</state>
           <description> Home of the University of Georgia</description>
           <population>100,000</population>
           <location>Located about 60 miles Northeast of Atlanta</location>
           <latitude>33 57' 39" N</latitude>
           <longitude>83 22' 42" W</longitude>
    </city>
    
  • 2021-01-21 10:39

    Solutions that parse XML with line-oriented tools like awk and sed are fragile, because you cannot rely on XML always having a human-readable layout. For example, some web services omit newlines, so the entire XML document arrives on one line.
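
    To make the limitation concrete, here is a small sketch that runs a line-based awk filter on a document written as a single line. The file name oneline.xml and its contents are invented for the example, and the filter uses a portable bracket expression in place of GNU awk's \>.

    ```shell
    # Hypothetical input: a complete, valid XML document on one line.
    printf '%s\n' '<data><city id="AT"><cityname>Athens</cityname></city><city id="DUB"><cityname>Dublin</cityname></city></data>' > oneline.xml

    # The filter only looks for <city ...> at the start of a line.
    # The single line here starts with <data>, so the filter never fires.
    out=$(awk -v tag='city' '
        $0 ~ "^<" tag "([^[:alnum:]_]|$)" { inTag = 1 }
        inTag
        $0 ~ "^</" tag ">"                { inTag = 0 }
    ' oneline.xml)

    if [ -z "$out" ]; then
        echo "no city element was printed"
    fi
    ```

    Dropping the ^ anchor would not help much either: the filter would then print the whole line, including the surrounding <data> element and both cities.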

    I would recommend using xmllint, which can select nodes using XPath, a query language designed for XML.

    The following command will select the city tags:

    xmllint --xpath "//city" data.xml
    

    XPath is extremely useful. It makes every part of the XML document addressable:

    xmllint --xpath "string(//city[1]/@id)" data.xml
    

    Returns the string "AT".
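
    A couple more XPath expressions, as a sketch of what "addressable" means in practice. The file cities.xml below is a cut-down, made-up stand-in for data.xml, and the example assumes an xmllint build that supports --xpath; it does nothing if xmllint is not installed.

    ```shell
    # Hypothetical cut-down input with two city elements.
    printf '%s\n' '<data>' \
        '<city id="AT"><cityname>Athens</cityname></city>' \
        '<city id="DUB"><cityname>Dublin</cityname></city>' \
        '</data>' > cities.xml

    if command -v xmllint >/dev/null 2>&1; then
        # Number of city elements anywhere in the document.
        count=$(xmllint --xpath 'count(//city)' cities.xml)
        # Name of the city whose id attribute equals "DUB".
        name=$(xmllint --xpath 'string(//city[@id="DUB"]/cityname)' cities.xml)
        echo "cities: $count, DUB is: $name"
    fi
    ```

    Wrapping an expression in count() or string() makes xmllint print a plain value instead of serialized XML, which is convenient when the result feeds a shell variable.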

    Poorly formatted XML data

    This time, return only the first occurrence of the "city" element. xmllint can also be used to pretty-print the result:

    $ xmllint --xpath "//city[1]" data.xml | xmllint --format -
    <?xml version="1.0"?>
    <city id="AT">
      <cityname>Athens</cityname>
      <state>GA</state>
      <description> Home of the University of Georgia</description>
      <population>100,000</population>
      <location>Located about 60 miles Northeast of Atlanta</location>
      <latitude>33 57' 39" N</latitude>
      <longitude>83 22' 42" W</longitude>
    </city>
    

    data.xml

    In this sample data, the first "city" element appears entirely on one line. This is valid XML.

    <data>
      <flight>
        <airline>Delta</airline>
        <flightno>22</flightno>
        <origin>Atlanta</origin>
        <destination>Paris</destination>
        <departure>5:40pm</departure>
        <arrival>8:10am</arrival>
      </flight>
      <city id="AT"> <cityname>Athens</cityname> <state>GA</state> <description> Home of the University of Georgia</description> <population>100,000</population> <location>Located about 60 miles Northeast of Atlanta</location> <latitude>33 57' 39" N</latitude> <longitude>83 22' 42" W</longitude> </city>
      <city id="DUB">
        <cityname>Dublin</cityname>
        <state>Dub</state>
        <description> Dublin</description>
        <population>1,500,000</population>
        <location>Ireland</location>
        <latitude>NA</latitude>
        <longitude>NA</longitude>
      </city>
    </data>
    