Question
I can't seem to get this basic XSLT query working with XMLStarlet.
I'm sure I'm missing something obvious, but for the life of me I can't figure out the syntax, so could someone please enlighten me?
XMLStarlet Command:
xml sel -t -m "//rdf:RDF/item" -v link -v description -v link ./sss.rdf
sss.rdf:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:ev="http://purl.org/rss/1.0/modules/event/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/">
<channel rdf:about="http://baltimore.craigslist.org/search/sss?catAbb=sss&amp;format=rss&amp;maxAsk=150&amp;minAsk=50&amp;query=ipod%20touch%205g&amp;srchType=A">
<title>craigslist baltimore | all for sale / wanted search "ipod touch 5g"</title>
<link>http://baltimore.craigslist.org/search/sss?catAbb=sss&amp;maxAsk=150&amp;minAsk=50&amp;query=ipod%20touch%205g&amp;srchType=A</link>
<description />
<dc:language>en-us</dc:language>
<dc:rights>&#169; 2013 craigslist</dc:rights>
<dc:publisher>robot@craigslist.org</dc:publisher>
<dc:creator>robot@craigslist.org</dc:creator>
<dc:source>http://baltimore.craigslist.org/search/sss?catAbb=sss&amp;format=rss&amp;maxAsk=150&amp;minAsk=50&amp;query=ipod%20touch%205g&amp;srchType=A</dc:source>
<dc:title>craigslist baltimore | all for sale / wanted search "ipod touch 5g"</dc:title>
<dc:type>Collection</dc:type>
<syn:updateBase>2013-09-20T09:23:41-07:00</syn:updateBase>
<syn:updateFrequency>1</syn:updateFrequency>
<syn:updatePeriod>hourly</syn:updatePeriod>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://baltimore.craigslist.org/ele/4039527375.html" />
</rdf:Seq>
</items>
</channel>
<item rdf:about="http://baltimore.craigslist.org/ele/4039527375.html">
<title><![CDATA[Unlocked Optimus Lg Phone (Baltimore) $150]]></title>
<link>http://baltimore.craigslist.org/ele/4039527375.html</link>
<description>OR WE CAN HAVE A SWAP FOR AN IPOD TOUCH 5g<![CDATA[
&#9679;Optimus Lg Phone For Sale At 150.00 The Original Price Was $180.00
&#9679;It Does Not Include The Charger, But You Can Find It At Walmart For $4.00
&#9679;The Phone Was Only Used For 2-3 Months
&# [...]]]></description>
<dc:date>2013-09-01T10:14:06-07:00</dc:date>
<dc:language>en-us</dc:language>
<dc:rights>&#169; 2013 craigslist</dc:rights>
<dc:source>http://baltimore.craigslist.org/ele/4039527375.html</dc:source>
<dc:title><![CDATA[Unlocked Optimus Lg Phone (Baltimore) $150]]></dc:title>
<dc:type>text</dc:type>
<dcterms:issued>2013-09-01T10:14:06-07:00</dcterms:issued>
</item>
</rdf:RDF>
My Desired Output:
Unlocked Optimus Lg Phone (Baltimore) $150
OR WE CAN HAVE A SWAP FOR AN IPOD TOUCH 5g
●Optimus Lg Phone For Sale At 150.00 The Original Price Was $180.00
●It Does Not Include The Charger, But You Can Find It At Walmart For $4.00
●The Phone Was Only Used For 2-3 Months
&# [...]
http://baltimore.craigslist.org/ele/4039527375.html
Answer 1:
This XMLStarlet command:
xml sel -N purl="http://purl.org/rss/1.0/" -t -m "//rdf:RDF/purl:item" -v purl:title -n -v purl:description -n -v purl:link -n ./sss.rdf
Yields the desired output:
Unlocked Optimus Lg Phone (Baltimore) $150
OR WE CAN HAVE A SWAP FOR AN IPOD TOUCH 5g
&#9679;Optimus Lg Phone For Sale At 150.00 The Original Price Was $180.00
&#9679;It Does Not Include The Charger, But You Can Find It At Walmart For $4.00
&#9679;The Phone Was Only Used For 2-3 Months
&# [...]
http://baltimore.craigslist.org/ele/4039527375.html
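Explanation:
The key is to notice that the input document has a default namespace (xmlns="http://purl.org/rss/1.0/" on the rdf:RDF root element), which puts item, title, description, and link in the http://purl.org/rss/1.0/ namespace. In XPath 1.0, an unprefixed name such as item only matches an element in no namespace, so //rdf:RDF/item never matched anything. Defining -N purl="http://purl.org/rss/1.0/" binds the purl prefix to that namespace URI, which lets us write purl:item, purl:title, and so on in the XPaths. The -n options print a newline after each value, and the first -v selects purl:title instead of link so the output starts with the title.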
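The same rule holds outside XMLStarlet. As a minimal sketch (assuming Python 3 and its standard-library xml.etree.ElementTree, whose path syntax is a limited XPath subset; the sample string RDF and the names NS and item_fields are illustrative, not part of the original question or answer), an unprefixed lookup of item on a trimmed-down copy of sss.rdf finds nothing, while binding a prefix to the default-namespace URI, the counterpart of -N purl=..., finds the item and its fields:

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of sss.rdf (illustrative). The root declares a default
# namespace, so <item>, <title>, <description> and <link> all belong to it.
RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://baltimore.craigslist.org/ele/4039527375.html">
    <title><![CDATA[Unlocked Optimus Lg Phone (Baltimore) $150]]></title>
    <link>http://baltimore.craigslist.org/ele/4039527375.html</link>
    <description>OR WE CAN HAVE A SWAP FOR AN IPOD TOUCH 5g</description>
  </item>
</rdf:RDF>"""

# Prefix -> namespace URI, playing the role of -N purl="http://purl.org/rss/1.0/".
NS = {"purl": "http://purl.org/rss/1.0/"}

root = ET.fromstring(RDF)  # root is the rdf:RDF element

# An unprefixed name means "no namespace", so this matches nothing,
# just like item in //rdf:RDF/item.
unprefixed = root.findall("item")

# With the prefix bound to the default-namespace URI, the item is found.
items = root.findall("purl:item", NS)


def item_fields(item):
    """Return title, description and link, in the answer's -v order."""
    return [item.findtext(f"purl:{name}", namespaces=NS)
            for name in ("title", "description", "link")]


if __name__ == "__main__":
    print(len(unprefixed))
    for item in items:
        print("\n".join(item_fields(item)))
```

Running it prints 0, then the title, description, and link on separate lines.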
Source: https://stackoverflow.com/questions/18927226/xmlstarlet-and-rss