Ideally, what I would like to be able to do is:
cat xhtmlfile.xhtml |
getElementViaXPath --path='/html/head/title' |
sed -e 's%(^|
Another command-line tool is my new Xidel. Unlike the already mentioned xpath and xmlstarlet, it also supports XPath 2 and XQuery.
The title can be read like this:
xidel xhtmlfile.xhtml -e /html/head/title > titleOfXHTMLPage.txt
It also has a cool feature to export multiple variables to bash. For example,
eval "$(xidel xhtmlfile.xhtml -e 'title := //title, imgcount := count(//img)' --output-format bash)"
sets $title to the title and $imgcount to the number of images in the file, which should be as flexible as parsing it directly in bash.
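To make the eval step concrete without needing Xidel installed, here is a minimal sketch of the mechanism. The function emit_assignments is a hypothetical stand-in for the xidel call, and the single-quoted assignment lines it prints are an assumption about what the bash output format looks like, not a copy of Xidel's real output. It shows why quoting the command substitution matters: the text reaches eval unmodified, so whitespace inside a value survives.

```shell
# Hypothetical stand-in for `xidel ... --output-format bash`, so the sketch
# runs without xidel. The exact lines xidel prints are an assumption here.
emit_assignments() {
  printf "title='%s'\n" "My  Example Page"
  printf "imgcount='%s'\n" "3"
}

# Quote the command substitution so the assignment lines reach eval
# unchanged (no word splitting, no glob expansion).
eval "$(emit_assignments)"

# The variables are now ordinary shell variables.
printf 'Title: %s\n' "$title"
printf 'Images: %s\n' "$imgcount"
```

As with any eval, this is only as safe as the program producing the assignments, so it should only be used with output from a tool that quotes values properly.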