I want to convert this piece of XML:
Apples
Bananas
Here's how you could do it with hxpipe and hxunpipe from the W3C HTML-XML-utils (packaged for many distributions):
$ hxpipe infile | sed 's/^\([()]\)v1:/\1/g' | hxunpipe
Apples
Bananas
hxpipe parses XML/HTML and turns it into an awk/sed-friendly, line-based format:
$ hxpipe infile
(v1:table
-\n
(v1:tr
-\n
(v1:td
-Apples
)v1:td
-\n
(v1:td
-Bananas
)v1:td
-\n
)v1:tr
-\n
)v1:table
-\n
where lines starting with ( are opening tags and lines starting with ) are closing tags, so removing the leading v1: from those lines (which is what the sed command above does) achieves the desired effect. Notice that text lines start with a -, so a v1: that happens to appear inside text content is never touched and there can't be any false positives.
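To see the "no false positives" point in isolation, here is a minimal sketch that runs the same sed expression on hand-written hxpipe-style lines, so it does not need hxpipe installed. The function name strip_v1_prefix and the sample input are made up for illustration; the text line deliberately contains v1: to show that it is left alone.

```shell
# Hypothetical helper (name is an assumption): strip a leading v1: from
# tag lines only, i.e. lines that start with ( or ).
strip_v1_prefix() {
  sed 's/^\([()]\)v1:/\1/'
}

# Hand-written sample in hxpipe's line format. The middle line is a text
# line (it starts with -) whose content happens to begin with v1:.
printf '%s\n' '(v1:td' '-v1:Apples' ')v1:td' | strip_v1_prefix
# Prints:
# (td
# -v1:Apples
# )td
```

Because the pattern is anchored with ^ and requires ( or ) as the first character, only tag lines can match. In a real pipeline the function would sit between hxpipe and hxunpipe, as in the command above.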