I want to convert this piece of XML:
<v1:table>
<v1:tr>
<v1:td>Apples</v1:td>
<v1:td>Bananas</v1:td>
</v1:tr>
</v1:table>
Here's how you could do it with hxpipe and hxunpipe from the W3C HTML-XML-utils (packaged for many distributions):
$ hxpipe infile | sed 's/^\([()]\)v1:/\1/g' | hxunpipe
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
hxpipe parses XML/HTML and turns it into an awk/sed-friendly, line-based format:
$ hxpipe infile
(v1:table
-\n
(v1:tr
-\n
(v1:td
-Apples
)v1:td
-\n
(v1:td
-Bananas
)v1:td
-\n
)v1:tr
-\n
)v1:table
-\n
where lines starting with ( and ) are opening and closing tags, so removing the first v1: from lines starting with ( or ) (which is what the sed command above does) achieves the desired effect. Notice that text lines start with a -, so there can't be any false positives.
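To illustrate the "no false positives" point without needing hxpipe installed, here is a small sketch that feeds a hand-written sample in the same line format through the same sed expression. The sample data and the strip_prefix helper name are made up for illustration; the text line deliberately contains (v1: to show that it is left untouched.

```shell
# Hand-written sample in hxpipe's line format (illustrative, not real hxpipe output).
# The middle line is a text line (starts with "-") that happens to contain "(v1:".
sample='(v1:td
-see (v1:td for details
)v1:td'

# Hypothetical helper wrapping the sed expression from the answer:
# it only rewrites lines that begin with "(" or ")".
strip_prefix() {
  sed 's/^\([()]\)v1:/\1/'
}

result=$(printf '%s\n' "$sample" | strip_prefix)
printf '%s\n' "$result"
```

Only the first and last lines lose their v1: prefix; the text line passes through unchanged because the pattern is anchored to a leading ( or ).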
This sed works for your example:
sed -E 's~(</?)v1:~\1~g' file
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
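However, note that sed is not the best tool for parsing HTML/XML: a regex cannot tell real tags from text that merely looks like a tag, for example inside comments or CDATA sections. Consider using an XML/HTML parser instead.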
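As a rough illustration of that caveat, the sketch below runs the same sed expression over a made-up line in which a comment mentions a prefixed tag. The input line is hypothetical and chosen only to trigger the problem.

```shell
# Made-up input: the comment text contains something that looks like a v1: tag.
input='<v1:td><!-- old tag was <v1:cell> --></v1:td>'

# Same regex as in the answer; it has no notion of comments vs. markup.
result=$(printf '%s\n' "$input" | sed -E 's~(</?)v1:~\1~g')
printf '%s\n' "$result"
```

The real tags are fixed as intended, but the text inside the comment is rewritten too, which a parser-based approach would avoid.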