I've inherited some XML files which have all tags in uppercase. I would like to convert them to lowercase using either a regular expression or XSLT. It would be handy to be
By using PHP you can do it like this...
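<?php
// Match "<" or "</" followed by the tag name; attributes are left untouched.
$pattern = '/<\/?\w+/';
$fp = fopen("/Applications/XAMPP/htdocs/test/test.xml", "r") or die("can't open the XML file");
// Read line by line; fgets() returns false at end of file.
while (($line = fgets($fp)) !== false) {
    $line = preg_replace_callback(
        $pattern,
        function ($matches) {
            return strtolower($matches[0]);
        },
        $line
    );
    // Escape the markup so it shows up as text when viewed in a browser.
    echo htmlentities($line);
}
fclose($fp);
?>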
It works fine ;)
Try this regex:
<(\/?[a-zA-Z]*)\b.*?>
online tester: http://regex101.com/#PCRE
Enjoy your code
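For what it's worth, here is a rough Python sketch of how that regex could be used. The function name is made up for illustration, and it only lowercases what group 1 captures (the optional slash plus the tag name), so attribute names are left as they are:

```python
import re

# The pattern suggested above: group 1 is the optional "/" plus the tag name.
TAG = re.compile(r"<(\/?[a-zA-Z]*)\b.*?>")

def lowercase_tag_names(text):
    """Lowercase only the captured tag name; the rest of each tag is kept as-is."""
    def repl(match):
        name = match.group(1)
        # Group 1 starts right after "<", so the remainder begins at 1 + len(name).
        rest = match.group(0)[1 + len(name):]
        return "<" + name.lower() + rest
    return TAG.sub(repl, text)

if __name__ == "__main__":
    print(lowercase_tag_names('<BOOK ID="1"><TITLE>Hi</TITLE></BOOK>'))
```

Because the pattern requires a letter right after "<" or "</", things like the XML declaration are skipped rather than rewritten.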
You might need 2 regexes in my opinion - one to convert the tag name, and another to convert the variable number of attribute-value pairs.
Here is how I could do it -
blah:tmp shreyas$ cat old.xml | perl -pe "s|(</?)([^> ]+)(.*?>)|\1\L\2\E\3|g" | perl -pe "s|(\w+)( ?= ?\".*?\")|\L\1\E\2|g" > processed.xml
blah:tmp shreyas$ diff new.xml processed.xml
4c4
< <P>It would be remiss of me to neglect to thank the bottle.</P>
---
> <p>It would be remiss of me to neglect to thank the bottle.</p>
9,10c9,10
< <P>It seems a violent betrayal, me divulging how...</P>
< <P>The years had not been kind Felix Lake. His constant...</P>
---
> <p>It seems a violent betrayal, me divulging how...</p>
> <p>The years had not been kind Felix Lake. His constant...</p>
15c15
< <P>As luck would not have it, he did.</P>
---
> <p>As luck would not have it, he did.</p>
old.xml is your "before" XML and new.xml is your "after" XML; processed.xml is the file generated by the command.
As you can see, the P tags in your "after" XML are still uppercase. I am not sure whether they were typos or intentional exceptions. I treated them as typos, since you mentioned changing all tags to lowercase.
With a small modification, you could run these commands on your whole set of inherited XML files and get them converted quickly.
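As one possible way to do the batch run, here is a minimal Python sketch that applies the same two substitutions line by line to every .xml file in a directory. The function names and the directory arguments are placeholders of mine, not part of the commands above, and it assumes the files are UTF-8:

```python
import re
from pathlib import Path

# Same two patterns as the Perl one-liners: tag names first, then attribute names.
TAG_NAME = re.compile(r"(</?)([^> ]+)(.*?>)")
ATTR_NAME = re.compile(r'(\w+)( ?= ?".*?")')

def lowercase_line(line):
    """Lowercase tag names and attribute names on a single line."""
    line = TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower() + m.group(3), line)
    return ATTR_NAME.sub(lambda m: m.group(1).lower() + m.group(2), line)

def convert_directory(src_dir, dst_dir):
    """Convert every *.xml file in src_dir and write the result to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.xml")):
        # Work line by line, like perl -p does.
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        out = dst / path.name
        out.write_text("".join(lowercase_line(l) for l in lines), encoding="utf-8")
        written.append(out.name)
    return written
```

Like the one-liners, this touches anything that looks like a tag, so a `<!DOCTYPE ...>` line would be lowercased too. Check a sample of the output before trusting it on the whole set.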
Try (untested):
XSLT 2.0:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="*">
    <xsl:element name="{lower-case(local-name())}" namespace="{namespace-uri()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{lower-case(local-name())}" namespace="{namespace-uri()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="comment() | text() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
The XSLT 1.0 version of the above would go like this:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />

  <xsl:template match="*">
    <xsl:element name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="comment() | text() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
However, the XSLT 1.0 version assumes that your element and attribute names contain no upper-case characters other than the 26 explicitly listed (i.e. no Russian, Greek, diacritics, etc.).
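To illustrate that limitation, and as a parser-based alternative if XSLT is not an option, here is a hedged Python sketch using the standard xml.etree.ElementTree module. The helper names are mine. `ascii_translate` mimics what translate() does with the two 26-letter variables, while `str.lower()` is Unicode-aware. Note that, unlike the stylesheets, the default ElementTree parser drops comments and processing instructions, and namespace prefixes may be regenerated on output:

```python
import xml.etree.ElementTree as ET

ASCII_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
ASCII_LOWER = "abcdefghijklmnopqrstuvwxyz"

def ascii_translate(name):
    """Mimic translate(name, $uppercase, $lowercase): only A-Z are mapped."""
    return name.translate(str.maketrans(ASCII_UPPER, ASCII_LOWER))

def lower_qname(qname):
    """Lowercase the local part of a name, keeping any '{namespace-uri}' part."""
    if qname.startswith("{"):
        uri, local = qname[1:].split("}", 1)
        return "{" + uri + "}" + local.lower()
    return qname.lower()

def lowercase_names(xml_text):
    """Parse the document and lowercase every element and attribute name."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if isinstance(elem.tag, str):  # skip any non-element nodes
            elem.tag = lower_qname(elem.tag)
        renamed = {lower_qname(k): v for k, v in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(renamed)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(ascii_translate("ÉLÉMENT"))   # the accented letters stay uppercase
    print(lowercase_names('<ROOT ID="1"><ÉLÉMENT NOM="x">Text</ÉLÉMENT></ROOT>'))
```

Because this works on the parsed tree rather than on raw text, attribute values and text content are never touched by the renaming.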