I am trying to use XSLT 2.0 (Saxon-PE 9.6) on an HTML document to create tags that surround all contiguous runs of characters from a specified non-Latin Unicode block (space
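Complementing the previous answers, note that you can write \p{IsDevanagari} in place of [ऀ-ॿ]. Because the regex attribute of xsl:analyze-string is an attribute value template, the braces have to be doubled there: \p{{IsDevanagari}}.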
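As a rough sketch of how the block escape could be used, the stylesheet below wraps each run of Devanagari words (optionally separated by whitespace) in a span. The identity template and the hi-Deva language tag are assumptions chosen for illustration, not something the question prescribes, and the pattern is only meant to show the \p{IsDevanagari} form with its doubled braces.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Identity template (assumed here): copy everything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The regex attribute is an attribute value template, so each brace
       of \p{IsDevanagari} is doubled to keep it literal. -->
  <xsl:template match="text()">
    <xsl:analyze-string select="."
        regex="\p{{IsDevanagari}}+(\s+\p{{IsDevanagari}}+)*">
      <xsl:matching-substring>
        <!-- One span per contiguous run, inner whitespace included. -->
        <span xml:lang="hi-Deva"><xsl:value-of select="."/></span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Everything else passes through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:transform>
```

The block name keeps the pattern readable and avoids relying on the literal range characters surviving copy and paste or an encoding change.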
This should work (some comments after the code):
XSLT 2.0
<xsl:analyze-string select="$textValue" regex="([ऀ-ॿ]+)((\s+[ऀ-ॿ]+)*)">
  <xsl:matching-substring>
    <span xml:lang="hi-Deva"><xsl:value-of select="regex-group(1)"/><xsl:value-of select="regex-group(2)"/></span>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>
The matching-substring branch puts the span around the Hindi text.
The non-matching-substring branch just returns the unmodified "normal" text substring (you were returning the whole text!).

I came up with http://xsltransform.net/jyH9rMo which just does:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

  <!-- Wrap the transformed input in a new html element with a fresh head. -->
  <xsl:template match="/">
    <html>
      <head>
        <title>New Version!</title>
      </head>
      <xsl:apply-templates/>
    </html>
  </xsl:template>

  <!-- Identity template: copy all other nodes and attributes unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Wrap each run of Devanagari words (with the whitespace between them) in a span. -->
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="([ऀ-ॿ]+)((\s+[ऀ-ॿ]+)*)">
      <xsl:matching-substring>
        <span xml:lang="hi-Deva"><xsl:value-of select="."/></span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:transform>