Alphanumeric sort on mixed string value

Anonymous (unverified), submitted on 2019-12-03 09:06:55


Given this XML snippet:

<forms>
    <FORM lob="BO" form_name="AI OM 10"/>
    <FORM lob="BO" form_name="CL BP 03 01"/>
    <FORM lob="BO" form_name="AI OM 107"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="123 DDE"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="AI OM 98"/>
</forms>

I need to sort the FORM nodes alphabetically by form_name, so that all the forms containing 'AI OM' in the form_name are grouped together, and then numerically by the integers within each group (and likewise for the other forms).

The form_name format is open season: letters and numbers can appear in any order, for example:

XX ## ##
XX XX ##
XX XX ###
XX XX ## ##
XX ###
## XXX

What I THINK needs to happen is that the string needs to be split into its alpha and numeric parts. The numeric part could probably be sorted with any spaces removed, I suppose.

I am at a loss as to how to split the string and then cover all the sorting/grouping combinations given that there are no rules around the 'form_name' format.

We are using XSLT 2.0. Thanks.


This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:variable name="vDigits" select="'0123456789 '"/>
 <xsl:variable name="vAlpha" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ '"/>

 <xsl:template match="/*">
  <forms>
   <xsl:for-each select="FORM">
    <!-- Primary key: the name with digits and spaces removed -->
    <xsl:sort select="translate(@form_name,$vDigits,'')"/>
    <!-- Secondary key: the remaining digits, compared as a number -->
    <xsl:sort select="translate(@form_name,$vAlpha,'')"
        data-type="number"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </forms>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<forms>
    <FORM lob="BO" form_name="AI OM 10"/>
    <FORM lob="BO" form_name="CL BP 03 01"/>
    <FORM lob="BO" form_name="AI OM 107"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="123 DDE"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="AI OM 98"/>
</forms>

produces the wanted, correct result:

<forms>
    <FORM lob="BO" form_name="AI OM 10"/>
    <FORM lob="BO" form_name="AI OM 98"/>
    <FORM lob="BO" form_name="AI OM 107"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="CL BP 03 01"/>
    <FORM lob="BO" form_name="123 DDE"/>
</forms>

Do note:

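  1. Two <xsl:sort> instructions implement the two-phase sorting: the first is the primary key, the second breaks ties within each group.

  2. The XPath translate() function produces both keys: deleting digits and spaces gives the alpha-only sort key, and deleting uppercase letters and spaces gives the digits-only sort key.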
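To make the two keys concrete, here is a minimal Python sketch that models them outside XSLT. The names `alpha_key`, `numeric_key` and `two_phase_sort` are made up for this illustration, plain string comparison stands in for the XSLT collation, and treating a name without digits as -1 is an assumption of the sketch.

```python
DIGITS = "0123456789 "
ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "


def alpha_key(name: str) -> str:
    # Models translate(@form_name, $vDigits, ''): delete digits and spaces.
    return name.translate(str.maketrans("", "", DIGITS))


def numeric_key(name: str) -> int:
    # Models translate(@form_name, $vAlpha, '') with data-type="number":
    # delete uppercase letters and spaces, then read what is left as a number.
    digits = name.translate(str.maketrans("", "", ALPHA))
    # Assumption: a name without digits is ordered first within its group.
    return int(digits) if digits else -1


def two_phase_sort(names: list[str]) -> list[str]:
    # A tuple key behaves like two consecutive <xsl:sort> instructions.
    return sorted(names, key=lambda n: (alpha_key(n), numeric_key(n)))


FORMS = ["AI OM 10", "CL BP 03 01", "AI OM 107", "CL BP 00 02",
         "123 DDE", "CL BP 00 02", "AI OM 98"]

for name in two_phase_sort(FORMS):
    print(f"{name!r:16} -> ({alpha_key(name)!r}, {numeric_key(name)})")
```

Note that the digits of a name are concatenated before the comparison, so "CL BP 03 01" is compared as 301. Names such as "X 1 23" and "X 12 3" would therefore get the same numeric key, and lowercase letters would need to be added to the alphabet string.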


This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="forms">
        <xsl:apply-templates>
            <!-- Key 1: the letters, with digits removed -->
            <xsl:sort select="normalize-space(
                                translate(@form_name,
                                          '0123456789',
                                          ''))"/>
            <!-- Key 2: the first group of digits -->
            <xsl:sort select="substring-before(
                                concat(
                                  normalize-space(
                                    translate(@form_name,
                                              translate(@form_name,
                                                        '0123456789 ',
                                                        ''),
                                              '')),
                                  ' '),' ')" data-type="number"/>
            <!-- Key 3: whatever digits follow the first group -->
            <xsl:sort select="substring-after(
                                normalize-space(
                                  translate(@form_name,
                                            translate(@form_name,
                                                      '0123456789 ',
                                                      ''),
                                            '')),
                                  ' ')" data-type="number"/>
        </xsl:apply-templates>
    </xsl:template>
</xsl:stylesheet>


Output:

<FORM lob="BO" form_name="AI OM 10"></FORM>
<FORM lob="BO" form_name="AI OM 98"></FORM>
<FORM lob="BO" form_name="AI OM 107"></FORM>
<FORM lob="BO" form_name="CL BP 00 02"></FORM>
<FORM lob="BO" form_name="CL BP 00 02"></FORM>
<FORM lob="BO" form_name="CL BP 03 01"></FORM>
<FORM lob="BO" form_name="123 DDE"></FORM>

An XSLT 2.0 solution, using this stylesheet:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="forms">
        <xsl:apply-templates>
            <!-- Key 1: the non-integer tokens, joined back together -->
            <xsl:sort select="string-join(tokenize(@form_name,' ')
                                            [not(. castable as xs:integer)],
                                          ' ')"/>
            <!-- Keys 2 and 3: the first and second integer tokens -->
            <xsl:sort select="xs:integer(tokenize(@form_name,' ')
                                            [. castable as xs:integer][1])"/>
            <xsl:sort select="xs:integer(tokenize(@form_name,' ')
                                            [. castable as xs:integer][2])"/>
        </xsl:apply-templates>
    </xsl:template>
</xsl:stylesheet>
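As a rough illustration of what the three sort keys in this stylesheet compute, here is a small Python model. The helper names `is_int` and `token_keys` are hypothetical. `is_int` only approximates "castable as xs:integer" (it accepts unsigned digit runs only), and using -1 for a missing integer token is an assumption meant to place such names first.

```python
def is_int(token: str) -> bool:
    # Simplified stand-in for ". castable as xs:integer":
    # only unsigned runs of ASCII digits count as integers here.
    return token.isascii() and token.isdigit()


def token_keys(name: str) -> tuple[str, int, int]:
    tokens = name.split(" ")  # like tokenize(@form_name, ' ')
    # Key 1: the non-integer tokens joined back with single spaces.
    words = " ".join(t for t in tokens if not is_int(t))
    # Keys 2 and 3: the first and second integer tokens, if present.
    numbers = [int(t) for t in tokens if is_int(t)]
    first = numbers[0] if len(numbers) > 0 else -1   # assumption: missing sorts first
    second = numbers[1] if len(numbers) > 1 else -1  # assumption: missing sorts first
    return (words, first, second)


FORMS = ["AI OM 10", "CL BP 03 01", "AI OM 107", "CL BP 00 02",
         "123 DDE", "CL BP 00 02", "AI OM 98"]

for name in sorted(FORMS, key=token_keys):
    print(name, token_keys(name))
```

Because each integer token is compared separately, "CL BP 1 23" and "CL BP 12 3" get different keys, (1, 23) and (12, 3). Only the first two integer tokens are considered, so a name with three or more numeric groups would need additional sort keys.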


Note that the accepted answer does not work in all cases. For example, given this input:


<forms>
  <FORM lob="BO" form_name="AA 11 AB"/>
  <FORM lob="BO" form_name="AA AZ 01"/>
</forms>

Expected Output:

<forms>
  <FORM lob="BO" form_name="AA AZ 01"/>
  <FORM lob="BO" form_name="AA 11 AB"/>
</forms>

Actual Output:

<forms>
  <FORM lob="BO" form_name="AA 11 AB"/>
  <FORM lob="BO" form_name="AA AZ 01"/>
</forms>

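If letters are allowed after numbers, you cannot simply strip the digits out to build the first sort key: "AA 11 AB" collapses to "AAAB", which sorts before the "AAAZ" produced by "AA AZ 01", because the position of the digits within the name is lost.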
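One possible way around this, sketched in Python rather than XSLT, is to compare the names token by token so that each token keeps its position. The functions `stripped_key` and `positional_key` are hypothetical names for this illustration, and ranking alphabetic tokens ahead of numeric ones is an assumption chosen to match the expected output shown above.

```python
def stripped_key(name: str) -> str:
    # The first sort key of the translate() approach: digits and spaces removed.
    return "".join(ch for ch in name if not (ch.isdigit() or ch == " "))


def positional_key(name: str) -> tuple:
    # One (rank, value) pair per token, in the original token order.
    key = []
    for token in name.split():
        if token.isascii() and token.isdigit():
            key.append((1, int(token)))  # numeric token, compared as a number
        else:
            key.append((0, token))       # assumption: alpha tokens rank first
    return tuple(key)


names = ["AA 11 AB", "AA AZ 01"]

# Stripping digits loses their position: "AAAB" sorts before "AAAZ".
print(sorted(names, key=stripped_key))    # ['AA 11 AB', 'AA AZ 01']

# Comparing token by token keeps "AZ" ahead of "11" in second position.
print(sorted(names, key=positional_key))  # ['AA AZ 01', 'AA 11 AB']
```

The rank in each pair differs whenever an alphabetic token meets a numeric one, so strings are never compared with integers. On the sample data from the question this key also yields the order AI OM 10, AI OM 98, AI OM 107, CL BP 00 02, CL BP 00 02, CL BP 03 01, 123 DDE.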
