XSLT multiple string replacement with recursion

花落未央 2021-01-16 13:38

I have been attempting to perform multiple (different) string replacements with recursion, and I have hit a roadblock. I have successfully gotten the first replacement to work

5 Answers
  • 2021-01-16 14:08

    This transformation is fully parameterized and doesn't need any tricks with default namespaces:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:my="my:my">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
    
     <my:params xml:space="preserve">
      <pattern>
       <old>&#xA;</old>
       <new><br/></new>
      </pattern>
      <pattern>
       <old>quick</old>
       <new>slow</new>
      </pattern>
      <pattern>
       <old>fox</old>
       <new>elephant</new>
      </pattern>
      <pattern>
       <old>brown</old>
       <new>white</new>
      </pattern>
     </my:params>
    
     <xsl:variable name="vPats"
          select="document('')/*/my:params/*"/>
    
     <xsl:template match="text()" name="multiReplace">
      <xsl:param name="pText" select="."/>
      <xsl:param name="pPatterns" select="$vPats"/>
    
      <xsl:if test=
       "string-length($pText) >0">
    
        <xsl:variable name="vPat" select=
         "$pPatterns[starts-with($pText, old)][1]"/>
        <xsl:choose>
         <xsl:when test="not($vPat)">
           <xsl:copy-of select="substring($pText,1,1)"/>
         </xsl:when>
         <xsl:otherwise>
           <xsl:copy-of select="$vPat/new/node()"/>
         </xsl:otherwise>
        </xsl:choose>
    
        <xsl:call-template name="multiReplace">
          <xsl:with-param name="pText" select=
           "substring($pText, 1 + not($vPat) + string-length($vPat/old/node()))"/>
          <!-- Pass the pattern set along so a caller-supplied pPatterns survives the recursion -->
          <xsl:with-param name="pPatterns" select="$pPatterns"/>
        </xsl:call-template>
      </xsl:if>
     </xsl:template>
    </xsl:stylesheet>
    

    when it is applied on this XML document:

    <t>The quick
    brown fox</t>
    

    the wanted, correct result is produced:

    The slow<br/>white elephant
    

    Explanation:

    The text is scanned from left to right. At each position, if the remaining string starts with one of the specified patterns, that starting substring is replaced by the replacement specified for the first matching pattern; otherwise a single character is copied and the scan moves on.

    Do note: If we have search patterns:

       "relation"   --> "mapping" 
       "corelation" --> "similarity"
    

    in the above order, and text:

       "corelation"
    

    then this solution produces the more correct result:

    "similarity"
    

    whereas the currently accepted solution (by @Alejandro) produces:

    "comapping"
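
    To reproduce this comparison, the pattern table in the stylesheet above can be swapped for the two patterns in question. This is a minimal sketch, assuming the rest of the stylesheet is unchanged and the input is a hypothetical one-element document such as <t>corelation</t>:

    ```xml
    <!-- Drop-in replacement for the my:params block; everything else stays as above -->
    <my:params xml:space="preserve">
     <pattern><old>relation</old><new>mapping</new></pattern>
     <pattern><old>corelation</old><new>similarity</new></pattern>
    </my:params>
    ```

    At the first position only "corelation" is a prefix of the remaining text, so the scan emits "similarity" and then skips past the whole word. The embedded "relation" is never examined on its own.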
    

    Edit: A small update gives another improvement: if more than one replacement is possible at a given location, the longest one is performed.

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:ext="http://exslt.org/common"
     xmlns:my="my:my">
        <xsl:output omit-xml-declaration="yes"/>
        <xsl:strip-space elements="*"/>
    
        <my:params xml:space="preserve">
            <pattern>
                <old>&#xA;</old>
                <new><br/></new>
            </pattern>
            <pattern>
                <old>quick</old>
                <new>slow</new>
            </pattern>
            <pattern>
                <old>fox</old>
                <new>elephant</new>
            </pattern>
            <pattern>
                <old>brown</old>
                <new>white</new>
            </pattern>
        </my:params>
    
        <xsl:variable name="vrtfPats">
         <xsl:for-each select="document('')/*/my:params/*">
          <xsl:sort select="string-length(old)"
               data-type="number" order="descending"/>
           <xsl:copy-of select="."/>
         </xsl:for-each>
        </xsl:variable>
    
        <xsl:variable name="vPats" select=
         "ext:node-set($vrtfPats)/*"/>
    
        <xsl:template match="text()" name="multiReplace">
            <xsl:param name="pText" select="."/>
            <xsl:param name="pPatterns" select="$vPats"/>
            <xsl:if test=    "string-length($pText) >0">      
                <xsl:variable name="vPat" select=
                "$pPatterns[starts-with($pText, old)][1]"/>
    
                <xsl:choose>
                    <xsl:when test="not($vPat)">
                        <xsl:copy-of select="substring($pText,1,1)"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:copy-of select="$vPat/new/node()"/>
                    </xsl:otherwise>
                </xsl:choose>
    
                <xsl:call-template name="multiReplace">
                    <xsl:with-param name="pText" select=
                    "substring($pText,
                              1 + not($vPat) + string-length($vPat/old/node())
                              )"/>
                    <!-- Pass the pattern set along so a caller-supplied pPatterns survives the recursion -->
                    <xsl:with-param name="pPatterns" select="$pPatterns"/>
                </xsl:call-template>
            </xsl:if>
        </xsl:template>
    </xsl:stylesheet>
    

    Thus, if we have two replacements such as "core" --> "kernel" and "corelation" --> "similarity", the second is used for a text containing the word "corelation", regardless of the order in which the replacements are listed.
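
    Because the template is named as well as matching text(), it can also be called explicitly for strings that are not text nodes, such as attribute values. This is a minimal sketch to add inside the stylesheet; the element name "item" and the attribute "label" are hypothetical and do not come from the question:

    ```xml
    <!-- Hypothetical rule: "item" and "@label" are illustration names only -->
    <xsl:template match="item">
      <item>
        <xsl:call-template name="multiReplace">
          <!-- Override the default pText (the context node) with the attribute's string value -->
          <xsl:with-param name="pText" select="string(@label)"/>
        </xsl:call-template>
      </item>
    </xsl:template>
    ```

    Text nodes need no such rule, since match="text()" already routes them through the same template.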

  • 2021-01-16 14:08

    I modified Dimitre's answer to wrap his solution in a named template that takes the replacement list as a parameter, using an EXSLT extension. Perhaps it will be useful to someone.

    <?xml version='1.0' ?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
        xmlns:exsl="http://exslt.org/common"
        xmlns:a="http://www.tralix.com/cfd/2" 
        extension-element-prefixes="exsl">
        <xsl:output indent="yes"/>
        <xsl:template match="/*">
            <xsl:variable name="replacesList">
                <replaces>
                    <replace><old>01</old><new>01 - Efectivo</new></replace>
                    <replace><old>02</old><new>02 - Cheque nominativo</new></replace>
                    <replace><old>03</old><new>03 - Transferencia electrónica de fondos</new></replace>
                    <replace><old>04</old><new>04 - Tarjeta de Crédito</new></replace>
                    <replace><old>05</old><new>05 - Monedero Electrónico</new></replace>
                    <replace><old>06</old><new>06 - Dinero electrónico</new></replace>
                    <replace><old>08</old><new>08 - Vales de despensa</new></replace>
                    <replace><old>28</old><new>28 - Tarjeta de Débito</new></replace>
                    <replace><old>29</old><new>29 - Tarjeta de Servicio</new></replace>
                    <replace><old>99</old><new>99 - Otros</new></replace>
                </replaces>
            </xsl:variable>     
            <descripcionMetodoDePago>
                <xsl:call-template name="replaces">
                    <xsl:with-param name="text" select="text"/>
                    <!-- Convert the result tree fragment to a node-set before navigating into it -->
                    <xsl:with-param name="replaces" select="exsl:node-set($replacesList)/replaces"/>
                </xsl:call-template>
            </descripcionMetodoDePago>
        </xsl:template>
        <xsl:template name="replaces">
            <xsl:param name="text"/>
            <xsl:param name="replaces"/>
            <xsl:if test="$text!=''">
                <xsl:variable name="replace" select="$replaces/*[starts-with($text, old)][1]"/>
                <xsl:choose>
                    <xsl:when test="not($replace)">
                        <xsl:copy-of select="substring($text,1,1)"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:copy-of select="$replace/new/node()"/>
                    </xsl:otherwise>
                </xsl:choose>
                <xsl:call-template name="replaces">
                    <xsl:with-param name="text" select=
                    "substring($text, 1 + not($replace) + string-length($replace/old/node()))"/>
                    <xsl:with-param name="replaces" select="$replaces"/>
                </xsl:call-template>
            </xsl:if>
        </xsl:template>
    </xsl:stylesheet>
    
  • 2021-01-16 14:13

    Although this question was asked (and answered) several years ago, neither this answer nor the (many!) other variants I found while searching the 'net over the last couple of days were able to do what I needed: replace multiple strings in nodes which may contain several kb of text.

    Dimitre's version works well when nodes contain very little text, but when I tried to use it I almost immediately fell foul of the dreaded stack overflow (recursive calls, remember!) The problem with Dimitre's solution is that it tries to match the search patterns to the beginning of the text. This means that many (recursive) calls are made, each call using the right-most n-1 characters of the original text. For a 1k text that means over 1000 recursive calls!

    After digging around for alternatives I came across an example by Ibrahim Naji (http://thinknook.com/xslt-replace-multiple-strings-2010-09-07/) which uses the more conventional substring-before/substring-after combination to perform the replacement. However, that code is limited to a single replacement string for any number of search strings.

    I decided, therefore, that it was time to actually get my hands dirty (and learn XSLT at the same time!) The result is the following code which performs multiple string replacements (specified via an internal template, but that could easily be replaced with an external file, for example), and which (so far in my tests) doesn't suffer from excessive recursive calls.

    It should be noted that the replacements are very basic (as are most other existing implementations) meaning that no attempts are made to only match entire words, for example. I hope the comments are enough to explain the way it works, particularly for other XSLT beginners (like myself).

    And now the code...

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:exsl="http://exslt.org/common"
        xmlns:dps="dps:dps">
    
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
        <!--
            The original version of this code was published by Ibrahim Naji (http://thinknook.com/xslt-replace-multiple-strings-2010-09-07/).
            It works but suffered the limitation of only being able to supply a single replacement text. An alternative implementation, which
            did allow find/replace pairs to be specified, was published by Dimitre Novatchev
            (https://stackoverflow.com/questions/5213644/xslt-multiple-string-replacement-with-recursion).
            However, that implementation suffers from stack overflow problems if the node contains more than a few hundred bytes of text (and
            in my case I needed to process nodes which could include several kb of data). Hence this version which combines the best features
            of both implementations.
    
            John Cullen, 14 July 2017.
         -->
    
        <!-- IdentityTransform, copy the input to the output -->
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
        <!-- Process all text nodes. -->
        <xsl:template match="text()">
            <xsl:call-template name="string-replace-all">
                <xsl:with-param name="text" select="."/>
            </xsl:call-template>
        </xsl:template>
    
        <!-- Table of replacement patterns -->
        <xsl:variable name="vPatterns">
            <dps:patterns>
                <pattern>
                    <old>&lt;i&gt;</old>
                    <new>&lt;em&gt;</new>
                </pattern>
                <pattern>
                    <old>&lt;/i&gt;</old>
                    <new>&lt;/em&gt;</new>
                </pattern>
                <pattern>
                    <old>&lt;b&gt;</old>
                    <new>&lt;strong&gt;</new>
                </pattern>
                <pattern>
                    <old>&lt;/b&gt;</old>
                    <new>&lt;/strong&gt;</new>
                </pattern>
            </dps:patterns>
        </xsl:variable>
    
        <!--
            Convert the internal table into a node-set. This could also be done via a call to document()
            for example select="document('')/*/myns:params/*" with a suitable namespace declaration, but
            in my case that was not possible because the code is being used with a StreamSource.
         -->
        <xsl:variable name="vPats" select="exsl:node-set($vPatterns)/dps:patterns/*"/>
    
        <!-- This named template is called for every text() node and calls itself recursively to perform the actual replacements. -->
        <xsl:template name="string-replace-all">
            <xsl:param name="text"/>
            <xsl:param name="pos" select="1"/>
            <xsl:variable name="replace" select="$vPats[$pos]/old"/>
            <xsl:variable name="by" select="$vPats[$pos]/new"/>
            <xsl:choose>
    
                <!-- Ignore empty strings -->
                <xsl:when test="string-length(translate(normalize-space($text), ' ', '')) = 0"> 
                    <xsl:value-of select="$text"/>
                </xsl:when>
    
                <!-- No length check is needed: contains() is simply false when the search string is longer than the text,
                     and returning early here would skip any shorter patterns later in the table. -->
    
                <!-- If the current text contains the next pattern -->
                <xsl:when test="contains($text, $replace)">
                    <!-- Perform a recursive call, each time replacing the next occurrence of the current pattern -->
                    <xsl:call-template name="string-replace-all">
                        <xsl:with-param name="text" select="concat(substring-before($text,$replace),$by,substring-after($text,$replace))"/>
                        <xsl:with-param name="pos" select="$pos"/>
                    </xsl:call-template>
                </xsl:when>
    
                <!-- No (more) matches found -->
                <xsl:otherwise>
                    <!-- Bump the counter to pick up the next pattern we want to search for -->
                    <xsl:variable name="next" select="$pos+1"/>
                    <xsl:choose>
                        <!-- If we haven't finished yet, perform a recursive call to process the next pattern in the list. -->
                        <xsl:when test="boolean($vPats[$next])">
                            <xsl:call-template name="string-replace-all">
                                <xsl:with-param name="text" select="$text"/>
                                <xsl:with-param name="pos" select="$next"/>
                            </xsl:call-template>
                        </xsl:when>
    
                        <!-- No more patterns, we're done. Return the fully processed text. -->
                        <xsl:otherwise>
                            <xsl:value-of select="$text"/>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:template>
    </xsl:stylesheet>
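
    As mentioned above, the inline table could be replaced with an external file. This is a minimal sketch, assuming a hypothetical file named patterns.xml stored next to the stylesheet, and assuming the stylesheet is loaded in a way that gives it a base URI (which may not hold for a StreamSource created without a system ID):

    ```xml
    <!-- patterns.xml: hypothetical file name and layout, no namespace needed -->
    <patterns>
        <pattern><old>&lt;i&gt;</old><new>&lt;em&gt;</new></pattern>
        <pattern><old>&lt;/i&gt;</old><new>&lt;/em&gt;</new></pattern>
    </patterns>

    <!-- In the stylesheet: replaces both the vPatterns table and the exsl:node-set() call.
         The relative URI is resolved against the stylesheet's base URI. -->
    <xsl:variable name="vPats" select="document('patterns.xml')/patterns/pattern"/>
    ```

    The string-replace-all template needs no changes, since it only reads $vPats[$pos]/old and $vPats[$pos]/new.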
    
  • 2021-01-16 14:24

    The problem may stem from differences in the encoding of newlines, causing the XSLT processor not to recognize the CRLF in your match strings. I suggest testing with a comma in place of the newline. The following gives the expected result when called with the parameter "abc,def,ghi":

    <xsl:template name="replace">
      <xsl:param name="string" select="." />
      <xsl:choose>
        <xsl:when test="contains($string, ',')">
            <xsl:value-of select="substring-before($string, ',')" />
            <br/>
            <xsl:call-template name="replace">
                <xsl:with-param name="string" select="substring-after($string, ',')"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$string"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>
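
    A minimal sketch of such a call, assuming the template above is pasted into the same stylesheet; the match="/" entry point is only for illustration:

    ```xml
    <!-- Entry point for the test: passes the literal string instead of the context node -->
    <xsl:template match="/">
      <xsl:call-template name="replace">
        <xsl:with-param name="string" select="'abc,def,ghi'"/>
      </xsl:call-template>
    </xsl:template>
    ```

    Each comma is consumed by substring-before/substring-after, so the result is the three fragments abc, def and ghi with a br element between each pair.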
    
  • 2021-01-16 14:29

    This stylesheet shows a verbose solution just for you to learn the pattern:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="node()|@*">
            <xsl:copy>
                <xsl:apply-templates select="node()|@*"/>
            </xsl:copy>
        </xsl:template>
        <xsl:template match="text()" name="replace">
            <xsl:param name="pString" select="string()"/>
            <xsl:param name="pSearch" select="'THIS'"/>
            <xsl:param name="pReplace" select="'THAT'"/>
            <xsl:choose>
                <xsl:when test="contains($pString, '&#xA;')">
                    <xsl:call-template name="replace">
                        <xsl:with-param
                             name="pString"
                             select="substring-before($pString, '&#xA;')"/>
                    </xsl:call-template>
                    <br/>
                    <xsl:call-template name="replace">
                        <xsl:with-param
                             name="pString"
                             select="substring-after($pString, '&#xA;')"/>
                    </xsl:call-template>
                </xsl:when>
                <xsl:when test="contains($pString, $pSearch)">
                    <xsl:call-template name="replace">
                        <xsl:with-param
                             name="pString"
                             select="substring-before($pString, $pSearch)"/>
                    </xsl:call-template>
                    <xsl:value-of select="$pReplace"/>
                    <xsl:call-template name="replace">
                        <xsl:with-param
                             name="pString"
                             select="substring-after($pString, $pSearch)"/>
                    </xsl:call-template>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="$pString"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:template>
    </xsl:stylesheet>
    

    With this input:

    <t>THIS is a test.
    But THAT is not.
    THIS is also a test.</t>
    

    Output:

    <t>THAT is a test.<br />But THAT is not.<br />THAT is also a test.</t>
    

    EDIT: Fully parameterized solution.

    <stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
        <param name="pMap">
            <s t="&#xA;" xmlns=""><br/></s>
            <s t="THIS" xmlns="">THAT</s>
        </param>
        <template match="node()|@*">
            <copy>
                <apply-templates select="node()|@*"/>
            </copy>
        </template>
        <template match="text()" name="replace">
            <param name="pString" select="string()"/>
            <param name="pSearches"
                       select="document('')/*/*[@name='pMap']/s"/>
            <param name="vMatch" select="$pSearches[contains($pString,@t)][1]"/>
            <choose>
                <when test="$vMatch">
                    <call-template name="replace">
                        <with-param
                             name="pString"
                             select="substring-before($pString, $vMatch/@t)"/>
                    </call-template>
                    <copy-of select="$vMatch/node()"/>
                    <call-template name="replace">
                        <with-param
                             name="pString"
                             select="substring-after($pString, $vMatch/@t)"/>
                    </call-template>
                </when>
                <otherwise>
                    <value-of select="$pString"/>
                </otherwise>
            </choose>
        </template>
    </stylesheet>
    

    Output:

    <t>THAT is a test.<br/>But THAT is not.<br/>THAT is also a test.</t>
    

    Note: There is a problem when using inline data in XML 1.0: you can't undeclare a prefixed namespace declaration, as you can in XML 1.1. The workaround is an uncommon but valid notation: declare the XSLT namespace as the default namespace, then reset it with xmlns="" on the inline data elements.
