XSLT: normalize non-breaking whitespace characters

长情又很酷 2021-01-20 18:09

I have a sample XML file like this:


    

text1 text2

text1 text2

text1 text2   

1 Answer
  • 2021-01-20 18:23

    You could do:

    <xsl:value-of select="normalize-space(translate(., '&#160;', ' '))"/>
    

    This will work in XSLT 1.0 and 2.0 alike.
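
    The translate() step is needed because normalize-space() only treats space, tab, carriage return and line feed as whitespace, so a non-breaking space has to be mapped to a plain space first. As a minimal sketch of where the expression might sit in a complete stylesheet: the identity template and the choice to match every text node are assumptions for illustration, not something taken from the question.

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Identity template (assumed here): copy everything through unchanged. -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Text nodes: map NBSP to a plain space, then trim and collapse. -->
      <xsl:template match="text()">
        <xsl:value-of select="normalize-space(translate(., '&#160;', ' '))"/>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Because normalize-space() also trims each text node, text next to inline child elements would lose its boundary spaces; narrowing the match pattern to the elements you actually want to clean avoids that.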


    In XSLT 2.0, you could also use a regular expression, for example:

    <xsl:value-of select="replace(., '[\t\p{Zs}]', '')"/>
    

    will remove the horizontal tab character as well as any character in the Unicode Space_Separator (Zs) category, which includes not only the space and non-breaking space characters but also other space characters. Note that this deletes the matched characters, ordinary spaces included, rather than collapsing them. Documentation is hard to find, but I believe this is currently the complete list (extracted from http://www.unicode.org/Public/UNIDATA/UnicodeData.txt):

    &#x0020; SPACE
    &#x00A0; NO-BREAK SPACE
    &#x1680; OGHAM SPACE MARK
    &#x2000; EN QUAD
    &#x2001; EM QUAD
    &#x2002; EN SPACE
    &#x2003; EM SPACE
    &#x2004; THREE-PER-EM SPACE
    &#x2005; FOUR-PER-EM SPACE
    &#x2006; SIX-PER-EM SPACE
    &#x2007; FIGURE SPACE
    &#x2008; PUNCTUATION SPACE
    &#x2009; THIN SPACE
    &#x200A; HAIR SPACE
    &#x202F; NARROW NO-BREAK SPACE
    &#x205F; MEDIUM MATHEMATICAL SPACE
    &#x3000; IDEOGRAPHIC SPACE
    
    &#x10CB0; OLD HUNGARIAN CAPITAL LETTER EZS
    &#x10CF0; OLD HUNGARIAN SMALL LETTER EZS
    &#x16F36; MIAO LETTER ZSHA
    &#x16F3C; MIAO LETTER ZSA
    &#x16F3E; MIAO LETTER ZZSA
    &#x16F41; MIAO LETTER ZZSYA
    

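    However, testing with Saxon 9.5 shows that the last 6 characters are not matched: http://xsltransform.net/ncntCSo. That is the correct behaviour: these six are letters (general category Lu, Ll or Lo), not Zs. They only turn up in a text search of UnicodeData.txt because their names contain "ZS", so the first 17 entries are the actual Zs list.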
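
    If the aim is to normalize the spacing rather than delete it, the two approaches can be combined. A sketch, assuming an XSLT 2.0 processor: every Zs character is first replaced with a plain space, and normalize-space() then trims and collapses the result (it also takes care of tabs and line breaks). The identity template and the text() match are assumptions for illustration only.

    ```xml
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Identity template (assumed here): copy everything through unchanged. -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Text nodes: turn any Zs character into U+0020, then trim and collapse. -->
      <xsl:template match="text()">
        <xsl:value-of select="normalize-space(replace(., '\p{Zs}', ' '))"/>
      </xsl:template>

    </xsl:stylesheet>
    ```

    With this variant, words separated by a non-breaking space (or any other Zs character) stay separated by a single ordinary space instead of being run together.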
