I have a question for the clever people of the SO community.
Below is a snippet of XML generated by the Symphony CMS.
<news>
<entry>
<title>Lorem Ipsum</title>
<body>
<p><strong>Lorem Ipsum</strong></p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna.</p>
</body>
</entry>
</news>
What I need to do is take a portion of the <body> element, based on a specified length, for display in the blog style of:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem... more
...where more is a link to the full news item. I know I can select specific paragraphs, and I also know I can use the substring function to return a specified number of characters. However, I need to preserve the formatting of the text, i.e. the HTML tags within the <body> element.
I realise this raises issues of tag closure but there must surely be a way. Hopefully someone more experienced with XSLT can shed some light on this issue.
Here's my version. I've tested it against your XML sample and it works.
To invoke it, use <xsl:apply-templates select="path/to/body/*" mode="truncate"/>.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<!-- limit: the truncation limit -->
<xsl:variable name="limit" select="250"/>
<!-- t: Total number of characters in the set -->
<xsl:variable name="t" select="string-length(normalize-space(//body))"/>
<xsl:template match="*" mode="truncate">
<xsl:variable name="preceding-strings">
<xsl:copy-of select="preceding::text()[ancestor::body]"/>
</xsl:variable>
<!-- p: number of characters up to the current node -->
<xsl:variable name="p" select="string-length(normalize-space($preceding-strings))"/>
<xsl:if test="$p &lt; $limit">
<xsl:element name="{name()}">
<xsl:apply-templates select="@*" mode="truncate"/>
<xsl:apply-templates mode="truncate"/>
</xsl:element>
</xsl:if>
</xsl:template>
<xsl:template match="text()" mode="truncate">
<xsl:variable name="preceding-strings">
<xsl:copy-of select="preceding::text()[ancestor::body]"/>
</xsl:variable>
<!-- p: number of characters up to the current node -->
<xsl:variable name="p" select="string-length(normalize-space($preceding-strings))"/>
<!-- c: number of characters including current node -->
<xsl:variable name="c" select="$p + string-length(.)"/>
<xsl:choose>
<xsl:when test="$limit &lt;= $c">
<xsl:value-of select="substring(., 1, ($limit - $p))"/>
<xsl:text>…</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="@*" mode="truncate">
<xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Here is a complete XSLT 1.0 transformation that solves exactly the problem.
This XSLT transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common"
xmlns:f="http://fxsl.sf.net/"
xmlns:myAdd="f:myAdd"
xmlns:myParam="f:myParam"
exclude-result-prefixes="ext f myAdd myParam"
>
<xsl:import href="scanl.xsl"/>
<!-- -->
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- -->
<myAdd:myAdd/>
<myParam:myParam>0</myParam:myParam>
<!-- -->
<xsl:param name="pTruncateLength" select="772"/>
<!-- -->
<xsl:variable name="vFun" select="document('')/*/myAdd:*[1]"/>
<xsl:variable name="vZero" select="document('')/*/myParam:*[1]"/>
<!-- -->
<xsl:variable name="vrtfScanResults">
<xsl:call-template name="scanl">
<xsl:with-param name="pFun" select="$vFun"/>
<xsl:with-param name="pQ0" select="$vZero" />
<xsl:with-param name="pList" select="/*/*/body//text()"/>
</xsl:call-template>
</xsl:variable>
<!-- -->
<xsl:variable name="vScanResults"
select="ext:node-set($vrtfScanResults)"/>
<xsl:variable name="vindNode" select=
"count($vScanResults/*[. > $pTruncateLength][1]
/preceding-sibling::*)"/>
<!-- -->
<xsl:variable name="vrtfTruncInfo">
<xsl:for-each select="/*/*/body//text()">
<!-- -->
<xsl:variable name="vPos" select="position()"/>
<tNode id="{generate-id()}">
<xsl:attribute name="preserve">
<xsl:if test="$vPos &lt; $vindNode">
<xsl:value-of select="string-length(.)"/>
</xsl:if>
<xsl:if test="$vPos > $vindNode">
<xsl:value-of select="0"/>
</xsl:if>
<xsl:if test="$vPos = $vindNode">
<xsl:value-of select=
"$vScanResults/*[$vindNode+1]
-
$pTruncateLength"/>
</xsl:if>
</xsl:attribute>
</tNode>
</xsl:for-each>
</xsl:variable>
<!-- -->
<xsl:variable name="vTruncInfo" select="ext:node-set($vrtfTruncInfo)"/>
<!-- -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- -->
<xsl:template match="text()[ancestor::body]">
<xsl:variable name="vAllowedLength"
select="$vTruncInfo/*[@id = generate-id(current())]/@preserve"
/>
<!-- -->
<xsl:value-of select="substring(.,1,$vAllowedLength)"/>
<xsl:if test="string-length(.) > $vAllowedLength
and
$vAllowedLength > 0
">
<strong> ...more</strong>
</xsl:if>
</xsl:template>
<!-- -->
<xsl:template match="myAdd:*" mode="f:FXSL">
<xsl:param name="pArg1"/>
<xsl:param name="pArg2"/>
<xsl:value-of select="$pArg1 + string-length($pArg2)"/>
</xsl:template>
</xsl:stylesheet>
when applied on the original source XML document:
<news>
<entry>
<title>Lorem Ipsum</title>
<body>
<p>
<strong>Lorem Ipsum</strong>
</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna.</p>
<p>This text should not be displayed</p>
</body>
</entry>
</news>
produces the wanted result:
<news>
<entry>
<title>Lorem Ipsum</title>
<body>
<p>
<strong>Lorem Ipsum</strong>
</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
<p>Lorem <strong> ...more</strong>
</p>
<p/>
</body>
</entry>
</news>
Do note the following:
- The scanl stylesheet from the FXSL library is imported. This template is commonly used to accumulate data from processing a list of items. The function (the template matching myAdd:*) that does the actual processing is passed as a parameter to the scanl template. The other parameter that must be passed to it is the "initial" value for the processing, which is what is returned if the passed list of items is empty.
- The global parameter $pTruncateLength holds the maximum string length beyond which the text must be truncated.
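To make the scanl idea concrete, here is a minimal Python sketch of the same accumulation. It assumes scanl behaves like the usual functional-programming scanl (a list of running totals that starts with the initial value); the names running_totals and first_overflow_index are made up for illustration and are not part of FXSL.

```python
from itertools import accumulate


def running_totals(texts):
    """scanl-style accumulation: [0, len(t1), len(t1) + len(t2), ...]."""
    # initial=0 plays the role of the "initial" value passed to scanl
    # (requires Python 3.8+).
    return list(accumulate((len(t) for t in texts), initial=0))


def first_overflow_index(texts, limit):
    """Index of the first text whose running total exceeds limit, else None."""
    totals = running_totals(texts)
    for i in range(len(texts)):
        # totals[i + 1] is the total length up to and including texts[i].
        if totals[i + 1] > limit:
            return i
    return None


# Hypothetical text nodes, in document order.
texts = ["Lorem Ipsum", "dolor sit amet", "consectetur"]
totals = running_totals(texts)
cut_at = first_overflow_index(texts, 20)
```

Every text node before the returned index is kept whole, the node at that index is the one that gets cut, and everything after it is dropped.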
What you are asking is an XSLT ellipsis generator.
Maybe this XSLT 1.0 template will give you some ideas:
Here is the main gist of it:
<xsl:template match="text()" mode="label">
<xsl:param name="self-x"/>
<xsl:param name="self-y"/>
<xsl:variable name="text" select="normalize-space(.)"/>
<!-- a quick and dirty way to avoid problems with line breaks -->
<!-- replace the select attribute with this call
if you want to use a fancier way to escape whitespace
characters:
<xsl:call-template name="escape-ws">
<xsl:with-param name="text" select="." />
</xsl:call-template>
-->
<use xlink:href="#text-box" transform="translate({$self-x}
{$self-y})"/>
<!-- text nodes are marked with a little box -->
<text x="{$self-x + $writing-bump-over}"
y="{$self-y - $writing-bump-up}"
style="{$text-font-style}; stroke:none; fill:{$text-color}">
<xsl:text>"</xsl:text>
<xsl:value-of select="substring($text,1,$max-text-length)"/>
<!-- truncate the text node to $max-text-length -->
<xsl:if test="string-length($text) > $max-text-length">
<!-- add an ellipsis if necessary -->
<xsl:text>...</xsl:text>
</xsl:if>
<xsl:text>"</xsl:text>
</text>
</xsl:template>
Note:
- you will need to replace the ellipsis by a link, but the main idea is there.
- this is only a small extract of the whole script
- you may not need everything in it: if you do need the "<use xlink:href="..." element, you have to declare the xlink namespace
After much hacking, I came to this solution:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!--
Author: Neil Albrock
Version: 1.0
Description: Truncate by a character limit and retain HTML content.
Usage:
<xsl:call-template name="truncate">
<xsl:with-param name="data" select="path/to/your/body" />
<xsl:with-param name="length" select="250" />
<xsl:with-param name="link" select="'href'" />
</xsl:call-template>
-->
<xsl:template name="truncate">
<!-- The node set to be worked on. -->
<xsl:param name="data"/>
<!-- The desired truncate length. Default to length of data. -->
<xsl:param name="length" select="string-length($data)"/>
<!-- More link -->
<xsl:param name="link"/>
<xsl:choose>
<!-- Return whole data if it's within length. -->
<xsl:when test="string-length($data) &lt;= $length">
<xsl:copy-of select="$data" />
</xsl:when>
<!-- Truncate to desired length. -->
<xsl:otherwise>
<xsl:for-each select="$data/*">
<xsl:variable name="this-node" select="string-length(.)"/>
<xsl:variable name="preceding-nodes">
<xsl:copy-of select="preceding-sibling::*"/>
</xsl:variable>
<xsl:variable name="node-sum" select="string-length(normalize-space($preceding-nodes))"/>
<xsl:variable name="limit" select="$node-sum + $this-node"/>
<xsl:choose>
<xsl:when test="$limit > $length and $node-sum &lt;= $length">
<p>
<xsl:value-of select="substring(.,1,$length - $node-sum)"/>
<xsl:text>…</xsl:text>
<a>
<xsl:attribute name="href">
<xsl:value-of select="$link"/>
</xsl:attribute>
<xsl:text>more</xsl:text>
</a>
</p>
</xsl:when>
<xsl:when test="$limit &lt;= $length">
<xsl:copy-of select="."/>
</xsl:when>
<xsl:otherwise/>
</xsl:choose>
</xsl:for-each>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
I would use the solution by Chaotic Pattern though, it's more elegant ;-)
This will be an episode in pain using XSLT. I would strongly recommend using a scripting language like Perl/Python to attempt this.
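For comparison, here is a rough sketch of what the scripting route could look like in Python, using only the standard library's xml.etree.ElementTree. The truncate function and its parameters are my own illustration, not an existing API. It counts every character of text, including whitespace between tags, and it appends a plain "..." rather than a "more" link.

```python
import xml.etree.ElementTree as ET


def truncate(root, limit, ellipsis="..."):
    """Return a copy of root that keeps at most `limit` characters of text.

    Elements are rebuilt as they are visited, so every tag that was opened
    before the cut-off point is still closed properly in the result.
    """
    remaining = limit
    done = False

    def take(text):
        # Hand out text until the budget is exhausted, then nothing more.
        nonlocal remaining, done
        if done or not text:
            return None
        if len(text) > remaining:
            done = True
            return text[:remaining] + ellipsis
        remaining -= len(text)
        return text

    def copy(node):
        new = ET.Element(node.tag, dict(node.attrib))
        new.text = take(node.text)
        for child in node:
            if done:
                break  # drop everything after the cut-off point
            new_child = copy(child)
            new.append(new_child)
            new_child.tail = take(child.tail)
        return new

    return copy(root)


src = ("<body><p><strong>Lorem Ipsum</strong></p>"
       "<p>Lorem ipsum dolor sit amet</p><p>hidden</p></body>")
result = ET.tostring(truncate(ET.fromstring(src), 16), encoding="unicode")
print(result)
```

Because the tree is copied node by node, tag closure takes care of itself; the only state to track is how many characters are still allowed.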
Source: https://stackoverflow.com/questions/532147/truncate-xml-with-xslt