A requirement that comes up regularly is generating indexes and reports for the dataset. This is nice and simple with XSLT 2.0's grouping, but it requires the whole dataset to be in memory unless you use saxon:discard-document(). It can also be quite slow, if only because you have to read gigabytes from disk and parse the whole of each and every XML input file just to get the snippet you're interested in (such as the title or, say, all of the elements).
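For comparison, a pure XSLT 2.0 version that reads the files straight from disk might look roughly like the sketch below. It is Saxon-specific. The `?select=*.xml` collection URI is Saxon's convention for directory collections, and saxon:discard-document() is a Saxon extension that is not available in every edition. The directory path is hypothetical. The titles are turned into strings straight away so that no node references keep the discarded document trees alive.

```xslt
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon"
    version="2.0">

  <xsl:output indent="yes"/>

  <!-- Hypothetical dataset location; ?select=*.xml is Saxon's
       directory-collection syntax, not standard XSLT 2.0 -->
  <xsl:param name="data-dir" select="'file:///data/myproject/?select=*.xml'"/>

  <xsl:template match="/">
    <div>
      <!-- Every file is still read and parsed in full just to get its
           title; saxon:discard-document() lets Saxon release each tree
           once the title string has been extracted -->
      <xsl:for-each-group
          select="for $d in collection($data-dir)
                  return saxon:discard-document($d)/doc/head/title/string()"
          group-by="substring(., 1, 1)">
        <xsl:sort select="current-grouping-key()"/>
        <div>
          <div><xsl:value-of select="current-grouping-key()"/></div>
          <xsl:for-each select="current-group()">
            <xsl:sort select="."/>
            <div><xsl:value-of select="."/></div>
          </xsl:for-each>
        </div>
      </xsl:for-each-group>
    </div>
  </xsl:template>
</xsl:stylesheet>
```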
XQuery run against eXist, on the other hand, doesn't suffer from the size of the dataset, but XQuery 1.0 lacks XSLT 2.0's grouping features. It's perfectly possible to recreate the grouping in XQuery (although it's a bit involved; you could say "a bit XSLT 1.0"), but it's just so much nicer in XSLT 2.0. So, to get the best of both, you can use eXist's fantastic REST-style interface to select the parts of the XML you're interested in, and then use XSLT 2.0's xsl:for-each-group to arrange the results.
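The sketch below shows roughly what recreating the grouping involves. It uses the distinct-values() pattern that XQuery 1.0 pushes you into, written as an XSLT 2.0 stylesheet so it can be compared directly with xsl:for-each-group. An XQuery version has the same shape: a FLWOR over distinct-values() with an inner filter. The sketch assumes, purely for illustration, that the titles are the `<title>` elements of the source document.

```xslt
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <!-- Assumption: the titles to index are the <title> elements of the input -->
    <xsl:variable name="titles" select="//title"/>
    <div>
      <!-- Step 1: work out the distinct grouping keys by hand -->
      <xsl:for-each select="distinct-values(for $t in $titles return substring($t, 1, 1))">
        <xsl:sort select="."/>
        <xsl:variable name="letter" select="."/>
        <div>
          <div><xsl:value-of select="$letter"/></div>
          <!-- Step 2: rescan the whole population once per key -->
          <xsl:for-each select="$titles[substring(., 1, 1) = $letter]">
            <xsl:sort select="."/>
            <div><xsl:value-of select="."/></div>
          </xsl:for-each>
        </div>
      </xsl:for-each>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

Compared with xsl:for-each-group, the keys have to be computed separately. The population is also filtered again for every key.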
In the example stylesheet below I create an index by getting the <title> of each XML document, grouping the titles by their first letter, and then sorting by the title itself. eXist fetches the <title> elements; XSLT 2.0 does the grouping and sorting.
I have an instance of eXist running on my local machine, fully populated with the XML dataset. The function fn:eXist() takes the collection I'm interested in and the XQuery to execute against that collection. It constructs the corresponding URI for the REST interface and calls doc() with that URI. For the call in the template below, that URI is http://localhost:8080/exist/rest/db/mycomp/myproject?_query=/doc/head/title&_start=1&_howmany=-1. The result is in eXist's own XML format: an exist:result element wrapping each item the query returned, which I then group using xsl:for-each-group. It's worth noting the -1 value for the _howmany parameter on the query. Without it, _howmany defaults to 10 and you only get the first 10 results.
<xsl:stylesheet
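    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fn="fn"
    xmlns:exist="http://exist.sourceforge.net/NS/exist"
    exclude-result-prefixes="fn exist"
    version="2.0">

  <xsl:output indent="yes" />

  <!-- Base URI of eXist's REST interface -->
  <xsl:param name="db-uri" select="'http://localhost:8080/exist/rest'" />

  <!-- Runs $query against $collection through the REST interface and returns
       the result items. The & separators must be written as &amp; to keep the
       stylesheet well-formed, and /* (rather than /node()) skips the whitespace
       text nodes between the items that eXist wraps in exist:result. -->
  <xsl:function name="fn:eXist">
    <xsl:param name="collection" />
    <xsl:param name="query" />
    <xsl:sequence select="doc(concat($db-uri, $collection, '?_query=', $query, '&amp;_start=1&amp;_howmany=-1'))/exist:result/*" />
  </xsl:function>

  <xsl:template match="/">
    <div>
      <!-- One group per initial letter, groups in alphabetical order -->
      <xsl:for-each-group select="fn:eXist('/db/mycomp/myproject', '/doc/head/title')" group-by="substring(., 1, 1)">
        <xsl:sort select="current-grouping-key()" />
        <div>
          <div><xsl:value-of select="current-grouping-key()" /></div>
          <!-- Titles within each group, sorted alphabetically -->
          <xsl:for-each select="current-group()">
            <xsl:sort select="." />
            <div><xsl:value-of select="." /></div>
          </xsl:for-each>
        </div>
      </xsl:for-each-group>
    </div>
  </xsl:template>

</xsl:stylesheet>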
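The query may itself contain characters that mean something in a URI, such as spaces, quotes, brackets, or an & inside a predicate. In that case it's safer to percent-encode the query with encode-for-uri() before adding it to the query string. The sketch below is a minimal variant of fn:eXist() along those lines, with parameter and return types added. The collection and database URI are the ones used above. The starts-with() query is just a hypothetical example.

```xslt
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="fn"
    xmlns:exist="http://exist.sourceforge.net/NS/exist"
    exclude-result-prefixes="xs fn exist"
    version="2.0">

  <xsl:output indent="yes"/>

  <xsl:param name="db-uri" select="'http://localhost:8080/exist/rest'"/>

  <!-- Same idea as fn:eXist() above, but the query is percent-encoded so
       that predicates, quotes and spaces survive inside the URI -->
  <xsl:function name="fn:eXist" as="element()*">
    <xsl:param name="collection" as="xs:string"/>
    <xsl:param name="query" as="xs:string"/>
    <xsl:variable name="uri" as="xs:string"
        select="concat($db-uri, $collection,
                       '?_query=', encode-for-uri($query),
                       '&amp;_start=1&amp;_howmany=-1')"/>
    <xsl:sequence select="doc($uri)/exist:result/*"/>
  </xsl:function>

  <xsl:template match="/">
    <div>
      <!-- Hypothetical query: only the titles beginning with "A" -->
      <xsl:for-each select="fn:eXist('/db/mycomp/myproject', '/doc/head/title[starts-with(., &quot;A&quot;)]')">
        <xsl:sort select="."/>
        <div><xsl:value-of select="."/></div>
      </xsl:for-each>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

encode-for-uri() also escapes the slashes in the path expression. eXist decodes the _query parameter before running it, so the query arrives intact.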
This article is repeated here