Thursday, November 09, 2006

Using collection() and saxon:discard-document() to create reports

You can process whole directories of XML with the collection() function, and keep memory usage constant by using the Saxon extension function saxon:discard-document():


<xsl:for-each select="for $x in collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore') return saxon:discard-document($x)">


You have to be careful that Saxon doesn't optimize out the call to saxon:discard-document(). This basic outer xsl:for-each works well, and it has become my boilerplate whenever I start a new report.
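For context, here is a minimal sketch of how that outer loop might sit inside a complete stylesheet. It assumes XSLT 2.0, the saxon prefix bound to http://saxon.sf.net/, and a Saxon release that provides the extension function. The report and file element names, and the idea of counting href attributes, are placeholders for illustration only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:output method="xml" indent="yes"/>

  <!-- Named template, meant to be invoked as the initial template (no source document needed) -->
  <xsl:template name="main">
    <report>
      <!-- Each document is handed to the loop body and then dropped from Saxon's document pool -->
      <xsl:for-each select="for $x in collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore') return saxon:discard-document($x)">
        <!-- Hypothetical per-file summary: emit only small values, never copies of whole trees -->
        <file uri="{base-uri(.)}" links="{count(//@href)}"/>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

The point of the design is that the loop body only writes a small summary for each file, so nothing keeps a reference to a document once its iteration has finished.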

This technique allows you to do things that would otherwise not be feasible with XSLT, and that would take longer to write in another language. For example, you can find, group and sort all the links in your collection of XML files. Coding the XSLT takes minutes, and running it takes time proportional to the size of your dataset, but system memory is no longer the limiting factor.
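A sketch of the link report mentioned above might look like the following. It assumes that links appear as href attributes, which will differ between vocabularies, and the links and link result elements are made-up names. The link targets are copied out as strings so that no nodes from the discarded documents are held on to.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="xs saxon">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="main">
    <!-- Pass over every file, keeping only the string value of each href (an assumed link attribute) -->
    <xsl:variable name="hrefs" as="xs:string*">
      <xsl:for-each select="for $x in collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore') return saxon:discard-document($x)">
        <xsl:sequence select="//@href/string(.)"/>
      </xsl:for-each>
    </xsl:variable>

    <!-- Group identical targets, sort by target, and report how often each one occurs -->
    <links>
      <xsl:for-each-group select="$hrefs" group-by=".">
        <xsl:sort select="current-grouping-key()"/>
        <link href="{current-grouping-key()}" count="{count(current-group())}"/>
      </xsl:for-each-group>
    </links>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch memory grows with the number of links collected rather than with the size of the source documents, which is usually a far smaller figure.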

1 comment:

Lars said...

Very handy! Thanks. I was not familiar with collection().

I suppose this can be used to get directories of files even if the files are not XML documents?