Friday, June 29, 2007

Kernow 1.5.0.9 [beta] available

I've made a new beta version of Kernow available:

  • XSLT directory transforms are now multi-threaded, with the size of the thread pool configurable in the options. For systems with multi-core processors and enough available memory this can really improve directory transform time.
  • Single File / Standalone transforms now have a "Performance Testing" feature where the transforms are run repeatedly and the average time displayed.
  • The panels can now be resized for the XQuery Sandbox tab. The fixed size XQuery Sandbox made it pretty useless before - at least now you can see more than four lines!
  • Other minor bug fixes

There's loads still to do, but I'm trying to stick to the "release early, release often" policy.

Monday, June 18, 2007

UK Postcode Googlemaps mashup

Everyone else has done a mashup so I thought I should do one too. My mashup combines GoogleMaps with StreetMap.co.uk's GridConvert to get the nearest postcode for a given longitude and latitude. I had to use some server side PHP to get around cross domain scripting issues and to scrape the postcode from result HTML, but other than that it's all through the GoogleMaps API.

Friday, June 15, 2007

Sukoku Solver - much improved.

I've just uploaded the latest version of my Sudoku Solver.

I wrote the initial version of this in March 2006, and have since been in competition with Dimitre Novatchev to have the fastest solution. Dimitre's held that title for while, but with this version I think I may have won it back.

The additions here are Naked Tuple discovery in the first phase, and then repeatedly re-ordering the cells by least-number-of-possible-values as each cell is populated in the second phase. Sounds obvious but its something I missed first time around.

On my machine this latest version solves the vast majority of puzzles (including AI Escargot) in under a second, with the occasional one taking just under two seconds. The previous version would take up to a minute on some puzzles, so I'm pleased with improvement.

Hopefully Dimitre will do some comparisons using his latest version, and then tell me the good news :)

My new homepage

I'd like to draw attention to my new homepage: andrewjwelch.com. I've created this as a better place to store the code samples that were posted in this blog, and to link to my open source projects.

I'll try to keep news and opinions to this blog, and code samples on the homepage.

Friday, June 08, 2007

A long drawn out redundancy...

I've been informed that I will be made redundant... in October! That's a long 4 1/2 months away. In the mean time I'm meant to be performing maintenance and handover tasks (4 months of handover ?!?!) while the redundancy + retention carrot is tentatively dangled to keep us here until then...

So if you can see that far ahead and need a Java/XML/XSLT bod to fill a high paying contract, let me know :)

Friday, March 30, 2007

Er, the real "hardest Sudoku"

Following the comment from Malcom below, I've run my Sudoku solver against the real "AL Escargot"...

Using Saxon 8.9.0.3b from the command line with the -3 option (to discount java startup time) the average time across three runs is 2213ms... which isn't bad at all.

This is when I just solve the puzzle starting at the top-left - if I turn on the ordering feature where empty cells are processing by least number of possibilities first, then the time dramatically increases to ~58 seconds, which shows why Sudoku is NP-complete.

If you're interested the solution it produced is:


1, 6, 2,   8, 5, 7,   4, 9, 3
5, 3, 4,   1, 2, 9,   6, 7, 8
7, 8, 9,   6, 4, 3,   5, 2, 1

4, 7, 5,   3, 1, 2,   9, 8, 6
9, 1, 3,   5, 8, 6,   7, 4, 2
6, 2, 8,   7, 9, 4,   1, 3, 5

3, 5, 6,   4, 7, 8,   2, 1, 9,
2, 4, 1,   9, 3, 5,   8, 6, 7,
8, 9, 7,   2, 6, 1,   3, 5, 4,

Thursday, March 29, 2007

The Worlds Hardest Sudoku

A mathematician claims to have penned the world's hardest Sudoku puzzle.

Running it through my Sudoku Solver it took ~24 seconds on my work machine (most puzzles are sub 1 second on the same machine) so I would think it must be reasonably hard.

Friday, February 23, 2007

A CSV to XML converter in XSLT 2.0

* Note: This transform has been updated http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html


I wrote a rudimentary csv to XML converter a while back which broke when the csv contained quoted values, eg foo, "foo, bar", bar Dealing with these quotes is surprisingly hard, especially when you take into account quotes are escaped by doubling them.

I raised it on xsl-list, and Abel Braaksma came up with a genious solution - the technique is to use both sides of analyze-string - read his post for the best explanation.

To keep the transform generic I've used an attribute instead of an element for the column names to cope with names that aren't valid QNames (for example ones that contain a space) - for my own use would add a function to convert names to valid QNames and then change <elem name="{...}"> to <xsl:element name="{fn:getQName(...)}"> as it generates nicer XML. I've also modified the non-matching-substring side of analyze-string to only return tokens that contain values.

So this sample input:

Col 1, Col 2, Col 3
foo, "foo,bar", "foo:""bar"""


...creates this output:

<root>
<row>
<elem name="Col 1">foo</elem>
<elem name="Col 2">foo,bar</elem>
<elem name="Col 3">foo:"bar"</elem>
</row>
</root>

Here's the finished transform:


<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="fn"
exclude-result-prefixes="xs fn">

<xsl:output indent="yes" encoding="US-ASCII"/>

<xsl:param name="pathToCSV" select="'file:///c:/csv.csv'"/>

<xsl:function name="fn:getTokens" as="xs:string+">
<xsl:param name="str" as="xs:string"/>
<xsl:analyze-string regex='("[^"]*")+' select="$str">
<xsl:matching-substring>
<xsl:sequence select='replace(., "^""|""$|("")""", "$1")'/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:for-each select="tokenize(., '\s*,\s*')">
<xsl:sequence select="."/>
</xsl:for-each>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:function>

<xsl:template match="/" name="main">
<xsl:choose>
<xsl:when test="unparsed-text-available($pathToCSV)">
<xsl:variable name="csv" select="unparsed-text($pathToCSV)"/>
<xsl:variable name="lines" select="tokenize($csv, '&#xa;')" as="xs:string+"/>
<xsl:variable name="elemNames" select="fn:getTokens($lines[1])" as="xs:string+"/>
<root>
<xsl:for-each select="$lines[position() > 1]">
<row>
<xsl:variable name="lineItems" select="fn:getTokens(.)" as="xs:string+"/>

<xsl:for-each select="$elemNames">
<xsl:variable name="pos" select="position()"/>
<elem name="{.}">
<xsl:value-of select="$lineItems[$pos]"/>
</elem>
</xsl:for-each>
</row>
</xsl:for-each>
</root>
</xsl:when>
<xsl:otherwise>
<xsl:text>Cannot locate : </xsl:text>
<xsl:value-of select="$pathToCSV"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

</xsl:stylesheet>

Monday, February 05, 2007

A Soap Extension Function

Recently I had to write a client to retrieve XML from a web service that required authentication. All this meant was that the credentials needed to be in the soap header, eg:

<soapenv:Envelope>
<soapenv:Header>
<c:AuthHeader>
<c:Username>foo</c:Username>
<c:Password>bar</c:Password>
</c:AuthHeader>
</soapenv:Header>
<soapenv:Body>
...
Anyone that's coded any Java web service clients will know how hard it is to get the methods generated for you to add this soap header to your calls... its a nightmare. It varies between generation tool and the specification level you're coding to. What makes it worse is the whole reason you go through this pain is to send XML down the wire and get XML back. It would be much nicer if you could just make the call in XSLT....

I've written the following extension function to do just that. It accepts a soap request and an endpoint, makes the request and returns the soap response. It leaves the "complexity" of creating the request and processing to response to the XSLT, where it's pretty straightforward. Here's the java:

package net.sf.kernow.soapextension;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.ProtocolException;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import net.sf.saxon.om.NodeInfo;

/**
* Enables the calling of SOAP based web services from XSLT.
* @author Andrew Welch
*/
public class SOAPExtension {

public static String soapRequest(NodeInfo requestXML, String endpoint) {
String result = makeCall(transformToString(requestXML), endpoint);
return result;
}

public static String soapRequest(String requestXML, String endpoint) {
String result = makeCall(requestXML, endpoint);
return result;
}

private static String transformToString(NodeInfo sourceXML) {

StringWriter sw = new StringWriter();

try {
TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
Transformer transformer = tFactory.newTransformer();
transformer.transform(sourceXML, new StreamResult(sw));
} catch (TransformerConfigurationException ex) {
ex.printStackTrace();
} catch (TransformerException ex) {
ex.printStackTrace();
}

return sw.toString();
}

private static String makeCall(String requestXML, String endpoint) {

String SOAPUrl = endpoint;
StringBuffer responseBuf = new StringBuffer();

try {
// Create the connection to the endpoint
URL url = new URL(SOAPUrl);
URLConnection connection = url.openConnection();
HttpURLConnection httpConn = (HttpURLConnection) connection;

byte[] b = requestXML.getBytes("UTF-8");

// Set the appropriate HTTP parameters.
httpConn.setRequestProperty( "Content-Length", String.valueOf(b.length));
httpConn.setRequestProperty("Content-Type","text/xml; charset=utf-8");

httpConn.setRequestMethod("POST");
httpConn.setDoOutput(true);
httpConn.setDoInput(true);

// Send the the request
OutputStream out = httpConn.getOutputStream();
out.write(b);
out.close();

// Read the response and write it to the response buffer.
InputStreamReader isr = new InputStreamReader(httpConn.getInputStream());
BufferedReader in = new BufferedReader(isr);

String line;
do {
line = in.readLine();
if (line != null) {
responseBuf.append(line);
}
} while (line != null);

in.close();

} catch (ProtocolException ex) {
ex.printStackTrace();
} catch (MalformedURLException ex) {
ex.printStackTrace();
} catch (UnsupportedEncodingException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
}

return responseBuf.toString();
}
}

I've put the extension function in the "net.sf.kernow.soapextension" package and called it SOAPExtension (it will be in the 1.5 version of Kernow when I eventually release it). Now the XSLT to make and process the requests:

<xsl:stylesheet version="2.0"
xmlns:soap="net.sf.kernow.soapextension.SOAPExtension"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://saxon.sf.net/"
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">

<xsl:param name="endpoint" select="'http://somewebservice'"/>

<xsl:variable name="request">
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ws="http://somewebservice/">
<soapenv:Body>
<ws:getSomething>
<urn>name123</urn>
</ws:getSomething>
</soapenv:Body>
</soapenv:Envelope>
</xsl:variable>

<xsl:template match="/" name="main">
<xsl:apply-templates select="saxon:parse(soap:soapRequest($request, $endpoint))" mode="process-SOAP-message"/>
</xsl:template>

<xsl:template match="/" mode="process-SOAP-message">
<xsl:apply-templates select="saxon:parse(soapenv:Envelope/soapenv:Body/*/return/node())" mode="process-response-payload"/>
</xsl:template>

<xsl:template match="/" mode="process-response-payload">
<xsl:apply-templates/>
</xsl:template>


</xsl:stylesheet>
There are a couple of thing to notice - firstly you call soapRequest() with the message as a document node, and the endpoint as a string. The extension will also accept the message as a string, but that would just request the extra step of saxon:serialize($request).

Secondly you need to use saxon:parse to parse the response string into XML. Applying templates to saxon:parse() will search for the root matching template, so to avoid endless loops different modes are used to separate the various root matching templates.

The template in the mode "process-SOAP-message" deals with processing the soap response, so the root element here would be , so in order to get to the actual payload (and to treat it as a document in its own right) I use:

saxon:parse(soapenv:Envelope/soapenv:Body/*/return/node())

...and a third root matching template in the mode "process-response-payload" (the actual path may vary for your payload). In this template you deal with actual response, so you can apply-templates to it, write it to disk etc

And that's it, it really is as simple as that. The S in SOAP can mean Simple :)

Sunday, February 04, 2007

Testing XSLT - CheckXML

It's well known that XSLT isn't easily unit-testable, and there isn't currently a standard way of testing transforms for correctness. I've long thought that the only way to do this is to run the transform using a given input and check the result, and through that infer correctness in the transform.


I wrote a stylesheet to do this in XSLT 2.0 with heavy use of extensions (to perform the transforms and execute the XPaths) which was nice from an academic standpoint, but it soon became clear that this would be more useful as a Java app runnable from Ant.


This little project grew into a way of checking any XML file (to check a transform it runs the transform first and then checks the result). I'm provisionally calling this "CheckXML" and it's still early days but I think it's got the potential to be something really good.

CheckXML will allow you perform various checks on an XML file - XML Schema, XPath 2.0, XSLT 2.0, XQuery and Relax NG. This allows users to augment schema checks with XPath/XSLT checks to fully check the correctness of an XML file. A sample CheckXML configuration file would be:

<checkXML>
 <xml src="SampleXML.xml">
  <check>SampleXSD.xsd</check>
  <check>count(distinct-values(//@id)) = count(//id)</check>
  <check>SampleXSLT.xslt</check>
 </xml>
</checkXML>

Here the XML file "SampleXML.xml" first has the XML Schema SampleXSD.xsd applied to it, the then the XPath in the second check and finally the XSLT check. The CheckXML app will run each check and look for a result of "true" - any value that isn't true will be reported as a fail. This is the crucial point - it offloads the reponsbility of a creating a useful error to the check writer, and because the check write has the full power of XSLT/XPath/XQuery the error message can be as detailed as necessary (XSD/RNG check will return the error message if the XML isn't valid).

For example, modifying the XPath above to return a helpful message could be like this:

<check>if (count(distinct-values(//@id)) = count(//@id)) then 'true' else concat('The following id is not unique: ', distinct-values(for $x in //@id return $x[count(//@id[. = $x]) > 1]))</check>

This would output "The following id is not unique: 123" if you had two @id's with the value "123". The XPath's can get pretty complicated pretty quickly, which is why most of them should be moved into XSLT files, but it still might be more convenient to just use the XPath in the check config file itself. As the check writer has full control over the return message, it can be as simple or complicated as needed.

The CheckConfig files can have multiple <xml> elements (<transform> elements for checking transforms) with each one having as many <check>'s as required. A CheckSuite will point to multiple CheckConfig files. CheckXML will be callable from Ant, with any fails causing the Ant build to fail (with all details of the fail in the logs). I'm also planning a GUI with a nice green/red progress bar :) and a CheckConfig editor, but that's down the line.

Thursday, November 09, 2006

Using collection() and saxon:discard-document() to create reports

You can process directories of XML using the collection() function, and keep memory usage constant by using the Saxon extension saxon:discard-document()


<xsl:for-each select="for $x in collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore') return saxon:discard-document($x)">


You have to be careful that Saxon doesn't optimize out the call to saxon:discard-document() - this basic outer xsl:for-each works well and has become boilerplate code for whenever I start a new report.

This technique allows you to do things that would otherwise not be feasible with XSLT, and would take longer in another language. For example finding, grouping and sorting all links in your collection of XML files. Coding the XSLT takes minutes and running it takes time proportional to your dataset size, but the restriction of system memory has gone.

Monday, September 04, 2006

Table Normalization in XSLT 2.0

Some time ago I wrote a 1.0 stylesheet that normalizes a table - it's currently available on Dave Pawson's FAQ.

The problem of table normalization is to remove colspans and rowspans from a table placing a copy of the content in each cell.

This table:


+-----------+-----------+
| a | b |
| +-----+-----+
| | c | d |
+-----------+-----+ |
| e | |
+-----+-----+-----+ |
| f | g | h | |
+-----+-----+-----+-----+


Would become this:


+-----+-----+-----+-----+
| a | a | b | b |
+-----+-----+-----+-----+
| a | a | c | d |
+-----+-----+-----+-----+
| e | e | e | d |
+-----+-----+-----+-----+
| f | g | h | d |
+-----+-----+-----+-----+


The 1.0 solution I wrote used a neat recursive template that maintained pointers to the previous and current rows to deal with the tricky rowspans (colspans are the easy part). In rewriting the transform in 2.0 I've simplified it a great deal, but sadly not with any new 2.0 only feature, just with a better algorithm. Here's the transform:


<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">

<xsl:output method="xhtml" indent="yes"
omit-xml-declaration="yes"
encoding="UTF-8" />

<xsl:variable name="table_with_no_colspans">
<xsl:apply-templates mode="colspan"/>
</xsl:variable>

<xsl:variable name="table_with_no_rowspans">
<xsl:for-each select="$table_with_no_colspans">
<xsl:apply-templates mode="rowspan"/>
</xsl:for-each>
</xsl:variable>

<xsl:template match="/">
<xsl:apply-templates select="$table_with_no_rowspans" mode="final"/>
</xsl:template>

<xsl:template match="@*|*" mode="#all">
<xsl:copy>
<xsl:apply-templates select="@*|*" mode="#current"/>
</xsl:copy>
</xsl:template>

<xsl:template match="td" mode="colspan">
<xsl:choose>
<xsl:when test="@colspan">
<xsl:variable name="this" select="." as="element()"/>
<xsl:for-each select="1 to @colspan">
<td>
<xsl:copy-of select="$this/@*[not(name() = 'colspan')][not(name() = 'width')]"/>
<xsl:copy-of select="$this/node()"/>
</td>
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

<xsl:template match="tbody" mode="rowspan">
<xsl:copy>
<xsl:copy-of select="tr[1]" />
<xsl:apply-templates select="tr[2]" mode="rowspan">
<xsl:with-param name="previousRow" select="tr[1]"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>

<xsl:template match="tr" mode="rowspan">
<xsl:param name="previousRow"/>

<xsl:variable name="currentRow" select="."/>

<xsl:variable name="normalizedTDs">
<xsl:for-each select="$previousRow/td">
<xsl:choose>
<xsl:when test="@rowspan > 1">
<xsl:copy>
<xsl:attribute name="rowspan">
<xsl:value-of select="@rowspan - 1" />
</xsl:attribute>

<xsl:copy-of select="@*[not(name() = 'rowspan')]"/>
<xsl:copy-of select="node()"/>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="$currentRow/td[1 + count(current()/preceding-sibling::td[not(@rowspan) or (@rowspan = 1)])]"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:variable>

<xsl:variable name="newRow" as="element(tr)">
<xsl:copy>
<xsl:copy-of select="$currentRow/@*"/>
<xsl:copy-of select="$normalizedTDs"/>
</xsl:copy>
</xsl:variable>

<xsl:copy-of select="$newRow"/>

<xsl:apply-templates select="following-sibling::tr[1]" mode="rowspan">
<xsl:with-param name="previousRow" select="$newRow"/>
</xsl:apply-templates>
</xsl:template>

<xsl:template match="td" mode="final">
<xsl:choose>
<xsl:when test="@rowspan">
<xsl:copy>
<xsl:copy-of select="@*[not(name() = 'rowspan')]"/>
<xsl:copy-of select="node()"/>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

</xsl:stylesheet>

Friday, July 07, 2006

Kernow 1.4 released

I've finally got around to releasing a new version of Kernow.

This version adds a few new features and I've also fixed a few annoying bugs:


  • Added optional caching URI and entity resolvers (including an entity resolver for documents referenced through the collection() function)
  • Added directory validation using SaxonSA, Xerces or JAXP (choice of XSD or RNG where possible).
  • Added the ability to run schema aware XQueries
  • Added default, lax and strict validation for schema aware transforms and queries
  • Added the ability to choose the type of files to process in a directory (eg, xml, xhtml, or well-formed html)
  • Fixed a number of small bugs, such as the progress bar occasionally throwing an NPE.

Wednesday, June 14, 2006

Saxon 8.7.3 released

Mike Kay has released Saxon 8.7.3

Saxon-B: http://saxon.sf.net/
Saxon-SA: http://www.saxonica.com/

"This maintenance release clears 35 documented bugs, and on .NET it adds the
capability to process a DOM Document "in situ" rather than by first copying
it to create a Saxon tree.

(Saxon 8.7.2, in case you missed it, was a rebuild of Saxon-SA on .NET with
no changed functionality.)"

Monday, June 12, 2006

A Caching EntityResolver for the collection() function

XSLT 2.0 allows you to write a "standalone stylesheet", that is, a transform with no input XML. Instead you must provide the name of the template where execution should begin. The stylesheet can then use the document() or doc() functions to process individual files, and the collection() function to process directories of XML. I use the standalone XSLT model a lot where you have multiple input files and a single output file, for example generating a report or creating an index page.

I recently ran into a problem though, where a few thousand of several thousand input files referenced entity files hosted at the w3.org. This only became apparent when the transforms started failing for no reason. It turns out that the network admins where I work had turned off the proxy, so each and every entity file was being fetched from the w3c - where making 500 requests in 10 minutes mean you get a 24 hour ban...

In a standard transform catching and caching requests for entity files is easy - you write a custom EntityResolver and tell the XMLReader to use that. For the collection() function however it's a lot more complicated. In Saxon 8.7.1, the URI used in the collection() function is resolved using the StandardCollectionURIResolver(). You can use your own one of these by calling setCollectionURIResolver() on the Configuration, and then do what you like within that class. To create a CollectionURIResolver() that caches entities files, it's pretty much a case of copying whats in the StandardCollectionURIResolver() and then modifying it to suit your needs.

I've added this functionality to the next release of Kernow, so that standalone transforms and XQueries have the option of a caching EntityResolver. I get the impression it's early days here, and this will become a common enough requirement that a setter will be exposed for an EntityResolver on the XMLReader used by the collection() function, making this kind of task much easier in the future.

Thursday, May 25, 2006

Character encoding, character references and Windows-1252

Consider this XML:

<?xml version="1.0" encoding="Windows-1252" ?>
<foo>&_#150;0x96</foo>

Here there is the character reference #150 and the actual character 0x96 that character reference #150 resolves to. The underscore is used to prevent the character reference being resolved by the browser, and the string '0x96' is used instead of the actual character 0x96 because blogger complains otherwise (stick with me...) Note the encoding used in the prolog.

Transforming that XML with this stylesheet:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
<xsl:output encoding="US-ASCII"/>
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

Gives this result:

<foo>&_#150;&_#8211;</foo>

Here you can see the character reference has been written back out as the same character, however character 0x96 has become 0x2013 (8211 in decimal). Character 150 is the unicode character "START OF GUARDED AREA" in the non-displayed C1 control character range, but in the Windows-1252 encoding it's mapped to to the displayable character 0x2013 "en-dash" (a short dash). Microsoft squeezed more characters into the single byte range by replacing non-displayed control characters with more useful displayable characters, but mistakenly went on to label files encoded in this way as ISO-8859-1 in some MS Office applications. In ISO-8859-1 the characters in the C0 and C1 ranges are the non-displayable control characters, but this mis-labelling was so widespread that parsers began detecting this situation and silently switching the read encoding to Windows-1252.

This problem surfaces when serving XHTML to an XHTML browser. While browsers are reading files using their HTML parsers, any file mis-labelled as ISO-8859-1 that contains characters in the C0 or C1 ranges will still auto-magically display the characters in those ranges, as the forgiving parsers auto-switch the read encoding. However, when an XHTML file is served (using the correct mime type eg "application/xhtml+xml") XHTML browsers such as Firefox will parse the file using its stricter XML parser - and all the characters in the C0 and C1 ranges will remain as those non-displayed characters. The auto-switch won't take place and characters such as 0x96 (en-dash) that were once displayed will just disappear.

This problem only occurs when an XML file is saved in Windows-1252 but is labelled as something else, usually IS0-8859-1. The most common culprit is Notepad, where a user has edited and saved an XML file without realising/caring that Notepad is unaware of the XML prolog.

So back to the example above:

<?xml version="1.0" encoding="Windows-1252" ?>
<foo>&_#150;0x96</foo>

The main point to realise here is that character references (the #150 in the example) are always resolved using Unicode codepoints, regardless of the specified encoding. Actual characters in the file will be read using the specified encoding. Therefore the #150 is resolved to 0x96 (its Unicode codepoint), while the actual character 0x96 in the source becomes 0x2013 (#8211) as specified in the Windows-1252 encoding.

The result of the transformation demonstrates this when serialised using the US-ASCII encoding (so all bytes above 127 will be written out as character references) looks like this:

<foo>&_#150;&_#8211;</foo>

A great example I think :)

Wikipedia has lots more information: http://en.wikipedia.org/wiki/ISO_8859-1

*There are two versions of ISO-8859-1 - The ISO and IANA versions. The ISO versions doesn't contain the C0 and C1 control characters, the IANA version does contain them. The XML recommendation uses the IANA version.

Sunday, May 07, 2006

Kernow 1.3 released

I've uploaded a new version Kernow that contains Saxon 8.7.1 with a few minor bugs fixed. These mainly centered around paths - the systemId of the source XML and base uri used by xsl:result-document.

The latter was pretty straightforward, the base uri used by xsl:result-document is set using Controller.setBaseOutputURI() - this value is now set to either the uri of the stylesheet, or the output file/directory (depending if one is supplied). This makes sense, as if you are outputting to a given file/directory, you would like the output from your xsl:result-document instructions to be relative to that (in my view, anyway). If you haven't supplied an output file/directory and you're outputting to the Kernow output window, then it should be resovled against the stylesheet.

Setting the systemId of the TransformerHandler to be that of the XML file seemed wrong to me at first, as you would expect it to be used by the stylesheet. After thinking about it a little, it makes some sense: the TransformerHandler knows the systemId of the stylesheet from when it was created - it doesn't know the location of the source XML as that arrives as a series of SAX events - therefore it's possible to set it through setSystemId. Perhaps someone knows for sure...

Tuesday, May 02, 2006

Sudoku Solver complete - and shes fast

(NOTE: I've improved on this version here)

I finally got around to finishing the Sudoku solver after several weeks of other things getting the way.

I've improved the heuristics that check for the allowed values for a given cell. It now checks the row and column "friends" - the other rows and columns in a group (eg columns 1 and 2 when the cell is in column 3, or columns 1 and 3 when the cell is in 2). This is quite straightforward for a human and is done almost without noticing, but its reasonably involved to code.

The other change was to recursively insert all the values into cells with only one possible value, and then re-evaluate the allowed values with those cells inserted. This takes care of the situation where there are initially two possible values, but after inserting a definite value elsewhere there is only one possible values left.

Ultimately all this reduces the number of times the transform has to guess - on certain boards it would take a long time to solve purely because it was guessing wrongly early on, and then have to go through all the permutations before working its way back to that early point and trying the next number. Now there should be the maximum number of values in place before it has to guess (if at all) so that it doesn't waste time backtracking.

Here it is. To try it out run it using Saxon with "-it main" set, or run it against a dummy input XML. The are a few sample boards included, to try them out change the main template to point the variable containing the board, or cut and paste it's contents into the param at the top of the stylesheet.

If you find a board that this stylesheet can't process quickly, please let me know.



<xsl:stylesheet version="2.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://saxon.sf.net/"
xmlns:fn="sudoku">

<xsl:param name="board" select="(
1,0,0, 3,0,0, 6,0,0,
0,2,0, 5,0,0, 0,0,4,
0,0,9, 0,0,0, 5,2,0,

0,0,0, 9,6,3, 0,0,0,
7,1,6, 0,0,0, 0,0,0,
0,0,0, 0,8,0, 0,4,0,

9,0,0, 0,0,5, 3,0,7,
8,0,0, 4,0,6, 0,0,0,
3,5,0, 0,0,0, 0,0,1
)" as="xs:integer+"/>

<xsl:param name="verbose" select="false()" as="xs:boolean"/>

<xsl:variable name="rowStarts" select="(1, 10, 19, 28, 37, 46, 55, 64, 73)" as="xs:integer+"/>
<xsl:variable name="topLeftGroup" select="(1, 2, 3, 10, 11, 12, 19, 20, 21)" as="xs:integer+"/>
<xsl:variable name="topGroup" select="(4, 5, 6, 13, 14, 15, 22, 23, 24)" as="xs:integer+"/>
<xsl:variable name="topRightGroup" select="(7, 8, 9, 16, 17, 18, 25, 26, 27)" as="xs:integer+"/>
<xsl:variable name="midLeftGroup" select="(28, 29, 30, 37, 38, 39, 46, 47, 48)" as="xs:integer+"/>
<xsl:variable name="center" select="(31, 32, 33, 40, 41, 42, 49, 50, 51)" as="xs:integer+"/>
<xsl:variable name="midRightGroup" select="(34, 35, 36, 43, 44, 45, 52, 53, 54)" as="xs:integer+"/>
<xsl:variable name="bottomLeftGroup" select="(55, 56, 57, 64, 65, 66, 73, 74, 75)" as="xs:integer+"/>
<xsl:variable name="bottomGroup" select="(58, 59, 60, 67, 68, 69, 76, 77, 78)" as="xs:integer+"/>
<xsl:variable name="bottomRightGroup" select="(61, 62, 63, 70, 71, 72, 79, 80, 81)" as="xs:integer+"/>

<xsl:function name="fn:getRowStart" as="xs:integer+" saxon:memo-function="yes">
<xsl:param name="index" as="xs:integer+"/>
<xsl:sequence select="xs:integer(floor(($index - 1) div 9) * 9) + 1"/>
</xsl:function>

<xsl:function name="fn:getRowNumber" as="xs:integer+" saxon:memo-function="yes">
<xsl:param name="index" as="xs:integer+"/>
<xsl:sequence select="(($index - 1) idiv 9) + 1"/>
</xsl:function>

<xsl:function name="fn:getColNumber" as="xs:integer+" saxon:memo-function="yes">
<xsl:param name="index" as="xs:integer+"/>
<xsl:sequence select="(($index - 1) mod 9) + 1"/>
</xsl:function>

<xsl:function name="fn:getRow" as="xs:integer+">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="index" as="xs:integer+"/>
<xsl:sequence select="subsequence($board, fn:getRowStart($index), 9)"/>
</xsl:function>

<xsl:function name="fn:getCol" as="xs:integer+">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="index" as="xs:integer+"/>
<xsl:variable name="gap" select="($index - 1) mod 9" as="xs:integer"/>
<xsl:variable name="colIndexes" select="for $x in $rowStarts return $x + $gap" as="xs:integer+"/>
<xsl:sequence select="for $x in $colIndexes return $board[$x]"/>
</xsl:function>

<xsl:function name="fn:getGroup" as="xs:integer+">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="index" as="xs:integer+"/>
<xsl:choose>
<xsl:when test="$index = $topLeftGroup">
<xsl:sequence select="for $x in $topLeftGroup return $board[$x]"/>
</xsl:when>
<xsl:when test="$index = $topGroup">
<xsl:sequence select="for $x in $topGroup return $board[$x]"/>
</xsl:when>
<xsl:when test="$index = $topRightGroup">
<xsl:sequence select="for $x in $topRightGroup return $board[$x]"/>
</xsl:when>
<xsl:when test="$index = $midLeftGroup">
<xsl:sequence select="for $x in $midLeftGroup return $board[$x]"/>
</xsl:when>
<xsl:when test="$index = $center">
<xsl:sequence select="for $x in $center return $board[$x]"/>
</xsl:when>
<xsl:when test="$index = $midRightGroup">
<xsl:sequence select="for $x in $midRightGroup return $board[$x]"/>
</xsl:when>
<xsl:when test="$index = $bottomLeftGroup">
<xsl:sequence select="for $x in $bottomLeftGroup return $board[$x]"/>
</xsl:when>
<xsl:when test="$index = $bottomGroup">
<xsl:sequence select="for $x in $bottomGroup return $board[$x]"/>
</xsl:when>
<xsl:when test="$index = $bottomRightGroup">
<xsl:sequence select="for $x in $bottomRightGroup return $board[$x]"/>
</xsl:when>
</xsl:choose>
</xsl:function>

<xsl:function name="fn:getOtherValuesInTriple" as="xs:integer+" saxon:memo-function="yes">
<xsl:param name="num" as="xs:integer"/>
<xsl:sequence select="(xs:integer(((($num -1) idiv 3) * 3) + 1) to xs:integer(((($num - 1) idiv 3) * 3) + 3))[. != $num]"/>
</xsl:function>

<xsl:function name="fn:getRowFriends" as="xs:integer+">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="index" as="xs:integer+"/>
<xsl:sequence select="for $x in fn:getOtherValuesInTriple($index) return $board[$x]"/>
</xsl:function>

<xsl:function name="fn:getColFriends" as="xs:integer+">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="index" as="xs:integer+"/>
<xsl:sequence select="for $x in fn:getOtherValuesInTriple(fn:getRowNumber($index))
return
$board[(($x * 9) - 8) + (($index - 1) mod 9)]"/>
</xsl:function>

<xsl:function name="fn:reducePossibleValues" as="xs:integer*">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="index" as="xs:integer+"/>
<xsl:param name="possibleValues" as="xs:integer+"/>

<xsl:variable name="rowFriends" select="fn:getRowFriends($board, $index)" as="xs:integer+"/>
<xsl:variable name="colFriends" select="fn:getColFriends($board, $index)" as="xs:integer+"/>

<xsl:variable name="otherRow1" select="fn:getRow($board, 9 * fn:getOtherValuesInTriple(fn:getRowNumber($index))[1])" as="xs:integer+"/>
<xsl:variable name="otherRow2" select="fn:getRow($board, 9 * fn:getOtherValuesInTriple(fn:getRowNumber($index))[2])" as="xs:integer+"/>

<xsl:variable name="otherCol1" select="fn:getCol($board, fn:getOtherValuesInTriple($index)[1])" as="xs:integer+"/>
<xsl:variable name="otherCol2" select="fn:getCol($board, fn:getOtherValuesInTriple($index)[2])" as="xs:integer+"/>

<xsl:variable name="reducedValues" as="xs:integer*">
<xsl:for-each select="$possibleValues">
<xsl:if test=". = $otherCol1 and . = $otherCol2">
<xsl:choose>
<xsl:when test="not($colFriends = 0)">
<xsl:sequence select="."/>
</xsl:when>
<xsl:when test="$colFriends != 0">
<xsl:choose>
<xsl:when test="$colFriends[1] = 0">
<xsl:if test=". = $otherRow1">
<xsl:sequence select="."/>
</xsl:if>
</xsl:when>
<xsl:when test="$colFriends[2] = 0">
<xsl:if test=". = $otherRow2">
<xsl:sequence select="."/>
</xsl:if>
</xsl:when>
</xsl:choose>
</xsl:when>
</xsl:choose>
</xsl:if>

<xsl:if test=". = $otherRow1 and . = $otherRow2">
<xsl:choose>
<xsl:when test="not($rowFriends = 0)">
<xsl:sequence select="."/>
</xsl:when>
<xsl:when test="$rowFriends != 0">
<xsl:choose>
<xsl:when test="$rowFriends[1] = 0">
<xsl:if test=". = $otherCol1">
<xsl:sequence select="."/>
</xsl:if>
</xsl:when>
<xsl:when test="$rowFriends[2] = 0">
<xsl:if test=". = $otherCol2">
<xsl:sequence select="."/>
</xsl:if>
</xsl:when>
</xsl:choose>
</xsl:when>
</xsl:choose>
</xsl:if>
</xsl:for-each>
</xsl:variable>

<xsl:sequence select="if (count($reducedValues) = 1) then $reducedValues else $possibleValues"/>

</xsl:function>

<xsl:function name="fn:getAllowedValues" as="xs:integer*">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="index" as="xs:integer+"/>

<xsl:variable name="possibleValues" select="(1 to 9)[not(. = (fn:getRow($board, $index), fn:getCol($board, $index), fn:getGroup($board, $index)))]" as="xs:integer*"/>

<xsl:choose>
<xsl:when test="count($possibleValues) > 1">
<xsl:variable name="reducedValues" select="fn:reducePossibleValues($board, $index, $possibleValues)" as="xs:integer*"/>

<xsl:sequence select="$reducedValues"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$possibleValues"/>
</xsl:otherwise>
</xsl:choose>

</xsl:function>

<xsl:function name="fn:tryValues" as="xs:integer*">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="emptyCells" as="xs:integer+"/>
<xsl:param name="possibleValues" as="xs:integer+"/>

<xsl:variable name="index" select="$emptyCells[1]" as="xs:integer"/>

<xsl:variable name="newBoard" select="(subsequence($board, 1, $index - 1), $possibleValues[1], subsequence($board, $index + 1))" as="xs:integer+"/>

<xsl:if test="$verbose">
<xsl:message>
<xsl:value-of select="concat('Trying ', $possibleValues[1], ' out of a possible ', string-join(for $i in $possibleValues return xs:string($i), ' '), ' at index ', $index)"/>
</xsl:message>
</xsl:if>

<xsl:variable name="result" select="fn:populateValues($newBoard, subsequence($emptyCells, 2))" as="xs:integer*"/>

<xsl:sequence select="if (empty($result)) then
if (count($possibleValues) > 1) then
fn:tryValues($board, $emptyCells, subsequence($possibleValues, 2))
else
()
else
$result"/>
</xsl:function>

<xsl:function name="fn:populateValues" as="xs:integer*">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="emptyCells" as="xs:integer*"/>

<xsl:choose>
<xsl:when test="empty($emptyCells)">
<xsl:message>Done!</xsl:message>
<xsl:sequence select="$board"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="index" select="$emptyCells[1]" as="xs:integer"/>
<xsl:variable name="possibleValues" select="fn:getAllowedValues($board, $index)" as="xs:integer*"/>
<xsl:choose>
<xsl:when test="count($possibleValues) = 0">
<xsl:if test="$verbose">
<xsl:message>! Cannot go any further !</xsl:message>
</xsl:if>
<xsl:sequence select="()"/>
</xsl:when>
<xsl:when test="count($possibleValues) > 1">
<xsl:sequence select="fn:tryValues($board, $emptyCells, $possibleValues)"/>
</xsl:when>
<xsl:when test="count($possibleValues) = 1">
<xsl:variable name="newBoard" select="(subsequence($board, 1, $index - 1), $possibleValues[1], subsequence($board, $index + 1))" as="xs:integer+"/>

<xsl:if test="$verbose">
<xsl:message>
<xsl:value-of>
<xsl:text>Only one value </xsl:text>
<xsl:value-of select="$possibleValues[1]"/>
<xsl:text> for index </xsl:text>
<xsl:value-of select="$index"/>
</xsl:value-of>
</xsl:message>
</xsl:if>

<xsl:sequence select="fn:populateValues($newBoard, subsequence($emptyCells, 2))"/>
</xsl:when>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:function>

<xsl:function name="fn:populateSingleValueCells" as="xs:integer+">
<xsl:param name="board" as="xs:integer+"/>
<xsl:param name="emptyCellsXML" as="element()*"/>

<xsl:for-each select="1 to 81">
<xsl:variable name="pos" select="." as="xs:integer"/>
<xsl:choose>
<xsl:when test="$emptyCellsXML[@index = $pos]/@possibleValues = 1">
<xsl:sequence select="$emptyCellsXML[@index = $pos]/@values"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$board[$pos]"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>

</xsl:function>

<xsl:function name="fn:populateSingleValueCells" as="xs:integer+">
<xsl:param name="board" as="xs:integer+"/>

<xsl:variable name="emptyCells" select="for $x in (1 to 81) return $x[$board[$x] = 0]" as="xs:integer*"/>

<xsl:variable name="emptyCellsXML" as="element()*">
<xsl:for-each select="$emptyCells">
<xsl:variable name="allowedValues" select="fn:getAllowedValues($board, .)" as="xs:integer*"/>
<cell index="{.}" possibleValues="{count($allowedValues)}" values="{$allowedValues}"/>
</xsl:for-each>
</xsl:variable>

<xsl:choose>
<xsl:when test="$emptyCellsXML/@possibleValues = 1">
<xsl:variable name="newBoard" select="fn:populateSingleValueCells($board, $emptyCellsXML)" as="xs:integer+"/>
<xsl:sequence select="fn:populateSingleValueCells($newBoard)"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$board"/>
</xsl:otherwise>
</xsl:choose>

</xsl:function>

<xsl:function name="fn:solveSudoku" as="xs:integer+">
<xsl:param name="startBoard" as="xs:integer+"/>

<xsl:variable name="board" select="fn:populateSingleValueCells($startBoard)" as="xs:integer+"/>

<xsl:variable name="emptyCells" select="for $x in (1 to 81) return $x[$board[$x] = 0]" as="xs:integer*"/>

<xsl:variable name="emptyCellsXML" as="element()*">
<xsl:for-each select="$emptyCells">
<xsl:variable name="allowedValues" select="fn:getAllowedValues($board, .)" as="xs:integer*"/>
<cell index="{.}" possibleValues="{count($allowedValues)}" values="{$allowedValues}"/>
</xsl:for-each>
</xsl:variable>

<xsl:variable name="emptyCellsOrdered" as="xs:integer*">
<xsl:for-each select="$emptyCellsXML">
<xsl:sort select="@possibleValues" data-type="number" order="ascending"/>
<xsl:sequence select="@index"/>
</xsl:for-each>
</xsl:variable>

<xsl:variable name="endBoard" select="fn:populateValues($board, $emptyCells)" as="xs:integer*"/>

<xsl:choose>
<xsl:when test="empty($endBoard)">
<xsl:message>! Invalid board - The starting board is not correct</xsl:message>
<xsl:sequence select="$startBoard"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$endBoard"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>

<xsl:template match="/tt" name="maintt">
<div>
<xsl:variable name="emptyCellsXML" as="element()*">
<xsl:for-each select="for $x in (1 to 81) return $x[$board[$x] = 0]">
<cell index="{.}" possibleValues="{count(fn:getAllowedValues($board, .))}" values="{fn:getAllowedValues($board, .)}"/>
</xsl:for-each>
</xsl:variable>

<xsl:for-each select="$emptyCellsXML">
<xsl:sort select="@possibleValues" data-type="number" order="ascending"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</div>
</xsl:template>

<xsl:template match="/" name="main">
<xsl:call-template name="drawResult">
<xsl:with-param name="board" select="fn:solveSudoku($board)"/>
</xsl:call-template>
</xsl:template>

<xsl:template name="drawResult">
<xsl:param name="board" as="xs:integer+"/>
<html>
<head>
<title>Sudoku - XSLT</title>
<style>
table { border-collapse: collapse;
border: 1px solid black; }
td { padding: 10px; }
.norm { border-left: 1px solid #CCC;
border-top: 1px solid #CCC;}
.csep { border-left: 1px solid black; }
.rsep { border-top: 1px solid black; }
</style>
</head>
<body>
<xsl:call-template name="drawBoard">
<xsl:with-param name="board" select="$board"/>
</xsl:call-template>
</body>
</html>
</xsl:template>

<xsl:template name="drawBoard">
<xsl:param name="board" as="xs:integer+"/>
<table>
<xsl:for-each select="1 to 9">
<xsl:variable name="i" select="."/>
<tr>
<xsl:for-each select="1 to 9">
<xsl:variable name="pos" select="(($i - 1) * 9) + ."/>
<td class="{if (position() mod 3 = 1) then 'csep' else ('norm')} {if ($i mod 3 = 1) then 'rsep' else ('norm')}">
<xsl:value-of select="$board[$pos]"/>
</td>
</xsl:for-each>
</tr>
</xsl:for-each>
</table>
</xsl:template>

<xsl:template name="drawBasicBoard">
<xsl:param name="board" as="xs:integer+"/>
<xsl:for-each select="1 to 9">
<xsl:variable name="i" select="."/>

<xsl:for-each select="1 to 9">
<xsl:variable name="pos" select="(($i - 1) * 9) + ."/>
<xsl:value-of select="$board[$pos]"/>
<xsl:value-of select="if ($pos mod 3 = 0) then ', ' else ', '"/>
</xsl:for-each>

<xsl:value-of select="if ($i mod 3 = 0) then ' ' else ' '"/>
</xsl:for-each>
</xsl:template>

<!-- Easy board, 32 existing numbers -->
<xsl:variable name="testBoard1" select="(
0,2,0, 0,0,0, 0,3,6,
0,0,7, 4,0,0, 0,9,0,
0,0,5, 6,0,0, 0,4,8,

0,0,0, 9,3,0, 0,1,2,
2,9,0, 0,0,0, 0,7,5,
1,5,0, 0,8,2, 0,0,0,

6,7,0, 0,0,9, 1,0,0,
0,3,0, 0,0,7, 6,0,0,
4,8,0, 0,0,0, 0,2,0
)" as="xs:integer+"/>

<!-- Hard board, 24 existing numbers -->
<xsl:variable name="testBoard2" select="(
1,0,0, 5,6,0, 0,0,0,
9,0,0, 0,0,0, 2,0,8,
0,0,0, 0,0,0, 7,0,0,

0,8,0, 9,0,7, 0,0,2,
2,0,0, 0,0,0, 0,0,1,
6,0,0, 3,0,2, 0,4,0,

0,0,5, 0,0,0, 0,0,0,
4,0,3, 0,0,0, 0,0,9,
0,0,0, 0,4,1, 0,0,6
)" as="xs:integer+"/>

<!-- Extremely Hard board -->
<xsl:variable name="testBoard3" select="(
4,0,6, 7,8,0, 2,0,0,
0,5,7, 0,0,0, 0,0,0,
8,0,0, 0,0,0, 4,0,0,

0,0,0, 0,5,0, 0,9,0,
9,1,2, 0,0,0, 0,0,0,
0,4,0, 0,1,0, 0,0,3,

0,0,4, 3,0,0, 0,2,0,
7,0,3, 0,0,6, 0,0,0,
0,0,0, 5,0,7, 6,0,8
)" as="xs:integer+"/>

<!-- Fiendish board -->
<xsl:variable name="testBoard4" select="(
0,0,0, 0,0,5, 0,0,0,
0,0,0, 0,2,0, 9,0,0,
0,8,4, 9,0,0, 7,0,0,

2,0,0, 0,9,0, 4,0,0,
0,3,0, 6,0,2, 0,8,0,
0,0,7, 0,3,0, 0,0,6,

0,0,2, 0,0,9, 8,1,0,
0,0,6, 0,4,0, 0,0,0,
0,0,0, 5,0,0, 0,0,0
)" as="xs:integer+"/>

<!-- Fiendish 2 board -->
<xsl:variable name="testBoard5" select="(
0,0,0, 0,0,5, 1,0,0,
0,3,5, 0,0,0, 0,4,0,
8,0,0, 4,0,0, 0,2,0,

9,0,0, 0,3,0, 5,0,0,
0,0,0, 2,0,8, 0,0,0,
0,0,7, 0,9,0, 0,0,8,

0,5,0, 0,0,9, 0,0,2,
0,4,0, 0,0,0, 9,8,0,
0,0,1, 7,0,0, 0,0,0
)" as="xs:integer+"/>

<xsl:variable name="testBoard_Fiendish_060323" select="(
5,3,0, 0,9,0, 0,0,6,
0,0,0, 1,5,0, 0,0,3,
0,0,2, 0,0,0, 8,0,0,

0,0,0, 0,0,0, 0,8,0,
1,7,0, 0,4,0, 0,2,9,
0,9,0, 0,0,0, 0,0,0,

0,0,6, 0,0,0, 3,0,0,
4,0,0, 0,1,2, 0,0,0,
3,0,0, 0,6,0, 0,5,4
)" as="xs:integer+"/>

<!-- Fiendish 060403-->
<xsl:variable name="testBoard38" select="(
5,0,0, 0,0,1, 0,0,2,
0,6,0, 0,7,0, 0,0,0,
0,0,0, 6,0,0, 8,0,0,

0,0,3, 0,0,5, 0,0,8,
0,9,0, 0,1,0, 0,2,0,
2,0,0, 3,0,0, 7,0,0,

0,0,1, 0,0,2, 0,0,0,
0,0,0, 0,6,0, 0,9,0,
8,0,0, 7,0,0, 0,0,5
)" as="xs:integer+"/>

<!-- Failing board, an erroneous 9 has been added at index 1 -->
<xsl:variable name="testBoard_Fail" select="(
9,2,0, 0,0,0, 0,3,6,
0,0,7, 4,0,0, 0,9,0,
0,0,5, 6,0,0, 0,4,8,

0,0,0, 9,3,0, 0,1,2,
2,9,0, 0,0,0, 0,7,5,
1,5,0, 0,8,2, 0,0,0,

6,7,0, 0,0,9, 1,0,0,
0,3,0, 0,0,7, 6,0,0,
4,8,0, 0,0,0, 0,2,0
)" as="xs:integer+"/>

</xsl:stylesheet>

Monday, March 27, 2006

VTD-XML

Ian Jones pointed me to an article on javaworld about VTD-XML, which is apparently "A new option that overcomes the problems of DOM and SAX".

It all sounds promising, although the article is light on code examples and heavy on comparisons, so I'll reserve judgement until I've used it. It does seem to concentrate on speed without mentioning conformity, which is not a good sign.

Friday, March 24, 2006

Well when I said "blisteringly fast" I should have said "very fast, but give it a week or two and we'll get faster"...

I've since found a way of improving the performance of my solution... and I think Dimitre has improved his. Watch this space, because neither is quite ready yet. I'll stick my neck out and say once it's done, mine should solve any board pretty much instantly. (I'm going to regret saying that, because mine probably won't but Dimitre's will...)