Friday, April 25, 2008

The Scrabble Reference

The Scrabble Reference is an ebook I've created that lets Scrabble players easily check whether words are legal, find longer words or sub-anagrams for a given word, and see what words can be made from a given set of letters (up to 15 letters).

There are two versions, TWL and SOWPODS (TWL is used in the USA, Canada, Thailand and Israel, and SOWPODS in the rest of the world).

The ebook is in the Mobipocket format, which takes care of running ebooks on all devices - PDAs, mobile phones, Blackberries etc - so you just need to transform the input into suitable Mobipocket markup and compile the ebook, and their client does the rest.

I must admit to not being interested in Scrabble, but of course I am interested in XSLT, and given the list of words allowed in Scrabble I thought I should turn them into a product - one that you can run on your phone seems perfect.

Generating the ebook was very straightforward - list of strings in, markup out... XSLT 2.0 is ideal for the task.

Relative paths and the document() function

A nice gotcha cropped up today on xsl-list...

Relative paths passed to the document() function are resolved against either the XML or the stylesheet depending on what is passed in: a node from the XML will mean the path is resolved against the XML, a string will mean it's resolved against the stylesheet.

The gotcha is this - if you modify this:

document(@path)

to this:

<xsl:variable name="path" select="@path" as="xs:string"/>
...
document($path)


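and @path contains a relative path, then the argument is now a string rather than a node, so the path is resolved against the stylesheet instead of the XML. You could get a document not found error or, worse, if your XML and XSLT are in the same directory, you won't notice...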
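
The effect of the two base URIs can be sketched outside XSLT with plain java.net.URI resolution. This is only an illustration of the resolution step, not how any particular processor implements document(); the file locations and the class name DocumentBaseUriDemo are made up for the example.

```java
import java.net.URI;

class DocumentBaseUriDemo {

    // Resolves a relative path against a base URI, which is the step
    // that happens before the referenced document is fetched.
    static URI resolve(String baseUri, String relativePath) {
        return URI.create(baseUri).resolve(relativePath);
    }

    public static void main(String[] args) {
        // Hypothetical locations, chosen only for illustration.
        String xmlBase = "file:/data/xml/input.xml";
        String xsltBase = "file:/styles/transform.xsl";

        // document(@path): the argument is a node, so the XML's base URI applies.
        System.out.println(resolve(xmlBase, "lookup.xml"));   // file:/data/xml/lookup.xml

        // document($path) with an xs:string: the stylesheet's base URI applies.
        System.out.println(resolve(xsltBase, "lookup.xml"));  // file:/styles/lookup.xml
    }
}
```

When both files sit in the same directory the two results are identical, which is why the change can go unnoticed.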

Friday, February 29, 2008

XML Schema - element with text and attributes

For some reason I always forget how to define an element that contains only text but also has attributes. Perhaps it's because it's so verbose, or so non-intuitive for something so simple, who knows. Either way it's something that needs to be committed to memory...

So the element:


<foo bar="bar" baz="baz">some text</foo>

is described using:

<xs:complexType name="foo">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="bar" type="xs:string"/>
      <xs:attribute name="baz" type="xs:string"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

nice!
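
One way to check that the type behaves as described is to validate a couple of instances with the JDK's javax.xml.validation API. This is a rough sketch: the xs:element declaration and the class name SimpleContentCheck are assumptions added for the example (the type above is only a named type, so something has to use it).

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

class SimpleContentCheck {

    // The complex type from the post, plus an assumed element declaration
    // (not in the original) so there is an element to validate.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='foo' type='foo'/>"
        + "<xs:complexType name='foo'>"
        + "<xs:simpleContent>"
        + "<xs:extension base='xs:string'>"
        + "<xs:attribute name='bar' type='xs:string'/>"
        + "<xs:attribute name='baz' type='xs:string'/>"
        + "</xs:extension>"
        + "</xs:simpleContent>"
        + "</xs:complexType>"
        + "</xs:schema>";

    // Returns true if the XML validates against the schema above,
    // false if validation (or parsing) fails.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Text plus attributes is accepted...
        System.out.println(isValid("<foo bar='bar' baz='baz'>some text</foo>"));
        // ...but child elements are not, because the content is simple.
        System.out.println(isValid("<foo bar='bar'><child/></foo>"));
    }
}
```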

Wednesday, February 27, 2008

schema-aware.com

I've created a new website, schema-aware.com, which is intended to contain lots of examples of schema-aware XSLT and XQuery. I've started it off with half a dozen or so and hope to add more as time goes on.

I also intend to add a few articles about schema-aware transforms - how to run them from the command line, from Java, the various flags involved, how to write schemas to allow you to use the types in your XSLT etc... My intentions are good, we'll have to see how much I actually do.

Thursday, January 31, 2008

Kernow 1.6 beta

I've uploaded a new version of Kernow (1.6) which contains the rather nice "XSLT Sandbox" tab. This tab has the XML pane on the left, the XSLT pane on the right and a transform button... and that's it. It's intended for anyone who wants to quickly try something out without the hassle of files, the command line or starting up a proper IDE. It does error checking as you type and highlights any problems.

It's available as the usual download from Sourceforge, or through Java Web Start. If you already run the JWS version it should automatically update itself (any problems just re-install it). I've finally figured out the temperamental errors with the JWS version - it turns out the ant jars included with Kernow were already signed by a previous version and so weren't being signed again, but because they were marked as "lazy" in the jnlp the JWS version would start anyway. (You can tell if a jar has been signed by looking for *.SF and *.DSA in the META-INF directory.)
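
The same check can be scripted. The sketch below only looks for the presence of *.SF and *.DSA entries under META-INF, as described above; it does not verify the signature, and jars signed with an RSA key carry a *.RSA block instead, which this check would miss. The class and method names are made up for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarSignatureCheck {

    // True if the entry names include both a signature file (*.SF)
    // and a DSA signature block (*.DSA) under META-INF/.
    static boolean looksSigned(List<String> entryNames) {
        boolean hasSf = false;
        boolean hasDsa = false;
        for (String name : entryNames) {
            String upper = name.toUpperCase(Locale.ROOT);
            if (!upper.startsWith("META-INF/")) {
                continue;
            }
            if (upper.endsWith(".SF")) {
                hasSf = true;
            }
            if (upper.endsWith(".DSA")) {
                hasDsa = true;
            }
        }
        return hasSf && hasDsa;
    }

    // Convenience overload that reads the entry names from a jar on disk.
    static boolean looksSigned(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            List<String> names = new ArrayList<>();
            for (JarEntry entry : Collections.list(jar.entries())) {
                names.add(entry.getName());
            }
            return looksSigned(names);
        }
    }
}
```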

The other improvement I'm pleased to have sorted out is that kernow.config (where all of the settings and combobox history are saved) is now stored in a directory called .kernow in your user.home (which is one up from My Documents in XP). Previously it would've been stored on the desktop for the JWS version, which is really annoying - sorry about that. As usual it was a 10 minute job, but just took a while to get around to.

I've also separated out the SOAP and eXist extension functions into a separate package, so there's no longer the need for the largish eXist.jar, xmldb.jar and log4j.jar jars to be part of the download.

I'll release a non-beta version in a few weeks if no bugs are reported, once I've got around to updating all of the documentation.

Parsing XML into Java 5 Enums

Often when parsing XML into POJOs I just resort to writing my own SAX-based parser. It can be long-winded, but I think it gives you the greatest flexibility and control over how you get from the XML to objects you can process.

One example is with Java 5's Enums, which are great. Given the kind of XML fragment where currencies are represented using internal codes:

<currency refid="001"/> <!-- 001 is Sterling -->
<currency refid="002"/> <!-- 002 is Euros -->
<currency refid="003"/> <!-- 003 is United States Dollars -->


You can represent each currency element with an Enum, which contains extra fields for the additional information:

public enum Currency {

    GBP ("001", "GBP", "Sterling"),
    EUR ("002", "EUR", "Euros"),
    USD ("003", "USD", "United States Dollar");

    private final String refId;
    private final String code;
    private final String desc;

    Currency(String refId, String code, String desc) {
        this.refId = refId;
        this.code = code;
        this.desc = desc;
    }

    public String refId() {
        return refId;
    }

    public String code() {
        return code;
    }

    public String desc() {
        return desc;
    }

    // Returns the enum based on its refId property rather than its name.
    // (This loop could be replaced with a static map, but be aware that
    // static member variables are initialized *after* the enum constants and
    // therefore aren't available to the constructor, so you'd need to fill
    // the map in a static block.)
    public static Currency getTypeByRefId(String refId) {
        for (Currency type : Currency.values()) {
            if (type.refId().equals(refId)) {
                return type;
            }
        }

        throw new IllegalArgumentException("Don't have enum for: " + refId);
    }
}


Notice how each enum constant calls its own constructor with the 3 parameters - the refId, the code, and the description.

You parse the XML into the enum by calling Currency.getTypeByRefId(String refId) passing in the @refid from the XML. The benefit of using the Enum is that you can then do things like:

if (currency.equals(Currency.GBP))

which is nice and clear, while at the same time being able to call currency.refId() and currency.desc() to get to the other values.

The drawback is that because static member variables are initialized after the enum constants, you can't fill a HashMap from the constructor for a faster lookup later (unless you use a static block). Instead you have to loop through all known values() for the enum given a refId. Although it feels wrong to loop, the worst case is only the size of the enum so I don't think it's too bad.
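
For comparison, here is a rough sketch of the static block variant, driven by a small SAX handler. The BY_REF_ID map, the CurrencyHandler class and the <currencies> wrapper element are assumptions made for the example, and the code() and desc() accessors are left out to keep it short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler that turns each <currency refid="..."/> into an enum.
class CurrencyHandler extends DefaultHandler {

    final List<Currency> currencies = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("currency".equals(qName)) {
            currencies.add(Currency.getTypeByRefId(atts.getValue("refid")));
        }
    }

    static List<Currency> parse(String xml) {
        try {
            CurrencyHandler handler = new CurrencyHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.currencies;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "<currencies><currency refid='003'/><currency refid='001'/></currencies>"));
    }
}

enum Currency {

    GBP("001", "GBP", "Sterling"),
    EUR("002", "EUR", "Euros"),
    USD("003", "USD", "United States Dollar");

    // Static members are initialized after the constants above, so the map
    // is filled in a static block rather than from the constructor.
    private static final Map<String, Currency> BY_REF_ID = new HashMap<>();

    static {
        for (Currency c : values()) {
            BY_REF_ID.put(c.refId, c);
        }
    }

    private final String refId;
    private final String code;
    private final String desc;

    Currency(String refId, String code, String desc) {
        this.refId = refId;
        this.code = code;
        this.desc = desc;
    }

    String refId() {
        return refId;
    }

    // Map lookup instead of looping over values().
    static Currency getTypeByRefId(String refId) {
        Currency c = BY_REF_ID.get(refId);
        if (c == null) {
            throw new IllegalArgumentException("Don't have enum for: " + refId);
        }
        return c;
    }
}
```

For an enum with only a handful of values the loop is unlikely to matter, so this is mostly worth doing when the lookup sits on a hot path.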

Tuesday, January 22, 2008

Portability of a stylesheet across schema-aware and non-schema-aware processors

I came across this today, which I thought was really cool and worth a post. It basically allows you to code a transform that is only schema-aware if a schema-aware processor is running it, otherwise it's just a standard transform.

In this case I want to do input and output validation, so first I sort out the schemas:

<xsl:import-schema schema-location="input.xsd"
    namespace="http://www.foo.com"
    use-when="system-property('xsl:is-schema-aware')='yes'"/>

<xsl:import-schema schema-location="output.xsd"
    use-when="system-property('xsl:is-schema-aware')='yes'"/>

Note the use-when...

Next define two root matching templates, one for schema-aware, one for basic:

<xsl:template match="/"
    use-when="system-property('xsl:is-schema-aware')='yes'"
    priority="2">
    
    <xsl:variable name="input" as="document-node()">
        <xsl:document validation="strict">
            <xsl:copy-of select="/"/>
        </xsl:document>
    </xsl:variable>
    
    <xsl:result-document validation="strict">
        <xsl:apply-templates select="$input/the-root-elem"/>
    </xsl:result-document>
    
</xsl:template>

<xsl:template match="/">
    <xsl:apply-templates select="the-root-elem"/>
</xsl:template>    
    
<xsl:template match="the-root-elem">
    ...
</xsl:template>

The root matching template for schema-aware processing uses xsl:document to validate the input, and xsl:result-document to validate the output. Validation can also be controlled from outside the transform, but this way forces it on.

I think this is great :)

The identity transform for XSLT 2.0

I was looking at the standard identity transform the other day and realised that for nodes other than elements, the call to apply-templates is redundant.

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

Also, although it might be intuitive to think that attributes have separate nodes for their name and value, they are in fact a single node that's copied in its entirety by xsl:copy.

I raised this on xsl-list and suggested separating out the attribute into a template of its own with just xsl:copy for its body:

<xsl:template match="node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="@*">
  <xsl:copy/>
</xsl:template>
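
This split version is still valid XSLT 1.0, so it can be tried with the processor built into the JDK (which only supports 1.0). A minimal sketch, with the class name IdentityDemo and the sample input made up for the example:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityDemo {

    // The two templates from above, wrapped in a version 1.0 stylesheet.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='@*'>"
        + "<xsl:copy/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the given XML and returns the serialized result.
    static String copy(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The attribute is copied by the bare xsl:copy in the @* template.
        System.out.println(copy("<a x=\"1\"><b>t</b><!--c--></a>"));
    }
}
```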


Mike Kay suggested a more logical version would be:

<xsl:template match="element()">
  <xsl:copy>
    <xsl:apply-templates select="@*,node()"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="attribute()|text()|comment()|processing-instruction()">
  <xsl:copy/>
</xsl:template>

This turned out to be ideal for three reasons:

- the comma between @* and node() will mean the selected nodes will be processed in that order, removing the sorting and deduplication that takes place with union |
- apply-templates is only called when it will have an effect
- it's clearer that attributes are leaf nodes

So there it is... the identity transform for XSLT 2.0

Friday, September 28, 2007

Kernow 1.5.2

I've just uploaded the non-beta version of Kernow 1.5.2.

This version contains:

- French and German translations
- XQuery syntax highlighting and checking as-you-type
- Improved cancelling of Single File and Standalone tasks
- icon and splash screen
- An exe to launch it (for windows users)
- context menus
- comboboxes remember their selected index
- individual combobox entries can be removed by deleting the entry
- other small fixes

Wednesday, September 12, 2007

Connecting to Oracle from XSLT

Today I generated a report by connecting directly to an Oracle database from XSLT, and thought I'd share the basic stylesheet. I used Saxon's SQL extension, which is available when saxon8-sql.jar is on the classpath. As I was connecting to Oracle, I also needed to put ojdbc14.jar on the classpath.

Here's the stylesheet in its most basic form, formatted for display in this blog.

The important things to note here are:

- The sql prefix is bound to "/net.sf.saxon.sql.SQLElementFactory"
- The driver is "oracle.jdbc.driver.OracleDriver"
- The connection string format is "jdbc:oracle:thin:@1.2.3.4:1234:sid" (note the colon between thin and @ - I missed that first time round) where the IP, port and sid are placeholders for the real values
- remember that saxon8-sql.jar and ojdbc14.jar need to be on the classpath


<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sql="/net.sf.saxon.sql.SQLElementFactory"
    exclude-result-prefixes="xs"
    extension-element-prefixes="sql">

    <xsl:output indent="yes"/>

    <xsl:param name="driver"
        select="'oracle.jdbc.driver.OracleDriver'"
        as="xs:string"/>

    <xsl:param name="database"
        select="'jdbc:oracle:thin:@123.123.123.123:1234:sid'"
        as="xs:string"/>

    <xsl:param name="user" select="'un'" as="xs:string"/>
    <xsl:param name="password" select="'pw'" as="xs:string"/>

    <xsl:variable name="connection"
        as="java:java.sql.Connection"
        xmlns:java="http://saxon.sf.net/java-type">

        <sql:connect driver="{$driver}" database="{$database}"
            user="{$user}" password="{$password}"/>
    </xsl:variable>

    <xsl:template match="/" name="main">
        <root>
            <sql:query connection="$connection"
                table="some_table"
                column="*"
                row-tag="row"
                column-tag="col"/>
        </root>
    </xsl:template>

</xsl:stylesheet>

This transform outputs XML in the form:

<root>
    <row>
        <col>data1</col>
        <col>data2</col>
        <col>data3</col>
        <col>data4</col>
    </row>
    ....
</root>

where <root> is the wrapper element, and <row> and <col> are the element names specified in the <sql:query> element.

And that's it - connecting to an Oracle database from within XSLT.