Friday, September 28, 2007

Kernow 1.5.2

I've just uploaded the non-beta version of Kernow 1.5.2.

This version contains:

- French and German translations
- XQuery syntax highlighting and checking as-you-type
- Improved cancelling of Single File and Standalone tasks
- Icon and splash screen
- An exe to launch it (for Windows users)
- Context menus
- Comboboxes remember their selected index
- Individual combobox entries can be removed by deleting the entry
- Other small fixes

Wednesday, September 12, 2007

Connecting to Oracle from XSLT

Today I generated a report by connecting directly to an Oracle database from XSLT, and thought I'd share the basic stylesheet. I used Saxon's SQL extension, which is available when saxon8-sql.jar is on the classpath. As I was connecting to Oracle, I also needed to put ojdbc14.jar on the classpath.

Here's the stylesheet in its most basic form, formatted for display in this blog.

The important things to note here are:

- The sql prefix is bound to "/net.sf.saxon.sql.SQLElementFactory"
- The driver is "oracle.jdbc.driver.OracleDriver"
- The connection string format is "jdbc:oracle:thin:@1.2.3.4:1234:sid" (note the colon between thin and @ - I missed that first time round) where the IP, port and sid are placeholders for the real values
- Remember that saxon8-sql.jar and ojdbc14.jar need to be on the classpath

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sql="/net.sf.saxon.sql.SQLElementFactory"
    exclude-result-prefixes="xs"
    extension-element-prefixes="sql">

  <xsl:output indent="yes"/>

  <xsl:param name="driver"
      select="'oracle.jdbc.driver.OracleDriver'"
      as="xs:string"/>

  <xsl:param name="database"
      select="'jdbc:oracle:thin:@123.123.123.123:1234:sid'"
      as="xs:string"/>

  <xsl:param name="user" select="'un'" as="xs:string"/>
  <xsl:param name="password" select="'pw'" as="xs:string"/>

  <xsl:variable name="connection"
      as="java:java.sql.Connection"
      xmlns:java="http://saxon.sf.net/java-type">
    <sql:connect driver="{$driver}" database="{$database}"
        user="{$user}" password="{$password}"/>
  </xsl:variable>

  <xsl:template match="/" name="main">
    <root>
      <sql:query connection="$connection"
          table="some_table"
          column="*"
          row-tag="row"
          column-tag="col"/>
    </root>
  </xsl:template>

</xsl:stylesheet>

The transform outputs XML in the form:

<root>
  <row>
    <col>data1</col>
    <col>data2</col>
    <col>data3</col>
    <col>data4</col>
  </row>
  ....
</root>

where <root> is the wrapper element, and <row> and <col> are the element names specified in the <sql:query> element.

And that's it - connecting to an Oracle database from within XSLT.
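For comparison, here is a rough plain-JDBC sketch of what the sql:connect / sql:query pair does, producing the same root/row/col shape. It is an illustration, not how Saxon implements the extension; the class and method names are hypothetical, and the host, port, SID and credentials are the same placeholders used in the stylesheet parameters.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical JDBC equivalent of the sql:connect / sql:query stylesheet. */
public class OracleToXml {

    // Thin-driver URL: note the colon between "thin" and "@"
    public static String thinUrl(String host, int port, String sid) {
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + sid;
    }

    // Minimal escaping for element content; SQL NULL becomes an empty <col> element
    public static String escape(String s) {
        if (s == null) return "";
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Writes every row as <row> and every column value as <col>, wrapped in <root>
    public static String toXml(ResultSet rs) throws SQLException {
        int columns = rs.getMetaData().getColumnCount();
        StringBuilder sb = new StringBuilder("<root>\n");
        while (rs.next()) {
            sb.append("  <row>\n");
            for (int i = 1; i <= columns; i++) { // JDBC columns are 1-based
                sb.append("    <col>").append(escape(rs.getString(i))).append("</col>\n");
            }
            sb.append("  </row>\n");
        }
        return sb.append("</root>").toString();
    }

    public static void main(String[] args) throws Exception {
        // Old drivers such as ojdbc14.jar need explicit registration
        Class.forName("oracle.jdbc.driver.OracleDriver");
        try (Connection con = DriverManager.getConnection(
                     thinUrl("123.123.123.123", 1234, "sid"), "un", "pw");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM some_table")) {
            System.out.println(toXml(rs));
        }
    }
}
```

The driver jar still has to be on the classpath, exactly as for the XSLT version.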

Monday, September 03, 2007

Kernow 1.5.2 beta b2 available

I've just uploaded a new version of Kernow. This one was already more or less available via Java Web Start; this release makes it available via the normal download route.

New features/fixes:

- Added syntax highlighting and checking as-you-type to the XQuery Sandbox tab. Syntax highlighting's provided using Bounce's XMLEditorKit - I'm hoping to use Netbeans' nbEditorKit in a future version which will add line numbers, code completion etc. I've put together the checking-as-you-type and error highlighting using Saxon's error reporting. This is really cool, so I'm planning on doing an equivalent "XSLT Sandbox" soon... perhaps using Netbeans RCP. Not sure yet.

- Added an icon and splash screen. These came about because JWS and the exe benefit from them. Are they any good? I'm not really a graphics person...

- Kernow.jar is now a proper executable jar, so you can double-click it to run Kernow (if you're on a Mac, for example)

- It's all compiled using Java 1.5, again for Mac users, where Java 1.6 isn't available yet.

It's available here: Kernow

Thursday, August 30, 2007

Kernow now available via Java Web Start

I've been playing around with making Kernow available through Java Web Start. This should be the ideal way to run Kernow as it places a shortcut on your desktop (and in your start menu in Windows) and auto-updates whenever a new version is available.

Reading around it seems Java Web Start has had mixed reviews. Personally I really like it, perhaps because I'm using Java 1.6 and Netbeans 6 M10 which makes it all pretty straightforward (auto jar-signing is really helpful in M10).

Give it a go, let me know what you think: Kernow - Java Web Start

Friday, August 17, 2007

Using XQuery and the slash operator to quickly try out XPaths

This is the coolest thing I've seen in a while...

In XQuery you can construct a node just by writing it, e.g.:

<node>I'm a node</node>

and then you can use the slash operator to apply an XPath to that node:

<node>I'm a node</node>/data(.)

returns "I'm a node"

The XML doesn't have to be limited to a single node - you can do:

<root>
  <node>foo</node>
  <node>bar</node>
</root>/node/data(.)

...to get "foo bar".

Or:

<root>
  <node>foo</node>
  <node>bar</node>
</root>/node[1]

to get:

<node>foo</node>


Using this technique in combination with Kernow's XQuery Sandbox makes it straightforward to paste in some XML and start trying out some XPaths.

Thursday, August 16, 2007

When a = b and a != b both return true...

In XPath, = and != are general comparisons, which behave like set operators: they return true if any item on the left-hand side compares true with any item on the right-hand side. In other words:

some $x in $seqA, $y in $seqB satisfies $x op $y

...where "op" is = or != (or > or < etc)

To demonstrate this, take the two sequences $seqA = ('a', 'b') and $seqB = ('b', 'c'):

$seqA = $seqB returns true because both sequences contain 'b'

$seqA != $seqB returns true because $seqA contains 'a', which is not equal to 'c' in $seqB

This still catches me out, even though I've been caught by it several times before. I really have to think hard about exactly what I'm comparing, and I still end up getting it wrong.

A simple rule to follow is "never use != where both sides are sequences of more than one item". 99.9% of the time you won't need to, as much as it feels like the right thing to do.
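To make the "some $x ... satisfies" reading concrete, here is a small hypothetical Java sketch of general-comparison semantics for sequences of strings. It only models codepoint string equality, not XPath's type promotion or collations, but it reproduces both surprising results above and the fact that not($seqA = $seqB) is not the same thing as $seqA != $seqB.

```java
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical model of XPath general comparisons (= and !=) on string sequences. */
public class GeneralComparison {

    // some $x in a, $y in b satisfies op($x, $y)
    public static <T> boolean some(List<T> a, List<T> b, BiPredicate<T, T> op) {
        for (T x : a) {
            for (T y : b) {
                if (op.test(x, y)) return true; // one matching pair is enough
            }
        }
        return false; // also false when either sequence is empty
    }

    public static boolean generalEq(List<String> a, List<String> b) {
        return some(a, b, String::equals);
    }

    public static boolean generalNe(List<String> a, List<String> b) {
        return some(a, b, (x, y) -> !x.equals(y));
    }

    public static void main(String[] args) {
        List<String> seqA = List.of("a", "b");
        List<String> seqB = List.of("b", "c");
        System.out.println(generalEq(seqA, seqB));  // true: both contain "b"
        System.out.println(generalNe(seqA, seqB));  // true: "a" is not equal to "c"
        System.out.println(!generalEq(seqA, seqB)); // false: not(=) differs from !=
    }
}
```

The empty-sequence case is worth remembering too: with nothing to pair up, both () = $seqB and () != $seqB are false.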

Below are some of the most common operations on sequences, put together as a reference.

The two sequences are ('a', 'b') and ('b', 'c'), which can be defined in XSLT as:

<xsl:variable name="seqA" select="('a', 'b')" as="xs:string+"/>
<xsl:variable name="seqB" select="('b', 'c')" as="xs:string+"/>

or in XQuery as:

let $seqA := ('a', 'b')
let $seqB := ('b', 'c')



Select all items in both sequences

($seqA, $seqB)

Result: a b b c



Select all items in both sequences, eliminating duplicates

distinct-values(($seqA, $seqB))

Result: a b c



Select all items that occur in $seqA but not $seqB

$seqA[not(. = $seqB)]

Result: a



Select all items that occur in both sequences

$seqA[. = $seqB]

Result: b



Select all items that occur in only one of the sequences

($seqA[not(. = $seqB)],$seqB[not(. = $seqA)])

or

($seqA, $seqB)[not(. = $seqA[. = $seqB])]


Result: a c



Determine if both sequences are identical

deep-equal($seqA, $seqB)

Result: false



Test if all items in the sequence are different

count(distinct-values($seqA)) eq count($seqA)

Result: true
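If you need the same recipes outside XSLT/XQuery, a rough Java translation for lists of strings might look like this. The class and method names are hypothetical; each method carries the XPath expression it mirrors as a comment. One difference to be aware of: XPath leaves the order of distinct-values() implementation-dependent, whereas this sketch keeps first occurrences.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical Java equivalents of the sequence recipes, for string sequences. */
public class SequenceOps {

    // ($seqA, $seqB)
    public static List<String> concat(List<String> a, List<String> b) {
        List<String> result = new ArrayList<>(a);
        result.addAll(b);
        return result;
    }

    // distinct-values(($seqA, $seqB)), keeping first occurrences
    public static List<String> distinctValues(List<String> a, List<String> b) {
        return concat(a, b).stream().distinct().collect(Collectors.toList());
    }

    // $seqA[not(. = $seqB)]
    public static List<String> except(List<String> a, List<String> b) {
        return a.stream().filter(x -> !b.contains(x)).collect(Collectors.toList());
    }

    // $seqA[. = $seqB]
    public static List<String> intersect(List<String> a, List<String> b) {
        return a.stream().filter(b::contains).collect(Collectors.toList());
    }

    // ($seqA[not(. = $seqB)], $seqB[not(. = $seqA)])
    public static List<String> inOnlyOne(List<String> a, List<String> b) {
        return concat(except(a, b), except(b, a));
    }

    // deep-equal($seqA, $seqB): same items in the same order
    public static boolean deepEqual(List<String> a, List<String> b) {
        return a.equals(b);
    }

    // count(distinct-values($seqA)) eq count($seqA)
    public static boolean allDifferent(List<String> a) {
        return new HashSet<>(a).size() == a.size();
    }

    public static void main(String[] args) {
        List<String> seqA = List.of("a", "b");
        List<String> seqB = List.of("b", "c");
        System.out.println(concat(seqA, seqB));         // [a, b, b, c]
        System.out.println(distinctValues(seqA, seqB)); // [a, b, c]
        System.out.println(except(seqA, seqB));         // [a]
        System.out.println(intersect(seqA, seqB));      // [b]
        System.out.println(inOnlyOne(seqA, seqB));      // [a, c]
        System.out.println(deepEqual(seqA, seqB));      // false
        System.out.println(allDifferent(seqA));         // true
    }
}
```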


Wednesday, August 15, 2007

The World's Fastest Sudoku Solution in XSLT 2.0 for the World's Hardest Sudoku - AI Escargot


I get a lot of traffic to this blog because of my Sudoku solver. Google Analytics tells me most of it lands on the original version that I wrote, and not on the optimised version that's now the world's fastest XSLT 2.0 solution, an issue which this post should hopefully rectify.

The puzzle on the right is apparently the world's hardest Sudoku puzzle. If I run my solver "Sudoku.xslt" using Kernow's performance testing feature I get this result:

Ran 5 times
Run 1: 328 ms
Run 2: 344 ms
Run 3: 328 ms
Run 4: 391 ms
Run 5: 328 ms
Ignoring first 2 times
Total Time (last 3 runs): 1 second 47 ms
Average Time (last 3 runs): 349 ms


So on my machine the average execution time is 349ms, which is pretty good considering the original version would take minutes on several puzzles. As far as I know this version will solve all puzzles in under a second on my machine (Core 2 Duo E6600, 2GB).

How does it do it? This is taken from the web page where it's hosted:

It accepts the puzzle as 81 comma-separated integers in the range 0 to 9, with zero representing an empty cell. It works by continuously reducing the number of possible values for each cell, and only when the possible values can't be reduced any further does it start backtracking.

The first phase attempts to populate as many cells of the board as possible from the start values. For each empty cell it works out the possible values using the "Naked Single", "Hidden Single" and "Naked Tuples" techniques, in that order (read here for more on the techniques). Cells where only one possible value exists are populated, and then the second phase begins.

The second phase follows this process:

* Find all empty cells and get all the possible values for each cell (using Naked Single and Hidden Single techniques)
* Sort the cells by least possible values first
* Populate the cells with only one possible value
* If there's more than one value, go through them one by one
* Repeat
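As a rough illustration of the second phase, here is a hypothetical Java sketch, not a port of Sudoku.xslt. It computes candidates from the row, column and box only (the single-candidate "Naked Single" case), with none of the Hidden Single or Naked Tuples reduction. On each step it branches on the empty cell with the fewest candidates, which has the same effect as sorting by least possible values and taking the first. Cells are indexed 0 to 80 here, whereas the log below counts from 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical least-candidates-first backtracking solver (not the author's XSLT). */
public class SudokuSketch {

    /** Parses 81 comma-separated integers, 0 meaning an empty cell. */
    public static int[] parse(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Values 1-9 not already used in the cell's row, column or 3x3 box. */
    public static List<Integer> candidates(int[] board, int idx) {
        boolean[] used = new boolean[10];
        int row = idx / 9, col = idx % 9;
        int boxRow = row / 3 * 3, boxCol = col / 3 * 3;
        for (int i = 0; i < 9; i++) {
            used[board[row * 9 + i]] = true;                             // same row
            used[board[i * 9 + col]] = true;                             // same column
            used[board[(boxRow + i / 3) * 9 + boxCol + i % 3]] = true;   // same box
        }
        List<Integer> result = new ArrayList<>();
        for (int v = 1; v <= 9; v++) {
            if (!used[v]) result.add(v);
        }
        return result;
    }

    /** Fills the board in place; returns false if this branch has no solution. */
    public static boolean solve(int[] board) {
        int best = -1;
        List<Integer> bestCandidates = null;
        // Find the empty cell with the least possible values
        for (int i = 0; i < 81; i++) {
            if (board[i] != 0) continue;
            List<Integer> c = candidates(board, i);
            if (c.isEmpty()) return false; // "Cannot go any further": backtrack
            if (bestCandidates == null || c.size() < bestCandidates.size()) {
                best = i;
                bestCandidates = c;
            }
        }
        if (best == -1) return true;       // no empty cells left: solved
        // One candidate is a forced move; several means trying each in turn
        for (int v : bestCandidates) {
            board[best] = v;
            if (solve(board)) return true;
        }
        board[best] = 0;                   // undo before backtracking further
        return false;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java SudokuSketch <81 comma-separated values>");
            return;
        }
        int[] board = parse(args[0]);
        System.out.println(solve(board) ? Arrays.toString(board) : "No solution");
    }
}
```

Without the static reduction of the first phase, this sketch will generally explore more branches than the XSLT solver on hard puzzles.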

This is how it solves AI Escargot. A slightly modified version of the solver gives the output below when the $verbose parameter is set to true (indexes count from 1, left to right, top to bottom). As you can see, it has found that it can insert a 1 at position 66 using the static analysis of the puzzle (position 66 is the middle-right cell of the bottom-left group). Next it has decided that there are two possible values, 4 and 6, at index 12 (the middle-right cell of the top-left group), so it tries 4 and continues. With that 4 in place it finds that there's only one possible value at index 39, a 3, so it inserts that and continues. It keeps reducing the possible values based on the current state of the board, inserting the only possible value or trying each one in turn when there are several, until either an empty cell has no possible values (and it backtracks) or the puzzle is solved.

(the solution is shown below)

Populated single value cell at index 66 with 1
Trying 4 out of a possible 4 6 at index 12
Only one value 3 for index 39
Trying 5 out of a possible 5 7 at index 10
Trying 1 out of a possible 1 9 at index 13
Only one value 9 for index 15
Trying 6 out of a possible 6 7 at index 16
Only one value 7 for index 17
Trying 2 out of a possible 2 4 at index 7
Trying 6 out of a possible 6 8 at index 2
Only one value 8 for index 3
Only one value 2 for index 48
Only one value 6 for index 57
Only one value 5 for index 60
Only one value 9 for index 63
Only one value 5 for index 74
Only one value 1 for index 78
Only one value 3 for index 69
Only one value 5 for index 71
Only one value 6 for index 81
Only one value 4 for index 36
! Cannot go any further !
Trying 8 out of a possible 8 at index 2
Only one value 6 for index 3
Trying 4 out of a possible 4 5 at index 4
Only one value 3 for index 9
Only one value 3 for index 23
Only one value 4 for index 26
Only one value 3 for index 53
Only one value 5 for index 54
Only one value 4 for index 36
Only one value 6 for index 44
Only one value 8 for index 48
Only one value 2 for index 57
Only one value 8 for index 73
Only one value 9 for index 47
Only one value 7 for index 50
Only one value 6 for index 60
Only one value 9 for index 64
! Cannot go any further !
Trying 5 out of a possible 5 at index 4
Only one value 9 for index 47
Only one value 2 for index 67
Only one value 7 for index 49
Only one value 9 for index 64
Only one value 2 for index 80
Only one value 1 for index 32
Only one value 2 for index 48
Only one value 5 for index 50
Only one value 8 for index 58
Only one value 8 for index 73
! Cannot go any further !
Trying 4 out of a possible 4 at index 7
Only one value 3 for index 9
Only one value 4 for index 23
Only one value 3 for index 24
Only one value 2 for index 26
Only one value 3 for index 53
Only one value 5 for index 54
Only one value 8 for index 4
Trying 2 out of a possible 2 6 at index 2
Only one value 6 for index 3
Trying 7 out of a possible 7 8 at index 19
Only one value 8 for index 20
Only one value 7 for index 29
Only one value 9 for index 40
Only one value 7 for index 50
Only one value 6 for index 44
Only one value 2 for index 49
Only one value 2 for index 28
Only one value 8 for index 48
Only one value 2 for index 57
Only one value 5 for index 67
Only one value 7 for index 58
Only one value 8 for index 61
Only one value 3 for index 68
Only one value 2 for index 70
! Cannot go any further !
Trying 8 out of a possible 8 at index 19
Only one value 7 for index 20
Only one value 8 for index 29
Only one value 9 for index 47
Only one value 2 for index 48
Only one value 8 for index 57
Only one value 4 for index 37
Only one value 9 for index 40
Only one value 7 for index 49
! Cannot go any further !
Trying 6 out of a possible 6 at index 2
Only one value 2 for index 3
Only one value 6 for index 57
Trying 7 out of a possible 7 8 at index 19
Only one value 8 for index 20
Trying 2 out of a possible 2 4 at index 28
Only one value 7 for index 29
Only one value 9 for index 47
Only one value 2 for index 49
Only one value 9 for index 40
Only one value 6 for index 44
Only one value 7 for index 50
Only one value 9 for index 59
! Cannot go any further !
Trying 4 out of a possible 4 at index 28
Only one value 9 for index 37
Only one value 4 for index 44
Only one value 6 for index 42
Trying 2 out of a possible 2 7 at index 29
Only one value 7 for index 47
Only one value 2 for index 49
Only one value 7 for index 32
Only one value 9 for index 50
! Cannot go any further !
Trying 7 out of a possible 7 at index 29
Only one value 1 for index 32
Only one value 2 for index 33
Only one value 2 for index 47
Trying 7 out of a possible 7 9 at index 49
Only one value 9 for index 50
Only one value 6 for index 77
Only one value 1 for index 78
Only one value 5 for index 80
Only one value 5 for index 56
Only one value 8 for index 60
Only one value 2 for index 61
Only one value 8 for index 70
Only one value 6 for index 71
Only one value 9 for index 74
Only one value 2 for index 64
Only one value 4 for index 81
Only one value 4 for index 58
Only one value 9 for index 67
Only one value 8 for index 73
Only one value 2 for index 76
Done!

1, 6, 2,   8, 5, 7,   4, 9, 3,
5, 3, 4,   1, 2, 9,   6, 7, 8,
7, 8, 9,   6, 4, 3,   5, 2, 1,


4, 7, 5,   3, 1, 2,   9, 8, 6,
9, 1, 3,   5, 8, 6,   7, 4, 2,
6, 2, 8,   7, 9, 4,   1, 3, 5,


3, 5, 6,   4, 7, 8,   2, 1, 9,
2, 4, 1,   9, 3, 5,   8, 6, 7,
8, 9, 7,   2, 6, 1,   3, 5, 4,


If you have a solution that can statically detect more cells to fill using different techniques than I have, or has a better strategy than simply backtracking when there's more than one value, then I'd be interested to know how it works.

I'm pretty sure the XSLT is as good as it can be, but if you think it can be improved in any way then let me know.

Monday, July 23, 2007

Combining XSLT 2.0's Grouping with eXist

I work a lot with large XML datasets that are arranged as thousands of 1-10MB XML files. I spend most of my days writing transforms and transformation pipelines to process these files, which is where Kernow came from. I also like messing around with eXist (I've yet to use it commercially, but I hope to one day) and enjoy the speed a native XML database gives you.

Requirements that regularly come up are to generate indexes and reports for the dataset. This is nice and simple using XSLT 2.0's grouping, but requires the whole dataset to be in memory unless you use saxon:discard-document(). It can also be quite slow, if only because you have to read GBs from disk and parse every single XML input file just to get the snippet you're interested in (such as the title, or say all of the elements).

Conversely, XQuery doesn't suffer from the dataset size but lacks XSLT 2.0's grouping features. It's perfectly possible (although a bit involved - you could say "a bit XSLT 1.0") to recreate the grouping in XQuery, but it's just so much nicer in XSLT 2.0. So to get the best of both, you can use eXist's fantastic REST-style interface to select the parts of the XML you're interested in, and then use XSLT 2.0's xsl:for-each-group to arrange the results.

In the example stylesheet below I create an index by getting the <title> for each XML document, and then grouping the titles by their first letter, then sorting by title itself. I use eXist to get the <title> element, then XSLT 2.0 to do the sorting and grouping.

I have an instance of eXist running on my local machine and fully populated with the XML dataset. The function fn:eXist() takes the collection I'm interested in and the XQuery to execute against that collection, constructs the correct URI for the REST interface and calls doc() with that URI. The result is a proprietary XML format containing each tuple that I then group using xsl:for-each-group. It's worth noting the -1 value for the _howmany parameter on the query - without this it defaults to 10.


<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fn="fn"
    xmlns:exist="http://exist.sourceforge.net/NS/exist"
    exclude-result-prefixes="fn exist"
    version="2.0">

  <xsl:output indent="yes"/>

  <xsl:param name="db-uri" select="'http://localhost:8080/exist/rest'"/>

  <!-- Builds the REST URI for $query against $collection and returns the result items.
       The query is URI-encoded; _howmany=-1 returns every hit (the default is 10). -->
  <xsl:function name="fn:eXist">
    <xsl:param name="collection"/>
    <xsl:param name="query"/>
    <xsl:sequence select="doc(concat($db-uri, $collection, '?_query=', encode-for-uri($query), '&amp;_start=1&amp;_howmany=-1'))/exist:result/node()"/>
  </xsl:function>

  <xsl:template match="/">
    <div>
      <!-- Group the titles by their first letter, in alphabetical order -->
      <xsl:for-each-group select="fn:eXist('/db/mycomp/myproject', '/doc/head/title')" group-by="substring(., 1, 1)">
        <xsl:sort select="current-grouping-key()"/>
        <div>
          <div><xsl:value-of select="current-grouping-key()"/></div>
          <!-- Then sort the titles within each group -->
          <xsl:for-each select="current-group()">
            <xsl:sort select="."/>
            <div><xsl:value-of select="."/></div>
          </xsl:for-each>
        </div>
      </xsl:for-each-group>
    </div>
  </xsl:template>

</xsl:stylesheet>

It's as simple as that... what would normally take minutes takes seconds (once the database setup is done). If you haven't used eXist yet I highly recommend it.
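To make the two halves of the stylesheet concrete outside XSLT, here is a hypothetical Java sketch: one method builds the same REST URI as fn:eXist(), the other does the same grouping and sorting as the xsl:for-each-group. It doesn't fetch anything; the names are illustrative only. Note that URLEncoder does form encoding (a space becomes '+'), which the servlet container in front of eXist is expected to decode.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical Java illustration of fn:eXist() plus the grouping step. */
public class ExistIndex {

    // Same URI shape as fn:eXist(); _howmany=-1 asks for every hit rather than the default 10
    public static String restUri(String dbUri, String collection, String query) {
        return dbUri + collection
                + "?_query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&_start=1&_howmany=-1";
    }

    // Equivalent of group-by="substring(., 1, 1)" with both sorts: the TreeMap keeps
    // the keys ordered, and sorting first keeps each group's titles ordered
    public static Map<String, List<String>> groupByFirstLetter(List<String> titles) {
        return titles.stream()
                .sorted()
                .collect(Collectors.groupingBy(
                        t -> t.isEmpty() ? "" : t.substring(0, 1),
                        TreeMap::new,
                        Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(restUri("http://localhost:8080/exist/rest",
                "/db/mycomp/myproject", "/doc/head/title"));
        System.out.println(groupByFirstLetter(List.of("Banana", "Apple", "Avocado")));
    }
}
```

Java's natural String ordering is by UTF-16 code unit, which is close to, but not necessarily identical to, the collation your XSLT processor uses for xsl:sort.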

This article is repeated here

Wednesday, July 11, 2007

CSV to XML transform updated

I've posted a new version of the CSV to XML transform. This version handles nested quotes correctly - the previous version would generate extra tokens either side of the quoted value.

Friday, June 29, 2007

Kernow 1.5.0.9 [beta] available

I've made a new beta version of Kernow available:

  • XSLT directory transforms are now multi-threaded, with the size of the thread pool configurable in the options. For systems with multi-core processors and enough available memory this can really improve directory transform time.
  • Single File / Standalone transforms now have a "Performance Testing" feature where the transforms are run repeatedly and the average time displayed.
  • The panels can now be resized for the XQuery Sandbox tab. The fixed size XQuery Sandbox made it pretty useless before - at least now you can see more than four lines!
  • Other minor bug fixes
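For anyone wondering what a multi-threaded directory transform involves, here is a minimal JAXP sketch. It is not Kernow's implementation, and the class and method names are hypothetical. The key point is that the stylesheet is compiled once into a thread-safe Templates object, while each task creates its own Transformer, since Transformer instances must not be shared between threads. TransformerFactory.newInstance() gives whatever JAXP processor is configured (the JDK's built-in one is XSLT 1.0); for XSLT 2.0 you would construct Saxon's net.sf.saxon.TransformerFactoryImpl instead.

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch of a multi-threaded directory transform. */
public class DirectoryTransform {

    /** Transforms every *.xml file in inDir into outDir; returns the number of files. */
    public static int transformDirectory(Path xslt, Path inDir, Path outDir, int threads)
            throws Exception {
        // Compile once: Templates is thread-safe, Transformer is not
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(xslt.toFile()));
        Files.createDirectories(outDir);
        ExecutorService pool = Executors.newFixedThreadPool(threads); // configurable size
        List<Future<Object>> results = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inDir, "*.xml")) {
            for (Path in : files) {
                Path out = outDir.resolve(in.getFileName());
                results.add(pool.submit(() -> {
                    Transformer transformer = templates.newTransformer(); // one per task
                    transformer.transform(new StreamSource(in.toFile()),
                            new StreamResult(out.toFile()));
                    return null;
                }));
            }
        } finally {
            pool.shutdown(); // accept no new tasks; submitted ones still run
        }
        for (Future<Object> f : results) {
            f.get(); // waits for completion and rethrows any transform failure
        }
        return results.size();
    }
}
```

Each worker holds its own source tree in memory, which is why the thread count needs to be balanced against available memory as well as cores.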

There's loads still to do, but I'm trying to stick to the "release early, release often" policy.

Monday, June 18, 2007

UK Postcode Googlemaps mashup

Everyone else has done a mashup so I thought I should do one too. My mashup combines GoogleMaps with StreetMap.co.uk's GridConvert to get the nearest postcode for a given longitude and latitude. I had to use some server-side PHP to get around cross-domain scripting issues and to scrape the postcode from the result HTML, but other than that it's all done through the GoogleMaps API.

Friday, June 15, 2007

Sudoku Solver - much improved.

I've just uploaded the latest version of my Sudoku Solver.

I wrote the initial version of this in March 2006, and have since been in competition with Dimitre Novatchev to have the fastest solution. Dimitre's held that title for a while, but with this version I think I may have won it back.

The additions here are Naked Tuple discovery in the first phase, and then repeatedly re-ordering the cells by least-number-of-possible-values as each cell is populated in the second phase. Sounds obvious, but it's something I missed first time around.

On my machine this latest version solves the vast majority of puzzles (including AI Escargot) in under a second, with the occasional one taking just under two seconds. The previous version would take up to a minute on some puzzles, so I'm pleased with the improvement.

Hopefully Dimitre will do some comparisons using his latest version, and then tell me the good news :)

My new homepage

I'd like to draw attention to my new homepage: andrewjwelch.com. I've created this as a better place to store the code samples that were posted in this blog, and to link to my open source projects.

I'll try to keep news and opinions to this blog, and code samples on the homepage.

Friday, June 08, 2007

A long drawn out redundancy...

I've been informed that I will be made redundant... in October! That's a long 4 1/2 months away. In the meantime I'm meant to be performing maintenance and handover tasks (4 months of handover ?!?!) while the redundancy + retention carrot is tentatively dangled to keep us here until then...

So if you can see that far ahead and need a Java/XML/XSLT bod to fill a high paying contract, let me know :)

Friday, March 30, 2007

Er, the real "hardest Sudoku"

Following the comment from Malcom below, I've run my Sudoku solver against the real "AI Escargot"...

Using Saxon 8.9.0.3b from the command line with the -3 option (to discount Java startup time), the average time across three runs is 2213ms... which isn't bad at all.

This is when I just solve the puzzle starting at the top-left. If I turn on the ordering feature, where empty cells are processed by least number of possibilities first, the time dramatically increases to ~58 seconds. That's a reminder that Sudoku (in its generalised form) is NP-complete: a search-order heuristic that helps on most puzzles can still hurt badly on particular ones.

If you're interested, the solution it produced is:


1, 6, 2,   8, 5, 7,   4, 9, 3
5, 3, 4,   1, 2, 9,   6, 7, 8
7, 8, 9,   6, 4, 3,   5, 2, 1

4, 7, 5,   3, 1, 2,   9, 8, 6
9, 1, 3,   5, 8, 6,   7, 4, 2
6, 2, 8,   7, 9, 4,   1, 3, 5

3, 5, 6,   4, 7, 8,   2, 1, 9,
2, 4, 1,   9, 3, 5,   8, 6, 7,
8, 9, 7,   2, 6, 1,   3, 5, 4,

Thursday, March 29, 2007

The Worlds Hardest Sudoku

A mathematician claims to have penned the world's hardest Sudoku puzzle.

Running it through my Sudoku Solver it took ~24 seconds on my work machine (most puzzles are sub 1 second on the same machine) so I would think it must be reasonably hard.

Friday, February 23, 2007

A CSV to XML converter in XSLT 2.0

* Note: This transform has been updated http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html


I wrote a rudimentary CSV to XML converter a while back which broke when the CSV contained quoted values, e.g. foo, "foo, bar", bar. Dealing with these quotes is surprisingly hard, especially when you take into account that quotes are escaped by doubling them.

I raised it on xsl-list, and Abel Braaksma came up with an ingenious solution: use both sides of analyze-string (read his post for the best explanation).

To keep the transform generic I've used an attribute instead of an element for the column names, to cope with names that aren't valid QNames (for example ones that contain a space). For my own use I would add a function to convert names to valid QNames and then change <elem name="{...}"> to <xsl:element name="{fn:getQName(...)}">, as it generates nicer XML. I've also modified the non-matching-substring side of analyze-string to only return tokens that contain values.

So this sample input:

Col 1, Col 2, Col 3
foo, "foo,bar", "foo:""bar"""


...creates this output:

<root>
  <row>
    <elem name="Col 1">foo</elem>
    <elem name="Col 2">foo,bar</elem>
    <elem name="Col 3">foo:"bar"</elem>
  </row>
</root>

Here's the finished transform:


<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="fn"
    exclude-result-prefixes="xs fn">

  <xsl:output indent="yes" encoding="US-ASCII"/>

  <xsl:param name="pathToCSV" select="'file:///c:/csv.csv'"/>

  <!-- Splits one CSV line into values: quoted values are matched by the regex,
       the text between them is split on commas -->
  <xsl:function name="fn:getTokens" as="xs:string+">
    <xsl:param name="str" as="xs:string"/>
    <xsl:analyze-string regex='("[^"]*")+' select="$str">
      <xsl:matching-substring>
        <!-- Strip the surrounding quotes and un-double escaped quotes -->
        <xsl:sequence select='replace(., "^""|""$|("")""", "$1")'/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Only return tokens that contain values -->
        <xsl:sequence select="tokenize(., '\s*,\s*')[normalize-space()]"/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <xsl:template match="/" name="main">
    <xsl:choose>
      <xsl:when test="unparsed-text-available($pathToCSV)">
        <xsl:variable name="csv" select="unparsed-text($pathToCSV)"/>
        <!-- Split on LF or CRLF and skip blank lines -->
        <xsl:variable name="lines" select="tokenize($csv, '\r?\n')[normalize-space()]" as="xs:string+"/>
        <xsl:variable name="elemNames" select="fn:getTokens($lines[1])" as="xs:string+"/>
        <root>
          <xsl:for-each select="$lines[position() > 1]">
            <row>
              <xsl:variable name="lineItems" select="fn:getTokens(.)" as="xs:string+"/>
              <xsl:for-each select="$elemNames">
                <xsl:variable name="pos" select="position()"/>
                <elem name="{.}">
                  <xsl:value-of select="$lineItems[$pos]"/>
                </elem>
              </xsl:for-each>
            </row>
          </xsl:for-each>
        </root>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>Cannot locate : </xsl:text>
        <xsl:value-of select="$pathToCSV"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
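The same "both sides of analyze-string" idea ports directly to java.util.regex, which may help if you need the tokenizer outside XSLT. This is a hypothetical sketch using the same two regexes as the stylesheet: quoted runs become one value each, and the text between them is split on commas, keeping only tokens that contain values. As in the XSLT, genuinely empty fields (a,,b) are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Java port of fn:getTokens. */
public class CsvTokens {

    // One or more adjacent "..." runs form a single quoted value (doubled quotes included)
    private static final Pattern QUOTED = Pattern.compile("(\"[^\"]*\")+");
    // Drop the outer quotes and turn each doubled quote into a single one
    private static final Pattern QUOTES = Pattern.compile("^\"|\"$|(\")\"");

    public static List<String> getTokens(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        int last = 0;
        while (m.find()) {
            addUnquoted(line.substring(last, m.start()), tokens);   // non-matching side
            tokens.add(QUOTES.matcher(m.group()).replaceAll("$1")); // matching side
            last = m.end();
        }
        addUnquoted(line.substring(last), tokens);
        return tokens;
    }

    // Split on commas (with surrounding whitespace) and keep only non-blank tokens
    private static void addUnquoted(String s, List<String> tokens) {
        for (String t : s.split("\\s*,\\s*")) {
            if (!t.isBlank()) tokens.add(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(getTokens("Col 1, Col 2, Col 3"));
        System.out.println(getTokens("foo, \"foo,bar\", \"foo:\"\"bar\"\"\""));
    }
}
```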

Monday, February 05, 2007

A Soap Extension Function

Recently I had to write a client to retrieve XML from a web service that required authentication. All this meant was that the credentials needed to be in the SOAP header, e.g.:

<soapenv:Envelope>
  <soapenv:Header>
    <c:AuthHeader>
      <c:Username>foo</c:Username>
      <c:Password>bar</c:Password>
    </c:AuthHeader>
  </soapenv:Header>
  <soapenv:Body>
  ...
Anyone who's coded Java web service clients will know how hard it is to get the generated code to add this SOAP header to your calls... it's a nightmare. It varies between generation tools and the specification level you're coding to. What makes it worse is that the whole reason you go through this pain is to send XML down the wire and get XML back. It would be much nicer if you could just make the call in XSLT....

I've written the following extension function to do just that. It accepts a SOAP request and an endpoint, makes the request and returns the SOAP response. It leaves the "complexity" of creating the request and processing the response to the XSLT, where it's pretty straightforward. Here's the Java:

package net.sf.kernow.soapextension;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.ProtocolException;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import net.sf.saxon.om.NodeInfo;

/**
 * Enables the calling of SOAP based web services from XSLT.
 * @author Andrew Welch
 */
public class SOAPExtension {

    /** Serializes the request node, POSTs it to the endpoint and returns the raw response. */
    public static String soapRequest(NodeInfo requestXML, String endpoint) {
        String result = makeCall(transformToString(requestXML), endpoint);
        return result;
    }

    /** As above, for a request that has already been serialized. */
    public static String soapRequest(String requestXML, String endpoint) {
        String result = makeCall(requestXML, endpoint);
        return result;
    }

    // Identity transform (NodeInfo is a JAXP Source) to serialize the node to a string
    private static String transformToString(NodeInfo sourceXML) {
        StringWriter sw = new StringWriter();

        try {
            TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
            Transformer transformer = tFactory.newTransformer();
            transformer.transform(sourceXML, new StreamResult(sw));
        } catch (TransformerConfigurationException ex) {
            ex.printStackTrace();
        } catch (TransformerException ex) {
            ex.printStackTrace();
        }

        return sw.toString();
    }

    private static String makeCall(String requestXML, String endpoint) {
        String SOAPUrl = endpoint;
        StringBuffer responseBuf = new StringBuffer();

        try {
            // Create the connection to the endpoint
            URL url = new URL(SOAPUrl);
            URLConnection connection = url.openConnection();
            HttpURLConnection httpConn = (HttpURLConnection) connection;

            byte[] b = requestXML.getBytes("UTF-8");

            // Set the appropriate HTTP parameters.
            httpConn.setRequestProperty("Content-Length", String.valueOf(b.length));
            httpConn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            httpConn.setRequestMethod("POST");
            httpConn.setDoOutput(true);
            httpConn.setDoInput(true);

            // Send the request
            OutputStream out = httpConn.getOutputStream();
            out.write(b);
            out.close();

            // Read the response (assumed to be UTF-8) into the response buffer,
            // keeping line breaks so text content on separate lines isn't run together
            InputStreamReader isr = new InputStreamReader(httpConn.getInputStream(), "UTF-8");
            BufferedReader in = new BufferedReader(isr);

            String line;
            while ((line = in.readLine()) != null) {
                responseBuf.append(line).append('\n');
            }

            in.close();

        } catch (ProtocolException ex) {
            ex.printStackTrace();
        } catch (MalformedURLException ex) {
            ex.printStackTrace();
        } catch (UnsupportedEncodingException ex) {
            ex.printStackTrace();
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        return responseBuf.toString();
    }
}

I've put the extension function in the "net.sf.kernow.soapextension" package and called it SOAPExtension (it will be in the 1.5 version of Kernow when I eventually release it). Now the XSLT to make and process the requests:

<xsl:stylesheet version="2.0"
    xmlns:soap="net.sf.kernow.soapextension.SOAPExtension"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:param name="endpoint" select="'http://somewebservice'"/>

  <!-- The request message, built as a document node -->
  <xsl:variable name="request">
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ws="http://somewebservice/">
      <soapenv:Body>
        <ws:getSomething>
          <urn>name123</urn>
        </ws:getSomething>
      </soapenv:Body>
    </soapenv:Envelope>
  </xsl:variable>

  <!-- Make the call and parse the response string into a document -->
  <xsl:template match="/" name="main">
    <xsl:apply-templates select="saxon:parse(soap:soapRequest($request, $endpoint))" mode="process-SOAP-message"/>
  </xsl:template>

  <!-- Pull the payload out of the SOAP body and parse it as a document in its own right -->
  <xsl:template match="/" mode="process-SOAP-message">
    <xsl:apply-templates select="saxon:parse(soapenv:Envelope/soapenv:Body/*/return/node())" mode="process-response-payload"/>
  </xsl:template>

  <!-- Process the actual payload -->
  <xsl:template match="/" mode="process-response-payload">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>

There are a couple of things to notice. Firstly, you call soapRequest() with the message as a document node and the endpoint as a string. The extension will also accept the message as a string, but that would just require the extra step of saxon:serialize($request).

Secondly you need to use saxon:parse to parse the response string into XML. Applying templates to saxon:parse() will search for the root matching template, so to avoid endless loops different modes are used to separate the various root matching templates.

The template in the mode "process-SOAP-message" deals with processing the SOAP response, so the root element here would be <soapenv:Envelope>. In order to get to the actual payload (and to treat it as a document in its own right) I use:

saxon:parse(soapenv:Envelope/soapenv:Body/*/return/node())

...and a third root matching template in the mode "process-response-payload" (the actual path may vary for your payload). In this template you deal with the actual response, so you can apply-templates to it, write it to disk etc.

And that's it, it really is as simple as that. The S in SOAP can mean Simple :)
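For comparison, on a newer JDK (11 or later) the HTTP side can be written with java.net.http.HttpClient rather than HttpURLConnection. This is a hypothetical sketch, not part of Kernow: it builds an envelope with the AuthHeader from the start of this post and POSTs it as a string. The namespace URI for the c: prefix is an invented placeholder (the real one isn't given here), the credentials and body are assumed to be XML-safe already, and some SOAP 1.1 services will also expect a SOAPAction header.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical HttpClient-based equivalent of the SOAP call. */
public class SoapPost {

    // Placeholder namespace for the c: prefix; substitute the service's real one
    static final String AUTH_NS = "urn:example:auth";

    // Envelope with the credentials in the SOAP header; inputs assumed to be XML-safe
    public static String envelope(String user, String password, String bodyXml) {
        return "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:c=\"" + AUTH_NS + "\">"
                + "<soapenv:Header><c:AuthHeader>"
                + "<c:Username>" + user + "</c:Username>"
                + "<c:Password>" + password + "</c:Password>"
                + "</c:AuthHeader></soapenv:Header>"
                + "<soapenv:Body>" + bodyXml + "</soapenv:Body>"
                + "</soapenv:Envelope>";
    }

    // POSTs the request as UTF-8 text/xml and returns the response body as a string
    public static String post(String requestXml, String endpoint) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(requestXml))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

The returned string could then be handed to saxon:parse() exactly as in the stylesheet above.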

Sunday, February 04, 2007

Testing XSLT - CheckXML

It's well known that XSLT isn't easily unit-testable, and there isn't currently a standard way of testing transforms for correctness. I've long thought that the only way to do this is to run the transform using a given input and check the result, and through that infer correctness in the transform.


I wrote a stylesheet to do this in XSLT 2.0 with heavy use of extensions (to perform the transforms and execute the XPaths) which was nice from an academic standpoint, but it soon became clear that this would be more useful as a Java app runnable from Ant.


This little project grew into a way of checking any XML file (to check a transform it runs the transform first and then checks the result). I'm provisionally calling this "CheckXML" and it's still early days but I think it's got the potential to be something really good.

CheckXML will allow you to perform various checks on an XML file: XML Schema, XPath 2.0, XSLT 2.0, XQuery and Relax NG. This allows users to augment schema checks with XPath/XSLT checks to fully check the correctness of an XML file. A sample CheckXML configuration file would be:

<checkXML>
 <xml src="SampleXML.xml">
  <check>SampleXSD.xsd</check>
  <check>count(distinct-values(//@id)) = count(//@id)</check>
  <check>SampleXSLT.xslt</check>
 </xml>
</checkXML>

Here the XML file "SampleXML.xml" first has the XML Schema SampleXSD.xsd applied to it, then the XPath in the second check, and finally the XSLT check. The CheckXML app will run each check and look for a result of "true" - any value that isn't true will be reported as a fail. This is the crucial point: it offloads the responsibility for creating a useful error to the check writer, and because the check writer has the full power of XSLT/XPath/XQuery, the error message can be as detailed as necessary (XSD/RNG checks return the validation error message if the XML isn't valid).

For example, modifying the XPath above to return a helpful message could be like this:

<check>if (count(distinct-values(//@id)) = count(//@id)) then 'true' else concat('The following id is not unique: ', distinct-values(for $x in //@id return $x[count(//@id[. = $x]) > 1]))</check>

This would output "The following id is not unique: 123" if you had two @id attributes with the value "123". The XPaths can get complicated pretty quickly, which is why most of them should be moved into XSLT files, but it might still be more convenient to just use the XPath in the check config file itself. As the check writer has full control over the return message, it can be as simple or complicated as needed.
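The core "anything but true is a fail" rule is small enough to sketch. The following is hypothetical, not CheckXML's code. It uses the JDK's built-in javax.xml.xpath, which only supports XPath 1.0, so the distinct-values() check is replaced by an XPath 1.0 stand-in that looks for an @id equal to an earlier one. A real runner would need an XPath 2.0 engine such as Saxon to evaluate the checks as written above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sketch of evaluating one XPath check, CheckXML-style. */
public class CheckSketch {

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates the check as a string: null means pass ("true"),
    // anything else is returned as the fail message
    public static String runCheck(Document doc, String xpath) throws Exception {
        String result = XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        return "true".equals(result) ? null : result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<doc><item id='123'/><item id='123'/></doc>");
        // XPath 1.0 stand-in for the uniqueness check: no @id equals an earlier one
        String check = "not(//@id[. = ../preceding::*/@id])";
        String fail = runCheck(doc, check);
        System.out.println(fail == null ? "PASS" : "FAIL: " + fail); // FAIL: false
    }
}
```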

The CheckConfig files can have multiple <xml> elements (<transform> elements for checking transforms) with each one having as many <check>'s as required. A CheckSuite will point to multiple CheckConfig files. CheckXML will be callable from Ant, with any fails causing the Ant build to fail (with all details of the fail in the logs). I'm also planning a GUI with a nice green/red progress bar :) and a CheckConfig editor, but that's down the line.