Friday, February 23, 2007

A CSV to XML converter in XSLT 2.0

* Note: This transform has been updated: http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html


I wrote a rudimentary CSV to XML converter a while back which broke when the CSV contained quoted values, e.g. foo, "foo, bar", bar. Dealing with these quotes is surprisingly hard, especially when you take into account that quotes are escaped by doubling them.

I raised it on xsl-list, and Abel Braaksma came up with an ingenious solution - the technique is to use both sides of analyze-string (the matching and the non-matching substrings) - read his post for the best explanation.

To keep the transform generic I've used an attribute instead of an element for the column names, to cope with names that aren't valid QNames (for example ones that contain a space). For my own use I would add a function to convert names to valid QNames and then change <elem name="{...}"> to <xsl:element name="{fn:getQName(...)}">, as it generates nicer XML. I've also modified the non-matching-substring side of analyze-string to only return tokens that contain values.
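The name-conversion function mentioned above isn't shown in this post. Purely as an illustration, here is a rough Java sketch of what such a conversion could do. The class and method names (`XmlNames`, `toElementName`) are made up for this example, and the rules are a simplification of the XML name grammar: any character that isn't a letter, digit, '.', '-' or '_' becomes '_', and a '_' is prefixed when the first character can't start a name.

```java
class XmlNames {

    // Hypothetical helper: turns a CSV column heading into a string that is
    // safe to use as an XML element name. A simplification of the XML Name
    // production, not a complete implementation of it.
    static String toElementName(String heading) {
        String trimmed = heading.trim();
        if (trimmed.isEmpty()) {
            return "_";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            boolean ok = Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.';
            sb.append(ok ? c : '_');
        }
        // A name may not start with a digit, '-' or '.', so pad it if needed.
        char first = sb.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            sb.insert(0, '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toElementName("Col 1"));   // Col_1
        System.out.println(toElementName("2nd col")); // _2nd_col
    }
}
```

Note that two different headings can map to the same name (for example "Col 1" and "Col_1"), which is one reason the generic transform sticks with the name attribute.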

So this sample input:

Col 1, Col 2, Col 3
foo, "foo,bar", "foo:""bar"""


...creates this output:

<root>
<row>
<elem name="Col 1">foo</elem>
<elem name="Col 2">foo,bar</elem>
<elem name="Col 3">foo:"bar"</elem>
</row>
</root>

Here's the finished transform:


<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="fn"
exclude-result-prefixes="xs fn">

<xsl:output indent="yes" encoding="US-ASCII"/>

<xsl:param name="pathToCSV" select="'file:///c:/csv.csv'"/>

<xsl:function name="fn:getTokens" as="xs:string+">
<xsl:param name="str" as="xs:string"/>
<xsl:analyze-string regex='("[^"]*")+' select="$str">
<xsl:matching-substring>
<xsl:sequence select='replace(., "^""|""$|("")""", "$1")'/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:for-each select="tokenize(., '\s*,\s*')[.]">
<xsl:sequence select="."/>
</xsl:for-each>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:function>

<xsl:template match="/" name="main">
<xsl:choose>
<xsl:when test="unparsed-text-available($pathToCSV)">
<xsl:variable name="csv" select="unparsed-text($pathToCSV)"/>
<xsl:variable name="lines" select="tokenize($csv, '&#xa;')" as="xs:string+"/>
<xsl:variable name="elemNames" select="fn:getTokens($lines[1])" as="xs:string+"/>
<root>
<xsl:for-each select="$lines[position() > 1]">
<row>
<xsl:variable name="lineItems" select="fn:getTokens(.)" as="xs:string+"/>

<xsl:for-each select="$elemNames">
<xsl:variable name="pos" select="position()"/>
<elem name="{.}">
<xsl:value-of select="$lineItems[$pos]"/>
</elem>
</xsl:for-each>
</row>
</xsl:for-each>
</root>
</xsl:when>
<xsl:otherwise>
<xsl:text>Cannot locate : </xsl:text>
<xsl:value-of select="$pathToCSV"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

</xsl:stylesheet>
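For comparison, the quoting rules the transform has to cope with (commas inside quotes are literal, and a doubled quote stands for one quote) can also be handled with a small character-by-character scanner. The Java sketch below is not a port of the analyze-string technique, just a different way of getting the same tokens for the sample input; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLine {

    // Splits one CSV line into fields. A field may be wrapped in double
    // quotes, in which case commas are literal and "" stands for one quote.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;  // currently inside a quoted section
        boolean wasQuoted = false; // the current field had a quoted section
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote -> one literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                // Drop any padding that came before the opening quote.
                if (field.toString().trim().isEmpty()) {
                    field.setLength(0);
                }
                inQuotes = true;
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(finish(field, wasQuoted));
                field.setLength(0);
                wasQuoted = false;
            } else if (wasQuoted && Character.isWhitespace(c)) {
                // Skip padding between a closing quote and the next comma.
            } else {
                field.append(c);
            }
        }
        fields.add(finish(field, wasQuoted));
        return fields;
    }

    // Unquoted fields are trimmed; quoted fields are kept exactly as written.
    private static String finish(StringBuilder field, boolean wasQuoted) {
        return wasQuoted ? field.toString() : field.toString().trim();
    }

    public static void main(String[] args) {
        // prints [foo, foo,bar, foo:"bar"]
        System.out.println(split("foo, \"foo,bar\", \"foo:\"\"bar\"\"\""));
    }
}
```

One difference from the transform: this sketch keeps empty fields (so "a,,b" gives three fields), whereas the transform only returns tokens that contain values.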

Monday, February 05, 2007

A Soap Extension Function

Recently I had to write a client to retrieve XML from a web service that required authentication. All this meant was that the credentials needed to be in the SOAP header, e.g.:

<soapenv:Envelope>
<soapenv:Header>
<c:AuthHeader>
<c:Username>foo</c:Username>
<c:Password>bar</c:Password>
</c:AuthHeader>
</soapenv:Header>
<soapenv:Body>
...
Anyone who's coded any Java web service clients will know how hard it is to get the methods generated for you to add this SOAP header to your calls... it's a nightmare. It varies between generation tools and the specification level you're coding to. What makes it worse is that the whole reason you go through this pain is to send XML down the wire and get XML back. It would be much nicer if you could just make the call in XSLT....

I've written the following extension function to do just that. It accepts a SOAP request and an endpoint, makes the request and returns the SOAP response. It leaves the "complexity" of creating the request and processing the response to the XSLT, where it's pretty straightforward. Here's the Java:

package net.sf.kernow.soapextension;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.ProtocolException;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import net.sf.saxon.om.NodeInfo;

/**
* Enables the calling of SOAP based web services from XSLT.
* @author Andrew Welch
*/
public class SOAPExtension {

public static String soapRequest(NodeInfo requestXML, String endpoint) {
String result = makeCall(transformToString(requestXML), endpoint);
return result;
}

public static String soapRequest(String requestXML, String endpoint) {
String result = makeCall(requestXML, endpoint);
return result;
}

private static String transformToString(NodeInfo sourceXML) {

StringWriter sw = new StringWriter();

try {
TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
Transformer transformer = tFactory.newTransformer();
transformer.transform(sourceXML, new StreamResult(sw));
} catch (TransformerConfigurationException ex) {
ex.printStackTrace();
} catch (TransformerException ex) {
ex.printStackTrace();
}

return sw.toString();
}

private static String makeCall(String requestXML, String endpoint) {

String SOAPUrl = endpoint;
StringBuffer responseBuf = new StringBuffer();

try {
// Create the connection to the endpoint
URL url = new URL(SOAPUrl);
URLConnection connection = url.openConnection();
HttpURLConnection httpConn = (HttpURLConnection) connection;

byte[] b = requestXML.getBytes("UTF-8");

// Set the appropriate HTTP parameters.
httpConn.setRequestProperty( "Content-Length", String.valueOf(b.length));
httpConn.setRequestProperty("Content-Type","text/xml; charset=utf-8");

httpConn.setRequestMethod("POST");
httpConn.setDoOutput(true);
httpConn.setDoInput(true);

// Send the request
OutputStream out = httpConn.getOutputStream();
out.write(b);
out.close();

// Read the response and write it to the response buffer.
InputStreamReader isr = new InputStreamReader(httpConn.getInputStream(), "UTF-8");
BufferedReader in = new BufferedReader(isr);

String line;
do {
line = in.readLine();
if (line != null) {
responseBuf.append(line);
}
} while (line != null);

in.close();

} catch (ProtocolException ex) {
ex.printStackTrace();
} catch (MalformedURLException ex) {
ex.printStackTrace();
} catch (UnsupportedEncodingException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
}

return responseBuf.toString();
}
}
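Two details of the read loop in makeCall() are worth knowing about: readLine() strips line terminators, so a multi-line response comes back joined onto one line, and an InputStreamReader constructed without a charset uses the platform default. A possible alternative is sketched below. It assumes the response is UTF-8 (a real client would look at the Content-Type header), and the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ResponseReader {

    // Reads the whole stream as UTF-8 text, keeping line breaks intact.
    // Assumption: the server really does respond in UTF-8.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for httpConn.getInputStream(), so the sketch runs offline.
        byte[] bytes = "<a>\n<b>caf\u00e9</b>\n</a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(bytes)));
    }
}
```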

I've put the extension function in the "net.sf.kernow.soapextension" package and called it SOAPExtension (it will be in the 1.5 version of Kernow when I eventually release it). Now the XSLT to make and process the requests:

<xsl:stylesheet version="2.0"
xmlns:soap="net.sf.kernow.soapextension.SOAPExtension"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://saxon.sf.net/"
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">

<xsl:param name="endpoint" select="'http://somewebservice'"/>

<xsl:variable name="request">
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ws="http://somewebservice/">
<soapenv:Body>
<ws:getSomething>
<urn>name123</urn>
</ws:getSomething>
</soapenv:Body>
</soapenv:Envelope>
</xsl:variable>

<xsl:template match="/" name="main">
<xsl:apply-templates select="saxon:parse(soap:soapRequest($request, $endpoint))" mode="process-SOAP-message"/>
</xsl:template>

<xsl:template match="/" mode="process-SOAP-message">
<xsl:apply-templates select="saxon:parse(soapenv:Envelope/soapenv:Body/*/return/node())" mode="process-response-payload"/>
</xsl:template>

<xsl:template match="/" mode="process-response-payload">
<xsl:apply-templates/>
</xsl:template>


</xsl:stylesheet>
There are a couple of things to notice - firstly you call soapRequest() with the message as a document node, and the endpoint as a string. The extension will also accept the message as a string, but that would just require the extra step of saxon:serialize($request).

Secondly you need to use saxon:parse to parse the response string into XML. Applying templates to saxon:parse() will search for the root matching template, so to avoid endless loops different modes are used to separate the various root matching templates.

The template in the mode "process-SOAP-message" deals with processing the SOAP response, so the root element here would be <soapenv:Envelope>. To get to the actual payload (and to treat it as a document in its own right) I use:

saxon:parse(soapenv:Envelope/soapenv:Body/*/return/node())

...and a third root matching template in the mode "process-response-payload" (the actual path may vary for your payload). In this template you deal with the actual response, so you can apply templates to it, write it to disk, etc.
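The two-step parse can be hard to picture, so here is the same idea sketched in plain Java with the JDK's DOM and XPath classes rather than saxon:parse. It assumes, as the stylesheet does, that the payload arrives as escaped XML text inside a no-namespace <return> element. The sample response, the getSomethingResponse wrapper and the class and method names are all invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SoapPayload {

    // Parses a string of XML into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // First parse: the SOAP envelope. Second parse: the text of <return>,
    // which is assumed to be an escaped XML document in its own right.
    static Document extractPayload(String soapResponse) throws Exception {
        Document envelope = parse(soapResponse);
        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() avoids having to register a namespace context here.
        String payloadText = xpath.evaluate(
                "/*[local-name()='Envelope']/*[local-name()='Body']/*/return",
                envelope);
        return parse(payloadText);
    }

    public static void main(String[] args) throws Exception {
        String response =
                "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
              + "<soapenv:Body>"
              + "<ws:getSomethingResponse xmlns:ws=\"http://somewebservice/\">"
              + "<return>&lt;payload&gt;&lt;item&gt;42&lt;/item&gt;&lt;/payload&gt;</return>"
              + "</ws:getSomethingResponse>"
              + "</soapenv:Body>"
              + "</soapenv:Envelope>";
        Document payload = extractPayload(response);
        System.out.println(payload.getDocumentElement().getNodeName()); // payload
    }
}
```

If the <return> element is missing, the XPath yields an empty string and the second parse throws, so a real client would want to check for that first.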

And that's it, it really is as simple as that. The S in SOAP can mean Simple :)

Sunday, February 04, 2007

Testing XSLT - CheckXML

It's well known that XSLT isn't easily unit-testable, and there isn't currently a standard way of testing transforms for correctness. I've long thought that the only way to do this is to run the transform using a given input and check the result, and through that infer correctness in the transform.


I wrote a stylesheet to do this in XSLT 2.0 with heavy use of extensions (to perform the transforms and execute the XPaths) which was nice from an academic standpoint, but it soon became clear that this would be more useful as a Java app runnable from Ant.


This little project grew into a way of checking any XML file (to check a transform it runs the transform first and then checks the result). I'm provisionally calling this "CheckXML" and it's still early days but I think it's got the potential to be something really good.

CheckXML will allow you to perform various checks on an XML file - XML Schema, XPath 2.0, XSLT 2.0, XQuery and Relax NG. This allows users to augment schema checks with XPath/XSLT checks to fully check the correctness of an XML file. A sample CheckXML configuration file would be:

<checkXML>
 <xml src="SampleXML.xml">
  <check>SampleXSD.xsd</check>
  <check>count(distinct-values(//@id)) = count(//@id)</check>
  <check>SampleXSLT.xslt</check>
 </xml>
</checkXML>

Here the XML file "SampleXML.xml" first has the XML Schema SampleXSD.xsd applied to it, then the XPath in the second check and finally the XSLT check. The CheckXML app will run each check and look for a result of "true" - any value that isn't "true" will be reported as a fail. This is the crucial point - it offloads the responsibility of creating a useful error to the check writer, and because the check writer has the full power of XSLT/XPath/XQuery the error message can be as detailed as necessary (XSD/RNG checks will return the error message if the XML isn't valid).
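The "anything other than true is a fail" rule is simple enough to sketch. The Java below is not CheckXML's actual code, just an illustration of the rule using the XPath engine built into the JDK. That engine only supports XPath 1.0, so distinct-values() isn't available and the uniqueness check is written with the preceding:: axis instead. The class name, method name and sample documents are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CheckRunnerSketch {

    // Evaluates each XPath check as a string against the document. A result
    // of "true" is a pass; any other value is recorded as the failure text.
    static List<String> run(String xml, List<String> checks) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> failures = new ArrayList<>();
        for (String check : checks) {
            String result = xpath.evaluate(check, doc);
            if (!"true".equals(result)) {
                failures.add(check + " -> " + result);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        // XPath 1.0 stand-in for the uniqueness check: true when no item
        // shares its id with an earlier item.
        List<String> checks = Arrays.asList("not(//item[@id = preceding::item/@id])");
        String good = "<root><item id='1'/><item id='2'/></root>";
        String bad = "<root><item id='1'/><item id='1'/></root>";
        System.out.println(run(good, checks)); // []
        System.out.println(run(bad, checks));  // one failure, with result "false"
    }
}
```

Because XPath 1.0 has no if/then/else, a check here can only fail with "false"; the detailed messages described in this post need XPath 2.0 or XSLT.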

For example, modifying the XPath above to return a helpful message could be like this:

<check>if (count(distinct-values(//@id)) = count(//@id)) then 'true' else concat('The following id is not unique: ', string-join(distinct-values(for $x in //@id return $x[count(//@id[. = $x]) > 1]/string()), ', '))</check>

This would output "The following id is not unique: 123" if you had two @id attributes with the value "123". The XPaths can get pretty complicated pretty quickly, which is why most of them should be moved into XSLT files, but it still might be more convenient to just use the XPath in the check config file itself. As the check writer has full control over the return message, it can be as simple or complicated as needed.

The CheckConfig files can have multiple <xml> elements (<transform> elements for checking transforms), with each one having as many <check> elements as required. A CheckSuite will point to multiple CheckConfig files. CheckXML will be callable from Ant, with any fails causing the Ant build to fail (with all details of the failure in the logs). I'm also planning a GUI with a nice green/red progress bar :) and a CheckConfig editor, but that's down the line.