* Note: This transform has been updated http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html
I wrote a rudimentary csv to XML converter a while back which broke when the csv contained quoted values, eg
foo, "foo, bar", bar
Dealing with these quotes is surprisingly hard, especially when you take into account quotes are escaped by doubling them.I raised it on xsl-list, and Abel Braaksma came up with a genious solution - the technique is to use both sides of analyze-string - read his post for the best explanation.
To keep the transform generic I've used an attribute instead of an element for the column names to cope with names that aren't valid QNames (for example ones that contain a space) - for my own use would add a function to convert names to valid QNames and then change <elem name="{...}"> to <xsl:element name="{fn:getQName(...)}"> as it generates nicer XML. I've also modified the non-matching-substring side of analyze-string to only return tokens that contain values.
So this sample input:
Col 1, Col 2, Col 3
foo, "foo,bar", "foo:""bar"""
...creates this output:
<root>
<row>
<elem name="Col 1">foo</elem>
<elem name="Col 2">foo,bar</elem>
<elem name="Col 3">foo:"bar"</elem>
</row>
</root>
Here's the finished transform:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="fn"
exclude-result-prefixes="xs fn">
<xsl:output indent="yes" encoding="US-ASCII"/>
<xsl:param name="pathToCSV" select="'file:///c:/csv.csv'"/>
<xsl:function name="fn:getTokens" as="xs:string+">
<xsl:param name="str" as="xs:string"/>
<xsl:analyze-string regex='("[^"]*")+' select="$str">
<xsl:matching-substring>
<xsl:sequence select='replace(., "^""|""$|("")""", "$1")'/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:for-each select="tokenize(., '\s*,\s*')">
<xsl:sequence select="."/>
</xsl:for-each>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:function>
<xsl:template match="/" name="main">
<xsl:choose>
<xsl:when test="unparsed-text-available($pathToCSV)">
<xsl:variable name="csv" select="unparsed-text($pathToCSV)"/>
<xsl:variable name="lines" select="tokenize($csv, '
')" as="xs:string+"/>
<xsl:variable name="elemNames" select="fn:getTokens($lines[1])" as="xs:string+"/>
<root>
<xsl:for-each select="$lines[position() > 1]">
<row>
<xsl:variable name="lineItems" select="fn:getTokens(.)" as="xs:string+"/>
<xsl:for-each select="$elemNames">
<xsl:variable name="pos" select="position()"/>
<elem name="{.}">
<xsl:value-of select="$lineItems[$pos]"/>
</elem>
</xsl:for-each>
</row>
</xsl:for-each>
</root>
</xsl:when>
<xsl:otherwise>
<xsl:text>Cannot locate : </xsl:text>
<xsl:value-of select="$pathToCSV"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
25 comments:
I confess -- I'm a newbie when it comes to using XSLT. Can you describe the steps to run this XSLT? I have the .CSV file, but I don't know what the next step is.
Thanks
Andrew - this looks really good and exactly what I am looking for.
However I get an error when I try to run this as follows:
$ java -cp "c:\qdata\saxon\saxon8.jar" net.sf.saxon.Transform -it main csv2xml3
.xsl
Error on line 11 of file:/c:/qdata/Telco/csv2xml3.xsl:
XTTE0780: An empty sequence is not allowed as the result of function fn:getTo
ens()
Transformation failed: Run-time errors were reported
Any ideas what might cause that?
The line is the start of the function definition.
Possibly leading or trailing new lines causing empty lines to be processed, eg an extra CR on the end of the last line.
I should probably modify this to cope better...
By the way, make sure you use the latest version which is linked to at the top of the page, or just click here
Strange. The version on this page now works fine (not sure what I did wrong before) but the version on the page that you linked to gives this error:
$ java -cp saxon/saxon8.jar net.sf.saxon.Transform -it main csv2xml2.xsl
Error on line 9 of file:/c:/qdata/Telco/csv2xml2.xsl:
XPDY0002: The context item for axis step child::element(test.csv) is undefined
Transformation failed: Run-time errors were reported
The CSV file is:
Col 1, Col 2, Col 3
foo, "foo,bar", "foo:""bar"""
one, two, three
Thanks for your help. I will use the version I have working now.
That error suggests it's treating the path to the csv as an XPath and not a string, which is usually because the parameter value is not wrapped in single quotes.
The $pathToCSV parameter here should really be typed as xs:string which would give a much better error message. One more thing on the TODO list....
So just to be explicit - when you've changed the default parameter value you've deleted the single quotes too. You have to ensure that your new value (which ends in "test.csv") is wrapped in single quotes to make it a string. (or just supply the new value as a parameter to the transform, rather than edit the default)
If you wanted to type the parameter as I mentioned in my last comment then change the parameter definition to:
<xsl:param
name="pathToCSV"
select="'file:///c:/csv.csv'"
as="xs:string"/>
Adding the "as" attribute will ensure that $pathToCSV is always a single string and give a helpful error message if you forget the single quotes.
If you haven't already, take a look at Kernow which is a GUI for Saxon so you don't have to use the command line.
Thanks that was the problem and the change has fixed things and made the errors better.
I'll take a look at Kernow.
Is this only used to manually transform a csv to xml or can it be used or modified to run on a server? I have a website that executes .xslt pages and I want to include .csv files and read them from there - The error I'm getting now is:
'unparsed-text-available()' is an unknown XSLT function.
Any ideas if there is a solution? Thanks!
This is an XSLT 2.0 transform so it requires an XSLT 2.0 processor, such as Saxon 9.
Hi Andrew,
We are looking for something that takes a detailed CSV and gives us a summarized XML based on some metadata.
For example, if we have a CSV that lists all the people in the US in the following format.
State, County, City, Name
We'd like to take this CSV and get an XML that shows for each "State" the "COUNT" of persons. "State" and "COUNT" would be parameterized.
What would be the most scalable approach to converting such CSV to a Grouped-XML?
Appreciate the help.
I would do it in 2 stages: convert the CSV to XML, then use XSLT 2.0 to do the grouping.
Thanks Andrew. That's an option we considered but were wondering if the performance would take a hit, in case we used a Dynamic XSLT (with parameters). The grouping would be decided based on metadata and isn't static.
If performance would be an issue, is there a more performant way to do it?
I really wouldn't expect performance to be problem here.
I would go with the two stage approach, measure the performance, and then if it really is a problem explore other solutions.
For assistance with the grouping, maybe join the xsl-list (google it) and ask on there.
Alternatively, put together a set of requirements and I'll give you a quote :)
Hi Andrew:
If I want to use this program to batch convert some csv files, how do I do it?
Any help is appreciated
Hi, nice work!
Unfortunately I can't manage to run it using :
- java 1.6.0_18 or 1.5.0_20
- kernow 1.6.1
- saxon 9.1.0.8
Whatever I use for the csv file path it always complain that the file can't be located.
ex : Cannot locate : file:///d:/csv.csv
I've the same problem using command line on saxon alone.
Could someone help me please.
hi
thanks for you post. It help me very much. I just have one question.
My .csv file content UTF-8 character like: Jürgen
The xml will export successful when i remove this character.
Replace the encoding :
encoding="US-ASCII"
by
encoding="UTF-8"
It should help
I did it but it does not work.
My data in .csv file is
Datum;Tag;Ein;Aus;Total;Task;Arbeitsnotiz;Tag Total;Tagesnotiz
23.11.2011;Mi;18:00;18:00;00:00;Safety;Jürgen 1131 User Lib;;
It alway show "Cannot locate :" problem, but i remove "Jürgen" text, then it works.
Could someone help me please.
I changed it like this:
xsl:output method="xml" encoding="utf-8"
Hi i need to convert a plain text file to plain text file using xslt2.0 is it possible
eg:
input file:
haii
this
is for test
Out put file:
haii
this
is for test
HI , thnks for sharing this. I have exact requiremnt.
I am trying to cpyu the code into XSLT in j developer and it is giving me lot of error.
Can u plz suggets if i can sue this code directly into .xsl file and compile it using j developer
HI , thnks for sharing this. I have exact requiremnt.
I am trying to cpyu the code into XSLT in j developer and it is giving me lot of error.
Can u plz suggets if i can sue this code directly into .xsl file and compile it using j developer
Post a Comment