Thursday, August 16, 2007

When a = b and a != b both return true...

In XPath, = and != are general comparison operators that work on whole sequences. That is, they return true if the comparison holds for any item on the left-hand side paired with any item on the right-hand side. Or in other words:

some $x in $seqA, $y in $seqB satisfies $x op $y

...where "op" is = or != (or > or < etc)

To demonstrate this, take $seqA as ('a', 'b') and $seqB as ('b', 'c'):

$seqA = $seqB returns true because both sequences contain 'b'

$seqA != $seqB returns true because $seqA contains 'a', which is not equal to 'c' in $seqB
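
As a minimal sketch of that expansion, the following XQuery evaluates each general comparison next to its spelled-out "some ... satisfies" form. It assumes only the two example sequences; eq and ne are the single-item value comparisons that play the role of "op" here.

```xquery
let $seqA := ('a', 'b')
let $seqB := ('b', 'c')
return (
  (: general comparison: true, because 'b' eq 'b' for one pair :)
  $seqA = $seqB,
  (: the same test written out as a quantified expression :)
  (some $x in $seqA, $y in $seqB satisfies $x eq $y),
  (: general comparison: true, because e.g. 'a' ne 'c' for one pair :)
  $seqA != $seqB,
  (: the same test written out as a quantified expression :)
  (some $x in $seqA, $y in $seqB satisfies $x ne $y)
)
```

Result: true true true true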

This catches me out a lot, even though I've been caught out before several times. I really have to think hard about what it is exactly that I'm comparing, and still end up getting it wrong.

A simple rule to follow is "never use != where both sides are sequences of more than one item". 99.9% of the time you won't need to, as much as it feels like the right thing to do.
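
What is usually meant instead is "no item on the left equals any item on the right", which can be written by negating = rather than using !=. The sketch below, again assuming only the two example sequences, shows that form alongside an explicit "every ... satisfies" version and the existential != for contrast. The last line suggests that the same care is probably worth taking when only one side holds several items.

```xquery
let $seqA := ('a', 'b')
let $seqB := ('b', 'c')
return (
  (: "the sequences share no item": false, both contain 'b' :)
  not($seqA = $seqB),
  (: the same question spelled out: every pair must differ :)
  (every $x in $seqA, $y in $seqB satisfies $x ne $y),
  (: existential !=: true as soon as any one pair differs :)
  $seqA != $seqB,
  (: a single item against a sequence is existential too: 'a' ne 'b' :)
  'a' != $seqA
)
```

Result: false false true true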

Below are some of the most common operations on sequences, put together for reference.

The two sequences are ('a', 'b') and ('b', 'c'), which can be defined in XSLT as:

<xsl:variable name="seqA" select="('a', 'b')" as="xs:string+"/>
<xsl:variable name="seqB" select="('b', 'c')" as="xs:string+"/>

or in XQuery as:

let $seqA := ('a', 'b')
let $seqB := ('b', 'c')



Select all items in both sequences

($seqA, $seqB)

Result: a b b c



Select all items in both sequences, eliminating duplicates

distinct-values(($seqA, $seqB))

Result: a b c



Select all items that occur in $seqA but not $seqB

$seqA[not(. = $seqB)]

Result: a



Select all items that occur in both sequences

$seqA[. = $seqB]

Result: b



Select all items that occur in only one of the two sequences

($seqA[not(. = $seqB)],$seqB[not(. = $seqA)])

or

($seqA, $seqB)[not(. = $seqA[. = $seqB])]


Result: a c



Determine if both sequences are identical (the same items in the same order)

deep-equal($seqA, $seqB)

Result: false
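
Note that deep-equal() compares position by position, so two sequences holding the same items in a different order are not deep-equal. The sketch below illustrates this; $reordered is a hypothetical variable introduced only for this example. The last expression is one possible order-insensitive test that also ignores duplicates, not the only way to write it.

```xquery
let $seqA := ('a', 'b')
let $reordered := ('b', 'a')   (: hypothetical: same items as $seqA, swapped :)
return (
  (: false: same items, different order :)
  deep-equal($seqA, $reordered),
  (: true: same items, same order :)
  deep-equal($seqA, ('a', 'b')),
  (: true: every item of each sequence occurs in the other :)
  (every $x in $seqA satisfies $x = $reordered)
    and (every $y in $reordered satisfies $y = $seqA)
)
```

Result: false true true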



Test if all items in the sequence are different

count(distinct-values($seqA)) eq count($seqA)

Result: true
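
For contrast, here is a small sketch of the same test on a sequence that does contain a repeat. $seqC is a hypothetical third sequence, not one of the two used above. The second expression is one way to list the repeated values, using index-of() to count how often each item occurs.

```xquery
let $seqC := ('a', 'b', 'a')   (: hypothetical sequence with a duplicate :)
return (
  (: false: 2 distinct values, but 3 items :)
  count(distinct-values($seqC)) eq count($seqC),
  (: the values that occur more than once :)
  distinct-values($seqC[count(index-of($seqC, .)) gt 1])
)
```

Result: false a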


2 comments:

garth said...

Hi

XSLT looks very interesting. I have written a short program in C that is able to solve this Sudoku 2500 times a second. However, if I try it with the collection of 34,000 17-clue Sudokus available on the web, it can only do 100 per second. I would be happy to share the code since I am sure that it could be greatly improved for better speed. Let me know if anyone is interested.

Garth

Andrew Welch said...

I guess you intended to post that against the Sudoku article... For your benefit and everyone else's: the goal was to write a Sudoku solver *using XSLT*. There are countless solvers in other languages; this blog is interested in the ones involving XSLT.