The eXtensible Markup Language (XML) is the de facto standard for platform-independent data interchange. In "Haskell and XML: Generic Combinators or Type-Based Translation?", Malcolm Wallace and Colin Runciman outline two different approaches to XML document processing in Haskell.

The first approach uses the generic tree structure of XML data to represent XML documents and can be implemented without further tool support. The second approach translates XML document type definitions into algebraic datatypes and provides a mechanism to generate functions to read and write XML documents as typed data.

Both approaches can be mixed by providing type-based combinators that construct conversion functions between data terms and generic XML trees. No external tools are necessary, and XML data can be converted into typed values using concise specifications expressed in Curry.

Primitive Values
Complex XML Data
Tips and Tricks


Primitive Values

To convert between primitive values and their XML representation we provide converters of type XPrimConv a:

int    :: XPrimConv Int
float  :: XPrimConv Float
char   :: XPrimConv Char
string :: XPrimConv String

The XML representation of a primitive value is just a string without an enclosing tag. The type XPrimConv a is an instance of the more general type XmlConv _ _ a, which has two additional type parameters. For the moment you can safely ignore them - we will come back to them later. There is also a combinator

empty :: a -> XPrimConv a

for values of type a without XML representation.

XML Elements

I admit that values without XML representation are a bit unusual. However, they make sense if we combine them with a combinator that adds an enclosing tag to an arbitrary XML representation:

element :: String -> XmlConv _ _ a -> XElemConv a

This function takes a tag name and an XML converter and returns another XML converter that adds an enclosing tag to the XML data represented by the given converter. We provide shortcuts that directly create an enclosing tag for all provided combinators. So there is also a combinator

eEmpty :: String -> a -> XElemConv a
eEmpty name a = element name (empty a)

to create a converter that represents a value as an empty tag with the given name. For the primitive types we provide

eInt    :: String -> XElemConv Int
eFloat  :: String -> XElemConv Float
eChar   :: String -> XElemConv Char
eString :: String -> XElemConv String

For example, we could specify a converter for numbers as follows:

cNumber :: XElemConv Int
cNumber = eInt "number"

The XML representation of the number 42 corresponding to this converter is

<number>42</number>

As you might have guessed, XElemConv is another instance of the type

data XmlConv rep elem a = ... -- abstract

The first type parameter rep will be important in a moment - the second one elem specifies whether the corresponding converter represents an XML element or not. For example, primitive values do not represent XML elements and we will see other XML data that also doesn’t. To specify whether a converter represents an XML element, we provide the phantom types

data Elem
data NoElem

These types have no values. They are only used during type inference to ensure constraints on XML converters. For example, we provide functions

xmlRead :: XmlConv _ Elem a -> XmlExp -> a
xmlShow :: XmlConv _ Elem a -> a -> XmlExp

that must be called with XML converters that represent elements, and the type system ensures that they always are.
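To make the two directions concrete, here is a minimal sketch of a round trip with the cNumber converter from above. It assumes that the combinators are imported from a module named XmlConv and that the generic tree type XmlExp and the pretty-printer showXmlDoc come from a module named XML, as in the PAKCS distribution; the names numberXml and roundTrip are only chosen for illustration.

```curry
import XML     (XmlExp, showXmlDoc)  -- assumed module for generic XML trees
import XmlConv                       -- assumed module for the combinators

-- Converter from the text: an Int inside a <number> element.
cNumber :: XElemConv Int
cNumber = eInt "number"

-- Show direction: turn the value 42 into a generic XML tree.
numberXml :: XmlExp
numberXml = xmlShow cNumber 42

-- Read direction: parse the generic tree back into an Int.
roundTrip :: Int
roundTrip = xmlRead cNumber numberXml

main :: IO ()
main = do
  putStrLn (showXmlDoc numberXml)  -- prints the XML document as text
  print roundTrip                  -- should give back the original number
```

Passing a converter that does not represent an element, such as int, to xmlShow or xmlRead should already be rejected at compile time.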

Attributes

Attributes are another example of XML data that does not represent an element. We provide converters of type XAttrConv a that represent primitive values as attributes:

aInt    :: String -> XAttrConv Int
aFloat  :: String -> XAttrConv Float
aChar   :: String -> XAttrConv Char
aString :: String -> XAttrConv String

For example, we could define an XML converter for numbers represented as attributes as

cNumber :: XElemConv Int
cNumber = element "number" (aInt "value")

The representation of the number 42 corresponding to this converter is

<number value="42" />

To represent arbitrary values that have a string representation as an attribute, there is a combinator

attr :: String -> (String->a,a->String) -> XAttrConv a

For convenience, we also provide a function to construct an attribute converter for boolean values:

aBool :: String -> String -> String -> XAttrConv Bool

The provided strings are the attribute name, the representation of True and the representation of False respectively.
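As a small illustration of both combinators, the following sketch defines an attribute converter for a hypothetical enumeration type Color and an element with a boolean attribute. The type Color, the helpers readColor and showColor, and the tag and attribute names are assumptions made up for this example; only attr, aBool and element are taken from the text, and the module name XmlConv is assumed.

```curry
import XmlConv  -- assumed module name of the combinator library

-- Hypothetical enumeration type used only for this example.
data Color = Red | Green

-- String representation of colors; readColor fails on unknown strings.
readColor :: String -> Color
readColor "red"   = Red
readColor "green" = Green

showColor :: Color -> String
showColor Red   = "red"
showColor Green = "green"

-- Attribute converter built from the pair of conversion functions.
aColor :: String -> XAttrConv Color
aColor name = attr name (readColor, showColor)

-- Element with a boolean attribute: "yes" stands for True, "no" for False.
cActive :: XElemConv Bool
cActive = element "account" (aBool "active" "yes" "no")
```

By analogy with the number example, cActive would represent True as an account element whose attribute active has the value yes.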

Complex XML Data

Honestly, converting primitive values into XML is not that exciting. We need combinators to construct complex converters from simple ones. To be able to handle arbitrary XML data, we need to model converters for optional XML data, repeated XML elements, sequences of XML data and choices. In the following, we will discuss one after the other in detail.

Optional Data

In Curry, optional values are represented as values of type Maybe a. In XML, data can be just present or missing. We provide a combinator to construct a converter for arbitrary XML data that can be missing:

opt :: XmlConv _ _ a -> XOptConv (Maybe a)

The function takes an XML converter of type a and returns one of type Maybe a. Nothing is represented by a missing value and Just x by the representation of x corresponding to the given converter. Using this combinator, we can define a converter for optional numbers as:

cOptNumber :: XElemConv (Maybe Int)
cOptNumber = element "number" (opt (aInt "value"))

The representation of Just 42 corresponding to this converter is

<number value="42" />

and the representation of Nothing is

<number />

We could also define

cOptNumber :: XOptConv (Maybe Int)
cOptNumber = opt (eInt "number")

and represent Just 42 as

<number>42</number>

and Nothing as nothing, i.e., without XML representation. However, this converter does not represent an element and can therefore not be used with xmlRead and xmlShow.

Repeated Elements

Lists of values can be modeled in XML by repeating the representation of a value arbitrarily often. However, some XML data must not be repeated: attributes must occur at most once, and multiple primitive values cannot be parsed correctly if they do not have an enclosing tag. Furthermore, parsing a repetition of optional values always yields infinitely many results, as does parsing a repetition of repeated values. In a fair implementation of Curry, we could allow optional values and repetitions to be repeatable and return all results nondeterministically. Without relying on that, we need to restrict repetitions to converters that represent XML data that can safely be repeated. We express this constraint by another pair of phantom types

data Repeatable
data NotRepeatable

and provide a combinator for repetitions that must be applied to a converter representing repeatable XML data:

rep :: XmlConv Repeatable _ a -> XRepConv [a]

This function returns a converter that represents lists as repeated representations of their elements, specified by the given converter. The type system ensures that only repetitions of repeatable elements are defined. For example, the definition

cOptRep :: XRepConv [Maybe Int]
cOptRep = rep (opt int)

will be rejected by the type checker since optional values are not repeatable.
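For contrast, here is a sketch of a repetition that should be accepted, because element converters are repeatable. The tag names and the name cNumbers are chosen for illustration, and the module name XmlConv is an assumption.

```curry
import XmlConv  -- assumed module name of the combinator library

-- Accepted: eInt yields an element converter, which is repeatable.
-- The outer element turns the repetition into a single XML element,
-- so the result can be used with xmlRead and xmlShow.
cNumbers :: XElemConv [Int]
cNumbers = element "numbers" (rep (eInt "number"))
```

With this converter, the list [1,2,3] would be represented as a numbers element that contains three number elements.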

Now we can reveal the definitions of the presented converter types:

type XElemConv a = XmlConv Repeatable Elem a
type XAttrConv a = XmlConv NotRepeatable NoElem a
type XPrimConv a = XmlConv NotRepeatable NoElem a
type XOptConv  a = XmlConv NotRepeatable NoElem a
type XRepConv  a = XmlConv NotRepeatable NoElem a
type XSeqConv  a = XmlConv NotRepeatable NoElem a

The careful reader may notice that we do not need two pairs of phantom types, since we use only two of the four possible combinations. Since this may change in a fair implementation of Curry, we keep the distinction between repeatable XML data and XML data represented by an element.

Sequences

The last converter type presented above is the type for sequence converters. A sequence of values differs from a repetition in that it may contain values of different types. Ideally, a sequence should be repeatable if all components of the sequence are and not repeatable if any component is not repeatable. Unfortunately, we do not know how to express this using phantom types. Therefore, we define sequences as not repeatable by default and provide a special combinator to construct repetitions of sequences of repeatable elements:

seq2    :: (a -> b -> c)
        -> XmlConv _ _ a -> XmlConv _ _ b
        -> XSeqConv c
repSeq2 :: (a -> b -> c)
        -> XmlConv Repeatable _ a -> XmlConv Repeatable _ b
        -> XRepConv [c]

We do not only provide converters for sequences of two elements but also for other reasonable numbers. The first argument of both functions is a reversible function that combines the components of the sequence into a compound value. Usually, you can use a constructor of some datatype as first argument. The remaining arguments specify how the components are represented in XML. You can think of repSeq2 as being defined as

repSeq2 f xa xb = rep (seq2 f xa xb)

although this definition is prohibited by the type checker, since sequences are not repeatable.
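The examples that follow use shortcuts such as eSeq2 and eSeq3, which wrap a sequence in an enclosing tag. Their definitions are not shown in this text; by analogy with eEmpty, one can think of them along the lines of the following sketch. The name elemSeq2 is a hypothetical stand-in chosen to avoid a clash with the library, and the module name XmlConv is an assumption.

```curry
import XmlConv  -- assumed module name of the combinator library

-- Hypothetical stand-in for eSeq2: a sequence of two components
-- wrapped in an element with the given tag name.
-- Its type is left to inference; it should take a tag name, a
-- combining function and two converters, and yield an XElemConv.
elemSeq2 name f xa xb = element name (seq2 f xa xb)
```

Because the result is built with element, it is repeatable and can be passed to xmlRead and xmlShow, unlike a bare sequence.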

With the presented combinators, we can already represent quite complex data structures in XML. Let’s consider a datatype for persons

data Person = Person Name Sex Date
data Name = Name String (Maybe String)
type Sex = Bool
data Date = Date Int Int Int

and a converter for this datatype

cPerson :: XElemConv Person
cPerson = eSeq3 "person" Person cName cSex cBirth

cName :: XElemConv Name
cName = eSeq2 "name" Name (aString "last") (opt (aString "first"))

cSex :: XAttrConv Bool
cSex = aBool "sex" "male" "female"

cBirth :: XAttrConv Date
cBirth = attr "born" (readDate,showDate)

Corresponding to this converter, the value

Person
  (Name "Bach" (Just "Johann Sebastian"))
  True
  (Date 1685 3 21)

is represented as the XML element

<person sex="male" born="21.03.1685">
  <name last="Bach" first="Johann Sebastian" />
</person>
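The converter cBirth relies on the functions readDate and showDate, which are not defined in this text. A minimal sketch, assuming the day.month.year format seen in the example and using the made-up helpers twoDigits, toInt and splitDots, could look like this:

```curry
data Date = Date Int Int Int   -- year, month, day, as in the example value

-- Hypothetical helper: render a number with at least two digits.
twoDigits :: Int -> String
twoDigits n = if n < 10 then '0' : show n else show n

-- Hypothetical helper: convert a string of decimal digits to a number.
toInt :: String -> Int
toInt = foldl (\n c -> 10 * n + (ord c - ord '0')) 0

-- Hypothetical helper: split a string at every dot.
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitDots rest

-- Render a date as day.month.year.
showDate :: Date -> String
showDate (Date y m d) = twoDigits d ++ "." ++ twoDigits m ++ "." ++ show y

-- Parse day.month.year; fails if there are not exactly three parts.
readDate :: String -> Date
readDate s = case splitDots s of
  [d, m, y] -> Date (toInt y) (toInt m) (toInt d)
```

With these definitions, showDate (Date 1685 3 21) evaluates to "21.03.1685", and readDate maps that string back to Date 1685 3 21. Note that toInt does not validate its input, so a real implementation would want to reject non-digit characters.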

If we prefer to represent the birthdate as String inside the person-element, we can use the function

adapt :: (a->b,b->a) -> XmlConv r e a -> XmlConv r e b

and define

cBirth :: XPrimConv Date
cBirth = adapt (readDate,showDate) string

With this converter, the person-element would contain the name-element and the birthdate as mixed content. You should use mixed content very carefully and ensure that the XML representation can always be parsed correctly. In particular, two consecutive primitive representations can never be parsed correctly.

Nondeterministic Parsing

Combining sequences and optional values may result in unexpected nondeterminism. Consider a datatype

data TwoOptNums = Nums (Maybe Int) (Maybe Int)

with two optional components for numbers. If we define a converter for this type as

cTwoOptNums :: XElemConv TwoOptNums
cTwoOptNums = eSeq2 "numbers" Nums cNum cNum

cNum :: XOptConv (Maybe Int)
cNum = opt (eInt "number")

and call xmlRead on an XML representation where one component is missing, we nondeterministically get two results: one with Nothing in the first component and one with Nothing in the second. If you don’t like such surprises, you need to design your converters carefully so that representations can be parsed deterministically. In this example, you could use attributes with different names to represent the optional numbers:

cTwoOptNums :: XElemConv TwoOptNums
cTwoOptNums
  = eSeq2 "numbers" Nums (opt (aInt "n1")) (opt (aInt "n2"))

The same observations hold for repetitions instead of optional values, which can also lead to nondeterministic parsing.
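The following sketch illustrates the repetition case with a hypothetical datatype TwoLists. The type, the converter names and the tag names are assumptions made for this example, as is the module name XmlConv.

```curry
import XmlConv  -- assumed module name of the combinator library

-- Hypothetical datatype with two list components.
data TwoLists = Lists [Int] [Int]

-- Ambiguous: both components are repetitions of the same element,
-- so a parser cannot tell where the first list ends.
cAmbiguous :: XElemConv TwoLists
cAmbiguous
  = eSeq2 "lists" Lists (rep (eInt "number")) (rep (eInt "number"))

-- Unambiguous: each list is wrapped in its own enclosing element.
cTwoLists :: XElemConv TwoLists
cTwoLists = eSeq2 "lists" Lists
  (element "first"  (rep (eInt "number")))
  (element "second" (rep (eInt "number")))
```

Reading a lists element that contains a single number with cAmbiguous would presumably yield both Lists [1] [] and Lists [] [1], whereas cTwoLists leaves only one way to split the input.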

Choices

The nondeterministic features of Curry can also be employed to choose between different alternatives for conversion. With the combinators presented so far, we can’t model datatypes defined by multiple constructors. For this purpose we provide a function

(!) :: XmlConv r e a -> XmlConv r e a -> XmlConv r e a

that nondeterministically combines two XML converters. For example, to convert between trees

data Tree = Leaf Int | Branch Tree Int Tree

and XML, we can define a converter

cTree :: XElemConv Tree
cTree = eSeq1 "leaf" Leaf (aInt "value")
      ! eSeq3 "branch" Branch cTree int cTree

and use it to convert between values like

Branch (Leaf 1) 2 (Leaf 3)

and

<branch>
  <leaf value="1" />2<leaf value="3" />
</branch>

This example shows that even recursive datatypes can be modeled by our XML combinators.

Tips and Tricks

There are several techniques that help to convert between Curry datatypes and XML trees that do not match exactly. In the following we present a few of them by brief examples.

Datatypes with more Structure

We can use anonymous sequences to apply more structure to a given XML representation. For example, if we want to handle XML data of the form

<person
  lastname="Bach"
  firstname="Johann Sebastian"
  sex="male"
  born="21.03.1685" />

with the datatype Person introduced above, we can define a converter

cPerson :: XElemConv Person
cPerson = eSeq3 "person" Person cName cSex cBirth

cName :: XSeqConv Name
cName
  = seq2 Name (aString "lastname") (opt (aString "firstname"))

Compared to the example above, we only need to change the definition of cName such that no extra element is introduced for names.

Datatypes with less Structure

We can also do it the other way round and model the original XML representation of persons with a less structured Curry datatype. For this purpose we need to introduce a smart constructor that creates less structured persons and use this instead of the actual constructor for persons:

data Person = Person String (Maybe String) Sex Date

person :: (String,Maybe String) -> Sex -> Date -> Person
person (last,first) sex born = Person last first sex born

cPerson :: XElemConv Person
cPerson = eSeq3 "person" person cName cSex cBirth

cName :: XElemConv (String,Maybe String)
cName = eSeq2 "name" (,) (aString "last") (opt (aString "first"))


Long Sequences

We can also use a combination of both techniques to model sequences of a length that is not handled by our library - even if we do not want to introduce more structure on the Curry side. We can define a combinator

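seq10 f c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
  = seq2 merge (seq5 (,,,,) c1 c2 c3 c4 c5)
               (seq5 (,,,,) c6 c7 c8 c9 c10)
 where
  -- merge the two five-tuples and pass all ten components to f
  merge (x1,x2,x3,x4,x5) (x6,x7,x8,x9,x10)
    = f x1 x2 x3 x4 x5 x6 x7 x8 x9 x10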

if the provided combinators for sequences don’t suffice to model our datatypes.

Reading Curry Data with less Information

If you only want to parse some XML data into Curry values, you can define a converter that ignores some input. Consider a datatype for persons without birthdate information and a smart constructor that ignores the birthdate:

data Person = Person Name Sex

ignoreBirth :: Name -> Sex -> _ -> Person
ignoreBirth name sex _ = Person name sex

You can use this smart constructor to define an XML parser that ignores birthdates of persons as follows:

cPerson :: XElemConv Person
cPerson = eSeq3 "person" ignoreBirth cName cSex (aString "born")

With this converter you can parse XML data like

<person sex="female" born="29.03.1976">
  <name last="Capriati" first="Jennifer" />
</person>

into the Curry value

(Person (Name "Capriati" (Just "Jennifer")) False)


Showing XML Data with less Information

If you only want to show Curry values as XML data, you can define a converter that hides some parts of the Curry data structure. For example, if you want to generate XML data without birthdate information from the original datatype for persons, you can use the empty converter instead of cBirth:

cPerson :: XElemConv Person
cPerson = eSeq3 "person" Person cName cSex (empty unknown)

Now, you can use this converter to generate XML data like

<person sex="female">
  <name last="Capriati" first="Jennifer" />
</person>

from the Curry value

(Person (Name "Capriati" (Just "Jennifer")) False (Date 1976 3 29))

We presented an approach to concisely construct XML converters with type-based combinators. If you are interested in a more detailed look at our library, look here.