The eXtensible Markup Language (XML) is the de facto standard for platform-independent data interchange. In their paper "Haskell and XML: Generic Combinators or Type-Based Translation?", Malcolm Wallace and Colin Runciman outline two different approaches to XML document processing in Haskell.
The first approach uses the generic tree structure of XML data to represent XML documents and can be implemented without further tool support. The second approach translates XML document type definitions into algebraic datatypes and provides a mechanism to generate functions to read and write XML documents as typed data.
Both approaches can be mixed by providing type-based combinators that construct conversion functions between data terms and generic XML trees. This way, no external tools are necessary, and XML data can be converted into typed values using concise specifications expressed in Curry.
To convert between primitive values and their XML representation we provide converters of type XPrimConv a:
int :: XPrimConv Int
float :: XPrimConv Float
char :: XPrimConv Char
string :: XPrimConv String
The XML representation of a primitive value is just a string without an enclosing tag. The type XPrimConv a is an instance of the more general type XmlConv _ _ a, which has two additional type parameters. For the moment you can safely ignore them; we will come back to them later. There is also a combinator
empty :: a -> XPrimConv a
for values of type a without XML representation.
XML Elements
I admit that values without XML representation are a bit unusual. However, they make sense if we combine them with a combinator that adds an enclosing tag to an arbitrary XML representation:
element :: String -> XmlConv _ _ a -> XElemConv a
This function takes a tag name and an XML converter and returns another XML converter that adds an enclosing tag to the XML data represented by the given converter. For all provided combinators, we also provide shortcuts that directly add an enclosing tag. So there is also a combinator
eEmpty :: String -> a -> XElemConv a
eEmpty name a = element name (empty a)
to create a converter that represents a value as an empty tag with the given name. For the primitive types we provide
eInt :: String -> XElemConv Int
eFloat :: String -> XElemConv Float
eChar :: String -> XElemConv Char
eString :: String -> XElemConv String
For example, we could specify a converter for numbers as follows:
cNumber :: XElemConv Int
cNumber = eInt "number"
The XML representation of the number 42 corresponding to this converter is
<number>42</number>
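To make the effect of element more tangible, here is a toy model written for a recent Curry system such as PAKCS 3 (an assumption). It is not the library's implementation: the type Xml, the record-less type Conv and the helpers showConv, readConv, emptyConv and readNat are all hypothetical. A converter simply pairs a function that renders a value as a list of XML nodes with one that reads it back, and element wraps both directions in a tag.

```curry
-- Hypothetical, simplified XML trees: text nodes and elements without attributes.
data Xml = XText String | XElem String [Xml]
  deriving (Eq, Show)

-- Toy converter: render a value as nodes, and read it back from nodes.
data Conv a = Conv (a -> [Xml]) ([Xml] -> a)

showConv :: Conv a -> a -> [Xml]
showConv (Conv sh _) = sh

readConv :: Conv a -> [Xml] -> a
readConv (Conv _ rd) = rd

-- Primitive integers: a bare text node without an enclosing tag.
int :: Conv Int
int = Conv (\n -> [XText (show n)]) rd
  where rd [XText s] = readNat s

-- No XML representation at all; reading yields the given value.
-- (Named emptyConv here to avoid any clash with a Prelude name.)
emptyConv :: a -> Conv a
emptyConv x = Conv (const []) rd
  where rd [] = x

-- Wrap the representation of the inner converter in an element.
element :: String -> Conv a -> Conv a
element tag (Conv sh rd) = Conv (\x -> [XElem tag (sh x)]) rd'
  where rd' [XElem t cs] | t == tag = rd cs

eInt :: String -> Conv Int
eInt tag = element tag int

eEmpty :: String -> a -> Conv a
eEmpty tag x = element tag (emptyConv x)

-- Reads a non-negative decimal number (no validation in this sketch).
readNat :: String -> Int
readNat = foldl (\n c -> 10 * n + (ord c - ord '0')) 0
```

In this model, showConv (eInt "number") 42 produces the node list [XElem "number" [XText "42"]], the tree form of the XML shown above; a tag mismatch simply makes reading fail, which is Curry's natural way to signal that a converter does not apply.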
As you might have guessed, XElemConv is another instance of the type
data XmlConv rep elem a = ... -- abstract
The first type parameter, rep, will become important in a moment. The second one, elem, specifies whether the corresponding converter represents an XML element or not. For example, primitive values do not represent XML elements, and we will see other XML data that does not either. To specify whether a converter represents an XML element, we provide the phantom types
data Elem
data NoElem
These types have no values. They are only used by the type checker to enforce constraints on XML converters. For example, we provide the functions
xmlRead :: XmlConv _ Elem a -> XmlExp -> a
xmlShow :: XmlConv _ Elem a -> a -> XmlExp
which must be called with XML converters that represent elements; the type system ensures that they always do.
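The phantom-type trick can be seen in isolation in the following sketch, assuming a recent Curry system. The type Conv and the functions number, bare and render are hypothetical stand-ins; only the idea of marking converters with Elem or NoElem is taken from the text. Since the parameter e never occurs on the right-hand side of Conv, it exists purely for the type checker.

```curry
-- Phantom marker types: they have no values.
data Elem
data NoElem

-- Hypothetical toy converter that only renders values; e is a phantom type.
data Conv e a = Conv (a -> String)

-- A converter that wraps numbers in an element.
number :: Conv Elem Int
number = Conv (\n -> "<number>" ++ show n ++ "</number>")

-- A converter for bare text without an enclosing tag.
bare :: Conv NoElem Int
bare = Conv show

-- Accepts only converters marked as representing an element,
-- mirroring the restriction on xmlShow.
render :: Conv Elem a -> a -> String
render (Conv sh) = sh

-- render bare 42 would be rejected by the type checker,
-- because NoElem does not match Elem.
```

The check costs nothing at run time: both converters have the same representation, and only their types differ.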
Attributes
Another example of XML data that does not represent an element is an attribute. We provide converters of type XAttrConv a that represent primitive values as attributes:
aInt :: String -> XAttrConv Int
aFloat :: String -> XAttrConv Float
aChar :: String -> XAttrConv Char
aString :: String -> XAttrConv String
For example, we could define an XML converter that represents numbers as attributes:
cNumber :: XElemConv Int
cNumber = element "number" (aInt "value")
The representation of the number 42 corresponding to this converter is
<number value="42" />
To represent arbitrary values that have a string representation as attributes, there is a combinator that takes the attribute name and a pair of functions converting from and to strings:
attr :: String -> (String->a,a->String) -> XAttrConv a
For convenience, we also provide a function to construct an attribute converter for boolean values:
aBool :: String -> String -> String -> XAttrConv Bool
The provided strings are the attribute name, the representation of True and the representation of False respectively.
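Since aBool is given exactly the two strings for True and False, it could plausibly be built on attr from a suitable (read, show) pair. The helper boolPair below is hypothetical, and whether the library actually defines aBool this way is an assumption; the sketch only shows that such a pair is easy to construct and that the two functions are inverses of each other.

```curry
-- Hypothetical helper: builds the (read, show) pair that attr expects,
-- given the strings representing True and False.
boolPair :: String -> String -> (String -> Bool, Bool -> String)
boolPair t f = (rd, sh)
  where
    -- Any other attribute value makes reading fail.
    rd s | s == t = True
         | s == f = False
    sh b = if b then t else f

-- With the library in scope, one could then (hypothetically) write:
--   aBool name t f = attr name (boolPair t f)
```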
Complex XML Data
Honestly, converting primitive values into XML is not that exciting. We need combinators to construct complex converters from simple ones. To be able to handle arbitrary XML data, we need to model converters for optional XML data, repeated XML elements, sequences of XML data and choices. In the following, we will discuss one after the other in detail.
Optional Data
In Curry, optional values are represented as values of type Maybe a. In XML, data can either be present or missing. We provide a combinator that turns a converter for arbitrary XML data into one for data that may be missing:
opt :: XmlConv _ _ a -> XOptConv (Maybe a)
The function takes an XML converter for values of type a and returns one for values of type Maybe a. Nothing is represented by missing data and Just x by the representation of x according to the given converter. Using this combinator, we can define a converter for optional numbers as:
cOptNumber :: XElemConv (Maybe Int)
cOptNumber = element "number" (opt (aInt "value"))
The representation of Just 42 corresponding to this converter is
<number value="42" />
and the representation of Nothing is
<number />
We could also define
cOptNumber :: XOptConv (Maybe Int)
cOptNumber = opt (eInt "number")
and represent Just 42 as
<number>42</number>
and Nothing as nothing, i.e., without any XML representation. However, this converter does not represent an element and therefore cannot be used with xmlRead and xmlShow.
Repeated Elements
Lists of values can be modeled in XML by repeating the representations of the values arbitrarily often. However, some XML data must not be repeated: attributes must occur at most once, and multiple primitive values cannot be parsed correctly if they do not have an enclosing tag. Furthermore, parsing a repetition of optional values always yields infinitely many results, because any number of absent values can be assumed between the present ones; the same holds for repetitions of repeated values, since an inner repetition may be empty. In a fair implementation of Curry, we could allow optional values and repetitions to be repeatable and return all results nondeterministically. Without such an implementation, we need to restrict repetitions to converters representing XML data that may be safely repeated. We express this constraint by another pair of phantom types
data Repeatable
data NotRepeatable
and provide a combinator for repetitions that must be applied to a converter representing repeatable XML data:
rep :: XmlConv Repeatable _ a -> XRepConv [a]
This function returns a converter that represents lists as repeated representations of their elements, as specified by the given converter. The type system ensures that only repetitions of repeatable data are defined. For example, the definition
cOptRep :: XRepConv [Int]
cOptRep = rep (opt int)
will be rejected by the type checker since optional values are not repeatable.
Now we can reveal the definitions of the presented converter types:
type XElemConv a = XmlConv Repeatable Elem a
type XAttrConv a = XmlConv NotRepeatable NoElem a
type XPrimConv a = XmlConv NotRepeatable NoElem a
type XOptConv a = XmlConv NotRepeatable NoElem a
type XRepConv a = XmlConv NotRepeatable NoElem a
type XSeqConv a = XmlConv NotRepeatable NoElem a
The careful reader may notice that we would not need two separate phantom type parameters, since only two of the four possible combinations are used. Because this may change with a fair implementation of Curry, we keep the distinction between repeatable XML data and XML data represented by an element.
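The repeatability constraint works just like the element constraint. The following sketch, assuming a recent Curry system, uses a hypothetical toy type Conv with only the first phantom parameter and toy versions of opt and rep that merely render values. It shows why rep (opt int) from above is rejected, and that repetitions cannot be nested either.

```curry
data Repeatable
data NotRepeatable

-- Hypothetical toy converter with phantom parameter r.
data Conv r a = Conv (a -> String)

render :: Conv r a -> a -> String
render (Conv sh) = sh

-- An element converter is repeatable, like XElemConv.
num :: Conv Repeatable Int
num = Conv (\n -> "<n>" ++ show n ++ "</n>")

-- An optional converter is not repeatable, like XOptConv.
opt :: Conv r a -> Conv NotRepeatable (Maybe a)
opt (Conv sh) = Conv (maybe "" sh)

-- Repetition requires a repeatable argument and is itself not repeatable.
rep :: Conv Repeatable a -> Conv NotRepeatable [a]
rep (Conv sh) = Conv (concatMap sh)

-- Both of these are rejected by the type checker:
--   rep (opt num)   -- optional values are not repeatable
--   rep (rep num)   -- repetitions are not repeatable
```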
Sequences
The last converter type presented above is the type of sequence converters. A sequence of values differs from a repetition in that it may contain values of different types. Ideally, a sequence should be repeatable if all of its components are, and not repeatable if any component is not. Unfortunately, we do not know how to express this using phantom types. Therefore, sequences are not repeatable by default, and we provide a special combinator to construct repetitions of sequences of repeatable elements:
seq2 :: (a -> b -> c)
-> XmlConv _ _ a -> XmlConv _ _ b
-> XSeqConv c
repSeq2 :: (a -> b -> c)
-> XmlConv Repeatable _ a -> XmlConv Repeatable _ b
-> XRepConv [c]
We provide converters not only for sequences of two elements but also for other reasonable lengths. The first argument of both functions is a reversible function that combines the components of the sequence into a compound value; usually, you can use a constructor of some datatype here. The remaining arguments specify how the components are represented in XML. You can think of repSeq2 as being defined as
repSeq2 f xa xb = rep (seq2 f xa xb)
although this definition is rejected by the type checker, since sequences are not repeatable.
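Why must the first argument of seq2 be reversible? Showing a compound value requires splitting it back into its components. In Curry this can be done by narrowing: solve the equation f x y =:= v for free variables x and y. The function unapply2 below is a hypothetical illustration of this idea, assuming a recent Curry system; it is not necessarily how the library implements seq2.

```curry
data Point = Point Int Int
  deriving (Eq, Show)

-- Recover the two arguments of a value built by f by solving
-- the equation f x y =:= v for the free variables x and y.
-- (Type signature omitted: depending on the Curry system, the
-- inferred type may carry class constraints for free variables.)
unapply2 f v | f x y =:= v = (x, y)
  where x, y free
```

With a constructor such as Point, the equation has exactly one solution, which is why constructors are the typical first argument of the sequence combinators.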
With the presented combinators, we can already represent quite complex data structures in XML. Let’s consider a datatype for persons
data Person = Person Name Sex Date
data Name = Name String (Maybe String)
type Sex = Bool
data Date = Date Int Int Int
and a converter for this datatype
cPerson :: XElemConv Person
cPerson = eSeq3 "person" Person cName cSex cBirth
cName :: XElemConv Name
cName = eSeq2 "name" Name (aString "last") (opt (aString "first"))
cSex :: XAttrConv Bool
cSex = aBool "sex" "male" "female"
cBirth :: XAttrConv Date
cBirth = attr "born" (readDate,showDate)
Corresponding to this converter, the value
Person (Name "Bach" (Just "Johann Sebastian")) True (Date 1685 3 21)
is represented as XML element
<person sex="male" born="21.03.1685">
  <name last="Bach" first="Johann Sebastian" />
</person>
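The converter cBirth relies on readDate and showDate, which the text does not define. A minimal sketch, assuming the Date fields are year, month and day (as in Date 1685 3 21), the string format DD.MM.YYYY used in the XML above, and a recent Curry system; the helpers splitOn and readNat are hypothetical.

```curry
-- Fields: year, month, day (as in Date 1685 3 21).
data Date = Date Int Int Int
  deriving (Eq, Show)

-- Shows a date in the format DD.MM.YYYY, e.g. "21.03.1685".
showDate :: Date -> String
showDate (Date y m d) = pad2 d ++ "." ++ pad2 m ++ "." ++ show y
  where pad2 n = if n < 10 then '0' : show n else show n

-- Inverse of showDate; fails on strings not of the form D.M.Y.
readDate :: String -> Date
readDate s = case splitOn '.' s of
  [d, m, y] -> Date (readNat y) (readNat m) (readNat d)

-- Splits a string at every occurrence of the given character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (w, [])       -> [w]
  (w, _ : rest) -> w : splitOn c rest

-- Reads a non-negative decimal number (no validation in this sketch).
readNat :: String -> Int
readNat = foldl (\n ch -> 10 * n + (ord ch - ord '0')) 0
```

For the conversion to round-trip, readDate (showDate d) must equal d for every date the converter shows; the zero padding is accepted by readNat, so this holds here.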
If we prefer to represent the birthdate as a string inside the person element, we can use the function
adapt :: (a->b,b->a) -> XmlConv r e a -> XmlConv r e b
and define
cBirth :: XPrimConv Date
cBirth = adapt (readDate,showDate) string
With this converter, the person element would contain the name element and the birthdate as mixed content. You should use mixed content very carefully and ensure that the XML representation can always be parsed correctly. In particular, two consecutive primitive representations can never be parsed correctly, because nothing marks where one ends and the next begins.
Nondeterministic Parsing
Combining sequences and optional values may result in unexpected nondeterminism. Consider a datatype
data TwoOptNums = Nums (Maybe Int) (Maybe Int)
with two optional components for numbers. If we define a converter for this type as
cTwoOptNums :: XElemConv TwoOptNums
cTwoOptNums = eSeq2 "numbers" Nums cNum cNum
cNum :: XOptConv (Maybe Int)
cNum = opt (eInt "number")
and call xmlRead on an XML representation in which one component is missing, we nondeterministically get two results: one with Nothing in the first component and one with Nothing in the second. So if you don’t like such surprises, you need to design your converters carefully such that representations can be parsed deterministically. In this example, you could use attributes with different names to represent the optional numbers:
cTwoOptNums :: XElemConv TwoOptNums
cTwoOptNums = eSeq2 "numbers" Nums (opt (aInt "n1")) (opt (aInt "n2"))
The same observations hold for repetitions instead of optional values, which can also lead to nondeterministic parsing.
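The source of the ambiguity can be reproduced without the library. In the following hypothetical model (assuming a recent Curry system), a list of numbers stands for the number children of the element, and parsing an optional number either consumes the next number or leaves it alone, using Curry's choice operator ?.

```curry
-- Hypothetical model of parsing one optional number from a list of
-- number children: either consume the next number or leave it alone.
optNum :: [Int] -> (Maybe Int, [Int])
optNum []       = (Nothing, [])
optNum (x : xs) = (Just x, xs) ? (Nothing, x : xs)

-- Parse two optional numbers in sequence; all input must be consumed.
twoOpt :: [Int] -> (Maybe Int, Maybe Int)
twoOpt xs =
  case optNum xs of
    (a, rest) ->
      case optNum rest of
        (b, []) -> (a, b)
```

With zero or two numbers the result is unique, but evaluating twoOpt [5] in an interactive Curry environment such as PAKCS yields two results, (Just 5,Nothing) and (Nothing,Just 5), which is exactly the ambiguity described above.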
Choices
The nondeterministic features of Curry can also be employed to choose between different alternatives for conversion. With the combinators presented so far, we can’t model datatypes defined by multiple constructors. For this purpose we provide a function
(!) :: XmlConv r e a -> XmlConv r e a -> XmlConv r e a
that nondeterministically combines two XML converters. For example, to convert between trees
data Tree = Leaf Int | Branch Tree Int Tree
and XML, we can define a converter
cTree :: XElemConv Tree
cTree = eSeq1 "leaf" Leaf (aInt "value")
! eSeq3 "branch" Branch cTree int cTree
and use it to convert between values like
Branch (Leaf 1) 2 (Leaf 3)
and
<branch> <leaf value="1" />2<leaf value="3" /> </branch>
This example shows that even recursive datatypes can be modeled by our XML combinators.
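The choice combinator builds on Curry's nondeterministic choice operator ?. The toy sketch below covers only the showing direction, uses the hypothetical functions showLeaf, showBranch and showTree, omits whitespace, and is not the library's implementation (a recent Curry system is assumed). Each alternative handles only its own constructor and fails otherwise, so exactly one branch of the choice succeeds.

```curry
data Tree = Leaf Int | Branch Tree Int Tree

-- Handles only leaves; fails for branches.
showLeaf :: Tree -> String
showLeaf (Leaf n) = "<leaf value=\"" ++ show n ++ "\" />"

-- Handles only branches; the number becomes mixed content.
showBranch :: Tree -> String
showBranch (Branch l n r) =
  "<branch>" ++ showTree l ++ show n ++ showTree r ++ "</branch>"

-- Choice: try both alternatives; the failing one contributes no result.
showTree :: Tree -> String
showTree t = showLeaf t ? showBranch t
```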
Tips and Tricks
There are several techniques that help to convert between Curry datatypes and XML trees that do not match exactly. In the following we present a few of them by brief examples.
Datatypes with more Structure
We can use anonymous sequences to impose more structure on a given XML representation. For example, if we want to handle XML data of the form
<person lastname="Bach" firstname="Johann Sebastian" sex="male" born="21.03.1685" />
with the datatype Person introduced above, we can define a converter
cPerson :: XElemConv Person
cPerson = eSeq3 "person" Person cName cSex cBirth
cName :: XSeqConv Name
cName = seq2 Name (aString "lastname") (opt (aString "firstname"))
Compared to the example above, we only need to change the definition of cName such that no extra element is introduced for names.
Datatypes with less Structure
We can also do it the other way round and model the original XML representation of persons with a less structured Curry datatype. For this purpose we need to introduce a smart constructor that creates less structured persons and use this instead of the actual constructor for persons:
data Person = Person String (Maybe String) Sex Date
person :: (String,Maybe String) -> Sex -> Date -> Person
person (last,first) sex born = Person last first sex born
cPerson :: XElemConv Person
cPerson = eSeq3 "person" person cName cSex cBirth
cName :: XElemConv (String,Maybe String)
cName = eSeq2 "name" (,) (aString "last") (opt (aString "first"))
We can also combine both techniques to model sequences of a length that is not handled by our library, even if we do not want to introduce more structure on the Curry side. If the provided combinators for sequences don’t suffice to model our datatypes, we can define a combinator such as
seq10 f c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 =
  seq2 merge (seq5 (,,,,) c1 c2 c3 c4 c5) (seq5 (,,,,) c6 c7 c8 c9 c10)
 where
  merge (x1,x2,x3,x4,x5) (x6,x7,x8,x9,x10) = f x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
Reading Curry Data with less Information
If you only want to parse some XML data into Curry values, you can define a converter that ignores some input. Consider a datatype for persons without birthdate information and a smart constructor that ignores the birthdate:
data Person = Person Name Sex
ignoreBirth :: Name -> Sex -> String -> Person
ignoreBirth name sex _ = Person name sex
You can use this smart constructor to define an XML parser that ignores birthdates of persons as follows:
cPerson :: XElemConv Person
cPerson = eSeq3 "person" ignoreBirth cName cSex (aString "born")
With this converter you can parse XML data like
<person sex="female" born="29.03.1976">
  <name last="Capriati" first="Jennifer" />
</person>
into the Curry value
(Person (Name "Capriati" (Just "Jennifer")) False)
Showing XML Data with less Information
If you only want to show Curry values as XML data, you can define a converter that hides some parts of the Curry data structure. For example, to generate XML data without birthdate information from the original datatype for persons, you can use the empty converter, applied to Curry’s free variable unknown, instead of cBirth:
cPerson :: XElemConv Person
cPerson = eSeq3 "person" Person cName cSex (empty unknown)
Now, you can use this converter to generate XML data like
<person sex="female"> <name last="Capriati" first="Jennifer" /> </person>
from the Curry value
(Person (Name "Capriati" (Just "Jennifer")) False (Date 1976 3 29))
We presented an approach to concisely construct XML converters with type-based combinators.

