At the Haskell Workshop 2007 Neil Mitchell presented an approach to generic traversals of data that does not need “Scary Types”. He restricted his Uniplate library compared to existing approaches and was able to give an implementation based on a simple Haskell type class.
This post rephrases Neil’s enlightening ideas and simplifies (a bit) the implementation and use of his library. In the following, I will explain the ideas in my own words. I have implemented a Haskell library Traversal.hs for generic traversals and will comment on similarities and differences after documenting my library by example.
Features of the Traversal
library
The main feature of the library is to support to query and manipulate children of tree-structured values. With little help from the programmer, the library provides powerful and flexible queries and transformations of tree structured values. Powerful, because sophisticated functions are implemented in the library independently of a concrete datatype, so the programmer does not need to write such often tedious traversal code herself. Flexible, because the provided functions allow to focus on the interesting part of the data and use default behaviour for other parts, so the programmer can later refine her datatypes without having to adapt transformations that are not concerned with the change.
Finally, the approach is comnpletely lightweight. The library compiles with both hugs and ghc – the currently most widely used Haskell systems. It uses multi-parameter type classes to provide a bit more general interface. An implementation within Haskell98 is very useful already and can be obtained by specializing some type signatures in the presented library.
A central type class
In order to use the library for her own datatype, the programmer has to make it an instance of the type class Traversable
:
class Traversable a b where scrap :: a -> ([b],[b] -> a)
This type class specifies a single function scrap
that selects a list of children from an arbitrary value and provides a means to replace the children of the given value. The main restriction of this approach is that all children must have the same type. Thanks to multi-parameter type-classes, the type of the children can differ from the type of the “mother”. Although the approach is not as general as other approaches to generic traversals it is general enough for many practical examples.
Assume we have defined types for names and expressions and define a type for bindings based on these:
type Binds = [(Name,Exp)]
We can make this type traversable as follows:
instance Traversable Binds Exp where scrap bs = let (xs,es) = unzip bs in (es, zip xs)
We decided to consider expressions – not names – children of bindings. We can use library functions to compute the list of expressions and replace bound expressions in a list of bindings:
boundExps :: Binds -> [Exp] boundExps = children replaceBoundExps :: Binds -> [Exp] -> Binds replaceBoundExps = replaceChildren
The employed library functions have the following types:
children :: Traversable a b => a -> [b] replaceChildren :: Traversable a b => a -> [b] -> a
They are simply defined by projecting to the two components of the result of the function scrap
that is defined in the Traversable
class.
Honestly, this is a boring example. There are more interesting functions defined in the library that are based on these “single-step operations”. Let’s consider these!
A datatype for expressions
As a running example, we will use a datatype for expressions defined as follows.
type Name = String data Exp = Cons Name | Var Name | Lam Name Exp | Exp :@: Exp | Let Binds Exp
As this is a recursive datatype, we can make it an instance of Traversable
by considering direct subexpressions children of an expression:
instance Traversable Exp Exp where scrap (Lam x e) = ([e], \ [e'] -> Lam x e') scrap (e1 :@: e2) = ([e1,e2], \ [e1',e2'] -> e1' :@: e2') scrap (Let bs e) = let (es,rep) = scrap bs in (e:es, \ (e':es') -> Let (rep es') e') scrap e = ([], \_ -> e)
Note, how we can reuse the instance Traversable Binds Exp
to implement the Let
-case for the scrap
function. The library function
family :: Traversable a a => a -> [a]
computes the list of the given value, its children, those children and so on, if the given value is traversable and its type equals the type of its children. Applied to an expression, it yields the list of all subexpressions of the given expression. We can use this function, e.g., to compute the names of all variables that are used in an expression:
usedVars :: Exp -> [Name] usedVars e = [ n | Var n ← family e ]
Note that this function mentions only a single constructor of the Exp
datatype. If we want to add new kinds of expressions – for example, case expressions for pattern matching – only the instance declaration for Traversable
has to be adapted – usedVars
does not have to be changed.
A simple transformation
We cannot only query subexpressions but also transform them with user defined functions. Assume we want to eliminate simple let expressions. A let expression is simple, if the bound expressions do not use the introduced variables:
isSimpleLet :: Exp -> Bool isSimpleLet (Let bs _) = null (intersect xs (concatMap usedVars es)) where xs = map fst bs es = boundExps bs isSimpleLet _ = False
Simple let expressions can be replaced by a lambda expression as follows:
elimSimpleLet :: Exp -> Exp elimSimpleLet (Let bs e) = foldl (:@:) (foldr Lam e xs) es where xs = map fst bs es = boundExps bs
In order to apply this transformation to every simple let expression in a given expression we employ the library function
mapFamily :: Traversable a a => (a -> a) -> a -> a
that maps the given function on all family members of the given value. With the help of mapFamily
, eliminating simple let expressions is straightforward:
elimSimpleLets :: Exp -> Exp elimSimpleLets = mapFamily elim where elim e | isSimpleLet e = elimSimpleLet e | otherwise = e
Note again, that we don’t have to change anything, if we decide to add new constructors to the expression datatype.
Refining the transformation
The specification of simple let expressions can be improved. For example, the following let expression could be considered simple, but is not by the current definition of isSimpleLet
:
let x=\x.x in x
Although x
appears in the list of variables used in \x.x
we can safely eliminate this let expression, because x
does not occur freely in \x.x
. If we redefine isSimpleLet
as
isSimpleLet :: Exp -> Bool isSimpleLet (Let bs _) = null (intersect xs (concatMap freeVars es)) where xs = map fst bs es = boundExps bs isSimpleLet _ = False
then the above let expression is simplified to
(\x.x) (\x.x)
But how can we define the function freeVars
?
A monad for scoping
An occurrence of a variable is called free if the variable is not in scope, i.e., bound by a let or lambda expression, when it is referenced. We use the following monad with two auxiliary functions to model scoping:
newtype Scoped a = Scoped { runScoped :: [Name] -> a } instance Monad Scoped where return x = Scoped (const x) a >>= f = Scoped (\ns -> runScoped (f (runScoped a ns)) ns) isInScope :: Name -> Scoped Bool isInScope n = Scoped (n`elem`) extendScope :: [Name] -> Scoped a -> Scoped a extendScope ns scpd = Scoped (runScoped scpd . (ns++))
The function extendScope
is special because it takes a monadic action and runs it in a different scope. This is more natural than using a state monad because variables that are bound in one subexpression can occur freely in another.
The function freeVars
can now be defined in terms of a monadic action that computes free variables:
freeVars :: Exp -> [Name] freeVars e = runScoped (freeVarsScoped e) []
The function freeVarsScoped
has the structure of a generalized fold, i.e., it matches all alternatives of the Exp
datatype, calls itself recursively on subexpressions, and recombines the results to yield the overall result.
This is a very frequent scheme – many transformations are defined like this. However, if we mention all constructors explicitely in the definition of freeVarsScoped
, this results in a poorly extensible implementation. We would have to change freeVarsScoped
whenever we change the definition of Exp
.
As we are only interested in the Var
, Lam
and Let
cases, we are only willing to change freeVarsScoped
when one of these constructors changes. We don’t care about the others.
Of course, the Traversal
library comes to the rescue. It provides a function
fold :: Traversable a a => (a -> [r] -> r) -> a -> r
that can be employed to define freeVarsScoped
as follows:
freeVarsScoped :: Exp -> Scoped [Name] freeVarsScoped = fold vars where vars (Var n) cs = do inScope ← isInScope n if inScope then return [] else return [n] vars (Lam n _) cs = extendScope [n] (descend cs) vars (Let bs _) cs = extendScope (map fst bs) (descend cs) vars _ cs = descend cs descend = liftM concat . sequence
We only explicitely mention the constructors that we are interested in. All other cases are summarized by the local function descend
that combines the results of the different recursive calls that are hidden in the implementation of fold
.
Comparison with Neil’s Uniplate library
The main difference between the Traversal
and the Uniplate
library is that the former defines a single type class Traversable
and the latter defines two type classes:
class Uniplate a where uniplate :: a -> ([a], [a] -> a) class Uniplate b => Biplate a b where biplate :: a -> ([b], [b] -> a)
You can see that the type of uniplate
is a special case of the type of biplate
with a=b
.
The function family
is called universe
in Uniplate:
universe :: Uniplate a => a -> [a]
There is a similar function universeBi
for the Biplate
class that is called childFamilies
in the Traversal library:
universeBi :: Biplate a b => a -> [b] childFamilies :: (Traversable a b, Traversable b b) => a -> [b]
The functions family
and childFamilies
can be implemented as mutually recursive functions quite naturally:
family :: Traversable a a => a -> [a] family a = a : childFamilies a childFamilies :: (Traversable a b, Traversable b b) => a -> [b] childFamilies = concatMap family . children
The function uniplate
cannot be implemented in terms of uniplateBi
because an instance Uniplate a
does not imply Biplate a a
.
Moreover, Neil seems to have invested much more thoughts in making his implementation efficient. I’m wondering whether his optimizations could be imitated within the presented single type-class design.
Another topic for future investigations is to derive instances of the class Traversable
automatically like it is possible with Uniplate. Until now, I did not feel the need for implementing this for my own projects because the instances are not very difficult to write by hand.
Try it out!
I did not describe the whole interface of the library. So watch Traversal.hs for more information.