At the Haskell Workshop 2007 Neil Mitchell presented an approach to generic traversals of data that does not need “Scary Types”. He restricted his Uniplate library compared to existing approaches and was able to give an implementation based on a simple Haskell type class.

This post rephrases Neil’s enlightening ideas and simplifies (a bit) the implementation and use of his library. In the following, I will explain the ideas in my own words. I have implemented a Haskell library Traversal.hs for generic traversals and will comment on similarities and differences after documenting my library by example.

Features of the Traversal library

The main feature of the library is to support to query and manipulate children of tree-structured values. With little help from the programmer, the library provides powerful and flexible queries and transformations of tree structured values. Powerful, because sophisticated functions are implemented in the library independently of a concrete datatype, so the programmer does not need to write such often tedious traversal code herself. Flexible, because the provided functions allow to focus on the interesting part of the data and use default behaviour for other parts, so the programmer can later refine her datatypes without having to adapt transformations that are not concerned with the change.

Finally, the approach is comnpletely lightweight. The library compiles with both hugs and ghc – the currently most widely used Haskell systems. It uses multi-parameter type classes to provide a bit more general interface. An implementation within Haskell98 is very useful already and can be obtained by specializing some type signatures in the presented library.

A central type class

In order to use the library for her own datatype, the programmer has to make it an instance of the type class Traversable:

class Traversable a b
 where scrap :: a -> ([b],[b] -> a)

This type class specifies a single function scrap that selects a list of children from an arbitrary value and provides a means to replace the children of the given value. The main restriction of this approach is that all children must have the same type. Thanks to multi-parameter type-classes, the type of the children can differ from the type of the “mother”. Although the approach is not as general as other approaches to generic traversals it is general enough for many practical examples.

Assume we have defined types for names and expressions and define a type for bindings based on these:

type Binds = [(Name,Exp)]

We can make this type traversable as follows:

instance Traversable Binds Exp
 where scrap bs = let (xs,es) = unzip bs in (es, zip xs)

We decided to consider expressions – not names – children of bindings. We can use library functions to compute the list of expressions and replace bound expressions in a list of bindings:

boundExps :: Binds -> [Exp]
boundExps = children

replaceBoundExps :: Binds -> [Exp] -> Binds
replaceBoundExps = replaceChildren

The employed library functions have the following types:

children        :: Traversable a b => a -> [b]
replaceChildren :: Traversable a b => a -> [b] -> a

They are simply defined by projecting to the two components of the result of the function scrap that is defined in the Traversable class.

Honestly, this is a boring example. There are more interesting functions defined in the library that are based on these “single-step operations”. Let’s consider these!

A datatype for expressions

As a running example, we will use a datatype for expressions defined as follows.

type Name = String

data Exp
  = Cons Name
  | Var Name
  | Lam Name Exp
  | Exp :@: Exp
  | Let Binds Exp

As this is a recursive datatype, we can make it an instance of Traversable by considering direct subexpressions children of an expression:

instance Traversable Exp Exp
 where
  scrap (Lam x e)   = ([e], \ [e'] -> Lam x e')
  scrap (e1 :@: e2) = ([e1,e2], \ [e1',e2'] -> e1' :@: e2')
  scrap (Let bs e)  = let (es,rep) = scrap bs
                       in (e:es, \ (e':es') -> Let (rep es') e')
  scrap e           = ([], \_ -> e)

Note, how we can reuse the instance Traversable Binds Exp to implement the Let-case for the scrap function. The library function

family :: Traversable a a => a -> [a]

computes the list of the given value, its children, those children and so on, if the given value is traversable and its type equals the type of its children. Applied to an expression, it yields the list of all subexpressions of the given expression. We can use this function, e.g., to compute the names of all variables that are used in an expression:

usedVars :: Exp -> [Name]
usedVars e = [ n | Var n ← family e ]

Note that this function mentions only a single constructor of the Exp datatype. If we want to add new kinds of expressions – for example, case expressions for pattern matching – only the instance declaration for Traversable has to be adapted – usedVars does not have to be changed.

A simple transformation

We cannot only query subexpressions but also transform them with user defined functions. Assume we want to eliminate simple let expressions. A let expression is simple, if the bound expressions do not use the introduced variables:

isSimpleLet :: Exp -> Bool
isSimpleLet (Let bs _) =
  null (intersect xs (concatMap usedVars es))
 where
  xs = map fst bs
  es = boundExps bs

isSimpleLet _ = False

Simple let expressions can be replaced by a lambda expression as follows:

elimSimpleLet :: Exp -> Exp
elimSimpleLet (Let bs e) = foldl (:@:) (foldr Lam e xs) es
 where
  xs = map fst bs
  es = boundExps bs

In order to apply this transformation to every simple let expression in a given expression we employ the library function

mapFamily :: Traversable a a => (a -> a) -> a -> a

that maps the given function on all family members of the given value. With the help of mapFamily, eliminating simple let expressions is straightforward:

elimSimpleLets :: Exp -> Exp
elimSimpleLets = mapFamily elim
 where elim e | isSimpleLet e = elimSimpleLet e
              | otherwise     = e

Note again, that we don’t have to change anything, if we decide to add new constructors to the expression datatype.

Refining the transformation

The specification of simple let expressions can be improved. For example, the following let expression could be considered simple, but is not by the current definition of isSimpleLet:

let x=\x.x in x

Although x appears in the list of variables used in \x.x we can safely eliminate this let expression, because x does not occur freely in \x.x. If we redefine isSimpleLet as

isSimpleLet :: Exp -> Bool
isSimpleLet (Let bs _) =
  null (intersect xs (concatMap freeVars es))
 where
  xs = map fst bs
  es = boundExps bs

isSimpleLet _ = False

then the above let expression is simplified to

(\x.x) (\x.x)

But how can we define the function freeVars?

A monad for scoping

An occurrence of a variable is called free if the variable is not in scope, i.e., bound by a let or lambda expression, when it is referenced. We use the following monad with two auxiliary functions to model scoping:

newtype Scoped a = Scoped { runScoped :: [Name] -> a }

instance Monad Scoped
 where
  return x = Scoped (const x)
  a >>=  f = Scoped (\ns -> runScoped (f (runScoped a ns)) ns)

isInScope :: Name -> Scoped Bool
isInScope n = Scoped (n`elem`)

extendScope :: [Name] -> Scoped a -> Scoped a
extendScope ns scpd = Scoped (runScoped scpd . (ns++))

The function extendScope is special because it takes a monadic action and runs it in a different scope. This is more natural than using a state monad because variables that are bound in one subexpression can occur freely in another.

The function freeVars can now be defined in terms of a monadic action that computes free variables:

freeVars :: Exp -> [Name]
freeVars e = runScoped (freeVarsScoped e) []

The function freeVarsScoped has the structure of a generalized fold, i.e., it matches all alternatives of the Exp datatype, calls itself recursively on subexpressions, and recombines the results to yield the overall result.

This is a very frequent scheme – many transformations are defined like this. However, if we mention all constructors explicitely in the definition of freeVarsScoped, this results in a poorly extensible implementation. We would have to change freeVarsScoped whenever we change the definition of Exp.

As we are only interested in the Var, Lam and Let cases, we are only willing to change freeVarsScoped when one of these constructors changes. We don’t care about the others.

Of course, the Traversal library comes to the rescue. It provides a function

fold :: Traversable a a => (a -> [r] -> r) -> a -> r

that can be employed to define freeVarsScoped as follows:

freeVarsScoped :: Exp -> Scoped [Name]
freeVarsScoped = fold vars
 where
  vars (Var n) cs = do
    inScope ← isInScope n
    if inScope then return [] else return [n]

  vars (Lam n _)  cs = extendScope [n] (descend cs)
  vars (Let bs _) cs = extendScope (map fst bs) (descend cs)
  vars _          cs = descend cs

  descend = liftM concat . sequence

We only explicitely mention the constructors that we are interested in. All other cases are summarized by the local function descend that combines the results of the different recursive calls that are hidden in the implementation of fold.

Comparison with Neil’s Uniplate library

The main difference between the Traversal and the Uniplate library is that the former defines a single type class Traversable and the latter defines two type classes:

class Uniplate a
 where uniplate :: a -> ([a], [a] -> a)

class Uniplate b => Biplate a b
 where biplate :: a -> ([b], [b] -> a)

You can see that the type of uniplate is a special case of the type of biplate with a=b.

The function family is called universe in Uniplate:

universe :: Uniplate a => a -> [a]

There is a similar function universeBi for the Biplate class that is called childFamilies in the Traversal library:

universeBi :: Biplate a b => a -> [b]

childFamilies :: (Traversable a b, Traversable b b) => a -> [b]

The functions family and childFamilies can be implemented as mutually recursive functions quite naturally:

family :: Traversable a a => a -> [a]
family a = a : childFamilies a

childFamilies :: (Traversable a b, Traversable b b) => a -> [b]
childFamilies = concatMap family . children

The function uniplate cannot be implemented in terms of uniplateBi because an instance Uniplate a does not imply Biplate a a.

Moreover, Neil seems to have invested much more thoughts in making his implementation efficient. I’m wondering whether his optimizations could be imitated within the presented single type-class design.

Another topic for future investigations is to derive instances of the class Traversable automatically like it is possible with Uniplate. Until now, I did not feel the need for implementing this for my own projects because the instances are not very difficult to write by hand.

Try it out!

I did not describe the whole interface of the library. So watch Traversal.hs for more information.