> import Control.Monad
> import Control.Parallel
>
> import System ( getArgs )

As a running example I will use permutation sort, one of the least
efficient sorting algorithms ever invented:
> sort l = permute l `suchthat` isSorted
Admittedly,
* this is a toy example and
* sorting is usually not expressed as a search problem
but it is a representative program in the so called
*generate-and-test* style and it generates a huge search space
quickly. It should be possible to explore that in parallel leaving us
with a promising case study.
We can express non-determinism in Haskell using *lists of
successes*. The functions used to define `sort` could be defined using
*list comprehensions*:
~~~ { .Haskell }
suchthat :: [a] -> (a -> Bool) -> [a]
suchthat xs p = [ x | x <- xs, p x ]
permute :: [a] -> [[a]]
permute [] = [[]]
permute (x:xs) = [ zs | ys <- permute xs, zs <- insert x ys ]
insert :: a -> [a] -> [[a]]
insert x [] = [[x]]
insert x (y:ys) = [x:y:ys] ++ [ y:zs | zs <- insert x ys ]
~~~
`suchthat` is a convenient renaming for `flip filter` defined using a
list comprehension and the functions `permute` and `insert` return a
list of possible results instead of only one. For example, the call
`permute [1,2]` yields `[[1,2],[2,1]]`. The predicate `isSorted`
checks whether a list of numbers is sorted:
> isSorted :: [Int] -> Bool
> isSorted (x:y:ys) = x <= y && isSorted (y:ys)
> isSorted _ = True
With these definitions, the call `sort [3,2,1]` is evaluated to
`[[1,2,3]]` because `[1,2,3]` is the only sorted permutation of
`[3,2,1]`.
Lazy lists are great to represent the result of enumerating the search
space of a non-deterministic computation. But they fix the search
strategy to sequential depth-first search. I want to define a
different search strategy (viz., parallel depth-first search) so this
is too restrictive. We can generalize the above definitions using the
`MonadPlus` type class of which the type constructor `[]` for lists is
an instance of. So let's step back for an interlude that explains
monads for non-determinism.
Interlude that explains monads for non-determinism
---
A monad for non-determinism is a type constructor `m` that supports
the following four functions:
~~~ { .Haskell }
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
mzero :: m a
mplus :: m a -> m a -> m a
~~~
The first two functions belong to the type class `Monad`, the last two
belong to the type class `MonadPlus` which is a subclass of `Monad`. A
value of type `m a` can be interpreted as non-deterministic
computation that yields results of type `a`. The intuition behind
these four operations is as follows:
* `return` constructs a non-deterministic computation with a single
result, viz., the one given as argument,
* `>>=` applies a non-deterministic function to every result of a
non-deterministic computation and merges the results,
* `mzero` constructs a non-deterministic computation without
results, i.e., a failing computation, and
* `mplus` merges the results of two non-deterministic computations.
We can rewrite the non-deterministic functions for sorting using
monadic combinators:
> suchthat :: MonadPlus m => m a -> (a -> Bool) -> m a
> suchthat xs p = do x <- xs
> guard (p x) -- predefined in Control.Monad
> return x
>
> permute :: MonadPlus m => [a] -> m [a]
> permute [] = return []
> permute (x:xs) = do ys <- permute xs
> zs <- insert x ys
> return zs
>
> insert :: MonadPlus m => a -> [a] -> m [a]
> insert x [] = return [x]
> insert x (y:ys) = return (x:y:ys) `mplus` do zs <- insert x ys
> return (y:zs)
Now, our `sort` function can return results in any monad for
non-determinism. Especially, it can still yield a list of results.
Parallel search with a dumb instance of `MonadPlus`
---
In order to define parallel search algorithms, we define a different
instance of the `MonadPlus` type class that represents the search
space as a tree structure[^tree-monad]. The idea is to simply replace
calls to monadic combinators with constructors. Only `>>=` is
implemented as a function:
[^tree-monad]: A similar tree representation of non-deterministic
search is available on
[Hackage](http://hackage.haskell.org/cgi-bin/hackage-scripts/package/tree-monad).
> data MPlus a = MZero | Return a | MPlus (MPlus a) (MPlus a)
>
> instance Monad MPlus
> where
> return = Return
>
> MZero >>= _ = MZero
> Return x >>= f = f x
> MPlus l r >>= f = MPlus (l >>= f) (r >>= f)
>
> instance MonadPlus MPlus
> where
> mzero = MZero
> mplus = MPlus
The implementation of `>>=` is borrowed from laws [for
`Monad`][monad-laws] and [for `MonadPlus`][monadplus-laws].
We can now define different search strategies as functions of type
`MPlus a -> [a]`. For example, sequential depth-first search looks as
follows:
> search :: MPlus a -> [a]
> search MZero = []
> search (Return x) = [x]
> search (MPlus l r) = search l ++ search r
This function is inspired by the `MonadPlus` instance for `[]`.
Let's count on the promise of semi-explicit parallelism: In the last
rule defining `search`, the results of the right branch `r` of the
search space will probably be needed later and it might be beneficial
to compute them in parallel.
We can express this using the combinator `par` from the module
`Control.Parallel` and define a function for parallel depth-first
search:
> parSearch :: MPlus a -> [a]
> parSearch MZero = []
> parSearch (Return x) = [x]
> parSearch (MPlus l r) = rs `par` (parSearch l ++ rs)
> where rs = parSearch r
This use of `par` forks a so called *spark*, i.e., a computation that
might be chosen by the run-time system for parallel execution. We do
not need to fork a spark for searching the left subtree because
`parSearch l` is evaluated by the call to `++` immediately.
Does this tweak speedup search? Here is a simple benchmark to try it
out.
> main = do n <- liftM (read.head) getArgs
> mapM_ print (parSearch (sort [1..n]))
We compile this benchmark with GHCs `-threaded` option in order to
get code that runs on multiple cores:
~~~
$ ghc --make -threaded -O2 -o permsort-par speedup-search-with-parallelism.lhs
~~~
With a single core the run-time of `permsort-par 11` is about 20
seconds.
![Multi-core Permutation Sort](multicore-permsort.png)
Using two cores the run-time is reduced to 11 seconds. Twice the cores
gives almost half the run-time. Using three and four cores is again a
little bit faster.
I have measured the run-times using GHC 6.10.1 on an Intel Core2 Quad
CPU with 2.83 GHz providing 1 GB of initial heap. Using a current
snapshot of GHC 6.11 did not show significant differences on a MacBook
with Intel Core2 Duo CPU. This may well be due to the specific nature
of the solved problem: there is a huge search space with a single
solution, so most of the search space is just an expensive failing
computation.
Feel free to try other examples, the `parSearch` function is on
[Hackage].
For feedback contact [Sebastian Fischer][sebf] or comment on [reddit].
[monad-laws]: http://www.haskell.org/haskellwiki/Monad_Laws
[monadplus-laws]: http://www.haskell.org/haskellwiki/MonadPlus
[Hackage]: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/parallel-tree-search
[sebf]: mailto:sebf@informatik.uni-kiel.de
[reddit]: http://www.reddit.com/r/haskell/comments/8aoqh/speedup_search_with_parallelism/