% Speedup search with parallelism!
% Sebastian Fischer <sebf@informatik.uni-kiel.de>

Recently, I learned about [improved support for *semi-explicit
parallelism*][ghc-par] in the Glorious Haskell Compiler GHC:

 > [Programmers] merely annotate subcomputations that might be
 > evaluated in parallel, leaving the choice of whether to actually do
 > so to the run-time system. [...] programmers should be able to take
 > existing Haskell programs, and with a little high-level knowledge
 > of how the program should parallelize, make some small
 > modifications to the program using existing well-known techniques,
 > and thereby achieve decent speedup on today's parallel hardware.

[ghc-par]: http://www.haskell.org/~simonmar/papers/multicore-ghc.pdf

One of my favourite research topics is non-deterministic
programming. Let's see whether semi-explicit parallelism helps to
enumerate the search space of such programs faster. This post is
generated from a [literate Haskell file][lhs] in case you want to
reproduce my results.

[lhs]: speedup-search-with-parallelism.lhs

Non-deterministic programming in Haskell
---

<div class="nodisplay">

> import Control.Monad
> import Control.Parallel
>
> import System ( getArgs )

</div>

As a running example I will use permutation sort, one of the least
efficient sorting algorithms ever invented:

> sort l = permute l `suchthat` isSorted

Admittedly,

  * this is a toy example and

  * sorting is usually not expressed as a search problem

but it is a representative program in the so called
*generate-and-test* style and it generates a huge search space
quickly. It should be possible to explore that in parallel leaving us
with a promising case study.

We can express non-determinism in Haskell using *lists of
successes*. The functions used to define `sort` could be defined using
*list comprehensions*:

~~~ { .Haskell }
suchthat :: [a] -> (a -> Bool) -> [a]
suchthat xs p = [ x | x <- xs, p x ]

permute :: [a] -> [[a]]
permute []     = [[]]
permute (x:xs) = [ zs | ys <- permute xs, zs <- insert x ys ]

insert :: a -> [a] -> [[a]]
insert x []     = [[x]]
insert x (y:ys) = [x:y:ys] ++ [ y:zs | zs <- insert x ys ]
~~~

`suchthat` is a convenient renaming for `flip filter` defined using a
list comprehension and the functions `permute` and `insert` return a
list of possible results instead of only one. For example, the call
`permute [1,2]` yields `[[1,2],[2,1]]`. The predicate `isSorted`
checks whether a list of numbers is sorted:

> isSorted :: [Int] -> Bool
> isSorted (x:y:ys) = x <= y && isSorted (y:ys)
> isSorted _        = True

With these definitions, the call `sort [3,2,1]` is evaluated to
`[[1,2,3]]` because `[1,2,3]` is the only sorted permutation of
`[3,2,1]`.

Lazy lists are great to represent the result of enumerating the search
space of a non-deterministic computation. But they fix the search
strategy to sequential depth-first search. I want to define a
different search strategy (viz., parallel depth-first search) so this
is too restrictive. We can generalize the above definitions using the
`MonadPlus` type class of which the type constructor `[]` for lists is
an instance of. So let's step back for an interlude that explains
monads for non-determinism.

Interlude that explains monads for non-determinism
---

A monad for non-determinism is a type constructor `m` that supports
the following four functions:

~~~ { .Haskell }
return :: a -> m a
(>>=)  :: m a -> (a -> m b) -> m b
mzero  :: m a
mplus  :: m a -> m a -> m a
~~~

The first two functions belong to the type class `Monad`, the last two
belong to the type class `MonadPlus` which is a subclass of `Monad`. A
value of type `m a` can be interpreted as non-deterministic
computation that yields results of type `a`. The intuition behind
these four operations is as follows:

  * `return` constructs a non-deterministic computation with a single
    result, viz., the one given as argument,

  * `>>=` applies a non-deterministic function to every result of a
    non-deterministic computation and merges the results,

  * `mzero` constructs a non-deterministic computation without
    results, i.e., a failing computation, and

  * `mplus` merges the results of two non-deterministic computations.

We can rewrite the non-deterministic functions for sorting using
monadic combinators:

> suchthat :: MonadPlus m => m a -> (a -> Bool) -> m a
> suchthat xs p = do x <- xs
>                    guard (p x)  -- predefined in Control.Monad
>                    return x
>
> permute :: MonadPlus m => [a] -> m [a]
> permute []     = return []
> permute (x:xs) = do ys <- permute xs
>                     zs <- insert x ys
>                     return zs
>
> insert :: MonadPlus m => a -> [a] -> m [a]
> insert x []     = return [x]
> insert x (y:ys) = return (x:y:ys) `mplus` do zs <- insert x ys
>                                              return (y:zs)

Now, our `sort` function can return results in any monad for
non-determinism. Especially, it can still yield a list of results.

Parallel search with a dumb instance of `MonadPlus`
---

In order to define parallel search algorithms, we define a different
instance of the `MonadPlus` type class that represents the search
space as a tree structure[^tree-monad]. The idea is to simply replace
calls to monadic combinators with constructors. Only `>>=` is
implemented as a function:

[^tree-monad]: A similar tree representation of non-deterministic
search is available on
[Hackage](http://hackage.haskell.org/cgi-bin/hackage-scripts/package/tree-monad).

> data MPlus a = MZero | Return a | MPlus (MPlus a) (MPlus a)
>
> instance Monad MPlus
>  where
>   return = Return
>
>   MZero     >>= _ = MZero
>   Return x  >>= f = f x
>   MPlus l r >>= f = MPlus (l >>= f) (r >>= f)
>
> instance MonadPlus MPlus
>  where
>   mzero = MZero
>   mplus = MPlus

The implementation of `>>=` is borrowed from laws [for
`Monad`][monad-laws] and [for `MonadPlus`][monadplus-laws].

We can now define different search strategies as functions of type
`MPlus a -> [a]`. For example, sequential depth-first search looks as
follows:

> search :: MPlus a -> [a]
> search MZero       = []
> search (Return x)  = [x]
> search (MPlus l r) = search l ++ search r

This function is inspired by the `MonadPlus` instance for `[]`.

Let's count on the promise of semi-explicit parallelism: In the last
rule defining `search`, the results of the right branch `r` of the
search space will probably be needed later and it might be beneficial
to compute them in parallel.

We can express this using the combinator `par` from the module
`Control.Parallel` and define a function for parallel depth-first
search:

> parSearch :: MPlus a -> [a]
> parSearch MZero       = []
> parSearch (Return x)  = [x]
> parSearch (MPlus l r) = rs `par` (parSearch l ++ rs)
>  where rs = parSearch r

This use of `par` forks a so called *spark*, i.e., a computation that
might be chosen by the run-time system for parallel execution. We do
not need to fork a spark for searching the left subtree because
`parSearch l` is evaluated by the call to `++` immediately.

Does this tweak speedup search? Here is a simple benchmark to try it
out.

> main = do n <- liftM (read.head) getArgs
>           mapM_ print (parSearch (sort [1..n]))

We compile this benchmark with GHCs `-threaded` option in order to
get code that runs on multiple cores:

~~~
$ ghc --make -threaded -O2 -o permsort-par speedup-search-with-parallelism.lhs
~~~

With a single core the run-time of `permsort-par 11` is about 20
seconds.

![Multi-core Permutation Sort](multicore-permsort.png)

Using two cores the run-time is reduced to 11 seconds. Twice the cores
gives almost half the run-time. Using three and four cores is again a
little bit faster.

I have measured the run-times using GHC 6.10.1 on an Intel Core2 Quad
CPU with 2.83 GHz providing 1 GB of initial heap. Using a current
snapshot of GHC 6.11 did not show significant differences on a MacBook
with Intel Core2 Duo CPU. This may well be due to the specific nature
of the solved problem: there is a huge search space with a single
solution, so most of the search space is just an expensive failing
computation.

Feel free to try other examples, the `parSearch` function is on
[Hackage].

For feedback contact [Sebastian Fischer][sebf] or comment on [reddit].

[monad-laws]: http://www.haskell.org/haskellwiki/Monad_Laws
[monadplus-laws]: http://www.haskell.org/haskellwiki/MonadPlus
[Hackage]: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/parallel-tree-search
[sebf]: mailto:sebf@informatik.uni-kiel.de
[reddit]: http://www.reddit.com/r/haskell/comments/8aoqh/speedup_search_with_parallelism/
