Table of Contents

Implementing Least-Strict Natural Numbers

Introduction

On this page I want to summarize some insights about the behaviour of natural number implementations (in respect to non-strictness). I have gained these by using a tool called StrictCheck. StrictCheck was developed by Olaf Chitil and is supposed to test whether a function is unnecessarily strict. In the long term my goal is to improve the suggestions made by StrictCheck because in some cases StrictCheck is too ambitious. More Information about the development of StrictCheck can be found on a separate page. This page is supposed to be some kind of motivation for further development of StrictCheck. It shows that StrictCheck can be used to gain surprising insights about functions. For example the multiplication of Peano numbers like it is implemented naively is not least-strict. Furthermore the examples should demonstrate that StrictCheck is especially useful in the area of functional logic programming.

This work also initiated an implementation of natural numbers by a binary representation in Haskell. This implementation is similar to the implementation presented in Section “Binary Numbers”. This implementation is efficient in comparison to Peano numbers and is still less strict than the external numbers. A cabal package can be found at hackageDB.

The idea is to identify functions that are not least-strict. A function is not least-strict if we can alter it such that it behaves equally for total values but yields a more defined result for a partial value. In a functional logic programming language it is very important that functions are least-strict because least-strict functions result in the smallest possible search space when we use free variables.

Peano Numbers

We start with an implementation of Peano numbers.

data Peano = Z | S Peano

We will see that even for this trivial data type a naive implementation is not least-strict.

Multiplication

We implement a addition and on the basis of this a multiplication for Peano numbers.

(+) :: Peano -> Peano -> Peano
Z   + y = y
S x + y = S (x + y)

(*) :: Peano -> Peano -> Peano
Z   * _ = Z
S x * y = y + (x * y)

The following example is supposed to demonstrate that even the implementation of simple Peano arithmetic is not least strict if we are not careful. The Curry function check takes a number n and checks whether there is another number m such that m * n = 1.

check :: Peano -> Peano 
check n | m * n == S Z = m
  where m free

This function guesses a Peano number m and checks whether it yields 1 when multiplied by n. If we apply check to Z the function does not terminate. If we swap the arguments of (*) in check it yields no more solutions because there is no natural number m with the property 0 * m = 1. But there is also no number m with the property m * 0 = 1.

Although least-strictness is more important in a functional logic language the multiplication shows up an undesirable behaviour in a functional programming language, too. For example we can define the infinite Peano number as follows.

infinity :: Peano
infinity = S infinity

While the evaluation of Z * infinity yields Z the evaluation of infinity * Z does not terminiate.

So, let's have a closer look at the behaviour of (*). We use StrictCheck to check whether (*) is least-strict.

Failed:  f _|_         Z = _|_    but should be Z
Failed:  f (S _|_)     Z = _|_    but should be Z
Failed:  f (S (S _|_)) Z = _|_    but should be Z

Note that we can check only a finite number of partial values and there are more examples where (*) is not least-strict. These results state that an application of (*) to _|_ and Z yields _|_ but should yield Z. Sadly this is not possible (at least in Haskell) because StrictCheck also demands that Z * _|_ yields 'Z'. But also in the context of Curry we are only interested in deterministic functions. We do not want to introduce non-determinism if it is not necessary. Therefore we would prefer StrictCheck to take this into account. I want to address exactly this problem in the future.

So, let's take a look at the second counter example and check whether we can satisfy this one. The operation (+) evaluates its first argument to HNF to yield a HNF. Therefore the expression y + (x * y) demands the evaluation of y to yield a HNF. The recursive call to (*) demands the evaluation of its first argument. Therefore it would be cleverer to switch the arguments of the recursive call of (*).

(*) :: Peano -> Peano -> Peano
Z   * _ = Z
S x * y = y + (y * x)

We check this new implementation with StrictCheck.

Failed:  f _|_ Z = _|_  but should be Z

That is the other two cases are least-strict now. Furthermore now the evaluation of check Z yields no more solutions and infinity * Z yields Z.

Comparison

We want to take a look at the implementation of comparison functions for Peano numbers now. First we define a comparison of two Peano numbers which yields a value of type Ordering.

compare :: Peano -> Peano -> Ordering
compare Z     Z     = EQ
compare Z     (S _) = LT
compare (S _) Z     = GT
compare (S x) (S y) = compare x y

On basis of this function we can define (<), (<=), (>) and (>=). For example (>) is defined as follows.

(>) :: Peano -> Peano -> Bool
x > y = compare x y == GT

In Haskell these functions belong to the type class Ord. It is sufficient to only define compare. In this case (<) is exactly defined as stated above. So let's check (<) for least-strictness.

Failed:  f _|_ Z = _|_          but should be False
Failed:  f (S _|_) (S Z) = _|_  but should be False

Well, the first case sounds reasonable. If we check whether anything is smaller than Z we know this is false. If we take a look at compare we see that we preform pattern matching on both arguments. But we do not have to. If we would define (>) without using compare it would look like the following.

(>) :: Peano -> Peano -> Bool
Z   > _   = False
S _ > Z   = True
S x > S y = x > y 

This function is least-strict. It looks very similar to compare but we merge the first two cases of compare. Because we do not want to define all the comparison functions this way we define a special version of compare called compareLT. This function yields LT instead of EQ in case the two arguments are equal.

compareLT :: Peano -> Peano -> Ordering
compareLT Z     _     = LT
compareLT (S _) Z     = GT
compareLT (S x) (S y) = compareLT x y

Now we can define a least-strict version of all the comparison functions on basis of compareLT.

x > y = compareLT x y == GT
x >= y = compareLT y x == LT
x < y = compareLT y x == GT
x <= y = compareLT x y == LT

Binary Numbers

Peano numbers are nice if you want to use narrowing but they are very inefficient if you want to do arithmetic. Therefore Curry provides an implementation of natural numbers by a binary representation. The implementation in Curry differs slightly from the one presented here.

data Nat1 = IHi | O Nat | I Nat

On the basis of this data type we implement natural numbers including zero (which nobody really likes but apparently everybodty needs ;) ).

data Nat = Zero | Pos Nat1

On the basis of this data type we implement signed integers.

data Int = Neg Nat1 | Pos0 Nat

Multiplication

The multiplication of two binary numbers can be implemented as follows.

(*) :: Nat1 -> Nat1 -> Nat1
IHi * y = y
I x * y = y + O (x * y) 
O x * y = O (x * y)

This implementation has a similar drawback (with respect to least-strictness) as the implementation of the multiplication of Peano numbers. The addition evaluates both its arguments to HNF to yield a HNF. Therefore it is better with respect to least-strictness to swap the arguments of the recursive call of (*). The following implementation is least-strict.

(*) :: Nat1 -> Nat1 -> Nat1
IHi * y = y
I x * y = y + O (y * x) 
O x * y = O (x * y)

Note that it is not a good idea to swap the second recursive call of (*). Note furthermore that this implementation is not the canonical one and it is very difficult to see whether this function is least-strict. Therefore a supporting tool would be very valuable.

Minus

(-) :: Nat1 -> Nat1 -> Int
IHi - y   = inc (Neg y)          
O x - IHi = Pos (pred (O x))    
I x - IHi = Pos (O x)
O x - O y = o (x - y)
O x - I y = dec (o (x - y))
I x - O y = inc (o (x - y))
I x - I y = o (x - y)     

o :: Int -> Int
o (Neg n)  = Neg (O n)
o (Pos0 n) = 
  Pos0 (case n of
             Zero  -> Zero
             Pos m -> Pos (O m))
(-) :: Nat1 -> Nat1 -> Int
x - y = 
  case minus x y of
    Pos0 n  -> Pos0 n
    Neg IHi -> zero
    Neg n   -> Neg (pred n)

minus :: Nat1 -> Nat1 -> Int
minus IHi     y     = Neg y
minus x@(O _) IHi   = pos (pred x)
minus (I x)   IHi   = pos (O x)
minus (O x)   (O y) = incNeg (o (minus x y))
minus (I x)   (I y) = incNeg (o (minus x y))
minus (O x)   (I y) = decNeg (o (minus x y))
minus (I x)   (O y) = decNeg (o (negate (minus y x)))
 
incNeg :: Int -> Int
incNeg (Neg n)  = Neg (pred n)
incNeg (Pos0 n) = Pos0 n

decNeg :: Int -> Int 
decNeg (Neg n)  = Neg n
decNeg (Pos0 n) = Pos0 (pred n)

Division

A simple implementation of the division of two natural numbers is implemented as following.

divmodNat :: Nat -> Nat -> (Nat,Nat)
divmodNat _ Zero = error "divide by zero"
divmodNat Zero _ = (Zero,Zero)
divmodNat (Pos x) (Pos y) = divmodNat1 x y

divmodNat1 :: Nat1 -> Nat1 -> (Nat,Nat)
divmodNat1 x y
  | x<y       = (Zero,Pos x)
  | otherwise = (succ q,r)
 where
  (q,r) = divmodNat (Pos x-Pos y) (Pos y)

You can implement the test whether a number is even or not on basis of divmodNat. In Haskell you have actually no choice as the function even is implemented as following by demanding its argument to be a member of the type class Integral. For natural numbers mod and rem are the same.

even :: Nat -> Bool
even n = n `rem` Pos (O IHi) == Zero

Note that it is furthermore not possible to define 'even for the data type Nat in Haskell as it does not support a zero.

Let's test whether a number is even or not. In our representation we can check this by simply taking a look at the first bit. But with the above implementation the expression even (Pos (O _|_)) is evaluated to _|_. But we improve this implementation as follows.

divmodNat :: Nat -> Nat -> (Int,Int)
divmodNat x     IHi   = (Pos x,Zero)
divmodNat (O x) (O y) = (q,o r) 
 where
  (q,r) = divmodNat x y
divmodNat (I x) (O y) = (q,i r) 
 where
  (q,r) = divmodNat x y
divmodNat x y
  | x<y       = (Zero,Pos x)
  | otherwise = (succ q,r)
 where
  (q,r) = divmodNat (Pos x-Pos y) (Pos y)

o :: Int -> Int
o Zero    = Zero
o (Pos n) = Pos (O n)

i :: Int -> Int
i Zero    = Pos IHi
i (Pos n) = Pos (I n)

When we now check even (Pos (O _|_)) we get True as we have expected.

Comparison

The following shows a canonical way of implementing the compare function for the Nat data type.

compare :: Nat -> Nat -> Ordering
compare x y = cmpNat EQ x y

cmpNat :: Ordering -> Nat -> Nat -> Ordering
cmpNat eq IHi IHi = eq
cmpNat _  IHi (O _) = LT
cmpNat _  IHi (I _) = LT
cmpNat _  (O _) IHi = GT
cmpNat _  (I _) IHi = GT
cmpNat eq (O x) (O y) = cmpNat eq x y
cmpNat eq (I x) (I y) = cmpNat eq x y
cmpNat _  (O x) (I y) = cmpNat LT x y
cmpNat _  (I x) (O y) = cmpNat GT x y

Instead of using a case distinction on the result of the recursive call it uses an extra parameter. When we check this function using StrictCheck we get the following result.

Failed:  f (2*_|_+1) (2*1) = _|_        but should be GT
Failed:  f (2*1) (2*_|_+1) = _|_        but should be LT

Are these counter examples satisfiable? Let's take a look at more counter examples.

Failed:  f (2*(2*_|_+1)) (2*(2*1)) = _|_        but should be GT
Failed:  f (2*(2*_|_+1)) (2*(2*1)+1) = _|_      but should be GT

These all have in common that you do not have to evaluate the second argument if the other-one is IHi. And you do not have to because you know from a less significant bit that one of the two numbers is less than the other. In the definition of compare if the first argument is IHi we check the other one. But we do not have to if the numbers differ in one of the previous bits. Therefore we define a function compareLT like we did for Peano numbers which yields LT in the case that its arguments are equal. This way we can merge the first three cases of the function and do not have to pattern match on the second argument. Instead of defining a similar function compareGT we use a function invOrd which changes LT to GT and LT to GT.

compareLT :: Nat1 -> Nat1 -> Ordering
comapreLT IHi _     = LT
compareLT (O _) IHi = GT
compareLT (I _) IHi = GT
compareLT (O x) (O y) = compareLT x y
compareLT (I x) (I y) = compareLT x y
compareLT (O x) (I y) = compareLT x y
compareLT (I x) (O y) = invOrd (compareLT y x)

On basis of compareLT we define compare.

compare :: Nat1 -> Nat1 -> Ordering
compare IHi IHi   = EQ
compare IHi (O _) = LT
compare IHi (I _) = LT
compare (O _) IHi = GT
compare (I _) IHi = GT
compare (O x) (O y) = compare x y
compare (I x) (I y) = compare x y
compare (O x) (I y) = compareLT x y
compare (I x) (O y) = invOrd (compareLT y x)
 

We can reduce the code duplication by introducing a functional parameter. When we check the new implementation of compare there are no more counterexamples. Another way of implementing this is using minus and checking the sign of the result.

Performance

The following Curry function can be used to generate Pythagorean triples.

pyt :: (Nat,Nat,Nat)
pyt | a*a+b*b==c*c = (a,b,c)
  where a,b,c free

testPyt n = take n (allValuesB (searchTree pyt))
no of triples old implementation lest-strict implementation
100 3.2s 1.2s
200 21.8s 3.9s
300 46.2s 13.8s

Conclusion

If you want to define a function which is as non-strict as possible you have to invest brain power. Furthermore it is very difficult to reason about the non-strictness of a function. Even for simple functions like the multiplication of Peano numbers the canonical implementation is not least-strict. There are good reasons to be least-strict in a functional programming language (for example infinite data structures like infinity). But there are even better reasons to be least-strict in a functional logic programming language.

Right now you have to interpret the results of StrictCheck which is not easy. Therefore we have to improve the tool to make applicable for everybody.

Jan Christiansen