using automated testing to find an unsatisfiable boolean formula that is not easily detected as such by a simple SAT solver

import System ( getArgs )

The other day I wrote a simple, incremental solver for boolean satisfiability.

import Data.Boolean.SatSolver

Today I want to search for a smallest unsatifiable boolean formula that this solver only identifies as such after backtracking. I'll use SmallCheck to find one.

import Test.SmallCheck

The idea is strikingly simple: state that there is no such formula and SmallCheck will shatter this claim showing you the smallest counter example.

audaciousClaim :: Boolean -> Property
audaciousClaim formula = null (solver >>= solve) ==> null solver
where solver = assertTrue formula newSatSolver

In English: If we cannot find a solution by solving the formula, then asserting the formula without solving afterwards must already fail.

The SAT solver performs deterministic simplification steps when asserting formulas that might detect unsatisfiability without backtracking. The operation solve searches for solutions non-deterministically and there might be formulas whose unsatisfiability can only be detected via unsuccessful search.

To enable SmallCheck to find such a formula, we need to inform it about the type of boolean formulas.

instance Serial Boolean
series = cons1 Var
\/ cons0 Yes
\/ cons0 No
\/ cons1 Not
\/ cons2 (:&&:)
\/ cons2 (:||:)

coseries = undefined

Now, SmallCheck is ready to go:

shatterClaim :: IO ()
shatterClaim = smallCheckI audaciousClaim

The function smallCheckI will interactively check the claim for boolean formulas of increasing depth. I am running this function while I'm writing this paragraph -- it's searching at depth four. Here are the results for all depths up to three:

Depth 0:
  Completed 2 test(s) without failure.
  But 1 did not meet ==> condition.
  Deeper? y
Depth 1:
  Completed 13 test(s) without failure.
  But 7 did not meet ==> condition.
  Deeper? y
Depth 2:
  Completed 356 test(s) without failure.
  But 193 did not meet ==> condition.
  Deeper? y
Depth 3:
  Completed 253835 test(s) without failure.
  But 137561 did not meet ==> condition.
  Deeper? y

From 253835 formulas at depth three 137561 are satisfiable, thats a bit more than half of them. All of the remaining 116274 unsatisfiable formulas are detected as such without backtracking.

SmallChek is running in GHCi now for 15 minutes, still at depth four reaching three million tests. Let's see whether compiled code is faster.

main :: IO ()
main = getArgs >>= flip depthCheck audaciousClaim . read . head

Bingo! Compiling this with -O2 and passing depth four when calling the program yields a counter example in an instant.

Depth 4:
  Failed test no. 6600270. Test values follow.
  Not (Var 0 :&&: Var 0) :&&: (Var (-1) :&&: (Var 0 :||: Var 0))

Hence, ¬(x ∧ x) ∧ y ∧ (x ∨ x) is a smallest formula that is unsatisfiable and not detected as such by my solver.

That is a bit weird because both x ∧ x and x ∨ x can be easily simplified. Let's be suspicious and check the simplified formula ¬x ∧ y ∧ x by hand. We use -1 as encoding for y like in the counter example although that should not matter.

 > assertTrue (Not (Var 0) :&&: Var (-1) :&&: Var 0) newSatSolver

Indeed, no backtracking is necessary.

The solver works on conjunctive normal forms (CNF) of boolean formulas. The CNF of the counterexample is x ∨ ¬x) ∧ y ∧ (x ∨ x). Seems like I can easily improve the solver by eliminating duplicate literals from clauses. I'll send an update to Hackage.

After the fix the previously found conter example is none anymore. The (compiled) search at depth four does not yield a counter example within one hour.

Instead of generating arbitrary boolean formulas, let's directly create only conjunctive normal forms. In order to do so, we define a newtype for CNFs and a custom Serial instance.

newtype CNF = CNF { clauses :: [Clause] }
newtype Clause = Clause { literals :: [Literal] }
newtype Literal = Literal { literal :: Boolean }
instance Serial CNF
series = map CNF . series
coseries = undefined
instance Serial Clause
series = map Clause . series
coseries = undefined
instance Serial Literal
series depth = map Literal $
concat [ [Var x, Not (Var x)] | x <- series (depth-1) ]

coseries = undefined

We convert CNFs to feed them to the solver.

cnfFormula :: CNF -> Boolean
cnfFormula = foldr (:&&:) Yes
. map (foldr (:||:) No . map literal . literals)
. clauses

instance Show CNF where show = show . cnfFormula

We restate our claim as property on conjunctive normal forms in order to check only those.

cnfClaim :: CNF -> Property
cnfClaim cnf = null (solver >>= solve) ==> null solver
where solver = assertTrue (cnfFormula cnf) newSatSolver

Let's try to shatter this one and redefine the main function.

main :: IO ()
main = getArgs >>= flip smallCheck cnfClaim . read . head

Waiting for SmallCheck to generate counter examples leaves enough time to come up with one by hand: (x ∨ y) ∧ (x ∨ ¬y) ∧ (¬x ∨ y) ∧ (¬x ∨ ¬y) is unsatisfiable but is not detected as such by my solver without guessing. Probably it is the smallest formula with this property.. Does SmallCheck find it? Or an even smaller one?

At depth six the are still no counter examples:

Depth 6:
  Completed 97589351 test(s) without failure.
  But 25183775 did not meet ==> condition.

Searching the almost 0.1 billion candidates took less than 10 minutes on my machine. At depth seven, SmallCheck eats all my memory. Sadly.

As we know, that we only need two variables, we can alter the Serial instance for literals accordingly:

instance Serial Literal
series _ = map Literal $
concat [ [Var x, Not (Var x)] | x <- [0,1] ]

coseries = undefined

There are still no conter examples until depth four..

Depth 0:
  Completed 1 test(s) without failure.
  But 1 did not meet ==> condition.
Depth 1:
  Completed 2 test(s) without failure.
  But 1 did not meet ==> condition.
Depth 2:
  Completed 11 test(s) without failure.
  But 5 did not meet ==> condition.
Depth 3:
  Completed 232 test(s) without failure.
  But 93 did not meet ==> condition.
Depth 4:
  Completed 19721 test(s) without failure.
  But 7221 did not meet ==> condition.

Hey, none at depth five either:

Depth 5:
  Completed 6724862 test(s) without failure.
  But 2342549 did not meet ==> condition.

I am eagerly waiting for the search on depth six to reveal one.

Depth 6:


  Failed test no. 47219388. Test values follow.
  (Var 0 :||: (Var 0 :||: (Var 0 :||: (Var 0 :||: (Var 1 :||: No))))) :&&: ((Var 0 :||: (Var 0 :||: (Var 0 :||: (Not (Var 1) :||: No)))) :&&: ((Not (Var 0) :||: (Not (Var 0) :||: (Var 1 :||: No))) :&&: ((Not (Var 0) :||: (Not (Var 1) :||: No)) :&&: Yes)))

There it is! It looks larger than the one we created by hand but it only adds redundant literals. If we remove them, the formular reads (x ∨ y) ∧ (x ∨ ¬y) ∧ (¬x ∨ y) ∧ (¬x ∨ ¬y), exactly the formula we envisioned earlier.

Why do we get all these redundant literals? Apparently, according to SmallCheck the counter example has the same depth as the simplified formula and happens to be found earlier in a depth-first traversal.

What I learned

Searching for a smallest unsatisfiable formula where my solver needs backtracking to detect its unsatisfiability I opted for SmallCheck to automate the search. It pointed out a missed opportunity for simplification in my solver where deterministic simplification was not possible only because of duplicated literals in clauses. However, waiting for counter examples to be generated took so long that there was enough time to come up with the desired formula by hand. At least, this formula was confirmed by SmallCheck after improving the generation of candidate formulas manually (but only after recognizing that two variables are sufficient).