After my talk at FLOPS 2008, Chung-chieh Shan 單中杰 pointed out that — because test-case generation is a search problem — heuristic search techniques from AI could be applied to improve the usual-case performance of test-case generation. Specifically, he pointed me at Adaptive Tree Search (ATS) which aims at finding good leaves in a search tree early by guiding the search based on the quality of leaves found earlier.

Although the techniques that are used in [ATS] are not directly applicable to test-case generation, understanding them, lead me to new ideas for guiding search. The described method is targeted at search trees that correspond to combinatorial problems, i.e., the trees are of limited depth and each choice on the same level of the tree corresponds to the same choice in the original problem. When generating test cases, search trees are usually infinite and different choices are independent even on the same level of the tree. Nevertheless, adaptive search can be applied to this more general class of search trees. Only, the technique of estimating the quality of expected new leaves needs to be different.

How can we measure the quality of test cases? The test-case generator presented in my PPDP’07 and ICFP’08 papers, uses code coverage information in order to eliminate redundant test cases. Code coverage allows to compare different test cases w.r.t. execution details and we can employ it to evaluate generated tests. The idea is simple: test cases that cause new code coverage are good.

Initially, coverage information is only attached to test cases and they are stored in the leaves of a search tree. In order to estimate the quality of test cases in a (sub) tree, we need to associate coverage information also to inner nodes. If we associate to each inner node the coverage of all test cases in the corresponding sub tree found so far, the coverage information associated to the inner nodes of the tree will constantly adapt to the newly found test cases. The search algorithm can decide which subtrees to prefer based on the coverage information that is stored with each inner node.

Instead of simply monitoring the coverage of test cases for each sub tree, we can additionally attach more sophisticated information to outgoing edges of inner nodes. For example, a progress list could indicate how much new coverage has been achieved with each new test case found in the corresponding sub tree. Or the ratio of new and redundant coverage information would denote the degree of novelty of previously found test cases in a sub tree.

In order to implement adaptive search on a tree where edges are labeled with the ratio of new and redundant coverage information found in the corresponding sub tree, we can use the following datatype for trees:

  type Coverage = [Int]
  type Ratio = (Int,Int)

  data CoverTree a
    = Leaf (a,Coverage)
    | Branch Coverage [(Ratio,CoverTree a)]

For ease of presentation, we assume that coverage information is represented as list of numbers. Curry’s search trees can be transformed into CoverTrees as follows:

  coverTree :: SearchTree (a,Coverage) → CoverTree a
  coverTree Fail = Branch [] []
  coverTree (Value x) = Leaf x
  coverTree (Choice ts) =
    Branch [] (zip (repeat (1,1)) (map coverTree ts))

Initially, the ratio of new and old coverage is 1 and no coverage is attached to inner nodes. To handle infinite search trees, we can use an iterative approach: limit the number of nodes visited in each iteration and increase this limit from one iteration to the next:

  adaptiveSearch :: SearchTree (a,Coverage) → [(a,Coverage)]
  adaptiveSearch = iter 1 . coverTree

  iter :: Int → CoverTree a &rarr [(a,Coverage)]
  iter l t = xs ++ maybe [] (iter (2*l)) mt
   where (xs,_,mt) = search l t

Here, the limit is doubled after each iteration. The function search has the following type:

  search :: Int → CoverTree a
          → ([(a,Coverage)],Int,Maybe (CoverTree a))

It takes a limit of nodes to visit and a tree and yields a list of visited leaves, a number of nodes that would have been allowed to visit but weren’t, and an optional tree to be searched in the next iteration. If the tree to be searched is empty, the limit might not need to be exhausted.

The implementation of search is given in the file AdaptiveSearch.curry. The idea is to distribute the given limit to the subtrees of inner nodes based on the degree of novelty indicated by the ratio attached to the corresponding edges. Furthermore, this ratio and the coverage attached to inner nodes is updated for every found leaf of the tree.

First experiments are quite promising. As simple test, we can generate a search tree with unbalanced coverage information and compare the results of enumerating some leaves using breadth-first search and the presented adaptive search. For example, we can compare both enumerations using the tree generated by the following function:

  exampleTree :: Coverage → SearchTree (Int,Coverage)
  exampleTree cov@(n:_) =
    Choice [exampleTree cov
           ,Value (n,cov)
           ,exampleTree (n+1:cov)]

In right branches, additional coverage is introduced by incrementing the largest number in the coverage list. Leaves are labeled with this largest number and coverage information. If we enumerate the first 1000 leaves in breadth-first or adaptive order, we can compare, how many leaves with new coverage are found. The following table shows for each leaf label, how many leaves with this label were among the first 1000:

leaf label breadth first adaptive
0 10 5
1 45 14
2 120 25
3 210 36
4 252 44
5 209 52
6 113 68
7 36 90
8 5 109
9 0 118
10 0 122
11 0 118
12 0 97
13 0 61
14 0 29
15 0 10
16 0 2

Adaptive search finds leaves labeled with 16 while the largest label found with breadth-first search is 8. The current implementation is merely a proof of concept and it remains to investigate how the presented search algorithms performs on real search trees for generating test cases — both w.r.t. test quality and run-time or memory performance.