Pattern Minimization

Sebastian recommended an ICFP paper to me which is called “Pattern Minimization Problems over Recursive Data Types”. Some time ago I wrote a Curry program with non-trivial case pattern matching. This program transformed a relation algebraic expression into another one. There were quite some cases I had to distinguish and I were too lazy to write non-overlapping patterns. Therefore I used a case expression with its top to bottom behaviour. Unfortunately compiling the module took ages and I asked myself how this could be. I discovered that the flatcurry file which was generated was really big. The paper mentioned above provided me some insights about this problem at least I was not aware of. The frontend that is used by PAKCS and kics transforms a case pattern matching into a pattern matching without overlapping patterns. This is exactly the problem the paper is concerned with.

First of all the naive approach which is used by the frontend can result in an exponential blowup. Consider the following data Type data Ex = A | B | C and the following rules.

(A,_,_)
(_,A,_)
(_,_,A)
(_,_,_)

The last pattern matches all triples that do not contain an A. To express this we need an exponential number of patterns with respect to the number of constructors of Ex. This is the reason that compilation of my module performed so badly. I am quite surprised that this problem apparently never occurred before.

There are examples where the naive approach results in an exponential blowup but the optimal solution prevents this blowup. The paper presents an algorithm based on minimization of propositional DNF formulas. This problem (as well as the pattern minimization problem) is <latex>\Sigma^P_2</latex>-complete. The author states that this is “one level up” from NP. He states that the algorithm nevertheless performs well because the instances are very small. There is also a remark in the paper that the problem is “only” NP-hard if you generate a case cascade and not a set of rules. I am not sure what this implies for the problem at hand.

Conclusion

If it is possible I would suggest to modify the backends so they can handle top to bottom pattern matching. I assume that the transformation which is performed by the frontend is used due to historical reasons. I think it would be possible for kics to handle top to bottom pattern matching. If it is not possible for PAKCS it would be useful to add a switch for the transformation.

This reveals the power of top to bottom pattern matching. For example if you want to implement the case distinction for the data type Ex by a rule based function in Curry you have to define exponentially many rules. This is quite a burden. Perhaps we could use complement pattern to solve this problem. In this case the case distinction of Ex can be defined as follows.

( A, _, _)
(-A, A, _)
(-A,-A, A)
(-A,-A,-A)

I am not sure whether we end in the same pitfall because checking whether rules are overlapping is very expensive in this case. But perhaps this would be a solution which combines the nice pattern matching of Curry with the expressiveness of top to bottom pattern matching. Curry pattern matching with complement is not as expressive as top down pattern matching with underscore. Consider a data type data More = A | B | C | D.

A
B
_

To express the underscore we need a conjunction or a disjunction.