regexp.README: Source Documentation

The aim of this module is to give a top-down reading plan of the source code. Each module listed here should start with an overview of the important exports from the module. We reserve the Theorem vernacular for the most important proof from the module. Lemma is generally used for local properties of a bunch of function exposed by the module.

Where to start ?

It is best to start reading from the regexp.synthesis as:
1. It is the user facing module which is used in actual synthesis.
2. The correctness proof is also given there.

Require regexp.synthesis.

Preliminaries

The basics of languages and regular expressions are covered in the following modules.

Require regexp.lang.
Require regexp.expr.

Let B be a finite set of symbols. In order to support generalized regular expressions over B, i.e. expressions whose atoms are predicates B bool rather than elements of B, we need to consider languages over {set B}. We call such an element a world over B. We develop the language over worlds and the associated notion of thickening of a language in the following module.

Require Import regexp.lang.world.

The three layered proof.

The overall correctness depends on the correctness of the following two steps.
1. The regular expression to NFA construction
2. The Simulation of the constructed automata on thickened strings.
We carry out the proof in three layers. the abstract, the concrete, and the synthesis layers. The module hierararchy reflect these layers.

Automata construction

Given a regular expression constructing the corresponding automata is the first important step. This algorithm is implemented in the abstract layer and the concrete layer. The synthesis layer does not need this construction as the hardware is only meant to simulate the NFA and not construct it.

Automata simulation

Automata simulation is defined in the modules below. For the synthesis layer the simulation function is defined in the automata module itself.

Reification of finite type and sets

The concrete layer reifies the finite types and finite sets on those types from the mathcomp library into concrete representations.

Semantics via the denote family of functions

Each interesting data type in the concrete layer has an associated denote function that maps it into the abstract layer. There are also spec-lemmas, i.e. lemmas whose name are prefixed by the word spec which proves the connection of an operation like say intersection to the corresponding operation in the abstract layer.

Synthesis of finite types and sets

The synthesis layer has most of the data structures of concrete layer but at the clash level.

Synthesis via syn family of functions.

The syn functions capture the synthesis of a concrete layer data type to its corresponding synthesis layer object. The spec lemmas in this layer connect the operation in the synthesis layer to that of the corresponding concrete layer.

Axioms in the synthesis layers.

The clash.universe postulates type level nats and the clash.function module postulates clash level finite functions (as clash vectors) via the Axiom vernacular. Only those which are absolutely necessary are declared thus. We also give explicit extraction instructions for these functions.
Besides the functions defined above we need the following axioms that connect functions like map to the associated function in the concrete layer. These portions should be considered as the trusted kernel of our library and hence are deliberately kept small. All other properties in say clash.subset etc for example are proved from this small kernel.