regexp.README: Source Documentation
The aim of this module is to give a top-down reading plan of the
source code. Each module listed here should start with an overview of
the important exports from the module. We reserve the
Theorem
vernacular for the most important proof from the module.
Lemma is
generally used for local properties of a bunch of function exposed by
the module.
Where to start ?
It is best to start reading from the
regexp.synthesis as:
1. It is the user facing module which is used in actual synthesis.
2. The correctness proof is also given there.
Preliminaries
The basics of languages and regular expressions are covered in the
following modules.
Let
B be a finite set of symbols. In order to support
generalized
regular expressions over
B, i.e. expressions whose atoms are
predicates
B → bool rather than elements of
B, we need to consider
languages over
{set B}. We call such an element a world over B. We
develop the language over worlds and the associated notion of
thickening of a language in the following module.
The three layered proof.
The overall correctness depends on the correctness of the following
two steps.
1. The regular expression to NFA construction
2. The Simulation of the constructed automata on
thickened strings.
We carry out the proof in three layers. the
abstract, the
concrete, and the
synthesis layers. The module hierararchy reflect
these layers.
Automata construction
Given a regular expression constructing the corresponding automata is
the first important step. This algorithm is implemented in the
abstract layer and the concrete layer. The synthesis layer does not
need this construction as the hardware is only meant to simulate the
NFA and not construct it.
Automata simulation
Automata simulation is defined in the modules below. For the synthesis
layer the simulation function is defined in the automata module
itself.
Reification of finite type and sets
The concrete layer reifies the finite types and finite sets on those
types from the
mathcomp library into concrete representations.
Semantics via the denote family of functions
Each interesting data type in the concrete layer has an associated
denote function that maps it into the abstract layer. There are also
spec-lemmas, i.e. lemmas whose name are prefixed by the word
spec
which proves the connection of an operation like say intersection to
the corresponding operation in the abstract layer.
Synthesis of finite types and sets
The synthesis layer has most of the data structures of concrete layer
but at the clash level.
Synthesis via syn family of functions.
The
syn functions capture the synthesis of a concrete layer data
type to its corresponding synthesis layer object. The
spec lemmas in
this layer connect the operation in the synthesis layer to that of the
corresponding concrete layer.
Axioms in the synthesis layers.
The
clash.universe postulates type level nats and the
clash.function module postulates clash level finite functions (as
clash vectors) via the
Axiom vernacular. Only those which are
absolutely necessary are declared thus. We also give explicit
extraction instructions for these functions.
Besides the functions defined above we need the following axioms that
connect functions like
map to the associated function in the
concrete layer. These portions should be considered as the trusted
kernel of our library and hence are deliberately kept small. All other
properties in say
clash.subset etc for example are proved from this
small kernel.