regexp.concrete.rule: String matching rules

Overview

Rule: The rules for matching the string. The rule type should be seen as the AST for specifing regular expressions by the user.
language rls: For a given set of rules rls, one can think of rls as a generalized regular expression given by the type expr.matcher.t. This denotes the language of the associated regular expression. The user means this language.
interpretation: This is the main theorem of this module.

String matching rules

A matching rule for strings over an input I is essentially a regexp matcher for I but expressed in the language of finite sorts. Instead of directly providing a regular expression over atoms of type I → bool, a rule is a record of

A finite sort letter
An atomTable that is a map from the finite letters to boolean predicates on I
A regular expression over the finite letters.

Note that there is no loss of generality here because any matcher regexp, given by element of type expr.matcher.t I can be refactored thus because there are only finitely many atoms occurring in a given regular expression.

Record Rule {I} := { letter : sort;
                     atomTable : (letter ~> (I → bool ))%con;
                     regexp : expr.t (universe.typeOf letter)
  }.

Arguments Rule : clear implicits.

The associated language

The language associated with a set of rules is given by defining the associated matcher and then defining the language associated with the matcher.

Section Cxt.
Context {s : sort}{I : Type}.

The atom table which gives a map from the finite sort s to boolean test functions on the input I

Variable aT : (s ~> (I → bool ))%con.

The matcher associated with an expression language

Definition matcher_of : expr.t s → expr.matcher.t I := expr.map aT.

The language over I corresponding to the matcher

Definition mLanguage : expr.t s → lang I := expr.matcher.language \o matcher_of.

End Cxt.

The language that the user means when giving a set of rules

Definition language {I} (rls : Rule I) : lang I := mLanguage (atomTable rls) (regexp rls).

Interpreting the input

Inside the hardware the input is encoded as a finite set by. The language matched by the hardware is therefore on the alphabet which is the finite set over the sort s and is given by the function wLanguage.

Given an atom table we can encode input letters as finite subsets over the sort s below using the encode function. The above interpretation of input alphabet can be seen as a language over I by making use of the atom table as given below

Definition wLanguage {s : sort}: expr.t s → lang {set universe.typeOf s} := thicken \o expr.language.
Section Cxt.
  Context {s : sort}{I : Type}.
  Variable aT : (s ~> (I → bool ))%con.

  Definition encode (i : I) := function.map (fun tst ⇒ tst i) aT.

Interpreting the language over subsets as language over the input

Definition interpret : lang {set universe.typeOf s} → lang I := lang.comap encode \o lang.comap subset.denote.

We have the following language equivalence

  Lemma equivalence : ∀ re : expr.t s, interpret (wLanguage re) ≡ mLanguage aT re.
    rewrite /interpret /wLanguage /mLanguage /language /matcher.language /comp /expr.language.
    elim ⇒ [ | x | re1 IH1 re2 IH2 | re1 IH1 re2 IH2 |re IH].
    - rewrite thicken_epsilon ?comap_epsilon //=.
    - rewrite //= thicken_singleton ?comap_satisfy; apply /satisfy_ext ⇒ u.
      by rewrite /comp /encode denoteE to_map.
    - by rewrite //= thicken_alt ?comap_alt IH1 IH2.
    - by rewrite //= thicken_concat ?comap_concat IH1 IH2.
    - by rewrite //= thicken_many1 ?comap_many1 IH.
  Qed.
End Cxt.

The interpreted language got by interpreting the input elements as subsets

Definition iLanguage {I} (rls : Rule I) : lang I := interpret (atomTable rls) (wLanguage (regexp rls)).

Theorem interpretation {I}(rls : Rule I) :
(language rls ≡ iLanguage rls)%lan.
case rls ⇒ s aT re; by rewrite /iLanguage /language equivalence.
Qed.