regexp.concrete.rule: String matching rules


Overview

  • Rule: The rules for matching the string. The rule type should be seen as the AST for specifing regular expressions by the user.
  • language rls: For a given set of rules rls, one can think of rls as a generalized regular expression given by the type expr.matcher.t. This denotes the language of the associated regular expression. The user means this language.
  • interpretation: This is the main theorem of this module.

String matching rules

A matching rule for strings over an input I is essentially a regexp matcher for I but expressed in the language of finite sorts. Instead of directly providing a regular expression over atoms of type I bool, a rule is a record of
  • A finite sort letter
  • An atomTable that is a map from the finite letters to boolean predicates on I
  • A regular expression over the finite letters.
Note that there is no loss of generality here because any matcher regexp, given by element of type expr.matcher.t I can be refactored thus because there are only finitely many atoms occurring in a given regular expression.

Record Rule {I} := { letter : sort;
                     atomTable : (letter ~> (I bool))%con;
                     regexp : expr.t (universe.typeOf letter)
  }.

Arguments Rule : clear implicits.

The associated language

The language associated with a set of rules is given by defining the associated matcher and then defining the language associated with the matcher.

Section Cxt.
  Context {s : sort}{I : Type}.

The atom table which gives a map from the finite sort s to boolean test functions on the input I
  Variable aT : (s ~> (I bool))%con.

The matcher associated with an expression language
  Definition matcher_of : expr.t s expr.matcher.t I := expr.map aT.

The language over I corresponding to the matcher
  Definition mLanguage : expr.t s lang I := expr.matcher.language \o matcher_of.

End Cxt.

The language that the user means when giving a set of rules
Definition language {I} (rls : Rule I) : lang I := mLanguage (atomTable rls) (regexp rls).

Interpreting the input

Inside the hardware the input is encoded as a finite set by. The language matched by the hardware is therefore on the alphabet which is the finite set over the sort s and is given by the function wLanguage.
Given an atom table we can encode input letters as finite subsets over the sort s below using the encode function. The above interpretation of input alphabet can be seen as a language over I by making use of the atom table as given below

Definition wLanguage {s : sort}: expr.t s lang {set universe.typeOf s} := thicken \o expr.language.
Section Cxt.
  Context {s : sort}{I : Type}.
  Variable aT : (s ~> (I bool))%con.

  Definition encode (i : I) := function.map (fun tsttst i) aT.

Interpreting the language over subsets as language over the input
We have the following language equivalence
  Lemma equivalence : re : expr.t s, interpret (wLanguage re) mLanguage aT re.
    rewrite /interpret /wLanguage /mLanguage /language /matcher.language /comp /expr.language.
    elim ⇒ [ | x | re1 IH1 re2 IH2 | re1 IH1 re2 IH2 |re IH].
    - rewrite thicken_epsilon ?comap_epsilon //=.
    - rewrite //= thicken_singleton ?comap_satisfy; apply /satisfy_extu.
      by rewrite /comp /encode denoteE to_map.
    - by rewrite //= thicken_alt ?comap_alt IH1 IH2.
    - by rewrite //= thicken_concat ?comap_concat IH1 IH2.
    - by rewrite //= thicken_many1 ?comap_many1 IH.
  Qed.
End Cxt.

The interpreted language got by interpreting the input elements as subsets
Definition iLanguage {I} (rls : Rule I) : lang I := interpret (atomTable rls) (wLanguage (regexp rls)).

Theorem interpretation {I}(rls : Rule I) :
  (language rls iLanguage rls)%lan.
  case rlss aT re; by rewrite /iLanguage /language equivalence.
Qed.