regexp.concrete.rule: String matching rules
Overview
- Rule: The rules for matching the string. The rule type should be
seen as the AST for specifing regular expressions by the user.
- language rls: For a given set of rules rls, one can think of rls
as a generalized regular expression given by the type
expr.matcher.t. This denotes the language of the associated
regular expression. The user means this language.
- interpretation: This is the main theorem of this module.
String matching rules
- A finite sort letter
- An atomTable that is a map from the finite letters to boolean
predicates on I
- A regular expression over the finite letters.
Record Rule {I} := { letter : sort;
atomTable : (letter ~> (I → bool))%con;
regexp : expr.t (universe.typeOf letter)
}.
Arguments Rule : clear implicits.
The associated language
The atom table which gives a map from the finite sort s to
boolean test functions on the input I
The matcher associated with an expression language
The language over I corresponding to the matcher
The language that the user means when giving a set of rules
Interpreting the input
Definition wLanguage {s : sort}: expr.t s → lang {set universe.typeOf s} := thicken \o expr.language.
Section Cxt.
Context {s : sort}{I : Type}.
Variable aT : (s ~> (I → bool))%con.
Definition encode (i : I) := function.map (fun tst ⇒ tst i) aT.
Interpreting the language over subsets as language over the input
Definition interpret : lang {set universe.typeOf s} → lang I := lang.comap encode \o lang.comap subset.denote.
We have the following language equivalence
Lemma equivalence : ∀ re : expr.t s, interpret (wLanguage re) ≡ mLanguage aT re.
rewrite /interpret /wLanguage /mLanguage /language /matcher.language /comp /expr.language.
elim ⇒ [ | x | re1 IH1 re2 IH2 | re1 IH1 re2 IH2 |re IH].
- rewrite thicken_epsilon ?comap_epsilon //=.
- rewrite //= thicken_singleton ?comap_satisfy; apply /satisfy_ext ⇒ u.
by rewrite /comp /encode denoteE to_map.
- by rewrite //= thicken_alt ?comap_alt IH1 IH2.
- by rewrite //= thicken_concat ?comap_concat IH1 IH2.
- by rewrite //= thicken_many1 ?comap_many1 IH.
Qed.
End Cxt.
rewrite /interpret /wLanguage /mLanguage /language /matcher.language /comp /expr.language.
elim ⇒ [ | x | re1 IH1 re2 IH2 | re1 IH1 re2 IH2 |re IH].
- rewrite thicken_epsilon ?comap_epsilon //=.
- rewrite //= thicken_singleton ?comap_satisfy; apply /satisfy_ext ⇒ u.
by rewrite /comp /encode denoteE to_map.
- by rewrite //= thicken_alt ?comap_alt IH1 IH2.
- by rewrite //= thicken_concat ?comap_concat IH1 IH2.
- by rewrite //= thicken_many1 ?comap_many1 IH.
Qed.
End Cxt.
The interpreted language got by interpreting the input elements as subsets