regexp.expr: Regular expressions
Overview
- t A: The regular expressions over atoms of type A. We have the
following operations
- ε
- r1 || r2 : Either r1 or r2 written r1 + r2 in literature
- r1 ++ r2: The concatenation
- many r: The Kleene closure r⋆
- many1 r: One or more of r often r⁺ = r r⋆ in literature
- ε
- map: The natural map function. Makes expr.t a functor.
- language r : The language associated with the regular expression.
- eval: When the atom set itself is lang B then every regular expression defines a language over B. Both language and matcher.language are defined in terms of this eval function.
Matcher variant.
- matcher.t B: regular expressions used for matching. It is what we
call generelized regular expression (i.e. t (B → bool)
- matcher.language The language over that is matched by the given
regular expression
Inductive t {A : Type} :=
| eps : t
| single : A → t
| alt : t → t → t
| concat : t → t → t
| many1 : t → t.
Arguments t : clear implicits.
Fixpoint map {A B}(f : A → B)(re : t A) : t B :=
match re with
| eps ⇒ eps
| single x ⇒ single (f x)
| alt re1 re2 ⇒ alt (map f re1) (map f re2)
| concat re1 re2 ⇒ concat (map f re1) (map f re2)
| many1 re ⇒ many1 (map f re)
end.
Definition many {A}(r : t A) := alt (many1 r) eps.
Notation "'ε'" := eps : regexp_scope.
Infix "||" := alt : regexp_scope.
Infix "++" := concat : regexp_scope.
Fixpoint unnest {A}(re : t (t A)) : t A :=
match re with
| eps ⇒ eps
| single x ⇒ x
| alt re1 re2 ⇒ alt (unnest re1) (unnest re2)
| concat re1 re2 ⇒ concat (unnest re1) (unnest re2)
| many1 re ⇒ many1 (unnest re)
end.
Require Import regexp.lang.
Regular expression over languages.
Fixpoint eval {A}(re : t (lang.lang A)) : lang.lang A :=
match re with
| ε%re ⇒ ε%lan
| single L ⇒ L
| (re1 || re2)%re ⇒ (eval re1 || eval re2)%lan
| (re1 ++ re2)%re ⇒ (eval re1 ++ eval re2)%lan
| expr.many1 re ⇒ lang.many1 (eval re)
end%lan.