API¶
Productions, Grammars and Derivations¶
The basic building block of a grammar is a production represented by the following class.
-
class
Production
(lhs, rhs)¶ A grammar production.
This class represents a grammar production, it has a left-hand and right-hand sides that can be nonempty
strings
, ortuples
of nonempty strings; the right-hand side can contain ε only if is the only symbol it comprises. A production is iterable and unpacking can be used to obtain its sides, so for example>>> lhs, rhs = Production('A', ('B', 'C')) >>> lhs 'A' >>> rhs ('B', 'C')
- Parameters
- Raises
ValueError – in case the left-hand or right-hand side is not a strings, or tuples of strings.
A convenience method is provided to obtain productions from a suitable string representation of them.
-
classmethod
from_string
(prods, context_free=True)¶ Builds a tuple of productions obtained from the given string.
- Parameters
The string must be a sequence of lines of the form:
lhs -> alternatives
where
alternatives
is a list ofrhs
strings (possibly separated by|
) andlhs
andrhs
are space separated strings that will be used as left-hand and right-hand sides of the returned productions; for example:S -> ( S ) | x T y x T y -> t
- Raises
ValueError – in case the productions are declared as
context_free
but on of them has more than one symbol on the right-hand side.
In order to find productions satisfying a set of properties, a convenience method to build a predicate from a set of keyword arguments is provided.
-
classmethod
such_that
(**kwargs)¶ Returns a conjunction of predicates specified by its named arguments.
This method returns a predicate that can be conveniently used with
filter()
to- Parameters
lhs – returns a predicate that is
True
weather the production left-hand side is equal to the argument value.rhs – returns a predicate that is
True
weather the production left-hand side is equal to the argument value.rhs_len – returns a predicate that is
True
weather the length of the production left-hand side is equal to the argument value.rhs_is_suffix_of – returns a predicate that is
True
weather the the argument value ends with the production.
- Returns
A predicate (that is a one-argument function that retuns
True
orFalse
) that isTrue
weather the production given as argument satisfies all the predicates given by the named arguments.
Example
As an example, consider the following productions and the subset of them you can obtaining by filtering them according to different predicates
>>> prods = Production.from_string("A -> B C\nB -> b\nC -> D") >>> list(filter(Production.such_that(lhs = 'B'), prods)) [B -> b] >>> list(filter(Production.such_that(rhs = ['B', 'C']), prods)) [A -> B C] >>> list(filter(Production.such_that(rhs_len = 1), prods)) [B -> b, C -> D] >>> list(filter(Production.such_that(rhs_is_suffix_of = ('a', 'B', 'C')), prods)) [A -> B C] >>> list(filter(Production.such_that(lhs = 'B', rhs_len = 1), prods)) [B -> b]
For the purpose of presenting the Knuth Automaton in the context of LR(0) parsing, productions are extended to include a dot in the following class.
-
class
Item
(lhs, rhs, pos=0)¶ A dotted production, also known as an item.
>>> item = Item('A', ('B', 'C'), 1) >>> item A -> B•C >>> lhs, rhs, pos = item >>> lhs 'A' >>> rhs ('B', 'C') >>> pos 1
- Parameters
- Raises
ValueError – in case the left-hand , or right-hand side is not a tuple of strings, or the dot pos is invalid.
A grammar can be represented by the following class, that can be instantiated given the usual formal definition of a grammar as a tuple.
-
class
Grammar
(N, T, P, S)¶ A grammar.
This class represents a formal grammar, that is a tuple \((N, T, P, S)\) where \(N\) is the finite set of nonterminals or variables symbols, \(T\) is the finite set of terminals, \(P\) are the grammar productions or rules and, \(S \in N\) is the start symbol.
- Parameters
N (set) – the grammar nonterminals.
T (set) – the grammar terminals.
P (tuple) – the grammar
productions
.S (str) – the grammar start symbol.
Even in the case of grammars, a convenience method is provided to build a grammar from a suitable string representation. Once the productions are obtained with
Production.from_string()
, the remaining parameters are conventionally inferred as detailed below.-
classmethod
from_string
(prods, context_free=True)¶ Builds a grammar obtained from the given productions.
- Parameters
Once the productions are determined via a call to
Production.from_string()
, the remaining defining elements of the grammar are obtained as follows:if the grammar is not context-free the nonterminals is the set of symbols, appearing in (the left-hand, or right-hand side of) any production, beginning with an uppercase letter, the terminals are the remaining symbols. The start symbol is the left-hand side of the first production;
if the grammar is context-free the nonterminals is the set of symbols appearing in a left-hand side of any production, the terminals are the remaining symbols. The start symbol is the left-hand side of the first production.
An utility method is provided to enumerate alternatives for a given left-hand side:
-
alternatives
(N)¶ Yields al the right-hand sides alternatives matching the given nonterminal.
To implement grammar “cleanup strategies” a method to eliminate unwanted symbols (and productions) from a grammar is also provided.
Once a grammar has been defined, one can build derivations with the help of the following class.
-
class
Derivation
(G)¶ A derivation.
This class is immutable, derivations can be built invoking
step()
,leftmost()
, andrightmost()
- Parameters
G (
Grammar
) – the grammar to which the derivations refers to.
-
step
(prod, pos)¶ Performs a derivation step, returning a new derivation.
Applies the specified production to the given position in the sentential form.
- Parameters
- Returns
A derivation obtained applying the specified production to the present production.
- Return type
- Raises
ValueError – in case the production can’t be applied at the specified position.
-
leftmost
(prod)¶ Performs a leftmost derivation step.
Applies the specified production to the current leftmost nonterminal in the sentential form.
- Parameters
prod (int) – the production to apply, that is
G.P[prod]
.- Returns
A derivation obtained applying the specified production to the present production.
- Return type
- Raises
ValueError – in case the leftmost nonterminal isn’t the left-hand side of the given production.
-
rightmost
(prod)¶ Performs a rightmost derivation step.
Applies the specified production to the current rightmost nonterminal in the sentential form.
- Parameters
prod (int) – the production to apply, that is
G.P[prod]
.- Returns
A derivation obtained applying the specified production to the present production.
- Return type
- Raises
ValueError – in case the rightmost nonterminal isn’t the left-hand side of the given production.
-
possible_steps
(prod=None, pos=None)¶ Yields all the possible steps that can be performed given the grammar and current sentential form.
Determines all the position of the sentential form that correspond to the left-hand side of one of the production in the grammar, returning the position and production number. If a production is specified, it yields only the pairs referring to it; similarly, if a position is specified, it yields only the pairs referring to it.
Derivations can be displayed using a ProductionGraph
.
Transitions and Automata¶
Albeit in the context of parsing (that is the main focus of this library), the role of finite state automata and regular grammars is not the main focus, a couple of classes to handle them is provided.
First of all, a basic implementation of a transition is provided; as in the
case of grammars, the symbols and terminals are simply strings
.
-
class
Transition
(frm, label, to)¶ An automaton transition.
This class represents an automaton transition; it has a frm starting state and a to destination state and a label, the states can be:
whereas the label is a
str
. A transition is iterable and unpacking can be used to obtain its components, so for example>>> frm, label, to = Transition({'A', 'B'}, 'c', {'D'}) >>> sorted(frm) ['A', 'B'] >>> label 'c' >>> to {'D'}
- Parameters
- Raises
ValueError – in case the frm or to states are not nonempty strings, or nonempty set of nonempty strings, or the label is not a nonempty string.
A convenience method is provided to obtain transtions from a suitable string representation of them.
From transitions one can obtain a representation of a (nondeterministic) finite state automata using the following class.
-
class
Automaton
(N, T, transitions, q0, F)¶ An automaton.
This class represents a (nondeterministic) finite automaton.
- Parameters
N (set) – The states of the automaton.
T (set) – The transition labels.
transitions (
set
ofTransition
) – The transition of the automata.q0 – The starting state of the automaton.
F (set) – The set of final states.
Even for the automata there are a couple of convenience methods to obtain automata from a string representation of the transitions, or from a regular grammar.
-
classmethod
from_string
(transitions, F=None, q0=None)¶ Builds an automaton obtained from the given transitions.
- Parameters
Once the transitions are determined via a call to
Transition.from_string()
, the starting state (if not specified) is defined as thefrm
state of the first transition.
-
classmethod
from_grammar
(G)¶ Builds the automaton corresponding to the given regular grammar.
- Parameters
G (
Grammar
) – the regular grammar to derive the automata from.
The method works under the assumption that the only rules of the grammar are of the form \(A\to aB\), \(A\to B\), \(A\to a\), and \(A\to ε\).
An utility method is provided to represent the transition function of the automata.
-
δ
(X, x)¶ The transition function.
This function returns the set of states reachable from the given state and input symbol.
- Parameters
X – the state.
x – the input symbol.
- Returns
The set of next states of the automaton.
Automata can be displayed using StateTransitionGraph.from_automaton
.
Rich dislpay¶
-
class
ProductionGraph
(derivation, compact=None)¶
-
class
StateTransitionGraph
(transitions, S=None, F=None, large_labels=False)¶ A directed graph with labelled nodes and arcs.
- Parameters
transitions – an iterable of triples \((f, l, t)\) of three strings representing the \((f, t)\) edge of the graph with label \(l\).
S (str) – the starting node.
F (set) – the set of final nodes.
large_labels (bool) – whether the string labelling nodes are long and require pre-formatting before being drawn or not.
Examples
The following code
StateTransitionGraph([ ('node a','arc (a,b)','node b'), ('node a', 'arc (a,c)', 'node c'), ('node b', 'arc (b, c)', 'node c')], 'node a', {'node b'})
gives the following graph
where nodes and edges are labeled as implied by the first parameter, the staring node has an (unlabelled) edge entering into it, whereas the final nodes are doubly circled.
-
classmethod
from_automaton
(A, coalesce_sets=True, large_labels=False)¶ A factory method to build a
StateTransitionGraph
starting from anAutomaton
.
Utilities and decorators¶
-
closure
(f)¶ Wraps a function in a closure computation.
This decorator takes a function
f(S, O)
and returns a functionF(S, O)
that repeatedly callsf
on its own returned value andO
until it doesn’t change.- Parameters
f – the function to wrap in the closure computation.
Example
Consider the following example where the function
f
given a setS
of integers and an integerm
as input, returns the set obtained adding toS
all the valuess - 1
for all its elements greater than some lower limitm
.>>> @closure ... def reduce_up_to(S, m): ... return S | {s - 1 for s in S if s > m} ... >>> reduce_up_to({7,5}, 3) {3, 4, 5, 6, 7}
It is evident that its closure will return the set of values from
m
to the largest element inS
.Notes
More formally, consider a function \(f : \mathfrak{D}\times \mathfrak{X} \to \mathfrak{D}\) where \(\mathfrak{D}\) is a domain of interest and \(\mathfrak{X}\) is the (possibly empty) domain of extra parameters. Define as usual the iterated application of \(f\) as \(f^{(n + 1)}(D, X) = f(f^{(n)}(D, X), X)\), where \(f^{(0)}(D, X) = f(D, X)\). Then \(F = \textit{closure}(f)\) is the function \(F: \mathfrak{D}\times \mathfrak{X} \to \mathfrak{D}\) defined as \(F(D, X) = f^{(\hat{n})}(D, X)\) where \(\hat{n}\) is the minimum \(n\) (depending on \(D\) and \(X\)) such that \(f^{(\hat{n} + 1)}(D, X) = f^{(\hat{n})}(D, X)\).
-
show_calls
(show_retval=False)¶ Wraps a function so that it calls (and return values) are printed when invoked.
This decorator takes a function a decorated function that prints every invocation (with the value of the parameters and optionally the returned value).
- Parameters
f – the function to wrap in the closure computation.
show_retval – wether to show also the return values.
Example
This is how to use the decorator with the classic recursive implementation of the Fibonacci numbers computation.
>>> @show_calls(True) ... def fib(n): ... if n == 0 or n == 1: return 1 ... return fib(n - 1) + fib(n - 2) ... >>> _ = fib(3) ┌fib(3) │┌fib(2) ││┌fib(1) ││└─ 1 ││┌fib(0) ││└─ 1 │└─ 2 │┌fib(1) │└─ 1 └─ 3