API

Productions, Grammars and Derivations

The basic building block of a grammar is a production represented by the following class.

class Production(lhs, rhs)

A grammar production.

This class represents a grammar production, it has a left-hand and right-hand sides that can be nonempty strings, or tuples of nonempty strings; the right-hand side can contain ε only if is the only symbol it comprises. A production is iterable and unpacking can be used to obtain its sides, so for example

>>> lhs, rhs = Production('A', ('B', 'C'))
>>> lhs
'A'
>>> rhs
('B', 'C')
Parameters
  • lhs (str or tuple of str) – The left-hand side of the production.

  • rhs (str or tuple of str) – The right-hand side of the production.

Raises

ValueError – in case the left-hand or right-hand side is not a strings, or tuples of strings.

A convenience method is provided to obtain productions from a suitable string representation of them.

classmethod from_string(prods, context_free=True)

Builds a tuple of productions obtained from the given string.

Parameters
  • prods (str) – a string representing the set of productions.

  • context_free (bool) – if True all the left-hand sides will be strings (not tuples).

The string must be a sequence of lines of the form:

lhs -> alternatives

where alternatives is a list of rhs strings (possibly separated by |) and lhs and rhs are space separated strings that will be used as left-hand and right-hand sides of the returned productions; for example:

S -> ( S ) | x T y
x T y -> t
Raises

ValueError – in case the productions are declared as context_free but on of them has more than one symbol on the right-hand side.

In order to find productions satisfying a set of properties, a convenience method to build a predicate from a set of keyword arguments is provided.

classmethod such_that(**kwargs)

Returns a conjunction of predicates specified by its named arguments.

This method returns a predicate that can be conveniently used with filter() to

Parameters
  • lhs – returns a predicate that is True weather the production left-hand side is equal to the argument value.

  • rhs – returns a predicate that is True weather the production left-hand side is equal to the argument value.

  • rhs_len – returns a predicate that is True weather the length of the production left-hand side is equal to the argument value.

  • rhs_is_suffix_of – returns a predicate that is True weather the the argument value ends with the production.

Returns

A predicate (that is a one-argument function that retuns True or False) that is True weather the production given as argument satisfies all the predicates given by the named arguments.

Example

As an example, consider the following productions and the subset of them you can obtaining by filtering them according to different predicates

>>> prods = Production.from_string("A -> B C\nB -> b\nC -> D")
>>> list(filter(Production.such_that(lhs = 'B'), prods))
[B -> b]
>>> list(filter(Production.such_that(rhs = ['B', 'C']), prods))
[A -> B C]
>>> list(filter(Production.such_that(rhs_len = 1), prods))
[B -> b, C -> D]
>>> list(filter(Production.such_that(rhs_is_suffix_of = ('a', 'B', 'C')), prods))
[A -> B C]
>>> list(filter(Production.such_that(lhs = 'B', rhs_len = 1), prods))
[B -> b]

For the purpose of presenting the Knuth Automaton in the context of LR(0) parsing, productions are extended to include a dot in the following class.

class Item(lhs, rhs, pos=0)

A dotted production, also known as an item.

>>> item = Item('A', ('B', 'C'), 1)
>>> item
A -> B•C
>>> lhs, rhs, pos = item
>>> lhs
'A'
>>> rhs
('B', 'C')
>>> pos
1
Parameters
  • lhs (str or tuple of str) – The left-hand side of the production.

  • rhs (str or tuple of str) – The right-hand side of the production.

  • pos (int) – the position of the dot (optional, 0 if absent).

Raises

ValueError – in case the left-hand , or right-hand side is not a tuple of strings, or the dot pos is invalid.

A grammar can be represented by the following class, that can be instantiated given the usual formal definition of a grammar as a tuple.

class Grammar(N, T, P, S)

A grammar.

This class represents a formal grammar, that is a tuple \((N, T, P, S)\) where \(N\) is the finite set of nonterminals or variables symbols, \(T\) is the finite set of terminals, \(P\) are the grammar productions or rules and, \(S \in N\) is the start symbol.

Parameters
  • N (set) – the grammar nonterminals.

  • T (set) – the grammar terminals.

  • P (tuple) – the grammar productions.

  • S (str) – the grammar start symbol.

Even in the case of grammars, a convenience method is provided to build a grammar from a suitable string representation. Once the productions are obtained with Production.from_string(), the remaining parameters are conventionally inferred as detailed below.

classmethod from_string(prods, context_free=True)

Builds a grammar obtained from the given productions.

Parameters
  • prods (str) – a string describing the productions.

  • context_free (bool) – if True the grammar is expected to be context-free.

Once the productions are determined via a call to Production.from_string(), the remaining defining elements of the grammar are obtained as follows:

  • if the grammar is not context-free the nonterminals is the set of symbols, appearing in (the left-hand, or right-hand side of) any production, beginning with an uppercase letter, the terminals are the remaining symbols. The start symbol is the left-hand side of the first production;

  • if the grammar is context-free the nonterminals is the set of symbols appearing in a left-hand side of any production, the terminals are the remaining symbols. The start symbol is the left-hand side of the first production.

An utility method is provided to enumerate alternatives for a given left-hand side:

alternatives(N)

Yields al the right-hand sides alternatives matching the given nonterminal.

Parameters

N (str or tuple of str) – the right-hand side to match.

Yields

the right-hand sides of all productions having N as the left-hand side.

To implement grammar “cleanup strategies” a method to eliminate unwanted symbols (and productions) from a grammar is also provided.

restrict_to(symbols)

Returns a grammar using only the given symbols.

Parameters

symbols – the only allowed symbols.

Returns

a new grammar containing only the given symbols (and hence no production of the original grammar that uses a symbol not in the given set.)

Return type

Grammar

Once a grammar has been defined, one can build derivations with the help of the following class.

class Derivation(G)

A derivation.

This class is immutable, derivations can be built invoking step(), leftmost(), and rightmost()

Parameters

G (Grammar) – the grammar to which the derivations refers to.

step(prod, pos)

Performs a derivation step, returning a new derivation.

Applies the specified production to the given position in the sentential form.

Parameters
  • prod (int) – the production to apply, that is G.P[prod].

  • pos (int) – the position (in the current sentential form) where to apply the production.

Returns

A derivation obtained applying the specified production to the present production.

Return type

Derivation

Raises

ValueError – in case the production can’t be applied at the specified position.

leftmost(prod)

Performs a leftmost derivation step.

Applies the specified production to the current leftmost nonterminal in the sentential form.

Parameters

prod (int) – the production to apply, that is G.P[prod].

Returns

A derivation obtained applying the specified production to the present production.

Return type

Derivation

Raises

ValueError – in case the leftmost nonterminal isn’t the left-hand side of the given production.

rightmost(prod)

Performs a rightmost derivation step.

Applies the specified production to the current rightmost nonterminal in the sentential form.

Parameters

prod (int) – the production to apply, that is G.P[prod].

Returns

A derivation obtained applying the specified production to the present production.

Return type

Derivation

Raises

ValueError – in case the rightmost nonterminal isn’t the left-hand side of the given production.

possible_steps(prod=None, pos=None)

Yields all the possible steps that can be performed given the grammar and current sentential form.

Determines all the position of the sentential form that correspond to the left-hand side of one of the production in the grammar, returning the position and production number. If a production is specified, it yields only the pairs referring to it; similarly, if a position is specified, it yields only the pairs referring to it.

Parameters
  • prod (int) – the production whose left-hand side is to be searched (that is G.P[prod].lhs`) in the sentential form.

  • pos (int) – the position where to look for grammar productions that have a matching left-hand side.

Yields

Pairs of (pord, pos) that can be used as step() argument.

steps()

Returns the steps of the derivation.

Returns: a tuple of (prod, pos) pairs corresponding to this derivation steps.

sentential_form()

Returns the sentential form of the derivation.

Returns: a tuple of (strings representing) grammar symbols corresponding to the sentential form of this derivation steps.

Derivations can be displayed using a ProductionGraph.

Transitions and Automata

Albeit in the context of parsing (that is the main focus of this library), the role of finite state automata and regular grammars is not the main focus, a couple of classes to handle them is provided.

First of all, a basic implementation of a transition is provided; as in the case of grammars, the symbols and terminals are simply strings.

class Transition(frm, label, to)

An automaton transition.

This class represents an automaton transition; it has a frm starting state and a to destination state and a label, the states can be:

whereas the label is a str. A transition is iterable and unpacking can be used to obtain its components, so for example

>>> frm, label, to = Transition({'A', 'B'}, 'c', {'D'})
>>> sorted(frm)
['A', 'B']
>>> label
'c'
>>> to
{'D'}
Parameters
  • frm (str or set of str) – The starting state(s) of the transition.

  • label (str) – The label of the transition.

  • to (str or set of str) – The destination state(s) of the transition.

Raises

ValueError – in case the frm or to states are not nonempty strings, or nonempty set of nonempty strings, or the label is not a nonempty string.

A convenience method is provided to obtain transtions from a suitable string representation of them.

classmethod from_string(transitions)

Builds a tuple of transitions obtained from the given string.

Parameters

trasitions (str) – a string representing transitions.

The string must be a sequence of lines of the form:

frm, label, to

where the parts are strings not containing spaces.s

From transitions one can obtain a representation of a (nondeterministic) finite state automata using the following class.

class Automaton(N, T, transitions, q0, F)

An automaton.

This class represents a (nondeterministic) finite automaton.

Parameters
  • N (set) – The states of the automaton.

  • T (set) – The transition labels.

  • transitions (set of Transition) – The transition of the automata.

  • q0 – The starting state of the automaton.

  • F (set) – The set of final states.

Even for the automata there are a couple of convenience methods to obtain automata from a string representation of the transitions, or from a regular grammar.

classmethod from_string(transitions, F=None, q0=None)

Builds an automaton obtained from the given transitions.

Parameters
  • transitions (str) – a string describing the productions.

  • F (set) – The set of final states.

  • q0 (str) – The starting state of the automaton (inferred from transitions if None).

Once the transitions are determined via a call to Transition.from_string(), the starting state (if not specified) is defined as the frm state of the first transition.

classmethod from_grammar(G)

Builds the automaton corresponding to the given regular grammar.

Parameters

G (Grammar) – the regular grammar to derive the automata from.

The method works under the assumption that the only rules of the grammar are of the form \(A\to aB\), \(A\to B\), \(A\to a\), and \(A\to ε\).

An utility method is provided to represent the transition function of the automata.

δ(X, x)

The transition function.

This function returns the set of states reachable from the given state and input symbol.

Parameters
  • X – the state.

  • x – the input symbol.

Returns

The set of next states of the automaton.

Automata can be displayed using StateTransitionGraph.from_automaton.

Rich dislpay

class Graph(arcs, sep=None)
classmethod from_adjdict(adjdict)
class Tree(root, children=None)
classmethod from_lol(lol)
class ProductionGraph(derivation, compact=None)
class StateTransitionGraph(transitions, S=None, F=None, large_labels=False)

A directed graph with labelled nodes and arcs.

Parameters
  • transitions – an iterable of triples \((f, l, t)\) of three strings representing the \((f, t)\) edge of the graph with label \(l\).

  • S (str) – the starting node.

  • F (set) – the set of final nodes.

  • large_labels (bool) – whether the string labelling nodes are long and require pre-formatting before being drawn or not.

Examples

The following code

StateTransitionGraph([
      ('node a','arc (a,b)','node b'),
      ('node a', 'arc (a,c)', 'node c'),
      ('node b', 'arc (b, c)', 'node c')],
  'node a', {'node b'})

gives the following graph

_images/stg.svg

where nodes and edges are labeled as implied by the first parameter, the staring node has an (unlabelled) edge entering into it, whereas the final nodes are doubly circled.

classmethod from_automaton(A, coalesce_sets=True, large_labels=False)

A factory method to build a StateTransitionGraph starting from an Automaton.

Parameters
  • A (Automaton) – the automaton.

  • coalesce_sets (bool) – whether the automata states are sets and the corresponding labels must be obtained joining the strings in the sets.

Utilities and decorators

closure(f)

Wraps a function in a closure computation.

This decorator takes a function f(S, O) and returns a function F(S, O) that repeatedly calls f on its own returned value and O until it doesn’t change.

Parameters

f – the function to wrap in the closure computation.

Example

Consider the following example where the function f given a set S of integers and an integer m as input, returns the set obtained adding to S all the values s - 1 for all its elements greater than some lower limit m.

>>> @closure
... def reduce_up_to(S, m):
...     return S | {s - 1 for s in S if s > m}
...
>>> reduce_up_to({7,5}, 3)
{3, 4, 5, 6, 7}

It is evident that its closure will return the set of values from m to the largest element in S.

Notes

More formally, consider a function \(f : \mathfrak{D}\times \mathfrak{X} \to \mathfrak{D}\) where \(\mathfrak{D}\) is a domain of interest and \(\mathfrak{X}\) is the (possibly empty) domain of extra parameters. Define as usual the iterated application of \(f\) as \(f^{(n + 1)}(D, X) = f(f^{(n)}(D, X), X)\), where \(f^{(0)}(D, X) = f(D, X)\). Then \(F = \textit{closure}(f)\) is the function \(F: \mathfrak{D}\times \mathfrak{X} \to \mathfrak{D}\) defined as \(F(D, X) = f^{(\hat{n})}(D, X)\) where \(\hat{n}\) is the minimum \(n\) (depending on \(D\) and \(X\)) such that \(f^{(\hat{n} + 1)}(D, X) = f^{(\hat{n})}(D, X)\).

show_calls(show_retval=False)

Wraps a function so that it calls (and return values) are printed when invoked.

This decorator takes a function a decorated function that prints every invocation (with the value of the parameters and optionally the returned value).

Parameters
  • f – the function to wrap in the closure computation.

  • show_retval – wether to show also the return values.

Example

This is how to use the decorator with the classic recursive implementation of the Fibonacci numbers computation.

>>> @show_calls(True)
... def fib(n):
...     if n == 0 or n == 1: return 1
...     return fib(n - 1) + fib(n - 2)
...
>>> _ = fib(3)
┌fib(3)
│┌fib(2)
││┌fib(1)
││└─ 1
││┌fib(0)
││└─ 1
│└─ 2
│┌fib(1)
│└─ 1
└─ 3