API¶
Symbols, alphabets, words, and languages¶
This library main focus are formal languages, with particular attention to parsing aspects.
It is hence important first of all to focus on how symbols are represented;
to keep things simple (and make the library more interoperable with the rest of
Python data structures), in LibLet there isn’t a specific type for such
entities that are always represented by strings
(possibly longer
that one character). Observe in passing that Python hasn’t a type for
characters, but uses strings of length one to represent them.
It is straightforward then to conclude that alphabets are represented by
sets
of strings.
On the other hand, one must pay particular attention to words that are
represented as sequences of strings, most commonly tuples
or
lists
. It will never be the case that a LibLet word coincides
with a Python string! In the very particular case in which all the symbols
have length one (as Python strings) one can use the shortcuts list(s)
and
''.join(w)
to go from string to words, and vice versa.
Finally, languages are represented by sets
of sequences of
strings.
Productions, Grammars, and Derivations¶
The basic building block of a grammar is a production represented by the following class.
-
class
Production
(lhs, rhs)¶ A grammar production.
This class represents a grammar production, it has a left-hand and right-hand sides that can be nonempty
strings
, ortuples
of nonempty strings; the right-hand side can contain ε only if is the only symbol it comprises. A production is iterable and unpacking can be used to obtain its sides, so for example>>> lhs, rhs = Production('A', ('B', 'C')) >>> lhs 'A' >>> rhs ('B', 'C')
- Parameters
- Raises
ValueError – in case the left-hand or right-hand side is not a string, or a tuple of strings.
A convenience method is provided to obtain productions from a suitable string representation of them.
-
classmethod
from_string
(prods, context_free=True)¶ Builds a tuple of productions obtained from the given string.
- Parameters
The string must be a sequence of lines of the form:
lhs -> alternatives
where
alternatives
is a list ofrhs
strings (possibly separated by|
) andlhs
andrhs
are space separated strings that will be used as left-hand and right-hand sides of the returned productions; for example:S -> ( S ) | x T y x T y -> t
- Raises
ValueError – in case the productions are declared as
context_free
but one of them has more than one symbol on the right-hand side.
In order to find productions satisfying a set of properties, a convenience method to build a predicate from a set of keyword arguments is provided.
-
classmethod
such_that
(**kwargs)¶ Returns a conjunction of predicates specified by its named arguments.
This method returns a predicate that can be conveniently used with
filter()
to- Parameters
lhs – returns a predicate that is
True
weather the production left-hand side is equal to the argument value.rhs – returns a predicate that is
True
weather the production right-hand side is equal to the argument value.rhs_len – returns a predicate that is
True
weather the length of the production right-hand side is equal to the argument value.rhs_is_suffix_of – returns a predicate that is
True
weather the the argument value ends with the production.
- Returns
A predicate (that is a one-argument function that retuns
True
orFalse
) that isTrue
weather the production given as argument satisfies all the predicates given by the named arguments.
Example
As an example, consider the following productions and the subset of them you can obtaining by filtering them according to different predicates
>>> prods = Production.from_string("A -> B C\nB -> b\nC -> D") >>> list(filter(Production.such_that(lhs = 'B'), prods)) [B -> b] >>> list(filter(Production.such_that(rhs = ['B', 'C']), prods)) [A -> B C] >>> list(filter(Production.such_that(rhs_len = 1), prods)) [B -> b, C -> D] >>> list(filter(Production.such_that(rhs_is_suffix_of = ('a', 'B', 'C')), prods)) [A -> B C] >>> list(filter(Production.such_that(lhs = 'B', rhs_len = 1), prods)) [B -> b]
For the purpose of presenting the Knuth Automaton in the context of LR(0) parsing, productions are extended to include a dot in the following class.
-
class
Item
(lhs, rhs, pos=0)¶ A dotted production, also known as an item.
>>> item = Item('A', ('B', 'C'), 1) >>> item A -> B•C >>> lhs, rhs, pos = item >>> lhs 'A' >>> rhs ('B', 'C') >>> pos 1
- Parameters
- Raises
ValueError – in case the left-hand , or right-hand side is not a tuple of strings, or the dot pos is invalid.
A grammar can be represented by the following class, that can be instantiated given the usual formal definition of a grammar as a tuple.
-
class
Grammar
(N, T, P, S)¶ A grammar.
This class represents a formal grammar, that is a tuple \((N, T, P, S)\) where \(N\) is the finite set of nonterminals or variables symbols, \(T\) is the finite set of terminals, \(P\) are the grammar productions or rules and, \(S \in N\) is the start symbol.
- Parameters
N (set) – the grammar nonterminals.
T (set) – the grammar terminals.
P (tuple) – the grammar
productions
.S (str) – the grammar start symbol.
Even in the case of grammars, a convenience method is provided to build a grammar from a suitable string representation. Once the productions are obtained with
Production.from_string()
, the remaining parameters are conventionally inferred as detailed below.-
classmethod
from_string
(prods, context_free=True)¶ Builds a grammar obtained from the given productions.
- Parameters
Once the productions are determined via a call to
Production.from_string()
, the remaining defining elements of the grammar are obtained as follows:if the grammar is not context-free the nonterminals is the set of symbols, appearing in (the left-hand, or right-hand side of) any production, beginning with an uppercase letter, the terminals are the remaining symbols. The start symbol is the left-hand side of the first production;
if the grammar is context-free the nonterminals is the set of symbols appearing in a left-hand side of any production, the terminals are the remaining symbols. The start symbol is the left-hand side of the first production.
An utility method is provided to enumerate alternatives for a given left-hand side:
-
alternatives
(N)¶ Yields al the right-hand sides alternatives matching the given nonterminal.
To implement grammar “cleanup strategies” a method to eliminate unwanted symbols (and productions) from a grammar is also provided.
-
restrict_to
(symbols)¶ Returns a grammar using only the given symbols.
- Parameters
symbols – the only allowed symbols (it must contain the start symbol).
- Returns
a new grammar containing only the given symbols (and hence no production of the original grammar that uses a symbol not in the given set.)
- Return type
- Raises
ValueError – in case the start symbol is not among the one to keep.
Once a grammar has been defined, one can build derivations with the help of the following class.
-
class
Derivation
(G)¶ A derivation.
This class is immutable, derivations can be built invoking
step()
,leftmost()
, andrightmost()
- Parameters
G (
Grammar
) – the grammar to which the derivations refers to.
-
step
(prod, pos=None)¶ Performs a derivation step, returning a new derivation.
Applies the specified production(s) to the given position in the sentential form.
- Parameters
- Returns
A derivation obtained applying the specified production to the present production.
- Return type
- Raises
ValueError – in case the production can’t be applied at the specified position.
-
leftmost
(prod)¶ Performs a leftmost derivation step.
Applies the specified production(s) to the current leftmost nonterminal in the sentential form.
- Parameters
prod (int or
Iterable
) – the production to apply, that isG.P[prod]
; if it is an iterable of integers, the productions will be applied in order.- Returns
A derivation obtained applying the specified production to the present production.
- Return type
- Raises
ValueError – in case the leftmost nonterminal isn’t the left-hand side of the given production.
-
rightmost
(prod)¶ Performs a rightmost derivation step.
Applies the specified production(s) to the current rightmost nonterminal in the sentential form.
- Parameters
prod (int or
Iterable
) – the production to apply, that isG.P[prod]
; if it is an iterable of integers, the productions will be applied in order.- Returns
A derivation obtained applying the specified production to the present production.
- Return type
- Raises
ValueError – in case the rightmost nonterminal isn’t the left-hand side of the given production.
-
possible_steps
(prod=None, pos=None)¶ Yields all the possible steps that can be performed given the grammar and current sentential form.
Determines all the position of the sentential form that correspond to the left-hand side of one of the production in the grammar, returning the position and production number. If a production is specified, it yields only the pairs referring to it; similarly, if a position is specified, it yields only the pairs referring to it.
Derivations can be displayed using a ProductionGraph
.
Transitions and Automata¶
Albeit in the context of parsing (that is the main focus of this library), the role of finite state automata and regular grammars is not the main focus, a couple of classes to handle them is provided.
First of all, a basic implementation of a transition is provided; as in the
case of grammars, the symbols and terminals are simply strings
.
-
class
Transition
(frm, label, to)¶ An automaton transition.
This class represents an automaton transition; it has a frm starting state and a to destination state and a label, the states can be:
whereas the label is a
str
. A transition is iterable and unpacking can be used to obtain its components, so for example>>> frm, label, to = Transition({'A', 'B'}, 'c', {'D'}) >>> sorted(frm) ['A', 'B'] >>> label 'c' >>> to {'D'}
- Parameters
- Raises
ValueError – in case the frm or to states are not nonempty strings, or nonempty set of nonempty strings, or the label is not a nonempty string.
A convenience method is provided to obtain transitions from a suitable string representation of them.
From transitions one can obtain a representation of a (nondeterministic) finite state automata using the following class.
-
class
Automaton
(N, T, transitions, q0, F)¶ An automaton.
This class represents a (nondeterministic) finite automaton.
- Parameters
N (set) – The states of the automaton.
T (set) – The transition labels.
transitions (
set
ofTransition
) – The transition of the automata.q0 – The starting state of the automaton.
F (set) – The set of final states.
Even for the automata there are a couple of convenience methods to obtain automata from a string representation of the transitions, or from a regular grammar.
-
classmethod
from_string
(transitions, F=None, q0=None)¶ Builds an automaton obtained from the given transitions.
- Parameters
Once the transitions are determined via a call to
Transition.from_string()
, the starting state (if not specified) is defined as thefrm
state of the first transition.
-
classmethod
from_grammar
(G)¶ Builds the automaton corresponding to the given regular grammar.
- Parameters
G (
Grammar
) – the regular grammar to derive the automata from.
The method works under the assumption that the only rules of the grammar are of the form \(A\to aB\), \(A\to B\), \(A\to a\), and \(A\to ε\).
An utility method is provided to represent the transition function of the automata.
-
δ
(X, x)¶ The transition function.
This function returns the set of states reachable from the given state and input symbol.
- Parameters
X – the state.
x – the input symbol.
- Returns
The set of next states of the automaton.
Automata can be displayed using StateTransitionGraph.from_automaton
.
ANTLR support¶
This library provides a commodity class to deal with ANTLR for the Python 3 target.
-
class
ANTLR
(grammar)¶ An utility class representing an ANTLR (v4) grammar and code generated from it.
Given the grammar the constructor of this class generates the code for the lexer, parser, visitor, and listener using the ANTLR tool; if no errors are found, it then loads the modules (with their original names) and stores a reference to them in the attributes named
Lexer
,Parser
,Visitor
, andListener
. If the constructor is called a second time, it correctly unloads the previously generated modules (so that the current one are correctly corresponding to the current grammar). Moreover, to facilitate debugging the grammar it keeps track both of the grammar source (in an attribute of the same name) and of the sources of the generated modules, in an attribute namedsource
that is adict
indexed byLexer
,Parser
,Visitor
, andListener
.- Parameters
grammar (str) – the grammar to process (in ANTLR v4 format).
- Raises
ValueError – if the grammar does not contain the name.
-
print_grammar
(number_lines=True)¶ Prints the grammar (with line numbers)
- Parameters
number_lines (bool) – if
False
line numbers will not be printed.
-
context
(text, symbol, trace=False, diag=False, build_parse_trees=True, as_string=False, fail_on_error=False)¶ Returns an object subclass of a
antlr4.ParserRuleContext
corresponding to the specified symbol (possibly as a string).- Parameters
text (str) – the text to be parsed.
symbol (str) – the symbol (rule name) the parse should start at.
trace (bool) – if
True
the methodantlr4.Parser.setTrace
will be used to activate tracing.diag (bool) – if
True
the parsing will be run with aantlr4.error.DiagnosticErrorListener
setting the prediction mode toantlr4.atn.PredictionMode.LL_EXACT_AMBIG_DETECTION
.build_parse_trees (bool) – if
False
the fieldantlr4.Parser.buildParseTrees
will be set toFalse
so that no trees will be generated.as_string (bool) – if
True
the method will return the string representation of the context obtained using itstoStringTree
method.fail_on_error (bool) – if
True
the method will returnNone
in case of paring errors.
- Returns
A parser context, in case of parsing errors the it can be used to investigate the errors (unless
fail_on_error
isTrue
in what case this method returnsNone
).
-
tokens
(text)¶ Returns a list of tokens.
This method uses the lexer wrapped in a
antlr4.CommonTokenStream
to obtain the list of tokens (calling thefill
method).- Parameters
text (str) – the text to be processed by the lexer.
-
tree
(text, symbol, simple=False)¶ Returns an annotated
Tree
representing the parse tree (derived from the parse context).Unless a simple tree is required, the returned is an annotated tree whose nodes store context information from the parsing process, more precisely, nodes are
dicts
with the following keys:type
: can berule
ortoken
,name
: the rule label or token name (or the rule name if it has no label: or, similarly, the token itself if it has no name),value
: the token value (only present for tokens named in the lexer part of the grammar),rule
: the rule name (only present for rules, will be different fromname
for labelled rules).
Note that the values are strings (so if the value is a number, it should be converted before usage).
This method removes any
<EOF>
tree returned in the underlying parse tree (unless a simple tree is required).- Parameters
- Returns
liblet.display.Tree
the (possibly annotated) parse tree, orNone
in case of parsing errors.
-
class
AnnotatedTreeWalker
(key, catchall=None)¶ A dispatch table based way to recursively walk an annotated tree.
The aim of this class is to provide a convenient way to recursively apply functions to the subtrees of a given tree according to their root annotations.
More precisely, once an object of this class is instantiated it can be used to register a set of functions named as the values of a specified key contained in the annotated tree root; trees that don’t have a corresponding registered function are handled via a catchall function (few sensible defaults are provided).
Once such functions are registered, the object can be used as a callable and invoked on an annotated tree; this will lead to the recursive invocation of the registered (and the catchall) functions on subtrees. Such functions are invoked with the object as the first argument (and the subtree as second); in this way they can recurse on their subtrees as needed.
- Parameters
key (str) – the key that will be looked up in the node
dict
to determine which registered function to call.catchall (callable) – the catchall function that will be invoked in case the root is not a dict, or does not contain the key, or no function has been registered for the value corresponding to the key.
-
catchall
(func)¶ A decorator used to register a catchall function.
The decorated function must have two arguments: the first will be always an instance of this object, the second the annotate tree on which to operate. The previous registered catchall function will be overwritten.
-
register
(func)¶ A decorator used to register a function.
The function will be registered with his name (possibly replacing a previous registration). The decorated function must have two arguments: the first will be always an instance of this object, the second the annotate tree on which to operate.
-
static
RECOURSE_CHILDREN
(visit, tree)¶ A default catchall that will invoke the recursion on all the subtrees; this function does not emit any warning and does not return any value.
Rich display¶
-
class
Tree
(root, children=None)¶ A n-ary tree with ordered children.
The tree stores its
root
andchildren
in two fields of the same name. If the tree root is adict
the tree is an annotated tree. This kind of trees arise from parsing, for example using thetree()
method of theANTLR
class.A tree is represented as a string as a
(<root>: <children>)
where<root>
is the string representation of the root node content and<children>
is the recursively obtained string representation of its children.It also as a representation as a graph; in such case, if the tree is annotated it will be rendered by Graphviz using HTML-Like Labels built as a table where each dictionary item corresponds to a row with two columns containing the key and value pair of such item.
- Parameters
root – the root node content (can be of any type).
children – an iterable of trees to become the current tree children.
-
classmethod
from_lol
(lol)¶ Builds a tree from a list of lists.
A list of lists (lol) is a list recursively defined such that the first element of a lol is the tree node content and the following elements are lols.
- Parameters
lol (list) – a list representing a tree.
Examples
The following code
Tree.from_lol([{'this': 'is', 'an': 2},[3, ['example']], [5]])
produces the following tree
-
with_threads
(threads)¶ Draws a threaded and annotated tree (as a graph).
Tree threads are arcs among tree nodes (or special nodes), more precisely, a
dict
mapping annotated tree source nodes to adict
whose values are destinations tree nodes; arcs are represented as dotted arrows. Special nodes are: the start tree (an annotated tree whose root contains the(type, <START>)
item), the join trees (annotated trees whose root contains the(type, <JOIN>)
item) and theNone
value; such special nodes are represented as red dots.Args: threads (dict): a dictionary representing the threads.
-
class
ProductionGraph
(derivation, compact=None)¶
-
class
StateTransitionGraph
(transitions, S=None, F=None, large_labels=False)¶ A directed graph with labelled nodes and arcs.
- Parameters
transitions – an iterable of triples \((f, l, t)\) of three strings representing the \((f, t)\) edge of the graph with label \(l\).
S (str) – the starting node.
F (set) – the set of final nodes.
large_labels (bool) – whether the string labelling nodes are long and require pre-formatting before being drawn or not.
Examples
The following code
StateTransitionGraph([ ('node a','arc (a,b)','node b'), ('node a', 'arc (a,c)', 'node c'), ('node b', 'arc (b, c)', 'node c')], 'node a', {'node b'})
gives the following graph
where nodes and edges are labeled as implied by the first parameter, the staring node has an (unlabelled) edge entering into it, whereas the final nodes are doubly circled.
-
classmethod
from_automaton
(A, coalesce_sets=True, large_labels=False)¶ A factory method to build a
StateTransitionGraph
starting from anAutomaton
.
Utilities and decorators¶
-
class
Queue
(iterable=None, maxlen=None)¶ A convenience implementation of a queue providing the usual
enqueue
,dequeue
, andcopy
methods.
-
class
Stack
(iterable=None, maxlen=None)¶ A convenience implementation of a stack providing the usual
push
,pop
,peek
, andcopy
methods.
-
class
Table
(ndim=1, element=<function Table.<lambda>>, no_reassign=False, fmt=None)¶ A one or two-dimensional table able to detect conflicts and with a nice HTML representation, based on
defaultdict
.
-
warn
(msg)¶ Emits a string on the standard error.
-
union_of
(s)¶ Return the set union of its arguments.
-
closure
(f)¶ Wraps a function in a closure computation.
This decorator takes a function
f(S, O)
and returns a functionF(S, O)
that repeatedly callsf
on its own returned value andO
until it doesn’t change.- Parameters
f – the function to wrap in the closure computation.
Example
Consider the following example where the function
f
given a setS
of integers and an integerm
as input, returns the set obtained adding toS
all the valuess - 1
for all its elements greater than some lower limitm
.>>> @closure ... def reduce_up_to(S, m): ... return S | {s - 1 for s in S if s > m} ... >>> reduce_up_to({7,5}, 3) {3, 4, 5, 6, 7}
It is evident that its closure will return the set of values from
m
to the largest element inS
.Notes
More formally, consider a function \(f : \mathfrak{D}\times \mathfrak{X} \to \mathfrak{D}\) where \(\mathfrak{D}\) is a domain of interest and \(\mathfrak{X}\) is the (possibly empty) domain of extra parameters. Define as usual the iterated application of \(f\) as \(f^{(n + 1)}(D, X) = f(f^{(n)}(D, X), X)\), where \(f^{(0)}(D, X) = f(D, X)\). Then \(F = \textit{closure}(f)\) is the function \(F: \mathfrak{D}\times \mathfrak{X} \to \mathfrak{D}\) defined as \(F(D, X) = f^{(\hat{n})}(D, X)\) where \(\hat{n}\) is the minimum \(n\) (depending on \(D\) and \(X\)) such that \(f^{(\hat{n} + 1)}(D, X) = f^{(\hat{n})}(D, X)\).
-
show_calls
(show_retval=False)¶ Wraps a function so that it calls (and return values) are printed when invoked.
This decorator takes a function a decorated function that prints every invocation (with the value of the parameters and optionally the returned value).
- Parameters
f – the function to wrap in the closure computation.
show_retval – wether to show also the return values.
Example
This is how to use the decorator with the classic recursive implementation of the Fibonacci numbers computation.
>>> @show_calls(True) ... def fib(n): ... if n == 0 or n == 1: return 1 ... return fib(n - 1) + fib(n - 2) ... >>> _ = fib(3) ┌fib(3) │┌fib(2) ││┌fib(1) ││└─ 1 ││┌fib(0) ││└─ 1 │└─ 2 │┌fib(1) │└─ 1 └─ 3