API¶
Symbols, alphabets, words, and languages¶
This library main focus are formal languages, with particular attention to parsing aspects.
It is hence important first of all to focus on how symbols are represented;
to keep things simple (and make the library more interoperable with the rest of
Python data structures), in LibLet there isn’t a specific type for such
entities that are always represented by strings
(possibly longer
that one character). Observe in passing that Python hasn’t a type for
characters, but uses strings of length one to represent them.
It is straightforward then to conclude that alphabets are represented by
sets
of strings.
On the other hand, one must pay particular attention to words that are
represented as sequences of strings, most commonly tuples
or
lists
. It will never be the case that a LibLet word coincides
with a Python string! In the very particular case in which all the symbols
have length one (as Python strings) one can use the shortcuts list(s)
and
''.join(w)
to go from string to words, and vice versa.
Finally, languages are represented by sets
of sequences of
strings.
Productions, Grammars, and Derivations¶
The basic building block of a grammar is a production represented by the following class.
- class Production(lhs, rhs)¶
A grammar production.
This class represents a grammar production, it has a left-hand and right-hand sides that can be nonempty
strings
, ortuples
of nonempty strings; the right-hand side can contain ε only if is the only symbol it comprises. A production is iterable and unpacking can be used to obtain its sides, so for example>>> lhs, rhs = Production('A', ('B', 'C')) >>> lhs 'A' >>> rhs ('B', 'C')
- Parameters:
- Raises:
ValueError – in case the left-hand or right-hand side is not a string, or a tuple of strings.
In order to find productions satisfying a set of properties, a convenience method to build a predicate from a set of keyword arguments is provided.
- classmethod such_that(**kwargs)¶
Returns a conjunction of predicates specified by its named arguments.
This method returns a predicate that can be conveniently used with
filter()
to- Parameters:
lhs – returns a predicate that is
True
weather the production left-hand side is equal to the argument value.rhs – returns a predicate that is
True
weather the production right-hand side is equal to the argument value.rhs_len – returns a predicate that is
True
weather the length of the production right-hand side is equal to the argument value.rhs_is_suffix_of – returns a predicate that is
True
weather the the argument value ends with the production.
- Returns:
A predicate (that is a one-argument function that retuns
True
orFalse
) that isTrue
weather the production given as argument satisfies all the predicates given by the named arguments.
Example
As an example, consider the following productions and the subset of them you can obtaining by filtering them according to different predicates
>>> prods = Productions.from_string('A -> B C\nB -> b\nC -> D') >>> list(filter(Production.such_that(lhs='B'), prods)) [B -> b] >>> list(filter(Production.such_that(rhs=['B', 'C']), prods)) [A -> B C] >>> list(filter(Production.such_that(rhs_len=1), prods)) [B -> b, C -> D] >>> list(filter(Production.such_that(rhs_is_suffix_of=('a', 'B', 'C')), prods)) [A -> B C] >>> list(filter(Production.such_that(lhs='B', rhs_len=1), prods)) [B -> b]
- class Productions(iterable=(), /)¶
A sequence (tuple) of productions.
The main purpose of this class is to allow a nicer HTML representation of the grammar productions.
A convenience method is provided to obtain productions from a suitable string representation of them.
- classmethod from_string(prods, context_free=True)¶
Builds a tuple of productions obtained from the given string.
- Parameters:
The string must be a sequence of lines of the form:
lhs -> alternatives
where
alternatives
is a list ofrhs
strings (possibly separated by|
) andlhs
andrhs
are space separated strings that will be used as left-hand and right-hand sides of the returned productions; for example:S -> ( S ) | x T y x T y -> t
- Raises:
ValueError – in case the productions are declared as
context_free
but one of them has more than one symbol on the right-hand side.
For the purpose of presenting the Knuth Automaton in the context of LR(0) parsing, productions are extended to include a dot in the following class.
- class Item(lhs, rhs, pos=0)¶
A dotted production, also known as an item.
>>> item = Item('A', ('B', 'C'), 1) >>> item A -> B•C >>> lhs, rhs, pos = item >>> lhs 'A' >>> rhs ('B', 'C') >>> pos 1
- Parameters:
- Raises:
ValueError – in case the left-hand , or right-hand side is not a tuple of strings, or the dot pos is invalid.
The class has just two metods, useful for the construction of the Knuth Automaton.
- advance(X)¶
Returns a new
Item
obtained advancing the dot past the given symbol.- Parameters:
X (str) – the terminal, or non terminal, to move the dot over.
- Returns:
The new item, or
None
if the symbol after the dot is not the given one.
- symbol_after_dot()¶
Returns the symbol after the dot.
- Returns:
The symbol after the dot, or
None
if the dot is at the end of the right-hand side.
A grammar can be represented by the following class, that can be instantiated given the usual formal definition of a grammar as a tuple.
- class Grammar(N, T, P, S)¶
A grammar.
This class represents a formal grammar, that is a tuple \((N, T, P, S)\) where \(N\) is the finite set of nonterminals or variables symbols, \(T\) is the finite set of terminals, \(P\) are the grammar productions or rules and, \(S \in N\) is the start symbol.
- Parameters:
N (set) – the grammar nonterminals.
T (set) – the grammar terminals.
P (tuple) – the grammar
productions
.S (str) – the grammar start symbol.
Even in the case of grammars, a convenience method is provided to build a grammar from a suitable string representation. Once the productions are obtained with
Productions.from_string()
, the remaining parameters are conventionally inferred as detailed below.- classmethod from_string(prods, context_free=True)¶
Builds a grammar obtained from the given productions.
- Parameters:
Once the productions are determined via a call to
Productions.from_string()
, the remaining defining elements of the grammar are obtained as follows:if the grammar is not context-free the nonterminals is the set of symbols, appearing in (the left-hand, or right-hand side of) any production, beginning with an uppercase letter, the terminals are the remaining symbols. The start symbol is the left-hand side of the first production;
if the grammar is context-free the nonterminals is the set of symbols appearing in a left-hand side of any production, the terminals are the remaining symbols. The start symbol is the left-hand side of the first production.
An utility method is provided to enumerate alternatives for a given left-hand side:
- alternatives(N)¶
Yields al the right-hand sides alternatives matching the given nonterminal.
To implement grammar “cleanup strategies” a method to eliminate unwanted symbols (and productions) from a grammar is also provided.
- restrict_to(symbols)¶
Returns a grammar using only the given symbols.
- Parameters:
symbols – the only allowed symbols (it must contain the start symbol).
- Returns:
a new grammar containing only the given symbols (and hence no production of the original grammar that uses a symbol not in the given set.)
- Return type:
- Raises:
ValueError – in case the start symbol is not among the one to keep.
Once a grammar has been defined, one can build derivations with the help of the following class.
- class Derivation(G, start=None)¶
A derivation.
This class is immutable, derivations can be built invoking
step()
,leftmost()
, andrightmost()
- Parameters:
- step(prod, pos=None)¶
Performs a derivation step, returning a new derivation.
Applies the specified production(s) to the given position in the sentential form.
- Parameters:
- Returns:
A derivation obtained applying the specified production to the present production.
- Return type:
- Raises:
ValueError – in case the production can’t be applied at the specified position.
- leftmost(prod)¶
Performs a leftmost derivation step.
Applies the specified production(s) to the current leftmost nonterminal in the sentential form.
- Parameters:
prod (int or
Iterable
) – the production to apply, that isG.P[prod]
; if it is an iterable of integers, the productions will be applied in order.- Returns:
A derivation obtained applying the specified production to the present production.
- Return type:
- Raises:
ValueError – in case the leftmost nonterminal isn’t the left-hand side of the given production.
- rightmost(prod)¶
Performs a rightmost derivation step.
Applies the specified production(s) to the current rightmost nonterminal in the sentential form.
- Parameters:
prod (int or
Iterable
) – the production to apply, that isG.P[prod]
; if it is an iterable of integers, the productions will be applied in order.- Returns:
A derivation obtained applying the specified production to the present production.
- Return type:
- Raises:
ValueError – in case the rightmost nonterminal isn’t the left-hand side of the given production.
- possible_steps(prod=None, pos=None)¶
Yields all the possible steps that can be performed given the grammar and current sentential form.
Determines all the position of the sentential form that correspond to the left-hand side of one of the production in the grammar, returning the position and production number. If a production is specified, it yields only the pairs referring to it; similarly, if a position is specified, it yields only the pairs referring to it.
Derivations can be displayed using a ProductionGraph
.
Transitions and Automata¶
Albeit in the context of parsing (that is the main focus of this library), the role of finite state automata and regular grammars is not the main focus, a couple of classes to handle them is provided.
First of all, a basic implementation of a transition is provided; as in the
case of grammars, the symbols and terminals are simply strings
.
- class Transition(frm, label, to)¶
An automaton transition.
This class represents an automaton transition; it has a frm starting state and a to destination state and a label, the states can be:
whereas the label is a
str
. A transition is iterable and unpacking can be used to obtain its components, so for example>>> frm, label, to = Transition({'A', 'B'}, 'c', {'D'}) >>> sorted(frm) ['A', 'B'] >>> label 'c' >>> to {'D'}
- Parameters:
- Raises:
ValueError – in case the frm or to states are not nonempty strings, or nonempty set of nonempty strings, or the label is not a nonempty string.
A convenience method is provided to obtain transitions from a suitable string representation of them.
From transitions one can obtain a representation of a (nondeterministic) finite state automata using the following class.
- class Automaton(N, T, transitions, q0, F)¶
An automaton.
This class represents a (nondeterministic) finite automaton.
- Parameters:
N (set) – The states of the automaton.
T (set) – The transition labels.
transitions (
set
ofTransition
) – The transition of the automata.q0 – The starting state of the automaton.
F (set) – The set of final states.
Even for the automata there are a couple of convenience methods to obtain automata from a string representation of the transitions, or from a regular grammar.
- classmethod from_string(transitions, F=None, q0=None)¶
Builds an automaton obtained from the given transitions.
- Parameters:
Once the transitions are determined via a call to
Transition.from_string()
, the starting state (if not specified) is defined as thefrm
state of the first transition.
- classmethod from_grammar(G)¶
Builds the automaton corresponding to the given regular grammar.
- Parameters:
G (
Grammar
) – the regular grammar to derive the automata from.
The method works under the assumption that the only rules of the grammar are of the form \(A\to aB\), \(A\to B\), \(A\to a\), and \(A\to ε\).
An utility method is provided to represent the transition function of the automata.
- δ(X, x)¶
The transition function.
This function returns the set of states reachable from the given state and input symbol.
- Parameters:
X – the state.
x – the input symbol.
- Returns:
The set of next states of the automaton.
Automata can be displayed using StateTransitionGraph.from_automaton
.
Instantaneous Descriptions¶
During the analysis of some parsing algorithms can be convenient to keep track of the computation of the related (nondeterministic) pushdown automata. The following classes have the purpose to represent an instantaneous description of the automaton given by its stack and, the tape content and the position of the reading head.
- class InstantaneousDescription(G)¶
An Instantaneous Description.
This class represents a instantaneous description of a pushdown auotmata.
This is the superclass describing an instantaneous description of an automata; it comprises a tape and the head_pos, a stack and a sequence of steps that have taken the automaton to the current configuration. It is immutable, subclasses define how to obtain a new description given a move (that in general differ for top-down and bottom-up parsers).
- head()¶
Returns the symbol under the tape head.
- class TopDownInstantaneousDescription(G, word=None)¶
An Instantaneous Description for a top-down parsing automaton.
This class represents a instantaneous description of a pushdown auotmata.
- Parameters:
This class represents an instantaneous description of an automata performing top-down parsing; elements on the stack are symbols from the grammar, an extra ♯ is added at the bottom of the stack and the end of the tape to make it easier to determine if the input word has been accepted.
- class BottomUpInstantaneousDescription(G, word=None)¶
An Instantaneous Description for a bottom-up parsing automaton.
This class represents a instantaneous description of a pushdown auotmata.
- Parameters:
This class represents an instantaneous description of an automata performing bottom-down parsing; elements on the stack are the
trees
of the parsing forest.
ANTLR support¶
This module provides a commodity class to deal with ANTLR for the Python 3 target.
- class ANTLR(grammar)¶
An utility class representing an ANTLR (v4) grammar and code generated from it.
Given the grammar the constructor of this class generates the code for the lexer, parser, visitor, and listener using the ANTLR tool; if no errors are found, it then loads the modules (with their original names) and stores a reference to them in the attributes named
Lexer
,Parser
,Visitor
, andListener
. If the constructor is called a second time, it correctly unloads the previously generated modules (so that the current one are correctly corresponding to the current grammar). Moreover, to facilitate debugging the grammar it keeps track both of the grammar source (in an attribute of the same name) and of the sources of the generated modules, in an attribute namedsource
that is adict
indexed byLexer
,Parser
,Visitor
, andListener
.- Parameters:
grammar (str) – the grammar to process (in ANTLR v4 format).
- Raises:
ValueError – if the grammar does not contain the name.
- classmethod load(path)¶
Loads a grammar from file.
- Parameters:
path (str) – the path of the file where the grammar was saved.
- save(path)¶
Saves this grammar to file.
- Parameters:
path (str) – the path of the file where to save the grammar.
- print_grammar(number_lines=True)¶
Prints the grammar (with line numbers)
- Parameters:
number_lines (bool) – if
False
line numbers will not be printed.
- context(text, symbol, trace=False, diag=False, build_parse_trees=True, as_string=False, fail_on_error=False, antlr_hook=None)¶
Returns an object subclass of a
antlr4.ParserRuleContext
corresponding to the specified symbol (possibly as a string).- Parameters:
text (str) – the text to be parsed.
symbol (str) – the symbol (rule name) the parse should start at.
trace (bool) – if
True
the methodantlr4.Parser.setTrace
will be used to activate tracing.diag (bool) – if
True
the parsing will be run with aantlr4.error.DiagnosticErrorListener
setting the prediction mode toantlr4.atn.PredictionMode.LL_EXACT_AMBIG_DETECTION
.build_parse_trees (bool) – if
False
the fieldantlr4.Parser.buildParseTrees
will be set toFalse
so that no trees will be generated.as_string (bool) – if
True
the method will return the string representation of the context obtained using itstoStringTree
method.fail_on_error (bool) – if
True
the method will returnNone
in case of paring errors.antlr_hook (function) – if not
None
will be called with the lexer and parser as arguments to further configure them before use
- Returns:
A parser context, in case of parsing errors the it can be used to investigate the errors (unless
fail_on_error
isTrue
in what case this method returnsNone
).
- tokens(text)¶
Returns a list of tokens.
This method uses the lexer wrapped in a
antlr4.CommonTokenStream
to obtain the list of tokens (calling thefill
method).- Parameters:
text (str) – the text to be processed by the lexer.
- tree(text, symbol, simple=False, antlr_hook=None)¶
Returns an annotated
Tree
representing the parse tree (derived from the parse context).Unless a simple tree is required, the returned is an annotated tree whose nodes store context information from the parsing process, more precisely, nodes are
dicts
with the following keys:type
: can berule
ortoken
,name
: the rule label or token name (or the rule name if it has no label: or, similarly, the token itself if it has no name),value
: the token value (only present for tokens named in the lexer part of the grammar),rule
: the rule name (only present for rules, will be different fromname
for labelled rules).src
: the first (and last token, in case of a rule), corresponding to the node.line
: the line number of the (first) token.
Note that the values are strings (so if the value is a number, it should be converted before usage).
This method removes any
<EOF>
tree returned in the underlying parse tree (unless a simple tree is required).- Parameters:
text (str) – the text to be parsed.
symbol (str) – the symbol (rule name) the parse should start at.
simple (bool) – if
True
the returned tree nodes will be strings (with no context information).antlr_hook (function) – if not
None
will be called with the lexer and parser as arguments to further configure them before use
- Returns:
liblet.display.Tree
the (possibly annotated) parse tree, orNone
in case of parsing errors.
- class AnnotatedTreeWalker(key, catchall_func=None, dispatch_table=None)¶
A dispatch table based way to recursively walk an annotated tree.
The aim of this class is to provide a convenient way to recursively apply functions to the subtrees of a given tree according to their root annotations.
More precisely, once an object of this class is instantiated it can be used to register a set of functions named as the values of a specified key contained in the annotated tree root; trees that don’t have a corresponding registered function are handled via a catchall function (few sensible defaults are provided).
Once such functions are registered, the object can be used as a callable and invoked on an annotated tree; this will lead to the recursive invocation of the registered (and the catchall) functions on subtrees. Such functions are invoked with the object as the first argument (and the subtree as second); in this way they can recurse on their subtrees as needed.
- Parameters:
key (str) – the key that will be looked up in the node
dict
to determine which registered function to call.catchall (callable) – the catchall function that will be invoked in case the root is not a dict, or does not contain the key, or no function has been registered for the value corresponding to the key.
- classmethod load(path, globals_dict=None)¶
Loads a walker from a file.
- save(path)¶
Saves this walker to file.
- Parameters:
path (str) – the path of the file where to save the walker.
- catchall(func)¶
A decorator used to register a catchall function.
The decorated function must have two arguments: the first will be always an instance of this object, the second the annotate tree on which to operate. The previous registered catchall function will be overwritten.
- register(func, name=None)¶
A decorator used to register a function.
The function will be registered with his (or the given) name (possibly replacing a previous registration, and omitting the trailing
_
that can be used to avoid conflicts with builtins). The decorated function must have two arguments: the first will be always an instance of this object, the second the annotate tree on which to operate.
- static RECOURSE_CHILDREN(visit, tree)¶
A default catchall that will invoke the recursion on all the subtrees; this function does not emit any warning and does not return any value.
LLVM support¶
This very experimental module provides a commodity class to play with LLVM IR language.
- class LLVM(name, code='')¶
An utility class to play with LLVM IR language
- new_variable()¶
Returns a new identifier for a variable.
- new_label()¶
Returns a new identifier for a label.
- append_code(code)¶
Appends the given code.
- write_and_compile()¶
Wraps the code in some boilerplate, writes it to disk and compiles it.
- control_flow_graph()¶
Returns the control flow graph.
- mem2reg()¶
Outputs the result of mem2reg optimization.
Rich display¶
- class Graph(arcs, sep=None)¶
- neighbors(src)¶
Returns (a set containing) the neighbors of the given node.
- Parameters:
src – the node.
- class Tree(root, children=None)¶
A n-ary tree with ordered children.
The tree stores its
root
andchildren
in two fields of the same name.If the tree root is a
dict
the tree is an annotated tree. Such kind of trees arise from parsing, for example using thetree()
method of theANTLR
class; these trees are automatically endowed with anattr
attribute defined as anAttrDict
wrapping theroot
, so that, for an annotated treet
, it is completely equivalent to writet.root['key']
ort.attr.key
(both for reading, and writing).A tree is represented as a string as a
(<root>: <children>)
where<root>
is the string representation of the root node content and<children>
is the recursively obtained string representation of its children.It also has a representation as a graph; in such case, if the tree is annotated it will be rendered by Graphviz using HTML-Like Labels built as a table where each dictionary item corresponds to a row with two columns containing the key and value pair of such item.
- Parameters:
root – the root node content (can be of any type).
children – an iterable of trees to become the current tree children.
- classmethod from_lol(lol)¶
Builds a tree from a list of lists.
A list of lists (lol) is a list recursively defined such that the first element of a lol is the tree node content and the following elements are lols.
- Parameters:
lol (list) – a list representing a tree.
Examples
The following code
Tree.from_lol([{'this': 'is', 'an': 2},[3, ['example']], [5]])
produces the following tree
- with_threads(threads)¶
Draws a threaded and annotated tree (as a graph).
Tree threads are arcs among tree nodes (or special nodes), more precisely, a
dict
mapping annotated tree source nodes to adict
whose values are destinations tree nodes; arcs are represented as dotted arrows. Special nodes are: the start tree (an annotated tree whose root contains the(type, <START>)
item), the join trees (annotated trees whose root contains the(type, <JOIN>)
item) and theNone
value; such special nodes are represented as red dots.Args: threads (dict): a dictionary representing the threads.
- class ProductionGraph(derivation, compact=None)¶
- class StateTransitionGraph(transitions, S=None, F=None, large_labels=False)¶
A directed graph with labelled nodes and arcs.
- Parameters:
transitions – an iterable of triples \((f, l, t)\) of three strings representing the \((f, t)\) edge of the graph with label \(l\).
S (str) – the starting node.
F (set) – the set of final nodes.
large_labels (bool) – whether the string labelling nodes are long and require pre-formatting before being drawn or not.
Examples
The following code
StateTransitionGraph([ ('node a','arc (a,b)','node b'), ('node a', 'arc (a,c)', 'node c'), ('node b', 'arc (b, c)', 'node c')], 'node a', {'node b'})
gives the following graph
where nodes and edges are labeled as implied by the first parameter, the staring node has an (unlabelled) edge entering into it, whereas the final nodes are doubly circled.
- classmethod from_automaton(A, coalesce_sets=True, large_labels=False)¶
A factory method to build a
StateTransitionGraph
starting from anAutomaton
.
Utilities and decorators¶
- class Queue(iterable=None, maxlen=None)¶
A convenience implementation of a queue providing the usual
enqueue
,dequeue
, andcopy
methods.
- class Stack(iterable=None, maxlen=None)¶
A convenience implementation of a stack providing the usual
push
,pop
,peek
, andcopy
methods.
- class Table(ndim=1, element=<function Table.<lambda>>, no_reassign=False, fmt=None)¶
A one or two-dimensional table able to detect conflicts and with a nice HTML representation, based on
defaultdict
.
- class AttrDict(mapping)¶
A
MutableMapping
implementation that wraps a given mappingd
so that ifad = AttrDict(d)
it will then become completely equivalent to writed['key']
,ad['key']
orad.key
.
- warn(msg)¶
Emits a string on the standard error.
- peek(s)¶
Deprecated. Use first
- union_of(s)¶
Return the set union of its arguments.
- closure(f)¶
Wraps a function in a closure computation.
This decorator takes a function
f(S, O)
and returns a functionF(S, O)
that repeatedly callsf
on its own returned value andO
until it doesn’t change.- Parameters:
f – the function to wrap in the closure computation.
Example
Consider the following example where the function
f
given a setS
of integers and an integerm
as input, returns the set obtained adding toS
all the valuess - 1
for all its elements greater than some lower limitm
.>>> @closure ... def reduce_up_to(S, m): ... return S | {s - 1 for s in S if s > m} >>> reduce_up_to({7, 5}, 3) {3, 4, 5, 6, 7}
It is evident that its closure will return the set of values from
m
to the largest element inS
.Notes
More formally, consider a function \(f : \mathfrak{D}\times \mathfrak{X} \to \mathfrak{D}\) where \(\mathfrak{D}\) is a domain of interest and \(\mathfrak{X}\) is the (possibly empty) domain of extra parameters. Define as usual the iterated application of \(f\) as \(f^{(n + 1)}(D, X) = f(f^{(n)}(D, X), X)\), where \(f^{(0)}(D, X) = f(D, X)\). Then \(F = \textit{closure}(f)\) is the function \(F: \mathfrak{D}\times \mathfrak{X} \to \mathfrak{D}\) defined as \(F(D, X) = f^{(\hat{n})}(D, X)\) where \(\hat{n}\) is the minimum \(n\) (depending on \(D\) and \(X\)) such that \(f^{(\hat{n} + 1)}(D, X) = f^{(\hat{n})}(D, X)\).
- show_calls(show_retval=False)¶
Wraps a function so that it calls (and return values) are printed when invoked.
This decorator takes a function a decorated function that prints every invocation (with the value of the parameters and optionally the returned value).
- Parameters:
f – the function to wrap in the closure computation.
show_retval – wether to show also the return values.
Example
This is how to use the decorator with the classic recursive implementation of the Fibonacci numbers computation.
>>> @show_calls(True) ... def fib(n): ... if n == 0 or n == 1: ... return 1 ... return fib(n - 1) + fib(n - 2) >>> _ = fib(3) ┌fib(3) │┌fib(2) ││┌fib(1) ││└─ 1 ││┌fib(0) ││└─ 1 │└─ 2 │┌fib(1) │└─ 1 └─ 3