- Source: E-graph
In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Definition and operations
Let
Σ
{\displaystyle \Sigma }
be a set of uninterpreted functions, where
Σ
n
{\displaystyle \Sigma _{n}}
is the subset of
Σ
{\displaystyle \Sigma }
consisting of functions of arity
n
{\displaystyle n}
. Let
i
d
{\displaystyle \mathbb {id} }
be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of
f
∈
Σ
n
{\displaystyle f\in \Sigma _{n}}
to e-class IDs
i
1
,
i
2
,
…
,
i
n
∈
i
d
{\displaystyle i_{1},i_{2},\ldots ,i_{n}\in \mathbb {id} }
is denoted
f
(
i
1
,
i
2
,
…
,
i
n
)
{\displaystyle f(i_{1},i_{2},\ldots ,i_{n})}
and called an e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures:
A union-find structure
U
{\displaystyle U}
representing equivalence classes of e-class IDs, with the usual operations
f
i
n
d
{\displaystyle \mathrm {find} }
,
a
d
d
{\displaystyle \mathrm {add} }
and
m
e
r
g
e
{\displaystyle \mathrm {merge} }
. An e-class ID
e
{\displaystyle e}
is canonical if
f
i
n
d
(
U
,
e
)
=
e
{\displaystyle \mathrm {find} (U,e)=e}
; an e-node
f
(
i
1
,
…
,
i
n
)
{\displaystyle f(i_{1},\ldots ,i_{n})}
is canonical if each
i
j
{\displaystyle i_{j}}
is canonical (
j
{\displaystyle j}
in
1
,
…
,
n
{\displaystyle 1,\ldots ,n}
).
An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
a hashcons
H
{\displaystyle H}
(i.e. a mapping) from canonical e-nodes to e-class IDs, and
an e-class map
M
{\displaystyle M}
that maps e-class IDs to e-classes, such that
M
{\displaystyle M}
maps equivalent IDs to the same set of e-nodes:
∀
i
,
j
∈
i
d
,
M
[
i
]
=
M
[
j
]
⇔
f
i
n
d
(
U
,
i
)
=
f
i
n
d
(
U
,
j
)
{\displaystyle \forall i,j\in \mathbb {id} ,M[i]=M[j]\Leftrightarrow \mathrm {find} (U,i)=\mathrm {find} (U,j)}
= Invariants
=In addition to the above structure, a valid e-graph conforms to several data structure invariants. Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes
f
(
i
1
,
…
,
i
n
)
,
f
(
j
1
,
…
,
j
n
)
{\displaystyle f(i_{1},\ldots ,i_{n}),f(j_{1},\ldots ,j_{n})}
are congruent when
f
i
n
d
(
U
,
i
k
)
=
f
i
n
d
(
U
,
j
k
)
,
k
∈
{
1
,
…
,
n
}
{\displaystyle \mathrm {find} (U,i_{k})=\mathrm {find} (U,j_{k}),k\in \{1,\ldots ,n\}}
. The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
= Operations
=E-graphs expose wrappers around the
a
d
d
{\displaystyle \mathrm {add} }
,
f
i
n
d
{\displaystyle \mathrm {find} }
, and
m
e
r
g
e
{\displaystyle \mathrm {merge} }
operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
= Equivalent formulations
=An e-graph can also be formulated as a bipartite graph
G
=
(
N
⊎
i
d
,
E
)
{\displaystyle G=(N\uplus \mathrm {id} ,E)}
where
i
d
{\displaystyle \mathrm {id} }
is the set of e-class IDs (as above),
N
{\displaystyle N}
is the set of e-nodes, and
E
⊆
(
i
d
×
N
)
∪
(
N
×
i
d
)
{\displaystyle E\subseteq (\mathrm {id} \times N)\cup (N\times \mathrm {id} )}
is a set of directed edges.
There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.
E-matching
Let
V
{\displaystyle V}
be a set of variables and let
T
e
r
m
(
Σ
,
V
)
{\displaystyle \mathrm {Term} (\Sigma ,V)}
be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words,
T
e
r
m
(
Σ
,
V
)
{\displaystyle \mathrm {Term} (\Sigma ,V)}
is the smallest set such that
V
⊂
T
e
r
m
(
Σ
,
V
)
{\displaystyle V\subset \mathrm {Term} (\Sigma ,V)}
,
Σ
0
⊂
T
e
r
m
(
Σ
,
V
)
{\displaystyle \Sigma _{0}\subset \mathrm {Term} (\Sigma ,V)}
, and when
x
1
,
…
,
x
n
∈
T
e
r
m
(
Σ
,
V
)
{\displaystyle x_{1},\ldots ,x_{n}\in \mathrm {Term} (\Sigma ,V)}
and
f
∈
Σ
n
{\displaystyle f\in \Sigma _{n}}
, then
f
(
x
1
,
…
,
x
n
)
∈
T
e
r
m
(
Σ
,
V
)
{\displaystyle f(x_{1},\ldots ,x_{n})\in \mathrm {Term} (\Sigma ,V)}
. A term containing variables is called a pattern, a term without variables is called ground.
An e-graph
E
{\displaystyle E}
represents a ground term
t
∈
T
e
r
m
(
Σ
,
∅
)
{\displaystyle t\in \mathrm {Term} (\Sigma ,\emptyset )}
if one of its e-classes represents
t
{\displaystyle t}
. An e-class
C
{\displaystyle C}
represents
t
{\displaystyle t}
if some e-node
f
(
i
1
,
…
,
i
n
)
∈
C
{\displaystyle f(i_{1},\ldots ,i_{n})\in C}
does. An e-node
f
(
i
1
,
…
,
i
n
)
∈
C
{\displaystyle f(i_{1},\ldots ,i_{n})\in C}
represents a term
g
(
j
1
,
…
,
j
n
)
{\displaystyle g(j_{1},\ldots ,j_{n})}
if
f
=
g
{\displaystyle f=g}
and each e-class
M
[
i
k
]
{\displaystyle M[i_{k}]}
represents the term
j
k
{\displaystyle j_{k}}
(
k
{\displaystyle k}
in
1
,
…
,
n
{\displaystyle 1,\ldots ,n}
).
e-matching is an operation that takes a pattern
p
∈
T
e
r
m
(
Σ
,
V
)
{\displaystyle p\in \mathrm {Term} (\Sigma ,V)}
and an e-graph
E
{\displaystyle E}
, and yields all pairs
(
σ
,
C
)
{\displaystyle (\sigma ,C)}
where
σ
⊂
V
×
i
d
{\displaystyle \sigma \subset V\times \mathbb {id} }
is a substitution mapping the variables in
p
{\displaystyle p}
to e-class IDs and
C
∈
i
d
{\displaystyle C\in \mathbb {id} }
is an e-class ID such that the term
σ
(
p
)
{\displaystyle \sigma (p)}
is represented by
C
{\displaystyle C}
. There are several known algorithms for e-matching, the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.
Extraction
Given an e-class and a cost function that maps each function symbol in
Σ
{\displaystyle \Sigma }
to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard. There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.
Complexity
An e-graph with n equalities can be constructed in O(n log n) time.
Equality saturation
Equality saturation is a technique for building optimizing compilers using e-graphs. It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
Applications
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3 and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates. E-graphs are also used in the Simplify theorem prover of ESC/Java.
Equality saturation is used in specialized optimizing compilers, e.g. for deep learning and linear algebra. Equality saturation has also been used for translation validation applied to the LLVM toolchain.
E-graphs have been applied to several problems in program analysis, including fuzzing, abstract interpretation, and library learning.
References
de Moura, Leonardo; Bjørner, Nikolaj (2007). "Efficient E-Matching for SMT Solvers". In Pfenning, Frank (ed.). Automated Deduction – CADE-21. Lecture Notes in Computer Science. Vol. 4603. Berlin, Heidelberg: Springer. pp. 183–198. doi:10.1007/978-3-540-73595-3_13. ISBN 978-3-540-73595-3.
Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. arXiv:2004.03082. doi:10.1145/3434304. S2CID 226282597.
Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation". Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery. pp. 264–276. doi:10.1145/1480881.1480915. ISBN 978-1-60558-379-2. S2CID 2138086.
Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure". In A. Griggio; N. Rungta (eds.). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. doi:10.34727/2022/isbn.978-3-85448-053-2_13. ISBN 978-3-85448-053-2. S2CID 252118847.
Goharshady, Amir Kafshdar; Lam, Chun Kit; Parreaux, Lionel (2024-10-08). "Fast and Optimal Extraction for Sparse Equality Graphs". Proceedings of the ACM on Programming Languages. 8 (OOPSLA2): 361:2551–361:2577. doi:10.1145/3689801.
External links
The Egg Project
A Colab notebook explaining e-graphs
Kata Kunci Pencarian:
- Graph database
- Knowledge Graph
- Teori graf
- Kota Bandung
- Ziarat
- Kendaraan bermotor
- Algoritma Dijkstra
- Khoirul Anwar
- Penggalian data
- Sejarah komunikasi
- E-graph
- Glossary of graph theory
- Bipartite graph
- Graph (discrete mathematics)
- Graph labeling
- Planar graph
- Eulerian path
- Directed graph
- Graph coloring
- Graph database
- 1
- 2
The Passion of the Christ (2004)
Irati (2023)
Wingwomen (2023)
White Bird (2023)
Justice League: Crisis on Infinite Earths Part Two (2024)
No More Posts Available.
No more pages to load.