- Source: Graph edit distance
In mathematics and computer science, graph edit distance (GED) is a measure of similarity (or dissimilarity) between two graphs.
The concept of graph edit distance was first formalized mathematically by Alberto Sanfeliu and King-Sun Fu in 1983.
A major application of graph edit distance is in inexact graph matching, such
as error-tolerant pattern recognition in machine learning.
The graph edit distance between two graphs is related to the
string edit distance between strings.
With the interpretation of strings as
connected, directed acyclic graphs of
maximum degree one, classical definitions
of edit distance such as Levenshtein distance,
Hamming distance
and Jaro–Winkler distance may be interpreted as graph edit distances
between suitably constrained graphs. Likewise, graph edit distance is
also a generalization of tree edit distance between
rooted trees.
Formal definitions and properties
The mathematical definition of graph edit distance is dependent upon the definitions of
the graphs over which it is defined, i.e. whether and how the vertices and edges of the
graph are labeled and whether the edges are directed.
Generally, given a set of graph edit operations (also known as elementary graph operations), the graph edit distance between two graphs
g
1
{\displaystyle g_{1}}
and
g
2
{\displaystyle g_{2}}
, written as
G
E
D
(
g
1
,
g
2
)
{\displaystyle GED(g_{1},g_{2})}
can be defined as
G
E
D
(
g
1
,
g
2
)
=
min
(
e
1
,
.
.
.
,
e
k
)
∈
P
(
g
1
,
g
2
)
∑
i
=
1
k
c
(
e
i
)
{\displaystyle GED(g_{1},g_{2})=\min _{(e_{1},...,e_{k})\in {\mathcal {P}}(g_{1},g_{2})}\sum _{i=1}^{k}c(e_{i})}
where
P
(
g
1
,
g
2
)
{\displaystyle {\mathcal {P}}(g_{1},g_{2})}
denotes the set of edit paths transforming
g
1
{\displaystyle g_{1}}
into (a graph isomorphic to)
g
2
{\displaystyle g_{2}}
and
c
(
e
)
≥
0
{\displaystyle c(e)\geq 0}
is the cost of each graph edit operation
e
{\displaystyle e}
.
The set of elementary graph edit operators typically includes:
vertex insertion to introduce a single new labeled vertex to a graph.
vertex deletion to remove a single (often disconnected) vertex from a graph.
vertex substitution to change the label (or color) of a given vertex.
edge insertion to introduce a new colored edge between a pair of vertices.
edge deletion to remove a single edge between a pair of vertices.
edge substitution to change the label (or color) of a given edge.
Additional, but less common operators, include operations such as edge splitting that introduces a new vertex into an edge (also creating a new edge), and edge contraction that eliminates vertices of degree two between edges (of the same color). Although such complex edit operators can be defined in terms of more elementary transformations, their use allows finer parameterization of the cost function
c
{\displaystyle c}
when the operator is cheaper than the sum of its constituents.
A deep analysis of the elementary graph edit operators is presented in
And some methods have been presented to automatically deduce these elementary graph edit operators. And some algorithms learn these costs online:
Applications
Graph edit distance finds applications in handwriting recognition, fingerprint recognition and cheminformatics.
Algorithms and complexity
Exact algorithms for computing the graph edit distance between a pair of graphs typically transform the problem into one of finding the minimum cost edit path between the two graphs.
The computation of the optimal edit path is cast as a pathfinding search or shortest path problem, often implemented as an A* search algorithm.
In addition to exact algorithms, a number of efficient approximation algorithms are also known. Most of them have cubic computational time
Moreover, there is an algorithm that deduces an approximation of the GED in linear time
Despite the above algorithms sometimes working well in practice, in general the problem of computing graph edit distance is NP-hard (for a proof that's available online, see Section 2 of Zeng et al.), and is even hard to approximate (formally, it is APX-hard).
References
Kata Kunci Pencarian:
- Graph edit distance
- Edit distance
- Graph matching
- Graph operations
- Metric space
- Distance
- List of terms relating to algorithms and data structures
- Ged
- Cograph
- Graphon