- Source: Generalized context-free grammar
Generalized context-free grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context-free composition functions to rewrite rules. Head grammar (and its weak equivalents) is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.
Description
A GCFG consists of two components: a set of composition functions that combine string tuples, and a set of rewrite rules. The composition functions all have the form
f
(
⟨
x
1
,
.
.
.
,
x
m
⟩
,
⟨
y
1
,
.
.
.
,
y
n
⟩
,
.
.
.
)
=
γ
{\displaystyle f(\langle x_{1},...,x_{m}\rangle ,\langle y_{1},...,y_{n}\rangle ,...)=\gamma }
, where
γ
{\displaystyle \gamma }
is either a single string tuple, or some use of a (potentially different) composition function which reduces to a string tuple. Rewrite rules look like
X
→
f
(
Y
,
Z
,
.
.
.
)
{\displaystyle X\to f(Y,Z,...)}
, where
Y
{\displaystyle Y}
,
Z
{\displaystyle Z}
, ... are string tuples or non-terminal symbols.
The rewrite semantics of GCFGs is fairly straightforward. An occurrence of a non-terminal symbol is rewritten using rewrite rules as in a context-free grammar, eventually yielding just compositions (composition functions applied to string tuples or other compositions). The composition functions are then applied, successively reducing the tuples to a single tuple.
Example
A simple translation of a context-free grammar into a GCFG can be performed in the following fashion. Given the grammar in (1), which generates the palindrome language
{
w
w
R
:
w
∈
{
a
,
b
}
∗
}
{\displaystyle \{ww^{R}:w\in \{a,b\}^{*}\}}
, where
w
R
{\displaystyle w^{R}}
is the string reverse of
w
{\displaystyle w}
, we can define the composition function conc as in (2a) and the rewrite rules as in (2b).
The CF production of abbbba is
S
aSa
abSba
abbSbba
abbbba
and the corresponding GCFG production is
S
→
c
o
n
c
(
⟨
a
⟩
,
S
,
⟨
a
⟩
)
{\displaystyle S\to conc(\langle a\rangle ,S,\langle a\rangle )}
c
o
n
c
(
⟨
a
⟩
,
c
o
n
c
(
⟨
b
⟩
,
S
,
⟨
b
⟩
)
,
⟨
a
⟩
)
{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,S,\langle b\rangle ),\langle a\rangle )}
c
o
n
c
(
⟨
a
⟩
,
c
o
n
c
(
⟨
b
⟩
,
c
o
n
c
(
⟨
b
⟩
,
S
,
⟨
b
⟩
)
,
⟨
b
⟩
)
,
⟨
a
⟩
)
{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,S,\langle b\rangle ),\langle b\rangle ),\langle a\rangle )}
c
o
n
c
(
⟨
a
⟩
,
c
o
n
c
(
⟨
b
⟩
,
c
o
n
c
(
⟨
b
⟩
,
c
o
n
c
(
⟨
ϵ
⟩
,
⟨
ϵ
⟩
,
⟨
ϵ
⟩
)
,
⟨
b
⟩
)
,
⟨
b
⟩
)
,
⟨
a
⟩
)
{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,conc(\langle \epsilon \rangle ,\langle \epsilon \rangle ,\langle \epsilon \rangle ),\langle b\rangle ),\langle b\rangle ),\langle a\rangle )}
c
o
n
c
(
⟨
a
⟩
,
c
o
n
c
(
⟨
b
⟩
,
c
o
n
c
(
⟨
b
⟩
,
⟨
ϵ
⟩
,
⟨
b
⟩
)
,
⟨
b
⟩
)
,
⟨
a
⟩
)
{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,\langle \epsilon \rangle ,\langle b\rangle ),\langle b\rangle ),\langle a\rangle )}
c
o
n
c
(
⟨
a
⟩
,
c
o
n
c
(
⟨
b
⟩
,
⟨
b
b
⟩
,
⟨
b
⟩
)
,
⟨
a
⟩
)
{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,\langle bb\rangle ,\langle b\rangle ),\langle a\rangle )}
c
o
n
c
(
⟨
a
⟩
,
⟨
b
b
b
b
⟩
,
⟨
a
⟩
)
{\displaystyle conc(\langle a\rangle ,\langle bbbb\rangle ,\langle a\rangle )}
⟨
a
b
b
b
b
a
⟩
{\displaystyle \langle abbbba\rangle }
Linear Context-free Rewriting Systems (LCFRSs)
Weir (1988) describes two properties of composition functions, linearity and regularity. A function defined as
f
(
x
1
,
.
.
.
,
x
n
)
=
.
.
.
{\displaystyle f(x_{1},...,x_{n})=...}
is linear if and only if each variable appears at most once on either side of the =, making
f
(
x
)
=
g
(
x
,
y
)
{\displaystyle f(x)=g(x,y)}
linear but not
f
(
x
)
=
g
(
x
,
x
)
{\displaystyle f(x)=g(x,x)}
. A function defined as
f
(
x
1
,
.
.
.
,
x
n
)
=
.
.
.
{\displaystyle f(x_{1},...,x_{n})=...}
is regular if the left hand side and right hand side have exactly the same variables, making
f
(
x
,
y
)
=
g
(
y
,
x
)
{\displaystyle f(x,y)=g(y,x)}
regular but not
f
(
x
)
=
g
(
x
,
y
)
{\displaystyle f(x)=g(x,y)}
or
f
(
x
,
y
)
=
g
(
x
)
{\displaystyle f(x,y)=g(x)}
.
A grammar in which all composition functions are both linear and regular is called a Linear Context-free Rewriting System (LCFRS). LCFRS is a proper subclass of the GCFGs, i.e. it has strictly less computational power than the GCFGs as a whole.
On the other hand, LCFRSs are strictly more expressive than linear-indexed grammars and their weakly equivalent variant tree adjoining grammars (TAGs). Head grammar is another example of an LCFRS that is strictly less powerful than the class of LCFRSs as a whole.
LCFRS are weakly equivalent to (set-local) multicomponent TAGs (MCTAGs) and also with multiple context-free grammar (MCFGs [1]). and minimalist grammars (MGs). The languages generated by LCFRS (and their weakly equivalents) can be parsed in polynomial time.
See also
Range concatenation grammar
References
Kata Kunci Pencarian:
- Generalized context-free grammar
- Context-free
- Context-free grammar
- Generalized phrase structure grammar
- Context-sensitive language
- Pumping lemma for context-free languages
- Straight-line grammar
- GCFC
- Transformational grammar
- Phrase structure grammar