test

antara97

Chapter11.ppt

Home >Computer Science homework help >test

Languages and Automata

Section 11.1 Regular Languages

Problem: Suppose the input strings to a program must be strings over the alphabet {a, b}

that contain exactly one substring bb. In other words, the strings must be of the form xbby,

where x and y are strings over {a, b} that do not contain bb, x does not end in b, and y does

not begin with b. In a few minutes, we’ll see how to describe the strings formally.

A regular language over alphabet A is a language constructed by the following rules:

 and {} are regular languages.
{a} is a regular language for all a  A.
If M and N are regular language, then so are M  N, MN, and M*.

Example. Let A = {a, b}. Then the following languages are a sampling of the regular

languages over A:

, {}, {a}, {b}, {a, b}, {ab}, {a}* = {, a, aa, aaa, …, an, … }.

A regular expression over alphabet A is an expression constructed by the following rules:

 and  are regular expressions.
a is a regular expression for all a  A.
If R and S are regular expressions, then so are (R), R + S, R·S, and R*.

The hierarchy in the absence of parentheses is, * (do it first), ·, + (do it last). Juxtaposition will be used in place of ·.

Example. Let A = {a, b}. Then the following expressions are a sampling of the regular

expressions over A:

, , a, b, ab, a + ab, (a + b)*.

Regular expressions represent regular languages

Regular expressons represent regular languages by the following correspondence, where

L(R) denotes the regular language of the regular expression R.

L() = ,

L() = {},

L(a) = {a} for all a  A,

L(R + S) = L(R)  L(S),

L(RS) = L(R)L(S),

L(R*) = L(R)*.

Example. The regular expression ab + a* represents the following regular language:

L(ab + a*) = L(ab)  L(a*)

= L(a)L(b)  L(a)*

= {a}{b}  {a}*

= {ab}  {, a, aa, aaa, …, an, … }

={ab, , a, aa, aaa, …, an, … }.

Example. The regular expression (a + b)* represents the following regular language:

L((a + b)*) = (L(a + b))* = {a, b}*, the set of all possible strings over {a, b}.

Back to the Problem: Suppose the input strings to a program must be strings over the

alphabet {a, b} that contain exactly one substring bb. In other words, the strings must be of

the form xbby, where x and y are strings over {a, b} that do not contain bb, x does not end

in b, and y does not begin with b. How can we describe the set of inputs formally?

Solution: let x = (a + ba)* and y = (a + ab)*.

Quiz. Find a regular expresson for {abn | n  N}  {ban | n  N}.

Answer. ab* + ba*.

Quiz. Use a sentence to describe the language of (b + ab)*( + a).

Answer. All strings over {a, b} whose substrings of a’s have length 1.

The Algebra of Regular Expressions

Equality: Regular expressions R and S are equal, written R = S, when L(R) = L(S).

Examples. a + b = b + a, a + a = a, aa* = a*a, ab ≠ ba.

Properties of +, ·, and closure

+ is commutative, associative,  is identity for +, and R + R = R.

· is associative,  is identity for ·, and  is zero for ·.

· distributes over +

(closure properties)

* = * = .

R* = R*R* = (R*)* = R + R*.

R* =  + R* =  + RR* = ( + R)* = ( + R)R*.

R* = (R + R2 + … + Rk)* =  + R + R2 + … + Rk–1 + RkR* for any k ≥ 1.

R*R = RR*.

(R + S)* = (R* + S*)* = (R*S*)* = (R*S)*R* = R*(SR*)*.

R(SR)* = (RS)*R.

(R*S)* =  + (R + S)*S and (RS*)* =  + R(R + S)*.

Proof: All properties can be verified by showing that the underlying regular languages are

equal as sets. QED.

Quiz. Explain each inequality.

(1). (a + b)* ≠ a* + b*. (2) (a + b)* ≠ a*b*.

Answers. (1) ab  L((a + b)*) – L(a* + b*).

(2) ba  L((a + b)*) – L(a*b*).

Quiz. Simplify the regular expression aa(b* + a) + a(ab* + aa).

Answer. aa(b* + a) + a(ab* + aa)

= aa(b* + a) + aa(b* + a) · distributes over +

= aa(b* + a) R + R = R.

Example/Quiz. Show that (a + aa)(a + b)* = a(a + b)*.

Proof: (a + aa)(a + b)* = (a + aa)a*(ba*)* (R + S)* = R*(SR*)*

= a( + a)a*(ba*)* R = R and · distributes over +

= aa*(ba*)* ( + R)R* = R*

= a(a + b)* (R + S)* = R*(SR*)* QED.

Example/Quiz. Show that a*(b + ab*) = b + aa*b*.

Proof: a*(b + ab*) = ( + aa*)(b + ab*) R* =  + RR*

= b + ab* + aa*b + aa*ab* · distributes over +

= b + (ab* + aa*ab*) + aa*b + is commutative and associative

= b + ( + aa*)ab* + aa*b R = R and · distributes over +

= b + a*ab* + aa*b R* =  + RR*

= b + aa*b* + aa*b R*R = RR*

= b + aa*(b* + b) · distributes over +

= b + aa*b*. R* = R* + R QED.

Example. Show that a* + abb*a = a* + ab*a.

Proof: Starting on the RHS, we have

a* + ab*a = a* + a( + bb*)a R* =  + RR*

= a* + aa + abb*a · distributes over +

= ( + aa*) + aa + abb*a R* =  + RR*

=  + (aa* + aa) + abb*a + is associative

=  + a(a* + a) + abb*a · distributes over +

=  + aa* + abb*a R* = R + R*

= a* + abb*a R* =  + RR* QED.

Example. Show that (a + aa + … + an)(a + b)* = a(a + b)* for all n ≥ 1.

Proof (by induction): If n = 1, the statement becomes a(a + b)* = a(a + b)*, which is true. If

n = 2, the statement becomes (a + aa)(a + b)* = a(a + b)*, which is true by a previous

example. Let n > 2 and assume the statement is true for 1 ≤ k < n. We need to prove the

statement is true for n. The LHS of the statement for n is

(a + aa + … + an)(a + b)*

= a(a + b)* + (aa + … + an)(a + b)* · distributes over +

= a(a + b)* + a(a + aa + … + an–1)(a + b)* · distributes over +

= a(a + b)* + aa(a + b)* induction assumption

= (a + aa)(a + b)* · distributes over +

= a(a + b)* induction assumption

The last line is the RHS of the statement for n. So the statement is true for n. Therefore, the

statement is true for all n ≥ 1. QED.

Example/Quiz: Use regular algebra to show that R + (R + S)* = (R + S)*.

Proof:

R + (R + S)* = R + [L + (R + S) + (R + S)(R+ S)*] R* =  + R + R2 + … + Rk–1 + RkR*

= L + (R + R) + S + (R + S)(R+ S)* + is associative

= L + R + S + (R + S)(R+ S)* R + R = R

= L + (R + S) + (R + S)(R+ S)* + is associative

= (R+ S)* R* =  + R + R2 + … + Rk–1 + RkR*

QED.

Take-home quiz: Use regular algebra to show that (a + ab)*a = a(a + ba)*.

Section 11.2 Finite Automata

Can a machine(i.e., algorithm) recognize a regular language? Yes!

Deterministic Finite Automata

A deterministic finite automaton (DFA) over an alphabet A is a finite digraph (where

vertices or nodes are called states) for which each state emits one labeled edge for each

letter of A. One state is designated as the start state and a set of states may be final states.

Example. Either of the following alternatives is acceptable for representing a DFA, where

final states are indicated by double circles.

The Execution of DFA for input string w  A* begins at the start state and follows a path

whose edges concatenate to w. The DFA accepts w if the path ends in a final state.

Otherwise the DFA rejects w. The language of a DFA is the set of accepted strings.

Example. The example DFA accepts the strings

a, b, ab, bb, abb, bbb, …, abn, bbn, ….

So the language of the DFA is given by the regular expression (a + b)b*.

Start

a, b

Quiz. Find a DFA for the language of a + aa*b.

Theorem (Kleene) The class of regular languages is exactly the same as the class of

languages accepted by DFAs.

Quiz. Find an DFA for each of the following languages over the alphabet {a, b}.

(a) . (b) {}. (c) {(ab)n | n  N}, which has regular expression (ab)*.

Start

a, b

Solution:

Start

a, b

(b):

Start

a, b

Solutions: (a):

Start

a, b

(c):

Table Representation of a DFA

A DFA over A can be represented by a transition function T : States  A  States, where

T(i, a) is the state reached from state i along the edge labeled a, and we mark the start and

final states. For example, the following figures show a DFA and its transition table.

Note: T can be extended to T : States  A*  States by

T(i, L) = i and T(i, aw) = T(T(i, a), w) for a  A and w  A*.

Quiz: Calculate T(0, bba).

Solution: T(0, bba) = T(1, ba) = T(1, a) = T(2, L) = 2.

Example/Quiz. Back to the problem of describing input strings over {a, b} that contain

exactly one substring bb. We observed that the strings could be described by the regular

expression (a + ba)*bb(a + ab)*. Find a DFA to recognize the language.

Start

a, b

Start

A solution:

a, b

Nondeterministic Finite Automata

A nondeterministic finite automaton (NFA) over an alphabet A is similar to a DFA except

that L-edges are allowed, there is no requirement to emit edges from a state, and multiple

edges with the same letter can be emitted from a state.

Example. The following NFA recognizes the language of a + aa*b + a*b.

Table representation of NFA

An NFA over A can be represented by a function T : States  A  {L}  power(States),

where T(i, a) is the set of states reached from state i along the edge labeled a, and we mark

the start and final states. The following figure shows the table for the preceding NFA.

Start

Theorem (Rabin and Scott) The class of regular languages is exactly the same as the

class of languages accepted by NFAs.

Quizzes. Find an NFA for each of the following languages over {a, b}.

(a) . (b) {}. (c) {(ab)n | n  N}, which has regular expression (ab)*.

ExampleQuiz. Back to the problem of describing input strings over {a, b} that contain

exactly one substring bb. We observed that the strings could be described by the regular

expression (a + ba)*bb(a + ab)*. Find an NFA to recognize the language.

Start

Solutions: (a):

Start

(b):

Start

(c):

Start

A solution:

Algorithm: Transform a Regular Expression into a Finite Automaton

Start by placing the regular expression on the edge between a start and final state:

Apply the following rules to obtain a finite automaton after erasing any -edges.

Quiz. Use the algorithm to construct a finite automaton for (ab)* + ba.

Start

Regular expression

R + S

transforms to

Start

Answer:

Algorithm: Transform a Finite Automaton into a Regular Expression

Connect a new start state s to the start state of the FA and connect each final state of the FA

to a new final state ƒ as shown in the figure.

If needed, combine all multiple edges between the same two nodes into one edge with label

the sum of the labels on the multiple edges. If there is no edge between two states, assume

there is an -edge.

Now eliminate each state k of the FA by constructing a new edge (i, j) for each pair of edges

(i, k) and (k, j) where i ≠ k and j ≠ k. The new label new(i, j) is defined in terms of the old

labels by the formula

new(i, j) = old(i, j) + old(i, k)old(k, k)*old(k, j)

Given

Connect to start state of FA

Connect from each final state of FA

A + BC*D

Example.

becomes

Quiz. Use the algorithm to transform the following NFA into a regular expression. (Half the

class eliminate state 0 then state 1 and half the class eliminate state 1 then state 0.)

Solution: Connect the NFA to new a start

state s and a new final state f as pictured.

First Solution: Eliminate state 0 to obtain:

new(s, 1) =  + La*a = a*a.

new(1, 1) =  + ba*a = ba*a.

Second Solution: Eliminate state 1 to obtain:

new(0, f) =  + a*L = a.

new(0, 0) = a + a*b = a + ab.

Eliminate state 1 to obtain:

new(s, f) =  + a*a(ba*a)*L = a*a(ba*a)*.

Eliminate state 0 to obtain:

new(s, f) =  + L(a + ab)*a = (a + ab)*a.

Start

a*a

ba*a

a*a(ba*a)*

a + ab

(a + ab)*a

Quiz. Use regular algebra to show the following equality from the previous quiz.

a*a(ba*a)* = (a + ab)*a.

Proof: a*a(ba*a)* = a*[a((ba*)a)*] · is associative

= a*[(a(ba*))*a] R(SR)* = (RS)*R

= a*[((ab)a*)*a] · is associative

= [a*((ab)a*)*]a · is associative

= (a + ab)*a R*(SR*)* = (R + S)*. QED.

Automata with Output (Mealy and Moore)

Mealy machine: Associate an output action with each

transition between states.

Challenge: Describe a two-floor elevator system with a Mealy or Moore machine.

Moore machine: Associate an output action with each state.

Theorem: Mealy and Moore machines are equivalent.

Example/Quiz. Output the complement of a binary number.

i/A

j/B

a/A

Start

0/1

1/0

Start

0/L

2/0

1/1

Solutions:

Section 11.3 Constructing Efficient Finite Automata

First we’ll see how to transform an NFA into a DFA. Then we’ll see how to transform a

DFA into a minimum-state DFA.

Transforming an NFA into a DFA

The l-closure of a state s, denoted l(s), is the set consisting of s together with all states that

can be reached from s by traversing l-edges. The l-closure of a set S of states, denoted

l(S), is the union of the l-closures of the states in S.

Example. Given the following NFA as a graph and as a transition table.

Some sample l-closures for the NFA are as follows:

l(0) = {0, 1, 2}

l(1) = {1, 2}

l(2) = {2}

l() = 

l({1, 2}) = {1, 2}

l({0, 1, 2}) = {0, 1, 2}.

Algorithm: Transform an NFA into a DFA

Construct a DFA table TD from an NFA table TN as follows:

1. The start state of the DFA is l(s), where s is the start state of the NFA.

2. If {s1, …, sn} is a DFA state and a  A, then

TD({s1, …, sn}, a) = l(TN(s1, a)  …  TN(sn, a)).

3. A DFA state is final if one of its elements is an NFA final state.

Example. Given the following NFA.

The algorithm constructs the following DFA transition table TD, where it is also written in

simplified form after a renumbering of the states.

Quiz. Use the algorithm to transform the following NFA into a DFA.

Solution: The algorithm constructs the following DFA transition table TD, where it is

also written in simplified form after a renumbering of the states.

Transforming an DFA into a minimum-state DFA

Let S be the set of states that can be reached from the start state of a DFA over A.

For states s, t  S let s ~ t mean that for all strings w  A* either T(s, w) and T(t, w) are

both final or both nonfinal. Observe that ~ is an equivalence relation on S. So it partitions S

into equivalence classes.

Observe also that the number of equivalence classes is the minimum number of states

needed by a DFA to recognize the language of the given DFA.

Algorithm: Transform a DFA to a minimum-state DFA

1. Construct the following sequence of sets of possible equivalent pairs of distinct states:

E0  E1  …  Ek = Ek+1,

where

E0 = {{s, t} | s and t are either both final or both nonfinal}

and

Ei+1 = {{s, t}  Ei | {T(s, a), T(t, a)}  Ei or T(s, a) = T(t, a)} for every a  A}.

Ek represents the distinct pairs of equivalent states from which ~ can be generated.

2. The equivalence classes form the states of the minimum state DFA with transition table Tmin defined by

Tmin([s], a) = [T(s, a)].

3. The start state is the class containing the start state of the given DFA.

4. A final state is any class containing a final state of the given DFA.

Example. Use the algorithm to transform the following DFA into a minimum-state DFA.

Solution: The set of states is S = {0, 1, 2, 3, 4}. To find the equivalent states calculate:

E0 = {{0, 4}, {1, 2}, {1, 3}, {2, 3}}

E1 = {{1, 2}, {1, 3}, {2, 3}}

E2 = {{1, 2}, {1, 3}, {2, 3}} = E1.

So 1 ~ 2, 1 ~ 3, 2 ~ 3. This tells us that S is partitioned by {0}, {1, 2, 3}, {4}, which we

name [0], [1], [4], respectively. So the minimum-state DFA has three states.

Quiz: What regular expression equality arises from the two DFAs?

Answer: a + aa + (aaa + aab + ab)(a + b)* = a(a + b)*.

a, b

Min-state Table

Renamed Table

a, b

Min-state DFA graph

Quiz. Is the following DFA a minimum-state DFA?

Answer: No. E0 = {{0, 1}, {0, 5}, {1, 5}, {2, 3}, {2, 4}, {3, 4}}

E1 = {{1, 5}, {2, 4}}

E2 = {{1, 5}, {2, 4}} = E1.

So 1 ~ 5 and 2 ~ 4. This gives us {0}, {1, 5}, {2, 4}, {3}, with names

[0], [1], [2], [3], respectively with the following minimum-state DFA.

E0 = {{0, 1}}

E1 = {{0, 1}} = E0.

So 0 ~ 1, which tells us that the partition is the whole set of states {0, 1} = [0]. Therefore,

we obtain the following minimum-state DFA.

Answer. No. Use the minimum-state algorithm.

a, b

Quiz. Is the following DFA a minimum-state DFA?

Section 11.4 Regular Language Topics

Regular languages are also characterized by special grammars called regular grammars

whose productions take the following form, where w is a string of terminals.

A  wB or A  w.

Example. A regular grammar for the language of a*b* is

S  L | aS | T

T  b | bT.

Quiz. Write a regular grammar for {ab, acb, accb. …, acnb, …}.

Solution: A regular expression for the language is ac*b. A regular grammar is

S  aT

T  b | cT.

Regular grammars as we have defined them are often called right-regular because there are

also left-regular grammars that require productions to have the form

A  Bw or A  w.

Any language with a right-regular grammar also has a left-regular grammar, and

conversely.

Example. For the regular expression a*bc* we have the following grammars.

Right regular: S  aS | bT

T  L | cT.

Left Regular: S  Sc | Tb

T  L | Ta.

Transforming an NFA to a Regular Grammar

State names become the nonterminals. So rename them to be uppercase letters.

2. The start state becomes the start symbol of the grammar.

3. For each state transition from I to J labeled with x construct a production I  xJ.

4. For each final state F construct a production F  L.

Example/Quiz. Transform the following NFA into a regular grammar.

Solution:

S  aI | F

I  aI | bF

F  L

This grammar can be simplified to

S  aI | L

I  aI | b

Notice in the previous example that right-regular grammars derive strings by examining the

leftmost letter first, while left-regular grammars examine the rightmost letter first.

Example/Quiz. Find a right-regular grammar and a left-regular grammar for the language

of the regular expression a(b + c)*de*.

Right regular: S  aT

T  bT | cT | dU

U  L | eU.

Left Regular: S  Se | Td

T  a | Tb | Tc.

Start

Example/Quiz. Transform the regular expression a*(ba)* into a regular grammar.

Solution: Draw an NFA and then use the algorithm.

An NFA:

The resulting grammar:

S  aS | bI | L

I  aF

F  bJ | L

J  aF.

Quiz. Simplify the grammar.

Answer:

S  aS | baF | L

F  baF | L

Quiz. What happens when a DFA has

an error state, like the following DFA?

Answer: The resulting grammar:

S  aF | bE

F  aE | bJ | L

J  aF | bE

E  aE | bE.

But E does not derive a

terminal string. So the

grammar simplifies to

S  aF

F  bJ | L

J  aF.

Quiz: Simplify the grammar

further.

Answer: S  aF

F  baF | L.

Start

a, b

Transforming a Regular Grammar to an NFA

1. Replace any production with multiple terminals by productions with single terminals.

2. The start state is the grammar start symbol.

3. Transform I  aJ into a transition from I to J labeled with a.

4. Transform I  J into a transition from I to J labeled with L.

5. Transform each I  a into a transition from I to new single final state F labeled with a.

6. The final states are F together with each state I with a production I  L.

Example/Quiz. Transform the following regular grammar into a NFA.

S  abS | T | L

T  cT | d

Quiz. What is the regular expression for the language of the grammar?

Answer: (ab)*(L + c*d)

Solution. Transform S  abS into S  aI

and I  bS, so the grammar becomes

S  aI | T | L

I  bS

T  cT | d

Now the NFA can be drawn:

Start

Properties of Regular Languages

When we know some properties of regular languages they can help us argue, BWOC, that

certain languages are not regular.

The Pumping Lemma

If L is an infinite regular language, then it is recognized by a DFA with, say, m states.

If s in L and | s | ≥ m, then an acceptance path for s must pass through some state twice.

The following graph depicts the situation.

The dotted arrows represent the path to acceptance for s and the letters x, y, and z represent

the concatenation of the letters along the edges of the path. So s = xyz and y ≠ L. Assume

that the middle state is the first repeated state on the path. So | xy | ≤ m. Since the loop can

be traversed any number of times, we have the pumping property: xykz  L for all k  N

Example. The language L = {anbn | n  N} is not regular.

Proof: Assume, BWOC, that L is regular. Since L is infinite, the pumping lemma applies.

Choose s = ambm. Then s = xyz, where y ≠ L, | xy | ≤ m, and xykz  L for all k  N.

Since | xy | ≤ m and s = ambm = xyz, it follows that xy is a string of a’s.

Since y ≠ L, it must be that y = ai for some i > 0.

We’ll try for a contradiction with k = 2. The pumping property implies xy2z  L.

But xy2z = am + ibm, which is not in L because i > 0.

This contradiction implies that L is not regular. QED.

Start

Quiz. In the previous proof we exhibited a contradiction when k = 2. Find similar

contradictions for k = 0 and k = 3.

Answer: (k = 0) The pumping property implies xy0z  L. In other words, xz  L.

But xz = am – ibm, which is not in L because i > 0. This contradiction implies that L is not

regular. QED.

(k = 3) The pumping property implies xy3z  L. But xy3z = am + 2ibm, which is not in L

because i > 0. This contradiction implies that L is not regular. QED.

Example/Quiz. Prove the language L = {aabnac2n | n  N} is not regular.

Proof: Assume, BWOC, that L is regular. Since L is infinite, the pumping lemma applies.

Choose s = aabmac2m. Then s = xyz, where y ≠ L, | xy | ≤ m and xykz  L for all k  N.

Since | xy | ≤ m, there are three cases for y:

Case 1: y = aabi for some 0 ≤ i ≤ m – 2. (Remember, | xy | ≤ m, so x = L in this case.)

Case 2: y = abi for some 0 ≤ i ≤ m – 2. (In this case, we have x = a.)

Case 3: y = bi for some 0 < i ≤ m – 2 – j. (In this case, x = aabj for some 0 ≤ j ≤ m – 3)

We can obtain a contradiction in each case by considering k = 0, which implies xz  L.

Case 1: xz = bm – iac2m which is not in L because strings of L must begin with aa.

Case 2: xz = abm – iac2m which is not in L because strngs of L must begin with aa.

Case 3: xz = aabm – iac2m which is not in L because strings in L must have twice as many c’s as b’s which can’t be the case since i > 0.

So each of the three cases yields a contradiction. Therefore L is not regular. QED.

Section 11.5 Context-Free Languages

We know that the language {anbn | n  N} is not regular, but it certainly has a non-regular

grammar, such as

S  aSb | L.

A context-free grammar has productions of the form

N  w

Where N is a nonterminal and w is any string containing terminals and/or nonterminals.

A context-free language is the set of strings derived from a context-free grammar.

Example. {anbn | n  N} is a C-F language derived from the C-F grammar S  aSb | L.

Example. Any regular grammar is context-free. So regular languages are C-F languages.

Quiz. Find a grammar for {anbn + 2 | n  N}.

Answer. S  aSb | bb.

Quiz. Find a grammar for{wwR| w  {a, b}*}, where wR is the reverse of w.

Answer. S  aSa | bSb | L.

Techniques for Constucting Grammars:

Let L and M be two C-F grammars with disjoint sets of nonterminals and with start symbols

A and B, respectively. Then

L  M has grammar S  A | B.
LM has grammar S  AB.
L* has grammar S  AS | L.

Example. Let L be the language of strings over {a, b} with the same number of a’s and b’s.

Does L have the following grammar?

S  aSbS | bSaS | L.

It’s easy to see that the language of the grammar is a subset of L.

What about the other way?

Assume that w  L and show w is derived by the grammar.

If w = L, then S  L.

Let w ≠ L and assume that if s  L and | s | < | w |, then S + s.

Show that S + w. Consider the four cases:

1. w = asb for some string s. In this case, s  L and | s | < | w |. So by induction we have

S + s. Therefore, we have S  aSbS  aSb + asb = w.

2. w = bsa for some string s. Similar to case 1.

3. w = axa for some string x. In this case, x has two more b’s than a’s. So x  L.

What do we do now?

Notice, for example, if | w | = 4, then w = abba. If | w | = 6, then w has one of the forms

aabbba, ababba, abbaba abbbaa.

We claim that x can be written in the form x = ubbv where u, v  L. (Can you prove it?)

So by induction we have derivations S + u and S + v. Therefore, we have

S  aSbS + aubS  aubbSaS + aubbvaS  aubbva = axa = w.

4. w = bxb for some string x. Similar to case 3.

QED.

Section 11.6 Pushdown Automata

A pushdown automaton (PDA) is a finite automaton with a stack that has stack operations

pop, push, and nop. PDAs always start with one designated symbol on the stack. A state

transition depends on the input symbol and the top of the stack. The machine then performs

a stack operation and enters the next state.

Representation can be graphical or with sets of 5-tuples. For example:

or (i, b, C, pop, j)

Execution: If the machine is in state i and the input letter is b and C is on top of the stack,

then pop the stack and enter state j.

Nondeterminism can occur in two ways, as in the following examples.

Acceptance: A string w is accepted by a PDA if there is a path from the start state to a

final state such that the input symbols on the path edges concatenate to w. Otherwise, w

is rejected.

Example. A PDA to accept the language {anbn | n > 0} as a graph and as a set of 5-tuples.

(0, a, X, push(a), 0)

(0, a, a, push(a), 0)

(0, b, a, pop, 1)

(1, b, a, pop, 1)

(1, L, X, nop, 2).

Quiz. How would you modify the machine to accept {anbn | n  N}?

Answer: Add the instruction (0, L, X, nop, 2).

Example/Quiz. Find a PDA to accept the language {a2nbn | n  N}.

ID for input aaaabb:

(0, aaaabb, X)

(0, aaabb, aX)

(0, aabb, aaX)

(0, abb, aaaX)

(0, bb, aaaaX)

(1, b, aaaX)

(2, b, aaX)

(1, L, aX)

(2, L, X)

(3, L, X).

A solution:

Start

Empty-Stack Acceptance. A PDA can also be defined to accept a string by the condition

that the stack is empty. The two types of acceptance are equivalent in that they both define

PDAs that accept the same class of languages.

Transform final-state acceptance to empty-stack acceptance:

Introduce a new start state s, a new start symbol Y, a new empty state e, and construct edges

and actions as shown in the diagram below. Note that ? stands for any stack symbol.

Quiz: If the final-state PDA is deterministic, the transformed empty-stack PDA could be

nondeterministic. Why?

Answer: If the final-state PDA has an edge emitted from a final state, then the empty-stack

PDA constructs a nondeterministic choice from that final state. For example, a final-state

PDA for {abncn | n  N} could have a final state k to accept the string a that also emits an

edge to take care of any string abn + 1cn + 1. The empty-stack PDA would construct a

nondeterministic choice from k by adding instructions of the form (k, L, ?, pop, e).

Start

final-state PDA

Transform empty-stack acceptance to final-state acceptance:

Introduce a new start state s, a new start symbol Y, a new final state ƒ, and construct edges

and actions as shown in the diagram below.

Quiz: If the empty-stack PDA is deterministic, then the transformed final-state PDA is also

deterministic. Why?

Answer: Because the instructions of the empty-stack PDA do not use stack symbol Y. So the

only time a new instruction can be executed is when Y is on top of the stack, meaning that

the empty-stack PDA has emptied it’s original stack and can’t move.

Theorem: The context-free languages are exactly the languages accepted by PDAs.

Proof: We’ll see two algorithms, one to transform a C-F grammar into an empty-stack

PDA, and one to transform an empty-stack PDA into a C-F grammar. QED.

Start

empty-stack PDA

Transform a C-F grammar to empty-stack PDA

The idea is to construct a PDA whose execution corresponds to a leftmost derivation.

The PDA has one state, but we allow multiple stack operations. The stack symbols are the

terminals and nonterminals of the grammar. The designated starting stack symbol is the

grammar start symbol. We’ll illustrate the algorithm with an example.

Example. A leftmost derivation of abbcb:

S  aSb

 aBCb

 abBCb

 abbCb

 abbcCb

 abbcb.

(0, abbcb, S)

(0, abbcb, aSb)

(0, bbcb, Sb)

(0, bbcb, BCb)

(0, bbcb, bBCb)

(0, bcb, BCb)

(0, bcb, bCb)

(0, cb, Cb)

(0, cb, cCb)

(0, b, Cb)

(0, b, b)

(0, L, L).

Example. ID for input string abbcb:

Terminals Corresponding PDA Instructions

a (0, a, a, pop, 0)

b (0, b, b, pop, 0)

c (0, c, c, pop, 0)

Productions

S  aSb (0, L, S, pop, push(b), push(S), push(a), 0)

S  BC (0, L, S, pop, push(C), push(B), 0)

B  bB (0, L, B, pop, push(B), push(b), 0)

B  b (0, L, B, pop, push(b), 0)

C  cC (0, L, C, pop, push(C), push(c), 0)

C  L (0, L, C, pop, 0)

Grammar

S  aSb | BC

B  bB | b

C  cC | L.

Transform an empty-stack PDA to a C-F grammar

The idea is to construct a grammar whose leftmost derivations correspond to computation

sequences. For each stack symbol B and each pair of states i and j construct a nonterminal

Bij from which a leftmost derivation will correspond to a computation sequence that starts

in state i with B on the stack and ends in state j with B popped from the stack.

Bij  a

Bik  aBjk for each state k.

Bil  aCjkBkl for each state k and l.

S  Xsj for each state j. (S is start symbol)

Corresponding Grammar Productions

PDA Instruction

(For the following instructions, the dotted lines represent

possible paths to states that pop the desired stack symbol.)

Start

Example/Quiz. Use the algorithm to

transform the following empty-stack PDA

into a C-F grammar.

Quiz: Simplify the preceding grammar by

deleting productions that do not derive terminal

strings and productions that are not reachable

from the start symbol.

Solution:

The start state 0 with X on the stack gives:

S  X00 | X01

The pop operation (1, a, X, pop, 1) gives:

X11  a

The pop operation (1, a, A, pop, 1) gives:

A11  a

The push operation (0, a, X, push(A), 0) gives:

X00  aA00X00 | aA01X10

X01  aA00X01 | aA01X11

The nop operation (0, a, A, nop, 1) gives:

A00  aA10

A01  aA11

Solution: S  X01

X01  aA01X11

A01  aA11

X11  a

A11  a

or, S  aaaa.

Start

Example. We’ll try to find a grammar for the language L = {w  {a, b}* | na(w) = nb(w)} by

constructing an empty-stack PDA to accept L and then transforming it into a C-F grammar.

Consider the following single-state empty-stack PDA.

Idea of Proof: Assume, WLOG, that the input

string begins with a. Then A is pushed and this

action continues for each consecutive a. When the

first b occurs, A is popped, which means that an a-b

pair has been counted. The only way to get back to

X on top is if the string has an equal number of a’s

and b’s so far. If the input is exhausted, then pop X

and accept the string. Otherwise, the process

continues as described.

PDA Transformed into C-F grammar:

S  X00

X00  L | aA00X00 | bB00X00

A00  b | aA00A00

B00  a | bB00B00 .

A Simplified C-F grammar:

S  L | aAS | bBS

A  b | aAA

B  a | bBB.

Example Derivation of aababb:

S  aAS  aaAAS  aabAS

 aabaAAS  aababAS

 aababbS  aababb.

Start

Nondeterministic PDAs are more powerful than deterministic PDAs.

An example is to consider the language of even palindromes over {a, b}. A context-free

grammar for the language is given by

S  L | aSa | bSb

Any PDA to accept the language must make a nondeterministic decision to start comparing

the second half of a string with the reverse of the first half.

Quiz: Transform the grammar into a PDA

A Solution (using the algorithm)

(0, a, a, pop, 0)

(0, b, b, pop, 0)

(0, L, S, pop, 0)

(0, L, S, pop, push(a), push(S), push(a), 0)

(0, L, S, pop, push(b), push(S), push(b), 0).

Examples/Quizzes. For a string x and letter a let na(x) be the number of a’s in x.

Let L = {x  {a, b}* | na(x) = nb(x)}. A grammar for L with start symbol E can be written as:

E  aEbE | bEaE | L.

Use this information to find grammars for the following languages.

{x  {a, b}* | na(x) = 1 + nb(x)}.

Solution: S  EaE.

2. {x  {a, b}* | na(x) = 2 + nb(x)}.

Solution: S  EaEaE.

3. {x  {a, b}* | na(x) > nb(x)}.

Solution: S  EaET

T  aET | L.

4. {x  {a, b}* | na(x) < nb(x)}.

Solution: S  EbET

T  bET | L.

5. {x  {a, b}* | na(x) ≠ nb(x)}.

Solution: This language is the union of the languages in (3) and (4). Rename the

nonterminals in the grammars for (3) and (4) as follows:

(3) A  EaET

T  aET | L.

(4) B  EbEU

U  bEU | L.

Then S  A | B is the desired grammar.

Section 11.7 Context-Free Language Topics

We know (via a theorem) that the context-free languages are exactly those languages that

are accepted by PDAs.

When a context-free language can be recognized by a deterministic final-state PDA, it is

called a deterministic context-free language.

An LL(k) grammar has the property that a parser can be constructed to scan an input string

from left to right and build a leftmost derivation by examining next k input symbols to

determine the unique production for each derivation step.

If a language has an LL(k) grammar, it is called an LL(k) language.

LL(k) languages are deterministic context-free languages, but there are deterministic

context-free languages that are not LL(k). (See text for an example on page 789.)

Example. Consider the language {anb | n  N}.

It has the LL(1) grammar S  aS | b.

A parser can examine one input letter to decide whether to use S  aS or S  b for

the next derivation step.

(2) It has the LL(2) grammar S  aaS | ab | b.

A parser can examine two input letters to determine whether to use S  aaS or S  ab

for the next derivation step. Notice that the grammar is not LL(1).

(3) Quiz. Find an LL(3) grammar that is not LL(2).

Solution. S  aaaS | aab | ab | b.

Answer: Any derivation starts with S  AB. The next derivation step uses one of the

productions A  aAb or A  L, depending on whether the scanned input letter is a or not.

The argument is the same for B-productions.

It is not LL(1): Let the input be ab. The first letter a tells us to start with S  aSb.

The letter a in aSb matches the first input letter, so we lookahead to the second input letter b.

This tells us we must use S  T to get S  aSb  aTb.

The lookahead remains at b, so we can’t determine whether it is the last b of the input string.

So we don’t know whether to choose T  bT or T  L for the next step. Thus the grammar is

not LL(1).

It is not LL(2): Let the input be aabb. The first two input letters aa, tell us S  aSb.

The letter a in aSb matches the first input, so we lookahead to the next two-letter substring ab.

We must use S  aSb to get S  aSb  aaSbb.

Now the lookahead becomes bb, so we must use S  T to get S  aSb  aaSbb  aaTbb.

The lookahead remains at bb, so we can’t determine whether these are the last two b’s of the

input string. So we don’t know whether to choose T  bT or T  L for the next step. Thus

the grammar is not LL(2).

Quiz (on your time): Find a general argument to show the grammar is not LL(k).

Example/Quiz. Why is the following grammar

for {anbn + k | n, k  N} an-LL(1) grammar?

S  AB

A  aAb | L

B  bB | L.

Example. The following grammar for {anbn + k | n, k  N}

is not LL(k) for every k.

S  aSb | T

T  bT | L.

Example. The language {an + kbn | k, n  N} is not deterministic context-free. So it has no

LL(k) grammar for every k.

Proof: Any PDA for the language must keep a count of the a’s with the stack so that when the

b’s come along the stack can be popped with each b. But there might still be a’s on the stack

(e.g., when k > 0), so there must be a nondeterministic state transition to a final state from the

popping state. i.e., We need two instructions like, (i, b, a, pop, i) and (i, L, a, pop, final).

Note: The previous two examples show that {ambn | m ≤ n} is LL(1) but {ambn | m ≥ n} is not

LL(k) for any k.

Grammar Transformations

Left factoring: Sometimes we can “left-factor” an LL(k) grammar to obtain an equivalent

LL(n) grammar where n < k.

Example. The grammar S  aaS | ab | b is LL(2) but not LL(1). But we can factor out the

common prefix a from productions S  aaS | ab to obtain

S  aT

T  aS | b.

This gives the new grammar: S  aT | b

T  aS | b.

Quiz: Find an LL(k) grammar where k is as

small as possible that is equivalent to the

following grammar.

S  abS | abcT | ab

T  cT | c.

Solution: The given grammar is LL(3)

and it can be left-factored to become

an LL(1) grammar as follows:

S  abR

R  S | cT | L

T  cU

U  T | L.

Removing left recursion: A left-recursive grammar is one which has a derivation of the form

A + Ax

for some nonterminal A and sentential form x.

Left-recursive grammars are not LL(k) for any k.

Example. The language {ban | n  N} has a grammar S  Sa | b, which is left-recursive. It is

not LL(k) for any k. Consider the following cases:

LL(1) case: If the input string is ba, the lookahead is b. So we don’t know whether there are

any a’s to the right of b. Therefore we don’t know which production to start the derivation.

LL(2) case: If the input string is baa, the lookahead is ba. So the derivation starts with S 

Sa. But the a in Sa denotes the a at the right end of the derived string. So the input string

could be ba or baa. Therefore we don’t know which production to choose next.

LL(k) case: If the input string is ba…a with k a’s, the lookahead is ba…a with k – 1 a’s. So

the derivation after k –1 steps is S  Sa  Saa  …  Sa…a. Now, the input string could

be ba…a (length k) or ba…a (length k + 1). Therefore we don’t know which production to

choose next.

Sometimes we can obtain an LL(k) grammar by removing left recursion.

Algorithm Idea for direct left recursion:

Transform: A  Aw | Au | Av | a | b.

To: A  aB | bB

B wB | uB | vB | L.

Example/Quiz. Remove left recursion.

S  Sa | b.

Solution. S  bT

T  aT | L.

It is LL(1).

Example/Quiz. Remove left recursion.

S  Saa | aab | aac.

Solution: S  aabT | aacT

T  aaT | L.

It is LL(3).

Example (Removing indirect left recursion). Consider the following grammar.

S  Ab | a

A  Sa | b.

The grammar is left-recursive because of the indirect left recursion S  Ab  Sab.

To remove the indirect left recursion, replace A in S  Ab by the right side of A  Sa | b to

obtain S  Sab | bb. The grammar becomes S  Sab | bb | a. Remove the left recursion:

S  bbT | aT

T  abT | L.

Quiz: Remove left recursion from the grammar: S  Ab | a

A  SAa | b.

Solution: Replace A in S  Ab | a by the right side of A  SAa | b to obtain

S  SAab | bb | a

A  SAa | b.

Now remove the direct left recursion: S  bbT | aT

T  AabT | L

A  SAa | b.

Quiz: Rewrite the previous solution as LL(1).

Solution: S  aaU

U  bT | cT

T  aaT | L.

Remove L-productions from grammars for langauges without L.

Find nonterminals that derive L.
For each production A  w construct all productions A  w’ where w’ is obtained from

w by removing one or more occurrences of the nonterminals from Step 1.

Combine the original productions with those of step 2 and eliminate any L-productions.

Example. Remove L-productions from the grammar

S  ABc

A  aA | L

B  bB | L.

Solution. Step 1: The nonterminals A and B derive L.

Step 2: From the production S  ABc we construct S  Bc | Ac | c.

From the production A  aA we construct A  a.

From the production B  bB we construct B  b.

Step 3: S  ABc | Bc | Ac | c

A  aA | a

B  bB | b.

Quiz. Remove L -productions from

S  ABc | Ab | c

A  ABa | L

B  Bbc | L.

Solution.

S  ABc | Ab | c| Bc | Ac | b

A  ABa | Ba | Aa | a

B  Bbc | bc.

Chomsky Normal Form. Productions have one of the following forms

A  a (a a terminal)

A  BC

S  L (if L is in the language).

Advantages: Parse trees are binary, which are easy to represent. Any string of length n > 0

can be derived in 2n – 1 steps.

Algorithm. Transform to Chomsky normal form (with the additional property that no start symbol occurs on the right side of a production)

1. If the start symbol S occurs on some right side, create a new start symbol S´ and a

new production S´  S.

2. Remove A  L (if A ≠ S) by previous algorithm. (If S  L is removed, add it back.)

3. Remove unit productions (i.e., A  B): If A  B or A + B, then construct productions

A  w where B  w is not a unit production. Now remove all unit productions.

4. For each production whose right side has two or more symbols, replace all occurrrences

of each terminal a with a new nonterminal A and also add the new production A  a.

5. Replace each production B  C1…Cn with n > 2 with B  C1D where D  C2 …Cn.

Repeat this step until all right sides have length two.

Example. Construct a Chomsky normal form for the grammar

S  aSb | D

D  Dc | L.

Solution.

Step 1: Add the production S´  S.

Step 2: S´  S | L

S  aSb | ab | D

D  Dc | c.

Step 3: S´  aSb | ab | Dc | c | L

S  aSb | ab | Dc | c

D  Dc | c.

Step 4: S´  ASB | AB | DC | c | L

S  ASB | AB | DC | c

D  DC | c

A  a

B  b

C  c.

Step 5: Replace S´ ASB and S  ASB by

S´ AE, S´ AE, and E  SB.

Quiz. Construct a Chomsky normal form for the grammar

S  aTbb | U | L

T  cT | c

U  Ud | d.

Solution. Step 1: No change. Step 2: No change.

Step 3: Remove unit production S  U to obtain

S  aTbb | Ud | d | L

T  cT | c

U  Ud | d.

Step 4: Transform right sides of length at least two into strings of nonterminals.

S  ATBB | UD | d | L

T  CT | c

U  UD | d.

A  a

B  b

C  c

D  d.

Step 5: Replace S  ATBB with the productions

S  AE, E  TF, F  BB.

Greibach Normal Form. Productions have one of the following forms

A  b (b a terminal)

A  bD1…Dk

S  L (if L is in the language).

Advantage: Any string of length n > 0 can be derived in n steps.

Algorithm (idea). Transform context-free grammar to Greibach normal form.

Perform steps 1, 2, and 3 of the Chomsky algorithm.
Remove all left-recursion, including indirect, without adding L

3. Make substitutions to transform the grammar into the proper form.

Example. Put the following grammar into Greibach normal form.

S  AB | Ac | d

A  aA | a

B Ab | c.

Solution: Steps 1 (Chomsky steps 1, 2, and 3) and 2 are non needed.

Step 3: Replace A in S  AB | Ac | d with aA | a to obtain

S  aAB | aB | aAc | ac | d.

Replace A in B Ab | c with theright side of A  aA | a to obtain B aAb | ab | c.

Now add the new productions C  c and D  b to obtain the proper form:

S  aAB | aB | aAC | aC | d

A  aA | a

B aAD | aD | c

C  c

D  b.

Example. Put the following grammar into Greibach normal form.

S  AB | Ac | d

A  aA | a

B Ab | c.

Solution: Steps 1 (Chomsky steps 1, 2, and 3) and 2 are not needed.

Step 3: Replace A in S  AB | Ac | d with aA | a to obtain

S  aAB | aB | aAc | ac | d.

Replace A in B Ab | c with theright side of A  aA | a to obtain

B aAb | ab | c.

Now add the new productions C  c and D  b and make appropriate replacements

to obtain the proper form:

S  aAB | aB | aAC | aC | d

A  aA | a

B aAD | aD | c

C  c

D  b.

Properties of Context-Free Languages

When we know some properties of context-free languages they can help us argue, BWOC,

that certain languages are not context-free.

The Pumping Lemma

If L is an infinite context-free language, then any grammar for L must be recursive, so there

must be derivations of the the following form where u, v, w, x, and y are terminal strings.

S + uNy

N + vNx (where v and x are not both L)

N + w.

These derivations lead to derivations like

S + uNy + uvNxy + uv2Nx2y + uvkNxky + uvkwxky  L for all k  N.

This is the basis for the Pumping Lemma:

There is an integer m > 0 such that if z  L and | z | ≥ m, then z has the form z = uvwxy where

1 ≤ | vx | ≤ | vwx | ≤ m and uvkwxky  L for all k  N.

Note: The number m depends on the grammar as we’ll see in the following example.

For this grammar m = 4 can be used in the pumping lemma because any derivation of a

string z with | z | ≥ 4 must use the nonterminal N. For example, if | z | = 8 and z = abcccccd,

then the pumping lemma factors z = abcccccd = uvwxy where 1 ≤ | vx | ≤ | vwx | ≤ 4 and

uvkwxky  L for all k  N. In this case let u = a, v = L, w = b, x = c, and y = ccccd.

Example. The language L = {anbncn+k | k, n  N} is not context-free.

Proof: Assume, BWOC, that L is context-free. L is infinite, so pumping lemma applies.

Choose z = ambmcm where m is the positive integer from the lemma.

Then z = ambmcm = uvwxy where 1 ≤ | vx | ≤ | vwx | ≤ m and uvkwxky  L for all k  N.

Observe neither v nor x can contain distinct letters. For example, if v = …a…b…, then

v2 = …a…b……a…b…, which can’t appear as a substring of any string in L.

So v and x must be strings of repeated occurrences of a single letter.

Now since | vwx | ≤ m, there are two possible places in ambmcm where v and x must occur:

(1) v and x occur in ambm.

(2) v and x occur in bmcm.

But we obtain the following contradictions because v and x are not both L.

(1) Let k = 2 to obtain uv2wx2y = am+ibm+jcm, where i > 0 or j > 0. So uv2wx2y  L

(2) Let k = 0 to obtain uwy = ambm-icm–j, where i > 0 or j > 0. So we have uwy  L.

These contradictions imply that L is not context-free. QED.

Example. Suppose we have the following

grammar for {L, bbc}  {abcnd | n  N}.

S  aNd | bbc | L

N Nc | b.

Here are a few derivations:

S  aNd  abd

S  aNd  aNcd  abcd

S  aNd  aNcd  aNccd  abccd

S + abckd for any k in N.

Example/Quiz. Prove that the language L = {ss | s  {a, b}*} is not context-free.

Proof: Assume, BWOC, that L is context-free. L is infinite, so pumping lemma applies.

Choose z = ambmambm where m is the positive integer from the lemma.

Then z = ambmambm = uvwxy where 1 ≤ | vx | ≤ | vwx | ≤ m and uvkwxky  L for all k  N.

Now since | vwx | ≤ m, there are three possible places in ambmambm where v and x must occur:

(1) v and x occur in ambm (on the left of z).

(2) v and x occur in bmam (in the center of z).

(3) v and x occur in ambm (on the right of z).

Notice that v and x can consist only of repetitions a single letter. For example, in case (1)

suppose v = aibj for some i > 0 and j > 0 and x = bn for some n ≥ 0. Then, letting k = 0, we

would obtain uwy = am–ibm–j–nambm, which cannot be in L. The argument is similar for the

other cases. So v and x must consist only of repetitions of a single letter.

We need to find a contradiction in each of the three cases. We’ll do it by using k = 0.

This tells us that uwy  L. But we obtain the following contradictions because v and x are not

both L.

(1) uwy = am–ibm–jambm where either i > 0 or j > 0 So uwy  L,

(2) uwy = ambm–iam–jbm where either i > 0 or j > 0. So uwy  L.

(3) uwy = ambmam–ibm–j where either i > 0 or j > 0. So uwy  L.

Therefore L is not context-free. QED.

Remark: Be careful that the choice of z is not in a context-free sublanguage of L. For example, if we chose z = (ab)m(ab)m in the preceding example, we would not get any contradictions.