CFG Reduction assignment

profilebeststudent143
CFGReducePart21.docx

COSC5315 Program-5 Fall 2015

CFG Reduction (Part-2)

Background:

There are two main reasons to reduce CF grammars:

· While empty-production rules and unit-productions are useful, they may become obstacles in some applications such as top-down compiler parser construction and conversions to certain normal forms (Chomsky Normal form and Greibach Normal form in particular). So, in such cases, their elimination is necessary.

· A grammar symbol is useless (and hence its very existence should be denied and deleted) if it is not Derivable (or Reachable) or it is not Live.

It is Reachable if it can be derived by the Start symbol of the grammar. By definition, the Start symbol of a grammar is Reachable but others may or may not be. A grammar symbol is Live if it can derive a terminal string (a string consisting only of zero or more terminal symbols not including any non-terminal symbol). By definition, a terminal symbol is Live as we believe that each terminal can derive itself (which is a terminal string of one terminal) in zero step of derivation.

So, each useless symbol should be eliminated from the grammar. Good news is that the elimination of all useless symbol is straightforward.

All that needs to be done is delete a production rule if it contains a useless symbol on either side of the rule and no compensation is needed, which makes the elimination process very simple.

· While their eliminations are simple and easy the identifications of useless symbols are not quite that simple partly because the elimination of one useless symbol may make what appears to be a useful symbol useless all of a sudden. Therefore, for the sake of efficiency, it is desirable to find and eliminate all dead (i.e., those that are not Live) symbols first before attempting to find symbols that are not Reachable.

Task: (Second part of CFG Reduction)

This program constitutes the continuation part of CF grammar reduction process. You must assume that all useless symbols have been identified and any rule containing those useless symbols on either side of a rule have already deleted.

Your program will output the following:

a. The input grammar rules.

b. All nullable nonterminals

c. All unit-production rules, both explicit and implicit.

Note that the input grammar has the following two explicit unit rules,

among others:

Body --> S

S --> SS

They are explicit in the sense that they actually appear in the grammar and

Visible as there are others not exactly visible nor explicit.

Because of these two unit rules, we must assume that we also have the following implicit (hidden) unit rule: Body --> SS

While the first two are explicit this last one is called implicit (as it is implied by the first two)

Input grammar:

Use the following input grammar to test run your program:

(Note that every Nonterminal must appear at least once on the LHS of a production rule. Hence if a symbol does not appear even once as the LHS

of a production rule, it must be a terminal symbol.)

Program --> Block

Block --> begin Body end

Body --> S ; Body

Body --> S

S --> Dec SS

S --> SS

Dec --> int IdList

Dec --> real IdList

Dec --> Empty-string

SS --> Block

SS --> Id = Exp

SS --> Print ExpList SR

SS --> Print IdList

SS --> Empty-string

SS --> If Exp then SS

SS --> Read IdList

SS --> Some More

IdList --> Id , IdList

IdList --> Id

Id --> aa

SR --> SS : SR

Some --> More Exp

ExpList --> Exp

ExpList --> Exp , ExpList

Exp --> Exp + Term

Exp --> Term

Term --> Term * Factor

Term --> Factor

Factor --> Primary ** Factor

Factor --> Primary

Primary --> id

Primary --> Empty-string