Simplification of Context Free Grammars | Theory of Computation - Computer Science Engineering (CSE) PDF Download

Part II. Simplification of Context Free Grammars

3 Simplification of CFGs

A context-free grammar (CFG) can often be transformed into a simpler but equivalent grammar that generates the same language. The usual simplification steps are:

  • Eliminating useless symbols (non-generating and non-reachable symbols).
  • Eliminating ε-productions (productions that produce the empty string).
  • Eliminating unit productions (productions of the form A → B where both A and B are non-terminals).

3.1 Eliminating Useless Symbols

A symbol (terminal or non-terminal) of a grammar is called useful if it occurs in the derivation of some string of terminals from the start symbol. A non-terminal is useful precisely when it is both generating and reachable.

Generating. A non-terminal Y is called generating if it can derive a string consisting only of terminals. Formally, Y is generating if there exists a terminal string \\(w \in T^*\\) such that \\(Y \Rightarrow^* w\\).

3.1 Eliminating Useless Symbols

Reachable. A non-terminal Y is called reachable if there is a derivation from the start symbol S that produces a sentential form containing Y. Formally, S \\(\\Rightarrow^*\\) αYβ for some strings \\(α,β\\).

3.1 Eliminating Useless Symbols

If a non-terminal is not generating or not reachable, it is useless and all productions containing it may be removed. The usual procedure is two-phase:

  1. Find all generating non-terminals by repeatedly marking any non-terminal that has a production whose RHS consists only of terminals and already marked non-terminals.
  2. Find all reachable symbols by starting at the start symbol and marking every non-terminal that appears in the RHS of a production whose LHS has been marked reachable; continue until no new symbols are found.
  3. Remove all productions that involve a non-terminal that is not both generating and reachable; also remove any terminals that appear only with removed productions.

Worked examples

Example 1

Grammar:

S → AB | a
A → b

Start symbol: S

Analysis and simplification:

  • Generating: A derives the terminal string b, and S derives the terminal string a. Symbol B has no production that leads to terminals, so B is non-generating.
  • Remove any production containing the non-generating symbol B: remove S → AB.
  • After removal the grammar is S → a, A → b. Now A is not reachable from S (no derivation from S uses A), so remove A → b.
  • Final reduced grammar: S → a. No useless symbols remain.

Example 2

Grammar:

S → aB | bX
A → BAd | bSX | a
B → aSB | bBX
X → SBD | aBx | ad

Start symbol: S

Analysis and simplification:

  • Step 1 - generating symbols: scan productions to find non-terminals that can derive terminal strings.
    • A has production A → a, so A is generating.
    • X has production X → ad, so X is generating.
    • Since S → bX and X is generating, S is generating as well.
    • B appears only in productions with non-generating patterns (its productions include B → aSB and B → bBX), and no combination gives a terminal string; thus B is non-generating.
  • Remove productions containing non-generating B. Grammar becomes:
    • S → bX
    • A → bSX | a
    • X → ad
  • Step 2 - reachability: start at S. X is reachable via S → bX. A is not reachable from S (no RHS from reachable non-terminals contains A), so remove A productions.
  • Final reduced grammar:
    • S → bX
    • X → ad

Example 3

Grammar:

A → xyz | Xyzz
X → Xz | xY z
Y → yY y | Xz
Z → Zy | z

Start symbol: A

Analysis and simplification:

  • A has production A → xyz and Z → z, so A and Z are generating.
  • X and Y cannot derive a string of terminals because their productions always keep non-terminals that do not terminate; so X and Y are non-generating.
  • Remove productions of X and Y, giving:
    • A → xyz
    • Z → Zy | z
  • Now check reachability from A; Z does not appear in any RHS reachable from A, so Z is not reachable. Remove Z productions.
  • Final reduced grammar: A → xyz.

Example 4

Grammar:

S → aC | SB
A → bSCa
B → aSB | bBC
C → aBC | ad

Start symbol: S

Analysis and simplification:

  • C → ad shows C is generating.
  • S → aC with generating C makes S generating.
  • A → bSCa contains S and C (both useful), so A is generating.
  • B appears on the RHS of its own productions and there is no termination without itself; so B is non-generating. Remove productions that contain B.
  • After removal the grammar becomes:
    • S → aC
    • A → bSCa
    • C → ad
  • Check reachability from S: A is not reachable. Remove A.
  • Final reduced grammar:
    • S → aC
    • C → ad

3.2 Removal of Unit Productions

Unit production is a production of the form A → B where A and B are non-terminals. The aim is to eliminate chains of unit productions while preserving the language (except possibly the empty string if special care is required).

Standard procedure to remove unit productions:

  1. Compute the unit-closure relation: for each non-terminal A, find all non-terminals B such that A ⇒* B using only unit productions.
  2. For every pair (A, B) in this relation and for every non-unit production B → α (where α is not a single non-terminal), add the production A → α to the grammar.
  3. After adding those productions, remove all original unit productions.

Worked examples

Example 1

Grammar:

S → AB
A → a
B → C | b
C → D
D → E
E → a

Start symbol: S

Analysis and simplification:

  • Unit chain found: B → C → D → E.
  • Since E → a, propagate terminal production back along the chain: add D → a, then C → a, then B → a.
  • Now remove unit productions B → C, C → D, D → E.
  • After removing unit productions, remove unreachable symbols (here C, D, E are not reachable from S), leaving:
    • S → AB
    • A → a
    • B → a | b

Example 2

Grammar (expression grammar fragment):

S → a | b | Sa | Sb | S0 | S1
F → S | (E)
T → F | T * F
E → T | E + T

Start symbol: E

Analysis and simplification:

  • Unit productions: F → S, T → F, E → T.
  • Remove F → S by adding all non-unit productions of S to F, giving F → a | b | Sa | Sb | S0 | S1 | (E).
  • Remove T → F by adding non-unit productions of F to T.
  • Remove E → T by adding non-unit productions of T to E.
  • The resulting grammar contains no unit productions; the original structure of operator-precedence productions such as T * F and E + T is preserved.

Example 3

Grammar:

S → A | bb
A → B | b
B → S | a

Start symbol: S

Analysis and simplification:

  • Unit productions: S → A, A → B, B → S. These form a unit cycle between S, A and B.
  • Compute reachable terminals via unit chains:
    • S → A → B → a gives S → a.
    • S → A → B → S → bb gives S → bb (already present).
    • Similarly propagate to obtain:
      • A → a | bb | b
      • B → a | bb | b
  • Remove unit productions. Final grammar:
    • S → a | b | bb
    • A → a | b | bb
    • B → a | b | bb

3.3 Removal of ε-Productions

A production of the form A → ε (where ε denotes the empty string) is an ε-production. To eliminate ε-productions while preserving the language (except possibly the empty string), follow this standard method:

  1. Find the set of nullable non-terminals: a non-terminal A is nullable if A ⇒* ε.
  2. For every production B → α, where α is a sequence of terminals and non-terminals, generate new productions by taking all possibilities of deleting some (but not all) nullable non-terminals from α. Add these new productions (excluding the case that yields an empty RHS unless the start symbol must generate ε).
  3. Remove all explicit ε-productions (except possibly S → ε if the language originally contained ε and one wishes to retain it; special care is needed for the start symbol).

The algorithm is usually implemented as: compute nullable set, then for each production iterate over each occurrence of nullable non-terminals and add new productions with those occurrences omitted (for all combinations). Finally, delete the original ε-productions.

Worked examples

Example 1

Grammar:

S → aA
A → b | ε

Start symbol: S

Analysis and simplification:

  • Nullable set: A is nullable because A → ε.
  • For production S → aA, generate an additional production by deleting nullable A: add S → a.
  • Remove the ε-production A → ε. Final grammar:
    • S → aA | a
    • A → b

Example 2

Grammar:

S → ABAC
A → aA | ε
B → bB | ε
C → c

Start symbol: S

Analysis and simplification:

  • Nullable set: A and B are nullable.
  • For S → ABAC consider all combinations of deleting occurrences of nullable A and B (but preserve at least one symbol on RHS unless original grammar allowed S → ε and it must be retained). The generated productions include:
    • S → ABAC (original)
    • S → ABC (delete the first A)
    • S → BAC (delete the second A)
    • S → BC (delete both A occurrences)
    • S → AAC, S → AC, S → C, etc., according to combinations that arise when handling multiple nullable occurrences; include all that make sense and are not duplicates.
  • Also replace A → aA by adding A → a because A is nullable.
  • Similarly for B → bB add B → b.
  • Remove explicit ε-productions A → ε and B → ε.
  • One consistent resulting grammar is:
    • S → ABAC | ABC | BAC | BC | AAC | AC | C
    • A → aA | a
    • B → bB | b
    • C → c
    which contains no ε-productions.

Example 3

Grammar:

S → aSa | bSb | ε

Start symbol: S

Analysis and simplification:

  • S is nullable because S → ε.
  • For each recursive production, remove nullable occurrences of S within the RHS to produce shorter alternatives:
    • From S → aSa we obtain S → aa by deleting the inner S.
    • From S → bSb we obtain S → bb.
  • Remove the ε-production if we wish to produce a grammar without ε-productions; include the shorter alternatives instead. Final grammar:
    • S → aSa | bSb | aa | bb
  • Note: if the original language included ε and it must be preserved, keep a distinguished start symbol S' → S | ε trick to preserve ε while removing other ε-productions.

Example 4

Grammar:

S → a | Xb | aYa
X → Y | ε
Y → b | X

Start symbol: S

Analysis and simplification:

  • Nullable set: X is nullable because X → ε. Initially Y is not nullable (it produces b or X), but after substitution it may get new forms.
  • Replace occurrences of X by ε where appropriate:
    • From S → Xb we obtain S → b (deleting X), and keep S → Xb as well if X remains a non-terminal option before final removal.
    • Propagate transformations through the grammar and compute nullable again if needed.
  • Eventually eliminate explicit ε productions and tidy up repeated or redundant productions. One suitable result after full elimination is:
    • S → a | Xb | aYa | b | aa
    • X → Y
    • Y → b | X
  • After further removal of unit productions and unreachable symbols the grammar can be simplified even more (follow the unit-removal and useless symbol algorithms to finish).

Practical notes and common pitfalls

  • When eliminating ε-productions special care is needed if the start symbol can derive ε; typical solution is to introduce a fresh start symbol S' with S' → S | ε and then eliminate other ε-productions.
  • Order of simplification matters for convenience but not correctness: a common workflow is (1) remove useless symbols, (2) remove ε-productions, (3) remove unit productions, and finally again remove any newly created useless symbols.
  • Always check for reachability after each major transformation because removal of productions can make previously reachable non-terminals unreachable.
  • When adding productions during ε or unit elimination avoid creating duplicate productions; duplicates do not change the language but clutter the grammar.

Exercises

Simplify the following context-free grammars (remove useless symbols, ε-productions and unit productions as appropriate; show intermediate steps):

  1. S → AB
    A → a
    B → b
    B → C
    E → c | ε
    Start symbol: S

  2. S → aAa
    A → Sb | bCC | DaA
    C → abb | DD
    E → aC
    D → aDA
    Start symbol: S

  3. S → bS | bA | ε
    A → ε
    Start symbol: S

  4. S → bS | AB
    A → ε
    B → ε
    D → a
    Start symbol: S

  5. S → A
    A → B
    B → C
    C → a
    Start symbol: S

  6. S → AB
    A → a
    B → C | b
    C → D
    D → E
    E → a
    Start symbol: S

  7. S → ABCD
    A → a
    B → C | b
    C → D
    D → c
    Start symbol: S

  8. S → aS | AC | AB
    A → ε
    C → ε
    B → bB | bb
    D → c
    Start symbol: S

  9. S → ABC | A0A
    A → 0A | B0C | 000 | B
    B → 1B1 | 0 | D
    C → CA | AC
    D → ε
    Start symbol: S

  10. S → AAA | B
    A → 0A | B
    B → ε
    Start symbol: S

End of chapter: Simplification of Context Free Grammars.
The document Simplification of Context Free Grammars is a part of the Computer Science Engineering (CSE) Course Theory of Computation.
All you need of Computer Science Engineering (CSE) at this link: Computer Science Engineering (CSE)

FAQs on Simplification of Context Free Grammars

1. What is the difference between useless productions and useless symbols in CFG simplification?
Ans. Useless productions are rules that can never be reached from the start symbol, while useless symbols are variables or terminals that don't appear in any derivation of valid strings. Both must be eliminated during CFG simplification to reduce grammar complexity and improve parsing efficiency for theory of computation applications.
2. How do I identify and remove epsilon productions from a context-free grammar?
Ans. Epsilon productions (rules producing empty strings) are identified by finding nullable variables-those deriving epsilon directly or indirectly. Remove them by adding alternative productions without nullable symbols, then delete original epsilon rules. This simplification step ensures the grammar generates only non-empty strings while maintaining equivalent language recognition.
3. Why do we need to remove unit productions when simplifying context-free grammars?
Ans. Unit productions (A → B where both are variables) create redundant derivation chains that increase parsing steps unnecessarily. Removing them by substituting productions directly eliminates intermediate steps, making grammar analysis faster and cleaner. This simplification reduces grammar size without changing the language generated by the CFG.
4. What's the correct order for simplifying a context-free grammar-which step comes first?
Ans. Begin by removing epsilon productions, then unit productions, then useless symbols. This sequence prevents reintroducing eliminated rules during later steps. Following this CFG simplification order ensures each transformation builds correctly on previous ones, resulting in a clean, minimal grammar suitable for parsing and compilation tasks.
5. Can a simplified context-free grammar still generate the same language as the original?
Ans. Yes, simplified grammars generate identical languages to their originals-simplification only removes redundancy, not expressiveness. By eliminating epsilon, unit, and useless productions systematically, the grammar becomes more efficient while preserving all valid string derivations, crucial for maintaining language equivalence in theory of computation studies.
Explore Courses for Computer Science Engineering (CSE) exam
Get EduRev Notes directly in your Google search
Related Searches
Previous Year Questions with Solutions, Summary, shortcuts and tricks, Important questions, past year papers, video lectures, mock tests for examination, Viva Questions, Extra Questions, ppt, Objective type Questions, practice quizzes, Sample Paper, Exam, pdf , study material, Semester Notes, Free, Simplification of Context Free Grammars, Simplification of Context Free Grammars, Simplification of Context Free Grammars, MCQs;