Normal Forms for CFGs - Theory of Computation - Computer Science Engineering

Part III. Normal Forms for CFGs

Standard or normal forms are canonical representations for context-free grammars (CFGs). Any CFG can be converted into these forms. The principal advantages of representing a CFG in a normal form are:

  • the grammar becomes simpler and more regular, and
  • it becomes easier to implement algorithms (for example, parsing algorithms) and to reason about the grammar formally.

Two commonly used normal forms for CFGs are:

  • Chomsky Normal Form (CNF), and
  • Greibach Normal Form (GNF).

Chomsky Normal Form (CNF)

A context-free grammar G is in Chomsky Normal Form (CNF) if every production of G is one of the following forms:

  • A → a, where A is a non-terminal and a is a terminal,
  • A → BC, where A, B, C are non-terminals (B and C are not the start symbol together if this would create other forbidden forms),
  • S → ε is permitted only when S is the start symbol and ε is the empty string (no other production may produce ε unless required by the original language).

Thus, in CNF every right-hand side (RHS) is either a single terminal or exactly two non-terminals (except for the permitted start-symbol ε production).

Why CNF is useful

  • CNF restricts productions to a small set of canonical forms; this simplifies many formal proofs about CFGs.
  • CNF is used by standard parsing algorithms such as the CYK algorithm (Cocke-Younger-Kasami), which requires the grammar in binary (two non-terminal) form.
  • It provides a uniform binary branching structure that is convenient for analysis and implementation.

Conversion to CNF - standard procedure

Every CFG can be converted into an equivalent grammar in CNF (that is, a grammar that generates the same language, possibly except for the empty string which needs special care). The usual ordered steps are:

  1. Eliminate useless symbols (remove non-terminals that do not derive any terminal string, and symbols not reachable from the start symbol).
  2. Remove ε-productions (productions A → ε) except possibly S → ε if the original language contains ε. Replace occurrences of nullable variables appropriately.
  3. Eliminate unit productions (productions of the form A → B where A and B are non-terminals) by substituting B's productions into A.
  4. Ensure that in every production every terminal on the RHS of length ≥ 2 is replaced by a corresponding non-terminal that produces that terminal (introduce new non-terminals for terminals as needed).
  5. Break RHSs of length ≥ 3 into a chain of binary productions (introduce new non-terminals to make every RHS of length 2).

Worked examples converting to CNF

Example 1

Given CFG:

S → bA | aB A → bAA | aS | a B → aBB | bS | b

Conversion steps (summary):

  • Replace terminals appearing with non-terminals in RHS by new non-terminals. Introduce P → b and Q → a.
  • Rewrite productions that mix terminals and non-terminals using P and Q:

    S → P A | Q B P → b Q → a A → P A A | Q S | a B → Q B B | P S | b

  • Break RHSs with more than two non-terminals into binary forms. Introduce R → A A and T → B B, then replace:

    A → P R | Q S | a B → Q T | P S | b R → A A T → B B

  • Now all productions are either of the form non-terminal → terminal or non-terminal → non-terminal non-terminal. The grammar is in CNF (with S as start symbol).

Example 2

Given CFG:

S → 1A | 0B A → 1AA | 0S | 0 B → 0BB | 1

Conversion summary:

  • Introduce P → 1 and Q → 0 to replace terminals inside longer RHSs.
  • Rewrite S, A, B using P and Q:

    S → P A | Q B P → 1 Q → 0 A → P A A | Q S | 0 B → Q B B | 1

  • For AA and BB introduce R → A A and T → B B, then:

    A → P R | Q S | 0 B → Q T | 1 R → A A T → B B

  • Grammar is in CNF (S is start symbol).

Example 3

Given CFG:

S → a | b | C S S

Conversion summary:

  • The production S → C S S has three symbols on RHS. Introduce P → S S and use:
  • S → a | b | C P P → S S

  • This grammar is now in CNF (terminals appear lone; other productions have exactly two non-terminals on RHS).

Example 4 (corrected and explained)

Given CFG:

S → abSb | a | aAb A → bS | aAAb

Conversion steps (carefully):

  • Introduce non-terminals for terminals: V → a and W → b.
  • Replace terminals in longer RHSs using V and W:

    S → V W S W | a | V A W A → W S | V A A W V → a W → b

    (Keep S → a unchanged because CNF allows A → a.)
  • Break long RHSs into binary productions by introducing new non-terminals:

    U1 → V W U2 → U1 S U3 → V A U5 → A A U6 → V U5

    Now rewrite S and A:

    S → U2 W | a | U3 W A → W S | U6 W

  • List of all productions in CNF form:

    S → U2 W | a | U3 W A → W S | U6 W U1 → V W U2 → U1 S U3 → V A U5 → A A U6 → V U5 V → a W → b

  • All productions are now of the form non-terminal → terminal or non-terminal → non-terminal non-terminal; the grammar is in CNF.

Exercises - Convert these grammars to CNF

  1. S → bA | aB A → bAA | aS | a B → aBB | bS | b

    where S is the start symbol.
  2. S → a A D A → a B | b A B B → b D → d

    where S is the start symbol.
  3. S → a A b B A → a A | a B → b B | b

    where S is the start symbol.
  4. S → a A b B A → A b | b B → B a | a

    where S is the start symbol.
  5. S → a A | b B A → b A A | a B → B B a | b

    where S is the start symbol.
  6. S → a b S b | a b

    where S is the start symbol.

Greibach Normal Form (GNF)

A context-free grammar G is in Greibach Normal Form (GNF) if every production has the form:

  • A → a X, where a is a single terminal and X is a (possibly empty) sequence of non-terminals,
  • and optionally S → ε if S is the start symbol and the language includes the empty string.

In other words, in GNF each production must begin with exactly one terminal, optionally followed by zero or more non-terminals. GNF is particularly useful for constructing top-down parsers and for proving properties that rely on a terminal leading each derivation step.

Conversion to GNF - outline of the method

Every CFG can be converted into an equivalent grammar in GNF. The standard procedure is as follows:

  1. Start with a grammar that has no useless symbols, no ε-productions (except possibly S → ε), and no unit productions. A convenient first step is to convert the grammar to CNF and then remove unit productions and useless symbols if any remain.
  2. Rename non-terminals as A1, A2, ..., An so that A1 is the start symbol. This provides a fixed ordering of non-terminals.
  3. For i = 1 to n, ensure all productions for Ai are of the form:
    • Ai → a α (where a is terminal and α is a (possibly empty) string of non-terminals), or
    • Ai → Aj β with j > i (a higher-indexed variable followed by non-terminals).
    This is achieved by repeatedly substituting productions for lower-indexed non-terminals that occur as the first symbol on the RHS.
  4. If left recursion appears (a production of the form A → A γ), eliminate it using standard left-recursion elimination by introducing new non-terminals and rewriting the productions so they start with a terminal.
  5. Apply substitution and left-recursion elimination repeatedly until every production begins with a terminal. At that point the grammar is in GNF.

Detailed remarks on substitution and left recursion

Substitution: when a production Ai → Aj γ has j < i (that is, it begins with a lower-indexed non-terminal), replace the occurrence of Aj by each RHS of Aj. Repeat until every production for Ai begins with a terminal or with a non-terminal with index greater than i.

Eliminating left recursion: if after substitution some productions for a non-terminal A have the form A → A α | β where β are productions that do not begin with A, then introduce a new non-terminal Z and rewrite as:

A → β Z Z → α Z | ε

Then ensure β (and subsequently Z's productions) start with terminals by further substitution as necessary.

Worked examples converting to GNF

Example 1

Given CFG:

S → A B | B C A → a B | b A | a B → b B | c C | b C → c

Procedure and result (summary):

  • Substitute productions for A and B where they appear as leftmost symbols in S to ensure S productions begin with terminals. After substitution:
  • S → a B B | b A B | a B | b B C | c C C | b C A → a B | b A | a B → b B | c C | b C → c

  • Now each production for S, A, B, C begins with a terminal; the grammar is in GNF.

Example 2

Given CFG:

S → aba S a | aba

Conversion summary:

  • Introduce non-terminals for terminals: A → a and B → b. Then rewrite:
  • S → A B A S A | A B A A → a B → b

  • Each production now begins with a terminal; the grammar is in GNF.

Example 3

Given CFG:

S → A B A → a A | b B | b B → b

Substitute A's productions into S so that S productions begin with terminals:

S → a A B | b B B | b B A → a A | b B | b B → b

All productions start with a terminal; the grammar is in GNF.

Example 4

Given CFG:

S → a b S b | a a

Introduce A → a and B → b, then rewrite:

S → a B S B | a A A → a B → b

After substitution the productions begin with terminals; the grammar is in GNF.

Example 5

Given CFG:

S → a S a | b S b | a a | b b

Introduce A → a and B → b and transform so that each production begins with a terminal. One possible GNF form is:

S → a S P | b S Q | a P | b Q P → a Q → b P → a Q → b

All productions start with terminals; the grammar is in GNF.

Example 6

Given CFG:

S → A A | a A → S S | b

Conversion outline:

  • Start: grammar is in CNF. Rename non-terminals A1 = S and A2 = A.
  • Apply substitution to remove productions that begin with a non-terminal of lower index, then eliminate left recursion where it appears, and continue substituting so that every production begins with a terminal. The examples in the worked input illustrate the long sequence of substitutions and left-recursion elimination needed to reach GNF.

Example 7

Given CFG:

S → A B A → B S | a B → S A | b

Conversion steps (sketch):

  • Rename variables in order A1, A2, A3 and substitute RHS productions so that each production for a lower-indexed non-terminal begins with a terminal or with a higher-indexed non-terminal.
  • Eliminate left recursion by introducing helper non-terminals and repeatedly applying substitution until each production begins with a terminal symbol.
  • The final grammar (after systematic substitution and left-recursion elimination) has all productions starting with terminals and hence is in GNF. The full substitution sequence is mechanical but may become lengthy; the worked example in the previous section shows the detailed substitutions.

Exercises - Convert these grammars to GNF

  1. A → a B D | b D B | c | A B | A D

    Convert this grammar to Greibach normal form (GNF).
  2. S → A B A → B S | b B → S A | a

    Convert this grammar to GNF.
  3. E → E + T | T T → T * F | F F → ( E ) | a

    Convert this grammar (an expression grammar) to GNF.
  4. S → Y Y | 0 Y → S S | 1

    Convert this grammar to GNF.
  5. S → X Y X → Y S | 1 Y → S X | 0

    Convert this grammar to GNF.
  6. S → X Y X → 0 X | 1 Y | 1 Y → 1

    Convert this grammar to GNF.
  7. S → 0 1 S 1 | 1 1

    Convert this grammar to GNF.

Summary

Both CNF and GNF are standard normal forms that make CFGs easier to analyse and process. CNF limits productions to single terminals or pairs of non-terminals and is essential for algorithms such as CYK. GNF requires that every production begin with a terminal and is helpful for constructing certain top-down parsing procedures and for formal proofs about derivation lengths. Conversion procedures exist for both normal forms: conversion to CNF follows a fixed sequence of simplifying steps, and conversion to GNF typically proceeds from an already simplified grammar (often CNF) and then uses substitution and left-recursion elimination to ensure every production begins with a single terminal.

The document Normal Forms for CFGs is a part of the Computer Science Engineering (CSE) Course Theory of Computation.
All you need of Computer Science Engineering (CSE) at this link: Computer Science Engineering (CSE)

FAQs on Normal Forms for CFGs

1. What's the difference between Chomsky Normal Form and Greibach Normal Form?
Ans. Chomsky Normal Form (CNF) restricts productions to A → BC (two non-terminals) or A → a (single terminal), while Greibach Normal Form (GNF) requires productions in the form A → aα, where a is a terminal followed by zero or more non-terminals. CNF is widely used for parsing algorithms; GNF eliminates left recursion and is essential for top-down parsing techniques in context-free grammar conversions.
2. How do I convert a CFG into Chomsky Normal Form step by step?
Ans. Converting to Chomsky Normal Form involves eliminating ε-productions, removing unit productions, and converting long productions into two-nonterminal chains. First, identify productions violating CNF rules. Replace terminals in multi-symbol productions with new non-terminals. Convert A → BCD into A → BX where X → CD. This standardisation ensures compatibility with CYK parsing algorithm and simplifies grammar analysis significantly.
3. Why do we need normal forms for context-free grammars in exams?
Ans. Normal forms simplify grammar analysis, parsing, and prove theoretical properties of context-free languages. Chomsky Normal Form enables efficient parsing using the CYK algorithm within O(n³) time complexity. Greibach Normal Form eliminates left recursion, preventing infinite loops in top-down parsing. These standardised structures are critical for GATE and university exams testing parsing techniques and computational complexity understanding.
4. Can every context-free grammar be converted to Chomsky Normal Form?
Ans. Almost every CFG can be converted to Chomsky Normal Form, except those generating the empty string (ε). If a grammar generates ε, add a new start symbol S' with production S' → S | ε, then convert the original grammar. This technique preserves the language while achieving CNF, making the transformation universal for practical context-free grammars used in compiler design and formal language studies.
5. What are ε-productions and why must they be removed for normal forms?
Ans. Epsilon-productions (A → ε) allow non-terminals to derive empty strings, violating Chomsky and Greibach Normal Form constraints. Removing them requires identifying nullable non-terminals and adding alternative productions to capture all derivations without ε. This elimination process ensures productions strictly follow normal form rules-A → BC or A → a-necessary for deterministic parsing algorithms and theoretical proofs in automata theory.
Explore Courses for Computer Science Engineering (CSE) exam
Get EduRev Notes directly in your Google search
Related Searches
mock tests for examination, Important questions, practice quizzes, Free, Sample Paper, Normal Forms for CFGs, MCQs, shortcuts and tricks, Normal Forms for CFGs, Normal Forms for CFGs, Extra Questions, Viva Questions, Objective type Questions, Semester Notes, pdf , Exam, Summary, ppt, study material, Previous Year Questions with Solutions, video lectures, past year papers;