Context Free Grammars (CFG) - Theory of Computation - Computer Science

Context Free Grammars and Languages

According to the Chomsky hierarchy, Type-2 languages are called context-free languages (CFLs). These languages are generated by context-free grammars (CFGs) and are recognised by push-down automata (PDA). Every regular language is a subset of the class of context-free languages.

Context Free Grammars and Languages
Context Free Grammars and Languages

Part I. Context-Free Grammars (CFG)

A context-free grammar G is a 4-tuple G = (V, Σ, P, S) where:

  • V is a finite set of non-terminal symbols (variables).
  • Σ is a finite set of terminal symbols, with V ∩ Σ = ∅.
  • P is a finite set of productions (or rules).
  • S ∈ V is the start symbol.

A grammar is context-free if every production in P has the form A → α where A is a single non-terminal and α is a string from (V ∪ Σ)* (possibly empty). That is, the left-hand side of every production is exactly one non-terminal.

Every regular grammar is a context-free grammar as a special case.

Example 1

An example of a CFG is:

S → ASA | BSB | a | b

A → a

B → b

Here the components can be identified as:

  • V = {S, A, B}
  • Σ = {a, b}
  • P = {S → ASA, S → BSB, S → a, S → b, A → a, B → b}
  • S is the start symbol.

Example 2 (Programming-language style statements)

Consider grammar fragments for statements similar to a C-like language:

Stmt → id = Expr ;

Stmt → if ( Expr ) Stmt

Stmt → if ( Expr ) Stmt else Stmt

Stmt → while ( Expr ) Stmt

Stmt → do Stmt while ( Expr ) ;

Stmt → { Stmts }

Stmts → Stmts Stmt | ε

Here the non-terminals could be {Stmts, Stmt, Expr} and terminals include symbols such as id, =, ;, (, ), if, else, while, do. The start symbol is Stmts. This CFG captures typical nested and recursive statement forms in imperative languages.

Language of a CFG (Context-Free Language)

Given a CFG G, the language generated by G, denoted L(G), is the set of all strings over the terminals Σ that can be derived from the start symbol S using the productions in P.

Example

Consider the CFG:

S → aA | bB

A → x

B → y

The language corresponding to this CFG is {ax, by}.

Example

There are two standard methods to check whether a given string belongs to L(G): derivations and reductions. Both are commonly illustrated using derivation trees (parse trees).

Derivations

A derivation shows step-by-step how the start symbol can be rewritten (using productions) to obtain a target string of terminals. One may consider a leftmost derivation, a rightmost derivation, or any derivation sequence. A derivation tree (parse tree) gives a hierarchical view of the derivation.

Example 1

Grammar:

S → aSb | ab

Check whether the string aabb belongs to L(G).

Example 1

Using derivations we can show:

S ⇒ aSb ⇒ aaSbb ⇒ aabb

Thus aabb is in the language.

Example 2

Grammar:

S → aSa | bSb | c

Check whether the string abbcbba belongs to L(G).

Example 2

The derivation tree demonstrates that abbcbba is derivable from S and hence belongs to the language.

Example 3

Grammar:

S → aAS | a | SS

A → SbA | ba

Check whether the string aabaa can be derived.

Example 3

The parse tree shows a valid derivation and therefore aabaa ∈ L(G).

Example 4

Grammar:

S → aAS | a

A → SbA | SS | ba

Check whether the string aabbaa (that is a2b2a2) belongs to the language.

Example 4

The derivation tree shows that the string is derivable, so it belongs to the CFL.

Reductions (Inverse Derivations)

Reductions proceed from a terminal string backwards by repeatedly replacing substrings that match right-hand sides of productions with their corresponding left-hand non-terminal, building an inverted parse tree until the start symbol is reached (if possible).

Example

Grammar:

S → aAS | a | SS

A → SbA | ba

Check whether aabaa ∈ L(G) using reductions.

Example

By successive reductions we reach the start symbol S; thus aabaa is in the language.

Designing CFGs for Given Languages

The following examples show standard constructions and patterns when designing CFGs for languages described by algebraic conditions, regular expressions, or structural requirements. Each example preserves the original language specification and the grammars given.

Example 1

Design a CFG for L = { an bn | n ≥ 0 }.

At n = 0 the empty string ε belongs to L, so include S → ε.

To generate matching numbers of a's and b's with all a's before b's use S → a S b.

Hence the grammar is:

S → a S b | ε

Example 2

Design a CFG for L = { an b2n | n ≥ 0 }.

Include S → ε for n = 0.

To ensure two b's for each a use S → a S b b.

Thus:

S → a S b b | ε

Example 3

Design a CFG over {0,1} that generates all strings of even length.

Include S → ε for length 0.

Every additional two symbols can be one of 00, 01, 10, 11 so use productions that prepend such pairs:

S → 00 S | 01 S | 10 S | 11 S | ε

Example 4

Design a CFG over {0,1} to generate strings that alternate 0 and 1.

Shortest alternating strings of length ≥ 2 are 01 and 10. To capture longer alternating strings that may end with one symbol, the grammar:

S → 01 | 10 | 01 A | 10 B

A → 01 A | 0 | 01

B → 10 B | 1 | 10

Example 5

Write a CFG for the regular expression a* b (a | b)*.

Analysis: strings have any number of a's, then a single b, then any string of a's and b's.

One convenient CFG:

S → A b B

A → a A | ε

B → a B | b B | ε

Derivation example from the grammar shows aabab can be produced.

Example 6

Write a CFG that generates strings with equal numbers of a's and b's (order unrestricted).

A simple CFG that pairs a's and b's in interleaved fashion is:

S → a S b S | b S a S | ε

This grammar produces strings where the total number of a's equals the total number of b's, though the positions may intermix.

Example 7

Design a CFG for the regular language (a | b)*.

Grammar:

S → a S | b S | ε

Example 8

Design a CFG for L(G) = { w wR | w ∈ {0,1}* }, the set of even-length palindromes that are mirror images about the centre.

Grammar:

S → 0 S 0 | 1 S 1 | ε

For example:

S ⇒ 0 S 0 ⇒ 0 0 S 0 0 ⇒ 0 0 1 S 1 0 0 ⇒ 0 0 1 1 0 0

Example 9

Design a CFG for L = { an bm | n ≠ m }.

Split into two cases, n > m and n < m, and take the union.

Case 1 (n > m):

S1 → A S1

S1 → a S1 b | ε

A → a A | a

Case 2 (n < m):

S2 → S2 B

S2 → a S2 b | ε

B → b B | b

Combine:

S → S1 | S2

Here S is the start symbol and each branch generates strings where either a's outnumber b's or b's outnumber a's.

Example 10

Design a CFG for L = { 0n 1n | n ≥ 0 } ∪ { 1n 0n | n ≥ 0 }.

Construct separate grammars and take their union:

S1 → 0 S1 1 | ε

S2 → 1 S2 0 | ε

S → S1 | S2

Example 11

Design a CFG for L = { an b2n | n ≥ 0 } (repeat of earlier example).

S → a S b b | ε

Example 12

Write a CFG for L = { a2n bm | n > 0, m ≥ 0 }, strings that start with an even positive number of a's followed by any number of b's.

Grammar:

S → a a A B

A → a a A | ε

B → b B | ε

Example 13

Write a CFG for L = { an b2n cm | n, m ≥ 0 }.

This language requires the a's to appear only at the start (if at all), followed by b's in twice the number of a's, and then any number of c's. One grammar is:

S → A B

A → a A b b | ε

B → c B | ε

Example 14

Design a CFG for L = { an bm c2m | n, m ≥ 0 }.

The CFG:

S → A B

A → a A | ε

B → b B c c | ε

Example 15

Design a CFG for L = { w c wR | w ∈ {a,b}* } (strings that have a central c and a mirrored prefix/suffix).

The CFG is:

S → a S a | b S b | c

Additional Concepts (Summary and Important Properties)

The following standard concepts and properties are essential to a full understanding of CFGs and CFLs:

  • Parse trees / derivation trees: hierarchical representations of derivations that show parent→children expansions for each production used.
  • Leftmost and rightmost derivations: sequences where at each step the leftmost (or rightmost) non-terminal is replaced; useful for parser design.
  • Ambiguity: a grammar is ambiguous if some string has two distinct parse trees (or two distinct leftmost/rightmost derivations). Ambiguity is an important practical concern in language design; some languages are inherently ambiguous.
  • Normal forms: Chomsky Normal Form (CNF) and Greibach Normal Form (GNF) are canonical forms for CFGs useful in proofs and algorithms (for example, CNF is used by the CYK parsing algorithm).
  • Parsing algorithms: common algorithms include top-down (recursive descent, LL parsers) and bottom-up (LR parsers). The CYK algorithm decides membership in O(n^3) time for grammars in CNF.
  • Closure properties: CFLs are closed under union, concatenation and Kleene star, but not closed under intersection or complement in general. Intersection of a CFL with a regular language is a CFL.
  • Pumping lemma for CFLs: a technical tool used to prove that certain languages are not context-free.
  • Equivalence with PDAs: Every CFL can be recognised by some nondeterministic push-down automaton (PDA), and conversely every language accepted by a PDA is context-free.

Summary

Context-free grammars provide a concise and powerful formalism to describe hierarchical and nested structures common in programming languages and formal languages. CFGs generate context-free languages, which are exactly the languages recognised by push-down automata. Understanding derivations, parse trees, grammar construction techniques, ambiguity, and normal forms is fundamental for both theoretical study and practical applications such as compiler design and syntactic analysis.

The document Context Free Grammars (CFG) is a part of the Computer Science Engineering (CSE) Course Theory of Computation.
All you need of Computer Science Engineering (CSE) at this link: Computer Science Engineering (CSE)

FAQs on Context Free Grammars (CFG)

1. What is a context-free grammar (CFG)?
Ans. A context-free grammar (CFG) is a formalism used to describe the syntax of a language. It consists of a set of production rules that define how symbols can be combined to form valid strings in the language.
2. What is the role of derivations in context-free grammars?
Ans. Derivations in context-free grammars describe the step-by-step process of generating a valid string in the language. Each derivation applies one production rule at a time, replacing a non-terminal symbol with a sequence of symbols according to the production rule.
3. How are reductions used in the context of context-free grammars?
Ans. Reductions are used in the context of context-free grammars to simplify a derived string by replacing a substring with a non-terminal symbol. This is done to ensure that the derived string adheres to the production rules of the grammar.
4. Can you provide an example of a context-free grammar?
Ans. Sure! Here's an example of a context-free grammar: S -> aSb S -> ε This grammar generates strings consisting of any number of 'a's followed by the same number of 'b's.
5. What is the relationship between context-free grammars and computer science engineering (CSE)?
Ans. Context-free grammars play a fundamental role in computer science engineering (CSE) as they are used to define the syntax of programming languages. Understanding context-free grammars is essential for designing compilers and interpreters, which are crucial components of CSE systems.
Explore Courses for Computer Science Engineering (CSE) exam
Get EduRev Notes directly in your Google search
Related Searches
past year papers, Context Free Grammars (CFG), Semester Notes, Context Free Grammars (CFG), shortcuts and tricks, practice quizzes, Context Free Grammars (CFG), Summary, MCQs, Objective type Questions, Viva Questions, Important questions, pdf , Sample Paper, Previous Year Questions with Solutions, Free, Extra Questions, video lectures, Exam, mock tests for examination, ppt, study material;