Part V. Formal Representation of Languages
Here we introduce the concept of formal languages and grammars.We know that a grammar is specified for a natural language such as English. To define valid sentences and to give structural descriptions of sentences a grammar is used. Linguistics is a branch to study the theory of languages. Scientists in this branch of study have defined a formal grammar to describe English that would make language translation using computers very easy.
English Grammar
First we will consider how grammars can be specified for English language. Then we will extend this model to our programming language grammar specification.
An example for grammar is,
ARTICLE → a | an | the
NOUN_PHRASE → ARTICLE NOUN
NOUN → boy | apple | ε
Productions
The above are called productions or rules of the grammar. Two productions are given above.
First production is,
ARTICLE → a | an | the
It can also be written as,
ARTICLE → a
ARTICLE → an
ARTICLE → the
Second production is,
NOUN_PHRASE → ARTICLE NOUN
Third production is,
NOUN → boy | apple | ε
It can also be written as,
NOUN → boy
NOUN → apple
NOUN → ε
Non-terminals
In the above, ARTICLE, NOUN_PHRASE, NOUN are called non-terminal symbols. A non-terminal is specified using upper case letters. Note that the left hand side of a production always contains a single non-terminal.
Terminals
In the above, ’a’, ’an’, ’the’, ’boy’, ’apple’ are called terminal symbols. A terminal is written using lowercase letters. ε means an empty string.
Start Symbol
In a grammar specification, one non-terminal symbol is called the start symbol. Here, NOUN_PHRASE is the start symbol. Thus a grammar consists of a set of productions, terminals, non-terminals and start symbol. If the matching left hand side of a certain rule can be found to occur in a string, it may be replaced by the corresponding right hand side.
8.1 Formal Definition of a Grammar
A grammar is where
VN is a finite non empty set whose elements are called non-terminals.
∑ is a finite non empty set whose elements are called terminals.
VN ∩ ∑ = φ
S is a special non terminal called start symbol.
P is a finite set whose elements are α → β, whose α and β are terminals or non terminals. Elements of P are called productions or production rules.
For example,
Consider the productions,
S → aB|bA
A → aS|bAA|a
B → bS|aBB|b
where S is the start symbol.
For the above productions the grammar is defined as,
VN = {S, A, B}
∑ = {a, b}
Sis the start symbol S
P = {S → aB|bA, A → aS|bAA|a, B → bS|aBB|b}
Derivations
Productions are used to derive various sentences.
Example 1:
Consider the grammar given below:
SENTENCE → NOUN_PHRASE VERB_PHRASE
NOUN_PHRASE → ARTICLE NOUN
VERB_PHRASE → VERB NOUN_PHRASE
ARTICLE → a | an | the
NOUN → boy | apple
VERB → apple
where SENTENCE is the start symbol.
Using this grammar, the sentence
’the boy ate an apple’
is derived as shown below:
Example 2:
Consider the grammar given below:
S → baaS | baa
In the above, S is the start symbol, S is a non-terminal and a, b are terminals.
To derive a sentence, we begin with the start symbol, S.
From the above, we can rewrite, S as baa.
Thus the sentence, baa is a valid sentence according to the grammar.
From the above rule, rewrite S as baaS. Then rewrite S in baaS as baaS. Then we get baabaaS. Again rewrite S as
baa. We get baabaabaa.
Thus baabaabaa is a valid sentence.
Example 3:
Following is a grammar for expressions:
Expr → Expr Op num | num
Op → + | - | * | /
In the above grammar, Expr, Op are non-terminals.
num, +, -, *, / are terminals.
Using the above grammar, a large set of expressions can be derived.
Example 4:
The following is a grammer for if..else construct in C.
Stmt → if ( Expr ) Stmt else Stmt | if ( Expr ) Stmt
Example 5:
The following is a grammar for a set of statements in C.
Stmt → id = Expr;
| if (Expr) Stmt
| if (Expr) Stmt else Stmt
| while (Expr) stmt
| do Stmt while (Expr);
| { Stmts }
Stmts → Stmts Stmt
| ε
8.2 Formal Definition of a Language
The language generated by a grammar G is L(G) is defined as
L(G) = { w∈ ∑∗ | S∗ ⇒G w }
The elements of L(G) are called sentences.
In a simple way, L(G) is the set of all sentences derived from the start symbol S.
For example, for the grammar
S → baaS | baa
L(G) = {baa, baabaa, baabaabaa, . . . . . . . . . . }
Example 1:
Consider the grammar given below:
ARTICLE → a | an | the
NOUN_PHRASE → ARTICLE NOUN
NOUN → boy | apple
where NOUN_PHRASE is the start symbol.
Find the language L(G) generated by the grammar.
i)
NOUN_PHRASE → ARTICLE NOUN
⇒ a boy
ii)
NOUN_PHRASE → ARTICLE NOUN
⇒ an apple
iii)
NOUN_PHRASE → ARTICLE NOUN
⇒ the boy
iv)
NOUN_PHRASE → ARTICLE NOUN
⇒ the apple
v)
NOUN_PHRASE → ARTICLE NOUN
⇒ a apple
vi)
NOUN_PHRASE → ARTICLE NOUN
⇒ an boy
Hence L(G) = { a boy, an apple, the apple, the boy, a apple, an boy}
Note that ’a apple’ and ’an boy’ are in the set even if they are not valid English phrases.
Exercises:
1. Let grammar G = [ {S,C}, {a,b}, P, S ]
where P consists of
S → aCa
C → aCa|b. Find L(G).
2. Let G = [{S, A1, A2}, {a, b}, P, S], where P consists of
S → aA1A2a
A1 → baA1A2b
A2 →A1ab
aA1 → baa
bA2b → abab
Check whether w = baabbabaaabbaba is in L(G).
2. Let grammar G = [ {S}, {0,1}, {S →0S1, S → ∈}, S ]. Find L(G).
9 Chomsky Classification of Languages
[Kavitha2004]
A number of language families are there. Noam Chomsky, a founder of formal language theory, provided an initial classification into four language types, type 0 to type 3. A grammar represents the structure of a language as we saw earlier. Four types of languages and their associated grammars are defined in Chomsky Hierarchy (defined by Noam Chomsky).The languages are
Type- 0 languages or Unrestricted languages,
Type- 1 languages or Context sensitive languages,
Type- 2 languages or Context free languages,
Type- 3 languages or regular languages.
Each language family is a proper subset of another one. The following figure shows the relationship.
Type-0 languages are generated by Type-0 grammars, Type-1 languages by Type-1 grammars, Type-2 languages by
Type-2 grammars and Type-3 languages by Type-3 grammars.
Thus the corresponding grammars are
Type- 0 grammars or Unrestricted grammars,
Type- 1 grammars or Context sensitive grammars,
Type- 2 grammars or Context free grammars,
Type- 3 grammars or regular grammars.
Type- 0 or Unrestricted languages vs Type- 0 grammars
Type- 0 languages are defined by arbitrary grammars (Type-0).
Type- 0 grammars have productions of the form, α → β, such that α ∉ ε.
Type- 0 languages are recognized using Turing Machines (TM).
The following is an example for an Unrestricted grammar:
S → ACaB
Ca → aaC
CB → DB
aD → Da
aE → Ea
AE → ε
Type- 1 or Context- sensitive languages vs Type- 1 grammars
Type- 1 languages are described by Type- 1 grammars.
LHS of a Type- 1 grammar is not longer than the RHS.
Type- 1 grammars are represented by the productions α → β, such that |α| ≤ |β|.
Type- 1 languages are recognized using Linear Bounded Automata (LBA).
The following is an example for a Context Sensitive grammar:
S → SBC
S → aC
B → a
CB → BC
Ba → aa
C → b
Type- 2 or Context- free languages vs Type- 2 grammars
Type- 2 languages are described by Type- 2 grammars or context free grammars.
LHS of a Context free grammar (Type-2) is a single non-terminal. The RHS consists of an arbitrary string of terminals and non-terminals. Context free grammars are represented by the productions A → β.
Type 2 languages are recognized using Pushdown Automata (PDA).
The following is an example for a Context free grammar:
S → aSb
S → ε
The English grammar given below is a context free grammar.
ARTICLE → a | an | the
NOUN_PHRASE → ARTICLE NOUN
NOUN → boy | apple
Type- 3 or Regular languages vs Type- 3 grammars
Type- 3 languages are described by Type- 3 grammars.
The LHS of a Type- 3 grammar is a single non-terminal. The RHS is empty, or consists of a single terminal or a single
terminal followed by a non- terminal.
Type- 3 grammars are represented by the productions A → aB or A → a.
Type- 3 languages are recognized using Finite State Automata (FA).
The following is an example for a regular grammar:
A → aA
A →ε
A → bB
B → bB
B → ε
A → c
Following figure shows the relation between the four types of languages and machines.
18 videos|69 docs|44 tests
|
1. What is the Chomsky hierarchy in formal language theory? |
2. What is the significance of formal representation of languages in computer science engineering? |
3. How are regular languages represented in the Chomsky hierarchy? |
4. What are context-free languages in the context of the Chomsky hierarchy? |
5. How does the Chomsky hierarchy relate to the complexity of languages? |
|
Explore Courses for Computer Science Engineering (CSE) exam
|