CBSE Question Bank – AI – Class 10 – Chapter- 7 Natural Language Processing 1
CBSE | DEPARTMENT OF SKILL EDUCATION
ARTIFICIAL INTELLIGENCE
QUESTION BANK – CLASS 10
CHAPTER 7: NATURAL LANGUAGE PROCESSING
One (01) Mark Questions
1. What is a chatbot?
A chatbot is a computer program designed to simulate human conversation through voice
commands, text chats or both. Examples: Mitsuku Bot, Jabberwacky.
OR
A chatbot is a computer program that can learn over time how best to interact with
humans. It can answer questions, troubleshoot customer problems, evaluate and
qualify prospects, generate sales leads and increase sales on an e-commerce site.
OR
A chatbot is a computer program designed to simulate conversation with human users.
A chatbot is also known as an artificial conversational entity (ACE), chat robot, talk bot,
chatterbot or chatterbox.
OR
A chatbot is a software application used to conduct an on-line chat conversation via text
or text-to-speech, in lieu of providing direct contact with a live human agent.
2. What is the full form of NLP?
Natural Language Processing
3. While working with NLP, what is meant by the following?
a. Syntax
b. Semantics
Syntax: Syntax refers to the grammatical structure of a sentence.
Semantics: Semantics refers to the meaning of a sentence.
4. What is the difference between stemming and lemmatization?
Stemming is a technique used to extract the base form of words by removing affixes
from them. It is just like cutting down the branches of a tree to its stem. For example,
the stem of the words eating, eats, eaten is eat.
Lemmatization is the grouping together of different forms of the same word. In search
queries, lemmatization allows end users to query any version of a base word and get
relevant results.
OR
Stemming is the process in which the affixes of words are removed and the words are
converted to their base form.
In lemmatization, the word we get after affix removal (also known as the lemma) is a
meaningful one. Lemmatization makes sure that the lemma is a word with meaning, and
hence it takes longer to execute than stemming.
OR
Stemming algorithms work by cutting off the end or the beginning of the word, taking
into account a list of common prefixes and suffixes that can be found in an inflected
word.
Lemmatization, on the other hand, takes the morphological analysis of the words into
consideration. To do so, it needs detailed dictionaries which the algorithm can look
through to link each form back to its lemma.
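To make the contrast concrete, here is a toy Python sketch. The suffix list, the tiny lemma dictionary and the names `toy_stem` and `toy_lemmatize` are hypothetical simplifications made up for illustration; real tools (for example NLTK's stemmers and lemmatizers) use much richer rules and dictionaries.

```python
# Hypothetical suffix list for a toy rule-based stemmer (not the Porter algorithm).
SUFFIXES = ("ing", "ly", "es", "ed", "s")

# Hypothetical, tiny lemma dictionary: inflected form -> meaningful base word.
LEMMAS = {
    "eating": "eat", "eats": "eat", "ate": "eat",
    "studies": "study", "caring": "care",
}


def toy_stem(word: str) -> str:
    """Chop off the first matching suffix, without checking the result is a real word."""
    for suffix in SUFFIXES:
        # Require a stem of at least 3 letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the dictionary; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ["eating", "studies", "caring", "ate"]:
    print(f"{w:8} stem={toy_stem(w):6} lemma={toy_lemmatize(w)}")
```

Here the stemmer turns "studies" into "studi" and "caring" into "car", which are not meaningful words, while the dictionary lookup returns "study" and "care". The lookup also maps the irregular form "ate" to "eat", which suffix stripping cannot do. The cost is having to build and search the dictionary.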
5. What is the full form of TFIDF?
Term Frequency–Inverse Document Frequency
6. What is meant by a dictionary in NLP?
Dictionary in NLP means a list of all the unique words occurring in the corpus. If some
words are repeated in different documents, they are written just once while creating
the dictionary.
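A minimal sketch of building such a dictionary in Python. The three-document corpus is hypothetical, and the lower-casing and whitespace splitting are assumptions, since no tokenization rule is specified here.

```python
# Hypothetical three-document corpus used only for illustration.
corpus = [
    "Aman and Anil are stressed",
    "Aman went to a therapist",
    "Anil went to download a health chatbot",
]


def build_dictionary(documents):
    """Return every unique word in the corpus once, in order of first appearance."""
    dictionary = []
    seen = set()
    for doc in documents:
        # Assumption: lower-case the text and split on whitespace.
        for word in doc.lower().split():
            if word not in seen:  # repeated words are written only once
                seen.add(word)
                dictionary.append(word)
    return dictionary


print(build_dictionary(corpus))
```

The words "aman", "anil", "went", "to" and "a" appear in more than one document but are listed only once.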
7. What is term frequency?
Term frequency is the frequency of a word in one document. It can easily be found from
the document vector table, since that table records the frequency of each word of the
vocabulary in each document.
8. Which package is used for Natural Language Processing in Python programming?
Natural Language Toolkit (NLTK). NLTK is one of the leading platforms for building
Python programs that can work with human language data.
9. What is a document vector table?
The document vector table is used while implementing the Bag of Words algorithm.
In a document vector table, the header row contains the vocabulary of the corpus and
each of the other rows corresponds to a document.
If the document contains a particular word, it is represented by 1, and the absence of
the word is represented by 0.
OR
Document Vector Table is a table containing the frequency of each word of the
vocabulary in each document.
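Both versions of the table can be sketched in plain Python. The two-document corpus and the names `binary_table` and `count_table` are illustrative assumptions; the first variant follows the 1/0 presence description and the second the frequency description.

```python
# Hypothetical two-document corpus; "the" repeats to show the difference.
corpus = ["the cat sat on the mat", "the dog sat"]
documents = [doc.split() for doc in corpus]

# Header row: the vocabulary, in order of first appearance.
vocabulary = []
for words in documents:
    for word in words:
        if word not in vocabulary:
            vocabulary.append(word)

# Variant 1: 1 if the vocabulary word is present in the document, else 0.
binary_table = [[1 if word in words else 0 for word in vocabulary] for words in documents]
# Variant 2: how many times each vocabulary word occurs in the document.
count_table = [[words.count(word) for word in vocabulary] for words in documents]

print(vocabulary)
for row in count_table:
    print(row)
```

The two variants differ only where a word repeats: "the" occurs twice in the first document, so it gets 1 in the binary table but 2 in the frequency table.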
10. What do you mean by corpus?
In text normalization, we go through several steps to normalize the text to a lower level.
Since we work on text from multiple documents, the term used for the whole textual data
from all the documents taken together is corpus.
OR
A corpus is a large and structured set of machine-readable texts that have been
produced in a natural communicative setting.
OR
A corpus can be defined as a collection of text documents. It can be thought of as just a
bunch of text files in a directory, often alongside many other directories of text files.
Two (02) Mark Questions
1. What are the types of data used for Natural Language Processing applications?
Natural Language Processing takes in natural-language data in the form of the written
and spoken words that humans use in their daily lives, and operates on it.
2. Differentiate between a script-bot and a smart-bot. (Any 2 differences)
Script-bot | Smart-bot
A scripted chatbot doesn't carry even a glimpse of AI. | Smart bots are built on NLP and ML.
Script bots are easy to make. | Smart bots are comparatively difficult to make.
Script-bot functioning is very limited as they are less powerful. | Smart bots are flexible and powerful.
Script bots work around a script which is programmed in them. | Smart bots work on bigger databases and other resources directly.
They have little or no language processing skills. | NLP and machine learning skills are required.
Limited functionality. | Wide functionality.
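A script bot can be sketched as a fixed lookup table of replies. The script entries, the fallback message and the name `script_bot_reply` are hypothetical; the point is that the bot only recognises messages its script already contains and has no language understanding.

```python
# A hypothetical script-bot: every reply comes from a fixed, hand-written script.
SCRIPT = {
    "hi": "Hello! How can I help you?",
    "timings": "We are open from 9 am to 5 pm.",
    "bye": "Goodbye!",
}
FALLBACK = "Sorry, I did not understand. Please type hi, timings or bye."


def script_bot_reply(message: str) -> str:
    # Exact match after trimming spaces and lower-casing; no language processing at all.
    return SCRIPT.get(message.strip().lower(), FALLBACK)


print(script_bot_reply("Hi"))
print(script_bot_reply("When do you open?"))
```

A question such as "When do you open?" asks for the same thing as "timings", but it falls through to the fallback because it is not in the script. Handling paraphrases like this is what a smart bot needs NLP and ML for.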
3. Give an example of the following:
• Multiple meanings of a word
• Perfect syntax, no meaning
• Example of multiple meanings of a word:
His face turns red after consuming the medicine.
Meaning: Is he having an allergic reaction? Or is he not able to bear the taste of that
medicine?
• Example of perfect syntax, no meaning:
Chickens feed extravagantly while the moon drinks tea.
This statement is grammatically correct, but it does not make any sense. In human
language, a perfect balance of syntax and semantics is important for better
understanding.
4. What is inverse document frequency?
To understand inverse document frequency, first we need to understand document
frequency.
Document Frequency is the number of documents in which the word occurs irrespective
of how many times it has occurred in those documents.
In the case of inverse document frequency, the document frequency is put in the
denominator and the total number of documents in the numerator.
For example, if the word “AMAN” occurs in 2 of the 3 documents of a corpus (so its
document frequency is 2), its inverse document frequency is 3/2.
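A minimal Python sketch of this calculation. The three-document corpus is hypothetical and chosen so that "aman" occurs in 2 of the 3 documents. `Fraction` keeps the result as the exact ratio 3/2, and tokens are assumed to be lower-cased words split on whitespace.

```python
from fractions import Fraction

# Hypothetical three-document corpus; "aman" occurs in 2 of the 3 documents.
corpus = [
    "Aman and Anil are stressed",
    "Aman went to a therapist",
    "Anil went to download a health chatbot",
]


def document_frequency(word, documents):
    """Number of documents containing the word at least once."""
    return sum(1 for doc in documents if word in doc.lower().split())


def inverse_document_frequency(word, documents):
    """Total number of documents divided by the document frequency.

    The word must occur in at least one document, otherwise this divides by zero.
    """
    return Fraction(len(documents), document_frequency(word, documents))


print(inverse_document_frequency("aman", corpus))  # 3/2
```

A word found in only one document, such as "stressed", gets the larger value 3/1 = 3, so rarer words receive higher inverse document frequencies.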
5. Define the following:
• Stemming
• Lemmatization
Stemming: Stemming is a rudimentary rule-based process of stripping suffixes
(“ing”, “ly”, “es”, “s”, etc.) from a word.
Stemming is a process of reducing words to their word stem, base or root form (for
example, “books” becomes “book” and “looked” becomes “look”).
Lemmatization: Lemmatization, on the other hand, is an organized, step-by-step
procedure for obtaining the root form of a word. It makes use of vocabulary (the
dictionary importance of words) and morphological analysis (word structure and
grammatical relations).
The aim of lemmatization, like stemming, is to reduce inflectional forms to a common
base form. As opposed to stemming, lemmatization does not simply chop off inflections.
Instead, it uses lexical knowledge bases to get the correct base forms of words.
OR
Stemming is a technique used to extract the base form of words by removing affixes
from them. It is just like cutting down the branches of a tree to its stem. For example,
the stem of the words eating, eats, eaten is eat.
Lemmatization is the grouping together of different forms of the same word. In search
queries, lemmatization allows end users to query any version of a base word and get
relevant results.
OR
Stemming is the process in which the affixes of words are removed and the words are
converted to their base form.
In lemmatization, the word we get after affix removal (also known as the lemma) is a
meaningful one. Lemmatization makes sure that the lemma is a word with meaning, and
hence it takes longer to execute than stemming.
OR
Stemming algorithms work by cutting off the end or the beginning of the word, taking
into account a list of common prefixes and suffixes that can be found in an inflected
word.
Lemmatization, on the other hand, takes the morphological analysis of the words into
consideration. To do so, it needs detailed dictionaries which the algorithm can look
through to link each form back to its lemma.
6. What do you mean by document vectors?
A document vector contains the frequency of each word of the vocabulary in a particular
document.
In a document vector, the vocabulary is written in the top row. Then, for each word in the
document, if it matches a word in the vocabulary, put a 1 under it. If the same word
appears again, increment the previous value by 1. If a vocabulary word does not occur in
that document, put a 0 under it.
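The steps above can be sketched as a short Python function. The vocabulary, the sample document and the name `document_vector` are hypothetical. Words that are not in the vocabulary (such as "home" here) are assumed to be skipped, since the steps only fill columns for vocabulary words.

```python
def document_vector(document, vocabulary):
    # Start with 0 under every vocabulary word.
    vector = [0] * len(vocabulary)
    position = {word: i for i, word in enumerate(vocabulary)}
    for word in document.lower().split():
        if word in position:
            # The first occurrence turns 0 into 1; each repeat adds 1 more.
            vector[position[word]] += 1
    return vector


# Hypothetical vocabulary and document.
vocabulary = ["aman", "and", "anil", "are", "stressed", "went", "to", "a", "therapist"]
print(document_vector("Aman went to a therapist and Aman went home", vocabulary))
# [2, 1, 0, 0, 0, 2, 1, 1, 1]
```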
7. What is TFIDF? Write its formula.
Term frequency–inverse document frequency (TFIDF) is a numerical statistic intended to
reflect how important a word is to a document in a collection or corpus.
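Formula: TFIDF(W) = TF(W) × log(IDF(W)), where IDF(W) = (total number of documents) /
(number of documents containing W).
Here, the term frequency TF(W) is the number of times the word W appears in a document
divided by the total number of words in that document. Every document has its own term
frequency.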
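A minimal Python sketch of TF-IDF in the common textbook form TF(W) × log(N / DF(W)). The corpus is hypothetical. Term frequency is the normalised count described above. A base-10 logarithm is assumed; other bases only rescale the scores.

```python
import math

# Hypothetical three-document corpus.
corpus = [
    "Aman and Anil are stressed",
    "Aman went to a therapist",
    "Anil went to download a health chatbot",
]
documents = [doc.lower().split() for doc in corpus]


def term_frequency(word, words):
    # Occurrences of the word divided by the total number of words in this document.
    return words.count(word) / len(words)


def inverse_document_frequency(word, documents):
    # Total documents divided by the number of documents containing the word.
    df = sum(1 for words in documents if word in words)
    return len(documents) / df


def tf_idf(word, words, documents):
    return term_frequency(word, words) * math.log10(inverse_document_frequency(word, documents))


for word in ["aman", "stressed"]:
    print(word, round(tf_idf(word, documents[0], documents), 4))
```

Both words occur once in the first document, so their term frequencies are equal. "stressed" still scores higher because it appears in fewer documents. A word present in every document would get log(1) = 0.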
8. Which words in a corpus have the highest values and which ones have the least?
Stop words such as and, this, is and the occur most often, so they have the highest values
in a corpus. But these words say nothing about the content of the corpus. Hence, they are
termed stop words and are mostly removed at the pre-processing stage itself.
Rare or valuable words occur the least but add the most importance to the corpus.
Hence, when we look at the text, we take both frequent and rare words into consideration.
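A small word count shows the pattern. The sample text and the stop-word list below are hypothetical and tiny; real stop-word lists are much longer. `collections.Counter` does the counting.

```python
from collections import Counter

# Hypothetical sample text.
text = ("The cat sat on the mat and the dog sat on the rug. "
        "The cat is happy and the dog is happy.")
# Lower-case and strip trailing punctuation from each whitespace-separated token.
words = [token.strip(".,").lower() for token in text.split()]

counts = Counter(words)
print(counts.most_common(3))  # the stop word "the" comes first

# Hypothetical, tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "is", "on", "this"}
content_counts = Counter(word for word in words if word not in STOP_WORDS)
print(content_counts.most_common())
```

"the" is the most frequent token yet tells us nothing about the text. After stop-word removal, the remaining counts are led by content words, and the rare words ("mat", "rug") appear only once.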
9. Does the vocabulary of a corpus remain the same before and after text
normalization? Why?
No, the vocabulary of a corpus does not remain the same before and after text
normalization. The reasons are:
• In normalization, the text is passed through various steps and reduced to a minimal
vocabulary, since the machine does not need grammatically correct statements, only
their essence.
• In normalization, stop words, special characters and numbers are removed.
• In stemming, the affixes of words are removed and the words are converted to their
base form.
So, after normalization, we get a reduced vocabulary.
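A rough Python sketch of this shrinking effect. The stop-word list, the toy suffix list and the names `normalize` and `toy_stem` are hypothetical simplifications. The pipeline lower-cases the text, drops numbers and special characters, removes stop words and strips suffixes, then compares vocabulary sizes.

```python
import re

# Hypothetical stop-word list and toy suffix list, for illustration only.
STOP_WORDS = {"the", "is", "and", "a", "are", "to", "with"}
SUFFIXES = ("ing", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    words = text.lower().split()                             # common (lower) case
    words = [re.sub(r"[^a-z]", "", w) for w in words]        # drop numbers and special characters
    words = [w for w in words if w and w not in STOP_WORDS]  # drop empty tokens and stop words
    return [toy_stem(w) for w in words]                      # reduce words to a base form


text = "The Cats are playing! The cat plays with 2 balls and a ball."
raw_vocabulary = set(text.split())
normalized_vocabulary = set(normalize(text))
print(len(raw_vocabulary), "->", len(normalized_vocabulary), sorted(normalized_vocabulary))
```

Here 12 distinct raw tokens shrink to the three base forms "ball", "cat" and "play".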
10. What is the significance of converting the text into a common case?
In text normalization, we go through several steps to normalize the text to a lower level.
After the removal of stop words, we convert the whole text into a similar case,
preferably lower case. This ensures that the case-sensitive machine does not treat the
same words as different just because they are written in different cases.
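A tiny Python illustration with hypothetical tokens that differ only in case. Without case conversion, they count as separate vocabulary entries.

```python
# Hypothetical tokens that differ only in case.
words = ["Aman", "aman", "AMAN", "went", "Went"]

case_sensitive = set(words)                     # 5 "different" words
case_folded = {word.lower() for word in words}  # 2 words: "aman", "went"
print(len(case_sensitive), len(case_folded))    # 5 2
```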
11. Mention some applications of Natural Language Processing.
Natural Language Processing applications:
• Sentiment Analysis
• Chatbots & Virtual Assistants
• Text Classification
• Text Extraction
• Machine Translation
• Text Summarization
• Market Intelligence
• Auto-Correct