NCERT Textbook: Data Handling using Pandas - I | Informatics Practices for Class 12

Chapter 2
Data Handling Using Pandas - I
2024-25

“If you don't think carefully, you might believe that programming
is just typing statements in a programming language.”
- W. Cunningham

In this chapter
 » Introduction to Python Libraries
 » Series
 » DataFrame
 » Importing and Exporting Data between CSV Files and DataFrames
 » Pandas Series Vs NumPy ndarray

2.1 Introduction to Python Libraries
Python libraries contain a collection of built-in modules that
allow us to perform many actions without writing detailed
programs for them. Each library in Python contains a large
number of modules that one can import and use.
NumPy, Pandas and Matplotlib are three well-established Python
libraries for scientific and analytical use. These libraries
allow us to manipulate, transform and visualise data easily and
efficiently.
NumPy, which stands for ‘Numerical Python’, is a library we
discussed in Class XI. Recall that it is a package that can
be used for numerical data analysis and
scientific computing. NumPy uses a multidimensional 
array object and has functions and tools for working 
with these arrays.  Elements of an array stay together in 
memory, hence, they can be quickly accessed.
PANDAS (PANel DAta) is a high-level data manipulation 
tool used for analysing data. It is very easy to import 
and export data using the Pandas library, which has a very 
rich set of functions. It is built on packages like NumPy 
and Matplotlib and gives us a single, convenient place 
to do most of our data analysis and visualisation work. 
Pandas has three important data structures, namely 
Series, DataFrame and Panel, to make the process of 
analysing data organised, effective and efficient. 
(Panel was deprecated and has been removed from recent 
versions of Pandas.)
The Matplotlib library in Python is used for plotting 
graphs and visualisation. Using Matplotlib, with just a 
few lines of code we can generate publication-quality 
plots, histograms, bar charts, scatterplots, etc. It is 
also built on NumPy, and is designed to work well with 
NumPy and Pandas. 
You may wonder why Pandas is needed when 
NumPy can be used for data analysis. Following are 
some of the differences between Pandas and NumPy:
1. A NumPy array requires homogeneous data, while 
a Pandas DataFrame can have different data types 
(float, int, string, datetime, etc.). 
2. Pandas has a simpler interface for operations like 
file loading, plotting, selection, joining and GROUP 
BY, which come in very handy in data-processing 
applications.
3. Pandas DataFrames (with column names) make it 
very easy to keep track of data. 
4. Pandas is used when data is in tabular format, 
whereas NumPy is used for numeric array-based 
data manipulation.
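The first difference in the list above can be seen in a small sketch. The variable names and the marks are made up for illustration; only the student names come from this chapter.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single data type: mixing ints and a string
# converts every element to a string.
mixed_array = np.array([1, 2, "three"])
print(mixed_array.dtype.kind)   # 'U' stands for a Unicode string dtype

# A DataFrame keeps a separate data type for each named column.
students = pd.DataFrame({
    "name": ["Arnab", "Samridhi", "Ramit"],   # text
    "marks": [82.5, 91.0, 77.5],              # float (illustrative values)
    "roll_no": [1, 2, 3],                     # int
})
print(students.dtypes)
```

Printing `students.dtypes` lists one data type per column, which is what lets a DataFrame hold heterogeneous tabular data.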
2.1.1. Installing Pandas
Installing Pandas is very similar to installing NumPy. To 
install Pandas from the command line, we need to type in:
pip install pandas
Note that both NumPy and Pandas can be installed 
only when Python is already installed on that system. 
The same is true for other libraries of Python.
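One simple way to check that the installation worked is to import both libraries and print their version strings. This is a minimal sketch; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Each library exposes its installed version as a string.
numpy_version = np.__version__
pandas_version = pd.__version__
print("NumPy:", numpy_version)
print("Pandas:", pandas_version)
```

If either import fails with a ModuleNotFoundError, the corresponding library is not installed for the Python interpreter being used.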
2.1.2. Data Structure in Pandas
A data structure is a collection of data values and 
operations that can be applied to that data. It enables 
efficient storage, retrieval and modification of the data. 
For example, we have already worked with a data 
structure ndarray in NumPy in Class XI. Recall the ease 
with which we can store, access and update data using 
a NumPy array. Two commonly used data structures in 
Pandas that we will cover in this book are: 
• Series 
• DataFrame
2.2 Series
A Series is a one-dimensional array containing a 
sequence of values of any data type (int, float, list, 
string, etc.) which by default have numeric data labels 
starting from zero. The data label associated with a 
particular value is called its index. We can also assign 
values of other data types as index. We can imagine a 
Pandas Series as a column in a spreadsheet. An example 
of a series containing names of students is given below:
Index Value
0  Arnab
1  Samridhi
2  Ramit
3  Divyam
4  Kritika
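The table above could be built in Pandas roughly as follows. This is only a sketch; the variable name `students` is an illustrative choice.

```python
import pandas as pd

# The values form the "column"; the labels 0 to 4 are assigned by default.
students = pd.Series(["Arnab", "Samridhi", "Ramit", "Divyam", "Kritika"])

print(students.index.tolist())   # the default numeric labels
print(students.tolist())         # the stored names, in order
```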
2.2.1 Creation of Series
There are different ways in which a series can be created 
in Pandas. To create or use series, we first need to import 
the Pandas library.
(A) Creation of Series from Scalar Values
A Series can be created using scalar values as shown in 
the example below:
>>> import pandas as pd   #import Pandas with alias pd
>>> series1 = pd.Series([10,20,30])  #create a Series
>>> print(series1)  #Display the series
Output:
0    10
1    20
2    30
dtype: int64
Observe that the output is shown in two columns: the 
index is on the left and the data value is on the right. If 
we do not explicitly specify an index for the data values 
while creating a series, then by default the indices range 
from 0 through N - 1, where N is the number of data 
elements.
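The default range can be checked directly. The sketch below uses illustrative variable names and compares the index labels with 0 through N - 1.

```python
import pandas as pd

values = pd.Series([10, 20, 30])
n = len(values)                      # N, the number of data elements
default_labels = list(values.index)  # labels assigned automatically

print(default_labels)
print(default_labels == list(range(n)))   # True when labels are 0 .. N - 1
```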
We can also assign user-defined labels to the index 
and use them to access elements of a Series. The 
following example has a numeric index in random order. 
>>> series2 = pd.Series(["Kavi","Shyam","Ravi"], index=[3,5,1])
>>> print(series2)  #Display the series
Output:
3     Kavi
5    Shyam
1     Ravi
dtype: object
Here, data values Kavi, Shyam and Ravi have index 
values 3, 5 and 1, respectively. We can also use letters 
or strings as indices, for example:
>>> series2 = pd.Series([2,3,4], index=["Feb","Mar","Apr"])
>>> print(series2) #Display the series
Output:
Feb    2
Mar    3
Apr    4
dtype: int64
  
Here, data values 2, 3 and 4 have index values Feb, Mar 
and Apr, respectively.
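Once labels are assigned, they can be used to fetch values. A minimal sketch follows; the name `sales` is hypothetical, and the data matches the example above.

```python
import pandas as pd

sales = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])

march_value = sales["Mar"]        # value stored under the label "Mar"
subset = sales[["Feb", "Apr"]]    # a list of labels returns a smaller Series

print(march_value)
print(subset)
```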
Activity 2.1
Create a series having names of any five famous monuments of 
India and assign their States as index values.

Think and Reflect
While importing Pandas, is it mandatory to always use pd as an 
alias name? What would happen if we give any other name?

(B) Creation of Series from NumPy Arrays
We can create a series from a one-dimensional (1D) 
NumPy array, as shown below: 
>>> import numpy as np  # import NumPy with alias np
>>> import pandas as pd
>>> array1 = np.array([1,2,3,4])
>>> series3 = pd.Series(array1)
>>> print(series3)
Output:
0    1
1    2
2    3
3    4
dtype: int32
The following example shows that we can use letters 
or strings as indices:
>>> series4 = pd.Series(array1, index = ["Jan", "Feb", "Mar", "Apr"])
>>> print(series4)
Jan    1
Feb    2
Mar    3
Apr    4
dtype: int32
When index labels are passed with the array, the 
index and the array must have the same length, 
else it will result in a ValueError. In the example 
shown below, array1 contains 4 values whereas there 
are only 3 indices, hence a ValueError is displayed.
>>> series5 = pd.Series(array1, index = ["Jan", "Feb", "Mar"])
ValueError: Length of passed values is 4, index 
implies 3
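In a program, this error can be caught or avoided. The sketch below shows both options under the same data as the example above; the name `labels` is illustrative, and the exact wording of the error message may differ between Pandas versions.

```python
import numpy as np
import pandas as pd

array1 = np.array([1, 2, 3, 4])
labels = ["Jan", "Feb", "Mar"]   # one label short on purpose

# Option 1: attempt the creation and handle the ValueError.
try:
    series5 = pd.Series(array1, index=labels)
except ValueError as error:
    series5 = None
    print("Could not create the Series:", error)

# Option 2: compare the lengths first, so no exception is raised.
if len(array1) == len(labels):
    series5 = pd.Series(array1, index=labels)
```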
(C) Creation of Series from Dictionary
Recall that a Python dictionary has key: value pairs and 
a value can be quickly retrieved when its key is known. 
Dictionary keys can be used to construct an index for a 
Series, as shown in the following example. Here, keys of 
the dictionary dict1 become indices in the series. 
>>> dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
>>> print(dict1)  #Display the dictionary
{'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
>>> series8 = pd.Series(dict1) 
>>> print(series8)  #Display the series
India    NewDelhi
UK         London
Japan       Tokyo
dtype: object
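Because the keys become the index, a value in the series can be looked up in the same way as in the dictionary. This is a short sketch reusing the data above; the name `capitals` is illustrative.

```python
import pandas as pd

capitals = {"India": "NewDelhi", "UK": "London", "Japan": "Tokyo"}
series8 = pd.Series(capitals)

key_labels = list(series8.index)   # the dictionary keys, in insertion order
japan_capital = series8["Japan"]   # look up a value by its key-derived label

print(key_labels)
print(japan_capital)
```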
2.2.2 Accessing Elements of a Series
There are two common ways for accessing the elements 
of a series: Indexing and Slicing.
(A) Indexing
Indexing in Series is similar to that for NumPy arrays, 
and is used to access elements in a series. Indexes 
are of two types: positional index and labelled index. 
Positional index takes an integer value that corresponds 
to its position in the series starting from 0, whereas 
labelled index takes any user-defined label as index.
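The two kinds of index can be contrasted in a short sketch. It uses the `.iloc` and `.loc` accessors, which this passage does not introduce; they are chosen here only because they make it explicit whether a position or a label is meant. The name `months` is illustrative.

```python
import pandas as pd

months = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])

first_by_position = months.iloc[0]   # positional index: position 0
first_by_label = months.loc["Feb"]   # labelled index: label "Feb"

print(first_by_position, first_by_label)   # both refer to the same element
```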
FAQs on NCERT Textbook: Data Handling using Pandas - I - Informatics Practices for Class 12 - Humanities/Arts

1. What is data handling in the context of Pandas?
Ans. Data handling in the context of Pandas refers to the process of organizing, analyzing, and manipulating data using the Pandas library in Python. This includes tasks like data cleaning, transformation, aggregation, and visualization, enabling users to derive insights from complex datasets efficiently.
2. How do you import data into a Pandas DataFrame?
Ans. You can import data into a Pandas DataFrame using various methods. The most common method is to use the `read_csv()` function to read data from a CSV file. For example, `df = pd.read_csv('file_path.csv')` will create a DataFrame `df` from the specified CSV file. Other methods include reading from Excel files with `read_excel()` or from SQL databases using `read_sql()`.
3. What are some common functions used for data analysis in Pandas?
Ans. Some common functions used for data analysis in Pandas include `describe()`, which provides summary statistics of the DataFrame, `groupby()`, which allows for aggregation based on specific categories, and `pivot_table()`, which creates a spreadsheet-style pivot table for data summarization. Additionally, `loc[]` and `iloc[]` are used for indexing and selecting data.
4. How can you handle missing data in a Pandas DataFrame?
Ans. You can handle missing data in a Pandas DataFrame using methods like `dropna()`, which removes rows or columns with missing values, and `fillna()`, which fills in missing values with a specified value or method (like forward or backward filling). It is important to choose the method based on the nature of your data and the analysis you are performing.
5. What is the significance of data visualization in data handling using Pandas?
Ans. Data visualization is significant in data handling as it helps to convey complex data insights in an understandable format. Using libraries like Matplotlib or Seaborn alongside Pandas, you can create graphs and charts to visually represent data trends, distributions, and relationships, making it easier to interpret the results and communicate findings effectively.
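The answers to questions 2 and 4 can be tried together in a small sketch. The CSV text is made-up sample data read from memory with `io.StringIO`, so no file on disk is assumed.

```python
import io
import pandas as pd

# Hypothetical CSV content; the second row has no value for marks.
csv_text = "name,marks\nArnab,82\nSamridhi,\nRamit,77\n"
df = pd.read_csv(io.StringIO(csv_text))

missing_count = int(df["marks"].isna().sum())   # number of missing marks
dropped = df.dropna()                            # remove rows with missing values
filled = df.fillna({"marks": 0})                 # or replace missing marks with 0

print(missing_count)
print(dropped)
print(filled)
```

Whether dropping or filling is appropriate depends on the data and the analysis, as the answer to question 4 notes.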