Data Handling using Pandas - II Chapter Notes | Informatics Practices for Class 12 - Humanities/Arts

Table of contents
Introduction
Descriptive Statistics
Data Aggregations
Sorting a DataFrame
GROUP BY Functions
Altering the Index
Other DataFrame Operations
Handling Missing Values
Import and Export of Data between Pandas and MySQL

Chapter Notes - Data Handling using Pandas - II

Introduction

Pandas is a well-established Python library used for manipulation, processing, and analysis of data.
Basic operations on Series and DataFrame, such as creating and accessing data, were discussed in the previous chapter.
Pandas provides powerful and useful functions for advanced data analysis.
This chapter focuses on advanced DataFrame features, including sorting data, answering analytical questions, cleaning data, and applying various functions.
The chapter uses example data of marks scored by students in unit tests to demonstrate these features.
Topics covered include descriptive statistics, data aggregations, sorting, GROUP BY functions, altering the index, other DataFrame operations, handling missing values, and importing/exporting data between Pandas and MySQL.

Descriptive Statistics

Descriptive statistics are used to quantitatively summarize data, providing a basic understanding of the dataset.
Statistical methods applicable to a DataFrame include max, min, count, sum, mean, median, mode, quartiles, and variance.
These methods are applied to the example DataFrame containing student marks.
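
As a minimal sketch of these methods, the snippet below builds a small marks DataFrame and computes a few summaries. The column names and all values are invented for illustration and are not the chapter's actual data.

```python
import pandas as pd

# Hypothetical unit-test marks; the values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Mishti", "Mishti"],
    "UT": [1, 2, 1, 2],
    "Maths": [22, 21, 15, 18],
    "Science": [21, 20, 17, 19],
    "Hindi": [18, 17, 23, 24],
})
marks = df[["Maths", "Science", "Hindi"]]

mean_marks = marks.mean()                       # average per subject
median_marks = marks.median()                   # middle value per subject
quartiles = marks.quantile([0.25, 0.5, 0.75])   # Q1, Q2, Q3 per subject
variance = marks.var()                          # sample variance per subject
modes = marks.mode()                            # most frequent value(s) per subject

print(mean_marks)
print(quartiles)
```

Selecting the subject columns first keeps the text column (Name) and the test number (UT) out of the statistics.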

Calculating Maximum Values

The DataFrame.max() function returns the maximum value of each column. By default it considers every column, whatever its data type, so text columns are compared alphabetically.
It returns the maximum value for each column by default.
To focus on numeric columns only, set the parameter numeric_only=True in the max() method.
For row-wise maximum values, use max(axis=1), which returns the maximum value for each row across specified columns.
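Note that axis=0 (the default) gives a column-wise result, one value per column, while axis=1 gives a row-wise result, one value per row. The axis names the direction along which values are combined: axis=0 runs down the rows of each column, and axis=1 runs across the columns of each row.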
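
The following sketch shows column-wise and row-wise maximums side by side. The DataFrame is a made-up stand-in for the chapter's marks data.

```python
import pandas as pd

# Hypothetical marks; values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Mishti", "Mishti"],
    "UT": [1, 2, 1, 2],
    "Maths": [22, 21, 15, 18],
    "Science": [21, 20, 17, 19],
    "Hindi": [18, 17, 23, 24],
})

# Column-wise (axis=0, the default), restricted to numeric columns.
col_max = df.max(numeric_only=True)

# Row-wise (axis=1): highest subject mark in each row.
row_max = df[["Maths", "Science", "Hindi"]].max(axis=1)

print(col_max)
print(row_max)
```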

Calculating Minimum Values

The DataFrame.min() function returns the minimum value of each column, whatever its data type.
It returns the minimum value for each column by default.
To calculate minimum values for specific rows or students, filter the DataFrame using conditions (e.g., df.loc[df.Name == 'Mishti']) and apply min() to selected columns.
Like max(), min() uses axis=0 for column-wise operations and axis=1 for row-wise operations.
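
A short sketch of the filtering approach described above, assuming a made-up marks DataFrame in which Mishti has two unit-test rows:

```python
import pandas as pd

# Hypothetical marks; values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Mishti", "Mishti"],
    "UT": [1, 2, 1, 2],
    "Maths": [22, 21, 15, 18],
    "Science": [21, 20, 17, 19],
    "Hindi": [18, 17, 23, 24],
})

# Keep only Mishti's rows and only the subject columns.
mishti = df.loc[df["Name"] == "Mishti", ["Maths", "Science", "Hindi"]]

lowest_per_subject = mishti.min()        # axis=0: one value per subject
lowest_per_test = mishti.min(axis=1)     # axis=1: one value per unit test

print(lowest_per_subject)
print(lowest_per_test)
```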

Calculating Sum of Values

The DataFrame.sum() function calculates the sum of the values in each column by default, whatever the data type.
For text columns (e.g., names), sum() concatenates the strings, which is rarely meaningful.
To sum values for a specific column, specify the column name (e.g., df['Maths'].sum()).
To calculate the total marks for a specific student, filter the DataFrame by student name and apply sum() to relevant columns.
For row-wise sums (e.g., total marks per unit test for a student), use sum(axis=1).
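
The three uses of sum() above can be sketched as follows, again on invented data:

```python
import pandas as pd

# Hypothetical marks; values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Mishti", "Mishti"],
    "UT": [1, 2, 1, 2],
    "Maths": [22, 21, 15, 18],
    "Science": [21, 20, 17, 19],
    "Hindi": [18, 17, 23, 24],
})
subjects = ["Maths", "Science", "Hindi"]

maths_total = df["Maths"].sum()                     # one column
raman = df.loc[df["Name"] == "Raman", subjects]     # one student
raman_per_subject = raman.sum()                     # total per subject
raman_per_test = raman.sum(axis=1)                  # total per unit test

print(maths_total)
print(raman_per_subject)
print(raman_per_test)
```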

Calculating Number of Values

The DataFrame.count() function returns the total number of non-null values for each column or row.
By default, it counts values column-wise (axis=0).
To count values row-wise, use count(axis=1).
This function is useful for understanding the completeness of data in each column or row.
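
A minimal sketch of count() on data with gaps. The missing entries are placed by hand here purely to show the effect.

```python
import pandas as pd

nan = float("nan")  # marker for a missing mark

# Hypothetical marks with some missing entries.
df = pd.DataFrame({
    "Name": ["Raman", "Mishti", "Raman"],
    "Maths": [22, 15, nan],
    "Science": [21, nan, nan],
})

per_column = df.count()         # non-null values in each column
per_row = df.count(axis=1)      # non-null values in each row

print(per_column)
print(per_row)
```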

Data Aggregations

Data aggregation involves applying functions like sum, mean, or count to grouped data to summarize it.
Pandas supports aggregation through functions like groupby() combined with aggregate functions.
Aggregation is useful for deriving insights, such as average marks per student or total marks per subject.
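
One way to apply several aggregate functions at once is DataFrame.agg() (also available as aggregate()). The sketch below assumes invented marks and shows only one of several possible call styles.

```python
import pandas as pd

# Hypothetical marks; values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Mishti", "Mishti"],
    "Maths": [22, 21, 15, 18],
    "Science": [21, 20, 17, 19],
})

# One row per aggregate function, one column per subject.
summary = df[["Maths", "Science"]].agg(["sum", "mean", "max"])

print(summary)
```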

Sorting a DataFrame

Sorting arranges data in a specified order, either ascending or descending.
The DataFrame.sort_values() function is used to sort a DataFrame by one or more columns.
Sorting can be applied to numeric or categorical columns, with options to specify ascending or descending order.
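
A brief sketch of sort_values() with one and with two sort keys, using invented data:

```python
import pandas as pd

# Hypothetical marks; values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Mishti", "Mishti"],
    "UT": [1, 2, 1, 2],
    "Maths": [22, 21, 15, 18],
})

# Highest Maths mark first.
by_maths = df.sort_values(by="Maths", ascending=False)

# Names A to Z, and within each name the highest Maths mark first.
by_name_then_maths = df.sort_values(by=["Name", "Maths"],
                                    ascending=[True, False])

print(by_maths)
print(by_name_then_maths)
```

sort_values() returns a new DataFrame and keeps the original row labels, which is why the index appears out of order after sorting.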

GROUP BY Functions

The groupby() function groups data based on one or more columns, allowing aggregation operations on the grouped data.
It is used to perform operations like sum, mean, or count on groups (e.g., average marks per student or per unit test).
Grouping is a powerful tool for summarizing data and answering analytical questions.
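
The two grouping questions mentioned above (per student and per unit test) might look like this on invented data:

```python
import pandas as pd

# Hypothetical marks; values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Raman", "Raman", "Mishti", "Mishti"],
    "UT": [1, 2, 1, 2],
    "Maths": [22, 21, 15, 18],
    "Science": [21, 20, 17, 19],
})

# Average marks per student.
avg_per_student = df.groupby("Name")[["Maths", "Science"]].mean()

# Total Maths marks per unit test.
maths_per_test = df.groupby("UT")["Maths"].sum()

print(avg_per_student)
print(maths_per_test)
```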

Altering the Index

Altering the index refers to changing the labels of rows or columns in a DataFrame.
The reset_index() function resets the index to default integer values, moving the current index to a column.
The set_index() function sets a specified column as the new index.
These functions are useful for reorganizing data or preparing it for specific analyses.
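
A small sketch of both functions. The names and marks are invented, and drop=True is an optional argument that discards the old index instead of keeping it as a column.

```python
import pandas as pd

# Hypothetical marks; values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Raman", "Mishti", "Zuhaire"],
    "Maths": [22, 15, 20],
})

by_name = df.set_index("Name")               # Name becomes the row labels
raman_maths = by_name.loc["Raman", "Maths"]  # look up by label
restored = by_name.reset_index()             # Name moves back to a column

# After filtering, the old labels (0 and 2) remain; renumber them from 0.
top = df[df["Maths"] >= 20].reset_index(drop=True)

print(by_name)
print(top)
```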

Other DataFrame Operations

Pandas provides additional operations like filtering, merging, joining, and concatenating DataFrames.
Filtering allows selecting rows based on conditions (e.g., df[df['Name'] == 'Raman']).
Merging and joining combine multiple DataFrames based on common columns or indices.
Concatenation stacks DataFrames vertically or horizontally.
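
The operations above can be sketched on two tiny DataFrames. The second table (classes) and the extra student are hypothetical, introduced only so there is something to merge and stack.

```python
import pandas as pd

# Hypothetical tables; values are invented for illustration.
marks = pd.DataFrame({"Name": ["Raman", "Mishti"], "Maths": [22, 15]})
classes = pd.DataFrame({"Name": ["Raman", "Mishti"],
                        "Class": ["XII-A", "XII-B"]})
extra = pd.DataFrame({"Name": ["Zuhaire"], "Maths": [20]})

raman = marks[marks["Name"] == "Raman"]                  # filtering
merged = marks.merge(classes, on="Name")                 # merge on a common column
stacked = pd.concat([marks, extra], ignore_index=True)   # vertical stacking
side_by_side = pd.concat([marks, classes[["Class"]]], axis=1)  # horizontal

print(merged)
print(stacked)
```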

Handling Missing Values

Missing values (represented as NaN in Pandas) are a common issue in data analysis and must be handled properly.
Two primary strategies for handling missing values are:
- Dropping the rows or columns containing missing values.
- Filling or estimating missing values with appropriate values.
Missing values can arise in datasets, such as when a student misses a test (e.g., Raman’s missing marks in Unit Test 4).

Checking Missing Values

The isnull() function checks for missing values in a DataFrame, returning True for missing values and False otherwise.
It can be applied to the entire DataFrame or specific columns (e.g., df['Science'].isnull()).
The any() function, used with isnull(), checks if any missing values exist in a column or the entire DataFrame.
The expression isnull().sum() chains the two methods to count the number of missing values per column.
Adding a second sum(), as in isnull().sum().sum(), gives the total number of missing values in the DataFrame.
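
A minimal sketch of these checks on a DataFrame with hand-placed gaps (all values invented):

```python
import pandas as pd

nan = float("nan")  # marker for a missing mark

# Hypothetical marks; Raman's Unit Test 4 marks are missing.
df = pd.DataFrame({
    "Name": ["Raman", "Mishti", "Raman"],
    "UT": [3, 3, 4],
    "Maths": [22, 15, nan],
    "Science": [21, nan, nan],
})

mask = df.isnull()                        # True wherever a value is missing
science_missing = df["Science"].isnull()  # check a single column
any_per_column = df.isnull().any()        # does each column have a gap?
missing_per_column = df.isnull().sum()    # how many gaps per column
total_missing = df.isnull().sum().sum()   # how many gaps overall

print(missing_per_column)
print(total_missing)
```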

Dropping Missing Values

The dropna() function removes rows or columns containing missing values.
By default, it drops rows with any missing values (how='any').
Setting inplace=True modifies the original DataFrame; otherwise, a new DataFrame is returned.
Dropping is suitable only when few values are missing, because every dropped row or column reduces the size of the dataset.
Example: Dropping Raman’s Unit Test 4 row removes the row with missing values, affecting percentage calculations.
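
A short sketch of dropna() in its common forms. The data is invented, and the subset argument is an optional extra that limits the check to chosen columns.

```python
import pandas as pd

nan = float("nan")  # marker for a missing mark

# Hypothetical marks; Raman's Unit Test 4 marks are missing.
df = pd.DataFrame({
    "Name": ["Raman", "Mishti", "Raman"],
    "UT": [3, 3, 4],
    "Maths": [22, 15, nan],
    "Science": [21, nan, nan],
})

rows_dropped = df.dropna()                   # drop rows with any gap
cols_dropped = df.dropna(axis=1)             # drop columns with any gap
maths_known = df.dropna(subset=["Maths"])    # only require Maths to be present

cleaned = df.copy()
cleaned.dropna(inplace=True)                 # modifies cleaned itself

print(rows_dropped)
print(cols_dropped)
```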

Estimating Missing Values

Estimating missing values involves replacing them with approximations, such as the previous value, next value, mean, or a constant (e.g., 0).
The fillna(num) function replaces missing values with a specified value (e.g., fillna(0) replaces NaN with 0).
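The fillna(method='pad') function replaces missing values with the previous value in the column. The method argument is deprecated in recent pandas releases (2.1 and later), where ffill() does the same job.
The fillna(method='bfill') function replaces missing values with the next value in the column. In recent pandas releases, use bfill() instead.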
Estimating missing values alters the dataset and affects analysis results, providing an approximation rather than exact values.
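
The four estimation strategies can be compared on a single column. This sketch uses ffill() and bfill() directly, which are the current pandas equivalents of the pad and bfill methods. The marks are invented.

```python
import pandas as pd

nan = float("nan")  # marker for a missing mark

# Hypothetical Maths marks with the second test missing.
df = pd.DataFrame({"UT": [1, 2, 3], "Maths": [20.0, nan, 24.0]})

with_zero = df["Maths"].fillna(0)                    # constant
with_previous = df["Maths"].ffill()                  # previous value
with_next = df["Maths"].bfill()                      # next value
with_mean = df["Maths"].fillna(df["Maths"].mean())   # column mean

print(with_zero.tolist())
print(with_previous.tolist())
print(with_next.tolist())
print(with_mean.tolist())
```

Each choice gives a different total for the student, which is why the estimation method should be stated alongside any result derived from the filled data.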

Import and Export of Data between Pandas and MySQL

In real-world scenarios, data is often stored in files (e.g., CSV) or databases, requiring import to Pandas DataFrames or export from DataFrames to databases.
Pandas supports importing data from MySQL databases and exporting DataFrames to MySQL tables.
A connection to MySQL is established using the pymysql driver and sqlalchemy library.
Install pymysql using pip install pymysql and sqlalchemy using pip install sqlalchemy.
The create_engine() function establishes a connection to MySQL using a connection string with parameters: driver, username, password, host, port, and database name.
Syntax: engine = create_engine('mysql+pymysql://username:password@host:port/database').
Example connection string: mysql+pymysql://root:smsm@localhost:3306/CARSHOWROOM.
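
As a sketch of how the connection string is assembled, the hypothetical helper mysql_url() below (not part of pandas or sqlalchemy) builds it from its parts. It percent-encodes the username and password, which matters when a password contains characters such as @ or /. The create_engine() call is left as a comment because it needs sqlalchemy, pymysql, and a running MySQL server.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host, port, database):
    """Hypothetical helper: assemble a connection string for the pymysql driver."""
    return (f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")


url = mysql_url("root", "smsm", "localhost", 3306, "CARSHOWROOM")
print(url)

# With the libraries installed and the server running:
# from sqlalchemy import create_engine
# engine = create_engine(url)
```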

Importing Data from MySQL to Pandas

Importing data involves reading a MySQL table or query result into a Pandas DataFrame.
Three functions are used for importing:

- pandas.read_sql_query(query, sql_conn): Reads an SQL query into a DataFrame using the connection identifier.
- pandas.read_sql_table(table_name, sql_conn): Reads an entire SQL table into a DataFrame.
- pandas.read_sql(sql, sql_conn): Reads either an SQL query or table into a DataFrame.

Example: df = pd.read_sql_query('SELECT * FROM INVENTORY', engine) loads the INVENTORY table into a DataFrame.
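
The sketch below runs the same read_sql_query() call without a MySQL server. As an assumption made only so the example is self-contained, it uses an in-memory SQLite database from Python's standard library in place of the MySQL engine. The INVENTORY columns and rows are invented.

```python
import sqlite3

import pandas as pd

# Stand-in for the MySQL engine: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE INVENTORY (CarId TEXT, Model TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO INVENTORY VALUES (?, ?, ?)",
    [("D001", "ModelA", 500000.0), ("B001", "ModelB", 650000.0)],
)
conn.commit()

# Whole table via a query.
df = pd.read_sql_query("SELECT * FROM INVENTORY", conn)

# Only the rows and columns the query asks for.
cheap = pd.read_sql_query(
    "SELECT Model FROM INVENTORY WHERE Price < ?", conn, params=(600000,)
)
conn.close()

print(df)
print(cheap)
```

With MySQL, the engine returned by create_engine() would take the place of conn, and the parameter placeholder syntax depends on the driver.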

Exporting Data from Pandas to MySQL

Exporting data involves writing a Pandas DataFrame to a MySQL table.
The DataFrame.to_sql() function is used: df.to_sql(table, sql_conn, if_exists='fail', index=False).
Parameters:

table: Name of the MySQL table to write to.
sql_conn: Connection identifier from create_engine().
if_exists: Specifies behavior if the table exists:
- 'fail': Raises a ValueError if the table exists (default).
- 'replace': Drops the existing table and creates a new one from the DataFrame.
- 'append': Appends the DataFrame to the existing table (column names must match).
index: If True, includes the DataFrame index as a column; if False, ignores the index.

Example: df.to_sql('showroom_info', engine, if_exists='replace', index=False) creates or replaces the showroom_info table with the DataFrame’s contents.
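
The effect of the three if_exists options can be seen in the sketch below. As an assumption for the sake of a runnable example, an in-memory SQLite database stands in for the MySQL engine. The showroom columns and rows are invented.

```python
import sqlite3

import pandas as pd

# Stand-in for the MySQL engine: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical showroom data.
df = pd.DataFrame({"Showroom": ["North", "South"], "Cars": [12, 8]})

# Create the table, or replace it if it already exists.
df.to_sql("showroom_info", conn, if_exists="replace", index=False)

# Add the same rows again to the existing table.
df.to_sql("showroom_info", conn, if_exists="append", index=False)

# 'fail' raises ValueError because the table now exists.
try:
    df.to_sql("showroom_info", conn, if_exists="fail", index=False)
    failed = False
except ValueError:
    failed = True

back = pd.read_sql_query("SELECT * FROM showroom_info", conn)
conn.close()

print(back)
print(failed)
```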

The document Data Handling using Pandas - II Chapter Notes | Informatics Practices for Class 12 - Humanities/Arts is a part of the Humanities/Arts Course Informatics Practices for Class 12.


FAQs on Data Handling using Pandas - II Chapter Notes - Informatics Practices for Class 12 - Humanities/Arts

1. What is the process to calculate the number of values in a Pandas DataFrame?

Ans. To calculate the number of values in a Pandas DataFrame, you can use the `.count()` method, which returns the count of non-null entries for each column. If you want the overall dimensions of the DataFrame, you can use the `.shape` attribute, which returns a tuple of (number of rows, number of columns); the `.size` attribute gives the total number of cells.

2. How can I calculate the mean of a column in a Pandas DataFrame?

Ans. You can calculate the mean of a specific column in a Pandas DataFrame using the `.mean()` method. For example, if your DataFrame is named `df` and you want to calculate the mean of a column named `column_name`, you would use `df['column_name'].mean()`.

3. What are quartiles and how do I calculate them using Pandas?

Ans. Quartiles are values that divide a dataset into four equal parts. You can calculate quartiles in Pandas using the `.quantile()` method. For example, to find the first quartile (25th percentile), you would use `df['column_name'].quantile(0.25)`.

4. What is the method to calculate the standard deviation of a DataFrame in Pandas?

Ans. You can calculate the standard deviation of a DataFrame using the `.std()` method. For example, to calculate the standard deviation of a column named `column_name`, you would use `df['column_name'].std()`. This method computes the standard deviation of the values, excluding any missing values.

5. How can I check for missing values in a Pandas DataFrame?

Ans. To check for missing values in a Pandas DataFrame, you can use the `.isnull()` method combined with `.sum()`. For example, `df.isnull().sum()` will return the count of missing values for each column in the DataFrame. This allows you to identify which columns have missing data.

About this Document

Oct 20, 2025 Last updated


