Table of contents
Introduction to Python Libraries
Series
DataFrame
Importing and Exporting Data between CSV Files and DataFrames
Pandas Series Vs NumPy ndarray
pip install pandas
Index Value
0 Arnab
1 Sanridhi
2 Ramit
3 Divyam
4 Kritika
import pandas as pd
series1 = pd.Series([10, 20, 30])
print(series1)
# Output:
# 0 10
# 1 20
# 2 30
# dtype: int64
series2 = pd.Series(["Kavi", "Shyam", "Ravi"], index=[3, 5, 1])
print(series2)
# Output:
# 3 Kavi
# 5 Shyam
# 1 Ravi
# dtype: object
series2 = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])
print(series2)
# Output:
# Feb 2
# Mar 3
# Apr 4
# dtype: int64
import numpy as np
import pandas as pd
array1 = np.array([1, 2, 3, 4])
series3 = pd.Series(array1)
print(series3)
# Output:
# 0 1
# 1 2
# 2 3
# 3 4
# dtype: int32
series4 = pd.Series(array1, index=["Jan", "Feb", "Mar", "Apr"])
print(series4)
# Output:
# Jan 1
# Feb 2
# Mar 3
# Apr 4
# dtype: int32
series5 = pd.Series(array1, index=["Jan", "Feb", "Mar"])
# Output: ValueError: Length of passed values is 4, index implies 3
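The length mismatch above can be reproduced without stopping the program by catching the exception. This is a minimal sketch: the variable `error_message` is introduced here only for illustration, and the exact wording of the message differs between pandas versions.

```python
import numpy as np
import pandas as pd

array1 = np.array([1, 2, 3, 4])

# Four values but only three index labels: pandas refuses to build the Series.
try:
    series5 = pd.Series(array1, index=["Jan", "Feb", "Mar"])
    error_message = None
except ValueError as err:
    # error_message is a hypothetical helper name, not part of the original notes.
    error_message = str(err)

print(error_message)
```

The number of index labels must always equal the number of values passed to `pd.Series`.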
dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
series8 = pd.Series(dict1)
print(series8)
# Output:
# India NewDelhi
# UK London
# Japan Tokyo
# dtype: object
seriesNum = pd.Series([10, 20, 30])
seriesNum[2]
# Output: 30
seriesMonths = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])
seriesMonths["Mar"]
# Output: 3
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])
seriesCapCntry['India']
# Output: 'NewDelhi'
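# Use .iloc for positional access; plain [1] on a labelled index is deprecated in recent pandas
seriesCapCntry.iloc[1]
# Output: 'WashingtonDC'
seriesCapCntry.iloc[[3, 2]]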
# Output:
# France Paris
# UK London
# dtype: object
seriesCapCntry[['UK', 'USA']]
# Output:
# UK London
# USA WashingtonDC
# dtype: object
seriesCapCntry.index = [10, 20, 30, 40]
seriesCapCntry
# Output:
# 10 NewDelhi
# 20 WashingtonDC
# 30 London
# 40 Paris
# dtype: object
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])
seriesCapCntry[1:3]
# Output:
# USA WashingtonDC
# UK London
# dtype: object
seriesCapCntry['USA':'France']
# Output:
# USA WashingtonDC
# UK London
# France Paris
# dtype: object
seriesCapCntry[::-1]
# Output:
# France Paris
# UK London
# USA WashingtonDC
# India NewDelhi
# dtype: object
import numpy as np
seriesAlph = pd.Series(np.arange(10, 16, 1),
                       index=['a', 'b', 'c', 'd', 'e', 'f'])
seriesAlph[1:3] = 50
seriesAlph
# Output:
# a 10
# b 50
# c 50
# d 13
# e 14
# f 15
# dtype: int32
seriesAlph['c':'e'] = 500
seriesAlph
# Output:
# a 10
# b 50
# c 500
# d 500
# e 500
# f 15
# dtype: int32
Series attributes are properties accessed with dot notation on the Series name, for example seriesCapCntry.size.
Common attributes include name, index, values, size and empty.
seriesCapCntry
# Output:
# India NewDelhi
# USA WashingtonDC
# UK London
# France Paris
# dtype: object
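The attributes of the Series shown above can be inspected as in the following sketch. The names 'Capitals' and 'Country' are chosen here purely for illustration and are not part of the original data.

```python
import pandas as pd

seriesCapCntry = pd.Series(
    ['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
    index=['India', 'USA', 'UK', 'France'],
)

# Assign illustrative names to the Series and to its index.
seriesCapCntry.name = 'Capitals'
seriesCapCntry.index.name = 'Country'

print(seriesCapCntry.name)          # name of the Series
print(seriesCapCntry.index.name)    # name of the index
print(list(seriesCapCntry.values))  # the stored values
print(seriesCapCntry.size)          # number of values in the Series
print(seriesCapCntry.empty)         # True only when the Series has no elements
```

Attributes are written without parentheses, unlike methods such as head() or count().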
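# Assumed definition, inferred from the outputs below (ten values from 10 to 100)
seriesTenTwenty = pd.Series(np.arange(10, 101, 10))
seriesTenTwenty.head(2)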
# Output:
# 0 10
# 1 20
# dtype: int32
seriesTenTwenty.head()
# Output:
# 0 10
# 1 20
# 2 30
# 3 40
# 4 50
# dtype: int32
seriesTenTwenty.count()
# Output: 10
seriesTenTwenty.tail(2)
# Output:
# 8 90
# 9 100
# dtype: int32
seriesTenTwenty.tail()
# Output:
# 5 60
# 6 70
# 7 80
# 8 90
# 9 100
# dtype: int32
ResultDF
# Output:
# Arnab Ramit Sanridhi Riya Milika
# Maths 90 92 89 81 94
# Science 91 81 91 71 95
# Hindi 97 96 88 67 99
# English 95 86 95 80 95
DataFrames can be created from dictionaries where keys become column labels and values are lists or Series.
Example:
data = {
'Arnab': [90, 91, 97, 95],
'Ramit': [92, 81, 96, 86],
'Sanridhi': [89, 91, 88, 95],
'Riya': [81, 71, 67, 80],
'Milika': [94, 95, 99, 95]
}
ResultDF = pd.DataFrame(data, index=['Maths', 'Science', 'Hindi', 'English'])
ResultDF.loc['Maths'] = [90, 92, 89, 81, 94]
ResultDF['Preeti'] = [89, 78, 76, 99]
ResultDF.loc['Maths'] = 0
# Output:
# Arnab Ramit Sanridhi Riya Milika Preeti
# Maths 0 0 0 0 0 0
# Science 91 81 91 71 95 78
# Hindi 97 96 88 67 99 76
# English 95 86 95 80 95 99
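ResultDF[:] = 0  # sets every value in the DataFrame to 0
# The drop examples below assume ResultDF holds the original marks again
ResultDF = ResultDF.drop('Science', axis=0)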
# Output:
# Arnab Ramit Sanridhi Riya Milika
# Maths 90 92 89 81 94
# Hindi 97 96 88 67 99
# English 95 86 95 80 95
ResultDF = ResultDF.drop(['Sanridhi', 'Ramit', 'Riya'], axis=1)
# Output:
# Arnab Milika
# Maths 90 94
# Hindi 97 99
# English 95 95
ResultDF = ResultDF.drop('Hindi', axis=0)
# rename() returns a new DataFrame, so assign the result to keep the new labels
ResultDF = ResultDF.rename(columns={'Arnab': 'Student1', 'Ramit': 'Student2', 'Sanridhi': 'Student3', 'Milika': 'Student4'})
DataFrame elements can be accessed using label-based or boolean indexing.
Label-Based Indexing:
ResultDF.loc['Science']
# Output:
# Arnab 91
# Ramit 81
# Sanridhi 91
# Riya 71
# Milika 95
# Name: Science, dtype: int64
ResultDF.loc[:, 'Arnab']
# Output:
# Maths 90
# Science 91
# Hindi 97
# English 95
# Name: Arnab, dtype: int64
ResultDF.loc[['Science', 'Hindi']]
# Output:
# Arnab Ramit Sanridhi Riya Milika
# Science 91 81 91 71 95
# Hindi 97 96 88 67 99
ResultDF.loc['Maths'] > 90
# Output:
# Arnab False
# Ramit True
# Sanridhi False
# Riya False
# Milika True
# Name: Maths, dtype: bool
ResultDF.loc[:, 'Arnab'] > 90
# Output:
# Maths False
# Science True
# Hindi True
# English True
# Name: Arnab, dtype: bool
ResultDF.loc['Maths':'Hindi']
# Output:
# Arnab Ramit Sanridhi Riya Milika
# Maths 90 92 89 81 94
# Science 91 81 91 71 95
# Hindi 97 96 88 67 99
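The boolean results shown above can also be used to filter rows. The following is a minimal sketch of that idea; the names `mask` and `high_scores` are introduced here for illustration and do not appear in the original notes.

```python
import pandas as pd

data = {
    'Arnab': [90, 91, 97, 95],
    'Ramit': [92, 81, 96, 86],
    'Sanridhi': [89, 91, 88, 95],
    'Riya': [81, 71, 67, 80],
    'Milika': [94, 95, 99, 95],
}
ResultDF = pd.DataFrame(data, index=['Maths', 'Science', 'Hindi', 'English'])

# Boolean mask: True for the subjects in which Arnab scored more than 90.
mask = ResultDF.loc[:, 'Arnab'] > 90

# Passing the mask to .loc keeps only the rows where the mask is True.
high_scores = ResultDF.loc[mask]
print(high_scores)
```

Because the mask carries the same row labels as the DataFrame, pandas matches it to rows by label rather than by position.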
dFrame1 = pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
dFrame2 = pd.DataFrame([[10, 20], [30], [40, 50]], columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
dFrame1 = pd.concat([dFrame1, dFrame2])
# Output:
# C1 C2 C3 C5
# R1 1.0 2.0 3.0 NaN
# R2 4.0 5.0 NaN NaN
# R3 6.0 NaN NaN NaN
# R4 NaN 10.0 NaN 20.0
# R2 NaN 30.0 NaN NaN
# R5 NaN 40.0 NaN 50.0
ForestArea = {
'Assam': pd.Series([78438, 2797, 10192, 15116], index=['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest']),
'Kerala': pd.Series([38852, 1663, 9407, 9251], index=['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest']),
'Delhi': pd.Series([1483, 6.72, 56.24, 129.45], index=['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest'])
}
ForestAreaDF = pd.DataFrame(ForestArea)
# Output:
# Assam Kerala Delhi
# GeoArea 78438 38852 1483.00
# VeryDense 2797 1663 6.72
# ModeratelyDense 10192 9407 56.24
# OpenForest 15116 9251 129.45
Pandas provides functions to import data from and export data to CSV files.
marks = pd.read_csv("C:/NCERT/ResultData.csv", sep=",", header=0)
# Output:
# RollNo Name Eco Maths
# 0 1 Arnab 18 57
# 1 2 Kritika 23 45
# 2 3 Divyam 51 37
# 3 4 Vivaan 40 60
# 4 5 Aaroosh 18 27
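# header=0 tells pandas to discard the file's own header row and use the names given here
marks1 = pd.read_csv("C:/NCERT/ResultData.csv", sep=",", header=0, names=['RNo', 'Student Name', 'Sub1', 'Sub2'])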
ResultDF.to_csv('C:/NCERT/resultout.csv')
ResultDF.to_csv('C:/NCERT/resultonly.txt', sep='@', header=False, index=False)
# Output in resultonly.txt:
# 90@92@89@81@94
# 91@81@91@71@95
# 97@96@88@67@99
# 95@86@95@80@95
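Exporting and importing can be checked together with a round trip. The sketch below writes a small DataFrame to a temporary folder and reads it back; the temporary path, the reduced two-student data, and the name `restored` are assumptions made for illustration so that the example runs on any machine.

```python
import os
import tempfile

import pandas as pd

# A reduced version of the marks table, used only for this illustration.
ResultDF = pd.DataFrame(
    {'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
    index=['Maths', 'Science', 'Hindi'],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'resultout.csv')
    # By default to_csv() writes the column header and the row index.
    ResultDF.to_csv(csv_path)
    # index_col=0 turns the first column of the file back into the row labels.
    restored = pd.read_csv(csv_path, index_col=0)

print(restored)
```

Without index_col=0, the subject names would be read back as an ordinary column and the rows would get a default 0, 1, 2 index.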
1. What is a Pandas Series and how do you create one?
2. How do you create a DataFrame in Pandas?
3. What operations can be performed on rows and columns in DataFrames?
4. How can you access DataFrame elements through slicing?
5. How do you export a DataFrame to a CSV file?